
Extension talk:External Data

From mediawiki.org


Iterating over one variable while keeping another one in the same state


I retrieve data from JSON files for a certain host. Among that data is a list (one or more items) of MAC addresses and the host name (exactly one item).

I then try to iterate over the list of MAC addresses to call a template that uses the MAC address to look up the IP addresses on that interface.

However, I need to supply the host name to the template too, so it knows which JSON file to look up the values in. I can’t get this to work properly.

This is an annotated example of the calls I’ve tried:


 {{#for_external_table:|
  ### this produces an empty hostname, if macaddresses has more than one item
  {{Interface|hostname={{{hostname}}}|hwaddress={{{macaddresses}}}}}

  ### this also produces an empty hostname if macaddresses has more than one item, but will also fail if macaddresses has only one item, because the external value will be evaluated to an empty string on the second iteration (bug?)
  {{Interface|hostname={{#external_value:hostname}}|hwaddress={{{macaddresses}}}}}

  ### this also doesn’t work, because the external value will be empty the same moment that name is empty, because they’re both iterated over
  {{Interface|hostname={{{name|{{#external_value:name}}}}}|hwaddress={{{macaddresses|}}}}}
 }}
 

Is there a way to achieve this or will this have to be implemented first?

I could imagine something like {{{hostname.keep}}} to mark those variables that don’t get iterated, or just returning the same value over and over if the iteration variable isn’t an iterable.

That's a tricky one. You may have to use the Variables extension, to set some variable to the value of #external_value, and then use that variable within #for_external_table. Although maybe the best solution is to move all this code into a Lua module, using Scribunto - it will give you much more flexibility. Yaron Koren (talk) 12:59, 1 May 2024 (UTC)
How to deal with data columns of different height would have been an interesting topic for discussion some time ago. But now, the obvious solution for this and some other issues is Lua. Variables should no longer be used at all, because the parsing order is now uncertain.
Alexander Mashin talk 08:17, 13 May 2024 (UTC)

Using multiple JOIN ON parameters in ED


Hi, I am having trouble getting a query to work.

I have translated my problem to the public Rfam database (docs.rfam.org/en/latest/database.html) so you can reproduce the problem.

This is the query that I'm aiming for:


SELECT family.description,clan.description FROM clan
JOIN clan_membership on clan.clan_acc = clan_membership.clan_acc
JOIN family ON family.rfam_acc = clan_membership.rfam_acc;

It gives me 544 rows (I copied the Rfam DB a while ago, so there might be more rows now).

This is the get_db_data line that I have on a page:


 {{#get_db_data: db = Rfam
  |join on=clan_membership.clan_acc = clan.clan_acc
  |join on=family.rfam_acc = clan_membership.rfam_acc
  |from=clan
  |data=clan_description=clan.description,family_description=family.description
 }}
 

I have also tried this as the data line: |data=clan_description=clan.description AS cdesc,family_description=family.description AS fdesc. The AS makes it into the query (I checked), but it doesn’t help.

I then call this to list the results:


 {{#for_external_table:|
 ;{{{clan_description}}}
  : {{{family_description}}}
 }}
 

The list is empty.

The SQL that is being produced by ED looks like this:

SELECT clan.description,family.description FROM `clan`,`family`
JOIN `clan_membership` ON ((family.rfam_acc = clan_membership.rfam_acc))

So this query only has one JOIN, plus the "((" "))", which I have never seen before (I'm not an SQL pro though, so that might not mean anything). This query selects from clan and family, and again, I'm not sure whether it’s right to state that explicitly alongside a join. The query returns two columns called "description" unless I add an alias for the columns in the |data= param, but that doesn’t make it work. Running this query on the DB directly gives 66430 rows (with a lot of duplicates), but limiting to a few rows with LIMIT=25 doesn’t make it work either.

So, how do I make this work?

Edit: This is the current code I have, after discussing with Yaron Koren:


 {{#get_db_data: db = Rfam
  |join on=clan.clan_acc=clan_membership.clan_acc,join on=family.rfam_acc = clan_membership.rfam_acc
  |from=clan
  |data=clan_description=clan.description,family_description=family.description <!-- empty result -->
 }}
 
I think it would just need to be join on=clan_membership.clan_acc = clan.clan_acc, family.rfam_acc = clan_membership.rfam_acc. Yaron Koren (talk) 19:10, 29 April 2024 (UTC)
Alright, that gives me the correct query in the logs, but no results appear in for_external_table. 24.134.95.253 20:50, 29 April 2024 (UTC)
Some values appear when I remove the table names in the |data= line, but then I get duplicates, because there is a clan_acc column in each table. 24.134.95.253 21:00, 29 April 2024 (UTC)
If you run that correct query directly in the database, does it return results? Yaron Koren (talk) 01:44, 30 April 2024 (UTC)
Yes, running the query in the DB directly returns two columns called "description" and the corresponding values, so the joins do work, but now I cannot get the data into MediaWiki because of the identical column names. 77.22.6.114 09:14, 30 April 2024 (UTC)
Can't you do "clan_description=clan.description, family_description=family.description"? Yaron Koren (talk) 13:42, 30 April 2024 (UTC)
I mean yes, that’s what I had first, but it doesn’t show any data. 77.22.6.114 19:39, 30 April 2024 (UTC)
Okay, good news! We just checked in a fix, here, so that now the "AS" keyword is handled correctly, which is what was needed here. So now if you do "|data=clan_description=clan.description AS clandesc, family_description=family.description AS familydesc", it should (hopefully) work. Note that the "AS" aliases don't matter, as long as they exist and they're different from one another. Yaron Koren (talk) 16:20, 3 May 2024 (UTC)
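
The column-name clash discussed in this thread is not specific to External Data. As a minimal sketch, the following uses Python's built-in sqlite3 module with a toy three-table schema that only mimics the Rfam table and column names (the rows are invented for illustration). Under SQLite's default settings, the joined query reports two result columns that are both named description, and distinct AS aliases make them distinguishable by name:

```python
import sqlite3

# In-memory database with a toy schema that only mimics the Rfam names;
# the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clan (clan_acc TEXT, description TEXT);
    CREATE TABLE family (rfam_acc TEXT, description TEXT);
    CREATE TABLE clan_membership (clan_acc TEXT, rfam_acc TEXT);
    INSERT INTO clan VALUES ('CL1', 'toy clan');
    INSERT INTO family VALUES ('RF1', 'toy family');
    INSERT INTO clan_membership VALUES ('CL1', 'RF1');
""")

JOINS = """
    FROM clan
    JOIN clan_membership ON clan.clan_acc = clan_membership.clan_acc
    JOIN family ON family.rfam_acc = clan_membership.rfam_acc
"""

# Without aliases, the result columns are not distinguishable by name.
plain = conn.execute("SELECT clan.description, family.description" + JOINS)
plain_names = [col[0] for col in plain.description]

# With distinct AS aliases, each column can be addressed by its own name.
aliased = conn.execute(
    "SELECT clan.description AS clandesc, family.description AS familydesc" + JOINS
)
aliased_names = [col[0] for col in aliased.description]
rows = aliased.fetchall()

print(plain_names)
print(aliased_names)
print(rows)
```

This says nothing about how External Data maps columns internally; it only shows why a consumer that looks columns up by name needs the aliases to differ.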

Deprecated : strtolower(): Passing null to parameter #1 ($string) of type string is deprecated

Setup
  • MediaWiki 1.39.6 (0e03068) 2024-03-11T16:26:30
  • PHP 8.1.2-1ubuntu2.14 (apache2handler)
  • MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
  • External Data 3.4-alpha (20a6b7f) 2024-03-15T09:43:16
Issue

Deprecated : strtolower(): Passing null to parameter #1 ($string) of type string is deprecated in /../w/extensions/ExternalData/includes/EDParsesParams.php on line 117

-- [[kgh]] (talk) 17:49, 15 March 2024 (UTC)

I need the wikicode of the parser function you call and the relevant $wgExternalDataSources[] (with sensitive information censored out). Also, when did the warning appear: after upgrading MediaWiki, External Data, or adding a new data source or parser function call?
Alexander Mashin talk 03:42, 16 March 2024 (UTC)
The extension's $wgExternalDataSources configuration parameter is at its default. On pages with this issue we are using the extension with calls like this one:
{{#display_external_table:
   source=https://example.org/w/images/9/9c/Export_slice_123.csv
  |format=CSV with header
  |header lines=1
  |start line=3
  |end line=10
  |data= mynr=recordnumber, priref=orig_performance_ref, pc=productcode
  |template=Test Template
}}
I cannot tell if this was an issue before the upgrade, since I had not enabled logging for the wiki before.
Backtrace
[4b34db6f537fa5e035bcd426] /wiki/Test_Page   PHP Deprecated: strtolower(): Passing null to parameter #1 ($string) of type string is deprecated
#0 [internal function]: MWExceptionHandler::handleError()
#1 /../w/extensions/ExternalData/includes/EDParsesParams.php(117): strtolower()
#2 /../w/extensions/ExternalData/includes/EDParsesParams.php(55): EDConnectorBase::paramsFit()
#3 /../w/extensions/ExternalData/includes/connectors/EDConnectorBase.php(234): EDConnectorBase::getMatch()
#4 /../w/extensions/ExternalData/includes/connectors/EDConnectorBase.php(248): EDConnectorBase::getConnectorClass()
#5 /../w/extensions/ExternalData/includes/EDParserFunctions.php(88): EDConnectorBase::getConnector()
#6 /../w/extensions/ExternalData/includes/EDParserFunctions.php(115): EDParserFunctions::get()
#7 /../w/extensions/ExternalData/includes/EDParserFunctions.php(204): EDParserFunctions::fetch()
#8 /../w/extensions/ExternalData/includes/EDParserFunctions.php(428): EDParserFunctions::emulateGetExternalData()
#9 /../w/extensions/ExternalData/includes/EDParserFunctions.php(487): EDParserFunctions::actuallyDisplayExternalTable()
#10 /../w/includes/parser/Parser.php(3439): EDParserFunctions::doDisplayExternalTable()
#11 /../w/includes/parser/Parser.php(3124): Parser->callParserFunction()
#12 /../w/includes/parser/PPFrame_Hash.php(275): Parser->braceSubstitution()
#13 /../w/includes/parser/Parser.php(2953): PPFrame_Hash->expand()
#14 /../w/includes/parser/Parser.php(1609): Parser->replaceVariables()
#15 /../w/includes/parser/Parser.php(723): Parser->internalParse()
#16 /../w/includes/content/WikitextContentHandler.php(301): Parser->parse()
#17 /../w/includes/content/ContentHandler.php(1721): WikitextContentHandler->fillParserOutput()
#18 /../w/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput()
#19 /../w/includes/Revision/RenderedRevision.php(266): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput()
#20 /../w/includes/Revision/RenderedRevision.php(237): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached()
#21 /../w/includes/Revision/RevisionRenderer.php(221): MediaWiki\Revision\RenderedRevision->getSlotParserOutput()
#22 /../w/includes/Revision/RevisionRenderer.php(158): MediaWiki\Revision\RevisionRenderer->combineSlotOutput()
#23 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}()
#24 /../w/includes/Revision/RenderedRevision.php(199): call_user_func()
#25 /../w/includes/poolcounter/PoolWorkArticleView.php(91): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#26 /../w/includes/poolcounter/PoolWorkArticleViewCurrent.php(97): PoolWorkArticleView->renderRevision()
#27 /../w/includes/poolcounter/PoolCounterWork.php(162): PoolWorkArticleViewCurrent->doWork()
#28 /../w/includes/page/ParserOutputAccess.php(299): PoolCounterWork->execute()
#29 /../w/includes/page/Article.php(714): MediaWiki\Page\ParserOutputAccess->getParserOutput()
#30 /../w/includes/page/Article.php(528): Article->generateContentOutput()
#31 /../w/includes/actions/ViewAction.php(78): Article->view()
#32 /../w/includes/MediaWiki.php(542): ViewAction->show()
#33 /../w/includes/MediaWiki.php(322): MediaWiki->performAction()
#34 /../w/includes/MediaWiki.php(904): MediaWiki->performRequest()
#35 /../w/includes/MediaWiki.php(562): MediaWiki->main()
#36 /../w/index.php(50): MediaWiki->run()
#37 /../w/index.php(46): wfIndexMain()
#38 {main}
[[kgh]] (talk) 17:08, 18 March 2024 (UTC)

Warning : Cannot modify header information - headers already sent by

Setup
  • MediaWiki 1.39.6 (0e03068) 2024-03-11T16:26:30
  • PHP 8.1.2-1ubuntu2.14 (apache2handler)
  • MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
  • External Data 3.4-alpha (20a6b7f) 2024-03-15T09:43:16
Issue

Warning : Cannot modify header information - headers already sent by (output started at /var/www/html/w/extensions/ExternalData/includes/EDParsesParams.php:117) in /../w/includes/WebResponse.php on line 75

-- [[kgh]] (talk) 17:51, 15 March 2024 (UTC)

I assume this error message is just due to the one you reported above. Yaron Koren (talk) 18:08, 15 March 2024 (UTC)
I cannot tell. The wiki is using Tweeki and the respective pages look really messy. [[kgh]] (talk) 18:11, 15 March 2024 (UTC)
It could be connected, though the issue does not appear every time I get the deprecation issue reported above. Here is a backtrace:
#0 [internal function]: MWExceptionHandler::handleError()
#1 /../w/includes/WebResponse.php(75): header()
#2 /../w/includes/OutputPage.php(2732): WebResponse->header()
#3 /../w/includes/OutputPage.php(2891): OutputPage->sendCacheControl()
#4 /../w/includes/MediaWiki.php(922): OutputPage->output()
#5 /../w/includes/MediaWiki.php(562): MediaWiki->main()
#6 /../w/index.php(50): MediaWiki->run()
#7 /../w/index.php(46): wfIndexMain()
#8 {main}
Cheers --[[kgh]] (talk) 17:17, 18 March 2024 (UTC)

Accessing identically named subkeys


Hi! I'm trying to use External Data to get data from an API that returns (part of) its data as {"midweekMeetingTime":{"weekday":2,"time":"18:30:00"},"weekendMeetingTime":{"weekday":7,"time":"12:00:00"}}, where the subkey names are the same in different arrays. Is there a way to specify which value to set a variable to (i.e. weekend_day=weekendMeetingTime.time,midweek_day=midweekMeetingTime.time) without using a Lua module? While I believe it would work in Lua, I'd prefer to keep it in wikitext, as most other parameters work and making a module for just this seems like a waste. Thanks all in advance! PixDeVl (talk) 20:37, 1 May 2024 (UTC)

It looks like the "use jsonpath" parameter would be helpful for this case; see here. Yaron Koren (talk) 21:03, 1 May 2024 (UTC)
Oh, thanks! Would you happen to know of any existing uses of the extension with this parameter that I could refer to? There seems to be some weirdness going on: the parser shows the expression working as intended (i.e. $..midweekMeetingTime.time; the double dot is there since the API returns the array in a list, [{...}]), while the extension gives an undefined error. So I want to compare it with the syntax used by others, although I'll be the first to admit I may simply be misunderstanding or miswriting something. Thanks again for the tip and for making such a useful extension! PixDeVl (talk) 22:18, 1 May 2024 (UTC)
Sure - all three example queries here use JSONPath. I hope this helps uncover the problem... Yaron Koren (talk) 13:06, 2 May 2024 (UTC)
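
To illustrate why the two time values need a path rather than a bare key name, here is a small sketch in Python using only the standard library. It does not use External Data or a JSONPath library; the get_path helper is hypothetical and merely mimics what a path such as $..midweekMeetingTime.time is meant to select from the sample payload quoted in the question (wrapped in a list, as the API is said to return it).

```python
import json

# Sample payload from the question, wrapped in a list as the API is said to return it.
payload = json.loads(
    '[{"midweekMeetingTime": {"weekday": 2, "time": "18:30:00"},'
    ' "weekendMeetingTime": {"weekday": 7, "time": "12:00:00"}}]'
)


def get_path(node, keys):
    """Hypothetical helper: follow a sequence of keys into nested dicts."""
    for key in keys:
        node = node[key]
    return node


record = payload[0]

# The parent key is what tells the two identically named "time" subkeys apart.
midweek_time = get_path(record, ["midweekMeetingTime", "time"])
weekend_time = get_path(record, ["weekendMeetingTime", "time"])

print(midweek_time, weekend_time)
```

A lookup by the bare name time would be ambiguous here, which is the situation that a path-based selector is meant to resolve.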

#store_external_table rendering raw

Setup
  • MediaWiki 1.39.7 (2ba7e95) 2024-05-13T11:45:16
  • PHP 8.1.2-1ubuntu2.17 (apache2handler)
  • MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
  • External Data 3.4-alpha (c23dc0d) 2024-05-18T10:11:41
Issue

The #store_external_table parser function is rendering raw, i.e., instead of being invisible on the page, the user sees {{#store_external_table:Is fruit in |Has name={{{name}}} |Has color={{{color}}} |Has shape={{{shape}}} }}. Despite this, the parser function still does its job, i.e., stores the subobjects holding the annotations. [[kgh]] (talk) 18:25, 5 June 2024 (UTC)

Yes, the #store_external_table parser function was removed from External Data - I need to release version 3.4, to make it an official removal. I don't think it's storing data - my guess is that that SMW data you're seeing was already there beforehand. Yaron Koren (talk) 19:04, 5 June 2024 (UTC)
Thanks for the info. How is information stored for SMW in new releases, i.e., what replaced this parser function? Is #get_web_data doing this now or do I need to create a template that maps the data to the properties? [[kgh]] (talk) 19:09, 5 June 2024 (UTC)
You now need to do it via a template, yes. Yaron Koren (talk) 19:18, 5 June 2024 (UTC)
Ah, ok. Thank you for confirming. It looks like one needs to use #display_external_table for this. [[kgh]] (talk) 19:20, 5 June 2024 (UTC)

Suggestion: add cookies to getWebData in Lua


Hi. I have played around with getWebData, especially to see the potential of calling my private wiki's API. I already use getDbData, but I think both can be complementary.

The problem is that anonymous users cannot read my private wiki (they have to create an account) and getWebData only allows for anonymous connections. I have tried to login to my wiki using getWebData the same way my custom bot does, but it doesn't work. I get an error "Unable to continue login. Your session most likely timed out." and I reckon this is because the session ID is passed as a cookie which is ignored here.

Furthermore, once you are logged in, you have a cookie to avoid logging in every time. That could also work with other sites.

So my suggestion/request is a way to handle cookies with getWebData in Lua (I don't think it would be useful in the template version). I propose the following:

  • The Lua getWebData would return three values instead of two: result, error and cookies (so that it doesn't break existing code which takes only result and error).
  • The Lua getWebData would accept a new field in its table argument named "cookies" as a table of cookies which would be passed along with the HTTP request.

That way, we could get cookies from an HTTP request and send them to another request, if necessary, without breaking any existing code.

Thanks. Steff-X (talk) 13:21, 16 June 2024 (UTC)

  • You could try to declare an ExternalDataBeforeWebCall hook to somehow get your cookie and put it into $options['headers']['Cookie'] .= ";your_cookie=$cookie";.
    Note to self: add or remember a less global way to add callbacks to External Data sources.
    Alexander Mashin talk 14:18, 16 June 2024 (UTC)
    Thanks Alex, that worked.
    For other people interested, here's how to do it:
    • In Firefox or Chrome, connect to the wiki you're interested in
    • Display the cookies. The method depends on your browser.
    • Copy the name and value of the cookies ending with UserID, UserName and Token
    • In LocalSettings.php, add the following code, substituting the cookies' name and value by yours (don't miss the .= (dot-equal) sign and the final semi-colon):
    $wgHooks['ExternalDataBeforeWebCall'][] =
    function ( string $method, string &$url, array &$options, array &$errors ): bool {
        // Only add the cookies for requests to the wiki in question.
        if ( $url === 'https://your.wiki/api.php' ) {
            $options['headers']['Cookie'] .= ";mediawikiUserID=5;mediawikiUserName=myUserName;mediawikiToken=(censored)";
        }
        // Always return a bool, as the declared return type requires.
        return true;
    };
    
    Unfortunately, anything more complicated than that is beyond my knowledge in PHP. That is why I still think that an easy way to handle cookies would be beneficial.
    Thanks again for the tip. Steff-X (talk) 12:11, 18 June 2024 (UTC)
    • If you get your cookie outside of MediaWiki, your method is overcomplicated. Just set $wgExternalDataSources['https://your.wiki/api.php']['options']['headers']['Cookie'] = 'mediawikiUserID=5;mediawikiUserName=myUserName;mediawikiToken=(censored)';
      Alexander Mashin talk 12:56, 18 June 2024 (UTC)
      Well I can't get it to work.
      I always get error: Error sending API request: SyntaxError: JSON Parse error: Unrecognized token '<'
      And I don't know where it comes from, because the error shows up even if I disable "format" = "json" in the request and everything related to JSON in my code. Steff-X (talk) 14:32, 18 June 2024 (UTC)
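
For readers assembling the Cookie header value by hand, the following is a small language-neutral sketch in Python, not External Data code. It joins name/value pairs with "; ", the separator used in HTTP Cookie request headers. The build_cookie_header helper is hypothetical, and the cookie names and values are the placeholders from this thread.

```python
def build_cookie_header(cookies):
    """Hypothetical helper: join name/value pairs into one Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder names and values, mirroring the ones used in the thread.
cookies = {
    "mediawikiUserID": "5",
    "mediawikiUserName": "myUserName",
    "mediawikiToken": "(censored)",
}

headers = {"Cookie": build_cookie_header(cookies)}
print(headers["Cookie"])
```

Building the value from pairs in one place avoids stray leading separators when the header starts out empty.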

This extension documentation really lacks some examples in Lua


Hi everyone. This extension is great in its capabilities but I think it really lacks some code examples in Lua/Scribunto.

The usage notice says that "there is one-to-one correspondence between parser functions retrieving data and Lua functions evident from their names". OK cool, but the syntax is entirely different between the two, especially when you just start using Scribunto/Lua because your wiki template keeps failing!

I think the documentation should have some advanced examples in Lua/Scribunto (to show the power of Lua vs Templates) and I'm ready to participate. Steff-X (talk) 13:47, 16 June 2024 (UTC)

  • A couple of examples, perhaps too advanced:
    • Module:Chrono used to show this. Funnily, it drills into Cargo tables;
    • Module:Tzdata, which accesses timedatectl and is used to parse date/time strings;
    • Module:WikiList, which parses a JSON from GitHub;
    • My own example from Phabricator;
    • Module:External_data/new -- this module assembles external data split over several web pages with links to each other (e.g. a long list) from only one URL, linking the first page (or any page from which there is a path to the other pages).
Alexander Mashin talk 14:34, 16 June 2024 (UTC)

Issue with caching of data


Hi there,

Great extension, we use it a lot. We have somewhat recently adopted Lua to fetch data from another private wiki onto a public wiki, via a whitelisted Special:Ask page and the mw.ext.externalData.getExternalData function. It had been working well until recently, when we realized that any changes on the private wiki were taking a long time to show up on the public wiki (from many hours to a few days). We've tried everything we could think of to reduce any kind of caching to 0, but no luck. And it seems to be compounded: External Data has some weird invisible level of caching, and then Lua does too, it seems. We tried:

  • $wgExternalDataSources['*']['min cache seconds'] = 0
  • The standalone, non-Lua function #external_value with cache seconds=0
  • In Lua, we first had mw.loadData() with one big call to mw.ext.externalData.getExternalData to bring the data to the page once and then parse it into a large template.
  • Then in Lua we tried individual small calls to mw.ext.externalData.getExternalData without using mw.loadData().
  • We've turned off all kinds of cache everywhere we could think of, including client side, etc.

No luck! Any ideas?

Jeremi Plazas (talk) 16:03, 22 July 2024 (UTC)

    1. Setting the cache period to zero would be a bad idea, because it would make both wikis vulnerable to DoS attacks. Set it to at least one minute, or more if Special:Ask takes longer to run.
    2. Whether or not you use Lua or mw.loadData() is not relevant.
    3. I suggest that you set $wgExternalDataSources['(Full URL of Special:Ask with query; or that wiki's hostname, e.g., example.org)']['min cache seconds'] = 60;. That should be enough.
    4. The only additional caching level that External Data introduces is the table ed_url_cache. It can contain cache entries with an expiration time in the future, set when the cache expiration period was longer. You may want to have them removed.
Alexander Mashin talk 02:01, 23 July 2024 (UTC)
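
Point 4 above can be pictured with a toy model. The sketch below is plain Python and is not External Data's implementation: the ToyUrlCache class, its methods and the URL are hypothetical. It assumes only what the reply describes, namely that an entry's expiration time is fixed when the entry is stored, so lowering the configured period later does not shorten the life of entries already in the table.

```python
class ToyUrlCache:
    """Toy time-based cache; not External Data's actual implementation."""

    def __init__(self):
        self._entries = {}  # url -> (value, expires_at)

    def store(self, url, value, now, cache_seconds):
        # The expiry is fixed at store time from the period in force then.
        self._entries[url] = (value, now + cache_seconds)

    def lookup(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None
        value, expires_at = entry
        return value if now < expires_at else None


cache = ToyUrlCache()
url = "https://example.org/data"  # placeholder URL

# Entry stored while the cache period was one day (86400 seconds).
cache.store(url, "old data", now=0, cache_seconds=86400)

# An hour later the configured period is lowered to 60 seconds, but the
# stored entry still carries its original expiry and keeps being served.
still_cached = cache.lookup(url, now=3600)

# Only after the original expiry passes (or the entry is deleted) is it gone.
after_expiry = cache.lookup(url, now=86401)

print(still_cached, after_expiry)
```

Under that assumption, removing the old entries is what makes a shorter period take effect immediately.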
Thanks for the reply! We tried your suggestions but our problem persists. We set $wgExternalDataSources as per your example and also truncated the ed_url_cache table. Still, we have pages with data that is a couple of days stale. We can't quite figure out at what level this is happening. DB? Jeremi Plazas (talk) 09:39, 23 July 2024 (UTC)
Our current config is:
wfLoadExtension( 'ExternalData' );
$wgExternalDataSources['https://research.tsadra.org/index.php']['min cache seconds'] = 5;
$wgExternalDataSources['*']['always allow stale cache'] = false;
But even with 5 seconds, the entry in ed_url_cache does not get updated.
Jeremi Plazas (talk) 10:29, 23 July 2024 (UTC)
This may be an issue of MediaWiki cache. To make sure, open https://your-wiki.org/wiki/Page_with_ED?action=purge.
Alexander Mashin talk 11:50, 23 July 2024 (UTC)
Also, the key to $wgExternalDataSources should be either the full precise url of the data source, including what goes after ?, or the domain name (e.g., research.tsadra.org) or the second-level domain (tsadra.org) or asterisk.
Alexander Mashin talk 11:53, 23 July 2024 (UTC)
Thanks, yes, we tried purging the page many times. We also tried all the different URL types (full, just domain), and nothing does it. It looks like a deeper level of caching is going on, possibly the ed_url_cache table not updating...
So we have a little test page that we whitelisted on this wiki site we're working on to illustrate:
https://bca.tsadra.org/index.php/Test_external_data
If it helps.
Jeremi Plazas (talk) 14:18, 23 July 2024 (UTC)