Extension talk:External Data

accessing the internal variable (__json)
I need some help. Background: I'm working on an extension that uses a web call to retrieve a JSON object and converts it into concise HTML code. The extension now works properly but lacks caching. The JSON object is complex, containing multiple nested arrays with a varying number of elements, and it took me a while to program it in PHP.

My current line of thought is to transfer the entire json object (that is stored in the internal variable __json) to my extension.

Question 1) how does one access the internal variables?? The code below doesn't work.

Question 2) do you have any comments / suggestions on my approach?


 * first bullet:
 * second bullet:

Harro Kremer (talk) 21:20, 3 January 2023 (UTC)
 * Since __json is not a string but a Lua table, it is only accessible from Lua (Scribunto is needed) and not from wikitext: Alexander Mashin (talk) 03:46, 9 January 2023 (UTC)

Using External Data in a Template
I want to fill a custom Infobox Template with data from ExternalData. The data will reside in a CSV file matching the non-namespace part of the page importing.

However, this tries to fetch data from a CSV file named "myinfobox" instead. What is a better way to achieve this?


 * Is the template called "infobox myinfobox", or just "myinfobox"? And do you see the problem on the "mypage" page, or right on the template page? Yaron Koren (talk) 00:34, 10 January 2023 (UTC)
 * My template page is called "Vorlage:Infobox_VM". The problem occurs on "mypage", as the template page doesn't really have any visible content, nor does it have a CSV providing information. "mypage" tries to read from a CSV called Vorlage:Infobox_VM.csv instead of mypage.csv.

 * This looks like a strange way to invoke a template: . Why not just  ? And, as said above, fetching data from   is to be expected on the template page itself, unless the code is wrapped with. Alexander Mashin (talk) 02:11, 10 January 2023 (UTC)


 * yes you’re right, i tried being verbose for clarity. in my page i have only and that seems to work well enough. I'm not really concerned what csv is being read on the template page itself, as that page isn’t really supposed to be looked at. I want "mypage" to look at mypage.csv through the template. that’s what doesn’t work.


 * What are the settings for the data source  in   ?
 * that path and all the files in it are world readable. that stuff works in principle. i am able to pull info from csv. just not via the page name in the template.

 * Could not reproduce the issue on my MediaWiki installation. It looks like you have some sophisticated wiki code that substitutes  too early. Alexander Mashin (talk) 10:59, 14 January 2023 (UTC)
 * I solved the problem this way: on "mypage" I call  and on the template page I use that variable to find the right file. 24.134.95.253 15:51, 17 January 2023 (UTC)

Slightly different csv data files, only one works
I have two 48-line CSV files, each under 10k in size, with a minor difference between them, but only one can be read by ExternalData in 1.39.

Simplified case https://johnbray.org.uk/expounder/Extdataproblem1 uses

get_web_data: url=https://files.johnbray.org.uk/Documents/Expounder/Q/532/9928/datagood and then get_web_data: url=https://files.johnbray.org.uk/Documents/Expounder/Q/532/9928/databad

checking that has something from a line of the csv file. datagood works, but databad does not. The difference between the files is a few characters on one line, and the good file is actually longer than the bad

< "item",+1996-04-05T00:00:00Z,+1996-04-08T00:00:00Z,"","in Heathrow wit h, , , , , ,","","","","","","",51.4673,-0.4529

> "item",+1996-04-05T00:00:00Z,+1996-04-08T00:00:00Z,"","in Heathrow wit h, , , , AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBA,","","","","","","",51.4673,-0.4529

Both files are < 10k in size, 48 lines, and the good file is actually longer than the bad

wc datagood databad
 48  338  9245 datagood
 48  341  9180 databad

Vicarage (talk) 14:05, 13 January 2023 (UTC)
 * Neither of the CSV files is perfectly formed from PHP's point of view. The problems start at line 7 (one-based). For  this causes the automatically recognised delimiter to be   rather than  . You can overcome this by adding. Alexander Mashin (talk) 10:45, 14 January 2023 (UTC)
 * Thanks for the quick response. That cured the problem, both for my trivial and full cases. Vicarage (talk) 11:00, 16 January 2023 (UTC)
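
The idea of stating the delimiter instead of letting it be guessed can be illustrated outside the wiki. The following is a minimal Python sketch, not External Data's PHP code: it parses one of the two differing lines quoted above with an explicitly given comma delimiter. The function name parse_line is made up for this illustration.

```python
import csv
import io

# One of the two differing lines quoted in the report above.
LINE = ('"item",+1996-04-05T00:00:00Z,+1996-04-08T00:00:00Z,"",'
        '"in Heathrow wit h, , , , , ,","","","","","","",51.4673,-0.4529')


def parse_line(line, delimiter=","):
    """Parse a single CSV line with an explicit delimiter (no guessing)."""
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quotechar='"')
    return next(reader)


fields = parse_line(LINE)
print(len(fields))  # number of columns found
print(fields[4])    # the quoted field keeps its embedded commas
```

With the comma given explicitly, the commas inside the quoted field stay part of that field and do not shift the column count.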

What is the best practice to fetch many values (>300) from the same place?
Is it better to use the legacy method, like #get_web_data, and fetch all the values at once, then display using #external_value; or is it better to use the new method, #external_value with source parameter, 300 times? What performance consideration might there be?

Jeremi Plazas (talk) 17:51, 17 January 2023 (UTC)
 * If you use caching, the difference is not that big. Using  will save cache lookups, but the legacy mode will stop working once MediaWiki is upgraded to use Parsoid, since it does not guarantee parsing order. The optimal solution is to handle data fetching and display with one Lua function, where you can save the fetched data into a variable and later display it. Alexander Mashin (talk) 03:22, 18 January 2023 (UTC)
 * Thanks, we'll look into Lua. We do have caching set up, so the standalone method might be fine, now that you've helped us iron out the kinks. Thanks again for the help! Jeremi Plazas (talk) 17:54, 19 January 2023 (UTC)
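
The "fetch once, keep the result in a variable, display many values" pattern suggested above can be sketched as follows. This is an illustration in Python rather than Lua, and it does not use any External Data or Scribunto API; CachedSource and fake_fetch are hypothetical names, and fake_fetch merely stands in for the real retrieval.

```python
class CachedSource:
    """Fetch a data set once and serve every later lookup from memory."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the expensive retrieval
        self._data = None      # filled on first use
        self.fetch_count = 0   # how many times the source was actually hit

    def value(self, key):
        if self._data is None:
            self._data = self._fetch()
            self.fetch_count += 1
        return self._data[key]


def fake_fetch():
    # Stand-in for a single web request that returns 300 values.
    return {f"field{i}": i * i for i in range(300)}


source = CachedSource(fake_fetch)
values = [source.value(f"field{i}") for i in range(300)]
print(source.fetch_count)  # the source is hit once for all 300 lookups
```

The point of the design is that the number of lookups no longer determines the number of retrievals, and nothing depends on the order in which separate parser function calls are processed.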

Some JSONPATH doesn’t retrieve results
I have a JSON file with information about network interfaces in it. I am retrieving this information with get_file_data and two jsonpath instructions. However, only one of them seems to be executed/filled with data.

This is my JSON:

This is the template code I use to retrieve the data:

I call this template in the following fashion:

where netnames is just an array, usually with only one entry like , and filename points to the correct file.

 * If  is an array, I don't know how transcluding it within single quotes in a JsonPath query could work. Neither , nor   is a working JsonPath.   would be. I would suggest replacing   with  , but this does not seem to be implemented. You could try a regular expression:. Also, you can get the bulleted list without a template: Alexander Mashin (talk) 06:23, 19 January 2023 (UTC)

Strange bug parsing CSV with pipe character
I have a field called  which is called in. All of the values have at least one pipe character, and the page cuts off everything before and including the first pipe character. But the strange thing is that it only happens if is located in a specific place.

The page can be seen at https://comprehensibleinputwiki.org/wiki/Mandarin_Chinese/Videos and the external data is at https://comprehensibleinputwiki.org/wiki/Data:Mandarin_Chinese/Videos. If you look at the wiki source of the first link, I have twice, one of them in a hidden div. If you view the source of the page, the first one is missing part of the value, while the hidden one is complete. Dimpizzy (talk) 20:08, 22 January 2023 (UTC)
 * Add  to. Alexander Mashin (talk) 02:01, 23 January 2023 (UTC)
 * It didn't seem to change anything. I changed it to:
 * Dimpizzy (talk) 03:00, 23 January 2023 (UTC)
 * At least, the videos are displayed now. If the current problem is the trimmed titles in the "Title" column, this is not related directly to the extension. The beginning of the title is treated as attributes to the  tag by MediaWiki parser. Wrap   with , like this:  , to see the first chunk of the title. UPD: Or, you can add a second  before. Alexander Mashin (talk) 07:21, 23 January 2023 (UTC)
 * That worked, thanks! I didn't notice any issues on my end with the videos not displaying before, but good to know! Dimpizzy (talk) 09:36, 23 January 2023 (UTC)

Parameter parsing problems
Hi. I use External Data to retrieve data from a PostgreSQL database. In most cases I use prepared statements, and I noticed that the passing of parameters seems not to work correctly. Here is a self-contained test to show what I mean.

In the database I have a table with a single column of type text:

SELECT * FROM public.test;

txt
a simple text
another simple text
a text with, a comma in it

Notice that the lines contain spaces and in one case a comma.

Then I have a search function that receives a parameter of type text and returns a set of text:

SETOF TEXT public.mw_test(p_search TEXT)

The configuration for this in LocalSettings.php looks like this:

$wgExternalDataSources['wikidoc'] = [
    'server' => 'xxx',
    'type' => 'postgres',
    'name' => 'xxx',
    'user' => 'xxx',
    'password' => 'xxx',
    'prepared' => [
        'test' => 'SELECT mw_test FROM public.mw_test($1);'
    ]
];

In the wiki page the snippet is as follows:

It simply displays what it finds on a line.

What happens is that the list of parameters cannot contain a comma. The snippet as shown above works fine and returns:

a text with, a comma in it

But something like this does not:

The error is "Fehler: Es wurden keine Rückgabewerte festgelegt." (in English: "Error: no return values were specified.")

It is clear that a comma is used to separate parameters and that this is the reason why it does not work. My question is: how can I pass the whole string "with, a" as a single parameter?

I tried enclosing it in single and double quotes, but this did not help. It leads to this exception:

[6882cc0e426ebdb6cf6911bc] /w/index.php?title=IT/IT_Infrastructure/KOFDB_Uebersicht&action=submit TypeError: EDParserFunctions::formatErrorMessages: Argument #1 ($errors) must be of type array, null given, called in /home/wiki/application/w/extensions/ExternalData/includes/EDParserFunctions.php on line 98

Any idea what I could do to solve this? Help is much appreciated. Thanks

It looks like I found a way to solve this. I can enclose the whole string in parentheses and it works.

 * An interesting workaround; however, upgrade to be able to use double quotes. Alexander Mashin (talk) 06:10, 11 February 2023 (UTC)
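
Why a plain comma split breaks the parameter, and how double quotes can protect it, can be shown with a small Python sketch. This is only an illustration of the general technique; it is not the extension's actual parser, and the function names split_naive and split_quoted are invented here.

```python
import csv
import io


def split_naive(params):
    """Split on every comma, as a simple parameter list parser might."""
    return [part.strip() for part in params.split(",")]


def split_quoted(params):
    """Split on commas, but keep double-quoted values in one piece."""
    reader = csv.reader(io.StringIO(params), delimiter=",",
                        quotechar='"', skipinitialspace=True)
    return [part.strip() for part in next(reader)]


print(split_naive("with, a"))       # the search string falls apart into two parameters
print(split_quoted('"with, a"'))    # the quoted string stays a single parameter
```

In the naive case the prepared statement expecting one argument receives two, which is consistent with the query returning nothing.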

composer.json
When trying to install dependencies via composer, the current requirement says  on REL1_39 branch. The latest version is currently 2.5.4 which doesn't meet this requirement. Could this be updated to a more permissive requirement? Prod (talk) 18:29, 24 February 2023 (UTC)


 * This is being addressed in T330485. Prod (talk) 19:39, 13 March 2023 (UTC)

Not pulling the correct result from the json
I have a json file that validates and have used the following code in the page:

I have tried all sorts of variations of the code to get External Data to bring back the result for ['Wanted'], but it will only bring back the first entry in the JSON file, not the ['Wanted'] entry. The path checks out and I have checked it on a validator, so I am not sure what I am doing wrong.

Any help would be appreciated. SyrinxCat (talk) 23:22, 25 April 2023 (UTC)


 * The correct syntax seems to be:
 * A simpler variant may also work, depending on the JSON contents:
 * Alexander Mashin (talk) 12:47, 26 April 2023 (UTC)




 * Thanks for those. The first example returns the error
 * ID: Error: no local variable "id" has been set.
 * Type: Error: no local variable "type" has been set.
 * Description(s): Error: no local variable "description" has been set.
 * I was calling the results with
 * ID:
 * Type:
 * Description(s):
 * The second example I had already tried and it has the same problem. SyrinxCat (talk) 16:52, 26 April 2023 (UTC)
 * Then an example of the JSON would be helpful. Alexander Mashintalk 08:13, 28 April 2023 (UTC)

Read BSON array from MongoDB
submit TypeError EDConnectorMongodb::getValueFromJSONArray: Argument #1 ($origArray) must be of type array, MongoDB\Model\BSONDocument given

When I put data, I receive the error above. 93.153.250.62 09:47, 18 July 2023 (UTC)
 * You can now upgrade and check if the error is still there. Alexander Mashin (talk) 15:27, 18 July 2023 (UTC)
 * Yes, the same error. 93.153.250.62 07:13, 19 July 2023 (UTC)
 * What is the Special:Version line for External Data? Alexander Mashin (talk) 07:20, 19 July 2023 (UTC)
 * 3.3-alpha 93.153.250.62 12:30, 19 July 2023 (UTC)
 * Yes, but what is after "3.3-alpha"? Alexander Mashin (talk) 13:02, 19 July 2023 (UTC)
 * 3.3-alpha (67b5813) 11:15, 3 July 2023 93.153.250.62 13:42, 19 July 2023 (UTC)
 * And the bug was fixed on the eighteenth of July. Alexander Mashin (talk) 14:06, 19 July 2023 (UTC)
 * Do I need to update the extension? 93.153.250.62 07:33, 21 July 2023 (UTC)
 * Yes, you do. Alexander Mashin (talk) 07:59, 21 July 2023 (UTC)

Selecting data via a view is not working anymore and an encoding issue
During testing on MW-1.39.4 (coming from MW-1.35) we noticed that selecting data via a view with the code below stopped working.

The two examples below do exactly the same thing, but the one that selects the data from a view gives "Error: no local variable "TestString" has been set." The actual view (code) for testview is:.





The result is:

* Test String 2


 * Error: no local variable "TestString" has been set.

It seems that in both cases the data is selected from the database because when I add a print statement for  (ExternalData\includes\connectors\EDConnectorDb.php on line 159 ) the value for both queries is shown. But the  does not print the value for the query where the data is coming from the view.

Another problem seems to be the  statement in the same file. When running the same query as above where the result of TestString contains a special character, the PHP  prints   and the following happens:

Internal error

[7f74c5f50b2707f31ceeb4ba] /wiki/Sandbox  ValueError: mb_convert_encoding: Argument #3 ($from_encoding) must specify at least one encoding

Backtrace:

from C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\connectors\EDConnectorDb.php(159)
#0 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\connectors\EDConnectorDb.php(159): mb_convert_encoding
#1 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\connectors\EDConnectorDb.php(140): EDConnectorDb::processField
#2 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\connectors\EDConnectorDb.php(102): EDConnectorDb->processRows
#3 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\EDParserFunctions.php(90): EDConnectorDb->run
#4 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\EDParserFunctions.php(113): EDParserFunctions::get
#5 C:\Program Files\Apache\htdocs\internalwiki\extensions\ExternalData\includes\ExternalDataHooks.php(24): EDParserFunctions::fetch
#6 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\Parser.php(3437): ExternalDataHooks::{closure}
#7 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\Parser.php(3122): Parser->callParserFunction
#8 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\PPFrame_Hash.php(275): Parser->braceSubstitution
#9 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\Parser.php(2951): PPFrame_Hash->expand
#10 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\Parser.php(1609): Parser->replaceVariables
#11 C:\Program Files\Apache\htdocs\internalwiki\includes\parser\Parser.php(723): Parser->internalParse
#12 C:\Program Files\Apache\htdocs\internalwiki\includes\content\WikitextContentHandler.php(301): Parser->parse
#13 C:\Program Files\Apache\htdocs\internalwiki\includes\content\ContentHandler.php(1721): WikitextContentHandler->fillParserOutput
#14 C:\Program Files\Apache\htdocs\internalwiki\includes\content\Renderer\ContentRenderer.php(47): ContentHandler->getParserOutput
#15 C:\Program Files\Apache\htdocs\internalwiki\includes\Revision\RenderedRevision.php(266): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput
#16 C:\Program Files\Apache\htdocs\internalwiki\includes\Revision\RenderedRevision.php(237): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached
#17 C:\Program Files\Apache\htdocs\internalwiki\includes\Revision\RevisionRenderer.php(221): MediaWiki\Revision\RenderedRevision->getSlotParserOutput
#18 C:\Program Files\Apache\htdocs\internalwiki\includes\Revision\RevisionRenderer.php(158): MediaWiki\Revision\RevisionRenderer->combineSlotOutput
#19 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}
#20 C:\Program Files\Apache\htdocs\internalwiki\includes\Revision\RenderedRevision.php(199): call_user_func
#21 C:\Program Files\Apache\htdocs\internalwiki\includes\poolcounter\PoolWorkArticleView.php(91): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput
#22 C:\Program Files\Apache\htdocs\internalwiki\includes\poolcounter\PoolWorkArticleViewCurrent.php(97): PoolWorkArticleView->renderRevision
#23 C:\Program Files\Apache\htdocs\internalwiki\includes\poolcounter\PoolCounterWork.php(162): PoolWorkArticleViewCurrent->doWork
#24 C:\Program Files\Apache\htdocs\internalwiki\includes\page\ParserOutputAccess.php(299): PoolCounterWork->execute
#25 C:\Program Files\Apache\htdocs\internalwiki\includes\page\Article.php(714): MediaWiki\Page\ParserOutputAccess->getParserOutput
#26 C:\Program Files\Apache\htdocs\internalwiki\includes\page\Article.php(528): Article->generateContentOutput
#27 C:\Program Files\Apache\htdocs\internalwiki\includes\actions\ViewAction.php(78): Article->view
#28 C:\Program Files\Apache\htdocs\internalwiki\includes\MediaWiki.php(542): ViewAction->show
#29 C:\Program Files\Apache\htdocs\internalwiki\includes\MediaWiki.php(322): MediaWiki->performAction
#30 C:\Program Files\Apache\htdocs\internalwiki\includes\MediaWiki.php(904): MediaWiki->performRequest
#31 C:\Program Files\Apache\htdocs\internalwiki\includes\MediaWiki.php(562): MediaWiki->main
#32 C:\Program Files\Apache\htdocs\internalwiki\index.php(50): MediaWiki->run
#33 C:\Program Files\Apache\htdocs\internalwiki\index.php(46): wfIndexMain
#34 {main}

When also printing  it is empty so it seems that the correct encoding is not detected for the string with   on line 158. When adding some code to see what the encoding is it comes back with "Quoted-Printable". I will keep trying to investigate further but this is it for now.

I tested this on a clean install of MW-1.39.4 where only ExternalData (version: 3.3-alpha) is enabled, with no other extensions. Thank you. Felipe (talk) 13:04, 27 July 2023 (UTC)

 * Upgrade the extension and try again. Alexander Mashin (talk) 02:51, 28 July 2023 (UTC)
 * Hello Alexander, thank you for the quick response. The SELECT from a view fault is fixed now, but the encoding fault is not. After some testing I found out that on some table columns (which hold the data retrieved by ExternalData) my encoding was set to latin1 and not to utf8. In the above-mentioned string there was not a normal space but a non-breaking space. The combination of latin1 and trying to check for UTF-8 made   fail to return any value. When the column is encoded as UTF-8 it works just fine. If I am correct, I can fix this completely by using   instead of  . But I need to update MariaDB first to be able to do this.
 * As for   it will still not return UTF-8 when the check "fails". When you instead force $encoding to UTF-8 and $value is not utf8 encoded you get another fault further "upstream" which is probably not what you want. It is probably better to fail in   because $encoding is empty. But that is my humble opinion. Thanks again, Felipe (talk) 11:28, 28 July 2023 (UTC)
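
The latin1 versus UTF-8 mismatch described above can be reproduced in isolation. The sketch below is Python, not the extension's PHP, and it is only meant to show why a non-breaking space stored as latin1 is not valid UTF-8; the sample value and the function name decode_db_value are assumptions made for this example.

```python
def decode_db_value(raw, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not reachable with the defaults, because latin-1 accepts every byte.
    raise ValueError("none of the candidate encodings fits the value")


# A made-up value with a non-breaking space, stored the way a latin1 column would store it.
raw = "Test\u00a0String".encode("latin-1")   # the space becomes the single byte 0xA0

text, used = decode_db_value(raw)
print(used)        # encoding that finally fitted
print(repr(text))  # the non-breaking space survives as U+00A0
```

A lone 0xA0 byte is a continuation byte in UTF-8, so a strict UTF-8 check rejects the value; whether to fall back to another encoding or to fail at that point is the design question raised in the comment above.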

Retrieve a wiki page's revision fails with 3.x
With v2.0.1 it was possible to retrieve a wiki page's revision as follows:

This does not work anymore with v3.2 independent of setting  to true or false. Did I miss something? Planetenxin (talk) 18:28, 13 September 2023 (UTC)
 * Works with HTTPS, yet there seem to be problems with HTTP, but only if, rather than a constant page name, is used. It's strange; I'll investigate it further. While I was writing this, I got it. It is a caching issue. The page that you had put the API call on is new, and the API response, without any revisions yet, is stuck in the cache. You may want to reduce the caching time in, but it will not be lower than the corresponding configuration setting; so, you may need to set a low caching time just for API calls:. The alternative is to switch miser mode off and use. Alexander Mashin (talk) 04:17, 18 September 2023 (UTC)

Upgrade to 1.39.4 produces exception on data retrieval
Using get_program_data in the same condition that worked in version 1.35.4, this call throws an exception now. (omitting the paths, because mediawiki thinks it’s preventing me from spamming links...)

This seems to indicate that no argument is available to the run function that line 184 is part of. Is this due to some sort of caching? What can I do to fix this?

$wgExternalDataSources['phone_number_formatter'] = [
    'command' => '/usr/bin/python3 /opt/scripts/prettyPhoneNumber.py $number$',
    'params' => [ 'number' ],
    'param filters' => [ 'number' => '/^[0-9+ ]+$/' ],
];
 * I am glad to read that someone else uses ,
 * although using an external program to format telephone numbers seems overkill,
 * I also thank you for testing the extension under MediaWiki 1.39, for I have not had a chance to do it,
 * this error is caused by broken backward compatibility in MediaWiki 1.39,
 * and is triggered by the fact that you pass the phone number to the Python script as a parameter, not as standard input, and is this the right way?
 * I have submitted a patch to fix the issue. Alexander Mashin (talk) 03:02, 20 September 2023 (UTC)


 * Thanks for the quick help. Will the patch find its way into the tarball distributed by the extension distributor?
 * The script also adds a flag depending on the country the number is from ;) I may change the script to be a webserver instead. it’s kinda slow to launch python so many times.
 * When i apply the patch i can look at my other usages of this plugin, namely the mysql connector and ldap to see if they work correctly. will report back.
 * Out of curiosity, how would i pass the number to STDIN and why do you think it’s better? i can do it either way. 77.21.209.173 10:20, 20 September 2023 (UTC)

 * If the tarball is downloaded from gerrit or github, and not from the extension distributor, I think, it will be updated. Use the master branch.
 * I meant that this processing could be performed by a Lua script, especially aided by semantic data (flags, for example),
 * will send the value of the parser function parameter  to program's standard input. This is, more or less, the only choice for long multi-line parameters like dot code for GraphViz. But, perhaps, it is not optimal for short data, like a phone number. Alexander Mashin (talk) 10:48, 20 September 2023 (UTC)
 * Ah, I see. I will apply the patch 11 hours from now. I'm using a Python library to do the heavy lifting. It's just quick and dirty and works for us; no time to reimplement the thing in Lua. 77.21.209.173 11:07, 20 September 2023 (UTC)
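
The difference between passing a value as a command-line argument and sending it to standard input, as discussed in this thread, can be sketched in Python. This is not how External Data itself launches programs; it only shows the general pattern. The regular expression is the one from the 'param filters' entry above, while format_number and the tiny child program are stand-ins invented for this example (the real prettyPhoneNumber.py is not shown in the thread).

```python
import re
import subprocess
import sys

# Same pattern as the 'param filters' entry in the configuration above.
NUMBER_FILTER = re.compile(r"^[0-9+ ]+$")

# Stand-in for the real formatter: reads the number from stdin and removes spaces.
CHILD_CODE = "import sys; print(sys.stdin.read().strip().replace(' ', ''))"


def format_number(number):
    """Validate the number, then hand it to the child process via standard input."""
    if not NUMBER_FILTER.match(number):
        raise ValueError("number contains characters outside the allowed set")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=number,          # goes to the child's stdin, not onto its command line
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(format_number("+49 30 123456"))
```

Sending data through standard input avoids command-line quoting and length limits, which matters for long multi-line input; for a short, filtered value like a phone number either route is workable.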