Extension talk:External Data

From mediawiki.org

accessing the internal variable (__json)

I need some help. Background: I'm working on an extension that uses a web call to retrieve a JSON object and converts it into concise HTML code. The extension now works properly but lacks caching. The JSON object is complex, containing multiple nested arrays with varying numbers of elements, and it took me a while to program the handling in PHP.

My current line of thought is to transfer the entire json object (that is stored in the internal variable __json) to my extension.

Question 1) how does one access the internal variables?? The code below doesn't work.

Question 2) do you have any comments / suggestions on my approach?

{{#get_web_data:url=https://jisho.org/api/v1/search/words?keyword="先生"|format=json}}

  • first bullet: {{#external_value:__json}}
  • second bullet: {{#external_value:meta}}

{{#myextension:{{#external_value:__json}}|param1|param2}} Harro Kremer (talk) 21:20, 3 January 2023 (UTC)

  • Since __json is not a string but a Lua table, it is only accessible from Lua (Scribunto is needed) and not from wikitext:
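-- Returns the parsed data and, as a second value, any errors.
local parsed, errors = mw.ext.externalData.getExternalData{ url = 'https://jisho.org/api/v1/search/words?keyword="先生"' }
if parsed then
    -- The whole JSON document, as a Lua table.
    local json = parsed.__json
end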

Alexander Mashin talk 03:46, 9 January 2023 (UTC)


Using External Data in a Template

I want to fill a custom infobox template with data from External Data. The data will reside in a CSV file whose name matches the non-namespace part of the name of the page that uses the template.

mynamespace:mypage content: {{template:infobox myinfobox}}
template:myinfobox content: {{infobox | header1 = header | data1 = {{#external_value:value_name|source=data|format=csv with header|delimiter=;|file name={{PAGENAME}}.csv}} }}
However, this tries to fetch data from a CSV file named "myinfobox" instead. What is a better way to achieve this?

Is the template called "infobox myinfobox", or just "myinfobox"? And do you see the problem on the "mypage" page, or right on the template page? Yaron Koren (talk) 00:34, 10 January 2023 (UTC)
My template page is called "Vorlage:Infobox_VM". The problem occurs on "mypage", as the template page doesn’t really have any visible content, nor does it have a CSV file providing information. "mypage" tries to read from a CSV file called Vorlage:Infobox_VM.csv instead of mypage.csv.
  • This looks like a strange way to invoke a template: {{template:infobox myinfobox}}. Why not just {{myinfobox}}? And, as said above, fetching data from myinfobox.csv is to be expected on the template page itself, unless the code is wrapped in <includeonly>...</includeonly>.
    Alexander Mashin talk 02:11, 10 January 2023 (UTC)
yes you’re right, i tried being verbose for clarity. in my page i have only {{Infobox VM}} and that seems to work well enough. I'm not really concerned what csv is being read on the template page itself, as that page isn’t really supposed to be looked at.
I want "mypage" to look at mypage.csv through the template. that’s what doesn’t work.
  • What are the settings for the data source data in LocalSettings.php ($wgExternalDataSources['data'])?
$wgExternalDataSources['data']['path'] = '/opt/infradb-data/vm/';
that path and all the files in it are world readable. that stuff works in principle. i am able to pull info from csv. just not via the page name in the template.
  • I could not reproduce the issue on my MediaWiki installation. It looks like you have some sophisticated wiki code there that substitutes {{PAGENAME}} too early.
    Alexander Mashin talk 10:59, 14 January 2023 (UTC)
    I solved the problem this way: on "mypage" I call {{Infobox Host | filename={{PAGENAME}} }}, and on the template page I use that parameter to find the right file. 24.134.95.253 15:51, 17 January 2023 (UTC)

Slightly different csv data files, only one works

I have two 48-line CSV files, each under 10k in size, that differ only slightly, but External Data in 1.39 can read only one of them.

Simplified case https://johnbray.org.uk/expounder/Extdataproblem1 uses

get_web_data: url=https://files.johnbray.org.uk/Documents/Expounder/Q/532/9928/datagood and then
get_web_data: url=https://files.johnbray.org.uk/Documents/Expounder/Q/532/9928/databad 

checking that {{#external_value:ddescription}} has something from a line of the csv file. datagood works, but databad does not. The difference between the files is a few characters on one line, and the good file is actually longer than the bad

< "item",+1996-04-05T00:00:00Z,+1996-04-08T00:00:00Z,"{{link|Q111529509|Evolution}}","in Heathrow with {{link|Q312405|Vernor Vinge}}, {{link|Q472872|Jack Cohen}}, {{link|Q2927188|Bryan Talbot}}, {{link|Q7151782|Paul Kincaid}}, {{link|Q742918|Colin Greenland}}, {{link|Q62625219|Maureen Kincaid Speller}},","","","","","","",51.4673,-0.4529

> "item",+1996-04-05T00:00:00Z,+1996-04-08T00:00:00Z,"{{link|Q111529509|Evolution}}","in Heathrow with {{link|Q312405|Vernor Vinge}}, {{link|Q472872|Jack Cohen}}, {{link|Q2927188|Bryan Talbot}}, {{link|Q7151782|Paul Kincaid}}, {{link|Q742918|Colin Greenland}}AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBA,","","","","","","",51.4673,-0.4529

Both files are < 10k in size, 48 lines, and the good file is actually longer than the bad

wc datagood databad

 48   338  9245 datagood 
 48   341  9180 databad
Vicarage (talk) 14:05, 13 January 2023 (UTC)
  • Neither CSV file is perfectly well-formed from PHP's point of view. The problems start at line 7 (one-based). For databad, this causes the automatically detected delimiter to be | rather than ,. You can work around this by adding delimiter=,.
    Alexander Mashin talk 10:45, 14 January 2023 (UTC)
    Thanks for the quick response. That cured the problem, both for my trivial and full cases. Vicarage (talk) 11:00, 16 January 2023 (UTC)
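
For readers wondering how a comma-separated file can end up with | as its delimiter, here is a minimal Python sketch. It assumes, purely for illustration, that detection picks the candidate character occurring most often; the thread does not say how External Data actually detects the delimiter, and the row, the function names and the candidate list below are all made up. Each {{link|...|...}} template adds two pipes, so a row with several of them can contain more pipes than commas.

```python
import csv
import io

# A shortened, made-up row in the style of the files above: the quoted field
# contains wiki templates, each of which adds two "|" characters.
row = ('"item","{{link|Q1|Evolution}}","in Heathrow with {{link|Q2|A}}, '
       '{{link|Q3|B}}, {{link|Q4|C}}","",51.4673,-0.4529\n')


def naive_delimiter(line, candidates=",;|\t"):
    """Pick the candidate character that occurs most often in the line.

    Illustrative heuristic only (an assumption), not External Data's code.
    """
    return max(candidates, key=line.count)


def parse(line, delimiter):
    """Parse one CSV line with an explicitly given delimiter."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


guessed = naive_delimiter(row)   # the row has 8 pipes but only 7 commas
fields = parse(row, ",")         # an explicit delimiter avoids the guess
```

Passing the delimiter explicitly, as delimiter=, does in the parser function, takes the guess out of the picture.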

What is the best practice to fetch many values (>300) from the same place?

Is it better to use the legacy method, like #get_web_data, and fetch all the values at once, then display them using #external_value; or is it better to use the new method, #external_value with the source parameter, 300 times? What performance considerations might there be?

Jeremi Plazas (talk) 17:51, 17 January 2023 (UTC)

  • If you use caching, the difference is not that big. Using {{#get_web_data:}} will save cache lookups, but the legacy mode will stop working once MediaWiki is upgraded to use Parsoid, since Parsoid does not guarantee parsing order. The optimal solution is to handle data fetching and display with one Lua function, where you can save the fetched data into a variable and later display it.
    Alexander Mashin talk 03:22, 18 January 2023 (UTC)
    Thanks, we'll look into Lua. We do have caching set up, so the standalone method might be fine, now that you've helped us iron out the kinks. Thanks again for the help! Jeremi Plazas (talk) 17:54, 19 January 2023 (UTC)


Some JSONPATH doesn’t retrieve results

I have a JSON file with information about network interfaces in it. I am retrieving this information with get_file_data and two JSONPath expressions. However, only one of them seems to be executed and filled with data.

This is my json:
[{"ip-addresses": [{"prefix": 24, "ip-address": "1.1.1.1", "ip-address-type": "ipv4"}, {"ip-address-type": "ipv6", "ip-address": "fe80::aaaa:aaaa:aaaa:aaaa", "prefix": 64}], "hardware-address": "ab:ab:ab:ab:ab:ab", "name": "eth0"}, {"ip-addresses": [{"ip-address": "1.1.1.2", "ip-address-type": "ipv4", "prefix": 24}, {"ip-address-type": "ipv6", "ip-address": "fe80::ccff:ceff:feff:ceff", "prefix": 64}], "hardware-address": "ce:ff:ce:ff:ce:ff", "name": "eth1"}]
and this is the template code i use to retrieve the data:
{{#get_external_data: source=host_data |file name={{{{{ucfirst:{{{hostname}}}}}|dummy}}}_net.json |data=mymacaddress=$[?(@.name == '{{{interfacename}}}')].hardware-address, adressen=$[?(@.name == '{{{interfacename}}}')][ip-addresses][*].ip-address |use jsonpath |format=json }} {{{hostname}}} {{{interfacename}}} ({{#external_value:mymacaddress|invalid}}) {{#display_external_table: template=bulleted list|data=1=adressen}}
i call this template in the following fashion:
{{#display_external_table: template=Interface |data=interfacename={{{netnames}}},hostname={{{filename}}} }}
where netnames is just an array, usually only one entry like ["ens18"] and filename points to the correct file

  • If {{{netnames}}} is an array, I don't know how transcluding it within single quotes in a JsonPath query could work. Neither $[?(@.name == '["ens18"]')].hardware-address, nor $[?(@.name == '["eth1"]')].hardware-address is a working JsonPath. mymacaddress=$[?(@.name == 'eth1')].hardware-address would be. I would suggest replacing == with in, but this does not seem to be implemented.
    You could try a regular expression: $[?(@.name =~ 'eth0|eth1')].ip-addresses.[*].ip-address.
    Also, you can get the bulleted list without a template:
{{#for_external_table:|
 * {{{adressen}}}}}

Alexander Mashin talk 06:23, 19 January 2023 (UTC)
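
To make the point about the filter value concrete, here is a small Python sketch that applies the same two selections to the JSON from the question. Plain Python stands in for the JsonPath engine, and the function names are made up for this illustration. It shows that a bare name such as eth1 matches, while the stringified array ["eth1"] matches nothing.

```python
import json

# The JSON from the question, unchanged apart from line breaks.
raw = '''
[{"ip-addresses": [{"prefix": 24, "ip-address": "1.1.1.1", "ip-address-type": "ipv4"},
                   {"ip-address-type": "ipv6", "ip-address": "fe80::aaaa:aaaa:aaaa:aaaa", "prefix": 64}],
  "hardware-address": "ab:ab:ab:ab:ab:ab", "name": "eth0"},
 {"ip-addresses": [{"ip-address": "1.1.1.2", "ip-address-type": "ipv4", "prefix": 24},
                   {"ip-address-type": "ipv6", "ip-address": "fe80::ccff:ceff:feff:ceff", "prefix": 64}],
  "hardware-address": "ce:ff:ce:ff:ce:ff", "name": "eth1"}]
'''
interfaces = json.loads(raw)


def hardware_addresses(interfaces, name):
    """Rough equivalent of $[?(@.name == '<name>')].hardware-address."""
    return [i["hardware-address"] for i in interfaces if i["name"] == name]


def ip_addresses(interfaces, name):
    """Rough equivalent of $[?(@.name == '<name>')].ip-addresses[*].ip-address."""
    return [a["ip-address"]
            for i in interfaces if i["name"] == name
            for a in i["ip-addresses"]]


mac_ok = hardware_addresses(interfaces, "eth1")        # bare interface name
mac_bad = hardware_addresses(interfaces, '["eth1"]')   # array pasted as text
ips = ip_addresses(interfaces, "eth0")
```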

Strange bug parsing CSV with pipe character

I have a field called title which is called in #for_external_table. All of the values have at least one pipe character, and the page cuts off everything before and including the first pipe character. But the strange thing is that it only happens if {{{title}}} is located in a specific place.

The page can be seen at https://comprehensibleinputwiki.org/wiki/Mandarin_Chinese/Videos and the external data is at https://comprehensibleinputwiki.org/wiki/Data:Mandarin_Chinese/Videos. If you look at the wiki source of the first link, I have {{{title}}} twice, one of them in a hidden div. If you view the source of the page, the first one is missing part of the value, while the hidden one is complete. Dimpizzy (talk) 20:08, 22 January 2023 (UTC)

  • Add delimiter=, to {{#get_web_data:}}.
    Alexander Mashin talk 02:01, 23 January 2023 (UTC)
    It didn't seem to change anything. I changed it to:
    {{#get_web_data:url={{fullurl:Data:Mandarin_Chinese/Videos|action=raw}}|format=csv with header|data=language=Language, title=Title, videoId=Video ID, service=Service, level=Level, channel=Channel, index=Index|delimiter=,}} Dimpizzy (talk) 03:00, 23 January 2023 (UTC)
      • At least, the videos are displayed now. If the current problem is the trimmed titles in the "Title" column, it is not directly related to the extension. The MediaWiki parser treats the beginning of the title as attributes of the <td> tag. Wrap {{{title}}} in nowiki, like this: {{#tag:nowiki|{{{title}}}}}, to see the first chunk of the title.
        UPD: Or, you can add a second {{!}} before {{{title}}}.
        Alexander Mashin talk 07:21, 23 January 2023 (UTC)
        That worked, thanks! I didn't notice any issues on my end with the videos not displaying before, but good to know! Dimpizzy (talk) 09:36, 23 January 2023 (UTC)
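
The trimming described above can be modelled with a short Python sketch. It is a deliberately simplified model that assumes a table cell is split at its first pipe into an attribute part and a content part; it is not the MediaWiki parser's code, and the sample title and function name are made up.

```python
def visible_cell_text(cell_source):
    """Return the visible text of a table cell in a simplified model.

    cell_source is whatever follows the cell's opening "|" in the table row.
    If it contains another pipe, the chunk before the first pipe is taken as
    cell attributes and does not show up in the rendered text.
    """
    before, sep, after = cell_source.partition("|")
    return (after if sep else before).strip()


title = "HSK 1 | Lesson 1"   # hypothetical title containing a pipe

trimmed = visible_cell_text(" " + title)     # row source: | {{{title}}}
complete = visible_cell_text(" | " + title)  # row source: | {{!}} {{{title}}}
```

With the extra pipe in front, the attribute part is empty and the whole title survives as content, which matches the effect of adding a second {{!}}.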


Parameter parsing problems

PHP 8.2.1
MediaWiki 1.38.4
PostgreSQL 14.6
External Data 3.2 (5d30e60) 08:38, 2. Nov. 2022

Hi. I use External Data to retrieve data from a PostgreSQL database. In most cases I use prepared statements, and I noticed that the passing of parameters does not seem to work correctly. Here is a self-contained test to show what I mean.

In the database I have a table with a single column of type text:

SELECT * FROM public.test;
            txt
----------------------------
 a simple text
 another simple text
 a text with, a comma in it

Notice that the lines contain spaces and in one case a comma.

Then I have a search function that receives a parameter of type text and returns a set of text:

SETOF TEXT public.mw_test(p_search TEXT)

The configuration for this in LocalSettings.php looks like this:

$wgExternalDataSources['wikidoc'] = [
    'server' => 'xxx',
    'type' => 'postgres',
    'name' => 'xxx',
    'user' => 'xxx',
    'password' => 'xxx',
    'prepared'  => [
       'test' => 'SELECT mw_test
                  FROM public.mw_test($1);'
    ]
];

On the wiki page, the snippet is as follows:

{{#get_db_data:
  db         = wikidoc
| query      = test
| parameters = with
| data       = documentation=mw_test
}} 
{{#for_external_table:<nowiki />
{{{documentation}}}
}}
{{#clear_external_data:}}

It simply displays what it finds on a line.

What happens is that the list of parameters cannot contain a comma. The snippet above works fine as is and returns:

a text with, a comma in it

But something like this does not:

{{#get_db_data:
  db         = wikidoc
| query      = test
| parameters = with, a
| data       = documentation=mw_test
}} 
{{#for_external_table:<nowiki />
{{{documentation}}}
}}
{{#clear_external_data:}}

The error is "Fehler: Es wurden keine Rückgabewerte festgelegt." (in English: "Error: No return values specified.")

It is clear that a comma is used to separate parameters, and that is why this does not work. My question is: how can I pass the whole string "with, a" as a single parameter?

I tried enclosing it in single and double quotes, but this did not help. It leads to this exception:

[6882cc0e426ebdb6cf6911bc] /w/index.php?title=IT/IT_Infrastructure/KOFDB_Uebersicht&action=submit TypeError: EDParserFunctions::formatErrorMessages(): Argument #1 ($errors) must be of type array, null given, called in /home/wiki/application/w/extensions/ExternalData/includes/EDParserFunctions.php on line 98

Any idea what I could do to solve this? Help is much appreciated. Thanks.

It looks like I found a way to solve this. I can enclose the whole string in parentheses and it works.

{{#get_db_data:
  db         = wikidoc
| query      = test
| parameters = (with, a)
| data       = documentation=mw_test
}} 
{{#for_external_table:<nowiki />
{{{documentation}}}
}}
{{#clear_external_data:}}
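
One possible reading of why the parentheses help is that the parameter list is split only on commas outside parentheses. That is an assumption: the thread does not show External Data's parsing code, and it does not say what happens to the parentheses afterwards. Under that assumption, a parenthesis-aware split would look roughly like this Python sketch (the function name is made up):

```python
def split_parameters(text):
    """Split on commas that are not inside round parentheses.

    Hypothetical illustration only, not External Data's actual parser.
    The parentheses themselves are kept in the resulting items.
    """
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current parameter.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


plain = split_parameters("with, a")        # split into two parameters
wrapped = split_parameters("(with, a)")    # kept as a single parameter
```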

composer.json

When trying to install dependencies via composer, the current requirement says "composer/installers": "~2.1" on REL1_39 branch. The latest version is currently 2.5.4 which doesn't meet this requirement. Could this be updated to a more permissive requirement? Prod (talk) 18:29, 24 February 2023 (UTC)

This is being addressed in phab:T330485. Prod (talk) 19:39, 13 March 2023 (UTC)