Extension talk:External Data/Archive 2017 to 2018

JSON error or choice
Thanks for this extension. It works great and easy. But I discovered an error, or may be you could call it a choice.

Reading json from a webservice, I saw that some data from one records ended up in an other record. The records were mixed up. Looking in the source code I figured out that de extension builds an array containing the external data ($external_values). New values (records) are appended at the end of that array. If a value doesn't exists in a record, there wil be an empty space, that will be filled with data from the next record.

The power of Json is that you don't have to send non-existing data. The plugin unfortunately expects a json array in which the records are all of the same size.

So, the important question is, is this an error or a choice ?

In case you want to see what is happening, use the code below in a wiki page with the extension ExternalData. You have to understand a bit of dutch topography in order to see that the City-name is mixed up at the end of the table.

(the first call is to get the actual code for geogeneration, that is needed in the second call)

Alfred


 * It's a bug, though one I don't know how to fix. Yaron Koren (talk) 14:24, 8 August 2018 (UTC)

That is a fair answer. I tried to figure it out. You'll need at least the array $mappings in de parseTree function. --Alfredpel (talk) 16:05, 9 August 2018 (UTC)

Maybe the code will solve the problem: /* 	alfred's version of parseTree mappings is needed. therefore two extra changes has to be made in de file ED_Utils.php: getDataFromText: return self::getJSONData( $contents, $prefixLength, $mappings ); getJSONData: static function getJSONData( $json, $prefixLength, $mappings="" ) { all key in mapping are provided if not available in json, it gts the value "no data" (empty string would be better but less easy for debugging 	Something odd happens with the keys in lower or uppercase. therefore alle keys are translated to lowercase 	Changes work with my data. I am not sure whether of nog it conflicts with other data. static function parseTree ($tree,&$retrieved_values, $mappings) {	if (!is_array($tree)) return ;	$tree = array_change_key_case ( $tree,CASE_LOWER);	// loop through all values of $tree, if one is an array then only process that value, $hasArray keeps track of that	$hasArray=false;	foreach ($tree as $key => $val) {		if (is_array($val)) {			$hasArray = true;			self::parseTree($val,$retrieved_values, $mappings);		}	} //	if no array was found in tree, then map the values on the output array $retrieved_values	if (!$hasArray) {		foreach ($mappings as $key => $mapping) {			if (isset($tree[$mapping])) { if (!is_array($tree[$mapping])) { $retrieved_values[$mapping][] = $tree[$mapping]; }			} else { // if the key in mappings does not exist, add a "no data" value (make this a empty string after testing) $retrieved_values[$mapping][] = "no data"; }		}	}	return ; }

--Alfredpel (talk) 20:22, 9 August 2018 (UTC)

Is the usage of SQL group by possible?
I would consider "group by" extremly useful since it would be possible to create statistics for database content really easily. --WolfgangFahl (talk) 12:11, 10 January 2017 (UTC)
 * 1) Is the usage of SQL "group by" possible?
 * 2) if yes how?
 * 3) if no as it was pointed out in Extension_talk:External_Data/Archive_2013 where would a CR be posted?


 * No, it's still not possible, though I can see how it would be useful. What's a CR? Yaron Koren (talk) 14:35, 10 January 2017 (UTC)

Yaron - thank you for getting back to this - a CR ist a change request. I had used group by successfully with my Mediawiki 1.23 wiki and and old version of SMW and the External Data extension 1.8.2. The markup i used was:

unfortunately this now gives me a regression with invalid SQL. SELECT DATE_FORMAT(from_unixtime(timestamp/1000), '%Y-%m-%d %T'),d.channel_id,max(wert),p.value FROM `(select channel_id`,`timestamp`,round(value)/1000 as wert from data order by timestamp desc limit 200) as d left join entities e on d.channel_id `e.id left join properties p on e.id=p.entity_id and e.type='electric meter' and p.pkey='title'` WHERE d.channel_id in ('1','2','5','7') GROUP BY channel_id ORDER BY channel_id LIMIT 100 There seem two be two issues now: Going back to version 1.8.2 fixes the problem :-( --Seppl2013 (talk) 15:41, 24 September 2017 (UTC)
 * the usage of backticks at places where they shouldn't be
 * the stricter interpreteation of the group by clause by mySQL
 * Version 1.8.3 completely breaks MySQL JOIN queries. Moved back to version 1.8.2 for now. Hope this gets fixed soon. --Wikimantis (talk) 01:12, 25 May 2018 (UTC)


 * Are you using "group by", or are you talking about something else? Yaron Koren (talk) 02:39, 25 May 2018 (UTC)

How to pass header value to request?
With some API we need to pass in the headers some value (for example: https://docs.gitlab.com/ee/api/README.html#authentication) I tried url=--header "PRIVATE-TOKEN: xxxxxxxxxxxxxx" https://myserver/api/v1/list/all or url=https://myserver/api/v1/list/all,array('PRIVATE-TOKEN'=>'xxxxxxxxxxxxxxxx') But not chance. How can we achieve this? if this is possible, can we have an example of how?


 * The #get_web_data function only works with RESTful APIs; #get_soap_data might work in this case. Yaron Koren (talk) 15:52, 17 February 2017 (UTC)

Dash character in data
Hello

We found when displaying csv file data with a) cells that contain just a dash character, or b) cells that contain text and start with a dash character that any attempt to render this data into a table will cause rows to break at that cell. This is presumably because the parser inserts | for a column break and then the - from the string, resulting in |- which is a special string which means 'new row'.

To resolve this I edited the function GetCSVData in the file ED_Utils.php. In the foreach loop just before the return statement, I added a check for the dash character which replaces it with a dash wrapped in tag. This resolves the issue. Note that this issue likely still exists when other parsers like get_db_data are used, since I only resolved it in the GetCSVData function.

Note that I have had to replace the word nowiki with n0wiki below, because it was impossible to display this code since it was closing nowiki brackets.


 * Great, thanks for this patch! Is that first "if" check necessary? Maybe everything can be handled with just the "if substr" call? Yaron Koren (talk) 12:19, 14 April 2017 (UTC)

PHP Notice in error logs
Hi I can see theses messages in the apache server error log

PHP Notice: Second parameter should not be a timeout. [Called from EDUtils::fetchURL in /opt/mediawiki/extensions/ExternalData/ED_Utils.php at line 882]

ExternalData 1.8.3 Mediawiki 1.27.3

Cheers

Ubibene

Daisy Chaining
Hey, is there a way to daisy-chain queries? As far as i can see one can only enter one URL to get data from. My usecase: get some data, create a table with it and then use coordinates from the first query in a secondary query for reverse geocoding (for each individual entry). Is this possible somehow or is further explainantion required?


 * If I understand the question correctly, you can just cycle through the data twice, by calling #for_external_table twice - the data won't get cleared after the first call. Or am I misunderstanding something? Yaron Koren (talk) 03:20, 31 May 2017 (UTC)
 * In my specific usecase i'm building a table from an overpass query which returns xml. I then want to geocode every coordinate of that table via nominatim (returns xml too) to a specific country.

Error: Did not find alias for special page 'GetData'.
Hi, i use Version 1.8.3 (6705c56) from 2017-07-17T22:03:53. This Error is already marked as fixed, but i still get these message. see commit on Phabricator/"Add file for special page aliases". My wiki installation is Version 1.29.0 in german language

--TomyLee (talk) 09:28, 28 July 2017 (UTC)


 * Yes, an "alias" file was created, but no aliases have been defined yet. Where are you seeing that error? I didn't know that languages for which an alias has not been defined produced an error message. Yaron Koren (talk) 15:21, 28 July 2017 (UTC)


 * I see it only on the page "Special pages", theis message is printet 2 times. In LocalSettings i have set "$wgDevelopmentWarnings = true" for my upgrade process. TomyLee (talk) 06:10, 31 July 2017 (UTC)

detect_encoding expects parameter 1 to be string (solved)
Hi, if i execute runjobs.php i get lots of notices for this extension. It says:

--TomyLee (talk) 09:55, 28 July 2017 (UTC)


 * That's very unexpected. If possible, could you add a line above that line that looks like "if (is_object( $dbField ) ) print_r( $dbField );", and let me know what it prints out? Yaron Koren (talk) 15:18, 28 July 2017 (UTC)


 * i Insert your code and also a "Debug" message that prints without condition. Here is one example output:

The error comes from a Query to Microsoft SQL Server with column datatype "date" with value "2011-10-13" --TomyLee (talk) 06:52, 31 July 2017 (UTC)
 * I have solved this error by insert a new check bevor the convert line. my ED_Utils.php is now:


 * Great find! Sorry, I think I missed your previous message here. I just checked in a patch based on this code - hopefully this problem is now solved. Yaron Koren (talk) 15:36, 11 August 2017 (UTC)

Undefined index: UseWindowsAuth and port (solved)
as a third error message if i execute runphp.php i get the notice:

i know, this is not part of ExternalData but i use Microsoft Server only with External Data.

Maybe you need thes two new(?) parameters for mssql. In "MWLBFactory.php" i found these part of code:


 * This sounds like something recently added to MediaWiki. What's MWLBFactory.php? Maybe I can copy the approach from there. Yaron Koren (talk) 15:22, 28 July 2017 (UTC)
 * I don't know the file. i search for these variables to see if there is maybe a new config variable for my LocalSettings i missed. But the only appereance i found was in classMWLBFactory
 * Did you set and ? --wargo (talk) 09:05, 31 July 2017 (UTC)
 * Yes --TomyLee (talk) 09:50, 31 July 2017 (UTC)
 * I have solved the problem for me. I had to added little bit of code. It seems that the parameters not used by MediaWiki, maybe in a later release? But the warning messages never comes up again:You have to add these two new Parameters for the connection definition. To get this: In Localsettings add these new variables (DbName ist the Name/Id of your connection): in ExternalData.php at Line 60 add these two lines:   in ED_Utils.php the static function getDBData has to be modified at 3 areas: Line 180 add   Line 193 add   Line 244,260 and 273 add to the array of parameters   --TomyLee (talk) 13:03, 11 August 2017 (UTC)

can't get to xml field from api query call
A query i added to a template looks like this:

on the final page though this returns the error: Undefined Index title

running the query in the sandbox returns this xml though with the fields title.

What am I missing?

Thanks Markus


 * That's strange - I was able to get essentially the same thing working, though with a hardcoded category name. Does this happen with all pages that call this template? Could you try calling it with a category whose name doesn't contain spaces or non-ASCII letters? Yaron Koren (talk) 01:12, 1 August 2017 (UTC)


 * Thanks Yaron! I should have tried this... Using a hardcoded Category name makes it work.

Addressing an Array
Is it possible to pull information out of an array? I need to pull information out of the statuses array.

Is is something like?

I understand that this isn't in any way scalable, but the vast majority of the time the array does not contain more than one member, and at this stage I am just trying to learn about ExternalData's capabilities.


 * Unfortunately, no - if this were stored in XML, you could do it with the "use xpath" parameter, but there's no such parameter for JSON. (There is a JSONpath technology, but it's poorly supported, from what I understand, and it's not supported by External Data.) Yaron Koren (talk) 01:10, 4 September 2017 (UTC)
 * Thanks, good to know

JSON GET header
Is it possible to send a JSON GET header with ExternalData?


 * You mean, have an API query string composed of JSON? I don't see why not... Yaron Koren (talk) 17:12, 13 September 2017 (UTC)


 * Sorry, I am rather new to JSON. The GET query I have in Postman requires the header section with a key value pair. I am not sure how define a header using ExternalData. Does that make sense?


 * Can you define this query as a single URL, with all the JSON in the query string? If so, you can just pass that in to #get_web_data. Yaron Koren (talk) 13:33, 14 September 2017 (UTC)

Accessing source files
Often enough I cannot access the XML directly but need to open the source file through the browser. In browsers such as Chrome and Firefox, the URL is typically prefixed by "view-source:". Is there any way we can make External Data query the source code? I did some test runs with the prefix but all I got so far is "Error: No contents found at URL view-source:http:..". Cavila 07:25, 5 November 2017 (UTC)


 * I don't understand - when you do a "view source", you're getting the same content, just displayed in a different way. How would that apply with External Data? Yaron Koren (talk) 01:17, 6 November 2017 (UTC)
 * Not necessarily. Just because ".xml" is in the URL doesn't necessarily mean that the web page gives you raw, queryable XML. Or you'll get a parse error because any XSL referred to is missing. Cavila 18:57, 6 November 2017 (UTC)
 * Alright, but, as far as I know, you'll always get the same content when using "view source". Or do you know of a case where doing "view source" gives true XML, but External Data's retrieval does not? Yaron Koren (talk) 20:09, 6 November 2017 (UTC)
 * Sure, this comes to mind. Cavila 20:33, 6 November 2017 (UTC)

That's a custom browser display, but #get_web_data seems to retrieve the right thing there - the following call works for me:

Yaron Koren (talk) 14:41, 7 November 2017 (UTC)

#for_external_table requires "nowiki"
My 1.30 Mediawiki didn't recognize the example provided for #for_external_table. The result was one "-" per each line of the file.

The solution was to change:

! Area {{#for_external_table:

for:

! Area {{#for_external_table:

And then it worked. Spent some hours debugging it. Roger Krolow (talk) 14:37, 29 December 2017 (UTC)


 * Thanks for pointing that out, and sorry for the long delay - and the initial problem. I just fixed this in the documentation. Yaron Koren (talk) 03:02, 18 January 2018 (UTC)

"edit source" link stops appearing from here forward
Don't know why. Flow bug? Maybe because topic above uses: nowiki> </nowiki


 * Yeah... I just fixed it. Yaron Koren (talk) 12:33, 23 May 2018 (UTC)

Variable MySQL table search
Hello. i'm using mediawiki 1.30.0. Have a test mysql database set on oracle virtualbox ubuntu 16.04. I'm creating a documentation system for a company that would fetch data on a template from their own mysql database by name (or number) of client company (this also would be the pagename). Idea is to create a template that fetches data from the database by the pagename (as the client amount might increase, automation is the goal) The code below does not work but to get the idea...

Problem is that mysql can't have variables as a search value (and I think the way commands are processed the BASEPAGENAME template wont even open inside the #get_db_data) So i'm forking the Extension/Externaldata/ED_parserfunctions.php with something like this... if ( array_key_exists( 'from', $args ) ) { if ( $args['from'] == 'word' ) { $args['from'] = -something?- $from = $args['from']; }			else { $from = $args['from']; } Any ideas, solutions, code locations to fork more, are all welcome!


 * Actually, "BASEPAGENAME" in the above example will get parsed correctly - I just tried it myself to be sure. And more generally, nested parser functions, templates, variables etc. always get parsed from the innermost one out, as far as I know. Could it be that there's just some issue with your #get_db_data call? Yaron Koren (talk) 12:27, 18 April 2018 (UTC)


 * It worked! i went back to trying the "BASEPAGENAME" template call and found out that i had a problem with mysql tables being written in all lower case letters. As the page comes with the first letter being capital, i just had to redo my tables for this to work. Thanks for the explanation, it saved me a ton of time ^^ best regards:OP, to be :nappula3 8:31 19 April 2018 (UTC)

Form to Create Record?
Is there an extension or native way to create/edit a record in a db using a form?


 * This doesn't sound like an External Data extension question, but the Cargo and Semantic MediaWiki extensions both let you do that. Yaron Koren (talk) 13:31, 11 May 2018 (UTC)


 * Thx!

Johnywhy (talk) 14:42, 23 May 2018 (UTC)

Use Logged-In User Credentials?
When i execute the following, data returned looks like this extension is NOT running with logged-in user credentials:

That url gives, for example:

But returns, which is the token for '''non-logged-in users. ''' "'For unregistered users, the token is always +\'          https://www.mediawiki.org/wiki/API:Edit#Unregistered_users"

had no effect.

Is there something wrong with my syntax?

Do i need to use https://www.mediawiki.org/wiki/Extension:NetworkAuth?

Johnywhy (talk) 10:48, 23 May 2018 (UTC)


 * I don't fully know the mechanics of tokens, but I would think it's a very good thing that External Data doesn't let you use your own MediaWiki token - otherwise, a malicious user could potentially put calls in some page to get admins who visit that page to unwittingly do all sorts of bad things. Yaron Koren (talk) 13:02, 23 May 2018 (UTC)


 * Only if the malicious user has access to the page containing the call. But, that can't happen if the page is protected. Seems this is not the only potential security risk that is completely under control of the admins. An admin could also post their password on a public page, and then a malicious user could gain admin access-- admins ought know not to do that. To troubleshoot this issue, can you plz comment whether my syntax above is correct? -thx Johnywhy (talk) 14:48, 23 May 2018 (UTC)


 * Okay, but what if the malicious user puts the call on an unprotected page? Or on every unprotected page? Yaron Koren (talk) 15:09, 23 May 2018 (UTC)


 * Plz see title of this topic: "Logged-In User Credentials"
 * Why aren't i receiving alerts about your comments here?

If extension respects logged in user credentials (as it should), then there's no risk. Johnywhy (talk) 11:36, 25 May 2018 (UTC)


 * On the contrary, I see tremendous risk: (1) malicious user goes to page "ABC", puts in a call to #get_web_data that calls some harmful MediaWiki API action which uses the logged-in user's credentials; (2) user with admin rights goes to page "ABC", the page calls that API URL and is successful because that user has the correct privileges. Do you see what I'm saying? Yaron Koren (talk) 13:33, 25 May 2018 (UTC)


 * Ok, i can see your point there. So, does that mean that all actions performed by this extension are done with anonymous user-rights? Thx


 * I suppose you could describe it that way, yes. Yaron Koren (talk) 23:47, 25 May 2018 (UTC)

From this page, it sounds like it IS possible to log into the API as a specific user. Correct? However, it sounds like it requires javascript, right? - Johnywhy (talk) 19:22, 2 July 2018 (UTC)


 * Yes, it's possible. I don't know if it would require JavaScript. Yaron Koren (talk) 22:31, 2 July 2018 (UTC)

Request: Adding Records
Awesome extension! Would also be awesome if the extension could add/edit records. Maybe could use built-in MW edit-token, from the API.


 * I'm not sure what this means, but External Data can be used in conjunction with either Cargo or Semantic MediaWiki to store the information it retrieves. Yaron Koren (talk) 18:33, 27 May 2018 (UTC)


 * Thx for reply. It's my understanding that Cargo can only do a store-once action for a page-- it can't add more records without overwriting the existing records for that page. https://www.mediawiki.org/wiki/Extension_talk:Cargo#How_to_Remove_Built-In_Fields?
 * Johnywhy (talk) 00:34, 2 June 2018 (UTC)


 * That's right. Yaron Koren (talk) 16:41, 3 June 2018 (UTC)

1.31 API Issue: #get_web_data Suddenly Failing
This code was returning correct section number of a named section

Now (on MW 1.31) returning: Error: no local variable "SecID" was set.

The url alone returns correct json: returns {"parse":{"title":"Portal:TagDescriptions","pageid":433,"sections":[{"toclevel":1,"level":"2","line":"Terrorism","number":"1","index":"1","fromtitle":"Portal:TagDescriptions","byteoffset":0,"anchor":"Terrorism"},{"toclevel":1,"level":"2","line":"SecondAmendment","number":"2","index":"2","fromtitle":"Portal:TagDescriptions","byteoffset":747,"anchor":"SecondAmendment"},{"toclevel":1,"level":"2","line":"Timeliness","number":"3","index":"3","fromtitle":"Portal:TagDescriptions","byteoffset":1275,"anchor":"Timeliness"},{"toclevel":1,"level":"2","line":"Money","number":"4","index":"4","fromtitle":"Portal:TagDescriptions","byteoffset":2317,"anchor":"Money"},{"toclevel":1,"level":"2","line":"CivilRights","number":"5","index":"5","fromtitle":"Portal:TagDescriptions","byteoffset":3038,"anchor":"CivilRights"},{"toclevel":1,"level":"2","line":"CommonGround","number":"6","index":"6","fromtitle":"Portal:TagDescriptions","byteoffset":3800,"anchor":"CommonGround"}]}}

Update: Exact same query works with a static json file (instead of API call). Json file contains same json as returned by API:

Something about the API has changed. https://www.mediawiki.org/wiki/Topic:Ug391jaupfj0mvnm

- Johnywhy (talk) 15:26, 1 July 2018 (UTC)


 * I think the issue is that the API now requires the user to be logged in. When I go to that API URL, I get a "You need read permission" error - and I assume the External Data code is getting the same thing. Yaron Koren (talk) 01:19, 2 July 2018 (UTC)

That sounds related. I found a way to cause the problem: Disabling anonymous reads with: $wgGroupPermissions['*']['read'] = false;

''This happens while logged in as admin. '' Disabling anonymous read when logged in as admin shouldn't break anything. - Johnywhy (talk) 17:59, 2 July 2018 (UTC)


 * You may be logged in, but the External Data code accessing the API is not. Yaron Koren (talk) 18:54, 2 July 2018 (UTC)

"the API now requires the user to be logged in" Since 1.31? @Anomie says the API didn't change. https://phabricator.wikimedia.org/T198577#4391434 - Johnywhy (talk) 19:26, 2 July 2018 (UTC)


 * Right, it's not due to a change in MediaWiki - I assume you changed the permissions on the wiki. Yaron Koren (talk) 22:33, 2 July 2018 (UTC)

Updating to 1.31 no longer access database tables
Hi, I have some tables in a MYSQL database which I can access perfectly fine under 1.30 with this extension but as soon as I update to 1.31 I can no longer access the database.

The error I get is as shown below:

Error: 1146 Table 'mraths48_meta. Metadata' doesn't exist (localhost)

Taken from a debug log file.

If I look in phpMyAdmin everything looks fine the tables are there and contain all the data.

Let me know how I can help to resolve this.

Regards

RogerA


 * Hi - that's unexpected. I don't know if this has to do with any change to the MediaWiki code in 1.31, or if this is just a result of something that happened during the upgrade process, or maybe it's just a coincidence that it happened at the same time as the upgrade... this page lists some things you can try when that error happens, like restarting MySQL. I would recommend trying one or more of those. Yaron Koren (talk) 11:12, 9 July 2018 (UTC)

Hi, I can access the tables from phpMyadmin and also form the MYSQL command line if I SSH onto the server so I am pretty sure the tables are all OK. I used the same settings that are in the Localsettings.php to access via MYSQL command line so they are all OK also. I also know that if I restore 1.30 from a backup everything will be fine. Is there a way to check if the connection to the database is being made? It is unlikely to be a MYSQL problem as I am using a shared server. I am happy to work through debugging steps if you let me know how to proceed?

It is as you say possible that some coincidence has occurred as a result of the update but if I so I need to track it down and I am not sure how to do that. Regards

RogerA


 * Ah - if changing back to MW 1.30 lets you access the table again, then I was wrong - it is something in the code. I don't know what, though. Is "Metadata" some kind of system table, or is it just a regular table with a clever name? Yaron Koren (talk) 14:07, 9 July 2018 (UTC)

Hi,

Metadata is nothing special and contains information about images e.g. dates, subject, copyright, photographer. It has 13 columns and 140000 rows and runs under InnoDB size 1.5MiB. I can also confirm that if I restore this table to an earlier version before the update to 1.31 I still get the same error so to me it seems unlikely that its the table. Let me know how I can help to debug.

Regards

RogerA

NB The database that contains the table Metadata is also on Localhost in case it is important

Hi, Just checking the error log for this exception and I can see a leading space in front of Metadata which might be relevant:

Query: SELECT UID,Origin_Date,Origin_Ref,Type,Image_Author,Subject,Copyright  FROM ` Metadata`    WHERE UID="002" Function: EDUtils::searchDB Error: 1146 Table 'mraths48_meta. Metadata' doesn't exist (localhost)

There is a leading space reported in the error. I can confirm this is not present in my call which simply has |from=Meatdata.

Maybe this is relevant.

Regards

RogerA

Hi, I tracked this down to line 491 in Ed_Utils.php where a space was being added to the table name. Once I removed this it worked fine. I also noticed that the latest version available for download which has the same version number 1.8.3 has this error corrected.

All resolved now.

Regards

RogerA


 * That's great! I'm very glad you were able to solve the problem, because I had no idea. And it's also good to hear that there's no bug in the current code. It's another reminder that a release of a new version is well-overdue - I hope to do that this week. Yaron Koren (talk) 15:51, 15 July 2018 (UTC)

SQLite no longer works
$wgSQLiteDataDir was removed from includes/db/DatabaseSqlite.php 4 years ago https://github.com/wikimedia/mediawiki/commit/46cf4f1e3010010e531340b99cbd59469529be0f and ExternalData Database::factory never sets a file path or directory in constructor instead.