Extension talk:External Data

From MediaWiki.org
Jump to navigation Jump to search

Is the usage of SQL group by possible?[edit]

  1. Is the usage of SQL "group by" possible?
  2. if yes how?
  3. if no as it was pointed out in Extension_talk:External_Data/Archive_2013#Why_get_db_data_does_not_contain_GROUP_BY.3F where would a CR be posted?

I would consider "group by" extremly useful since it would be possible to create statistics for database content really easily. --WolfgangFahl (talk) 12:11, 10 January 2017 (UTC)

No, it's still not possible, though I can see how it would be useful. What's a CR? Yaron Koren (talk) 14:35, 10 January 2017 (UTC)

Yaron - thank you for getting back to this - a CR ist a change request. I had used group by successfully with my Mediawiki 1.23 wiki and and old version of SMW and the External Data extension 1.8.2. The markup i used was:

{{#get_db_data:
db=vz
|from=(select channel_id,timestamp,round(value)/1000 as wert from data order by timestamp desc limit 200) as d left join entities e on d.channel_id = e.id left join properties p on e.id=p.entity_id and e.type='electric meter' and p.pkey='title' 
|where=d.channel_id in ('1','2','5','7') 
|limit=100
|order by=channel_id
|group by=channel_id
|data=datum=DATE_FORMAT(from_unixtime(timestamp/1000), '%Y-%m-%d %T'),kanal=d.channel_id,wert=max(wert),titel=p.value
}}
{| class="wikitable sortable"
! Kanal
! Datum
! Titel
! Wert{{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{kanal}}} 
{{!}} {{{datum}}} 
{{!}} {{{titel}}} 
{{!}}  style="text-align:right;font-family:monospace;font-size:125%;" {{!}} {{{wert}}} 
}}
|}

unfortunately this now gives me a regression with invalid SQL.

SELECT DATE_FORMAT(from_unixtime(timestamp/1000), '%Y-%m-%d %T'),d.channel_id,max(wert),p.value FROM `(select channel_id`,`timestamp`,round(value)/1000 as wert from data order by timestamp desc limit 200) as d left join entities e on d.channel_id `e.id left join properties p on e.id=p.entity_id and e.type='electric meter' and p.pkey='title'` WHERE d.channel_id in ('1','2','5','7') GROUP BY channel_id ORDER BY channel_id LIMIT 100 

There seem two be two issues now:

  • the usage of backticks at places where they shouldn't be
  • the stricter interpreteation of the group by clause by mySQL

Going back to version 1.8.2 fixes the problem :-( --Seppl2013 (talk) 15:41, 24 September 2017 (UTC)

Version 1.8.3 completely breaks MySQL JOIN queries. Moved back to version 1.8.2 for now. Hope this gets fixed soon. --Wikimantis (talk) 01:12, 25 May 2018 (UTC)
Are you using "group by", or are you talking about something else? Yaron Koren (talk) 02:39, 25 May 2018 (UTC)

How to pass header value to request?[edit]

With some API we need to pass in the headers some value (for example: https://docs.gitlab.com/ee/api/README.html#authentication) I tried

url=--header "PRIVATE-TOKEN: xxxxxxxxxxxxxx" https://myserver/api/v1/list/all
or 
url=https://myserver/api/v1/list/all,array('PRIVATE-TOKEN'=>'xxxxxxxxxxxxxxxx')

But not chance. How can we achieve this? if this is possible, can we have an example of how?

The #get_web_data function only works with RESTful APIs; #get_soap_data might work in this case. Yaron Koren (talk) 15:52, 17 February 2017 (UTC)

Dash character in data[edit]

Hello

We found when displaying csv file data with 
a) cells that contain just a dash character, or
b) cells that contain text and start with a dash character
that any attempt to render this data into a table will cause rows to break at that cell.
This is presumably because the parser inserts | for a column break and then the - from the string, resulting in |- which is a special string which means 'new row'.

To resolve this I edited the function GetCSVData in the file ED_Utils.php.
In the foreach loop just before the return statement, I added a check for the dash character which replaces it with a dash wrapped in <nowiki> tag. This resolves the issue.
Note that this issue likely still exists when other parsers like get_db_data are used, since I only resolved it in the GetCSVData function.

Note that I have had to replace the word nowiki with ''n0wiki'' below, because it was impossible to display this code since it was closing nowiki brackets.

<syntaxhighlight lang="php">
		// Now "flip" the data, turning it into a column-by-column
		// array, instead of row-by-row.
		$values = array();
		foreach ( $table as $line ) {
			for ( $i = 0; $i < $num_columns; $i++ ) {
				// This check is needed in case it's an
				// uneven CSV file (see above).
				if ( array_key_exists( $i, $line ) ) {
					$row_val = trim( $line[$i] );
				} else {
					$row_val = '';
				}
								
				//NEW - check for dash values
				if ($row_val == '-')
				{
					$row_val = '<n0wiki>-</n0wiki>';
				}
				else if (substr($row_val,0,1) == '-')
				{
					$row_val = '<n0wiki>-</n0wiki>' . substr($row_val,1);
				}
				//NEW - end check for dash values
				
				if ( $has_header ) {
					$column = strtolower( trim( $header_vals[$i] ) );
				} else {
					// start with an index of 1 instead of 0
					$column = $i + 1;
				}
				if ( array_key_exists( $column, $values ) ) {
					$values[$column][] = $row_val;
				} else {
					$values[$column] = array( $row_val );
				}
			}
		}
		return $values;
	}

</syntaxhighlight>
Great, thanks for this patch! Is that first "if" check necessary? Maybe everything can be handled with just the "if substr()" call? Yaron Koren (talk) 12:19, 14 April 2017 (UTC)

PHP Notice in error logs[edit]

Hi I can see theses messages in the apache server error log

PHP Notice:  Second parameter should not be a timeout. [Called from EDUtils::fetchURL in /opt/mediawiki/extensions/ExternalData/ED_Utils.php at line 882]

ExternalData 1.8.3 Mediawiki 1.27.3

Cheers

Ubibene

Daisy Chaining[edit]

Hey, is there a way to daisy-chain queries? As far as i can see one can only enter one URL to get data from. My usecase: get some data, create a table with it and then use coordinates from the first query in a secondary query for reverse geocoding (for each individual entry). Is this possible somehow or is further explainantion required?

If I understand the question correctly, you can just cycle through the data twice, by calling #for_external_table twice - the data won't get cleared after the first call. Or am I misunderstanding something? Yaron Koren (talk) 03:20, 31 May 2017 (UTC)
In my specific usecase i'm building a table from an overpass query which returns xml. I then want to geocode every coordinate of that table via nominatim (returns xml too) to a specific country.

Error: Did not find alias for special page 'GetData'.[edit]

Hi, i use Version 1.8.3 (6705c56) from 2017-07-17T22:03:53. This Error is already marked as fixed, but i still get these message. see commit on Phabricator/"Add file for special page aliases". My wiki installation is Version 1.29.0 in german language

--TomyLee (talk) 09:28, 28 July 2017 (UTC)

Yes, an "alias" file was created, but no aliases have been defined yet. Where are you seeing that error? I didn't know that languages for which an alias has not been defined produced an error message. Yaron Koren (talk) 15:21, 28 July 2017 (UTC)
I see it only on the page "Special pages", theis message is printet 2 times. In LocalSettings i have set "$wgDevelopmentWarnings = true" for my upgrade process. TomyLee (talk) 06:10, 31 July 2017 (UTC)

detect_encoding() expects parameter 1 to be string (solved)[edit]

Hi, if i execute runjobs.php i get lots of notices for this extension. It says:

PHP Warning: mb_detect_encoding() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 520

PHP Warning: utf8_encode() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 523

--TomyLee (talk) 09:55, 28 July 2017 (UTC)

That's very unexpected. If possible, could you add a line above that line that looks like "if (is_object( $dbField ) ) print_r( $dbField );", and let me know what it prints out? Yaron Koren (talk) 15:18, 28 July 2017 (UTC)
i Insert your code and also a "Debug" message that prints without condition. Here is one example output:
2017-07-31 06:19:06 SMW\UpdateJob PAGENAME pm= requestId=c49f317955c8be4610adbbcc (id=262115,timestamp=20170728095100) STARTING
Notice: Undefined index: port in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 65
Notice: Undefined index: UseWindowsAuth in ..\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG 
DateTime Object
(
    [date] => 2011-10-13 00:00:00.000000
    [timezone_type] => 3
    [timezone] => Europe/Berlin
)
Warning: mb_detect_encoding() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 524
Warning: utf8_encode() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 527
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG
Notice: Undefined index: port in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 65
Notice: Undefined index: UseWindowsAuth in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG

The error comes from a Query to Microsoft SQL Server with column datatype "date" with value "2011-10-13" --TomyLee (talk) 06:52, 31 July 2017 (UTC)

I have solved this error by insert a new check bevor the convert line. my ED_Utils.php is now:
$dbField = $row[$i];
//NEW CODE: check for DateTime
if (is_a( $dbField, 'DateTime')) { 
  $dbField = $dbField->format('Y-m-d H:i:s'); :::
} 
//END OF NEW CODE
if ( !function_exists( 'mb_detect_encoding' ) || 
  mb_detect_encoding( $dbField, 'UTF-8', true ) == 'UTF-8' ) {
  $new_row[$column_name] = $dbField;
} else {
  $new_row[$column_name] = utf8_encode( $dbField );
}
Great find! Sorry, I think I missed your previous message here. I just checked in a patch based on this code - hopefully this problem is now solved. Yaron Koren (talk) 15:36, 11 August 2017 (UTC)

Undefined index: UseWindowsAuth and port (solved)[edit]

as a third error message if i execute runphp.php i get the notice:

PHP Notice:  Undefined index: port in ...\includes\libs\rdbms\database\DatabaseMssql.php on line 65
PHP Notice:  Undefined index: UseWindowsAuth in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66

i know, this is not part of ExternalData but i use Microsoft Server only with External Data.

Maybe you need thes two new(?) parameters for mssql. In "MWLBFactory.php" i found these part of code:

80 elseif ( $server['type'] === 'mssql' ) {
81    $server += [
82         'port' => $mainConfig->get( 'DBport' ),
83         'useWindowsAuth' => $mainConfig->get( 'DBWindowsAuthentication' )
84                ];
This sounds like something recently added to MediaWiki. What's MWLBFactory.php? Maybe I can copy the approach from there. Yaron Koren (talk) 15:22, 28 July 2017 (UTC)
I don't know the file. i search for these variables to see if there is maybe a new config variable for my LocalSettings i missed. But the only appereance i found was in classMWLBFactory
Did you set $wgDBWindowsAuthentication and $wgDBport? --wargo (talk) 09:05, 31 July 2017 (UTC)
Yes --TomyLee (talk) 09:50, 31 July 2017 (UTC)
I have solved the problem for me. I had to added little bit of code. It seems that the parameters not used by MediaWiki, maybe in a later release? But the warning messages never comes up again:
You have to add these two new Parameters for the connection definition. To get this: In Localsettings add these new variables (DbName ist the Name/Id of your connection):
$edgDBPort['DbName'] = "80";
$edgDBUseWindowsAuth['DbName'] = false;
in ExternalData.php at Line 60 add these two lines:
$edgDBPort = array();
$edgDBUseWindowsAuth = array();
in ED_Utils.php the static function getDBData has to be modified at 3 areas:
Line 180 add
global $edgDBUseWindowsAuth;
global $edgDBPort;
Line 193 add
$db_port = self::getArrayValue( $edgDBPort, $dbID);
$db_useWindowsAuth = self::getArrayValue( $edgDBUseWindowsAuth, $dbID);
Line 244,260 and 273 add to the array of parameters
'port' => $db_port,
'UseWindowsAuth' => $db_useWindowsAuth
--TomyLee (talk) 13:03, 11 August 2017 (UTC)

can't get to xml field from api query call[edit]

A query i added to a template looks like this:

{{#get_web_data: url={{SERVER}}{{SCRIPTPATH}}/api.php?action=query&list=categorymembers&cmtitle=Category:{{{OPfamily|}}}&format=xml
  | format=XML
  | data=title=title}}
{{#display_external_table:template=catList
  |data=title=title}}

on the final page though this returns the error: Undefined Index title

running the query in the sandbox returns this xml though with the fields title.

<api batchcomplete="">
<continue cmcontinue="page|53454c454354204d4154|30" continue="-||"/>
<query>
<categorymembers>
<cm pageid="21" ns="0" title="Constant MAT"/>
<cm pageid="47" ns="0" title="ConstantMAT Class"/>
<cm pageid="22" ns="0" title="Depth MAT"/>
<cm pageid="23" ns="0" title="GLSL MAT"/>
<cm pageid="24" ns="0" title="In MAT"/>
<cm pageid="25" ns="0" title="Null MAT"/>
<cm pageid="26" ns="0" title="Out MAT"/>
<cm pageid="27" ns="0" title="PBR MAT"/>
<cm pageid="28" ns="0" title="Phong MAT"/>
<cm pageid="29" ns="0" title="Pointsprite MAT"/>
</categorymembers>
</query>
</api>

What am I missing?

Thanks Markus

That's strange - I was able to get essentially the same thing working, though with a hardcoded category name. Does this happen with all pages that call this template? Could you try calling it with a category whose name doesn't contain spaces or non-ASCII letters? Yaron Koren (talk) 01:12, 1 August 2017 (UTC)
Thanks Yaron! I should have tried this... Using a hardcoded Category name makes it work.

Addressing an Array[edit]

Is it possible to pull information out of an array? I need to pull information out of the statuses array.

[
 {
  "id": "id",
  "serviceName": "name",
  "statuses": [
    {
    "id": 0,
    "name": "network",
    "severity": 5,
    "severityDescription": "Severity Description"
    },
    {
     "id": 1,
     "name": "network",
     "severity": 2,
     "severityDescription": "Severity Description"
   }
  ],
 }
]

Is is something like?

{{#get_web_data:url=https://api.com|format=JSON|data=name=serviceName,status=statuses[0].name}}

I understand that this isn't in any way scalable, but the vast majority of the time the array does not contain more than one member, and at this stage I am just trying to learn about ExternalData's capabilities.

Unfortunately, no - if this were stored in XML, you could do it with the "use xpath" parameter, but there's no such parameter for JSON. (There is a JSONpath technology, but it's poorly supported, from what I understand, and it's not supported by External Data.) Yaron Koren (talk) 01:10, 4 September 2017 (UTC)
Thanks, good to know

JSON GET header[edit]

Is it possible to send a JSON GET header with ExternalData?

You mean, have an API query string composed of JSON? I don't see why not... Yaron Koren (talk) 17:12, 13 September 2017 (UTC)
Sorry, I am rather new to JSON. The GET query I have in Postman requires the header section with a key value pair. I am not sure how define a header using ExternalData. Does that make sense?
Can you define this query as a single URL, with all the JSON in the query string? If so, you can just pass that in to #get_web_data. Yaron Koren (talk) 13:33, 14 September 2017 (UTC)

Accessing source files[edit]

Often enough I cannot access the XML directly but need to open the source file through the browser. In browsers such as Chrome and Firefox, the URL is typically prefixed by "view-source:". Is there any way we can make External Data query the source code? I did some test runs with the prefix but all I got so far is "Error: No contents found at URL view-source:http:..". Cavila 07:25, 5 November 2017 (UTC)

I don't understand - when you do a "view source", you're getting the same content, just displayed in a different way. How would that apply with External Data? Yaron Koren (talk) 01:17, 6 November 2017 (UTC)
Not necessarily. Just because ".xml" is in the URL doesn't necessarily mean that the web page gives you raw, queryable XML. Or you'll get a parse error because any XSL referred to is missing. Cavila 18:57, 6 November 2017 (UTC)
Alright, but, as far as I know, you'll always get the same content when using "view source". Or do you know of a case where doing "view source" gives true XML, but External Data's retrieval does not? Yaron Koren (talk) 20:09, 6 November 2017 (UTC)
Sure, this comes to mind. Cavila 20:33, 6 November 2017 (UTC)

That's a custom browser display, but #get_web_data seems to retrieve the right thing there - the following call works for me:

{{#get_web_data:url=http://tarltonproject.org/TarltonJest_TEI1/content/Tarlton-Cittie_basic.xml|format=xml|data=title=title}}
{{#external_value:title}}

Yaron Koren (talk) 14:41, 7 November 2017 (UTC)

#for_external_table requires "nowiki"[edit]

My 1.30 Mediawiki didn't recognize the example provided for #for_external_table. The result was one "-" per each line of the file.

The solution was to change:

! Area {{#for_external_table:

for:

! Area {{#for_external_table:<nowiki/>

And then it worked. Spent some hours debugging it. Roger Krolow (talk) 14:37, 29 December 2017 (UTC)

Thanks for pointing that out, and sorry for the long delay - and the initial problem. I just fixed this in the documentation. Yaron Koren (talk) 03:02, 18 January 2018 (UTC)

"edit source" link stops appearing from here forward[edit]

Don't know why. Flow bug? Maybe because topic above uses:

nowiki><nowiki/></nowiki
Yeah... I just fixed it. Yaron Koren (talk) 12:33, 23 May 2018 (UTC)

Variable MySQL table search[edit]

Hello. i'm using mediawiki 1.30.0. Have a test mysql database set on oracle virtualbox ubuntu 16.04. I'm creating a documentation system for a company that would fetch data on a template from their own mysql database by name (or number) of client company (this also would be the pagename). Idea is to create a template that fetches data from the database by the pagename (as the client amount might increase, automation is the goal) The code below does not work but to get the idea...

{{#get_db_data:
|db=test
|from={{BASEPAGENAME}}
|data=Id=id,Company=company,Device=device,Ip4_address=ip4_adress,User=user,Password=password,Install_date=install_date
}}

Problem is that mysql can't have variables as a search value (and I think the way commands are processed the BASEPAGENAME template wont even open inside the #get_db_data) So i'm forking the Extension/Externaldata/ED_parserfunctions.php with something like this...

if ( array_key_exists( 'from', $args ) ) {
			if ( $args['from'] == 'word' ) {
				$args['from'] = -something?-
				$from = $args['from'];
			}
			else {
			$from = $args['from'];
			}

Any ideas, solutions, code locations to fork more, are all welcome!

Actually, "BASEPAGENAME" in the above example will get parsed correctly - I just tried it myself to be sure. And more generally, nested parser functions, templates, variables etc. always get parsed from the innermost one out, as far as I know. Could it be that there's just some issue with your #get_db_data call? Yaron Koren (talk) 12:27, 18 April 2018 (UTC)
It worked! i went back to trying the "BASEPAGENAME" template call and found out that i had a problem with mysql tables being written in all lower case letters. As the page comes with the first letter being capital, i just had to redo my tables for this to work. Thanks for the explanation, it saved me a ton of time ^^ best regards:OP, to be :nappula3 8:31 19 April 2018 (UTC)

Form to Create Record?[edit]

Is there an extension or native way to create/edit a record in a db using a form?

This doesn't sound like an External Data extension question, but the Cargo and Semantic MediaWiki extensions both let you do that. Yaron Koren (talk) 13:31, 11 May 2018 (UTC)
Thx!

Johnywhy (talk) 14:42, 23 May 2018 (UTC)

Use Logged-In User Credentials?[edit]

When i execute the following, data returned looks like this extension is NOT running with logged-in user credentials:

{{#get_web_data: 
url={{SERVER}}/api.php?action=query&meta=tokens&format=json
|format=JSON
|data=csrftoken=csrftoken
}}

That url gives, for example: {"batchcomplete":"","query":{"tokens":{"csrftoken":"992b1efd8912345dcec879106eb64b615b054636+\\"}}}

But

{{#external_value:csrftoken}}

returns +\, which is the token for non-logged-in users.

"For unregistered users, the token is always +\" https://www.mediawiki.org/wiki/API:Edit#Unregistered_users

$edgAllowSSL = false; had no effect.

Is there something wrong with my syntax?

Do i need to use https://www.mediawiki.org/wiki/Extension:NetworkAuth?

Johnywhy (talk) 10:48, 23 May 2018 (UTC)

I don't fully know the mechanics of tokens, but I would think it's a very good thing that External Data doesn't let you use your own MediaWiki token - otherwise, a malicious user could potentially put calls in some page to get admins who visit that page to unwittingly do all sorts of bad things. Yaron Koren (talk) 13:02, 23 May 2018 (UTC)
Only if the malicious user has access to the page containing the call. But, that can't happen if the page is protected. Seems this is not the only potential security risk that is completely under control of the admins. An admin could also post their password on a public page, and then a malicious user could gain admin access-- admins ought know not to do that. To troubleshoot this issue, can you plz comment whether my syntax above is correct? -thx Johnywhy (talk) 14:48, 23 May 2018 (UTC)
Okay, but what if the malicious user puts the call on an unprotected page? Or on every unprotected page? Yaron Koren (talk) 15:09, 23 May 2018 (UTC)
Plz see title of this topic: "Logged-In User Credentials"
Why aren't i receiving alerts about your comments here?

If extension respects logged in user credentials (as it should), then there's no risk. Johnywhy (talk) 11:36, 25 May 2018 (UTC)

On the contrary, I see tremendous risk: (1) malicious user goes to page "ABC", puts in a call to #get_web_data that calls some harmful MediaWiki API action which uses the logged-in user's credentials; (2) user with admin rights goes to page "ABC", the page calls that API URL and is successful because that user has the correct privileges. Do you see what I'm saying? Yaron Koren (talk) 13:33, 25 May 2018 (UTC)
Ok, i can see your point there. So, does that mean that all actions performed by this extension are done with anonymous user-rights? Thx
I suppose you could describe it that way, yes. Yaron Koren (talk) 23:47, 25 May 2018 (UTC)

Request: Adding Records[edit]

Awesome extension! Would also be awesome if the extension could add/edit records. Maybe could use built-in MW edit-token, from the API.

I'm not sure what this means, but External Data can be used in conjunction with either Cargo or Semantic MediaWiki to store the information it retrieves. Yaron Koren (talk) 18:33, 27 May 2018 (UTC)
Thx for reply. It's my understanding that Cargo can only do a store-once action for a page-- it can't add more records without overwriting the existing records for that page. https://www.mediawiki.org/wiki/Extension_talk:Cargo#How_to_Remove_Built-In_Fields?
Johnywhy (talk) 00:34, 2 June 2018 (UTC)
That's right. Yaron Koren (talk) 16:41, 3 June 2018 (UTC)