Extension talk:External Data/Archive 2017 to 2018

Error: no local variable "X" was set
Hi Yaron, I' ve just upgraded to the latest ED 1.8.2 (8e1d91c) and now almost all my wiki pages are full of "no local variable set" errors. See it in a wiki of your own: http://discoursedb.org/wiki/Germany Note that I don't use local variables in bare form but rather enclosed in if's like this:  and I still get "Error: no local variable "X" was set". Am I missing something? --Ioannis Protonotarios 02:05, 2 January 2016 (UTC)


 * Thanks for pointing that out - the issue wasn't with External Data, but with the data URLs being used; it appears that semanticweb.org no longer supports querying. I updated the data URLs, and now everything seems to be working again. I don't know what the issue you're seeing is - are the data URLs accessible? Yaron Koren (talk) 16:05, 3 January 2016 (UTC)


 * Exactly! The issue appears when the data URLs are not accessible at all. In previous version of ED that was no problem because if there was no response from the URL then the variables would just have NULL values. So an  would be enough. But now, instead of a NULL value I get this error message for every nonexistent value and I don't know how to handle it. I have about 10,000 pages that used to work fine and now they look more or less like this! --Ioannis Protonotarios 00:02, 4 January 2016 (UTC)


 * Do you know what version of ED you were using before? I don't think there have been any recent changes related to this error message. In any case, you can turn it off by adding "$edgExternalValueVerbose = false;" to LocalSeettings.php. Yaron Koren (talk) 00:14, 4 January 2016 (UTC)


 * Yes, I was using ED 1.6.2 (1f1de17). The old site is still live so you can see the older version of the same page here.
 * OK! Problem solved! "$edgExternalValueVerbose = false;" did it! Now pages are back to normal! A million thanks for your support and for this great extension (and all of your extensions that I am heavily using for so long!) :) --Ioannis Protonotarios 00:27, 4 January 2016 (UTC)


 * Okay. That's great! Yaron Koren (talk) 01:54, 4 January 2016 (UTC)

Accessing complex objects
"Because MongoDB returns values in JSON that may be complex, and contain compound values, you can get data that is stored in such a way by separating field names with dots"

Is there any way (or any plans to support a way) of doing this for other complex values. For example, say I have this XML snippet:

 

Is there some way I could map all the attributes to dot-delineated field names? The XPath query I might use is. I'd really like to be able to access  by just referring to the external value   for example, instead of having to manually specify each attribute.

Lewiseason (talk) 21:03, 14 January 2016 (UTC)


 * It would be nice if that were possible, though the code would have to change. The XML-handling code in ED doesn't parse the XML code; it just looks for the fields that have been specified. So the XML-handling code would have to start parsing everything, and turning the whole file read into an object - which is possible, but there's no plan to do it. Yaron Koren (talk) 16:45, 15 January 2016 (UTC)

Include data in search results
Is there any way to make the parsed external data qualify in search results? I found this extension: https://www.mediawiki.org/wiki/Extension:Lucene-search but it seems difficult to install and the instructions are for Linux and we're running Windows. Perhaps there are other search extension alternatives?


 * If you want to search external data then you should store them locally first using either Semantic MediaWiki or Cargo. After that you can do all kinds of semantic search. That's the general idea more or less. --Ioannis Protonotarios 21:44, 21 January 2016 (UTC)


 * I don't understand how Semantic MediaWiki would be able to store the data locally but I have been playing with Cargo and I understand the gist of it. I have some dummy data in there but is there a way to do a semantic search or query on it automatically when the user uses the regular search box? I want the page with the external content to be searchable like any other normal page. --Patrick


 * The only problem I see here is the "regular search box". I don't think that regural search is customizable. You need to create an alternate search page. If you do that then you can do any kind of crazy search. But I don't think you can integrate semantic search into the regural MediaWiki search box. Unless you use some kind of search extension but we are passed that already. This is what I do. I can describe you my workflow but very roughly because semantic storing inside MediaWiki is a whole new world. There are several things you need to do to make it work. So the general idea (very very general) is that first you define a concept (e.g. books) which consists of a category, a template, a form, and several book properties. Inside the template is where the actual storing occurs. In there is where you put the external data function that takes data from an external source (such as a web api for example). So, instead of giving each book values manually, you retrieve them externally first, and then you assign them to properties (which is the semantic equivalent of variables), and all this happens inside the template. If you do that then each book page, besides the actual text content that may have, it also has some semantic values stored in (such as writer, year, etc). Once you've done that then you can query these values in many ways. One of those ways that concerns us here is to create a query form. This form would be the alternate search page I mentioned earlier! I hope you got the general idea. I tried in a few words to describe something that actually needs several pages to be fully understood. P.S. Semantic values cannot be seen inside page text because they are stored somewhere else inside a page, not in the content/text area. So the regular search cannot see them, as users cannot see them, as not even Google can see them. The only way to get those values is to query them. That's why regular search is useless. I hope this makes it more clear. --Ioannis Protonotarios 14:14, 26 January 2016 (UTC)


 * I think I may understand now, cargo takes the data and physically copies it into a new table where it can be queried. The semantic method is different because the external data isn't being directly stored/queried but instead feeding a system of tags and it is the tags that can be queried. Here is a super simple example, please tell me if this is accurate. I could make a "person" template and it reads in a list of users from a csv file. Here is the super simplified part: I could then write the semantic template to tag all rows and all columns with a property of "external data value". Then I would have a "People" page that uses the template and lists all the person info in a table. On the semantic search page if I searched for "John" with a property of "external data value" I would find the "People" page based on any field including first, middle, or last name... Right? I realize I am selling myself short if I tag everything with "external data value" but I'm just trying to wrap my head around it. In real life it would be properties like "name", "birthday", "address", etc. Can you semantic search *all* properties? --Patrick
 * Edit: I am not able to "load" the external data via Cargo. I think it's because I can't use the #cargo_store function inside the #for_external_table function. How do I get my data into the Cargo tables so I can query it? I plan on posting this on the Cargo talk page too. --Patrick


 * Yes, you can store each field with a different field/property. See here for how to store External Data's data via Cargo. Yaron Koren (talk) 18:23, 27 January 2016 (UTC)


 * Aha! I didn't understand how #display_external_table and the template parameter worked. For some reason I thought that example was showing you how to use Cargo to create a data source then feed it back to the external data extension. However, looking at that example again knowing it's what I wanted helped me get it to work. I created my table template and when I made a page with the CSV all the rows showed up! Very exciting stuff thank you so much for your help, I would never have put all these pieces together without your guidance. I think the hard part is over, now I need to query it. :) --Patrick


 * (I started writing this answer before Yaron replied so, since I wrote it, here you are, it may help you some more) Yes, it's something like that. Actually, Cargo and SMW are more or less the same. They both store to the db instead of page. Their difference is that a) Cargo works only with templates while SMW declarations can be used in any page, and b) SMW has more more basic queries while Cargo can create more complex queries like SQL (using an SQL-like syntax) (and Cargo is 2x faster). Besides that they do virtualy the same job. In your example, if you want to store persons then you will need a page for each person (and not one page for all. That would be the Category:Persons.) The page titles must be the person's names. Imagine that a page title is your most basic variable, the name. Each person page must contain the template call . This should be the minimum content. Now, inside the Template:Person you can either assign values manually to properties or get them from an external source, in your example from a csv file. You can do that using either Cargo or SMW, only the syntax will be different, the result will be the same. So, if you do it the manual way, then every template variable will be assigned to some property (and in that case, inside a persons page, e.g. Ioannis Protonotarios page will contain something like that:  ). If you do it with exteral data then inside the template you will use an external data parser function that will get the country of  (e.g. Ioannis Protonotarios) from the csv and assign it to the respective property. After you have created all persons pages then you can do any kind of queries you like. For example you can search for persons from Greece, aged from 20 to 40, with name starting with "P" (for this you may need to implement some string functions too, but you get the idea). --Ioannis Protonotarios 20:36, 27 January 2016 (UTC)


 * That makes sense, thank you both very much for the help. I am new to the MediaWiki world and I'm setting up a data dictionary for the Business Intelligence team. Just a few days ago these extensions (and templates in general) were very overwhelming but I'm starting to catch on and I think these tools will allow me to add functionality I had never even thought of. If I do manage to get my external data to show up in the regular search box perhaps I will write an extension and share with the community so that I can be as helpful as you all have been! --Patrick


 * Great! The only thing I would add to what Ioannis said is that Cargo is 1.5x faster than SMW (according to one set of tests), not 2x faster. Yaron Koren (talk) 01:34, 28 January 2016 (UTC)

get_db_data with special character in where clause not working
Hello, i use #get_db_data within an template with the pagename as where condition. On most of the pages this works fine, but on some pages with special characters there are no results.

This doesn't work on the Page Großbritannien. I think this is because of the Letter ß. Is there a debug-option to show me the sql-syntax generated by the extension? Any Ideas how to solve this?


 * There's no "debug" option, unfortunately, but you can do put in a typo in a field name or something, so you get an error message that includes the full SQL. That's what I tend to do. Yaron Koren (talk) 16:57, 22 February 2016 (UTC)


 * Your tip has helped narrow the error. The special character ß is transmitted inaccuracy as WHERE Country= 'Gro&Atilde;�britannien'. How can I fix this? TomyLee (talk) 07:17, 23 February 2016 (UTC)


 * So your guess was right. Switching from "PAGENAME" to "PAGENAMEE" might fix the problem, since the two are encoded differently. However, if you do switch, you'll probably have to add handling for spaces (for countries whose name contains a space), because PAGENAMEE turns those into "_". Calling REPLACE within the "where=" parameter might fix that problem. Yaron Koren (talk) 14:55, 23 February 2016 (UTC)


 * This don't help because the Name will be send as 'Gro%C3%9Fbritannien' to the sql-Server. I Test also without Pagename and write the country directly into the where-clause and it gives me the same wrong name. This hapens also with country with the german character ä ö ü like Ägypten (Egypt) TomyLee (talk) 16:00, 23 February 2016 (UTC)


 * I don't know, then... the only solution I can think of is to change the encoding of your 'page' table, from 'latin1' (which is what I assume it is now) to 'utf8' or something like that. It's probably a good idea to back up your database first, though, if you're going to do that. Yaron Koren (talk) 22:08, 23 February 2016 (UTC)


 * I Fix it! i go the long way the complete sourcecode starting from first php-call and insert a lot of prints ... into the code to see where the string goes craped. The resoulution is to insert the line  into the file ".\includes\db\DatabaseMssql.php" for the function "open" (for me it is on the line 110). TomyLee (talk) 07:27, 24 February 2016 (UTC)


 * Wow! That's great. I'm glad you were able to find a fix, based on very little information. I wonder if this is an MSSQL/SQLServer-only issue... Yaron Koren (talk) 14:59, 24 February 2016 (UTC)

Manage access to "general use" data
Hi. My name is Oscar and I'm building a wiki with a lot of information based on public data.

There some specific situations where you need "general use" data which can be properly stored and optimized to not duplicate data, for example. Just think of a list of cities, countries or states with the proper relationship, which can be used and maintained in order to save several thousands records, work, etc. I've installed the SemanticBundle package in order to manage this kind of data. So, I've set anonymous access to some data (which is perfectly legal, as indicated in http://dev.mysql.com/doc/refman/5.7/en/account-names.html) by specifying '' as the username part and no password (I've filtered the host part for the internal servers network). With these settings, I got the "incomplete" error for the ID. I've gone through the code and have commented the part in EDUtils where these were check, with this diff result:

187,188c187,188 <                      if ( $db_server ==  || $db_name ==  || <                               $db_username ==  || $db_password ==  ) { --- >                      if ( $db_server ==  || $db_name ==  ) { > //                           $db_username == ' ' || $db_password == '' ) { so I can access my data, and I don't think this is a risk (you can, of course, disagree). So, I've set these user and pass:

$edgDBUser['t'] = "''"; $edgDBPass['t'] = ""; and it works for me. Please consider the change.

Improve management and/or security
Just a contribution to help manage sources for ED. It's just to provide a specific separate file for extra private data, i.e. ExternalData_private.php. This is the diff for ExternalData.php:

61a62,66 > > // A place to include private data > if ( file_exists(dirname(__FILE__). '/ExternalData_private.php') ) { >      include_once dirname(__FILE__). '/ExternalData_private.php'; > }

As an example, this file could have "w" pointing at mediawiki db itself, together with (or replaced by) any DB to be used:

&lt;?php

$edgDBServer['w']    = $wgDBserver; $edgDBServerType['w'] = $wgDBtype; $edgDBName['w']      = $wgDBname; $edgDBUser['w']      = $wgDBuser; $edgDBPass['w']      = $wgDBpassword;
 * 1) external data configuration