Extension talk:Data Transfer


Adding prefix to title?

I'm looking to import a CSV file with at least 1,000 entries in it, but I want them to go into a separate namespace and provide transclusion only. Is there any way to achieve this?

What do you mean by transclusion? Yaron Koren (talk) 14:35, 18 October 2019 (UTC)
Like this: Transclusion. I don't want the actual page searchable in the wiki, but I want the data available for other pages to use. So I'm creating a custom "Asset" namespace where all the asset data will be dumped and stored in Cargo tables; the actual article pages will then run a query to find their relevant asset data.
Oh, I thought the transclusion thing was related to the data transfer part. You just need to add the namespace to every title in the CSV file. There are various ways to do that - one is by editing the data in a spreadsheet, then saving it back to CSV. Within a spreadsheet, you can create a separate column with just the namespace (and colon), then merge that column and the title column into one. Yaron Koren (talk) 16:47, 18 October 2019 (UTC)
That's what I was afraid of; I was hoping I could apply something at import time. But I think I can get awk to do what I need. Thanks!
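For anyone who would rather script the spreadsheet step described above, here is a minimal Python sketch of the same idea: prepend the namespace and a colon to every value in the title column. The function name `add_namespace` and the `Asset[Name]` column are made up for illustration; only the `Title` header comes from the extension's CSV format.

```python
import csv
import io


def add_namespace(csv_text, namespace, title_column="Title"):
    """Return csv_text with "<namespace>:" prepended to each title."""
    # newline="" leaves line endings alone so the csv module can handle them.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        title = row[title_column]
        # Skip titles that already carry the prefix, so a rerun is harmless.
        if not title.startswith(namespace + ":"):
            row[title_column] = namespace + ":" + title
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-column file; "Asset[Name]" is an invented template field.
sample = 'Title,Asset[Name]\nPump 1,"Pump, main"\n'
print(add_namespace(sample, "Asset"))
```

Going through a CSV parser rather than plain text substitution keeps quoted commas in other columns intact.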

error with MW 1.34

/Special:ImportCSV Error from line 54 of ...extensions/DataTransfer/specials/DT_ImportCSV.php: Cannot access private property ImportStreamSource::$mHandle

Backtrace:

#0 ...extensions/DataTransfer/specials/DT_ImportCSV.php(29): DTImportCSV->importFromUploadAndModifyPages()
#1 ...includes/specialpage/SpecialPage.php(575): DTImportCSV->execute()
#2 ...includes/specialpage/SpecialPageFactory.php(611): SpecialPage->run()
#3 ...includes/MediaWiki.php(296): MediaWiki\Special\SpecialPageFactory->executePath()
#4 ...includes/MediaWiki.php(900): MediaWiki->performRequest()
#5 ...includes/MediaWiki.php(527): MediaWiki->main()
#6 ...index.php(44): MediaWiki->run()
#7 {main}
Sorry about that - I just checked in what I think is a fix for this. Please let me know if there's still a problem! Yaron Koren (talk) 22:48, 7 January 2020 (UTC)
Thanks. I am now able to import the data, but there's an unrelated problem having to do with LF handling. Acnetj (talk) 00:59, 8 January 2020 (UTC)
What is the appropriate version to use for 1.34.x? REL1_34 results in v1.0.1 (1fc1c61) 04:42, 20 September 2019 - Revansx (talk) 23:41, 5 May 2020 (UTC)
You should never use the REL version of any of my extensions. You should either use the most recent version, or just the latest code. Yaron Koren (talk) 23:44, 5 May 2020 (UTC)
roger that .. now the trick is to remember to always check to see if an extension is one of yours. thx - Revansx (talk) 00:24, 6 May 2020 (UTC)
It's not just my extensions; it's any extension that has the "master" compatibility policy. Yaron Koren (talk) 02:18, 6 May 2020 (UTC)
Gotcha. I just learned something new. cool. - Revansx (talk) 02:59, 6 May 2020 (UTC)

LF handling

With the older version, a CSV field could include a newline (LF) without being parsed as a separate entry (this is needed to include multi-line free text for the wiki). The current master (which fixes the error above), however, handles LF as if it were CRLF.

I changed this on DT_ImportCSV.php and the handling is correct:

line 133: 		$table = str_getcsv( $csvString, "\n" );

line 133: 		$table = str_getcsv( $csvString, "\r\n" );

But it ignored extra LF on the field. Acnetj (talk) 01:31, 8 January 2020 (UTC)

This fix works for me in this particular instance (because I use Ctrl+Enter to add a line break within a cell in LibreOffice). I don't know whether it works in other cases. I think the parser should respect the double quotation marks for line breaks, the same way it does for commas.

Acnetj (talk) 01:52, 8 January 2020 (UTC)
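To illustrate the behavior asked for above (a line break inside double quotes stays part of the field), here is a small Python sketch. It is not the Data Transfer code, only a demonstration of the parsing rule; the `parse_rows` helper and the sample data are assumptions.

```python
import csv
import io


def parse_rows(csv_text):
    """Parse CSV text, keeping LF characters that sit inside quoted fields."""
    # newline="" passes line endings to the csv module untouched, so it can
    # tell a record terminator (CRLF) apart from an LF inside quotes.
    return list(csv.reader(io.StringIO(csv_text, newline="")))


# One header row plus one record whose free text spans two lines.
sample = 'Title,Free Text\r\n"Page A","first line\nsecond line"\r\n'

rows = parse_rows(sample)
print(len(rows))                 # records found by the CSV parser
print(len(sample.splitlines()))  # pieces found by naive line splitting
```

Splitting the raw string on line endings before parsing produces three pieces here, while a quote-aware parser returns two records.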

Sorry about that! What a strange bug in PHP. I just checked in code that I think works better. Yaron Koren (talk) 19:35, 8 January 2020 (UTC)
I am getting
Unable to construct a valid title from "".

with the latest update. Acnetj (talk) 02:13, 10 January 2020 (UTC)

What encoding is the file in, do you know? Yaron Koren (talk) 04:12, 10 January 2020 (UTC)
UTF-8. Just a simple file with one line of data plus header. Acnetj (talk) 19:18, 10 January 2020 (UTC)
I can't reproduce that problem. Was this exact file working before? Yaron Koren (talk) 20:21, 10 January 2020 (UTC)
Sorry, I just found it was a problem on my end. Just bad data. Things are working as they should for now. Acnetj (talk) 21:40, 10 January 2020 (UTC)
Great, that's a relief! Yaron Koren (talk) 21:43, 10 January 2020 (UTC)

Problem with overwriting fields of the template

MW 1.31, using UTF-16 LE with signature CSV Import

I try to update some content for a specific template.

For example, existing content for page "Acilius":

{{WoodhouseENELnames
 |Text=[[File:woodhouse_999.jpg|thumb
 |link={{filepath:woodhouse_999.jpg}}]]Ἀκύλιος, ὁ.
 }}

Or it could look like this:

{{WoodhouseENELnames
 |Text=[[File:woodhouse_999.jpg|thumb|link={{filepath:woodhouse_999.jpg}}]]Ἀκύλιος, ὁ.
 }}

To update I use this file content (using option "Overwrite only fields contained in the file", other templates also existed in page):

 Title,WoodhouseENELnames[Text]
 "Acilius","[[Ἀκύλιος]], ὁ."

Result:

{{WoodhouseENELnames
 |Text=[[Ἀκύλιος]], ὁ.|thumb
 |link=&# 123;&# 123;filepath:woodhouse_999.jpg&# 125;&# 125;]]Ἀκύλιος, ὁ.
  }}

(I deliberately added space in the middle of each entity so that it will not get parsed here) So, for some strange reason the thumbnail, link and old text are maintained and some text is corrupted and/or turned into html entities. When I select "Overwrite existing content" or "Append to existing content" no such problems exist, but I cannot do that as other templates exist in those pages.

I suspect that the "|link=" bit is parsed as an extra field, when in fact it isn't.

I even tried removing first this bit from the content:

[[File:woodhouse_999.jpg|thumb
 |link={{filepath:woodhouse_999.jpg}}]]

And then trying the import. And then I got this error:

Error: the column 0 header, 'ÿþTitle', must be either 'Title', 'Free Text' or of the form 'template_name[field_name]'
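A note on the 'ÿþ' in that message: those two characters are what the bytes FF FE, the UTF-16 LE byte order mark, look like when read as a single-byte encoding. So the file's signature appears to have been treated as part of the first header. A possible workaround, sketched below in Python, is to re-save the file as UTF-8 without a BOM before importing. The helper name `to_utf8_without_bom` is invented, and whether this resolves the import is an assumption.

```python
import codecs


def to_utf8_without_bom(raw):
    """Decode UTF-16 (with BOM) or UTF-8 bytes and return BOM-free UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")
    return text.encode("utf-8")


# Simulated "UTF-16 LE with signature" file content.
raw = codecs.BOM_UTF16_LE + 'Title,WoodhouseENELnames[Text]\n'.encode("utf-16-le")
print(raw[:2].decode("latin-1"))  # how the signature looks as Latin-1 text
print(to_utf8_without_bom(raw))
```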

Edit: I have managed to partly resolve the field corruption on import by commenting out some regexes in: DataTransfer\includes\DT_PageStructure.php

// $page_contents = preg_replace( '/{{(#.+)}}/', '&# 123;&# 123;$1&# 125;&# 125;', $page_contents );
// escape out transclusions, and calls like "DEFAULTSORT"
// $page_contents = preg_replace( '/{{(.*:.+)}}/', '&# 123;&# 123;$1&# 125;&# 125;', $page_contents );
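To show how those two substitutions would produce the entities seen in the result above, here is a rough Python re-creation. It assumes the real replacement strings are the entities without the spaces added for this page, and that Python's `re` behaves like PHP's `preg_replace` for these simple patterns; `escape_calls` is an invented name.

```python
import re


def escape_calls(page_contents):
    """Mimic the two commented-out substitutions on one line of wikitext."""
    # Parser functions such as {{#if:...}}
    page_contents = re.sub(
        r"\{\{(#.+)\}\}", r"&#123;&#123;\1&#125;&#125;", page_contents
    )
    # Transclusions and calls like DEFAULTSORT: anything with a colon in braces
    page_contents = re.sub(
        r"\{\{(.*:.+)\}\}", r"&#123;&#123;\1&#125;&#125;", page_contents
    )
    return page_contents


line = "|link={{filepath:woodhouse_999.jpg}}]]"
print(escape_calls(line))
```

Under these assumptions, the `{{filepath:...}}` call matches the second pattern because it contains a colon, which would explain the escaped braces on the `|link=` line.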

But still, I cannot find how to stop breaking the line with the image link from

[[File:woodhouse_999.jpg|thumb|link={{filepath:woodhouse_999.jpg}}]]

To

[[File:woodhouse_999.jpg|thumb
 |link={{filepath:woodhouse_999.jpg}}]]

Or when the existing text contains instances like

[[comparing|compare]] 

They get turned into

[[comparing
 |compare]] 

or even mixed with other words.

For example this text:

[[leave in the lurch]]: [[prose|P.]] and [[verse|V.]] [[λείπω]], [[λείπειν]] 

Looked like this when imported:

 [[leave in the lurch]]: [[prose
 |προλείπειν]], ἀμείβειν (Plat. but rare P.), V. ἐξαμείβειν, ἐκλιμπάνειν.

Also, when importing text that has headers (many equals signs, as for example a 3rd level header), line breaks are lost and the header joins the text on top and text gets mangled. For example, when importing something like:

 [[of what kind]]? [[prose|P.]] and [[verse|V.]] [[ποῖος]]; indirect: [[prose|P.]] and [[verse|V.]] [[οἷος]], [[ὁποῖος]]. 
  ===adjective===
 
  [[prose|P.]] and [[verse|V.]] [[πρᾶος]], [[ἤπιος]]

The result is:

 of what kind? P. and V. ποῖος; indirect: P. and [[verse |V.]] οἷος, ὁποῖος.===adjective===

So I am guessing that there is some pre-processing going on before the import which is affected by what the currently existing text looks like, which is quite strange.


I'm having similar problems. MW 1.33.1 / Data Transfer 1.1.1. Using ImportCSV with the option "Overwrite only fields contained in the file", the output has braces like { in my existing content replaced by &# 123; etc. Commenting out the relevant lines in DataTransfer\includes\DT_PageStructure.php introduces new lines and hence extra whitespace. Am I using the feature incorrectly? Is there any workaround? Any comments? --Fehlerx (talk) 17:29, 30 May 2020 (UTC)
In the file DT_WikiTemplate.php line 33
                //if ( $multi_line_template )
                //      $text .= "\n";

Comment out those two lines, and it should take care of the unwanted line breaks. I had the same issue, and it was REALLY annoying. 05:42, 5 October 2020 (UTC)

I believe this problem has now finally been fixed in the Data Transfer code. Sorry for the very long delay. Yaron Koren (talk) 17:22, 18 February 2021 (UTC)

Internal error on Special:ImportSpreadsheet in MW 1.34.x

  • MediaWiki 1.34.1 (b1f6480) 18:15, 30 April 20
  • PHP 7.2.30 (apache2handler)
  • Data Transfer 1.1.1 (1fc1c61) 04:42, 20 September 2019
  • phpoffice/phpexcel dev-master
[XrIAjzTNAOBUFE21T0p@MQAAAAo] /test/Special:ImportSpreadsheet PHPExcel_Reader_Exception from line 73 of /opt/htdocs/mediawiki/vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php: Could not open for reading! File does not exist.

Backtrace:

#0 /opt/htdocs/mediawiki/vendor/phpoffice/phpexcel/Classes/PHPExcel/IOFactory.php(281): PHPExcel_Reader_Excel2007->canRead(NULL)
#1 /opt/htdocs/mediawiki/vendor/phpoffice/phpexcel/Classes/PHPExcel/IOFactory.php(191): PHPExcel_IOFactory::createReaderForFile(NULL)
#2 /opt/htdocs/mediawiki/extensions/DataTransfer/specials/DT_ImportSpreadsheet.php(42): PHPExcel_IOFactory::load(NULL)
#3 /opt/htdocs/mediawiki/extensions/DataTransfer/specials/DT_ImportCSV.php(60): DTImportSpreadsheet->importFromFile(ImportStreamSource, NULL, array)
#4 /opt/htdocs/mediawiki/extensions/DataTransfer/specials/DT_ImportCSV.php(29): DTImportCSV->importFromUploadAndModifyPages()
#5 /opt/htdocs/mediawiki/includes/specialpage/SpecialPage.php(575): DTImportCSV->execute(NULL)
#6 /opt/htdocs/mediawiki/includes/specialpage/SpecialPageFactory.php(611): SpecialPage->run(NULL)
#7 /opt/htdocs/mediawiki/includes/MediaWiki.php(296): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#8 /opt/htdocs/mediawiki/includes/MediaWiki.php(900): MediaWiki->performRequest()
#9 /opt/htdocs/mediawiki/includes/MediaWiki.php(527): MediaWiki->main()
#10 /opt/htdocs/mediawiki/index.php(44): MediaWiki->run()
#11 {main}

- Revansx (talk) 00:22, 6 May 2020 (UTC)

Here are the pertinent lines from the debug data:

Unstubbing $wgLang on call of $wgLang::_unstub from ParserOptions->__construct

[error] [XrIKvf9myczl4IVZkwmjCgAAAAs] /test/Special:ImportSpreadsheet ErrorException from line 39 of /opt/htdocs/mediawiki/extensions/DataTransfer/specials/DT_ImportSpreadsheet.php: PHP Warning: stream_get_meta_data() expects parameter 1 to be resource, object given

[exception] [XrIKvf9myczl4IVZkwmjCgAAAAs] /test/Special:ImportSpreadsheet PHPExcel_Reader_Exception from line 73 of /opt/htdocs/mediawiki/vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php: Could not open for reading! File does not exist.

- Revansx (talk) 01:01, 6 May 2020 (UTC)

Any plans to switch to phpspreadsheet? (versus phpexcel)

  • MediaWiki 1.34.1 (b1f6480) 18:15, 30 April 20
  • PHP 7.2.30 (apache2handler)
  • Data Transfer 1.1.1 (1fc1c61) 04:42, 20 September 2019
  • phpoffice/phpexcel dev-master

I upgraded phpexcel to phpspreadsheet, and Data Transfer refused to run, saying that phpexcel is required. However, all the documentation for phpexcel says that it is obsolete. Is there any talk of getting Data Transfer to use phpspreadsheet soon? - Revansx (talk) 00:22, 6 May 2020 (UTC)

The code was updated 5 months ago to include phpspreadsheet. It checks whether one of them (phpspreadsheet or phpexcel) has been installed. Sen-Sai (talk) 12:50, 12 November 2020 (UTC)

Problems with importing code with {{

I wanted to import a bunch of articles that already had the template code in them Template:My templates. I do not want to have to separate them all out into separate items, because I want the template results to be in a list. But when I run "Import XML", I have 6000 pages and it says that I only have 11 pages. So obviously something is wrong. I wish there was a way to simply say to do a direct import without messing with any template stuff.

I finally figured out how to properly set up an XML file so that it imports properly.
<Pages>
  <Page Title="My mom">
    <Free_Text>
* {{he-l|שלום|t=Hello World}}

    </Free_Text>
  </Page>
</Pages>
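With thousands of pages, a file in that shape is easier to generate than to write by hand. Here is a minimal Python sketch that builds the same Pages / Page / Free_Text structure from a dictionary of titles and wikitext. The function name `build_import_xml` and the input dictionary are assumptions; only the element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages):
    """Build <Pages><Page Title="..."><Free_Text>...</Free_Text></Page></Pages>."""
    root = ET.Element("Pages")
    for title, free_text in pages.items():
        page = ET.SubElement(root, "Page", {"Title": title})
        # ElementTree escapes &, < and > for us; braces and pipes in the
        # wikitext need no escaping in XML.
        ET.SubElement(page, "Free_Text").text = free_text
    return ET.tostring(root, encoding="unicode")


pages = {"My mom": "* {{he-l|שלום|t=Hello World}}\n"}
print(build_import_xml(pages))
```

Letting a library write the XML avoids malformed files when page text contains characters such as `&` or `<`.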