Manual talk:Grabbers

Funny, I found this page from [1] linked from [2]. --Nemo 11:56, 16 December 2013 (UTC)

m:WikiTeam/Dumpgenerator rewrite

I wonder if these scripts cover all WikiTeam's needs for API-enabled wikis, making the need for a rewrite partially obsolete. --Nemo 12:00, 16 December 2013 (UTC)

grabImages.php broken for MediaWiki 1.35 (2021) and other suggestions

I don't know where to report this.

  • Lines 91 and 184 are broken for MediaWiki 1.35.1 (as of April 2021):
uid156278@h2web220:~/web/w/grabbers$ php grabImages.php --url https://edutechwiki.unige.ch/fmediawiki/api.php --from "z.png"
The directory where images will be stored in is: /tmp/image-grabber/https://edutechwiki
Going to query the URL https://edutechwiki.unige.ch/fmediawiki/api.php?action=query&format=json&list=allimages&aiprop=url%7Csha1&ailimit=500&aifrom=z.png&*
In do loop (instance 0)...
in foreach (instance 0)... 
[455a5f2102d2022c4a7984ce] [no req]   TypeError from line 63 of /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/includes/http/Http.php: Argument 2 passed to Http::get() must be of the type array, int given, called in /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/grabbers/grabImages.php on line 184
Backtrace:
#0 /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/grabbers/grabImages.php(184): Http::get(string, integer)
#1 /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/grabbers/grabImages.php(123): GrabImages->saveFile(string)
#2 /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/maintenance/doMaintenance.php(107): GrabImages->execute()
#3 /home/clients/ece75502a0e1a000382f8192e2e08b2c/web/w/grabbers/grabImages.php(224): require_once(string)
#4 {main}

As a workaround, I replaced 'default' with the array [100, 100] without really knowing what it does; the first element is probably some timeout.
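For reference, a minimal sketch of the kind of change involved, assuming the MediaWiki 1.35 signature Http::get( $url, array $options = [], $caller = __METHOD__ ), where the timeout moved into the options array (the values here are illustrative, not the actual fix that was merged):

// Old-style call: the second argument used to be a timeout value.
$content = Http::get( $url, 'default' );

// MediaWiki 1.35 expects an options array as the second argument;
// the timeout is now passed via the 'timeout' key.
$content = Http::get( $url, [ 'timeout' => 'default' ] );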

  • There should be a parameter to define the output directory, since /tmp can fill up quickly (e.g. this happened to me on a shared host); a sketch follows the workaround below.

Workaround: replace the directory with a symbolic link to a location with more space.
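For what it's worth, a sketch of how such an option could look, using the standard addOption()/getOption() API of MediaWiki maintenance scripts (the --outdir name is hypothetical, not an existing flag of grabImages.php):

// In the script's constructor: hypothetical --outdir option, optional, with an argument.
$this->addOption( 'outdir', 'Directory to store the downloaded images in', false, true );

// When resolving the target directory, fall back to the current /tmp location.
$baseDir = $this->getOption( 'outdir', wfTempDir() . '/image-grabber' );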

  • Under Unix, building the output path from the unescaped URL http://yourwiki creates a directory named http: with a yourwiki subdirectory inside it.
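A hypothetical sanitizing step (not part of the current script) that would avoid this by deriving the directory name from the URL's host only:

// Hypothetical: build a filesystem-safe directory name from the wiki URL.
$host = parse_url( $this->getOption( 'url' ), PHP_URL_HOST ) ?: 'wiki';
$dir  = wfTempDir() . '/image-grabber/' . preg_replace( '/[^A-Za-z0-9._-]/', '_', $host );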

- greetings! DKS — Preceding unsigned comment added by Daniel K. Schneider (talk • contribs)

This script hasn't been used for a very long time. I've fixed it in https://gerrit.wikimedia.org/r/c/mediawiki/tools/grabbers/+/678399/ (someone still needs to merge it into master); in the meantime you can download it from there. Note that this script only downloads files, it does not import them. Use grabFiles.php if you want to download and import those files. --Ciencia Al Poder (talk) 15:55, 11 April 2021 (UTC)
Thanks, I will try! I actually prefer grabImages.php over grabFiles.php, since I wanted to create a new wiki with only parts of the old one. I did not want to clone all the pages and users (there are privacy/consent issues with the latter). To move pages I used the regular export facility available in the special pages, but since it does not export pictures, grabImages.php saved me a lot of time :) -Daniel K. Schneider (talk) 15:43, 21 April 2021 (UTC)