Exporting all the files of a wiki
From MediaWiki.org
Exporting all the files of a wiki can be done in a few different ways. If you have FTP access to the wiki, then you can move the files by following the procedure at Manual:Moving a wiki. If you lack such access, as can happen for instance if a wiki is abandoned by its site owner, then you will probably need to use workarounds. This procedure can semi-automate the task of downloading all the files, but you will still have to figure out a way to upload them to your wiki.
Step 1
- Follow the procedure at m:Help:Export#1._Get_the_names_of_pages_to_export to use a Python script to get the names of all the files on the wiki. When you go to Special:AllPages, you will be selecting the File namespace.
- Select the names that the Python script prints, and copy and paste them into Column A of your favorite spreadsheet application (e.g. Microsoft Excel, or OpenOffice.org Calc if you need free software). You should now have a column of cells that say, e.g., File:Ayn Rand.png
- In Cell B1, put this formula:
    ="*[["&A1&"]]"
- Copy that formula and paste it into the rest of Column B. (Make sure you don't try to paste cell B1 onto itself, or you'll get an error like "You are pasting data into cells that already contain data".) Each cell in Column B should now say something like *[[File:Ayn Rand.png]]
- Go to the wiki you got the filenames from and create a new page, e.g. User:JoeSchmoe/All files. Copy and paste column B into that page, and save.
- The page will now load; it may take a while, since it has to render every file on the wiki. You should see a listing that looks like this:
- ... (etc.)...
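The spreadsheet step only wraps each name in list-and-link markup, so a short script can do the same job. The following is a minimal Python sketch, assuming the names from the Python script are available as a list of strings; the function name `to_wikitext_list` is illustrative and not part of any MediaWiki tool.

```python
def to_wikitext_list(names):
    """Wrap each page name as a wikitext bullet with a link, like Column B."""
    lines = []
    for name in names:
        name = name.strip()
        if name:  # skip blank lines left over from copying
            lines.append("*[[" + name + "]]")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ["File:Ayn Rand.png", "File:WilliamGodwin.jpg"]
    print(to_wikitext_list(sample))
```

The printed text can be pasted into the new wiki page in place of Column B.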
Step 2
- Now use a Perl program to fetch that page and print the file names as Perl array assignments, ready to paste into the script in Step 3:
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    # Page created in Step 1 (underscores stand in for spaces in the title).
    my $url = "http://libertarianwiki.org/User:Joe_Schmoe/All_files_2";
    my $agentName = "User:Tisane (http://www.mediawiki.org/wiki/User:Tisane) grabbing some data using FileNameExtract.pl";
    my $browser = LWP::UserAgent->new();
    $browser->agent($agentName);    # identify the script to the server
    $browser->timeout(500);
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $browser->request($request);
    die $response->status_line . "\n" if $response->is_error();
    my $contents  = $response->content();
    my $delimiter = "\n";

    # Every file link on the page carries title="File:<name>".
    my $string     = 'title="File:';
    my $endString  = '"';
    my $position   = 0;
    my $fileNumber = 0;
    my %seen;    # names already printed, so repeats are skipped

    while (($position = index($contents, $string, $position)) != -1) {
        $position += length($string);
        my $endPosition = index($contents, $endString, $position);
        last if $endPosition == -1;
        my $fileName = substr($contents, $position, $endPosition - $position);
        if (!$seen{$fileName}++) {
            # Emit a line of Perl that can be pasted into the next script as-is.
            print '$myFileName[' . $fileNumber . ']="' . $fileName . '";' . $delimiter;
            $fileNumber++;
        }
        $position = $endPosition;
    }
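The extraction above amounts to collecting every value that follows `title="File:` in the page HTML. As a cross-check, the same idea can be sketched in Python, the language of the script used in Step 1. This is a minimal sketch, assuming the page HTML has already been downloaded into a string; `extract_file_names` is an illustrative name.

```python
import html
import re

# Each file link on the listing page carries title="File:<name>".
TITLE_PATTERN = re.compile(r'title="File:([^"]*)"')


def extract_file_names(page_html):
    """Return the file names found in page_html, in order, without repeats."""
    seen = set()
    names = []
    for raw in TITLE_PATTERN.findall(page_html):
        name = html.unescape(raw)  # turn &amp; and similar entities back into text
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    sample = (
        '<a href="/File:01-gold-bar.jpg" title="File:01-gold-bar.jpg">x</a>'
        '<a href="/File:WilliamGodwin.jpg" title="File:WilliamGodwin.jpg">y</a>'
    )
    for number, name in enumerate(extract_file_names(sample)):
        print(number, name)
```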
Step 3
- Paste the list that this generates into a second script, which loads each file's description page and extracts the path under images/ where the file is stored:
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    # Paste the output of the Step 2 script here.
    my @myFileName;
    $myFileName[0]="01-gold-bar.jpg";
    $myFileName[1]="100px-Massachusetts state flag.png";
    $myFileName[2]="100px-New York state flag.png";
    $myFileName[3]="128px-Padlock-red.svg.png";
    # ...
    # ...
    # ...
    $myFileName[415]="WilliamGodwin.jpg";
    $myFileName[416]="Wirtland Coat of Arms.png";
    $myFileName[417]="Wirtland crane.png";

    my $agentName = "User:Tisane (http://www.mediawiki.org/wiki/User:Tisane) grabbing some data using ExtractImages.pl";
    my $browser = LWP::UserAgent->new();
    $browser->agent($agentName);    # identify the script to the server
    $browser->timeout(500);
    my $string    = 'images/';
    my $endString = '"';
    my $delimiter = "\n";
    # Matches to skip: the site logo and the generic PDF icon.
    my $reject1 = 'LibertarianWiki.gif);';
    my $reject2 = 'icons/fileicon-pdf.png';
    my $newArrayIndex = 0;

    for my $count (0 .. $#myFileName) {
        next unless defined $myFileName[$count];
        my $url = "http://libertarianwiki.org/File:" . $myFileName[$count];
        my $response = $browser->request(HTTP::Request->new(GET => $url));
        if ($response->is_error()) {
            print STDERR $url . ": " . $response->status_line . "\n";
            next;
        }
        my $contents = $response->content();
        # The stored path follows the first 'images/' on the description page.
        my $position = index($contents, $string, 0);
        next if $position == -1;
        $position += length($string);
        my $endPosition = index($contents, $endString, $position);
        next if $endPosition == -1;
        my $fileName = substr($contents, $position, $endPosition - $position);
        if ($fileName ne $reject1 && $fileName ne $reject2) {
            print '$myFileName[' . $newArrayIndex . ']="' . $fileName . '";' . $delimiter;
            $newArrayIndex++;
        }
    }
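The two-level prefixes that this script recovers (such as 7/78/) come from MediaWiki's hashed upload directories. With default settings, the prefix is the first one and first two hexadecimal digits of the MD5 hash of the file name, with spaces written as underscores. Under that assumption, the path can be computed locally instead of scraped. The following is a minimal Python sketch of that alternative, not the method used on this page. `hashed_path` is an illustrative name, and the first-letter capitalization assumes the wiki uses MediaWiki's default title rules. Check the result against a few real URLs on the target wiki, since a wiki can disable hashed directories or store its files elsewhere.

```python
import hashlib


def hashed_path(file_name):
    """Guess the path under images/ for a file name (without the File: prefix).

    Assumes default MediaWiki settings: hashed upload directories enabled,
    spaces stored as underscores, and the first letter capitalized.
    """
    key = file_name.strip().replace(" ", "_")
    key = key[:1].upper() + key[1:]
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # First directory: one hex digit; second directory: the first two hex digits.
    return digest[0] + "/" + digest[:2] + "/" + key


if __name__ == "__main__":
    print(hashed_path("Wirtland crane.png"))
```

Scraping the description pages, as the Perl script does, remains the safer route when the wiki's configuration is unknown.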
Step 4
- This in turn will generate a list of stored paths. Paste it into a third script, which downloads each file into the current directory, e.g.:
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Paste the output of the Step 3 script here.
    my @myFileName;
    $myFileName[0]="7/78/01-gold-bar.jpg";
    $myFileName[1]="5/53/100px-New_York_state_flag.png";
    $myFileName[2]="8/81/128px-Padlock-red.svg.png";
    # ...
    # ...
    # ...
    $myFileName[349]="a/a6/WilliamGodwin.jpg";
    $myFileName[350]="b/b1/Wirtland_Coat_of_Arms.png";
    $myFileName[351]="f/f5/Wirtland_crane.png";

    my $agentName = "User:Tisane (http://www.mediawiki.org/wiki/User:Tisane) grabbing some data using DownloadImages.pl";
    my $browser = LWP::UserAgent->new();
    $browser->agent($agentName);    # identify the script to the server
    $browser->timeout(500);
    my $delimiter = "\n";

    for my $count (0 .. $#myFileName) {
        next unless defined $myFileName[$count];
        my $url = "http://libertarianwiki.org/wiki/images/" . $myFileName[$count];
        my $response = $browser->get($url);
        if ($response->is_error()) {
            print STDERR $url . ": " . $response->status_line . "\n";
            next;
        }
        # Drop the five-character hash prefix (e.g. "7/78/") to get the local name.
        my $newFileName = substr($myFileName[$count], 5);
        print $url . $delimiter;
        print $newFileName . $delimiter;
        # Write in raw mode so the image data is not altered.
        open(my $fh, '>:raw', $newFileName) or die "Cannot write $newFileName: $!\n";
        print $fh $response->content();
        close $fh;
    }
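The download loop has three parts: build the URL from the stored path, fetch the bytes, and write them to a local file in binary mode. The following is a minimal Python sketch of that loop, using only the standard library. The names `fetch_bytes` and `download_all` and the `fetch` parameter are illustrative; the 500-second timeout mirrors the Perl script.

```python
import os
import urllib.parse
import urllib.request


def fetch_bytes(url):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=500) as response:
        return response.read()


def download_all(paths, base_url, target_dir=".", fetch=fetch_bytes):
    """Save each stored path (e.g. '7/78/01-gold-bar.jpg') into target_dir.

    Returns the list of local file names that were written.
    """
    saved = []
    for path in paths:
        # Percent-encode spaces and similar characters, keeping the slashes.
        url = base_url + urllib.parse.quote(path)
        local_name = os.path.join(target_dir, os.path.basename(path))
        try:
            data = fetch(url)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("failed:", url, error)
            continue
        with open(local_name, "wb") as handle:  # binary mode keeps images intact
            handle.write(data)
        saved.append(local_name)
    return saved
```

Calling `download_all(paths, "http://libertarianwiki.org/wiki/images/")` with the paths printed in Step 3 would save the files into the current directory. Failed downloads are reported and skipped rather than stopping the run.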
Step 5
- You should now have all the files downloaded. Uploading them is another issue; perhaps try Extension:MultiUpload if you can get it to work.