Convert a Socialtext wiki to MediaWiki

This article describes how to convert a Socialtext wiki to MediaWiki using Linux. It is based on a single conversion and is by no means exhaustive; it was tested on a wiki comprising only a few hundred pages and files and could be improved a lot.

Socialtext wiki is similar to Kwiki.

The procedure described below can:
 * convert pages, retaining the essential syntax
 * convert files
 * convert the histories of pages and files
 * convert tables partially (re-edit them manually, mostly adding only the start and end syntax)

It can not:
 * convert most other features of Socialtext wiki
 * preserve the user association of edits
 * and much more

Introduction
Our Socialtext wiki is stored as files below a directory named data. The tree contains one directory per page (below data/{WORKSPACE}), with one index.txt containing the current version of the page and several {date}.txt files containing older revisions. (Workspaces are separate branches of a Socialtext wiki.)
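For a hypothetical page "test page" in a workspace named main, the tree might look like this (all names invented for illustration):

    data/main/test_page/index.txt               # current version
    data/main/test_page/20080116120000.txt      # older revision
    data/main/test_page/20080115090000.txt      # older revision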

The uploaded files (attachments) are located within a directory named plugin.

Put all the following files and directories (except for the new wiki) into one working directory and proceed as follows.
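At the end, the working directory will contain something like this (the scripts below assume these names):

    data/        # pages copied from the Socialtext host
    plugin/      # attachments copied from the Socialtext host
    conv.py      # page converter (see below)
    upload.pl    # file uploader (see below)
    excludes     # list of pages to skip
    upload/      # staging directory for file uploads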

install mediawiki
 * install a current MediaWiki
 * allow the upload of all file types, e.g. in LocalSettings.php:

    $wgEnableUploads = true;
    $wgStrictFileExtensions = false;
    $wgCheckFileExtensions = false;

 * modify php.ini and reload apache2 (to be able to upload bigger files):

    post_max_size = 32M
    upload_max_filesize = 32M
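On a Debian-style system the last step could look like this (the php.ini path is an assumption; it differs between distributions):

    # edit /etc/php5/apache2/php.ini, then:
    /etc/init.d/apache2 reload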

copy the original files to the new host
Copy these directories (use scp, not rsync, since we want the symlinks dereferenced rather than copied as symlinks; the index.txt files are symlinks):
 * data
 * plugin
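For example (user, host, and paths are assumptions; adjust them to your installation):

    scp -r user@old.socialtext.host:/var/www/socialtext/data .
    scp -r user@old.socialtext.host:/var/www/socialtext/plugin .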

script to convert a single page
create a script conv.py to convert a single page. It takes the file name of the page as its first argument:

    #!/usr/bin/python
    import re
    import sys

    filename = sys.argv[1]
    f = open(filename, "r")
    text = f.read()
    (header, content) = text.split('\n\n', 1)

    # trim content lines
    lines = content.split('\n')
    lines2 = []
    for line in lines:
        lines2.append(line.lstrip().rstrip())
    content = '\n'.join(lines2)

    # headings
    p = re.compile('^\^\^\^\^(.*)$', re.M)
    content = p.sub('====\\1 ====', content)
    p = re.compile('^\^\^\^(.*)$', re.M)
    content = p.sub('===\\1 ===', content)
    p = re.compile('^\^\^(.*)$', re.M)
    content = p.sub('==\\1 ==', content)
    p = re.compile('^\^(.*)$', re.M)
    content = p.sub('=\\1 =', content)

    # bold
    p = re.compile('([^\*]+)\*([^\*]+)\*', re.M)
    content = p.sub('\\1\'\'\'\\2\'\'\'', content)

    # link: free links [page] become plain text
    p = re.compile('\[([^\]]+)\]', re.M)
    content = p.sub('\\1', content)

    # file
    p = re.compile('{file: ([^}]+)}', re.M)
    content = p.sub('[[Media:\\1]]', content)

    # image (Bild: is the image namespace of a German wiki; use Image: otherwise)
    p = re.compile('{image: ([^}]+)}', re.M)
    content = p.sub('[[Bild:\\1]]', content)

    # item level 1 (bullet character plus tab)
    p = re.compile('\342\200\242\011', re.M)
    content = p.sub('* ', content)

    # table, only partially, do the rest manually!
    # you have to add {|..., |}, and check for errors due to empty cells
    p = re.compile('([^\n])\|', re.M)
    content = p.sub('\\1\n|', content)
    p = re.compile('\|\s*\|', re.M)
    content = p.sub('|-\n|', content)

    # lines with many / * + symbols were used as separator lines...
    p = re.compile('[\/]{15,200}', re.M)
    content = p.sub('', content)
    p = re.compile('[\*]{15,200}', re.M)
    content = p.sub('', content)
    p = re.compile('[\+]{15,200}', re.M)
    content = p.sub('', content)

    # external links: Socialtext writes them as "label"<http://...>
    p = re.compile('\"([^\"]+)\"<http([^>]+)>\s*\n', re.M)
    content = p.sub('[http\\2 \\1]\n\n', content)
    p = re.compile('\"([^\"]+)\"<http([^>]+)>', re.M)
    content = p.sub('[http\\2 \\1]', content)
    content += '\n'

    # add categories
    header_lines = header.split('\n')
    for line in header_lines:
        if re.match('^[Cc]ategory: ', line):
            category = re.sub('^[Cc]ategory: (.*)$', '\\1', line)
            content += '\n[[Category:' + category + ']]'

    # departments / workspaces (the category names are assumptions; pick your own)
    if re.match('data/zsi-fe', filename):
        content += '\n[[Category:zsi-fe]]'
    if re.match('data/zsi-ac', filename):
        content += '\n[[Category:zsi-ac]]'
    if re.match('data/zsi-tw', filename):
        content += '\n[[Category:zsi-tw]]'

    print content

Test it like this:

    ./conv.py data/{WORKSPACE}/{PAGENAME}/{REVISION}

Just copy the resulting wiki text into a page of the new MediaWiki and use preview.
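As a quick check, an invented body fragment like this (the page header before the first blank line is stripped)

    ^^ Project status
    This is *important*.
    {file: report.pdf}

should come out as

    == Project status ==
    This is '''important'''.
    [[Media:report.pdf]]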

Adapt the Python script to your needs until most pages are translated correctly.

script to upload a single file
The MediaWiki API does not yet have action=upload, so files are uploaded with a helper script. Get upload.pl.

The script has to be modified to use our new server instead of mediawiki.blender.org; also edit the username and password in it. Create a directory called upload, put some content there, and test uploading.
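A minimal test might look like this (file name and date invented; the files.txt format follows the migration script below):

    mkdir upload
    cp test.pdf upload/
    echo -e ">test.pdf\ntest.pdf\n2008-01-16\n(test upload)" > upload/files.txt
    ./upload.pl upload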

script to migrate pages
Use this script (which calls ./conv.py) to migrate pages. The page revisions will be uploaded in chronological order; pages whose directory name matches a line in a local file named excludes are skipped:

    #!/bin/sh
    wikiurl="http://NAME.OF.NEW.SERVER/mediawiki/api.php"
    lgname="WikiSysop"
    lgpassword="*************"

    # login
    login=$(wget -q -O - --no-check-certificate --save-cookies=/tmp/converter-cookies.txt \
                --post-data "action=login&lgname=$lgname&lgpassword=$lgpassword&format=json" \
                $wikiurl)
    # echo $login

    # get edittoken
    edittoken=$(wget -q -O - --no-check-certificate --save-cookies=/tmp/converter-cookies.txt \
                --post-data "action=query&prop=info|revisions&intoken=edit&titles=Main%20Page&format=json" \
                $wikiurl)
    # echo $edittoken
    token=$(echo $edittoken | sed -e 's/.*edittoken.:.\([^\"]*\)...\".*/\1/')
    token="$token""%2B%5C"    # edit tokens end in +\ which must be urlencoded
    # echo $token

    # test editing with a test page
    # cmd="action=edit&title=test1&summary=autoconverted&format=json&text=test1&token=$token&recreate=1&notminor=1&bot=1"
    # editpage=$(wget -q -O - --no-check-certificate --load-cookies=/tmp/converter-cookies.txt --post-data $cmd $wikiurl)
    # echo $editpage
    # exit

    # loop over all pages except for dirs in the list of excludes
    find data -not -path "data/help*" -type f -and -not -name ".*" | sort | while read n; do
        pagedir=$(echo $n | sed -e 's/.*\/\(.*\)\/index.txt/\1/')
        if [ "`grep -q $pagedir excludes; echo $?`" == "0" ]; then
            echo "omitting $pagedir"
        else
            echo "parsing  $pagedir"
            workspace=$(echo $n | sed -e 's/.*\/\(.*\)\/[^\/]\+\/index.txt/\1/')
            pagename=$(egrep '^Subject:' $n | head -n 1 | sed -e 's/^Subject: \(.*\)/\1/')
            pagedate=$(egrep '^Date:' $n | head -n 1 | sed -e 's/^Date: \(.*\)/\1/')
            echo "$workspace $pagedir -- $pagename"
            text=$(./conv.py $n)
            text1=$(php -r 'print urlencode($argv[1]);' "$text")
            pagename1=$(php -r 'print urlencode($argv[1]);' "$pagename")
            pagedate1=$(php -r 'print urlencode($argv[1]);' "$pagedate")
            cmd="action=edit&title=$pagename1&summary=$pagedate1+autoconverted+from+socialtextwiki&format=json&text=$text1&token=$token&recreate=1&notminor=1&bot=1"
            editpage=$(wget -q -O - --no-check-certificate --load-cookies=/tmp/converter-cookies.txt --post-data $cmd $wikiurl)
            # echo $editpage
        fi
    done
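The excludes file is a plain-text list of page directory names to skip, e.g. (invented names):

    announcements
    meeting_minutes_2007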

script to migrate files
Use this script (which calls ./upload.pl) to migrate files. The files will be uploaded in chronological order; attachments marked as deleted are skipped. Adjust the zsi* pattern to your workspace names:

    #!/bin/sh
    find plugin -path 'plugin/zsi*/attachments/*.txt' | sort | while read f; do
        if [ "`grep -q 'Control: Deleted' $f; echo $?`" != "0" ]; then
            d=${f/.txt}    # the directory holding the attachment itself
            filenameNew=$(egrep '^Subject:' $f | sed -e 's/Subject: \(.*\)/\1/')
            filenameOrig=$(ls -1 $d | head -n 1)
            version=$(egrep '^Date: ' $f | sed -e 's/Date: \(.*\)/\1/')
            # echo "---"
            # echo $filenameOrig
            # echo "$filenameNew"
            rm upload/*
            cp $d/$filenameOrig "upload/$filenameNew"
            # prepare upload
            echo -e ">$filenameNew\n$filenameNew\n$version\n(autoconverted from socialtext wiki)" > upload/files.txt
            # upload
            ./upload.pl upload
        fi
    done
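For a hypothetical attachment renamed to report.pdf, the generated upload/files.txt would contain something like this (the date format depends on your Socialtext headers):

    >report.pdf
    report.pdf
    Mon, 14 Jan 2008 10:00:00 GMT
    (autoconverted from socialtext wiki)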