Talk:Alternative parsers/Archive 1

Here is a dump of an early version of some doc User:Djbclark did about a survey of options for offline mediawiki use (Sat Jul 12, 2008) - I don't have time to incorporate it into a nice separate page at the moment, but I thought it might be useful to some people as is, and also I need to reference it from a mailing list thread :-)

Feel free to edit it / move it to a more appropriate place - the Alternative parsers page seemed like the closest thing to a Using Mediawiki Offline page at the moment.

= mvs - A command line Mediawiki client =

It would be really nice if this supported some form of recursion... All these tools are way too "you are only going to use this with wikipedia, so we can't possibly provide features that would be useful for smaller wikis" oriented...

Basic Use
Install: sudo aptitude install libwww-mediawiki-client-perl

Initial Setup: mvs login -d cluestick.office.fsf.org -u USERNAME -p 'PASSWORD' -w '/wiki'

Where USERNAME is your username (note that mediawiki autocapitalizes this, so for example this would be Dclark, not dclark) and PASSWORD is your mediawiki password (note that this is a very insecure way to pass a password to a program, and should only be used on systems where you are the only user or you trust all other users).

Example Checkout: mvs update User:Dclark.wiki
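Since mvs has no recursion, one workaround is to loop over a list of page titles yourself. The following is only a rough sketch: the helper names (mvs_filename, mvs_update_all) and the MVS_CMD override are made up for illustration, and the title-to-filename rule (spaces become underscores, ".wiki" appended) is an assumption based on the example checkout above.

```shell
# Hypothetical helper: turn a page title into the file name mvs is
# assumed to expect (spaces -> underscores, ".wiki" appended).
mvs_filename() {
    printf '%s.wiki\n' "$(printf '%s' "$1" | tr ' ' '_')"
}

# Run "mvs update" for every title (one per line) in the file given as $1.
# Set MVS_CMD=echo for a dry run that only prints the arguments.
mvs_update_all() {
    while IFS= read -r title; do
        # Skip blank lines in the title list.
        [ -n "$title" ] || continue
        # </dev/null keeps the command from consuming the title list on stdin.
        ${MVS_CMD:-mvs} update "$(mvs_filename "$title")" < /dev/null
    done < "$1"
}
```

The title list has to come from somewhere else (for example copied from Special:AllPages); run the function from inside the directory where you did the mvs login.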

Google Gears
If you have Google Gears (BSD Licensed) installed, you will see a "gears localserver" box on the lower left-hand side of the cluestick mediawiki screen, under the "navigation", "search", and "toolbox" boxes. This is done with the Mediawiki LocalServer: Offline with Google Gears extension. The original version provides slightly clearer install doc. In general, put the .js files with the other .js files, in the common skins directory.

After creating the local store and waiting for it to finish downloading, you will be able to go offline and browse the wiki - however search and "Special:" pages will not work in Google Gears offline mode, and you will not be able to edit pages in offline mode.

Local Django-based server
The directions at Building a (fast) Wikipedia offline reader produce an environment that takes more time to set up than Google Gears, but is arguably a bit nicer (it includes local search of page titles, and it shouldn't be that hard to extend that to full text search).

 # THESE ARE NOT STEP-BY-STEP INSTRUCTIONS... interpretation is required.
 sudo aptitude install apt-xapian-index xapian-tools libxapian-dev php5-cli
 wget http://users.softlab.ece.ntua.gr/~ttsiod/mediawiki_sa.tar.bz2
 wget http://users.softlab.ece.ntua.gr/~ttsiod/offline.wikipedia.tar.bz2
 populate wiki-splits with raw .xml.bz2 dump
 mv mediawiki_sa offline.wikipedia
 Edit Makefile to have line "XMLBZ2 = cluestick-articles.xml.bz2"
 Edit mywiki/gui/view.py 4th line to: return article(request, "Main Page")
 make wikipedia
 # (Then follow directions it spews)

TODO: Set up cron job to produce rsync-able cluestick-articles.xml.bz2 on a regular basis. Package this up.
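A possible starting point for that cron job, as a sketch only: make_dump is a made-up name, the maintenance path and the dumpBackup.php invocation are taken from the wikipediaDumpReader notes further down, and MW_MAINT / DUMP_CMD are hypothetical overrides added so it can be tried without a real wiki. It writes to a temporary file first so that anyone rsyncing the directory never picks up a half-written dump.

```shell
# Write a fresh cluestick-articles.xml.bz2 into the directory given as $1.
# MW_MAINT and DUMP_CMD are hypothetical overrides for testing.
make_dump() {
    out_dir=$1
    maint=${MW_MAINT:-/usr/share/mediawiki/maintenance}
    tmp=$(mktemp "$out_dir/dump.XXXXXX") || return 1
    # The subshell keeps the caller's working directory unchanged.
    if ( cd "$maint" && ${DUMP_CMD:-php dumpBackup.php --current} ) > "$tmp" &&
       bzip2 -f "$tmp"; then
        # mktemp creates the file private; make it readable for rsync users.
        chmod 644 "$tmp.bz2"
        # Rename last, so the published file is always a complete dump.
        mv "$tmp.bz2" "$out_dir/cluestick-articles.xml.bz2"
    else
        rm -f "$tmp" "$tmp.bz2"
        return 1
    fi
}
```

Wrapped in a script, this could be run from a crontab entry along the lines of `0 3 * * * /usr/local/bin/cluestick-dump /var/www/dumps` (both paths are invented for the example). Note that the uncompressed dump briefly sits on disk before bzip2 runs.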

PDF Book
You can download entire parts of the wiki as PDF books.
 * Cluestick-Main.pdf - Main Wiki
 * Cluestick-Talk.pdf - Talk Pages
 * Cluestick-User.pdf - User Pages

Note that the processing of subsections is broken (in that they look ugly / take too many pages) at the moment. This uses Extension:Pdf_Book plus the Whole Namespace Export patch.

This is a good way to read all of Cluestick.

Tried / Don't work or no doc
Some useful doc on how to make perl and python modules into debian packages however...

libmediawiki-spider-perl
CPAN > Emma Tonkin > Mediawiki-Spider-0.31 > Mediawiki::Spider

 sudo aptitude install dh-make-perl fakeroot dpkg-dev build-essential
 sudo aptitude install libwww-perl libhtml-tree-perl
 sudo apt-file update

 wget http://search.cpan.org/CPAN/authors/id/C/CS/CSELT/HTML-Extract-0.25.tar.gz
 tar -pzxvf HTML-Extract-0.25.tar.gz
 dh-make-perl HTML-Extract-0.25
 cd HTML-Extract-0.25
 fakeroot dpkg-buildpackage -uc -us
 cd ..
 sudo dpkg -i libhtml-extract-perl_0.25-1_all.deb

 wget http://search.cpan.org/CPAN/authors/id/C/CS/CSELT/Mediawiki-Spider-0.31.tar.gz
 tar -pzxvf Mediawiki-Spider-0.31.tar.gz
 dh-make-perl Mediawiki-Spider-0.31
 cd Mediawiki-Spider-0.31
 fakeroot dpkg-buildpackage -uc -us
 cd ..
 sudo dpkg -i libmediawiki-spider-perl_0.31-1_all.deb
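The same download / unpack / dh-make-perl / dpkg-buildpackage sequence is used for both tarballs, so it can be wrapped in a small function. This is just a sketch of the steps above; tarball_dir and build_cpan_deb are made-up names, and it assumes the tarball unpacks into a directory named after the file, as both of these do.

```shell
# Derive the unpacked directory name from a .tar.gz URL or path,
# e.g. .../HTML-Extract-0.25.tar.gz -> HTML-Extract-0.25
tarball_dir() {
    basename "$1" .tar.gz
}

# Fetch a CPAN tarball and build a .deb from it with dh-make-perl.
# The resulting package is left in the current directory.
build_cpan_deb() {
    url=$1
    dir=$(tarball_dir "$url")
    wget "$url" &&
    tar -pzxvf "$dir.tar.gz" &&
    dh-make-perl "$dir" &&
    # Build inside a subshell so we end up back where we started.
    ( cd "$dir" && fakeroot dpkg-buildpackage -uc -us )
}
```

After it finishes, install the generated package with sudo dpkg -i as in the steps above; the .deb file name depends on the module, so that step is left manual.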

You need a script like this to use it:
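 #!/usr/bin/env perl
 use strict;
 use warnings;

 use Mediawiki::Spider;
 use Data::Dumper;

 # Output directory. The script as first posted never set $destinationdir,
 # so this value is only a placeholder.
 my $destinationdir = "wiki-mirror";

 my $spider2 = Mediawiki::Spider->new;
 print "Now getting wikiwords\n";
 my @wikiwords2 = $spider2->getwikiwords("http://standards-catalogue.ukoln.ac.uk/");
 $spider2->extension("html");
 print "Got wikiwords: proceeding with d/l\n";
 $spider2->makeflatpages("./$destinationdir/", 1);
 $spider2->buildmenu;
 # Pass the list fetched above (the posted script referred to an undefined @wikiwords).
 $spider2->printmenu("./$destinationdir/index.html", "aword", @wikiwords2);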

However it only seems to work with older versions of mediawiki (or our mediawiki instance is "weird" in some way it doesn't expect).

fuse-mediawiki
Mediawiki FUSE filesystem: git clone git://repo.or.cz/fuse-mediawiki.git

 sudo aptitude install git-core gvfs-fuse fuse-utils fuse-module python-fuse
 git clone git://repo.or.cz/fuse-mediawiki.git
 cd fuse-mediawiki
 mkdir cluestick-fuse
 python fuse-mediawiki.py -u Dclark http://cluestick.office.fsf.org/wiki cluestick-fuse

This works, but brings up a nonsense file system that you can't cd into beyond one level or ls in. It seems to be under active development, so probably good to check back in a few months. See also Talk:Using_Cluestick_Offline.

wikipediafs
WikipediaFS - View and edit Wikipedia articles as if they were real files

 sudo aptitude install gvfs-fuse fuse-utils fuse-module python-fuse python-all-dev
 sudo easy_install stdeb
 wget http://internap.dl.sourceforge.net/sourceforge/wikipediafs/
 tar xvfz wikipediafs-0.3.tar.gz
 cd wikipediafs-0.3
 vi setup.py # Edit so version is correct
 stdeb_run_setup
 cd deb_dist/wikipediafs-0.3/
 dpkg-buildpackage -rfakeroot -uc -us
 sudo dpkg -i ../python-wikipediafs_0.3-1_all.deb
 man mount.wikipediafs

This is sort of useless for the purpose of this section, as it requires the user to get a specific set of pages before going offline. Didn't spend enough time with it to see if it worked as advertised.

wikipediaDumpReader
Wikipedia Dump Reader - KDE App - Reads output of dumpBackup.php

 cd /usr/share/mediawiki/maintenance
 php dumpBackup.php --current | bzip2 > cluestick-articles.xml.bz2

Too wikipedia-specific. Didn't work with cluestick dump at all.

kiwix

 * #kiwix on freenode / svn co https://kiwix.svn.sourceforge.net/svnroot/kiwix kiwix
 * No documentation or response from IRC channel. No doc in svn.