User:OrenBochman/Installation

I have been trying to do some development on MediaWiki search. It turns out that setting up an environment is not so simple, so I decided to document the process in case I need to do it again, and for other people's benefit.

=MediaWiki Development Environment Setup On Labs=

Docs Suggestions

 * 1) To start off, there is a bunch of technology being used in Labs that is a little intimidating.
 ** So the tutorial (geared at developers) should start by letting them know what is what, for example:
 ** what is PUPPET
 ** what is BASTION
 ** what is an INSTANCE (an instance of)
 *** next I tried to access both existing and non-existing instances, but it was not successful :-(...
 * 2) Instance creation: what are the options during instance creation? (even my helpers seemed confused about what cache to use for my use case)
 ** block diagrams would reduce my panic level almost as much as a panic button :-)
 ** for me a page with
 *** a block diagram (Apache, PHP, MediaWiki, cache, search extension) on one machine, plus a script of how to realise it, would be great.
 *** also a diagram of the real search (sub)cluster's setup and how to set it up as an instance would be interesting in a week or two.

Some issues which should be covered in the Labs documentation:


 * where and how to get extensions?
 * where to find a dump, how to get it into the instance, how to import it, how to track the import?
 * wget it to where
 * run what command
 * how to check its progress (the wiki's stats page vs. a console)
 * how to back up an instance.
 * how to set up Java, Ant, Maven ...

Getting into the instance is really difficult to get right.
 * even though I worked in security startups for 4 years, the details of SSH tunnelling are now vague to me


 * what is a security group? Is it like port forwarding on my router? It could use an introduction like
 * "if you don't set up a security group, all the SSH tunnels you set up won't work since (... the port will be blocked - or the real reason)".
 * to set up the security group go to ... "Manage Security group list" and add rules like ...
 * also the Manage Security group list itself is bare and could give/refer to some sample setups.
 * I found the Instance console really helpful, but even after a couple of tips to test from within the instance I got nowhere. So diagnostic tips which are obvious to an op are great for noobs, e.g.

use Instance > Console Output if you cannot access port XXX. If it says ... refused, you need to set up a security group ...


 * I'm also worried to no end that setting up and working with an instance of several machines, like the real search cluster would be (which is Mount Improbable for me at this time), would be mission impossible when adding in virtualization.
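
For the tunnelling question above, here is a minimal sketch of what a local port forward through a bastion host might look like. It only prints the ssh command so it can be checked before running it. The instance name, bastion address and ports are placeholders I made up, not real Labs names, and the forward can only work if the instance's security group allows the remote port.

```shell
#!/bin/sh
# Sketch only: build (and print, not run) an ssh local port forward
# that goes through a bastion host. All names passed in are placeholders.
tunnel_cmd() {
    local_port=$1    # port opened on your own machine
    instance=$2      # instance host name as seen from the bastion (hypothetical)
    remote_port=$3   # port on the instance, e.g. 80 for Apache
    bastion=$4       # user@bastion-host (hypothetical)
    # -N: no remote shell, only the forward; -L: local port forward
    printf 'ssh -N -L %s:%s:%s %s\n' "$local_port" "$instance" "$remote_port" "$bastion"
}

# Example: reach the instance's Apache at http://localhost:8080/
tunnel_cmd 8080 my-instance 80 me@bastion.example.org
```

If the printed command is run and the browser still gets "connection refused", the security group rules for the remote port would be the first thing I would check.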

=MediaWiki Development Environment Setup On Ubuntu=

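Commands

 * ps -ef - list all running processes
 * tail - show the end of a file (e.g. tail nohup.out)
 * top - live view of CPU and memory use
 * df - show info on file systems (free disk space)
 * nohup command & - run command in the background, ignoring terminal disconnect

Install utilities
 sudo apt-get install 7zip mc

 sudo aptitude
 mysql -p=puppet
 mysql -ppuppet
 nohup mysql -ppuppet data < simplewiki-latest-page_props.sql &
 tail nohup.out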

Misc

 * mysql -p data < simplewiki-latest-protected_titles.sql
 * mysql -p data < simplewiki-latest-redirect.sql
 * mysql -p data < simplewiki-latest-page.sql
 * rm *sql


 * cd /var/www/w
 * nohup php maintenance/rebuildall.php &
 * tail nohup.out
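
The per-table imports above can be sketched as a loop. This is only an illustration under my own assumptions: it prints one mysql command per .sql file instead of running it, "data" is the database name used in these notes, and the directory is whatever the dumps were unpacked into.

```shell
#!/bin/sh
# Sketch: print one mysql import command for every .sql dump in a directory.
# Nothing is executed; review the output, then run the lines you want.
print_import_cmds() {
    dir=$1            # directory holding the unpacked *.sql dumps
    db=${2:-data}     # database name; "data" is the one used in these notes
    for f in "$dir"/*.sql; do
        # skip the unexpanded pattern when the directory has no .sql files
        [ -e "$f" ] || continue
        printf 'mysql -p %s < %s\n' "$db" "$f"
    done
}

print_import_cmds /tmp
```

Each printed line can be prefixed with nohup and suffixed with & for long imports, as with the page_props import earlier.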

 w
 df
 ls
 cd /tmp
 ls
 rm simplewiki-latest-pages-meta-current.xml
 df
 rm simplewiki-latest-pages-articles.xml
 df
 7z x simplewiki-latest-pages-meta-history.xml.7z
 ls
 rm *xml
 df
 7z
 ls
 cp simplewiki-latest-pages-meta-history.xml.7z /mnt
 cd /mnt
 df
 ls
 mkdir petrb
 mkdir extract
 mv simplewiki-latest-pages-meta-history.xml.7z extract/
 nohup php w/maintenance/update.php &
 df
 top
 df
 ls
 df
 top
 df
 cd w
 vi LocalSettings.php
 ls extensions/
 vi LocalSettings.php
 cd ..
 nohup php w/maintenance/update.php &
 php w/maintenance/update.php
 vi w/LocalSettings.php
 php w/maintenance/update.php
 vi w/LocalSettings.php
 php w/maintenance/update.php
 df
 vi w/LocalSettings.php
 cd /tmp
 wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-category.sql.gz
 wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-page_props.sql.gz
 ls
 wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-interwiki.sql.gz
 gzip -d *
 wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-iwlinks.sql.gz
 ls
 mysql --password=puppet data < simplewiki-latest-interwiki.sql
 vi /var/www/w/LocalSettings.php
 top
 df
 whereis tomcat6
 exit
 ls
 aptitude
 df
 ls /home
 ls
 df

=MediaWiki Development Environment Setup On Windows=

XAMPP Application Stack Installation
Installed latest XAMPP. That means:
 * XAMPP 1.7.4
 * Apache 2.2.17
 * MySQL 5.5.8
 * PHP 5.3.5
 * phpMyAdmin 3.3.9
 * FileZilla FTP Server 0.9.37
 * Tomcat 7.0.3 (with mod_proxy_ajp as connector)

Speeding Things Up
MediaWiki is one of the oldest and slowest web application frameworks / content management systems. Users are rarely aware of this issue due to extensive use of hardware and a sophisticated caching strategy.

The good news is that it is possible to speed things up.

PHP accelerator - eAccelerator (skip to APC)
To enable eAccelerator edit php\php.ini and uncomment the line
 ;zend_extension = "\xampp\php\ext\php_eaccelerator.dll"

However, the binary is not included in the XAMPP distribution and needs to be downloaded separately. This is no easy task: you first need to check what type of PHP installation you have. The answers are available from the PHP info page linked from the XAMPP main page.
 * what is the PHP version?
 * is your version built as thread safe?
 * which version of Visual Studio was it built with?

PHP accelerator - APC
However, MediaWiki is best configured with APC rather than eAccelerator. As before, XAMPP does not bundle php_apc.dll. I searched the forums and came up with http://downloads.php.net/pierre/; of the various distributions there I was able to use php_apc-20110109-5.3-vc9-x86.zip.


 * To enable APC, place php_apc.dll in \xampp\php\ext, edit php\php.ini and add the line below (APC is a regular PHP extension, so it is loaded with extension rather than zend_extension)

 extension = php_apc.dll


 * Next update MediaWiki LocalSettings.php to use APC by adding

$wgMainCacheType = CACHE_ACCEL;

Problem: Apache won't start

 * 1) Skype was blocking port 80 (resolved)
 * 2) had to change User Account Control (UAC) in the Control Panel security settings, since it blocked the XAMPP control panel from starting Apache (resolved).
 * the database base (local link)

Production

 * For a production MediaWiki the fastest way to install MW is to:
 * decompress the MediaWiki software archive (version 1.17) to D:\xampp\htdocs\mediawiki

Development
 * For a development MediaWiki installation it is necessary to (periodically) get the latest version of MW and extensions from Subversion. Since my project is a Java-based extension I used the following setup.
 * set up an Eclipse workspace
 * add one PHP project for MediaWikiTrunk (from Subversion)
 * add one PHP project for MediaWikiExtensions (from Subversion)
 * check out, using svn+ssh, a Java project for dev.
 * check out, using svn+ssh, a Java project for making patches.
 * add to Apache's httpd.conf the location and the Alias (URL mapping) for the MediaWiki checkout:

 Order allow,deny
 Allow from all
 Alias /mwt "d:/ws/MediaWikiTrunk"

 * 1) browse to http://localhost/mediawiki/ and follow the instructions
 * 2) however, it is necessary to choose the binary representation and not UTF-8 for the db, otherwise mwdumper will fail

Main & Status Pages

 * Main page(local)
 * Version Page (local)

Changing Capitalization Settings

 * 1) edit LocalSettings.php, adding at the end: $wgCapitalLinks=false;
 * 2) then run cleanupCaps.php from the command line (this took about 12 hours for 3 million entries)

Extensions
 * 1) Since my main purpose is development related (bots and indexing) I wanted to introduce sufficient extensions to allow decent dumping
 * 2) consult ...
 * 3) download
 * 4) decompress to
 * 5) edit LocalSettings.php adding require( "extensions/ParserFunctions/ParserFunctions.php"); to the end

The resulting additions to LocalSettings.php:

 require( "extensions/ParserFunctions/ParserFunctions.php");
 $wgUseAjax = true;
 require_once("{$IP}/extensions/CategoryTree/CategoryTree.php");
 # char insert
 require_once("$IP/extensions/CharInsert/CharInsert.php");
 # image map
 require_once("$IP/extensions/ImageMap/ImageMap.php");
 # Labeled Section Transclusion
 require_once ("$IP/extensions/LabeledSectionTransclusion/lsth.php");
 # syntax highlighting
 require_once ("$IP/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.php");
 # oren code end

Diff
Get the diff utility and edit LocalSettings.php, adding:

 $wgDiff = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff.exe';
 # Path to the GNU diff3 utility. Used for conflict resolution.
 $wgDiff3 = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff3.exe';

The alternative is to do nothing. Diff would still be available, but slower.

Extensions and Dumping
It turns out that dumping a MediaWiki via the dumphtml command-line extension is not compatible with some of the other extensions. It is prudent to turn such extensions off for making static dumps. The worst culprit is the syntax highlighting extension, which is not important in the main Wiktionary pages, but useful when developing scripts in user namespace.

Small Wiki - Simple English
Importing via:

 * get the dump file and note the article count, then run

 php.exe maintenance\importDump.php simplewiki-20120104-pages-meta-current.xml.bz2


 * then update recent changes using

 php.exe maintenance\rebuildrecentchanges.php


 * to check the status of the import

 php.exe maintenance\showStats.php


 * if importDump.php is interrupted, a second run will run through the already imported entries quickly.
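
To have a number to compare against the stats output, the pages in the dump can be counted before importing. A minimal sketch, assuming each <page> tag sits on its own line (which is how the dumps I have seen are laid out) and that a Unix-style shell with bzcat is at hand (on Windows that would mean something like Cygwin):

```shell
#!/bin/sh
# Sketch: count <page> elements in a MediaWiki XML dump read from stdin.
# Counts lines containing an opening <page> tag, so it relies on the
# one-tag-per-line layout; closing </page> tags are not matched.
count_pages() {
    grep -c '<page>'
}

# Usage, with the dump file named in these notes:
#   bzcat simplewiki-20120104-pages-meta-current.xml.bz2 | count_pages
```

Note this counts every page in the dump (all namespaces), so it will be higher than the "content pages" figure on the wiki's statistics page.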

Large Wiki

 * 1) importing via maintenance/importDump.php will take forever and has problems with the term Wiktionary in the interwiki table
 * 2) importing via mwdumper, with some db modification, took about 12 hours.
 * 3) unfortunately it crashed every time, requiring one to drop the db and reinstall.
 * 4) to speed things up I removed the indexes from the text, page and revision tables and reintroduced them later

phpMyAdmin can be used to import SQL dumps, but it cannot import really big ones; for those another application is required.
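
For reference, a sketch of the kind of mwdumper pipeline meant above. It only prints the command. The jar name, dump file name, database name and user are placeholders of mine, and sql:1.5 is the output format mwdumper documents for MediaWiki 1.5 and later schemas, so check it against your version before use.

```shell
#!/bin/sh
# Sketch: print (not run) an mwdumper-to-mysql pipeline.
# mwdumper.jar in the current directory is an assumption.
mwdumper_cmd() {
    dump=$1   # XML dump file (placeholder name in the example below)
    db=$2     # target database (placeholder)
    user=$3   # MySQL user (placeholder)
    printf 'java -jar mwdumper.jar --format=sql:1.5 %s | mysql -u %s -p %s\n' \
        "$dump" "$user" "$db"
}

mwdumper_cmd pages-articles.xml.bz2 wikidb root
```

Since a crashed run meant dropping the db and starting over, printing and checking the command first costs little compared with a 12 hour import.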

Fixing Capitalization
add to LocalSettings.php:

$wgCapitalLinks=false;

next run maintenance/cleanupCaps.php (12 hours)

DB time outs
Once imported there appeared to be a crash. This was due to database time outs. Some were caused by simultaneous db imports of required SQL tables. However even when these were done the time outs persisted.

Finding the solution involved five days of diagnostics and various attempts at patching things up:
 * Turning on traces and diagnostics.
 * Trying to dump pages via the dumpHtml extension.
 * Enabling eAccelerator, which seems to have reduced the problem. Once done, time outs no longer occurred on most pages.
 * Increasing the database time out from 30 seconds to 60.

The actual resolution: after installing eAccelerator the time outs stopped.

Importing SQL Dumps
Running the rebuild script should restore the DB to fully functional status. However this script runs three tasks each taking an order of magnitude longer than its predecessor. The recreation of links seems to be impractical in a large project. Also there is no indication of progress.

I wanted to have a fully functioning version of Wiktionary at this point (perhaps without the pictures) to allow a static dump which could be used to make an offline version.

Diagnostics

 * 1) run rebuild:
 * 2) Then the wiki decided to crash. It gives the error: "Fatal error: Maximum execution time of 30 seconds exceeded in D:\xampp\htdocs\mediawiki\includes\db\DatabaseMysql.php on line 23"
 * 3) It turns out that the problem is not a crash but a slow response on many pages
 * 4) random link works
 * 5) pages generated by random link work too
 * 6) Switching to a second empty db with different table suffix works fine too

http://devzone.zend.com/1147/debugging-php-applications-with-xdebug/

More Problems
Pages were often littered with malformed extraneous tags (which, if properly written, would never be shown by a browser). Resolution: install Tidy.

MySQL & Optimizing

 * http://dom.as/2007/01/26/mediawiki-performance-tuning/ from domas' blog
 * http://www.mediawiki.org/wiki/Manual:Performance_tuning RTFM
 * User:Robchurch/Performance_tuning has more resources
 * Manual:MySQL currently a stub