Analytics/Wikimetrics/FAQ

This is the frequently asked questions page for Wikimetrics.

What is the project code?
The WMF has a lot of different projects; see the site matrix for a complete overview. To construct the code:
 * For Wikipedia it's just the language code (for example en)
 * For Commons it's commons
 * For chapters it's ....
 * For mediawiki it's ....
 * For wikibooks it's ....
 * For wikidata it's ....
 * For wikimania it's ....
 * For wikinews it's ....
 * For wikiquote it's ....
 * For wikisource it's ....
 * For wikispecies it's ....
 * For wikiversity it's ....
 * For wikivoyage it's ....
 * For wiktionary it's ....
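The mapping above can be sketched as a small helper. Only the Wikipedia and Commons rules are stated on this page; `project_code` is a hypothetical function name, and any other family should be checked against the site matrix rather than guessed:

```python
# Hypothetical helper illustrating the project-code pattern described above.
# Only the Wikipedia and Commons rules come from this page; everything else
# must be looked up in the site matrix.
def project_code(family, lang=None):
    """Return the Wikimetrics project code for a wiki family."""
    if family == "wikipedia":
        # For Wikipedia the code is just the language code, e.g. "en".
        return lang
    if family == "commons":
        return "commons"
    raise ValueError("Unknown family; consult the site matrix")

print(project_code("wikipedia", "en"))  # -> en
print(project_code("commons"))          # -> commons
```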

Where is the source code?
https://phabricator.wikimedia.org/diffusion/ANWM/ and https://github.com/wikimedia/analytics-wikimetrics

Where is the data coming from?
Wikimetrics uses the copy of the WMF databases at Wikimedia Labs.

How do I add a new feature?
Analytics/Wikimetrics/Adding_New_Features

On vagrant: Tests work fine when I run all of them but fail when I run just one. What is going on?
nosetests does not properly execute tests that are two levels deep from the main tests directory. Make sure your test is located only one level deep. For example, the following would be executed properly:

/vagrant/wikimetrics/tests/some-directory/your-test.py

But the following would not: /vagrant/wikimetrics/tests/some-directory/some-deeper-directory/your-test.py

This is a bug in nose; it is similar, although not identical, to this one: https://code.google.com/p/python-nose/issues/detail?id=342

It looks like this was fixed in some Python 2.7.* release; we are running 2.7.3 in vagrant, and things seem to work in 2.7.5.

Tests just hang or fail due to queue issues, what do I do?
If a test hangs, there is likely some issue with the queue. Logging in celery still needs work on our side, but the easy remedy in your dev environment is to make celery log to /tmp/ instead of stdout.

Uncomment the following line in tests/__init__.py:

celery_out = open("/tmp/logCelery.txt", "w")

Tail the log and you might be able to see any errors that the queue is throwing.

You can write to the queue log while tests are running by doing:

f = open('/tmp/logCelery.txt', 'a')
f.write(str(some_variable))
f.close()
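A slightly safer variant of the snippet above uses a context manager so the file is closed even if the write raises. `log_to_celery` is a hypothetical helper name, and `value` stands in for whatever you want to dump:

```python
# Append a debug value to the celery log used by the tests; the "with"
# statement closes the file even if the write fails partway through.
def log_to_celery(value, path="/tmp/logCelery.txt"):
    with open(path, "a") as f:
        f.write(str(value) + "\n")

log_to_celery({"queue": "ok"})
```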

When I run tests, errors show up on the console, but at the end it says "OK"
This is fine. Logging and nose get along like pirates and the English, so sometimes they fight and talk funny at each other. But all's well that ends well.

Pro tip: Don't run tests as root
Just sayin'...

How do I generate test data?
Go to:

http://localhost:5000/demo/create/fake-wiki-users/100

This will create 100 fake users in the database 'wiki'. If you did this on a fresh database, those users' ids should be 1, 2, 3, ..., 100. If not, you can find the ids you just created by issuing 'select * from user' against your 'wiki' database. Now, from those users, you need to create a cohort:

http://localhost:5000/cohorts/upload

Use the textarea and type in one user id per line, like:

1
2
3

Pick a project and upload.
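If you want to script this step rather than type ids by hand, building the textarea content is trivial. A minimal sketch (`cohort_text` is a hypothetical helper; the id range 1..100 assumes a fresh database, as noted above):

```python
# Build the cohort textarea content: one user id per line.
def cohort_text(user_ids):
    return "\n".join(str(uid) for uid in user_ids)

# For the 100 fake users created by the demo endpoint on a fresh database:
text = cohort_text(range(1, 101))
print(text.splitlines()[:3])  # first three lines: ['1', '2', '3']
```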

Restore from backup
Wikimetrics is backed up once an hour and once a day.

Code
The relevant changeset pertaining to how backups are organized is this one: []

The relevant part of root's crontab looks like:

root@wikimetrics-staging1:~# crontab -l
0 * * * * /data/project/wikimetrics/backup/hourly_script -o /data/project/wikimetrics/backup/hourly -f /var/lib/wikimetrics/public -d wikimetrics -r /a/redis/wikimetrics1-6379.rdb
# Puppet Name: daily wikimetrics backup
30 22 * * * /data/project/wikimetrics/backup/daily_script -i /data/project/wikimetrics/backup/hourly -o /data/project/wikimetrics/backup/daily -k 10

Procedure
We back up three things:


 * Wikimetrics database
 * Redis
 * Public reports stored on '/var/lib/wikimetrics/public' directory

Note that the dashboards that (in the future) will pull data from wikimetrics will do so from data in the '/var/lib/wikimetrics/public' directory. This directory stores data files generated daily and, while in theory all that data could be regenerated, it might take a long time. We want to be sure that, at most, we lose just one day of data.
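From the crontab above, the hourly script takes an output directory (-o), the public files directory (-f), the database name (-d) and the redis rdb file (-r). A hypothetical sketch that just assembles that invocation, for reference (the script itself lives at /data/project/wikimetrics/backup/hourly_script and is not reproduced here):

```python
# Assemble the hourly_script command line shown in the crontab above.
# This only builds the argv list; actually running it is cron's job.
def hourly_backup_cmd(out_dir, public_dir, database, redis_rdb):
    return [
        "/data/project/wikimetrics/backup/hourly_script",
        "-o", out_dir,
        "-f", public_dir,
        "-d", database,
        "-r", redis_rdb,
    ]

cmd = hourly_backup_cmd(
    "/data/project/wikimetrics/backup/hourly",
    "/var/lib/wikimetrics/public",
    "wikimetrics",
    "/a/redis/wikimetrics1-6379.rdb",
)
print(" ".join(cmd))
```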

In order to restore the backup:

 * Get the snapshot file from '/data/project/wikimetrics/backup/daily'
 * Untar it to a location that you own, like ~/backup:

tar -xvzf

You should see the database dump file, the redis rdb file, and the public directory.

 * Stop redis, wikimetrics and the queue on the machine where the restore is going to happen:

sudo stop wikimetrics-queue
sudo stop wikimetrics-scheduler
/etc/init.d/apache2 stop
/etc/init.d/redis-server stop

 * Restore the database:

mysql wikimetrics < database.sql

 * Restore redis:

cp ~/backup/a/redis/wikimetrics1-6379.rdb /a/redis/wikimetrics1-6379.rdb

 * Restore public reports (move the current public dir aside first):

mv /var/lib/wikimetrics/public /var/lib/wikimetrics_old_public/
cp -r ~/backup/var/lib/wikimetrics/public /var/lib/wikimetrics/public

 * Restart the queue, scheduler, apache and redis
 * Make sure the restored data looks good; if so, remove /var/lib/wikimetrics_old_public/
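For reference, the restore commands can be collected into one dry-run list so you can review them in order before executing anything. This is a hypothetical sketch that only prints the commands; it does not run them:

```python
# Dry-run sketch of the restore procedure: print each command in order.
# Paths match the steps above; nothing is executed.
RESTORE_STEPS = [
    "sudo stop wikimetrics-queue",
    "sudo stop wikimetrics-scheduler",
    "/etc/init.d/apache2 stop",
    "/etc/init.d/redis-server stop",
    "mysql wikimetrics < database.sql",
    "cp ~/backup/a/redis/wikimetrics1-6379.rdb /a/redis/wikimetrics1-6379.rdb",
    "mv /var/lib/wikimetrics/public /var/lib/wikimetrics_old_public/",
    "cp -r ~/backup/var/lib/wikimetrics/public /var/lib/wikimetrics/public",
]

for step in RESTORE_STEPS:
    print(step)
```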

Guide
https://www.mediawiki.org/wiki/Wikimetrics/Help