Extension:EventLogging/MongoDB

From mediawiki.org

There's an experimental (read: no uptime guarantees) instance of MongoDB on vanadium that is receiving all EventLogging data.

To connect, SSH into a machine on the cluster (like stat1), and run:

$ mongo vanadium.eqiad.wmnet:27017/events

With any luck, you'll see this output:

MongoDB shell version: 2.0.4
connecting to: vanadium.eqiad.wmnet:27017/events
>

This means MongoDB is ready to read and execute your queries. Cool.

Each deployed event schema is stored in its own collection (collections are the MongoDB analogue of MySQL tables). To see the list of extant collections, type show collections:

> show collections
AccountCreation
GettingStarted
GuidedTour
MobileBetaWatchlist
ServerSideAccountCreation
system.indexes

Querying

You can learn how to query these collections from the MongoDB Manual. Here are some sample queries (and their output) to get you started:

/* Get a single document from the 'ServerSideAccountCreation' collection: */
> db.ServerSideAccountCreation.findOne()
{
    "_id" : "23d89b961abd5f9fa3207d56f8c77af6",
    "wiki" : "bgwiki",
    "isValid" : true,
    "recvFrom" : "mw1065",
    "seqId" : 4978,
    "timestamp" : ISODate("2013-02-09T14:24:38Z"),
    "schema" : "ServerSideAccountCreation",
    "event" : {
        "userName" : "Scrubbed",
        "userBuckets" : "",
        "userId" : 123456,
        "returnTo" : "Игнатий Плевенски",
        "token" : "",
        "displayMobile" : false,
        "isSelfMade" : true
    },
    "revision" : 5150394
}
/* See the total number of events in ServerSideAccountCreation: */
> db.ServerSideAccountCreation.count()
2365

/* Grab an arbitrary ruwiki ServerSideAccountCreation event: */
> db.ServerSideAccountCreation.findOne( { wiki: 'ruwiki' } )
{
    "_id" : "e6c0ffc787d450809a14d8a370d34966",
    "wiki" : "ruwiki",
    "isValid" : true,
    "recvFrom" : "mw1088",
    "seqId" : 5122,
    "timestamp" : ISODate("2013-02-09T19:54:49Z"),
    "schema" : "ServerSideAccountCreation",
    "event" : {
        "userName" : "Omitted",
        "userBuckets" : "",
        "userId" : 999999,
        "token" : "",
        "displayMobile" : false,
        "isSelfMade" : true
    },
    "revision" : 5150394
}

To query nested properties, such as your schema's fields inside 'event', use dot notation and put the key in quotation marks:

/* See the number of mobile ServerSideAccountCreation events from the
   Russian Wikipedia: */
> db.ServerSideAccountCreation.find( {
...    wiki: 'ruwiki',
...    'event.displayMobile': true
... } ).count()
8
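To make the dotted-key lookup concrete, here is a rough Python sketch of what a key like 'event.displayMobile' refers to. It is only an illustration of the idea for plain nested documents, not how MongoDB implements matching: the helper names get_path and matches are made up for this example, only equality is handled, and arrays are ignored.

```python
def get_path(doc, dotted_key):
    """Follow a dotted key such as 'event.displayMobile' through nested dicts.

    Returns None when any step is missing. Arrays are deliberately ignored;
    MongoDB's real matching rules are richer than this illustration.
    """
    current = doc
    for part in dotted_key.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc, query):
    """Equality-only matching: each query key must equal the value it points at."""
    return all(get_path(doc, key) == value for key, value in query.items())


# Trimmed copy of the ruwiki sample document shown earlier.
sample = {
    'wiki': 'ruwiki',
    'schema': 'ServerSideAccountCreation',
    'event': {'userId': 999999, 'displayMobile': False, 'isSelfMade': True},
}

# The sample is not a mobile event, so the mobile query does not match it.
print(matches(sample, {'wiki': 'ruwiki', 'event.displayMobile': True}))
# It is a self-made account, so this query does match.
print(matches(sample, {'wiki': 'ruwiki', 'event.isSelfMade': True}))
```

The point of the sketch is that the dotted key is a single string naming a path into the document, which is why the shell requires it to be quoted.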

As in the mysql client, pressing [Tab] completes commands and collection names.

Python

You can also write Python programs to interact with the data using PyMongo. Here is a simple example:

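#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Print the number of ServerSideAccountCreation events for a few wikis."""
import pymongo

# MongoClient supersedes the older pymongo.Connection class (PyMongo 2.4+).
client = pymongo.MongoClient('vanadium.eqiad.wmnet', 27017)
db = client.events

for wiki in ('enwiki', 'dewiki', 'nlwiki'):
    # find() returns a cursor; count() asks the server how many documents match.
    # Note: Cursor.count() exists in PyMongo 2.x and 3.x but was removed in 4.0.
    events = db.ServerSideAccountCreation.find({'wiki': wiki})
    print('%s: %d' % (wiki, events.count()))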

And its output:

$ python mongodemo.py
enwiki: 1200
dewiki: 92
nlwiki: 21
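The dotted keys used in the shell work the same way in a PyMongo query dictionary. The following is a minimal sketch under a few assumptions: the function names count_matching and mobile_signups_by_wiki are hypothetical, and the collection argument is expected to be something like db.ServerSideAccountCreation from the script above. It counts by iterating the cursor, which avoids depending on any particular counting method but transfers every matching document, so it is only sensible for small result sets.

```python
def count_matching(collection, query):
    """Count documents by iterating the cursor that find() returns."""
    return sum(1 for _ in collection.find(query))


def mobile_signups_by_wiki(collection, wikis):
    """Return {wiki: number of events whose event.displayMobile is true}."""
    counts = {}
    for wiki in wikis:
        # The nested field is addressed with a quoted, dotted key,
        # exactly as in the mongo shell example.
        query = {'wiki': wiki, 'event.displayMobile': True}
        counts[wiki] = count_matching(collection, query)
    return counts
```

With the db object from the script above, calling mobile_signups_by_wiki(db.ServerSideAccountCreation, ['ruwiki', 'enwiki']) would return a dictionary of per-wiki counts.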