MediaWikiAnalysis

MediaWikiAnalysis is an analyzer that gathers activity data from MediaWiki sites using the cool MediaWiki API. This page and its subpages serve as a place to collaborate on ideas for this project, with the goal of sharing techniques that will be useful in similar projects.

Description
Alvaro del Castillo writes to mediawiki-api:

"Hi guys,

We are working in an analyzer to gather the activity from MediaWiki sites using the cool MediaWiki API.

Right now the idea is to get a list of wiki pages using:

"action=query&list=allpages&aplimit=500"

and then for each page get all revisions with:

action=query&prop=revisions&titles=Main%20Page&rvlimit=500

so we have all revisions activity (pretty similar to having all commits to source code) for a MediaWiki site.

We are doing it as Open Source in:

https://github.com/MetricsGrimoire/MediaWikiAnalysis

Any comments are welcomed!

After having all the data we plan to use:

https://github.com/VizGrimoire/VizGrimoireR

for data analysis (SQL+R) combination

and for doing the viz:

https://github.com/VizGrimoire/VizGrimoireJS

Kudos to Sumana Harihareswara for pointing me to this list!

Cheers"
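
The two queries in the message above can be sketched in Python as follows. This is only a minimal illustration, not the code used by MediaWikiAnalysis itself: the endpoint URL, the User-Agent string, the rvprop value, and the helper names (http_get_json, query_all, all_page_titles, page_revisions) are all assumptions made for the example. It assumes the wiki returns "continue" tokens in the JSON reply, which is how result sets larger than the 500-item limit are paged.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the wiki under analysis; any MediaWiki api.php endpoint could be used.
API_URL = "https://www.mediawiki.org/w/api.php"


def http_get_json(params, api_url=API_URL):
    """Perform one GET request against the API and decode the JSON reply."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "MediaWikiAnalysis-sketch/0.1"}  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def query_all(params, fetch=http_get_json):
    """Yield every response batch, following the 'continue' tokens the API returns."""
    params = dict(params, action="query", format="json")
    while True:
        data = fetch(params)
        yield data
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params.update(data["continue"])


def all_page_titles(fetch=http_get_json):
    """Yield the title of every page (list=allpages, 500 per request)."""
    for batch in query_all({"list": "allpages", "aplimit": "500"}, fetch):
        for page in batch.get("query", {}).get("allpages", []):
            yield page["title"]


def page_revisions(title, fetch=http_get_json):
    """Yield every revision of one page (prop=revisions, 500 per request)."""
    params = {
        "prop": "revisions",
        "titles": title,
        "rvlimit": "500",
        "rvprop": "ids|timestamp|user",  # assumption: fields needed for activity metrics
    }
    for batch in query_all(params, fetch):
        for page in batch.get("query", {}).get("pages", {}).values():
            for revision in page.get("revisions", []):
                yield revision
```

The fetch argument is there so that the paging logic can be exercised with a stand-in function instead of real network requests.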

Questions
Can the API be modified to do this more efficiently? E.g. wouldn't it be nice to have an API feature that would let revisions be displayed, with their content, in order by or  (or  or  or )? E.g. see http://meta.inclumedia.org/wiki/Tool:IncluMirrorPullBot/RecentChanges#Get_all_RecentChanges_data_possible_from_the_500_oldest_revisions._.28By_setting_a_really_old_date_to_start_with..29. Specifically, wouldn't it be nice to be able to do

But it won't work with rcprop=content. Bugzilla filing forthcoming (if there's not one already there)!

Tool Current Status
Right now the tool works correctly with mediawiki.org, which is a pretty good test.

It has incremental support, but only for page revisions. On every run it analyzes all revisions of a page made since the last analyzed date. The incremental support could be improved by using the RecentChanges module.
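
As a rough sketch of what the two incremental approaches could look like at the request level, the Python below builds the query parameters for each. It is an illustration under assumptions, not the tool's actual code: the function names and the chosen rvprop and rcprop values are made up for the example. The first function asks for one page's revisions from a given timestamp onwards (rvdir=newer with rvstart); the second asks the RecentChanges module for all changes on the wiki from that timestamp onwards.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def incremental_revision_params(title, last_analyzed):
    """Parameters for fetching one page's revisions made since last_analyzed."""
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvlimit": "500",
        "rvdir": "newer",  # enumerate oldest first, starting at rvstart
        "rvstart": to_api_timestamp(last_analyzed),
        "rvprop": "ids|timestamp|user",  # assumption: fields of interest
    }


def recent_changes_params(last_analyzed):
    """Parameters for fetching wiki-wide changes made since last_analyzed."""
    return {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rclimit": "500",
        "rcdir": "newer",
        "rcstart": to_api_timestamp(last_analyzed),
        "rcprop": "title|ids|timestamp|user",  # assumption: fields of interest
    }
```

With the RecentChanges variant, a single paged query would replace one revisions query per page. A wiki usually keeps only a limited window of recent changes, so this approach would presumably need a fallback to the per-page query when the last run is older than that window.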

The metrics obtained are pages, revisions, and people: total numbers for each, plus their evolution over time (for pages, the date of the first revision is taken as the creation date).
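
To make the metrics concrete, here is a small Python sketch that derives them from a flat list of revision records. The record shape (dicts with "page", "user", and "timestamp" keys), the monthly granularity, and the function name are assumptions for the example; the real tool stores its data in SQL and analyzes it with VizGrimoireR.

```python
from collections import defaultdict


def compute_metrics(revisions):
    """Compute totals and monthly evolution for pages, revisions, and people.

    revisions: iterable of dicts with 'page', 'user', and 'timestamp' keys,
    where 'timestamp' is an ISO 8601 UTC string (hypothetical record shape).
    """
    pages = set()
    people = set()
    first_revision = {}  # page -> earliest timestamp, used as creation date
    revisions_per_month = defaultdict(int)
    people_per_month = defaultdict(set)
    total_revisions = 0

    for rev in revisions:
        month = rev["timestamp"][:7]  # "YYYY-MM"
        total_revisions += 1
        pages.add(rev["page"])
        people.add(rev["user"])
        revisions_per_month[month] += 1
        people_per_month[month].add(rev["user"])
        # ISO 8601 UTC strings sort chronologically, so string comparison works.
        known = first_revision.get(rev["page"])
        if known is None or rev["timestamp"] < known:
            first_revision[rev["page"]] = rev["timestamp"]

    new_pages_per_month = defaultdict(int)
    for timestamp in first_revision.values():
        new_pages_per_month[timestamp[:7]] += 1

    return {
        "totals": {
            "pages": len(pages),
            "revisions": total_revisions,
            "people": len(people),
        },
        "revisions_per_month": dict(revisions_per_month),
        "new_pages_per_month": dict(new_pages_per_month),
        "active_people_per_month": {m: len(u) for m, u in people_per_month.items()},
    }
```

Any other time bucket (week, year) would work the same way by changing how the month key is derived from the timestamp.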