MediaWikiAnalysis

MediaWikiAnalysis is an analyzer that gathers activity data from MediaWiki sites using the MediaWiki API. This page and its subpages serve as a place to collaborate on ideas for the project and to share techniques that may be useful in similar projects.

Description
Alvaro del Castillo writes to mediawiki-api:

"Hi guys,

We are working in an analyzer to gather the activity from MediaWiki sites using the cool MediaWiki API.

Right now the idea is to get a list of wiki pages using:

"action=query&list=allpages&aplimit=500"

and then for each page get all revisions with:

action=query&prop=revisions&titles=Main%20Page&rvlimit=500

so we have all revisions activity (pretty similar to having all commits to source code) for a MediaWiki site.

We are doing it as Open Source in:

https://github.com/MetricsGrimoire/MediaWikiAnalysis

Any comments are welcomed!

After having all the data we plan to use:

https://github.com/VizGrimoire/VizGrimoireR

for data analysis (SQL+R) combination

and for doing the viz:

https://github.com/VizGrimoire/VizGrimoireJS

Kudos to Sumana Harihareswara for pointing me to this list!

Cheers"
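
The two-step approach described in the message (list every page, then fetch each page's revisions) can be sketched roughly as follows. This is only an illustration, not the code used by MediaWikiAnalysis itself: the helper names (http_get_json, query_all, iter_page_titles, iter_revisions) and the User-Agent string are made up for this example. It assumes the wiki returns the "continue" block used by current versions of the API to page through results beyond the 500-item limit.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(api_url, params):
    """Fetch one API response over HTTP and decode it as JSON."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "MediaWikiAnalysis-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def query_all(api_url, params, fetch=http_get_json):
    """Yield every response for one query, following 'continue' blocks."""
    request = {**params, "action": "query", "format": "json"}
    while True:
        data = fetch(api_url, request)
        yield data
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        request = {**request, **data["continue"]}


def iter_page_titles(api_url, fetch=http_get_json):
    """Step 1: action=query&list=allpages&aplimit=500."""
    params = {"list": "allpages", "aplimit": "500"}
    for data in query_all(api_url, params, fetch):
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]


def iter_revisions(api_url, title, fetch=http_get_json):
    """Step 2: action=query&prop=revisions&titles=<title>&rvlimit=500."""
    params = {"prop": "revisions", "titles": title, "rvlimit": "500"}
    for data in query_all(api_url, params, fetch):
        for page in data.get("query", {}).get("pages", {}).values():
            for revision in page.get("revisions", []):
                yield revision
```

With a real endpoint (the URL here is a placeholder), the two steps would be combined by looping over iter_page_titles("https://wiki.example.org/w/api.php") and calling iter_revisions for each title. The fetch argument is there so the HTTP call can be replaced, for instance by a canned response while testing.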

Questions
Can the API be modified to do this more efficiently? E.g. wouldn't it be nice to have an API feature that would let revisions be displayed, with their content, in order by or  (or  or  or )? E.g. see http://meta.inclumedia.org/wiki/Tool:IncluMirrorPullBot/RecentChanges#Get_all_RecentChanges_data_possible_from_the_500_oldest_revisions._.28By_setting_a_really_old_date_to_start_with..29. Specifically, wouldn't it be nice to be able to do

But it won't work with rcprop=content. Bugzilla filing forthcoming (if there's not one already there)!
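
Until such a feature exists, one possible workaround is to make two requests: ask list=recentchanges for revision IDs in oldest-first order, then fetch the content of those revisions with prop=revisions. The sketch below only builds the request parameters and is not the request proposed above. The function names, the default start date and the batch size of 50 (the usual cap on multi-value parameters for non-bot accounts) are assumptions made for illustration.

```python
def recent_changes_params(start="2001-01-01T00:00:00Z", limit=500):
    """Parameters for listing changes oldest-first from a very old start date."""
    return {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rcdir": "newer",       # enumerate from older to newer
        "rcstart": start,       # assumed "really old" starting timestamp
        "rclimit": str(limit),
        "rcprop": "ids|title|timestamp|user|comment",
    }


def revids_from_recent_changes(data):
    """Collect revision IDs from a recentchanges response, skipping entries without one."""
    changes = data.get("query", {}).get("recentchanges", [])
    return [change["revid"] for change in changes if change.get("revid")]


def chunked(items, size=50):
    """Split a list into batches small enough for one revids= request."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def revision_content_params(revids):
    """Parameters for fetching the content of specific revisions."""
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "revids": "|".join(str(revid) for revid in revids),
        "rvprop": "ids|timestamp|content",
    }
```

Each batch returned by chunked(revids_from_recent_changes(response)) would be passed to revision_content_params and sent as its own request. This costs extra round trips compared with a single request that returned content directly, which is the motivation for the feature request.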