Wikistats/WLM stats

This is the procedure to collect information about editors who contribute to the Wiki Loves Monuments (WLM) contest(s).

The procedure is rather hackish, as it is run only once a year and had to be implemented in limited time.

First step: collect usernames of contributing users
This is done by user Platonides, as follows:

  time sql commonswiki_p "SELECT DISTINCT user_name FROM u_platonides_wlm_p.wlm2013 JOIN user ON (user_id=wlm_author) WHERE wlm_source = 'commons';" > wlmUsernames.txt

result: wlmUsernames.txt

Or, if we restrict to valid submissions (182 fewer users):

  time sql commonswiki_p "SELECT DISTINCT user_name FROM u_platonides_wlm_p.wlm2013 JOIN user ON (user_id=wlm_author) WHERE wlm_source = 'commons' AND wlm_status='Participating';" > wlmParticipatingUsernames.txt

result: wlmParticipatingUsernames.txt
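To see which users drop out when only valid submissions are counted, the two result files can be compared. The following is a minimal sketch, assuming each file holds one username per line, optionally preceded by a user_name header line. The exact output format of the sql wrapper is an assumption, and the function names are hypothetical.

```python
from pathlib import Path


def read_usernames(path):
    """Return the set of usernames found in a result file.

    Assumption: one username per line. A leading 'user_name' header
    line, if present, is skipped. Blank lines are ignored.
    """
    names = set()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for index, line in enumerate(lines):
        name = line.strip()
        if not name:
            continue
        if index == 0 and name == "user_name":
            continue
        names.add(name)
    return names


def non_participating(all_path, participating_path):
    """Users present in the full list but absent from the 'Participating' list."""
    return read_usernames(all_path) - read_usernames(participating_path)
```

Calling non_participating("wlmUsernames.txt", "wlmParticipatingUsernames.txt") would then list the users whose uploads were all rejected or otherwise not counted.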

Second step: collect edits for all contributing users
This is done as a side task during the editor deduplication step (where edits per user, per wiki, per month, per namespace are merged). There are no run-time arguments: this filtering step, and the export to WLM-specific csv files, is always performed. Even the input file selection pattern is hard-coded. The script (WikiCounts.pl -y ...) reads names from the files WLM_uploaders_2010.txt, WLM_uploaders_2011.txt, and so on for further years. If you want to collect all edits for editors in one WLM year only, replace the other input files with empty files. (Yes, this is quick and dirty.)
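The filtering idea, and why empty files switch a year off, can be illustrated with a short sketch. This is not the code in WikiCounts.pl (which is Perl). It is a minimal Python illustration, assuming one username per line in each WLM_uploaders_yyyy.txt file and edit rows laid out as username,yyyy-mm,project-language,namespace,edits. The function names and the second sample user are hypothetical.

```python
import csv
import io


def load_uploaders(year_files):
    """Return the union of usernames over all per-year files.

    year_files maps a year to the text of its WLM_uploaders_yyyy.txt file.
    An empty file contributes no names, which is why replacing a year's
    file with an empty one removes that year from the selection.
    """
    names = set()
    for text in year_files.values():
        names.update(line.strip() for line in text.splitlines() if line.strip())
    return names


def filter_edits(csv_text, uploaders):
    """Keep only the edit rows whose first column is a known uploader."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[0] in uploaders]


if __name__ == "__main__":
    # Only the 2012 list is populated; 2010 and 2011 are emptied out.
    files = {"2010": "", "2011": "", "2012": "1971markus\n"}
    # 'SomeoneElse' is a made-up user who is not a WLM uploader.
    edits = "1971markus,2008-08,wp-de,0,51\nSomeoneElse,2008-08,wp-de,0,7\n"
    for row in filter_edits(edits, load_uploaders(files)):
        print(",".join(row))
```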

bash folder: stat1002:/a/wikistats_git/dumps/bash or stat1002:/a/home/ezachte/wikistats/dumps/bash (beta env)
bash file: count_merge_editors.sh

input:
stat1002:/a/wikistats_git/dumps/csv/csv_mw/WLM_uploaders_yyyy.txt (where yyyy is '2010', '2011', etc.)
plus the usual input for the deduplication process:
stat1002:/a/wikistats_git/dumps/csv/csv_[xx]/EditsBreakdownPerUserPerMonth[lang].csv
where xx is the project code, and lang is the language code (e.g. EN, DE, FR) or COMMONS, META, etc.

output: folder: stat1002:/a/wikistats_git/dumps/csv/csv_mw

files:

WLM_Uploaders_EditsBreakdownPerUserPerMonth.csv
  layout: username,yyyy-mm,project-language,namespace,edits
  example: 1971markus,2008-08,wp-de,0,51

WLM_Uploaders_EditsFirstLast.csv
  layout: first month,last month,user,edits,wikis
  example: 2001-03,2014-03,Cdani,779,wp-ca|wp-en|wp-es|wx-commons|wx-meta|wx-wikidata

WLM_Uploaders_EditsFirstLast2.csv
  layout: month,first edit,last edit (??! seems wrong)
  example: 2003-02,1,0

WLM_Uploaders_EditsFirstLastRetention.csv
  layout: first month,last month,users,total edits,average edits per user
  example: 2004-04,2014-03,7,296548,42364
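How the first/last and retention files relate to the breakdown file can be illustrated with a short sketch. It assumes the layouts listed above, usernames without commas, and an average rounded down to a whole number (consistent with the example row, where 296548 / 7 = 42364). The actual aggregation in the wikistats scripts may differ, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def first_last(breakdown_csv):
    """Per user: (first month, last month, user, total edits, wikis).

    Input rows: username,yyyy-mm,project-language,namespace,edits
    yyyy-mm strings sort chronologically, so min/max work on the text.
    """
    months = defaultdict(list)
    edits = defaultdict(int)
    wikis = defaultdict(set)
    for row in csv.reader(io.StringIO(breakdown_csv)):
        if not row:
            continue  # skip blank lines
        user, month, wiki, _namespace, count = row
        months[user].append(month)
        edits[user] += int(count)
        wikis[user].add(wiki)
    return [
        (min(months[u]), max(months[u]), u, edits[u], "|".join(sorted(wikis[u])))
        for u in sorted(months)
    ]


def retention(first_last_rows):
    """Group users by (first month, last month).

    Output rows: (first month, last month, users, total edits,
    average edits per user), with the average rounded down.
    """
    users = defaultdict(int)
    total = defaultdict(int)
    for first, last, _user, count, _wikis in first_last_rows:
        users[(first, last)] += 1
        total[(first, last)] += count
    return [
        (first, last, users[(first, last)], total[(first, last)],
         total[(first, last)] // users[(first, last)])
        for first, last in sorted(users)
    ]
```

Feeding the output of first_last into retention gives one row per (first month, last month) pair, which is the shape of WLM_Uploaders_EditsFirstLastRetention.csv.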

input and output are archived in: stat1002:/a/wikistats_git/dumps/csv/csv_mw/yyyy

Collect images/uploaders per country
bash folder: stat1002:/a/wikistats_git/dumps/bash or stat1002:/a/home/ezachte/wikistats/dumps/bash (beta env)
bash file: count_commons_images_wlm.sh

input:
commons dump: /mnt/data/xmldatadumps/public/commonswiki/latest/commonswiki-latest-pages-meta-history.xml.7z
bot names: /a/wikistats_git/dumps/csv/csv_wx/BotsAll.csv
country names: /a/wikistats_git/squids/csv/meta/CountryCodes.csv

output:

WLM_images_by_country_by_year.csv
  contains images per year (well-formed tags + anomalies)
  contains images and uploads per year per country
  contains users and their upload count per country

WLM_images_by_country_by_year_edits.txt
  contains page id,file,timestamp,usertype (R=registered user B=bot A=anonymous),year,country

WLM_images_by_country_by_year_errors.txt
  anomalous tags and how they were fixed

WLM_images_by_country_by_year_inspect.html
  anomalous tags and how they were fixed (as clickable html document)

WLM_images_by_country_by_year_trace.txt
  for debugging only

WLM_images_by_country_by_year_uploads.txt
  year,country,file,usertype (R=registered user B=bot A=anonymous),user,timestamp,unflagged
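As an illustration of how per-country figures can be recomputed from the uploads file, the sketch below counts images and distinct uploaders per year and country. It assumes the comma-separated layout given for WLM_images_by_country_by_year_uploads.txt and that no field contains a comma. File names on Commons can contain commas, so real data may need a more careful parser. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def uploads_per_country(uploads_text):
    """Return {(year, country): (image count, distinct uploader count)}.

    Input rows: year,country,file,usertype,user,timestamp,unflagged
    Rows with a different number of fields are skipped rather than
    guessed at (assumption: such rows contain an unescaped comma).
    """
    images = defaultdict(int)
    uploaders = defaultdict(set)
    for row in csv.reader(io.StringIO(uploads_text)):
        if len(row) != 7:
            continue
        year, country, _file, _usertype, user, _timestamp, _unflagged = row
        images[(year, country)] += 1
        uploaders[(year, country)].add(user)
    return {key: (images[key], len(uploaders[key])) for key in images}
```

Skipping malformed rows keeps the counts conservative. Comparing the number of skipped rows against the total is a quick check on whether the no-comma assumption holds for a given run.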

charts derived from this:

http://commons.wikimedia.org/wiki/File:WLM_uploaders_2010-2012_linear.png
This chart shows how the three WLM events led to increasingly large peaks in new editors (on any project).

http://commons.wikimedia.org/wiki/File:WLM_uploaders_2010-2012_log.png
Same chart with a logarithmic y axis. This shows that the number of old hands contributing to WLM is non-negligible.

http://commons.wikimedia.org/wiki/File:WLM_uploaders_2012_linear.png
http://commons.wikimedia.org/wiki/File:WLM_uploaders_2012_log.png
For completeness' sake, similar linear and log charts where only WLM 2012 is taken into account.

http://commons.wikimedia.org/wiki/File:WLM_uploaders_2010-2012_bar_chart_corrected.png
Shows experienced vs. new uploaders for each WLM event.

http://commons.wikimedia.org/wiki/File:WLM_uploaders_2010-2012_vs_other_NS6_editors_linear.png
Similar to the first chart, now with the remaining Commons contributors to namespace 6 plotted as a second line. Surprisingly, there is still a big leap in non-WLM editors in Sep 2012. (Hmm, this seems too coincidental: are we missing WLM participants? In other words, should some users still move from the red line to the blue line?)