Wikistats/WLM stats

This is the procedure to collect information about editors who contributed to the Wiki Loves Monuments (WLM) contest(s).

The procedure is rather hackish, as it is run only once a year and had to be implemented in limited time.

First step: collect usernames of contributing users
This is done by user Platonides, as follows:

time sql commonswiki_p "SELECT DISTINCT user_name FROM u_platonides_wlm_p.wlm2013 JOIN user ON (user_id=wlm_author) WHERE wlm_source = 'commons';" > wlmUsernames.txt

result: wlmUsernames.txt

Or, if we restrict to valid submissions (182 fewer users):

time sql commonswiki_p "SELECT DISTINCT user_name FROM u_platonides_wlm_p.wlm2013 JOIN user ON (user_id=wlm_author) WHERE wlm_source = 'commons' AND wlm_status='Participating';" > wlmParticipatingUsernames.txt

result: wlmParticipatingUsernames.txt
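To see which users account for the difference between the two lists, the two result files can be compared as sets. The following is a minimal sketch in Python, assuming each file holds one username per line (the output of the sql wrapper may start with a column header line, which would then need to be skipped). The function names are hypothetical and not part of Wikistats.

```python
def read_usernames(path):
    """Return the set of non-empty lines found in a username file."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def non_participating(all_path, participating_path):
    """Usernames present in the full list but absent from the participating list."""
    return read_usernames(all_path) - read_usernames(participating_path)
```

Calling non_participating("wlmUsernames.txt", "wlmParticipatingUsernames.txt") would return the users who uploaded but have no submission with status 'Participating'.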

Second step: collect edits for all contributing users
This is done as a side task during the editor deduplication step (where edits per user, per wiki, per month, per namespace are merged). There are no run-time arguments: this filtering step and the export to WLM-specific csv files are always performed. Even the input file selection pattern is hard-coded. The script (WikiCounts.pl -y ...) reads names from the files WLM_uploaders_2010.txt, WLM_uploaders_2011.txt and so on for further years. If you want to collect all edits for editors in one WLM year only, replace the other input files with empty files. (Yes, this is quick and dirty.)
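The filtering side task can be pictured roughly as follows. This is a minimal sketch in Python, not the actual WikiCounts.pl code (which is Perl). It assumes one username per line in each WLM_uploaders_yyyy.txt file, and it assumes that each edit row carries the username in its first field. The function names are hypothetical.

```python
import glob
import os


def load_uploaders(folder):
    """Union of usernames from all WLM_uploaders_*.txt files in folder.

    An empty file contributes no names, which is why replacing a year's
    file with an empty one leaves that year's uploaders out.
    """
    names = set()
    for path in sorted(glob.glob(os.path.join(folder, "WLM_uploaders_*.txt"))):
        with open(path, encoding="utf-8") as handle:
            names.update(line.strip() for line in handle if line.strip())
    return names


def filter_rows(rows, uploaders):
    """Keep only rows whose first field (assumed: username) is a known uploader."""
    return [row for row in rows if row and row[0] in uploaders]
```

With the 2010 file emptied and only the 2011 file filled, load_uploaders returns just the 2011 names, so filter_rows keeps edits for that year's uploaders only.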

bash folder: stat1002:/a/wikistats_git/dumps/bash or stat1002:/a/home/ezachte/wikistats/dumps/bash (beta env)
bash file: count_merge_editors.sh

input: stat1002:/a/wikistats_git/dumps/csv/csv_mw/WLM_uploaders_yyyy.txt (where yyyy is '2010', '2011', etc.)
plus the usual input for the deduplication process: stat1002:/a/wikistats_git/dumps/csv/csv_[xx]/EditsBreakdownPerUserPerMonth[lang].csv, where xx is the project code and lang is the language code (e.g. EN, DE, FR) or COMMONS, META, etc.

output:
folder: stat1002:/a/wikistats_git/dumps/csv/csv_mw
files:

WLM_Uploaders_EditsBreakdownPerUserPerMonth.csv
layout: username,yyyy-mm,project-language,namespace,edits
example: 1971markus,2008-08,wp-de,0,51

WLM_Uploaders_EditsFirstLast.csv
layout: first month,last month,user,edits,wikis
example: 2001-03,2014-03,Cdani,779,wp-ca|wp-en|wp-es|wx-commons|wx-meta|wx-wikidata

WLM_Uploaders_EditsFirstLast2.csv
layout: month,first edit,last edit (??! seems wrong)
example: 2003-02,1,0

WLM_Uploaders_EditsFirstLastRetention.csv
layout: first month,last month,users,total edits,average edits per user
example: 2004-04,2014-03,7,296548,42364
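The relation between the first two file layouts can be illustrated with a short sketch. Given lines in the documented layout of WLM_Uploaders_EditsBreakdownPerUserPerMonth.csv, the Python below derives per-user lines in the layout of WLM_Uploaders_EditsFirstLast.csv. It only illustrates the layouts and is not the logic of WikiCounts.pl. The function name is hypothetical, and the sorted order of the wikis field is an assumption based on the documented example.

```python
def first_last(breakdown_lines):
    """Aggregate 'username,yyyy-mm,project-language,namespace,edits' lines
    into 'first month,last month,user,edits,wikis' lines, one per user."""
    per_user = {}
    for line in breakdown_lines:
        # Split from the right so a comma inside a username stays in the name.
        user, month, wiki, _namespace, edits = line.strip().rsplit(",", 4)
        entry = per_user.setdefault(user, {"months": [], "edits": 0, "wikis": set()})
        entry["months"].append(month)
        entry["edits"] += int(edits)
        entry["wikis"].add(wiki)

    result = []
    for user in sorted(per_user):
        entry = per_user[user]
        # yyyy-mm strings sort chronologically, so min/max give first/last month.
        result.append(",".join([
            min(entry["months"]),
            max(entry["months"]),
            user,
            str(entry["edits"]),
            "|".join(sorted(entry["wikis"])),
        ]))
    return result
```

For example, feeding it the documented line 1971markus,2008-08,wp-de,0,51 together with an invented second line 1971markus,2009-01,wx-commons,6,4 yields 2008-08,2009-01,1971markus,55,wp-de|wx-commons. In the documented retention example, 296548 / 7 = 42364, which is consistent with the average being total edits divided by users.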

input and output are archived in: stat1002:/a/wikistats_git/dumps/csv/csv_mw/yyyy