User:Erik Zachte/progress

(3 1/ hrs) (5 hrs) (12 1/4 hrs) -> caused by Wikistats skipping pages whose checksum is missing in the dumps -> rerunning all dumps (13 1/2 hrs) (24 1/4 hrs) (18 1/4 hrs) (12 1/4 hrs)
 * to do: remove the last config files from squid reports and replace them with command line parameters
 * to do: finalize monthly reporting of unsampled edits from squid logs (rather than yearly averages)
 * to be resumed: analyze the effects of the worldwide switch to https on 28 August on squid log stats
 * week 2
 * prepared files for Limn
 * incl. a workaround for a Limn bug: Limn does not know how to handle empty values for Wikidata
 * incl. a fix to accept the new standardized file names for comScore CSV files
 * fixed wikis missing from dump reports (complaint by the language team)
 * cause: a design flaw since API querying was added in July 2013, a circular dependency that prevented new codes added to dblist files from being incorporated; after the fix, two new wikis finally got coverage:
 * Vietnamese Wikivoyage
 * Minangkabau Wikipedia
 * to do: start a Wikistats run for wp:de based on the full archive dump, to help inform the heated debate on wp:de (needs a code fix, as it must process the 'new' partial dump files)
 * to do: check dammit merge job + prep monthly merge + prep top views reports
 * instruct several scripts to ignore input for Jan 5/6 2014 (totals will be extrapolated from the remaining days)
 * done: WikiCountsSummarizeProjectCounts.pl (collects counts for page view reports); reran reports
 * other scripts (daily/monthly merge of dammit.lt files) already do this automatically, with hourly precision
 * any other scripts to do? hmm, pondering
 * investigating for Sara Lasner what exactly is reported on page views in Wikistats summary reports: after the heavy misreporting due to bogus traffic in late 2013, are the numbers in the summary reports sound now? In particular, compare the Greek Wikipedia summary report for Nov 2013 with the current one.
 * comment on metrics definitions
 * to do: analyze low page view counts, also from squid logs
 * to do: look into the page view forecast algorithm, as I'm no longer sure how it works (and add some comments to the code)
 * fixed publication of patched projectcount files
 * week 1
 * mostly vacation
 * published squid based page view/edit reports
 * published monthly wikistats dump based reports
 * week 52
 * mostly vacation
 * transforming page view/edit reports based on yearly averages into monthly reports (last month only)
 * week 51
 * solved bug: Italian Wikivoyage page count in Wikistats seemed too low
 * week 50
 * finalized patch (see week 49)
 * published squid reports
 * contributed to new metric definitions
 * week 49
 * built a script to patch project files from pagecount files (per wiki, since June 1 2013), subtracting counts for bogus page views
 * quick charts on total (very) active editors, showing that these metrics drop faster on Wikipedia than on other projects
 * patched project files
 * assessed the download size of the full Wikipedia for a journalist
 * in-depth analysis of the impact of the patch
 * week 48
 * investigated with Christian the issue of page views inflated by a webstatscollector bug
 * prepped comScore files for the Report Card (RC)
 * normalized the file names of hundreds of inconsistently named historic comScore files
 * week 47
 * published monthly Wikistats reports
 * prepped data for Limn except comScore data (subscription stalled again)
 * ongoing: discussions on metrics definitions
 * marked bug 46289 as resolved (see wk 46)
 * deactivated squid based report Devices and removed links to it
 * several minor fixes on squid reports (layout, update time)
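The week 2 entry says several scripts ignore the input for Jan 5/6 2014 and that monthly totals will be extrapolated from the remaining days. The exact method the Wikistats scripts use is not described in this log; a minimal sketch, assuming simple linear scaling (average over the kept days times the number of days in the month), could look like this. The function name `extrapolate_month_total` is hypothetical.

```python
import calendar


def extrapolate_month_total(daily_counts, year, month, excluded_days):
    """Hypothetical helper: scale the sum of the kept days up to the whole month.

    daily_counts: dict mapping day-of-month (int) -> count for that day
    excluded_days: set of day numbers whose input is ignored (e.g. {5, 6})
    Assumption: an excluded day looks like the average kept day.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    kept = {d: c for d, c in daily_counts.items() if d not in excluded_days}
    if not kept:
        raise ValueError("no valid days to extrapolate from")
    # Average over the days that were actually kept ...
    mean_per_day = sum(kept.values()) / len(kept)
    # ... multiplied by the number of days in the calendar month.
    return mean_per_day * days_in_month


# Usage: Jan 2014 with bogus spikes on the 5th and 6th
counts = {d: 100 for d in range(1, 32)}
counts[5] = counts[6] = 10_000
print(extrapolate_month_total(counts, 2014, 1, {5, 6}))  # 3100.0
```

Linear scaling ignores weekday/weekend patterns; dropping a Sunday and a Monday, as here, can bias the estimate slightly if traffic differs by weekday.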

(15 hrs) (14.5 hrs)
 * week 46
 * published new geo breakdown reports based on unsampled squid logs
 * also updated the chart on the breakdown of edits, Global North vs South (added to the squid report portal)
 * urgent: updated chart for Sue on unique visitor (UV) trends for news sites vs Wikimedia (based on a patchwork of yearly comScore data)
 * also created new charts on top reference sites
 * got mailing list stats running again (stalled since Feb 13)
 * (two open issues: look at the gap in summer 2013; apply Nemo's patch once the Gerrit sync issue has been fixed)
 * week 45
 * prepared input for Monthly Report Card (minus comScore data, subscription renewal is ongoing)
 * updated input for Monthly Report Card (after comScore subscription renewal)
 * minor: Wikistats Overview diagram is now public (linked from Wikistats portal About page)
 * analyzed the drop in mobile page views in recent months on English Wikipedia (and others) vs the steep rise in non-mobile page views (it turns out the rise in non-mobile views is far too large to be explained by any possible underreporting on mobile)
 * ongoing: analyze the effects of the worldwide switch to https on 28 August on squid log stats
 * published squid based reports

(28 1/4 hrs)
 * week 44
 * published monthly Wikistats reports
 * helped analyze the drop in total active editors for Sep 2013 (probably seasonal, i.e. within the normal range, after all)
 * ongoing: analyze the effects of the worldwide switch to https on 28 August on squid log stats
 * made WLM data more visible in Commons report
 * fixed bugzilla bug 55558: new Wikivoyage logo on wikistats portal
 * ongoing: publish input for report card

(11.5 hrs) (11 hrs)
 * week 43
 * input for cohort analysis to Daimee
 * adapted squid based edit reports: explanatory texts and final counts now depend on a new argument, the sample rate
 * reran squid based edit reports from 1:1 unsampled edit log
 * reran edit(or) counts for Dr Sarikas, with an updated title list
 * week 42
 * collected data from the German dump for an external researcher (Dr Sarikas)
 * new script to build a filtered full archive dump from a given list of article titles
 * new script to collect edit/editor counts (registered, anon, bot) from a full archive dump
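Week 43 makes the sample rate an argument of the squid based edit reports, and week 41 moved page edit collection from 1:1000 sampled to 1:1 unsampled logs. A minimal sketch of why that argument matters, assuming counts from a 1:N sampled log are simply multiplied by N to estimate the true total; both function names and the note wording are hypothetical, not taken from the Wikistats code:

```python
def estimate_total(sampled_count: int, sample_rate: int) -> int:
    """Hypothetical helper: estimate the true count from a 1:sample_rate log.

    In a 1:1000 log each kept line stands for about 1000 real requests;
    a 1:1 (unsampled) log needs no scaling.
    """
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    return sampled_count * sample_rate


def sampling_note(sample_rate: int) -> str:
    """Explanatory report text that depends on the sample rate (illustrative wording)."""
    if sample_rate == 1:
        return "Counts are based on the unsampled (1:1) edit log."
    return (f"Counts are based on a 1:{sample_rate} sampled log "
            f"and were multiplied by {sample_rate}.")


print(estimate_total(42, 1000))  # 42000
print(estimate_total(42, 1))     # 42
```

One plausible reason for using the unsampled log for edits: edits are far rarer than page views, so at 1:1000 a small wiki may contribute only a handful of sampled lines, and multiplying those by 1000 yields a very coarse estimate.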

(21 hrs)
 * week 41
 * updated (overdue) monthly squid based reports
 * a large cluster of reports on page views/edits per country is back after 6 months (more to do, see week 43)
 * squid based data collection for page edits now uses 1:1 (unsampled) instead of 1:1000 sampled log files
 * fixed https://bugzilla.wikimedia.org/show_bug.cgi?id=55528

(11 3/4 hrs)
 * week 40
 * worked on squid reports (ongoing)