User:Erik Zachte/progress


 * to do: start wikistats run for wp:de based on full archive dump to help address heavy debate on wp:de (needs code fix, as it needs to process 'new' partial dump files)
 * to do: remove the last config files from squid reports and replace them with command-line parameters
 * to do: finalize monthly reporting of unsampled edits from squid log (rather than yearly avg)
 * to be resumed: analyze effects of world wide switch to https on 28 August on squid log stats
 * to do: investigate duplication of page histories due to import of translated articles on other wikis (reported by Phoebe, Dec 6 2013)
 * to do: look into page view forecast algorithm, no longer sure how that works (and add some comments in the code)
 * to do: comment on metrics definitions
 * to do: add comment to page view report https://bugzilla.wikimedia.org/show_bug.cgi?id=57980#c6

(23 3/4 hrs)
 * week 4
 * produced long term browser trend charts (mobile/non mobile as well as absolute/relative) from squid log based csv files
 * reran squid log reports with bogus traffic filtered out (Jul-Dec 2013)
 * looking into doing the same for crawler patterns
 * generated input for monthly report card (incl minor bug fixing)
 * week 3
 * continued to analyze low page view counts, also from squid logs
 * produced breakdowns of article traffic by directly analyzing squid logs with grep
 * see last pages of

(24.5 hrs) (3 1/ hrs) (5 hrs) (12 1/4 hrs) -> caused by wikistats skipping pages where checksum is missing in dumps -> rerunning all dumps (13 1/2 hrs) (24 1/4 hrs) (18 1/4 hrs) (12 1/4 hrs)
 * week 2
 * prepared files for Limn
 * incl. workaround for a Limn bug, where Limn does not know how to handle empty values for WikiData
 * incl. fix to accept new standardized file names for comScore csv files
 * fixed missing wikis from dump reports (complaint by language team)
 * there was a design flaw: since API querying was added in July 2013, a circular dependency prevented new codes added to dblist files from being incorporated; after the fix two new wikis finally got coverage:
 * Vietnamese Wikivoyage, e.g.
 * Minangkabau Wikipedia, e.g.
 * updated monthly merged page view files + prepped top views reports, e.g. wp:en
 * instructed scripts to ignore input for Jan 5/6 2014 (totals will be extrapolated from remainder)
 * done WikiCountsSummarizeProjectCounts.pl, collects counts for page view reports, reran reports
 * done SquidCollectBrowserStatsExcel.pl
 * other scripts (daily/monthly merge of dammit.lt files) are automatically doing that with hourly precision
 * any other scripts to do? hmm, pondering
 * fixed page view counts shown in Summary reports for Sara Lasner e.g. Greek Wikipedia, now shows pv count for same month as other data in the report
 * added trend line for mobile page views and combined mobile+non-mobile to Summary reports, e.g. Japanese Wikipedia
 * fixed publication of patched projectcount files
 * started to analyze low page view counts, also from squid logs
 * week 1
 * mostly vacation
 * published squid based page view/edit reports
 * published monthly wikistats dump based reports
 * week 52
 * mostly vacation
 * transforming yearly page view/edit reports with yearly averages into monthly reports, last month only
 * week 51
 * solved bug: Italian Wikivoyage page count in Wikistats seems too low
 * week 50
 * finalized patch (see week 49)
 * published squid reports
 * contributed to new metric definitions
 * week 49
 * built script to patch project files from pagecount files (per wiki, since June 1 2013) to subtract counts for bogus page views
 * quick charts on total (very) active editors and how those metrics drop on Wikipedia faster than on other projects
 * patched project files
 * assessment of download size for full wikipedia for journalist
 * in depth analysis of impact of patch
 * week 48
 * investigated with Christian the issue of page views inflated by a webstatscollector bug
 * prep comScore files for RC
 * file name normalization of hundreds of inconsistently named historic comScore files
 * week 47
 * published monthly Wikistats reports
 * prepped data for Limn except comScore data (subscription stalled again)
 * ongoing: discussions on metrics definitions
 * marked bug 46289 as resolved (see wk 46)
 * deactivated squid based report Devices and removed links to it
 * several minor fixes on squid reports (layout, update time)
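
The week 2 item about ignoring input for Jan 5/6 2014 and extrapolating totals from the remainder can be illustrated with a small sketch. This is not the Wikistats code (the scripts named above are Perl); the function name, the day-to-count dictionary and the sample numbers are assumptions made up for illustration, and the method shown (average of the remaining days times the number of days in the month) is only one plausible way to extrapolate.

```python
import calendar


def extrapolate_monthly_total(daily_counts, year, month, excluded_days):
    """Estimate a full-month total from the days that were not excluded.

    daily_counts maps day-of-month -> count (hypothetical input format).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    # Drop the excluded days, whatever value was recorded for them.
    kept = {day: count for day, count in daily_counts.items()
            if day not in excluded_days}
    if not kept:
        raise ValueError("no usable days left to extrapolate from")
    # Scale the average of the remaining days up to the whole month.
    daily_average = sum(kept.values()) / len(kept)
    return round(daily_average * days_in_month)


# Made-up data: 100 views on every day of January 2014,
# with arbitrary values on the two days that are to be ignored.
counts = {day: 100 for day in range(1, 32)}
counts[5] = counts[6] = 999999
estimate = extrapolate_monthly_total(counts, 2014, 1, excluded_days={5, 6})
```

With the values for the two ignored days left out, the estimate depends only on the other 29 days.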

(15 hrs) (14.5 hrs)
 * week 46
 * published new geo breakdown reports based on unsampled squid logs
 * also updated chart on edits breakdown global N vs S (added to squid report portal)
 * urgent: updated chart for Sue on UV trends for news sites vs Wikimedia (based on a patchwork of yearly comScore data)
 * also created new charts on top reference sites
 * got mailing list stats running again (stalled since Feb 13)
 * (two open issues: look at gap in summer 13, apply Nemo's patch after gerrit sync issue has been fixed)
 * week 45
 * prepared input for Monthly Report Card (minus comScore data, subscription renewal is ongoing)
 * updated input for Monthly Report Card (after comScore subscription renewal)
 * minor: Wikistats Overview diagram is now public (linked from Wikistats portal About page)
 * analyzed drop in mobile page views in recent months on English Wikipedia (and others) vs steep rise in non-mobile page views (it turns out the rise in non-mobile is far too large for any possible underreporting on mobile)
 * ongoing: analyze effects of world wide switch to https on 28 August on squid log stats
 * published squid based reports

(28 1/4 hrs)
 * week 44
 * publish monthly wikistats reports
 * helped analyze drop in total active editors for Sep 2013 (probably seasonal after all, i.e. within normal range)
 * ongoing: analyze effects of world wide switch to https on 28 August on squid log stats
 * made WLM data more visible in Commons report
 * fixed bugzilla bug 55558: new Wikivoyage logo on wikistats portal
 * ongoing: publish input for report card

(11.5 hrs) (11 hrs)
 * week 43
 * input for cohort analysis to Daimee
 * adapted squid based edit reports: explanatory texts and final counts now depend on a new argument, the sample rate
 * reran squid based edit reports from 1:1 unsampled edit log
 * reran edit(or) counts for Sarikas, with updated title list
 * week 42
 * collect data from German dump for external researcher (Dr Sarikas)
 * new script to build new filtered full archive dump based on discrete list of article titles
 * new script to collect edits/editors count (registered,anon,bot) from full archive dump
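
As a rough illustration of the week 42 item on collecting edit and editor counts per class (registered, anon, bot), the following sketch counts both from a list of (title, username) pairs. It is not the script mentioned above: the input format, the function names and the rule for telling classes apart (a known bot name list, and treating usernames that parse as IP addresses as anonymous) are assumptions for this example only.

```python
import ipaddress


def classify_editor(username, bot_names):
    """Return 'bot', 'anon' or 'registered' for one username (assumed rules)."""
    if username in bot_names:
        return "bot"
    try:
        # Assumption: anonymous edits are recorded under an IP address.
        ipaddress.ip_address(username)
        return "anon"
    except ValueError:
        return "registered"


def count_edits_and_editors(revisions, bot_names):
    """Count edits and distinct editors per class from (title, username) pairs."""
    edits = {"registered": 0, "anon": 0, "bot": 0}
    editors = {"registered": set(), "anon": set(), "bot": set()}
    for _title, username in revisions:
        kind = classify_editor(username, bot_names)
        edits[kind] += 1
        editors[kind].add(username)
    return edits, {kind: len(names) for kind, names in editors.items()}


# Made-up revisions for illustration.
sample = [
    ("Berlin", "Alice"),
    ("Berlin", "192.0.2.1"),
    ("Hamburg", "ExampleBot"),
    ("Hamburg", "Alice"),
]
edit_counts, editor_counts = count_edits_and_editors(sample, {"ExampleBot"})
```

Restricting the input to a discrete title list, as in the filtered dump above, would only mean filtering the pairs on title before counting.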

(21 hrs)
 * week 41
 * updated (overdue) monthly squid based reports
 * large cluster of reports on page views/edits per country are back after 6 months (more to do, see week 43)
 * squid based data collection now based on 1:1 instead of 1:1000 log files for page edits
 * fixed https://bugzilla.wikimedia.org/show_bug.cgi?id=55528
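
The switch from 1:1000 to 1:1 log files for page edits changes how raw counts turn into reported totals. A minimal sketch, assuming a sample rate of N means one in N requests was logged; the function names and report wording are hypothetical and not taken from the squid report scripts.

```python
def estimate_total(sampled_count, sample_rate):
    """Scale a count from a 1:sample_rate sampled log up to an estimated total."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be 1 (unsampled) or larger")
    return sampled_count * sample_rate


def describe(sampled_count, sample_rate):
    """Build an explanatory text that depends on the sample rate."""
    total = estimate_total(sampled_count, sample_rate)
    if sample_rate == 1:
        # Unsampled log: the count is exact, no estimation involved.
        return f"{total} edits (counted from unsampled log)"
    return f"about {total} edits (estimated from 1:{sample_rate} sampled log)"


from_sampled = describe(12, 1000)
from_unsampled = describe(12000, 1)
```

Passing the sample rate as an argument keeps one code path for both kinds of log while letting the report say whether a figure is exact or estimated.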

(11 3/4 hrs)
 * week 40
 * worked on squid reports (ongoing)