User:Erik Zachte (WMF)/Progress

For earlier history see User:Erik_Zachte/progress
 * to do: remove the last config files from squid reports and replace them with command-line parameters
 * to do: investigate duplication of page histories due to import of translated articles on other wiki (reported by Phoebe, Dec 6 2013)
 * to do: look into page view forecast algorithm, no longer sure how that works (and add some comments in the code)


 * week 9
 * Removed translations for namespace 'User' from wikistats per Amir's request (some translations were incomplete and buggy, and not really needed)

(17 1/2 hrs) (25 1/4 hrs)
 * week 8
 * testing of new media file request dump
 * user requests (new log item):
 * [question] Raw file stats vs pageview API stats (Jason Bub)
 * [question] [data] monthly per country view stats (Rütger Egolf, Research Assistant at Centre for European Economic Research)
 * [question] Explain how wikilinks are counted in wikistats (explained perl code) by
 * week 7
 * derive estimates for new quarterly report card from incomplete data (dumps have stalled) by extrapolation
 * adapt wikistats scripts to allow merging of total active editors for only those wikis which have data for the latest month
 * Provide total active editors (TAE) for December 2014
 * Report Edits for 2014 Oct-Dec
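One way to extrapolate a total from incomplete data is sketched below. This is an illustration only, not the wikistats code: it assumes the wikis that already report for the latest month grew at the same rate as the missing ones, and the function name `extrapolate_total` and the dictionary inputs are hypothetical.

```python
def extrapolate_total(latest, previous):
    """Estimate the all-wiki total for the latest month from partial data.

    latest:   {wiki: count} for wikis that already have data for the latest month
    previous: {wiki: count} for all wikis in the previous (complete) month
    Assumption: wikis without data grew at the same rate as the reporting ones.
    """
    # Only wikis present in both months can be used to measure growth.
    reporting = [wiki for wiki in latest if wiki in previous]
    if not reporting:
        raise ValueError("no wiki has data for both months")

    subset_latest = sum(latest[wiki] for wiki in reporting)
    subset_previous = sum(previous[wiki] for wiki in reporting)
    if subset_previous == 0:
        raise ValueError("reporting wikis had zero count in previous month")

    growth = subset_latest / subset_previous
    # Scale the complete previous-month total by the observed growth.
    return sum(previous.values()) * growth
```

For example, with `previous = {"en": 1000, "de": 500, "fr": 500}` and `latest = {"en": 1100, "de": 550}`, the reporting wikis grew by 10 percent, so the estimate is about 2200.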

(22 1/2 hrs)
 * week 6
 * partial publishing of RC input (dumps are lagging)
 * analyze progress of dump generation (by parsing index.html for 900+ wikis, for all available dump dates),
 * autonomous growth in dump sizes and job length can be shown
 * with a few further tweaks this scan can be run say half an hour, and also report on stalled dump jobs
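The stalled-job report mentioned above could look roughly like the sketch below. It assumes the index.html pages have already been parsed into a list of (start date, status) pairs per wiki. The status labels, the 14-day threshold and the name `find_stalled` are all hypothetical, and the actual scan is not shown here.

```python
from datetime import date

def find_stalled(runs, today, max_age_days=14):
    """Return wikis whose most recent dump run is unfinished and old.

    runs: {wiki: [(start_date, status), ...]}, where status is 'done' or
          'in-progress' (hypothetical labels, not taken from the dump pages)
    """
    stalled = []
    for wiki, history in runs.items():
        if not history:
            continue
        # The most recent run is the one with the latest start date.
        start, status = max(history, key=lambda run: run[0])
        age = (today - start).days
        if status != "done" and age > max_age_days:
            stalled.append((wiki, start, age))
    # Oldest unfinished runs first.
    return sorted(stalled, key=lambda item: -item[2])
```

Comparing start dates and sizes across consecutive runs of the same wiki would give the growth in dump size and job length in a similar way.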

(16 1/4 hrs) (8 3/4)
 * week 5
 * fixed 2 issues (coding & config glitch) which made Summary charts not update since Sep 2014, see e.g.
 * final tweaks (hopefully) for Wiki Loves Africa reporting
 * investigating the 5 percent of page views/edits from sampled squid logs which don't have country info (ongoing)
 * issues with dumps (lagging behind, ongoing)
 * reassessment of where we are with issues with media file request counts RFC
 * week 4
 * fixed wikivoyage report showing wikipedia counts for el/fa
 * rerun Wiki Loves Africa reporting (now using categories *and* templates to find all images)

(17 hrs)
 * week 3
 * analysis of maintenance categories on wp:en (req. Lila), first release published
 * finalized analysis of wp:en maintenance categories (req. Lila), see
 * adapted several scripts to use proxy on stat1002 from now on, see
 * added Persian and Greek wikivoyage and looked into extraordinarily large page counts for those two wikis

(22 1/4 hrs)
 * week 2
 * Wiki Loves Africa reporting (ongoing, looking into discrepancies)
 * analysis of maintenance categories on wp:en (req. Lila), ongoing
 * most wikistats reporting broken due to recent config changes, several issues
 * stat1001 changed to private IP (Putty config fixed)
 * updated all bash files for new access to stat1001
 * daily aggregation of page views aborted due to trivial error -> Q&D fix

(1 hr)
 * week 53/2014 1/2015

(8 hrs)
 * week 52
 * misc maint.

(9 3/4 hrs)
 * week 51
 * end of year administrative housekeeping / reorg.

(13 3/4)
 * week 50
 * meetup with Europeana on how to proceed once media file requests counts are produced daily
 * looked into sudden overnight drop of 30k articles in the article count on no.wikipedia.org (seems to be a MediaWiki counter issue, not Wikistats)
 * mails

(18.5 hrs)
 * week 49
 * published traffic reports
 * adapted code for Medicine Translation Taskforce (which moved to a Google spreadsheet) (ongoing)
 * started to do daily/monthly aggregation of new hourly pageviews files from Hive successor of webstatscollector script (adapting existing script)
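A minimal sketch of the aggregation step follows. It assumes each hourly file has whitespace-separated lines of the form `project title count bytes`, as in the classic webstatscollector pagecounts files. Whether the Hive-based files use exactly this layout is an assumption, and `aggregate` is a hypothetical name, not the existing script.

```python
from collections import Counter

def aggregate(hourly_files):
    """Sum view counts per (project, title) over several hourly files.

    hourly_files: iterable of iterables of lines, e.g. open file handles.
    Assumed line layout: 'project title count bytes'.
    """
    totals = Counter()
    for lines in hourly_files:
        for line in lines:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip malformed or empty lines
            project, title, count = parts[0], parts[1], parts[2]
            try:
                totals[(project, title)] += int(count)
            except ValueError:
                continue  # skip lines with a non-numeric count
    return totals
```

Feeding it the 24 hourly files of one day gives daily totals. Running it again over the resulting daily files, written back in the same layout, gives monthly totals.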

(12.5 hrs)
 * week 48
 * WLM reprise (as contest continued in Oct)
 * comScore rank reassessment for
 * GLAM media file stats
 * data/config maintenance

(10 1/4 hrs)
 * week 47
 * GLAM media file stats
 * data/config maintenance

(29 1/4 hrs)
 * week 46
 * preparing for GLAM hackathon: RFC media file requests dump
 * GLAM hackathon

(3.5 hrs)
 * week 45

(17 3/4 hrs)
 * week 44
 * WLM 2014 stats (partial, will complete after Nov data are available)
 * Report Card prep
 * traffic reports
 * many mail threads

(22 hrs)
 * week 43
 * GLAM media file stats

(17 hrs)
 * week 42
 * GLAM media file stats

(31 3/4 hrs) (9 3/4 hrs)
 * week 41
 * started to look into hive (a bit)
 * studied new hive implementation of webstatscollector:
 * convert webrequests to pagecounts
 * render the pagecounts files
 * render the projectcounts files
 * commented on new pageview defs
 * generalised filters
 * week 40
 * updated PediaPress stats (adding 22 months till Nov 2013)
 * updated mailing list scanner (new aliases)
 * investigate source of implausible rise in monthly page views, see Trello card
 * prep squid reports (ongoing)

(11.5 hrs)
 * week 39
 * some page view stuff
 * prep report card

(11.5 hrs) (18 3/4 hrs)
 * week 38
 * helped define functionality for webstatscollector 2.0
 * fixed bug 57376 missing country names on this squid report
 * week 37
 * published squid based reports
 * worked on mobile stats (perc mobile per country), see also blog post
 * added support for new MSIE user agent string format to squid scripts 64125
 * investigated bug 70721, proving it's a non-fix issue
 * investigated millions of pageviews for same article by one ip address (stuck F5 key)
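A pattern like the stuck F5 key can be spotted by counting requests per (ip, article) pair, as in the sketch below. The input format, the threshold and the name `suspicious_pairs` are hypothetical. This is not the method used in the investigation, only an illustration of the idea.

```python
from collections import Counter

def suspicious_pairs(requests, threshold):
    """Return (ip, title) pairs seen at least `threshold` times.

    requests: iterable of (ip, title) tuples taken from a log sample.
    """
    counts = Counter(requests)
    return {pair: n for pair, n in counts.items() if n >= threshold}
```

On sampled logs the threshold has to be scaled down by the sampling rate, since only a fraction of the actual requests is visible.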

(18 3/4 hrs)
 * week 36
 * cleanup on stat1001/2/3: many old files removed, triggered by Ariel's inventory

(19 3/4 hrs)
 * week 35
 * further research on pageviews from Africa, page views per country per language, see Google doc with charts
 * encoding issues in webstatscollector