User:Erik Zachte (WMF)/Progress

For earlier history see User:Erik_Zachte/progress (I'll migrate that history to this page when I'm allowed (new user, external links))


 * to do: start wikistats run for wp:de based on full archive dump to help address heavy debate on wp:de (needs code fix, as it needs to process 'new' partial dump files)
 * to do: remove last config files from squid reports and replace them with command-line parameters
 * to do: finalize monthly reporting of unsampled edits from squid log (rather than yearly avg)
 * to be resumed: analyze effects of the worldwide switch to HTTPS on 28 August on squid log stats
 * to do: investigate duplication of page histories due to import of translated articles on other wikis (reported by Phoebe, Dec 6 2013)
 * to do: look into page view forecast algorithm, no longer sure how that works (and add some comments in the code)
 * to do: comment on metrics definitions
 * to do: add comment to page view report https://bugzilla.wikimedia.org/show_bug.cgi?id=57980#c6


 * week 51
 * end of year administrative housekeeping / reorg.

(13 3/4 hrs)
 * week 50
 * meetup with Europeana on how to proceed once media file request counts are produced daily
 * looked into sudden overnight drop of 30k in article count on no.wikipedia.org (seems to be a MediaWiki counter issue, not Wikistats)
 * mails

(18.5 hrs)
 * week 49
 * published traffic reports
 * adapted code for Medicine Translation Task Force (which moved to a Google spreadsheet) (ongoing)
 * started daily/monthly aggregation of new hourly pageviews files from the Hive successor of the webstatscollector script (adapting existing script)

(12.5 hrs)
 * week 48
 * WLM reprise (as contest continued in Oct)
 * comScore rank reassessment for
 * GLAM media file stats
 * data/config maintenance

(10 1/4 hrs)
 * week 47
 * GLAM media file stats
 * data/config maintenance

(29 1/4 hrs)
 * week 46
 * preparing for GLAM hackathon: RFC media file requests dump
 * GLAM hackathon

(3.5 hrs)
 * week 45

(17 3/4 hrs)
 * week 44
 * WLM 2014 stats (partial, will complete after Nov data are available)
 * Report Card prep
 * traffic reports
 * many mail threads

(22 hrs)
 * week 43
 * GLAM media file stats

(17 hrs)
 * week 42
 * GLAM media file stats

(31 3/4 hrs) (9 3/4 hrs)
 * week 41
 * started to look into Hive (a bit)
 * studied new Hive implementation of webstatscollector:
   * convert webrequests to pagecounts
   * render the pagecounts files
   * render the projectcounts files
 * commented on new pageview defs
 * generalised filters
 * week 40
 * updated PediaPress stats (adding 22 months till Nov 2013)
 * updated mailing list scanner (new aliases)
 * investigated source of implausible rise in monthly page views, see Trello card
 * prep squid reports (ongoing)

(11.5 hrs)
 * week 39
 * some page view stuff
 * prep report card

(11.5 hrs) (18 3/4 hrs)
 * week 38
 * helped define functionality for webstatscollector 2.0
 * fixed bug 57376: missing country names on this squid report
 * week 37
 * published squid based reports
 * worked on mobile stats (perc mobile per country), see also blog post
 * added support for new MSIE user agent string format to squid scripts 64125
 * investigated bug 70721, proving it's a non-fix issue
 * investigated millions of pageviews for the same article by one IP address (stuck F5 key)

(18 3/4 hrs)
 * week 36
 * cleanup on stats1001/2/3: many old files removed, triggered by Ariel's inventory

(19 3/4 hrs)
 * week 35
 * further research on pageviews from Africa, page views per country per language, see Google doc with charts
 * encoding issues in webstatscollector