User:Erik Zachte (WMF)/Progress

For earlier history see User:Erik_Zachte/progress (I'll migrate that history to this page when I'm allowed (new user, external links))


 * to do: start wikistats run for wp:de based on full archive dump to help address heavy debate on wp:de (needs code fix, as it needs to process 'new' partial dump files)
 * to do: removal of last config files from squid reports, and replace by cmd line parameters)
 * to do: finalize monthly reporting of unsampled edits from squid log (rather than yearly avg)
 * to be resumed: analyze effects of world wide switch to https on 28 August on squid log stats;
 * to do: investigate duplication of page histories due to import of translated articles on other wiki (reported by Phoebe, Dec 6 2013)
 * to do: look into page view forecast algorithm, no longer sure how that works (and add some comments in the code)
 * to do: comment on metrics definitions
 * to do: add comment to page view report https://bugzilla.wikimedia.org/show_bug.cgi?id=57980#c6

(9 3/4 hrs)
 * week 41
 * started to look into hive (a bit)
 * studied new hive implementation of webstatscollector:
 * convert webrequests to pagecounts
 * render the pagecounts files
 * render the projectcounts files
 * commented on new pageview defs
 * generalised filters
 * week 40
 * updated PediaPress stats (adding 22 months till Nov 2013)
 * updated mailing list scanner (new aliases)
 * investigate source of implausible rise in monthly page views, see Trello card
 * prep squid reports (ongoing)

(11.5 hrs)
 * week 39
 * some page view stuff
 * prep report card

(11.5 hrs) (18 3/4 hrs)
 * week 38
 * helped define functionality for webstatscollector 2.0
 * fixed bug 57376 missing country names on this squid report
 * week 37
 * published squid based reports
 * worked on mobile stats (perc mobile per country), see also blog post
 * added support for new MSIE user agent string format to squid scripts 64125
 * investigated bug 70721, proving it's a non-fix issue
 * investigated millions of pageviews for same article by one ip address (stuck F5 key)

(18 3/4 hrs)
 * week 36
 * cleanup on stats1001/2/3,many old files removed,triggered by Ariels inventory

(19 3/4 hrs)
 * week 35
 * further research on pageviews from Africa, page views per country per language, see Google doc with charts
 * encoding issues in webstatscollector