User:Erik Zachte (WMF)/Progress

From mediawiki.org

For earlier history see User:Erik_Zachte/progress

  • to do: investigate duplication of page histories due to import of translated articles on other wiki (reported by Phoebe, Dec 6 2013)
  • to do: look into page view forecast algorithm, no longer sure how that works (and add some comments in the code)

week 33

( xx hrs)

  • mostly worked on new D3 viz: geo breakdown of WMF traffic
week 32

(10 1/4 hrs)

  • mostly worked on new D3 viz: geo breakdown of WMF traffic
week 31

(4 1/2 hrs)

  • misc.
week 30

(5 3/4 hrs)

  • misc.
week 29

(22 1/2 hrs)

  • mostly worked on new D3 viz: geo breakdown of WMF traffic
week 28

(9 1/4 hrs)

  • mostly worked on new D3 viz: geo breakdown of WMF traffic
week 27

(30 1/2 hrs)

  • mostly worked on new D3 viz: geo breakdown of WMF traffic
week 26

(18 hrs)

  • started new D3 viz: geo breakdown of WMF traffic
week 25

(21 3/4 hrs)

week 24

(6 hrs)

week 23

(12 3/4 hrs)

week 22

(4 1/2 hrs)

  • published Wikistats reports
week 21

(14 3/4 hrs)

week 19

(7 1/4 hrs)

week 18

(12 hrs)

week 17

(12 3/4 hrs)

week 16

(14 1/4 hrs)

  • ... (?)
week 15

(11 hrs)

  • server maintenance, backup scripts updated, lots of pruning of backups
week 14

(18 1/2 hrs)

  • published traffic by country reports for Mar 2016 and 2016 Q1
  • investigated why so many inactive Wikipedias seem active all of a sudden (turns out these are rather trivial seeding activities from at least two IP addresses: 73.182.28.179 130.254.150.79)
  • extra metrics on Sitemap page + set default sort order via url, see [1]
  • sent update on video view stats for Khan Academy
week 13

(8 hrs)

  • expanded sitemap pages with extra metrics (e.g. [1]), taken from another lesser known table ([2]), as they deserve more prominence, and can help to set sane limits for inclusion criteria of active wikis
week 12

(11 1/4 hrs)

  • worked on sane limits for inclusion criteria of active wikis
  • feedback on new traffic reports
week 11

(13 1/4 hrs)

  • published Wikistats reports for February
  • published regional traffic reports for February [3] [4] [5]
(only traffic reports for which Wikistats is still responsible, but which have been migrated to hadoop (aka webstatscollector 3.0))
week 10

(4 1/4 hrs)

week 9

(1 hr)

week 8

(8 3/4 hrs)

week 7

(15 1/4 hrs)

week 6

(10 3/4 hrs)

  • mediacounts for webm files in category Videos_from_Osmosis for Rishi Desai and James Heilman
  • custom data for WMNL for one-time mass mailing to recently very active users
  • Wikistats monthly reports
  • investigating glitch in previous release of Wikistats reports [6]
  • press inquiry about total size of Wikipedia (Awuku, Yaw Boateng from German website t-online.de)
week 5

(9 1/4 hrs)

  • see week 5
week 4

(19 1/4 hrs)

  • working on script to collect bot free counts for all months before May 2015, so that we can patch our PV history, using a close approximation of new pageview definition
week 3

(3 1/4 hrs)

week 2

(9 1/4 hrs)

  • WLA stats
week 1

(9 1/4 hrs)

week 53

(5 1/2 hrs)

week 52

(5 hrs)

  • administrative
week 51

(11 1/2 hrs)

  • started to revive regional reports using new hourly hadoop-based csv files
week 50

(11 3/4 hrs)

week 49

(13 1/4 hrs)

week 48

(22 3/4 hrs)

week 47

(19 1/4 hrs)

week 46

(38 3/4 hrs)

week 45

(29 3/4 hrs)

week 44

(21 1/2 hrs)

week 43

(11 hrs)

  • ...
  • partially answered T113406 Quantifying the "sum of all contributors"
week 42

(17 3/4 hrs)

  • published monthly dump reports
  • working on data files for geo reports, see T114379
  • working on assessing reliability of US per state breakdown of page views
week 41

(9 3/4 hrs)

week 40

(24 hrs)

  • created process flow diagrams + proposed changes for Wikistats Pageview Reports T114379
  • lots of discussions on this
week 39

(8.5 hrs)

week 38

(18 hrs)

week 37

(5 1/4 hrs)

  • data for Dario, for Lila
  • published monthly Wikistats reports
  • discussed/contributed to meaning of editor activity counts
week 36

(8 1/4 hrs)

week 35

(10 hrs)

  • published monthly Wikistats reports
  • collected some data for audit report, see [8]
  • investigating way too low comScore stats
  • discussed/contributed to upcoming signpost article on editor trends, lots of miss/500 in recent days, WLM retention and new signups
week 34

(17 hrs)

  • debugged documentation for [9] new hourly pageview files
  • investigating Percentage pageviews from Russia is too low in recent geographical breakdowns in Wikistats
  • added orwikisource to Wikistats
  • discussed/contributed to stats.grok.se upgrade, upcoming signpost blog on growth in monthly editors, GA stats using Wikimedia Stats (wikimedia-l), protocol independence on stats.wikimedia.org portal, missing browser reports, detailed geohack tools (wikitech-l), quarterly goals
week 33

(5 hrs)

week 32

(11 1/4 hrs)

week 31

(25 1/2 hrs)

  • JIT (?) workaround for corrupt stub dumps, needed soon for Quarterly Report Card
  • opened discussion (mail and wiki) on future of Wikistats traffic reports, see also [10] and [11]
  • updated all traffic reports with error notice for geo reports and call to vote to all reports
  • prepared data for Quarterly Report Card and Monthly Report Card
week 30

(9 3/4 hrs)

week 29

(6 3/4 hrs)

  • published squid log based view/edit reports for 2015-04/05/06
    • N/S reports still suffer from a large percentage (~5%) not attributed to North or South (so didn't publish update)
  • investigating page views underreporting in June/July 2015
week 28

(1 1/4 hrs)

week 27

(3/4 hr)

week 26

(8.5 hrs)

week 25

(2 3/4 hrs)

week 24

(11 hrs)

week 23

(3 hrs)

week 22

(7:30 hrs)

week 21

(11 3/4 hrs)

  • wikistats script maint for stub dumps
week 20

(21:30 hrs)

  • stub dump woes, see also T89273
  • quarterly report JIT
  • editor trends as YoY, see blog post
week 19

(14 3/4 hrs)

  • last tweaks on analyzing traffic to wikistats site
  • tweaked script to follow dump progress (woes with timely delivery continue)
  • report card
week 18

(12 3/4 hrs)

  • analyzing traffic to wikistats site
  • extra meetings, about reorganization
week 17

(22.15 hrs)

  • reviving stalled squid reports
  • fixing maintenance scripts (backups)
week 16

(11.15 hrs)

  • Attended GLAM-WIKI conference (day 3)
  • Analysis of usage of Khan videos for James Heilman
  • reviving stalled squid reports (ongoing)
week 15

(24.5 hrs)

  • Attended GLAM-WIKI conference (day 1,2)
  • Input for Nuria's/Andrew's talk proposal at NYC
week 14

(2.5 hrs)

  • Looked at how many images on Commons are linked to or requested on a single day (for Jaime)
week 13

(14 1/4 hrs)

  • Analyzed unsampled edits logs, both old (webstatscollector) and new (hadoop), see [12]
  • Published and announced new media file request count dumps
  • Collected metrics and analyzed anomaly (bug) for Swiss TV Request
  • Collected metrics for WHYY, the NPR affiliate station in Philadelphia
week 12

(13 3/4 hrs)

  • Analyzed unsampled edits logs, both old (webstatscollector) and new (hadoop), see [13]
week 11

(9 hrs)

  • Report Card for March (delayed because of missing dumps)
week 10

(17 hrs)

  • After fixing T90230 last week, rsync of daily aggregates of page views still didn't happen. Turns out rsync now needs the --ipv4 parameter.
  • prerelease Wikistats (dumps for February not all in yet)
  • Report Card (ongoing) (data for comScore not yet accessible, subscription expired)
  • Data fact check for Communications
week 9

(20 hrs)

week 8

(17 1/2 hrs)

  • testing of new media file request dump
  • user requests (new log item):
    • [question] Raw file stats vs pageview API stats: (Jason Bub)
    • [question] [data] monthly per country view stats (Rütger Egolf, Research Assistant at Centre for European Economic Research)
    • [question] Explain how wikilinks are counted in wikistats (explained perl code) by
week 7

(25 1/4 hrs)

  • derive estimates for new quarterly report card from incomplete data (dumps have stalled) by extrapolation
    • adapt wikistats scripts to allow merge of total active editors for only those wikis which have data for latest month
    • [14] Provide total active editors (TAE) for December 2014
    • [15] Report Edits for 2014 Oct-Dec
week 6

(22 1/2 hrs)

  • partial publishing of RC input (dumps are lagging)
  • analyze progress of dump generation (by parsing index.html for 900+ wikis, for all available dump dates),
    • autonomous growth in dump sizes and job length can be shown
    • with a few further tweaks this scan can be run, say, every half hour, and also report on stalled dump jobs
week 5

(16 1/4 hrs)

  • fixed 2 issues (coding & config glitch) which made Summary charts not update since Sep 2014, see e.g. [16]
  • final tweaks (hopefully) for Wiki Loves Africa reporting
  • investigating the 5 percent of page views/edits from sampled squid logs which don't have country info (ongoing)
  • issues with dumps (lagging behind, ongoing)
  • reassessment of where we are with issues with media file request counts RFC
week 4

(8 3/4 hrs)

  • fixed wikivoyage report showing wikipedia counts for el/fa
  • rerun Wiki Loves Africa reporting (now using categories *and* templates to find all images)
week 3

(17 hrs)

  • analysis of maintenance categories on wp:en (req. Lila), first release published
  • finalized analysis of wp:en maintenance categories (req. Lila), see [17]
  • adapted several scripts to use proxy on stat1002 from now on, see [18]
  • added Persian and Greek wikivoyage and looked into extraordinarily large page counts for those two wikis
week 2

(22 1/4 hrs)

  • Wiki Loves Africa reporting (ongoing, looking into discrepancies)
  • analysis of maintenance categories on wp:en (req. Lila), ongoing
  • most wikistats reporting broken due to recent config changes, several issues
    • stat1001 changed to private IP (Putty config fixed)
    • updated all bash files for new access to stat1001
  • daily aggregation of page views aborted due to trivial error -> Q&D fix
week 53/2014 1/2015

(1 hr)

week 52

(8 hrs)

  • misc maint.
week 51

(9 3/4 hrs)

  • end of year administrative housekeeping / reorg.
week 50

(13 3/4 hrs)

  • meetup with Europeana on how to proceed once media file requests counts are produced daily
  • looked into sudden overnight drop of 30k articles in article count on no.wikipedia.org (seems a MediaWiki counter issue, not Wikistats)
  • mails
week 49

(18.5 hrs)

  • published traffic reports
  • adapted code for Medicine Translation Taskforce (which moved to Google spreadsheet) (ongoing)
  • started to do daily/monthly aggregation of new hourly pageviews files from Hive successor of webstatscollector script (adapting existing script)
week 48

(12.5 hrs)

  • WLM reprise (as contest continued in Oct)
  • comScore rank reassessment for [19]
  • GLAM media file stats
  • data/config maintenance
week 47

(10 1/4 hrs)

  • GLAM media file stats
  • data/config maintenance
week 46

(29 1/4 hrs)

week 45

(3.5 hrs)

week 44

(17 3/4 hrs)

  • WLM 2014 stats (partial, will complete after Nov data are available)
  • Report Card prep
  • traffic reports
  • many mail threads
week 43

(22 hrs)

  • GLAM media file stats
week 42

(17 hrs)

  • GLAM media file stats
week 41

(31 3/4 hrs)

week 40

(9 3/4 hrs)

  • updated PediaPress stats (adding 22 months till Nov 2013)
  • updated mailing list scanner (new aliases)
  • investigate source of implausible rise in monthly page views, see Trello card
  • prep squid reports (ongoing)
week 39

(11.5 hrs)

  • some page view stuff
  • prep report card
week 38

(11.5 hrs)

  • helped define functionality for webstatscollector 2.0
  • fixed bug 57376 missing country names on this squid report
week 37

(18 3/4 hrs)

  • published squid based reports
  • worked on mobile stats (perc mobile per country), see also blog post
  • added support for new MSIE user agent string format to squid scripts 64125
  • investigated bug 70721, proving it's a non-fix issue
  • investigated millions of pageviews for same article by one ip address (stuck F5 key)
week 36

(18 3/4 hrs)

  • cleanup on stats1001/2/3, many old files removed, triggered by Ariel's inventory
  • ...
week 35

(19 3/4 hrs)

  • further research on pageviews from Africa, page views per country per language, see Google doc with charts
  • encoding issues in webstatscollector