Analytics/Wikistats/ReportCard

The Wikimedia Report Card provides a compact overview of basic Wikimedia metrics, with trend charts and a synopsis. Reports primarily target WMF management and staff, but most are also available to the general public. Charts ideally cover three years. The report is published each month, before the first Thursday of the month, and covers the month before last (e.g. the Report Card published in March shows data for January).
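The schedule above can be sketched in a few lines of Python. This is only an illustration of the date arithmetic, not part of the actual Report Card tooling; the function names first_thursday and reported_month are hypothetical.

```python
from datetime import date, timedelta


def first_thursday(year, month):
    """Return the date of the first Thursday of the given month."""
    first = date(year, month, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    offset = (3 - first.weekday()) % 7
    return first + timedelta(days=offset)


def reported_month(pub_year, pub_month):
    """Return (year, month) covered by a report published in the given month."""
    # Step back two calendar months, wrapping across the year boundary.
    index = pub_year * 12 + (pub_month - 1) - 2
    return index // 12, index % 12 + 1


# The report published in March 2011 is due before this date ...
print(first_thursday(2011, 3))
# ... and covers this (year, month).
print(reported_month(2011, 3))
```

Counting months as a single index (year * 12 + month) avoids special-casing reports published in January or February, which cover months of the previous year.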

Considerable manual intervention is needed; see the WMF office wiki (a private wiki) for details. Several scripts collect, reprocess and format input, followed by a lot of copy & paste (C&P) into a huge spreadsheet which generates about 40 charts. These charts are saved in a two-step process (C&P to Paintshop as an intermediary for best output quality). Manual analysis and crafting of the synopsis follow (Q&D input into a perl script).

The perl script ReportCardGenerateHtml.pl generates three variations of these reports from a custom html template RT_yyyy_mm.html (home-grown syntax): a one-column summary for online viewing, a two-column summary for printing (obsolete?), and a detailed version with more charts, plus a synopsis.

Web pages are generated in two sets. The staff-only set has one extra chart with comScore metrics about other high-profile web properties (data which we receive for free but are not allowed to publish). All in all, this is quite a time-intensive process. A project for further automation started in May 2011.

Data used
See below for abbreviations used.

Unique Visitors for All Wikimedia Projects
Source: comScore/MyMetrix / Saved reports (tab showing *) / Stu's Wikimedia Reports / Multi-Country Media Trend, UV's by region
 * Unique Visitors (linear/indexed)
 * downloaded and renamed to Multi-Country Media Trend, UVs by region_(Jan 10 - Mar 11).csv (e.g. )

Source: comScore/MyMetrix / Saved reports (tab showing *) / Stu's Wikimedia Reports / Multi-Country Media Trend, % reach by region (e.g. )
 * Reach Percentage (linear/indexed)
 * downloaded and renamed to Multi-Country Media Trend, % reach by region_(Jan 10 - Mar 11).csv

Data for the newest month in both csv files is C&P into Excel.
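The renaming convention used for the downloaded files above (report title, underscore, covered month range in parentheses) could be generated rather than typed by hand. The sketch below is a hypothetical helper, not an existing script; the name renamed_report and the fixed English month abbreviations are assumptions made for illustration.

```python
from datetime import date

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_label(d):
    """Format a date as 'Mon yy', e.g. 'Jan 10'."""
    return f"{MONTHS[d.month - 1]} {d.year % 100:02d}"


def renamed_report(title, first, last):
    """Build '<title>_(<first month> - <last month>).csv'."""
    return f"{title}_({month_label(first)} - {month_label(last)}).csv"


print(renamed_report("Multi-Country Media Trend, UVs by region",
                     date(2010, 1, 1), date(2011, 3, 1)))
```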

Page Requests for All Wikimedia Projects
Source: squid log via dammit.lt/wikistats


 * Hourly projectcount files are downloaded daily from and added to bayes:/a/dammit.lt/projectcounts/projectcounts-[yyyy].tar
 * A daily run of WikiCountsSummarizeProjectCounts.pl (part of bayes:/home/ezachte/wikistats/pageviews_montly.sh) reads all tar files since 2008 (needs optimization) and writes 8 sets of csv files (one set per project) with page views per wiki per day/week/month and more, plus a set of csv files for the RC (with data for all projects merged)
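The aggregation step can be illustrated with a minimal Python sketch. It is not a port of WikiCountsSummarizeProjectCounts.pl and covers only the monthly roll-up. It assumes that hourly files are named projectcounts-YYYYMMDD-HHMMSS and that each line has the layout "<project> - <views> <bytes>"; both are assumptions about the input format, and all function names are hypothetical.

```python
import tarfile
from collections import defaultdict


def parse_hourly(name, text):
    """Yield (month, project, views) tuples from one hourly file."""
    # Assumed file name: projectcounts-YYYYMMDD-HHMMSS
    stamp = name.rsplit("/", 1)[-1].split("-")[1]
    month = f"{stamp[:4]}-{stamp[4:6]}"
    for line in text.splitlines():
        fields = line.split()
        # Assumed line layout: <project> - <views> <bytes>
        if len(fields) == 4 and fields[2].isdigit():
            yield month, fields[0], int(fields[2])


def monthly_totals(files):
    """Sum hourly views into totals keyed by (month, project)."""
    totals = defaultdict(int)
    for name, text in files:
        for month, project, views in parse_hourly(name, text):
            totals[(month, project)] += views
    return dict(totals)


def read_tar(path):
    """Yield (name, text) for every regular file in a tar archive."""
    with tarfile.open(path) as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                yield member.name, data.decode("utf-8", errors="replace")


# Made-up sample data standing in for the contents of a yearly tar file.
sample = [
    ("projectcounts-20110301-000000", "en - 100 5000\nde - 40 2000\n"),
    ("projectcounts-20110301-010000", "en - 150 7000\n"),
    ("projectcounts-20110401-000000", "en - 7 300\n"),
]
print(monthly_totals(sample))
```

With real data, monthly_totals(read_tar(path)) would process one yearly archive. Caching the totals of completed months instead of re-reading every archive since 2008 is one possible answer to the "needs optimization" remark, though that is a suggestion rather than something the current script does.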

Two files are used for RC (where data for all 8 projects have already been merged):
 * bayes:/a/wikistats/csv_wp/PageViewsPerMonthPopularWikisNormalized_yyyy_mm.csv (C&P -> Excel)
   * This file (e.g. ) contains 4 time series: page view totals per month for the largest 25 wikis (mobile/non-mobile, combined) + per project
 * bayes:/a/wikistats/csv_wp/PageViewsMoversShakersPopularWikisNormalized_yyyy_mm.html (included as is in the generated RC html)

Web Properties - Unique Visitors
Source: comScore/MyMetrix / Saved reports (tab showing *) / Wikimedia / Top 1000 properties, UV trend [Media Trend]
 * downloaded and renamed to Top 1000 properties, UV trend_(Jan 10 - Mar 11).csv (no example, non-public info we get from comScore for free)

Data for the newest month is C&P into Excel.

Abbreviations

 * RC: Report Card
 * C&P: Copy&Paste
 * Q&D: Quick and Dirty (meaning code needs improvement)
 * Project codes used in wikistats:
   * wb: Wikibooks
   * wn: Wikinews
   * wk: Wiktionary
   * wp: Wikipedia
   * wq: Wikiquote
   * ws: Wikisource
   * wv: Wikiversity
   * wx: Other projects (e.g. commons, meta)