Analytics/Wikistats/Limn Feed

From mediawiki.org

Every month a set of Wikistats data is collected for Limn by a set of Perl scripts:

Steps:

  • AnalyticsPrepBinariesData.pl reads counts for binaries which were generated by wikistats and which reside in /a/wikistats/csv_[project code]/StatisticsPerBinariesExtension.csv. It filters and reorganizes the data and produces 'analytics_in_binaries.csv'. Each output line contains: project code, language, month, extension name, count.
  • AnalyticsPrepComscoreData.pl scans /a/analytics/comscore for newest comScore csv files (with data for last 14 months), parses those csv files, adds/replaces data from these csv files into master files (containing full history) and generates input csv file 'analytics_in_comscore.csv' ready for importing into database. Note: these csv files were manually downloaded from http://mymetrix.comscore.com/app/report.aspx and were given more descriptive names (optional). The script finds newest files based on partial name search. Options:
    • -r replace (default is add only)
    • -i input folder, contains manually downloaded csv files from comScore (or xls files manually converted to csv)
    • -m master files with full history
    • -o output csv file, with reach per region, UV's per region and UV's per top web property, ready for import into database
  • AnalyticsPrepWikiCountsOutput.pl reads a large number of fields from several csv files produced by the wikistats process. It filters and reorganizes the data and produces 'analytics_in_wikistats.csv', ready for import into Limn.

  • File 'analytics_in_page_views.csv' is written daily as part of WikiCountsSummarizeProjectCounts.pl (which is part of the daily 'pageviews_monthly.sh' job), which processes hourly projectcounts files (per-wiki page view totals for one hour) from http://dammit.lt/wikistats and generates several files on different aggregation levels. The only action here is to copy the data to this folder to have all Limn input in one place. Note: this one file contains stats for all projects, but resides with the wikistats csv files for the Wikipedia project.
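
The add/replace step described for AnalyticsPrepComscoreData.pl can be illustrated with a minimal Python sketch. It is not taken from the Perl script. The function names, the (region, month) key, and the sample numbers are all hypothetical, and the reading of the options is an assumption: "add only" is taken to mean that months already in the master history are left untouched, and -r to mean that newer values overwrite them.

```python
def merge_into_master(master, newest, replace=False):
    """Merge newly downloaded rows into a master history.

    master, newest: dict mapping (region, month) -> value (hypothetical layout).
    replace=False: assumed meaning of the default "add only": keep existing
                   master values and add only keys that are missing.
    replace=True:  assumed meaning of -r: newest values overwrite existing ones.
    """
    merged = dict(master)  # copy, so the caller's master is not modified
    for key, value in newest.items():
        if replace or key not in merged:
            merged[key] = value
    return merged


def to_csv_lines(merged):
    """Render the merged history as 'region,month,value' lines, sorted by key."""
    return [
        f"{region},{month},{value}"
        for (region, month), value in sorted(merged.items())
    ]


# Made-up sample data, for illustration only.
master = {("Europe", "2012-01"): 100, ("Europe", "2012-02"): 110}
newest = {("Europe", "2012-02"): 115, ("Europe", "2012-03"): 120}

added_only = merge_into_master(master, newest)                # keeps 110 for 2012-02
replaced = merge_into_master(master, newest, replace=True)    # uses 115 for 2012-02
```

Under these assumptions, running in add-only mode is safe to repeat, because an existing month is never changed, while replace mode is what would pick up corrections in a newer download that overlaps months already in the master files.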