Wikistats/Analytics Database Feed

From MediaWiki.org

Wikistats is one of the processes that feed a dedicated statistics database named 'analytics' (currently on test server 'project2'). This database has been designed to feed the new Executive Dashboard (work in progress, see prototype). It can also be queried via a public API. The feeder scripts reside on server 'bayes', in folder '/a/analytics'. The scripts are also archived in Subversion.

analytics_new.sh

Defines the database and tables and loads data from existing csv files. For this it executes SQL from 'analytics_create_and_load_from_csv.txt'.

analytics_upd.sh

Prepares new csv files by invoking 'analytics_generate_csv_files.sh' (see below). It then refreshes the tables with a full reload from the csv files in this folder. For this it executes SQL from 'analytics_refresh_from_csv.txt'.
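
A full reload means that each table is emptied and then repopulated from its csv file, rather than updated row by row. The following is a minimal sketch of that idea in Python, not the SQL from 'analytics_refresh_from_csv.txt'. It uses the standard library's sqlite3 module as a stand-in database, and the function name `full_reload` and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3


def full_reload(conn, table, columns, csv_text):
    """Replace all rows of `table` with the rows parsed from `csv_text`.

    `table` and `columns` are trusted, hypothetical identifiers; they are
    interpolated into the SQL, so they must not come from user input.
    """
    # Skip blank lines so every row has one value per column.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, for illustration only.
    conn.execute("CREATE TABLE binaries (project TEXT, extension TEXT, count TEXT)")
    full_reload(conn, "binaries", ["project", "extension", "count"], "wp,png,10\n")
    full_reload(conn, "binaries", ["project", "extension", "count"], "wp,jpg,20\n")
    print(conn.execute("SELECT * FROM binaries").fetchall())
```

Because the delete and the inserts run in one transaction, a failed load leaves the previous table contents in place.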

analytics_generate_csv_files.sh

Invoked by the cron-scheduled 'analytics_upd.sh'. It prepares a set of csv files from several sources, to be imported into the analytics database. Naming convention: all csv files that contain direct input for the database have '_in_' in the file name.
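
The naming convention can be checked mechanically. A minimal sketch in Python, assuming file names like those listed in the steps on this page; the helper name `database_input_files` is hypothetical and not part of the feeder scripts:

```python
from pathlib import Path


def database_input_files(names):
    """Return the csv file names that follow the '_in_' convention, sorted."""
    return sorted(
        name for name in names
        if name.endswith(".csv") and "_in_" in Path(name).name
    )


if __name__ == "__main__":
    # Only the first two names are direct database input.
    sample = [
        "analytics_in_binaries.csv",
        "analytics_in_comscore.csv",
        "StatisticsPerBinariesExtension.csv",
    ]
    print(database_input_files(sample))
```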

Steps:

  • AnalyticsPrepBinariesData.pl reads counts for binaries, which were generated by wikistats and reside in /a/wikistats/csv_[project code]/StatisticsPerBinariesExtension.csv. It filters and reorganizes the data and produces 'analytics_in_binaries.csv'. Each output line contains: project code, language, month, extension name, count.
  • AnalyticsPrepComscoreData.pl scans /a/analytics/comscore for the newest comScore csv files (with data for the last 14 months), parses them, adds or replaces their data in the master files (which contain the full history) and generates the input csv file 'analytics_in_comscore.csv', ready for import into the database. Note: these csv files were manually downloaded from http://mymetrix.comscore.com/app/report.aspx and were given more descriptive names (optional). The script finds the newest files based on a partial name search. Options:
    • -r replace (default is add only)
    • -i input folder, contains manually downloaded csv files from comScore (or xls files manually converted to csv)
    • -m master files with full history
    • -o output csv file, with reach per region, unique visitors (UVs) per region and UVs per top web property, ready for import into the database
  • AnalyticsPrepWikiCountsOutput.pl reads many fields from several csv files produced by the wikistats process. It filters and reorganizes the data and produces 'analytics_in_wikistats.csv', ready for import into the analytics database.
  • File 'analytics_in_page_views.csv' is written daily as part of WikiCountsSummarizeProjectCounts.pl (which is part of the daily 'pageviews_monthly.sh' job). That script processes hourly projectcounts files (per-wiki page view totals for one hour) from http://dammit.lt/wikistats and generates several files at different aggregation levels. The only action here is to copy the data to this folder, so that all database input is in one place. Note: this one file contains stats for all projects, but resides with the wikistats csv files for the Wikipedia project.
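
The difference between the default add-only mode and the -r replace mode of AnalyticsPrepComscoreData.pl can be illustrated with a small sketch. This is one plausible reading of the option descriptions above, not the Perl script's actual logic. The function name `merge_into_master` and the (month, region) key layout are assumptions made for illustration.

```python
def merge_into_master(master, new_rows, replace=False):
    """Merge freshly downloaded values into the full-history master data.

    Both arguments are dicts keyed by a hypothetical (month, region) tuple.
    With replace=False (add only), values already in the master are kept.
    With replace=True, new values overwrite existing ones.
    """
    merged = dict(master)  # leave the caller's master untouched
    for key, value in new_rows.items():
        if replace or key not in merged:
            merged[key] = value
    return merged


if __name__ == "__main__":
    master = {("2011-01", "Europe"): 1.0}
    new_rows = {("2011-01", "Europe"): 2.0, ("2011-02", "Europe"): 3.0}
    print(merge_into_master(master, new_rows))                # add only
    print(merge_into_master(master, new_rows, replace=True))  # like -r
```

In add-only mode a re-downloaded month never changes history that is already stored, whereas replace mode lets corrected figures overwrite earlier ones.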