Analytics/Reportcard/2.0/Database design

From mediawiki.org
Request for comment (RFC)
Database design
Component General
Creation date
Author(s) Erik Zachte, Diederik van Liere
Document status draft
withdrawn

Database design Report Card 2.0[edit]

This page documents the database design for the new reportcard ([http://reportcard.wmflabs.org/reportCard/ reportcard.wmflabs.org/reportCard/). The document consists of two sections:

  • the current database design consists of 10 tables. Most of these tables have the same structure.
  • a new simplified proposed design. Please chime in on the Talk page with your thoughts.

The high level goals are documented here Analytics/Reportcard/Requirements.

Current Table Designs[edit]

CREATE TABLE `binaries` (
  `date` date NOT NULL,
  `project_code` char(2) NOT NULL DEFAULT '',
  `language_code` char(15) NOT NULL DEFAULT '',
  `extension` varchar(10) NOT NULL DEFAULT '',
  `binaries` bigint(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`,`extension`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `page_views` (
  `date` date NOT NULL,
  `project_code` char(2) NOT NULL DEFAULT '',
  `language_code` char(15) NOT NULL DEFAULT '',
  `views_non_mobile_raw` bigint(15) DEFAULT NULL,
  `views_mobile_raw` bigint(15) DEFAULT NULL,
  `views_non_mobile_normalized` bigint(15) DEFAULT NULL,
  `views_mobile_normalized` bigint(15) DEFAULT NULL,
  `views_raw` bigint(15) DEFAULT NULL,
  `views_normalized` bigint(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `offline` (
  `date` date NOT NULL,
  `source` varchar(255) NOT NULL DEFAULT '',
  `readers` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`source`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `mobilepageviews` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editortargets` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editorsbygeography` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `wikistats` (
  `date` date NOT NULL,
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `language_code` varchar(15) NOT NULL DEFAULT '',
  `editors_all_time` int(10) DEFAULT NULL,
  `editors_new` int(7) DEFAULT NULL,
  `editors_ge_5` int(7) DEFAULT NULL,
  `editors_ge_25` int(7) DEFAULT NULL,
  `editors_ge_100` int(7) DEFAULT NULL,
  `articles` int(12) DEFAULT NULL,
  `articles_new_per_day` int(9) DEFAULT NULL,
  `articles_over_bytes_500` int(12) DEFAULT NULL,
  `articles_over_bytes_2000` int(12) DEFAULT NULL,
  `edits_per_article` decimal(9,1) DEFAULT NULL,
  `bytes_per_article` decimal(9,1) DEFAULT NULL,
  `edits` int(12) DEFAULT NULL,
  `size_in_bytes` int(15) DEFAULT NULL,
  `size_in_words` int(15) DEFAULT NULL,
  `links_internal` int(15) DEFAULT NULL,
  `links_interwiki` int(15) DEFAULT NULL,
  `links_image` int(15) DEFAULT NULL,
  `links_external` int(15) DEFAULT NULL,
  `redirects` int(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Suggested Table Design[edit]

Summary of changes:

  • Merge different tables in one single table
  • change CHARSET to UTF8
  • change column type date to timestamp
  • set engine to Innodb instead of MyISAM.
  • add view column which is of type ENUM
  • change size of extension field to 4
  • change size of language_code field to 4
  • change bigint to bigint unsigned as 'value' will always be 0 or large
  • rename binaries to value
  • rename table name to aggregated_data
CREATE TABLE `analytics_aggregation` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hashed_key` integer NOT NULL,
  `trait_instance_key` varchar(512) DEFAULT NULL COMMENT 'hashed_key is generated from this more descriptive key',
  `count` bigint(9) unsigned DEFAULT NULL,
  `project` char(32) NOT NULL COMMENT 'Wikimedia project code',
  `language_code` char(4) NOT NULL COMMENT 'Wikimedia language code, usually derived from project URL',
  `aggregate_timestamp` timestamp NULL DEFAULT NULL COMMENT 'timestamp for which this aggregate counts data',
  `granularity` integer NULL DEFAULT NULL COMMENT 'time granularity in seconds for which this aggregate counts data ',
  `theme` enum('readers','editors','devices','articles','diversity','ecosystem','files','context') DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `hashed_key` (`hashed_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Background Links[edit]