Analytics/Reportcard/2.0/Database design
From MediaWiki.org
< Analytics | Reportcard | 2.0(Redirected from Analytics/Reportcard/Database design)
Contents |
Database design Report Card 2.0 [edit]
This page documents the database design for the new reportcard ([http://reportcard.wmflabs.org/reportCard/ reportcard.wmflabs.org/reportCard/). The document consists of two sections:
- the current database design consists of 10 tables. Most of these tables have the same structure.
- a new simplified proposed design. Please chime in on the Talk page with your thoughts.
The high level goals are documented here Analytics/Reportcard/Requirements.
Current Table Designs [edit]
CREATE TABLE `binaries` ( `date` date NOT NULL, `project_code` char(2) NOT NULL DEFAULT '', `language_code` char(15) NOT NULL DEFAULT '', `extension` varchar(10) NOT NULL DEFAULT '', `binaries` bigint(15) DEFAULT NULL, PRIMARY KEY (`date`,`project_code`,`language_code`,`extension`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `page_views` ( `date` date NOT NULL, `project_code` char(2) NOT NULL DEFAULT '', `language_code` char(15) NOT NULL DEFAULT '', `views_non_mobile_raw` bigint(15) DEFAULT NULL, `views_mobile_raw` bigint(15) DEFAULT NULL, `views_non_mobile_normalized` bigint(15) DEFAULT NULL, `views_mobile_normalized` bigint(15) DEFAULT NULL, `views_raw` bigint(15) DEFAULT NULL, `views_normalized` bigint(15) DEFAULT NULL, PRIMARY KEY (`date`,`project_code`,`language_code`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `offline` ( `date` date NOT NULL, `source` varchar(255) NOT NULL DEFAULT '', `readers` bigint(12) NOT NULL DEFAULT '0', PRIMARY KEY (`date`,`source`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `mobilepageviews` ( `date` date NOT NULL, `language_code` char(15) NOT NULL DEFAULT '', `project_code` varchar(10) NOT NULL DEFAULT '', `country_code` varchar(3) NOT NULL DEFAULT '', `value` bigint(12) NOT NULL DEFAULT '0', PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editortargets` ( `date` date NOT NULL, `language_code` char(15) NOT NULL DEFAULT '', `project_code` varchar(10) NOT NULL DEFAULT '', `country_code` varchar(3) NOT NULL DEFAULT '', `value` bigint(12) NOT NULL DEFAULT '0', PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editorsbygeography` ( `date` date NOT NULL, `language_code` char(15) NOT NULL DEFAULT '', `project_code` varchar(10) NOT NULL DEFAULT '', `country_code` varchar(3) NOT NULL DEFAULT '', `value` bigint(12) NOT NULL DEFAULT '0', PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `wikistats` ( `date` date NOT NULL, `project_code` varchar(10) NOT NULL DEFAULT '', `language_code` varchar(15) NOT NULL DEFAULT '', `editors_all_time` int(10) DEFAULT NULL, `editors_new` int(7) DEFAULT NULL, `editors_ge_5` int(7) DEFAULT NULL, `editors_ge_25` int(7) DEFAULT NULL, `editors_ge_100` int(7) DEFAULT NULL, `articles` int(12) DEFAULT NULL, `articles_new_per_day` int(9) DEFAULT NULL, `articles_over_bytes_500` int(12) DEFAULT NULL, `articles_over_bytes_2000` int(12) DEFAULT NULL, `edits_per_article` decimal(9,1) DEFAULT NULL, `bytes_per_article` decimal(9,1) DEFAULT NULL, `edits` int(12) DEFAULT NULL, `size_in_bytes` int(15) DEFAULT NULL, `size_in_words` int(15) DEFAULT NULL, `links_internal` int(15) DEFAULT NULL, `links_interwiki` int(15) DEFAULT NULL, `links_image` int(15) DEFAULT NULL, `links_external` int(15) DEFAULT NULL, `redirects` int(15) DEFAULT NULL, PRIMARY KEY (`date`,`project_code`,`language_code`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Suggested Table Design [edit]
Summary of changes:
- Merge different tables in one single table
- change CHARSET to UTF8
- change column type date to timestamp
- set engine to Innodb instead of MyISAM.
- add view column which is of type ENUM
- change size of extension field to 4
- change size of language_code field to 4
- change bigint to bigint unsigned as 'value' will always be 0 or large
- rename binaries to value
- rename table name to aggregated_data
CREATE TABLE `analytics_aggregation` ( `id` int(11) NOT NULL AUTO_INCREMENT, `hashed_key` integer NOT NULL, `trait_instance_key` varchar(512) DEFAULT NULL COMMENT 'hashed_key is generated from this more descriptive key', `count` bigint(9) unsigned DEFAULT NULL, `project` char(32) NOT NULL COMMENT 'Wikimedia project code', `language_code` char(4) NOT NULL COMMENT 'Wikimedia language code, usually derived from project URL', `aggregate_timestamp` timestamp NULL DEFAULT NULL COMMENT 'timestamp for which this aggregate counts data', `granularity` integer NULL DEFAULT NULL COMMENT 'time granularity in seconds for which this aggregate counts data ', `theme` enum('readers','editors','devices','articles','diversity','ecosystem','files','context') DEFAULT NULL, PRIMARY KEY (`id`), KEY `hashed_key` (`hashed_key`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;