Talk:Analytics/Reportcard/2.0/Database design

change column type date to datetime
If you don't need to store dates before 1970, I recommend using TIMESTAMP for all date/time columns. They are stored as integers but work with MySQL date functions DIEDERIK: OK, will change that.

set engine to Innodb instead of MyISAM.
Otto: if we are planning on doing lots of updates, then definitely. If not, then MyISAM is probably fine. Also, if we use MyISAM then we could also use a MERGE table to aggregate a bunch of these smaller tables. As long as we are pretty sure the structure of this table will rarely change, keeping lots of smaller tables and then making them look like one big table is very useful. A rotator script could be responsible for creating a new data table (once a week? once a month?), and then altering the MERGE table UNION definition to include the new table. INSERTs will be inserted into the last table in the UNION list, and SELECTs on the MERGE table will select across all of the underlying tables and aggregate the results.

See: http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html

DIEDERIK: My understanding is that InnoDB has become the default and that performance wise they are very close.

add view column which is of type ENUM
I like ENUMs when either the table size is going to stay small, or we can be 100% certain that we will never need to alter the ENUM column to add or remove values. Could we use a separate join table instead? So, somethinglike:

So instead of this in the data table `view` char(9) NOT NULL ENUM('media', 'mobile', 'editors', 'offline', 'targets', 'wikistats', 'pageviews'),

Have this `data_view_id` int(9) NOT NULL,

And then CREATE TABLE data_view (`data_view_id` int(9) NOT NULL, name varchar(256) ); INSERT INTO data_view(name) VALUES('media', 'mobile', 'editors', 'offline', 'targets', 'wikistats', 'pageviews');

DIEDERIK: Yes, we could that. However, I am not sure if we can put an index on a foreign key and I want to avoid JOINS.

`value` bigint(9) unsigned DEFAULT 0,
Are you always storing integers? maybe a BLOB type here would be more appropriate for free form data?

DIEDERIK: To keep it simple, I would say let's assume we will only store integers. That allows us to do sum, max, min and other statistical queries.

rename table name to data
I am a huge fan of long unambiguous names. 'data' is probably appropriate here, because this is, um, 'data', but then again, what isn't data? How about something like 'log_data', or 'log' or 'analytics_data' or something?

DIEDERIK: Yes, you are right, I'll come up with something more descriptive.

PRIMARY KEY (`dt`,`project_code`,`language_code`,`view`, `key`)
I believe that all tables should have a single (probably auto increment) unsigned integer primary key. You can make this combo a UNIQUE KEY instead.

DIEDERIK: maybe we should store the hashed value of `dt`,`project_code`,`language_code`,`view`, `key` and index that, and have an ordinary primary key.