Notifications/Database layout

Tables in Echo

 * echo_event - storage for notification triggering event, for example, edit-user-talk event would trigger talk page notification and email. this table has one to many relation with echo_notification

`event_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `event_timestamp` binary(14) NOT NULL, `event_type` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `event_variant` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, `event_agent_id` int(10) unsigned DEFAULT NULL, `event_agent_ip` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, `event_page_namespace` int(10) unsigned DEFAULT NULL, `event_page_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, `event_extra` blob, PRIMARY KEY (`event_id`), KEY `type_page` (`event_type`,`event_page_namespace`,`event_page_title`,`event_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
 * echo_event | CREATE TABLE `echo_event` (


 * echo_notification - storage for individual notification

`notification_event` int(10) unsigned NOT NULL, `notification_user` int(10) unsigned NOT NULL, `notification_timestamp` binary(14) NOT NULL, `notification_read_timestamp` binary(14) DEFAULT NULL, `notification_bundle_base` tinyint(3) unsigned NOT NULL DEFAULT '1', `notification_bundle_hash` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `notification_bundle_display_hash` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, UNIQUE KEY `user_event` (`notification_user`,`notification_event`), KEY `user_timestamp` (`notification_user`,`notification_timestamp`), KEY `echo_notification_user_base_timestamp` (`notification_user`,`notification_bundle_base`,`notification_timestamp`), KEY `echo_notification_user_hash_timestamp` (`notification_user`,`notification_bundle_hash`,`notification_timestamp`), KEY `echo_notification_user_hash_base_timestamp` (`notification_user`,`notification_bundle_display_hash`,`notification_bundle_base`,`notification_timestamp`), KEY `echo_notification_user_base_read_timestamp` (`notification_user`,`notification_bundle_base`,`notification_read_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
 * echo_notification | CREATE TABLE `echo_notification` (
 * echo_subscription - this table is currently not used since we don't support subscription, it's mainly for user who wants to get notification on a particular event of a particular page; for example, I want to get notification on John Dow's talk page notification

`sub_user` int(10) unsigned NOT NULL, `sub_event_type` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `sub_page_namespace` int(10) unsigned DEFAULT NULL, `sub_page_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, `sub_notify_type` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `sub_enabled` tinyint(1) unsigned NOT NULL DEFAULT '1', UNIQUE KEY `user_subscriptions` (`sub_user`,`sub_event_type`,`sub_page_namespace`,`sub_page_title`,`sub_notify_type`,`sub_enabled`), KEY `page_subscriptions` (`sub_page_namespace`,`sub_page_title`,`sub_event_type`,`sub_user`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
 * echo_subscription | CREATE TABLE `echo_subscription` (
 * echo_email_batch - this table stores a list of event_id for a user, this is primarily used for daily/weekly email digest and the email bundling. For email digest, the cron would look at the event for a user and send a email batch.  For email bundling, a delayed job would be popped off the queue and look for the bundle-able event based on hash and user_id.

`eeb_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `eeb_user_id` int(10) unsigned NOT NULL, `eeb_event_priority` tinyint(3) unsigned NOT NULL DEFAULT '10', `eeb_event_id` int(10) unsigned NOT NULL, `eeb_event_hash` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, PRIMARY KEY (`eeb_id`), UNIQUE KEY `echo_email_batch_user_event` (`eeb_user_id`,`eeb_event_id`), UNIQUE KEY `echo_email_batch_user_priority_event` (`eeb_user_id`,`eeb_event_priority`,`eeb_event_id`), UNIQUE KEY `echo_email_batch_user_hash_event` (`eeb_user_id`,`eeb_event_hash`,`eeb_event_id`), UNIQUE KEY `echo_email_batch_user_priority_hash_event` (`eeb_user_id`,`eeb_event_priority`,`eeb_event_hash`,`eeb_event_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
 * echo_email_batch | CREATE TABLE `echo_email_batch` (
 * echo_email_bundle_status - this table stores the timestamp and possibly status of a bundle-able email. For example, We bundle page link notification by title.  In the 1st hour, we send a single notification: 'San Francisco was linked by User A', in the 5th hour, we send bundle notification: 'San Francisco was linked by User B and 7 others', in 9th hour, no notification, reset cycle, send single notification again for the next notification, then start the 4 hour cycle again, this table is used to remember the state of a particle bundle notification

`eebs_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `eebs_user_id` int(10) unsigned NOT NULL, `eebs_bundle_hash` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `eebs_last_email_timestamp` binary(14) NOT NULL, PRIMARY KEY (`eebs_id`), UNIQUE KEY `echo_email_user_hash` (`eebs_user_id`,`eebs_bundle_hash`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
 * echo_email_bundle_status | CREATE TABLE `echo_email_bundle_status` (

Web and email bundling background
For web bundling, we want to bundle based on user's last visit on notification overlay or page for each bundle-able group, here is an example

1. User receives two new page link notification

2. User visits the overlay, he would see "San Francisco was linked by UserA and 1 other"

3. User receives 3 new page link notification

4. User visits the overlay, he would see


 * "San Francisco was linked by UserX and 2 others"
 * "San Francisco was linked by UserA and 1 other"

For email bundling of the notifications above, it would follow the rules detailed in echo_email_bundle_status table, initially send out a single notification, then bundle notification every 4 hours till the cycle reset.

Key columns for bundling
echo_notification.notification_bundle_hash - This is a hash to determine whether a group of notifications is bundle-able

echo_notification.notification_bundle_display_hash - for web bundling, we want to further bundle based on user's visit to the overlay or the page.

For the 5 individual notifications in the bundling background section, they would have the same notification_bundle_hash, the first 2 have the same notification_bundle_display hash, and the last 3 have the same notification_bundle_display_hash.

echo_notification.notification_bundle_base - Notification overlay and page would use this to pull list of notifications. For example, If 3 notifications have the same notification_display_hash, only one of them will be pulled by the list, and it the rest is part of "and X others". notification_bundle_base is mainly used to avoid the usage of group by/distanct, because they involve a full index scan and a temporary table.

Index usage

 * echo_notification (`notification_user`,`notification_bundle_base`,`notification_timestamp`)
 * this index is used to pull a list ( currently 10 ) of notifications for the overlay and the notification page. For notifications that should be bundled as one notification, there would be only record with notification_bundle_base = 1 and the others are 0, the idea is to pull the base notification/event and construct the bundle message ( and X others ) for each of them.  The main reason to use notification_bundle_base is to avoid the usage of GROUP BY, since GROUP BY does a full index scan with a temp table for all applicable records
 * for notification_bundle_base, the number of 0 should be much greater than 1
 * notification_event was not added to the key mainly because there would not be many event for the same notification_timestamp

example query:

mysql> explain SELECT * FROM `echo_notification` LEFT JOIN `echo_event` ON ((notification_event=event_id)) WHERE notification_user = '2' AND event_type IN ('welcome','edit-user-talk','reverted','page-linked','mention')  AND (notification_timestamp <= '20130306213156') AND (notification_event < 567) AND (notification_bundle_base = 1)  ORDER BY notification_timestamp DESC, notification_event DESC LIMIT 11;

++-+---++-+---+-+--+--+-+ ++-+---++-+---+-+--+--+-+ ++-+---++-+---+-+--+--+-+ 2 rows in set (0.00 sec)
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                   | key_len | ref                                              | rows | Extra       |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_base_timestamp | 5       | const,const                                      |   18 | Using where |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY,type_page                                                                                                                                                                           | PRIMARY                               | 4       | git_my_wiki.echo_notification.notification_event |    1 | Using where |

Note: the query can be used as INNER JOIN, I keep it as it LEFT JOIN because it was originally written in this way, I will need to research the code to see if this change would break anything.

Question?: I don't understand why the LEFT JOIN would be useful here. Is there a scenario where there are notifications that don't have matching events? If we found some would we have all the columns needed to display them? LWelling (talk) 20:11, 19 March 2013 (UTC)
 * Yes, a notification would always have a matching events, I agree that the left join is not useful and necessary, I will update it to INNER JOIN Bsitu (talk) 17:32, 19 March 2013 (UTC)

example query: mysql> explain SELECT notification_read_timestamp, notification_bundle_display_hash FROM echo_notification WHERE notification_user = 2 AND notification_bundle_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY notification_timestamp DESC LIMIT 1; ++-+---+--+-+---+-+-+--+-+ ++-+---+--+-+---+-+-+--+-+ ++-+---+--+-+---+-+-+--+-+ 1 row in set (0.00 sec)
 * echo_notification (`notification_user`,`notification_bundle_hash`,`notification_timestamp`)
 * This index is used to get the read_timestamp of last bundle_hash, if the read_timestamp is NULL, that means the notification has not been read, then re-use the notification_bundle_display_hash, otherwise, generate a new notification_bundle_display_hash so there will be no new bundling group
 * id | select_type | table            | type | possible_keys                                                                                                                                                                               | key                                   | key_len | ref         | rows | Extra       |
 * 1 | SIMPLE     | echo_notification | ref  | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_hash_timestamp | 102     | const,const |   23 | Using where |

example query: mysql> explain SELECT event_agent_id, event_agent_ip FROM echo_notification, echo_event WHERE notification_event=event_id AND notification_user = 2 AND notification_bundle_base = 0 AND notification_bundle_display_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY notification_timestamp DESC LIMIT 500 -> ; ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ 2 rows in set (0.00 sec)
 * echo_notification (`notification_user`,`notification_bundle_display_hash`,`notification_bundle_base`,`notification_timestamp`)
 * This is mainly used to get the number of others in this message "User A and 45 others posted on your talk page", if the number is > 100, we show 99+, the reason we do limit 500 is that this number would be large enough to cover duplicated user_id or user_ip
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                        | key_len | ref                                              | rows | Extra                    |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_hash_base_timestamp | 103     | const,const,const                                |    1 | Using where; Using index |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY                                                                                                                                                                                     | PRIMARY                                    | 4       | git_my_wiki.echo_notification.notification_event |    1 |                          |

mysql> UPDATE echo_notification SET notification_bundle_base = 0 WHERE notification_user = 2 AND notification_bundle_display_hash = '441a3b00f0ed9fad181a618713361cd3' AND notification_bundle_base = 1

example query: mysql> explain SELECT notification_event FROM echo_notification LEFT JOIN echo_event ON notification_event=event_id WHERE notification_user = 2 AND notification_bundle_base = 1 AND notification_read_timestamp IS NULL AND event_type IN ( 'welcome','edit-user-talk','reverted','page-linked','mention' ) LIMIT 100; ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+
 * echo_notification (`notification_user`,`notification_bundle_base`,`notification_read_timestamp`)
 * This index is used to get the number of un-read notification, 'User A and 5 others posted on your talk page', which includes at least 6 notifications, is considered as one notification
 * Since we will display 99+ if the count is more than 100, we avoid select count by doing LIMIT 100
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                        | key_len | ref                                              | rows | Extra                    |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_base_read_timestamp | 20      | const,const,const                                |    1 | Using where; Using index |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY,type_page                                                                                                                                                                           | PRIMARY                                    | 4       | git_my_wiki.echo_notification.notification_event |    1 | Using where              |


 * echo_email_batch (`eeb_user_id`,`eeb_event_hash`,`eeb_event_id`)
 * when a bundle email job is popped off the queue, the job will use this index to look for the base event for bundling based on bundle hash, from the base event, we can generate message like "User A and 99+ others posted on your talk page", we use LIMIT 100 style to get the number instead of SELECT COUNT

example query: mysql> explain SELECT eeb_event_id FROM echo_email_batch WHERE eeb_user_id = 2 and eeb_event_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY eeb_event_id DESC LIMIT 1; ++-+--+--+-+--+-+-+--+--+ ++-+--+--+-+--+-+-+--+--+ ++-+--+--+-+--+-+-+--+--+ 1 row in set (0.01 sec)
 * id | select_type | table           | type | possible_keys                                                                                                                               | key                              | key_len | ref         | rows | Extra                    |
 * 1 | SIMPLE     | echo_email_batch | ref  | echo_email_batch_user_event,echo_email_batch_user_priority_event,echo_email_batch_user_hash_event,echo_email_batch_user_priority_hash_event | echo_email_batch_user_hash_event | 102     | const,const |    1 | Using where; Using index |


 * echo_email_batch (eeb_user_id, eeb_event_priority, eeb_event_hash, eeb_event_id)
 * When a cron job is processing daily/weekly digest, the cron will use this index to look for base event/hash, from the base event/hash, we can generate message like "User A and 99+ others posted on your talk page", normally we would only show max 20 bundle message in digest, we show notification with highest priority

example query: mysql> explain SELECT * FROM echo_email_batch, echo_event WHERE eeb_user_id = 2 AND event_id = eeb_event_id AND event_type IN ( 'welcome','edit-user-talk','reverted','page-linked','mention' ) AND eeb_event_priority >= 5 AND eeb_event_hash > '223092671ba4ede48f2fa0fee23b42e2' ORDER BY eeb_event_priority, eeb_event_hash, eeb_event_id LIMIT 1; ++-+--++-+---+-+---+--+--+ ++-+--++-+---+-+---+--+--+ ++-+--++-+---+-+---+--+--+ 2 rows in set (0.00 sec)
 * id | select_type | table           | type   | possible_keys                                                                                                                               | key                                       | key_len | ref                                       | rows | Extra                    |
 * 1 | SIMPLE     | echo_email_batch | range  | echo_email_batch_user_event,echo_email_batch_user_priority_event,echo_email_batch_user_hash_event,echo_email_batch_user_priority_hash_event | echo_email_batch_user_priority_hash_event | 103     | NULL                                      |    3 | Using where; Using index |
 * 1 | SIMPLE     | echo_event       | eq_ref | PRIMARY,type_page                                                                                                                           | PRIMARY                                   | 4       | git_my_wiki.echo_email_batch.eeb_event_id |    1 | Using where              |

mysql>


 * echo_email_bundle_status (`eebs_user_id`,`eebs_bundle_hash`)
 * This index is used to look up the timestamp of last email for a particular bundle hash, based on this timestamp, we will send a single notification or schedule it in the delayed job queue

example query: mysql> explain SELECT eebs_last_email_timestamp FROM echo_email_bundle_status WHERE eebs_user_id = 2 AND eebs_bundle_hash = '30f6d73b100437cf6a023aa8c83f1bf9'; ++-+--+---+--+--+-+-+--+---+ ++-+--+---+--+--+-+-+--+---+ ++-+--+---+--+--+-+-+--+---+ 1 row in set (0.01 sec) mysql>
 * id | select_type | table                   | type  | possible_keys        | key                  | key_len | ref         | rows | Extra |
 * 1 | SIMPLE     | echo_email_bundle_status | const | echo_email_user_hash | echo_email_user_hash | 102     | const,const |    1 |       |

Latest update from meeting in 03/22/2013

 * move Echo tables to extension1 db


 * echo storage will be moved entirely to NoSQL( redis ) in the future for enwiki and large wikis


 * set up redis instance in ee-prototype for delayed job queue so this can be tested


 * update echo_event.event_user_ip from varbinary(255) to varbinary(32)


 * research 'type_page' key in echo_event and see if it can be dropped.


 * echo_event.event_timestamp can be dropped and make sure we use echo_notification.notification_timestamp throughout the code base instead


 * drop table echo_subscription since we don't support user subscription now


 * do not force unnecessary unique key in echo_email_batch table if it is not necessary, this is a lot of overhead, just make sure ( eeb_user_id, eeb_event_id ) is unique


 * we don't need to worry about performance issue in GROUP BY/DISTINCT clause for queries used in echo_email_batch because email digest, delay job queue are running in the background, it's okay to use GROUP BY, but make sure we keep index at user level so there won't be a full table scan


 * use memcache to replace table: echo_email_bundle_status, third party applications can use objectcache if memcache is not available, this would prevent the data volume growing in a constant scale


 * investigate the possibility of storing meta data ( user_count, summary, etc, which is used to construct a bundle message ) instead of actual echo event for bundling from product point of view.