Notifications/Database layout

Tables in Echo

 * echo_event - storage for notification triggering event, for example, edit-user-talk event would trigger talk page notification and email. this table has one to many relation with echo_notification


 * echo_notification - storage for individual notification


 * echo_subscription - this table is currently not used since we don't support subscription, it's mainly for user who wants to get notification on a particular event of a particular page; for example, I want to get notification on John Dow's talk page notification


 * echo_email_batch - this table stores a list of event_id for a user, this is primarily used for daily/weekly email digest and the email bundling. For email digest, the cron would look at the event for a user and send a email batch.  For email bundling, a delayed job would be popped off the queue and look for the bundle-able event based on hash and user_id.


 * echo_email_bundle_status - this table stores the timestamp and possibly status of a bundle-able email. For example, We bundle page link notification by title.  In the 1st hour, we send a single notification: 'San Francisco was linked by User A', in the 5th hour, we send bundle notification: 'San Francisco was linked by User B and 7 others', in 9th hour, no notification, reset cycle, send single notification again for the next notification, then start the 4 hour cycle again, this table is used to remember the state of a particle bundle notification

Web and email bundling background
For web bundling, we want to bundle based on user's last visit on notification overlay or page for each bundle-able group, here is an example

1. User receives two new page link notification

2. User visits the overlay, he would see "San Francisco was linked by UserA and 1 other"

3. User receives 3 new page link notification

4. User visits the overlay, he would see


 * "San Francisco was linked by UserX and 2 others"
 * "San Francisco was linked by UserA and 1 other"

For email bundling of the notifications above, it would follow the rules detailed in echo_email_bundle_status table, initially send out a single notification, then bundle notification every 4 hours till the cycle reset.

Key columns for bundling
echo_notification.notification_bundle_hash - This is a hash to determine whether a group of notifications is bundle-able

echo_notification.notification_bundle_display_hash - for web bundling, we want to further bundle based on user's visit to the overlay or the page.

For the 5 individual notifications in the bundling background section, they would have the same notification_bundle_hash, the first 2 have the same notification_bundle_display hash, and the last 3 have the same notification_bundle_display_hash.

echo_notification.notification_bundle_base - Notification overlay and page would use this to pull list of notifications. For example, If 3 notifications have the same notification_display_hash, only one of them will be pulled by the list, and it the rest is part of "and X others". notification_bundle_base is mainly used to avoid the usage of group by/distanct, because they involve a full index scan and a temporary table.

Index usage

 * echo_notification (`notification_user`,`notification_bundle_base`,`notification_timestamp`)
 * this index is used to pull a list ( currently 10 ) of notifications for the overlay and the notification page, for notifications that should be bundled as one notification, there would be only record with 1 and the others are 0, the idea is too pull the base one and construct the bundle message ( and X others ) for each of them. The reason I use base is to avoid GROUP BY query, since GROUP BY does a full index scan of all applicable records and then apply the limit clause
 * for notification_bundle_base, the number of 0 should be much greater than 1
 * notification_event was not added to the key mainly because there would not be many event for the same notification_timestamp

example query:

mysql> explain SELECT * FROM `echo_notification` LEFT JOIN `echo_event` ON ((notification_event=event_id)) WHERE notification_user = '2' AND event_type IN ('welcome','edit-user-talk','reverted','page-linked','mention')  AND (notification_timestamp <= '20130306213156') AND (notification_event < 567) AND (notification_bundle_base = 1)  ORDER BY notification_timestamp DESC, notification_event DESC LIMIT 11;

++-+---++-+---+-+--+--+-+ ++-+---++-+---+-+--+--+-+ ++-+---++-+---+-+--+--+-+ 2 rows in set (0.00 sec)
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                   | key_len | ref                                              | rows | Extra       |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_base_timestamp | 5       | const,const                                      |   18 | Using where |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY,type_page                                                                                                                                                                           | PRIMARY                               | 4       | git_my_wiki.echo_notification.notification_event |    1 | Using where |

Note: the query can be used as INNER JOIN, I keep it as it LEFT JOIN because it was originally written in this way, I will need to research the code to see if this change would break anything.

example query: mysql> explain SELECT notification_read_timestamp, notification_bundle_display_hash FROM echo_notification WHERE notification_user = 2 AND notification_bundle_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY notification_timestamp DESC LIMIT 1; ++-+---+--+-+---+-+-+--+-+ ++-+---+--+-+---+-+-+--+-+ ++-+---+--+-+---+-+-+--+-+ 1 row in set (0.00 sec)
 * echo_notification (`notification_user`,`notification_bundle_hash`,`notification_timestamp`)
 * This index is used to get the read_timestamp of last bundle_hash, if the read_timestamp is NULL, that means the notification has not been read, then re-use the notification_bundle_display_hash, otherwise, generate a new notification_bundle_display_hash so there will be no new bundling group
 * id | select_type | table            | type | possible_keys                                                                                                                                                                               | key                                   | key_len | ref         | rows | Extra       |
 * 1 | SIMPLE     | echo_notification | ref  | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_hash_timestamp | 102     | const,const |   23 | Using where |

example query: mysql> explain SELECT event_agent_id, event_agent_ip FROM echo_notification, echo_event WHERE notification_event=event_id AND notification_user = 2 AND notification_bundle_base = 0 AND notification_bundle_display_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY notification_timestamp DESC LIMIT 500 -> ; ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ 2 rows in set (0.00 sec)
 * echo_notification (`notification_user`,`notification_bundle_display_hash`,`notification_bundle_base`,`notification_timestamp`)
 * This is mainly used to get the number of others in this message "User A and 45 others posted on your talk page", if the number is > 100, we show 99+, the reason we do limit 500 is that this number would be large enough to cover duplicated user_id or user_ip
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                        | key_len | ref                                              | rows | Extra                    |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_hash_base_timestamp | 103     | const,const,const                                |    1 | Using where; Using index |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY                                                                                                                                                                                     | PRIMARY                                    | 4       | git_my_wiki.echo_notification.notification_event |    1 |                          |

mysql> UPDATE echo_notification SET notification_bundle_base = 0 WHERE notification_user = 2 AND notification_bundle_display_hash = '441a3b00f0ed9fad181a618713361cd3' AND notification_bundle_base = 1

example query: mysql> explain SELECT notification_event FROM echo_notification LEFT JOIN echo_event ON notification_event=event_id WHERE notification_user = 2 AND notification_bundle_base = 1 AND notification_read_timestamp IS NULL AND event_type IN ( 'welcome','edit-user-talk','reverted','page-linked','mention' ) LIMIT 100; ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+ ++-+---++-++-+--+--+--+
 * echo_notification (`notification_user`,`notification_bundle_base`,`notification_read_timestamp`)
 * This index is used to get the number of un-read notification, 'User A and 5 others posted on your talk page', which includes at least 6 notifications, is considered as one notification
 * Since we will display 99+ if the count is more than 100, we avoid select count by doing LIMIT 100
 * id | select_type | table            | type   | possible_keys                                                                                                                                                                               | key                                        | key_len | ref                                              | rows | Extra                    |
 * 1 | SIMPLE     | echo_notification | ref    | user_event,user_timestamp,echo_notification_user_base_timestamp,echo_notification_user_hash_timestamp,echo_notification_user_hash_base_timestamp,echo_notification_user_base_read_timestamp | echo_notification_user_base_read_timestamp | 20      | const,const,const                                |    1 | Using where; Using index |
 * 1 | SIMPLE     | echo_event        | eq_ref | PRIMARY,type_page                                                                                                                                                                           | PRIMARY                                    | 4       | git_my_wiki.echo_notification.notification_event |    1 | Using where              |


 * echo_email_batch (`eeb_user_id`,`eeb_event_hash`,`eeb_event_id`)
 * when a bundle email job is popped off the queue, the job will use this index to look for the base event for bundling based on bundle hash, from the base event, we can generate message like "User A and 99+ others posted on your talk page", we use LIMIT 100 style to get the number instead of SELECT COUNT

example query: mysql> explain SELECT eeb_event_id FROM echo_email_batch WHERE eeb_user_id = 2 and eeb_event_hash = '223092671ba4ede48f2fa0fee23b42e2' ORDER BY eeb_event_id DESC LIMIT 1; ++-+--+--+-+--+-+-+--+--+ ++-+--+--+-+--+-+-+--+--+ ++-+--+--+-+--+-+-+--+--+ 1 row in set (0.01 sec)
 * id | select_type | table           | type | possible_keys                                                                                                                               | key                              | key_len | ref         | rows | Extra                    |
 * 1 | SIMPLE     | echo_email_batch | ref  | echo_email_batch_user_event,echo_email_batch_user_priority_event,echo_email_batch_user_hash_event,echo_email_batch_user_priority_hash_event | echo_email_batch_user_hash_event | 102     | const,const |    1 | Using where; Using index |


 * echo_email_batch (eeb_user_id, eeb_event_priority, eeb_event_hash, eeb_event_id)
 * When a cron job is processing daily/weekly digest, the cron will use this index to look for base event/hash, from the base event/hash, we can generate message like "User A and 99+ others posted on your talk page", normally we would only show max 20 bundle message in digest, we show notification with highest priority

example query: mysql> explain SELECT * FROM echo_email_batch, echo_event WHERE eeb_user_id = 2 AND event_id = eeb_event_id AND event_type IN ( 'welcome','edit-user-talk','reverted','page-linked','mention' ) AND eeb_event_priority >= 5 AND eeb_event_hash > '223092671ba4ede48f2fa0fee23b42e2' ORDER BY eeb_event_priority, eeb_event_hash, eeb_event_id LIMIT 1; ++-+--++-+---+-+---+--+--+ ++-+--++-+---+-+---+--+--+ ++-+--++-+---+-+---+--+--+ 2 rows in set (0.00 sec)
 * id | select_type | table           | type   | possible_keys                                                                                                                               | key                                       | key_len | ref                                       | rows | Extra                    |
 * 1 | SIMPLE     | echo_email_batch | range  | echo_email_batch_user_event,echo_email_batch_user_priority_event,echo_email_batch_user_hash_event,echo_email_batch_user_priority_hash_event | echo_email_batch_user_priority_hash_event | 103     | NULL                                      |    3 | Using where; Using index |
 * 1 | SIMPLE     | echo_event       | eq_ref | PRIMARY,type_page                                                                                                                           | PRIMARY                                   | 4       | git_my_wiki.echo_email_batch.eeb_event_id |    1 | Using where              |

mysql>


 * echo_email_bundle_status (`eebs_user_id`,`eebs_bundle_hash`)
 * This index is used to look up the timestamp of last email for a particular bundle hash, based on this timestamp, we will send a single notification or schedule it in the delayed job queue

example query: mysql> explain SELECT eebs_last_email_timestamp FROM echo_email_bundle_status WHERE eebs_user_id = 2 AND eebs_bundle_hash = '30f6d73b100437cf6a023aa8c83f1bf9'; ++-+--+---+--+--+-+-+--+---+ ++-+--+---+--+--+-+-+--+---+ ++-+--+---+--+--+-+-+--+---+ 1 row in set (0.01 sec) mysql>
 * id | select_type | table                   | type  | possible_keys        | key                  | key_len | ref         | rows | Extra |
 * 1 | SIMPLE     | echo_email_bundle_status | const | echo_email_user_hash | echo_email_user_hash | 102     | const,const |    1 |       |