Analytics/Hardware/Appendix

Comparisons
Several large consumers of big data have published detailed figures for some of their data-processing products and clusters. A few are presented here for reference.

Twitter Rainbird (Cassandra, 2011)
Analytics for the Promoted Tweets advertising platform.
 * 100,000s of writes/sec
 * 10,000s of reads/sec
 * 100TB+ storage
 * Extremely low latency: <100ms reads
 * Events are batched for ~60 seconds
 * Parsing and structuring performed by bundlers
 * Clients submitting events are Rainbird-aware

Facebook Insights (HBase, 2011)
Social-plugin analytics for site owners.
 * 20 billion events/day
 * 200,000 events/second
 * <30s average delay before event surfaces in queries
 * 100+ metrics, but stored only as counters
 * Events are batched for ~1.5 seconds
 * Each node handles 10k writes/sec
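
As a rough cross-check of the Insights figures, the daily event volume can be converted to an average per-second rate and set against the per-node write rate. The Python sketch below is back-of-the-envelope only: the function names are hypothetical, and the node count assumes one write per event, which is an illustrative assumption and not a figure published by Facebook.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(events_per_day):
    """Average events per second implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def nodes_needed(events_per_second, writes_per_node_per_second):
    """Minimum node count, ASSUMING one write per event (not a published figure)."""
    return math.ceil(events_per_second / writes_per_node_per_second)


# Published figures: 20 billion events/day, 200,000 events/second, 10k writes/sec per node.
insights_avg = average_rate_per_second(20_000_000_000)
insights_nodes = nodes_needed(200_000, 10_000)

print(f"average rate: {insights_avg:,.0f} events/sec")
print(f"nodes implied under the one-write-per-event assumption: {insights_nodes}")
```

The average comes out somewhat above 200,000 events/second, so the two published rates are consistent to within rounding.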

Facebook Messages (HBase, 2010)
The Facebook messaging system.
 * 135 billion messages/month (~4.5B/day)
 * 1.5M+ operations/second at peak
 * ~55% reads: 825,000 reads/sec
 * ~45% writes: 675,000 writes/sec
 * 2PB+ (petabytes) of data in storage
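
The derived Messages figures follow from simple arithmetic on the published totals. A minimal Python sketch, assuming a 30-day month and using hypothetical helper names, reproduces the per-day volume and the read/write split:

```python
def per_day(monthly_total, days_in_month=30):
    """Daily volume from a monthly total (a 30-day month is an assumption)."""
    return monthly_total // days_in_month


def split_operations(total_ops_per_second, read_percent):
    """Split a peak operation rate into (reads, writes) by read percentage."""
    reads = total_ops_per_second * read_percent // 100
    writes = total_ops_per_second - reads
    return reads, writes


messages_per_day = per_day(135_000_000_000)
reads_per_sec, writes_per_sec = split_operations(1_500_000, 55)

print(f"messages/day: {messages_per_day:,}")
print(f"reads/sec: {reads_per_sec:,}  writes/sec: {writes_per_sec:,}")
```

Integer arithmetic is used so that the percentages divide exactly and no floating-point rounding appears in the results.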