|This page is obsolete. It is kept for historical interest only. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.|
Several large consumers of big data have published detailed information about some of their data-processing products and clusters. A few are summarized here for reference.
Twitter Rainbird (Cassandra, 2011)
Analytics for the Promoted Tweets advertising platform.
- 100,000s of writes/sec
- 10,000s of reads/sec
- 100TB+ storage
- Extremely low latency: <100ms reads
- Events are batched for ~60 seconds
- Parsing and structuring performed by bundlers
- Clients submitting events are Rainbird-aware
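Rainbird's internals are not described on this page. The following is only a minimal sketch of the general idea behind "events are batched for ~60 seconds": increments are aggregated in memory and written as one batch per window, so the store sees far fewer writes than there are raw events. `CounterBatcher`, its parameters, and the per-minute time bucket are hypothetical names and choices made for illustration, not Rainbird's API. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key: (counter name, start of the time bucket in seconds).
Key = Tuple[str, int]


class CounterBatcher:
    """Hypothetical sketch: aggregate counter increments in memory and
    hand them to `flush` as one batch once the batching window elapses."""

    def __init__(self, flush: Callable[[Dict[Key, int]], None],
                 window_secs: float = 60.0, bucket_secs: int = 60):
        self.flush = flush              # receives the aggregated batch
        self.window_secs = window_secs  # how long to batch before writing
        self.bucket_secs = bucket_secs  # assumed granularity of stored counters
        self.pending: Counter = Counter()
        self.window_start: Optional[float] = None

    def add(self, name: str, event_time: float, now: float, amount: int = 1) -> None:
        # The first event after a flush opens a new batching window.
        if self.window_start is None:
            self.window_start = now
        # Round the event time down to the start of its bucket.
        bucket = int(event_time // self.bucket_secs) * self.bucket_secs
        self.pending[(name, bucket)] += amount
        # Flush once the window has been open long enough.
        if now - self.window_start >= self.window_secs:
            self.flush_now()

    def flush_now(self) -> None:
        if self.pending:
            self.flush(dict(self.pending))
        self.pending = Counter()
        self.window_start = None


# Usage: three raw events collapse into one batch with two counter updates.
batches = []
batcher = CounterBatcher(batches.append, window_secs=60)
batcher.add("clicks:tweet42", event_time=0.0, now=0.0)
batcher.add("clicks:tweet42", event_time=5.0, now=5.0)
batcher.add("clicks:tweet42", event_time=61.0, now=61.0)
print(batches)
```

The trade-off is latency for write volume: a longer window merges more increments per key, at the cost of counters lagging behind real time by up to one window.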
Facebook Insights (HBase, 2011)
Social-plugin analytics for site owners.
- 20 billion events/day
- 200,000 events/second
- <30s average delay before an event surfaces in queries
- 100+ metrics, but stored only as counters
- Events are batched for ~1.5 seconds
- Each node handles 10k writes/sec
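As a rough consistency check, the daily volume converts to an average rate of the same order of magnitude as the quoted per-second figure:

```latex
\frac{20 \times 10^{9}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 2.3 \times 10^{5}\ \text{events/s}
```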
Facebook Messages (HBase, 2010)
The Facebook messaging system.
- 135 billion messages/month (~4.5B/day)
- 1.5M+ operations/second at peak
- ~55% reads: 825,000 reads/sec
- ~45% writes: 675,000 writes/sec
- 2+ PB (petabytes) of data in storage
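The derived figures in the list follow from the totals. The daily rate assumes a 30-day month, and the read/write rates apply the stated percentages to the 1.5M operations/second peak:

```latex
\frac{135 \times 10^{9}\ \text{messages/month}}{30\ \text{days/month}} = 4.5 \times 10^{9}\ \text{messages/day}

0.55 \times 1.5 \times 10^{6} = 825\,000\ \text{reads/s}, \qquad
0.45 \times 1.5 \times 10^{6} = 675\,000\ \text{writes/s}
```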