Analytics/Archive/Infrastructure/JMX Monitoring

Almost everything in the big data world is written in Java, and because most of it was written by experienced engineers and has seen real production deployment, the components all publish stats and controls via JMX. A solid JMX monitoring solution would therefore be very valuable.
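To make "publish stats via JMX" concrete, here is a minimal sketch that reads one gauge from the JVM's own platform MBean server. It uses only the standard `java.lang:type=Memory` MBean that every JVM exposes; the class name `JmxHeapProbe` is made up for illustration. A real monitoring agent would presumably do the same thing over a remote connection and for each component's own MBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper class, not part of any tool discussed on this page.
final class JmxHeapProbe {

    /** Reads the used-heap gauge (in bytes) from the standard Memory MBean. */
    static long usedHeapBytes() throws Exception {
        // The platform MBean server is where the JVM registers its built-in MBeans.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");

        // Through the MBean server, HeapMemoryUsage comes back as CompositeData
        // with the keys "init", "used", "committed" and "max".
        CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heap used (bytes): " + usedHeapBytes());
    }
}
```

Anything a component registers with its MBean server can be read the same way, which is why a single JMX-aware collector could cover most of the stack.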

= Research =


 * Comparison of Network Monitoring Systems -- most of the solutions listed here won't do what we need, but it was my starting point for many of these links.

== Off The Shelf Solutions ==
These should be full-stack monitoring applications, with client producer libraries, a server for data aggregation, a dashboard for reviewing data/interacting with JMX services, and configurable instrumentation alerts.


 * Zabbix: I've used this before. It wasn't fantastic, pretty, or easy to use, but it did most of what it said. I found the source to be totally unhackable, as it's an ugly combination of C, PHP, and Java.
 * Jolokia (source): Looks promising. It focuses on JMX, has Cubism support for graphs, and offers several ways to get at the data: normal JMX (JSR-160), a JVM agent, an OSGi agent, and even an on-page JS client. I'd be interested in trying it out.
 * Turmeric (source): I think this is more of a full-stack application platform (they call it a "policy-driven SOA platform") that incidentally provides monitoring of its services. Probably inappropriate for our needs, but it seems pretty interesting at least. See also: a related blogpost.
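As a rough illustration of why Jolokia looks attractive: it exposes JMX over plain HTTP/JSON, so reading an attribute is just a GET against a URL of the form `<agent base>/read/<mbean>/<attribute>`. The sketch below only builds such a URL. The host, the port, and the helper name `JolokiaReadUrl` are assumptions made for the example (8778 is the JVM agent's default port, as far as I know), and MBean names containing slashes or other special characters would need Jolokia's escaping, which is not handled here.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Jolokia itself.
final class JolokiaReadUrl {

    /** Builds a Jolokia GET "read" URL: <agentBase>/read/<mbean>/<attribute>. */
    static URI readUrl(String agentBase, String mbean, String attribute) {
        // Make sure there is exactly one slash between the base and "read".
        String base = agentBase.endsWith("/") ? agentBase : agentBase + "/";
        return URI.create(base + "read/" + mbean + "/" + attribute);
    }

    public static void main(String[] args) {
        // Assumed agent location; adjust to wherever an agent is actually running.
        System.out.println(
            readUrl("http://localhost:8778/jolokia", "java.lang:type=Memory", "HeapMemoryUsage"));
    }
}
```

Fetching that URL from a running agent should return a JSON document with the attribute's value, which any language with an HTTP client can consume.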

== Libraries ==
These libraries might not provide all the features/components of an OTS package, but hopefully they also come with less cruft, cleaner interfaces, and better ideas.


 * Ooyala's Hastur (client): I attended a talk on Hastur's architecture at the Cassandra Summit (video, slides), and I've wanted to play with it since then. If you're curious about the architecture, check out those links. Unfortunately, while it provides both a client and a server, as far as I can tell it doesn't have a dashboard. I'm also unsure whether it has a JMX adapter out of the box, though I believe I read somewhere that it does.
 * Netflix's Servo: An application monitoring library. API looks great (using annotations!), awesome set of transforms/features, high quality code. Unfortunately, Netflix runs everything on EC2, so the library only supports CloudWatch out of the box -- we'd still need an aggregator.
 * Twitter's Ostrich: A stats collector & reporter for Scala servers. For when we inevitably start writing Scala code. It does appear to provide an aggregator/admin server, but I didn't spend much time looking at it.