Analytics Quarterly Review

Folks present: Erik M, Patrick, Dario, David, Diederik, Howie, Robla, Tomasz, Jessie, Gayle, Ori, CT, Terry, Asher, Erik Z, Andrew Otto, Dan

Kraken Dataflow Diagram: Project:

CDH4 — the world's leading Apache Hadoop Distribution.

Hue: Hue is a general-purpose web interface built for the Hadoop ecosystem. Use Hue if you want to easily run and schedule Pig and Hive jobs. Navigate to the Hue web interface; you'll need a Hue login account. Otto should have created one for you and given you a password if you also asked him for a shell account earlier. (This will soon be hooked into LDAP, and you will be able to use your usual WMF password.)

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
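As a rough illustration of why such a file system tolerates commodity hardware failures, here is a toy Python sketch of splitting a file into fixed-size blocks and keeping several copies of each block on different nodes. It is not HDFS code: the function names, node names, and round-robin placement are assumptions made for illustration only (real HDFS placement is rack-aware and handled by the NameNode).

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    # All blocks are full-sized except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration; not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = []
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) are distinct nodes.
        replicas = [nodes[(block + r) % len(nodes)] for r in range(replication)]
        placement.append(replicas)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(file_size=300, block_size=128)
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
    for size, replicas in zip(blocks, layout):
        print(size, replicas)
```

With three replicas per block, losing any single node still leaves two copies of every block, which is the property that makes inexpensive machines acceptable.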

Patrick: What is the status of Kraken as a prototype? Response: this will be covered in the Kraken section (slide 12). ✔

Wikistats: the traffic scripts (aka squid scripts) have now been improved by a contractor, and the dump scripts are stable. All scripts are in git.

On Storm: Nathan Marz provided useful feedback during the research phase, which encouraged us to examine Storm as a solution for the ETL/Stream Processing phase.