Analytics/Archive/Data Processing/status

Last updated: 2014-07-monthly

2014-05-monthly
This month the team worked on cluster capacity, deployment, and the move to CDH 5 (Cloudera's newer Hadoop distribution). These initiatives are expected to be completed in June. A permissions issue caused the page view dumps to stall for a weekend. The problem was fixed promptly and no data was lost.

2014-06-monthly
The team has now integrated Data Processing into its development process. New stories and features have been identified and broken down into tasks. Experimentation with CDH 5 (Cloudera Hadoop 5) is also complete, and we are ready to upgrade the cluster in July.

2014-07-monthly
New nodes were added to the cluster this month, and all machines were upgraded to run CDH 5. The team decided not to preserve any existing data on the cluster during the upgrade and started fresh. The team also hosted a Tech Talk on our Hadoop installation. The video is available at https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg and the slides at https://docs.google.com/a/wikimedia.org/presentation/d/1ZPmfN-kmfqWEJUMIRg2feSstFPY45js4AnYaf3NbLNE/edit#slide=id.p.

Monitoring for duplicates has also been implemented in Hadoop for the incoming Varnish logs.

