Analytics/Archive/Data Processing/status

Last update on: 2014-09-monthly

2014-05-monthly
This month the team worked on cluster capacity, deployment, and the move to CDH 5 (the new Hadoop version); this work is expected to be completed in June. A permissions issue caused the page view dumps to stall for a weekend. It was fixed promptly and no data was lost.

2014-06-monthly
The team has now integrated Data Processing into its development process: new stories and features have been identified and broken down into tasks. Experimentation with Cloudera Hadoop 5 (CDH 5) is also complete, and the team is ready to upgrade the cluster in July.

2014-07-monthly
New nodes were added to the cluster this month, and all machines were upgraded to run CDH 5. The team decided not to preserve any data on the cluster during the upgrade and started fresh. The team also hosted a Tech Talk on our Hadoop installation (see video and slides). In addition, monitoring for duplicate entries in the incoming Varnish logs has been implemented in Hadoop.
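Duplicate and loss checks of this kind are commonly built on an increasing per-host sequence number attached by each sender. The sketch below assumes that schema: the `(hostname, sequence_number)` pairs, the host names, and the function name are hypothetical, not the actual Varnish log format. For each host it counts lines that arrived more than once and sequence numbers that never arrived. In Hadoop the same logic would run as a group-by over an hour of data; this is a single-process illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sequence_stats(records: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Per host, count received lines, duplicates, and missing sequence numbers.

    records: (hostname, sequence_number) pairs, a hypothetical log schema.
    """
    seqs: Dict[str, List[int]] = defaultdict(list)
    for host, seq in records:
        seqs[host].append(seq)

    stats: Dict[str, Dict[str, int]] = {}
    for host, values in seqs.items():
        distinct = set(values)
        # Assumes the counter increases by 1 per line and never resets
        # within the checked window.
        expected = max(distinct) - min(distinct) + 1
        stats[host] = {
            "received": len(values),
            "duplicates": len(values) - len(distinct),
            "missing": expected - len(distinct),
        }
    return stats
```

For example, `sequence_stats([("cp-a", 1), ("cp-a", 2), ("cp-a", 2), ("cp-a", 5)])` reports one duplicate and two missing lines (sequence numbers 3 and 4) for `cp-a`. The same gap count is also a direct measure of packet loss on the transport.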

2014-08-monthly
The team continued monitoring the analytics systems and responding to issues when (non-critical) alarms went off. Packet loss and Kafka issues were diagnosed and handled.

Hadoop worker nodes now set their memory limits automatically based on the memory available on each node. Previously all workers had the same fixed limit. This allows for better resource utilization.
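One simple way to derive such a per-node limit is sketched below under stated assumptions: read the node's physical RAM, hold back a reserve for the operating system and Hadoop daemons, and offer the remainder to YARN containers through the `yarn.nodemanager.resource.memory-mb` setting. The reserve policy (the larger of 4096 MB or 10% of RAM) and the function names are hypothetical, not the cluster's actual configuration.

```python
import os


def yarn_memory_mb(total_mb: int,
                   reserve_fraction: float = 0.1,
                   min_reserve_mb: int = 4096) -> int:
    """Memory (MB) to offer to YARN containers on one worker node.

    The reserve for the OS and Hadoop daemons is the larger of a fixed
    floor and a fraction of total RAM; both defaults are hypothetical.
    """
    reserve = max(min_reserve_mb, int(total_mb * reserve_fraction))
    # Never return a negative limit on very small machines.
    return max(0, total_mb - reserve)


def physical_memory_mb() -> int:
    """Total physical RAM in MB (POSIX only: page size times page count)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)
```

On a node with 64 GiB (65536 MB), `yarn_memory_mb(65536)` reserves 6553 MB and offers 58983 MB, while a 16 GiB node hits the 4096 MB floor and offers 12288 MB. Calling `yarn_memory_mb(physical_memory_mb())` on each worker gives every machine its own limit instead of one fixed value.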

Logstash is now available at https://logstash.wikimedia.org (Wikitech account required). Logs from Hadoop are piped there for easier search and diagnosis of Hadoop jobs.

Some uses of udp2log were migrated to kafkatee, which is not prone to packet loss. In particular, Webstatscollector was switched over and its error rates dropped drastically. Eventually, the “collecting” part of Webstatscollector will be implemented in Hadoop, a much more scalable environment for this kind of work.
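At its core, the “collecting” step counts requests per page over a fixed time window. The sketch below shows that aggregation in a single process, assuming the input has already been parsed and filtered to page views as hypothetical `(project, title)` pairs; it is not Webstatscollector's actual code or input format. In Hadoop the same computation would be expressed as a group-by and count, which is what makes it easy to scale out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collect_pageviews(requests: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Count requests per (project, page title) for one hour of traffic.

    requests: (project, title) pairs already filtered to page views,
    a hypothetical pre-parsed input.
    """
    counts = Counter(requests)
    # Sort the output so hourly results are deterministic and easy to diff.
    return sorted((project, title, n) for (project, title), n in counts.items())
```

For example, `collect_pageviews([("en", "Main_Page"), ("de", "Berlin"), ("en", "Main_Page")])` returns `[("de", "Berlin", 1), ("en", "Main_Page", 2)]`.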

2014-09-monthly
A weekly summary is posted to the Analytics mailing list, with the highlights at the top of each email. Links to the corresponding posts in the mailing list archive:
* 2014-09-01--2014-09-07
* 2014-09-08--2014-09-14
* 2014-09-15--2014-09-21
* 2014-09-22--2014-09-28