User:Memeht/Improving the Wikimedia Performance Portal/Progress Reports

This page houses all reports, along with links to blog posts and code samples created during the FOSS-OPW Internship.

I also maintain a blog (about my FOSS-OPW Project, among other things) here

Community Bonding Report
Unfortunately, since being selected as a FOSS-OPW Intern, I have not been in contact with my mentors, as they have been in the middle of deploying high-impact Wikimedia features.
 * 1. How was your landing and your first meeting(s) with your mentors?

Due to the relatively short timeframe of the Internship, I focused on onboarding from my end. This included conducting further research on Wikimedia Operating and Performance Goals, Wikimedia network architecture, Best Practices for performance-management metrics (from high-volume organizations like Google and New Relic), and Dashboard Design Fundamentals.

I created a Functional Specification draft, identified improvements to the dashboards displayed on GDash, and completed an Introductory Tutorial to Grafana.

As previously mentioned, I have not been in contact with my mentors so this process has yet to be finalized.
 * 2. What is the way of working that you have agreed? (tools in use, communication channels, meetings…)

In my proposal, I noted my work/learning style, and I am sure that after meeting with my mentors, we will be able to develop a working process.


 * 3. Lessons learned since you applied for this OPW round and since you were accepted.
 * Fundamental use-case of Dashboards: As a tool to communicate insights, not necessarily for in-depth, on-the-spot analysis.
 * Different logging mechanisms used in Wikimedia's Platform.
 * Gained a better context of the metrics being displayed on GDash.
 * Need to document data flows in order to provide context for data.
 * Understood how Phabricator works


 * See Project Phabricator Page

Week 2 (Dec 16 - Dec 22)
This week has been as research-heavy as the previous one. I have worked on retooling old models of MediaWiki's performance data and logging processes to help me better understand VisualEditor, and further explored Analytics related to VisualEditor.

Since performance data is in a time-series format, I have also been taking a look at some basic statistical techniques for normalizing such data and identifying/accounting for seasonality within data spread. Interesting stuff!
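As a sketch of the two techniques mentioned above (normalizing a series and removing a repeating seasonal cycle), here is a minimal, illustrative JavaScript example. It is not project code; the function names are my own, and seasonal differencing is just one simple way to account for seasonality.

```javascript
// z-score normalization puts metrics with different scales on a common one;
// seasonal differencing (lag = period length) removes a repeating cycle,
// e.g. lag 7 for daily data with a weekly pattern.

function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Rescale so the series has mean 0 and standard deviation 1.
function zScores(xs) {
  const m = mean(xs);
  const sd = Math.sqrt(mean(xs.map(x => (x - m) ** 2)));
  return xs.map(x => (x - m) / sd);
}

// y'[t] = y[t] - y[t - lag]; the result is shorter by `lag` points.
function seasonalDifference(xs, lag) {
  return xs.slice(lag).map((x, i) => x - xs[i]);
}

// A pure period-2 cycle differences away to zeros.
console.log(seasonalDifference([10, 20, 10, 20, 10, 20], 2)); // [ 0, 0, 0, 0 ]
```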

I have also been in touch with my new/interim mentor Quim, and we decided on a weekly meeting schedule that fits both our schedules, which should help in the coming weeks when I get a chance to play with data.

I also worked on setting up Limn, but it's been rough going.

Project Pivot

 * Project pivot to adding performance instrumentation to Parsoid
 * List of Phabricator Tasks

Week 3 (Dec 22 - Dec 29)

 * Started work on Interim Mediawiki Project
 * Obtained an updated (Gerrit pull) copy of the Parsoid code base
 * Researched the html2wt & wt2html pipeline process
 * Reviewed current Visual Editor and Parsoid performance instrumentation (X-Parsoid Headers, Event Logging etc)
 * Ran some wt2html & html2wt tests on the command line and via the web interface
 * Researched the Parsoid and Visual Editor pipeline
 * Researched Mediawiki performance instrumentation guidelines
 * Detailed tasks for the Project on Parsoid and made adjustments based on the Parsoid Mentor's feedback
 * JavaScript review
 * Determined communication plan with mentor (combination of IRC, Google Hangouts, email as needed)
 * Spoke with Mentor and other Parsoid team members on IRC to gain a better understanding of the Parsoid pipeline and its integration with Visual Editor
 * Weekly meeting with mentor/coordinator
 * Talked to the MediaWiki Analytics mailing list to understand the SQL queries used to aggregate the Metrics displayed on the 'edit-reportcard-wmflabs' page
 * Spoke with Parsoid project Mentor about Project deliverables and set up some beginning Project Goals

Project Deliverables

 * With my Parsoid Mentor (Subramanya Sastry | Subbu), came up with some Goals for the Project:
 * Add instrumentation to HTML2wt pipeline using EventEmitters and StatsD/Event Logging
 * Metrics will include "Time to DOM Diff", among others
 * Metrics should be displayed on a TBD front-end visualization interface
 * Possible deployment on or after the second week of January 2015

Week 4 (Dec 29 - Jan 5)

 * Began discussion with Mediawiki Analytics community about possible Visualization front-end for Parsoid Metrics
 * Continued reading the Parsoid codebase to determine where instrumentation should be added
 * Reviewed JavaScript callbacks and asynchronous programming (the Promises API)
 * Spoke with Parsoid Mentor to further understand codebase and steps in instrumentation process
 * Eliminated Event Logging as a metric aggregation candidate, leaving statsd as the default
 * Light research on statsd implementation, statsd clients and RestBase's implementation
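For context on the statsd choice above: the statsd wire format is a plain-text UDP datagram of the form `<name>:<value>|<type>`, where type `ms` marks a timer and `c` a counter. The helper functions and metric names below are my own illustrations; a real client would send these strings with Node's `dgram` module.

```javascript
// Build statsd payload strings (the part a client puts in each UDP datagram).
function statsdTiming(name, ms) {
  return `${name}:${ms}|ms`;
}

function statsdCounter(name, delta = 1) {
  return `${name}:${delta}|c`;
}

// Metric names here are illustrative, not Parsoid's actual namespacing.
console.log(statsdTiming('parsoid.html2wt.total', 42));  // parsoid.html2wt.total:42|ms
console.log(statsdCounter('parsoid.html2wt.requests'));  // parsoid.html2wt.requests:1|c
```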

Week 5 (Jan 5 - Jan 12)

 * Set up graphite server
 * Updated Parsoid's package.json
 * Created first half of Performance Instrumentation documentation
 * Chose txstatsd as the event timer module and statsd client
 * See Here
 * ..and here
 * Understood how to implement txstatsd
 * See Here
 * Began adding instrumentation to codebase
 * Node.js review

Week 6 (Jan 12 - Jan 19)

 * Submitted Patch
 * Interview with Mentors

Week 7 (Jan 19 - Jan 26)

 * Due to a computer crash, spent most of this week reinstalling programs and setting up a new computer
 * Resubmitted patch
 * Researched Server Configs

Week 8 (Jan 26 - Feb 2)

 * Updated and re-submitted patch
 * Started working on adding server-side settings for Grafana, including reaching out to mailing lists, etc.
 * Meeting with Mentors

Week 9 (Feb 2 - Feb 9)

 * Continued working on patch
 * Research on submitting a proposal for Open Source Bridge

Week 10 (Feb 9 - Feb 16)

 * Resolved a race condition in the code
 * Updated txstatsd wrapper code with changes to rbUtil.js
 * Continued working on patch

Week 11 (Feb 16 - Feb 25)

 * Patch merged into Parsoid codebase!
 * Added config settings for beta labs and production
 * Added amendments to timing namespacing:
 * Here
 * And Here
 * Performance Instrumentation live in Production
 * Set up dashboards in Grafana
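For the dashboards above, Grafana graph panels query Graphite series. With the stock statsd-to-Graphite naming, timer metrics land under `stats.timers.<name>` with derived series per flush interval (txstatsd's naming may differ slightly). The metric name below is illustrative:

```
# Example Graphite series a Grafana panel might target:
stats.timers.parsoid.html2wt.total.mean
stats.timers.parsoid.html2wt.total.upper_90
```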