Parsoid/Linting/GSoC 2014 Application

(Project title)

 * Public URL : Parser_migration_tool
 * Bugzilla report : Bug Report
 * Announcement :

Name and contact information
 * Name: Hardik Juneja
 * Email: hardikjuneja.hj@gmail.com
 * IRC or IM networks/handle(s): hardikj
 * Location: India
 * Typical working hours: 12am to 3am until August, 6pm to 2am after August

Project Summary
This GSoC project aims to improve communication between Parsoid developers and Wikipedia editors through a tool that reports the broken and deprecated wikitext found on Wikipedia pages. This will be done by creating a generator for an existing tool, CheckWiki, that finds issues for the tool and also feeds it fixup information generated using Parsoid. Since Parsoid already has a lot of fixup information that can help Wikipedia editors see where wikitext is broken and how to fix it, this tool might be quite useful for the community.

The project aims to implement a generator with the following features:


 * 1) Finding issues such as broken and deprecated wikitext and reporting them to CheckWiki
 * 2) Generating fixup information for each issue using Parsoid
 * 3) Feeding this information to CheckWiki, or providing a web service from which CheckWiki can pull the data

Project Scope

 * 1) Finding issues
   * Using the existing logging infrastructure, which already logs production errors and supports tracing and debugging during development
   * Creating log events when a particular issue is found
 * 2) Generating fixup information
   * Planning the database structure and creating the database
   * Creating an interface that listens to the log events and saves them to the database
 * 3) Feeding this information / providing a web service
   * Creating web APIs so that CheckWiki can pull data from our database
   * Creating a database sync service that keeps both databases in sync
 * 4) Filtering and optimization
   * Filtering and optimizing the issue generation process
   * Generating fixup information for hard problems such as balanced/unbalanced templates using Parsoid
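The first two scope items (log events for issues, and a listener that saves them to a database) can be sketched roughly as follows. This is only a minimal illustration in Python, not Parsoid's actual logger: the `LintLogger` class, the `lint/...` event names, the `issues` table layout and the fixup text are all assumptions made up for this example.

```python
import sqlite3
from collections import defaultdict


class LintLogger:
    """Tiny event emitter standing in for the real logger (hypothetical API)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event_type, callback):
        # Register a callback for one event type.
        self._listeners[event_type].append(callback)

    def emit(self, event_type, **details):
        # Pass the event to every listener registered for this type.
        for callback in self._listeners[event_type]:
            callback(event_type, details)


def open_issue_store(path=":memory:"):
    # Assumed schema: one row per issue found on a page.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS issues (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               page TEXT NOT NULL,
               issue_type TEXT NOT NULL,
               src_start INTEGER,
               src_end INTEGER,
               fixup TEXT
           )"""
    )
    return conn


def make_db_listener(conn):
    # Build a listener that writes each emitted issue into the database.
    def save_issue(event_type, details):
        conn.execute(
            "INSERT INTO issues (page, issue_type, src_start, src_end, fixup) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                details["page"],
                event_type,
                details.get("src_start"),
                details.get("src_end"),
                details.get("fixup"),
            ),
        )
        conn.commit()

    return save_issue


conn = open_issue_store()
logger = LintLogger()
listener = make_db_listener(conn)
for event in ("lint/deprecated-tag", "lint/unbalanced-template"):
    logger.on(event, listener)

# Illustrative event; the offsets and fixup hint are invented.
logger.emit(
    "lint/deprecated-tag",
    page="Example",
    src_start=10,
    src_end=31,
    fixup="replace the deprecated tag with a styled div",
)
rows = conn.execute("SELECT page, issue_type, fixup FROM issues").fetchall()
print(rows)
```

Keeping the listener separate from the logger means the parser only has to emit events; storage can be swapped or disabled without touching the code that detects issues.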

Architecture
The broad architecture looks like -

                    [Pages]
                       |
                       V
              Logger emits issues
                       |
                       V
  [db with list of pages to fix + what to fix]
                       |
                       + <- Bots fixing issues
                       |
                       V
                    Webapp

Deliverables

 * 1) Integration with the logger to generate an event for each issue
 * 2) A robust backend to store issues
 * 3) Flexible APIs for bots and CheckWiki to pull information from the backend
 * 4) A database sync service to keep both databases in sync
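Deliverables 3 and 4 could look something like the sketch below: a pull-style query that returns issues newer than a given id, and a sync loop built on top of it. This is an assumption-laden illustration, not a design decision: the function names `issues_since` and `sync`, the simplified `issues` table and the id-based cursor are hypothetical, and the sync shown here only copies in one direction (from our database to a consumer's copy).

```python
import sqlite3

# Assumed, simplified table layout shared by both databases.
SCHEMA = """CREATE TABLE IF NOT EXISTS issues (
    id INTEGER PRIMARY KEY,
    page TEXT NOT NULL,
    issue_type TEXT NOT NULL,
    fixup TEXT
)"""


def issues_since(conn, last_id=0, limit=100):
    """Return up to `limit` issues with id greater than `last_id`, as dicts."""
    cur = conn.execute(
        "SELECT id, page, issue_type, fixup FROM issues "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    )
    return [dict(zip(("id", "page", "issue_type", "fixup"), row)) for row in cur]


def sync(source, target, batch_size=100):
    """Copy issues missing from `target` in batches; return how many were copied."""
    copied = 0
    while True:
        # The highest id already present in the target acts as the cursor.
        last_id = target.execute(
            "SELECT COALESCE(MAX(id), 0) FROM issues"
        ).fetchone()[0]
        batch = issues_since(source, last_id, batch_size)
        if not batch:
            return copied
        target.executemany(
            "INSERT INTO issues (id, page, issue_type, fixup) "
            "VALUES (:id, :page, :issue_type, :fixup)",
            batch,
        )
        target.commit()
        copied += len(batch)


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(SCHEMA)

# Invented sample data.
source.executemany(
    "INSERT INTO issues (page, issue_type, fixup) VALUES (?, ?, ?)",
    [
        ("Page A", "deprecated-tag", "hint 1"),
        ("Page B", "unbalanced-template", "hint 2"),
        ("Page C", "deprecated-tag", "hint 3"),
    ],
)
source.commit()

first_run = sync(source, target, batch_size=2)
second_run = sync(source, target, batch_size=2)
print(first_run, second_run)
```

Because the cursor is just the last id seen, the same `issues_since` query could back a web endpoint for bots and CheckWiki, and rerunning the sync copies nothing when the target is already up to date.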

Estimated project timeline

 * Community Bonding Period (2-3 weeks)
 * Study logger code and familiarize myself with its structure.
 * Lay down the modular design of the project
 * Discuss the project design with the community.
 * Fix some bugs along the way and get my hands dirty.


 * Logger Integration (2 weeks)
 * Plan which events the logger needs to generate
 * Get the logger up and running


 * Data Model and event listeners (2 weeks)
 * Building Data Models
 * Building event listeners for each event emitted by logger


 * Community feedback period (1-2 weeks)
 * I'd like to share my work with the community and subject it to feedback.
 * This gives me time to interact with the community, explain the progress of my project and incorporate popular suggestions.

Milestone: Prototype of Fixup information Generator.


 * APIs + database sync service (2-3 weeks)
 * Build the API for bots and CheckWiki
 * Build a database sync service

Milestone: Working project prototype, ready for integration testing.


 * 2 weeks: Add unit tests.
 * 1 week: Proper testing using some demo pages on a sandbox
 * 2 weeks: Testing and documentation.

About you
I am Hardik Juneja, a B.Tech Computer Science Engineering student at JIIT, Noida, India. I love building things that are useful for people and fun to build, with a lot to learn along the way. I mostly code in JavaScript and Python. I love to automate things and make life easy. I got interested in this project after seeing it in the list of ideas on the GSoC ideas page. It excites me to work on functionality that will make life easier for many Wikipedia editors.

Participation
I stay online on IRC during my work hours and can be found on #mediawiki and #mediawiki-parsoid. For community feedback and discussion, I use the mailing lists (Wikitech-l and Wikidata-l). I will try to maintain a copy of my work on my GitHub. For development, I will use a local Parsoid and MediaWiki environment. I'll try to commit early and often to my branch. I think documentation is an important part of a project, so I will try to document my work when possible and also test it regularly.

Past open source experience
I am an active member of the Open Source Developers Club at my university. Ever since I was introduced to open source, everything I develop and use has been open source. As a contributor, I've also attended a few open source meetups, including PyCon India 2013, JSConf Delhi 2013 and a few local meetups of the Linux user group, Firefox, etc.
 * I have contributed a few patches to the Mozilla AMO project.
 * I have also contributed a [ https://gerrit.wikimedia.org/r/#/q/owner:hardikjuneja.hj%2540gmail.com,n,z patch] to the Parsoid project.
 * I have also created a [ https://github.com/hardikj/GGU-SL plugin] for the Sublime Text editor.
 * I am also a contributor to the Eden project of the Sahana Software Foundation; here are some of my patches.
 * All my other projects can be found on my GitHub profile.

Any other info

 * Notes related to the project - Notes
 * Check Wiki - Project_Check_Wiki