Parsoid/Linting/GSoC 2014 Application

Lintoid: a Parsoid-based online detector of broken wikitext

 * Public URL: Parser_migration_tool
 * Bugzilla report: Bug Report
 * Announcement:

Name and contact information
 * Name: Hardik Juneja
 * Email: hardikjuneja.hj@gmail.com
 * IRC or IM networks/handle(s): hardikj
 * Location: India
 * Typical working hours: online from 12 pm to 3 am until August, and from 6 pm to 2 am after August.

Project Summary
This GSoC project aims to detect broken and deprecated wikitext on wiki pages and then generate fixup information for it using Parsoid. Parsoid already has a lot of fixup information that can show wiki editors where broken wikitext is and how to fix it, so this tool could be quite useful for the community. It will also improve communication between Parsoid developers and wiki editors. Since we don't want to reinvent the wheel of UI and fixup workflows, we will collaborate with the existing WikiProject CheckWiki by feeding it the fixup information generated by Parsoid. The tool will also help Parsoid developers collect statistics about the use of templates in balanced / unbalanced contexts.

The project aims to implement a generator with the following features:


 * 1) Finding issues such as broken and deprecated wikitext and reporting them to CheckWiki.
 * 2) Generating fixup information for each issue using Parsoid.
 * 3) Feeding this information to CheckWiki, or providing a web service from which CheckWiki can pull the data.
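To make "fixup information" concrete, here is one possible shape for an issue record and for turning it into something an editor can act on. This is a minimal sketch under assumptions: it supposes the linter reports each issue as a category plus start/end character offsets into the page's wikitext. The `LintIssue` class, the `fixup_info` helper and the category name `missing-end-tag` are all hypothetical and do not describe Parsoid's actual output format. The helper converts the offset into a line number and a short context snippet, the kind of detail CheckWiki could show next to each issue.

```python
from dataclasses import dataclass


@dataclass
class LintIssue:
    """Hypothetical issue record emitted by the linter (illustrative only)."""
    page: str      # page title
    category: str  # kind of problem, e.g. "missing-end-tag" (assumed name)
    start: int     # start offset of the problem in the wikitext
    end: int       # end offset of the problem (exclusive)


def fixup_info(wikitext: str, issue: LintIssue, context: int = 10) -> dict:
    """Turn raw offsets into editor-friendly fixup information."""
    # 1-based line number of the offending text.
    line = wikitext.count("\n", 0, issue.start) + 1
    # A short snippet around the problem, clipped to the page boundaries.
    lo = max(0, issue.start - context)
    hi = min(len(wikitext), issue.end + context)
    return {
        "page": issue.page,
        "category": issue.category,
        "line": line,
        "snippet": wikitext[lo:hi],
        "offending": wikitext[issue.start:issue.end],
    }


if __name__ == "__main__":
    text = "Intro line.\nSome <b>bold text\nMore text."
    start = text.index("<b>")
    print(fixup_info(text, LintIssue("Example", "missing-end-tag", start, start + 3)))
```

For the unclosed `<b>` in the demo, the helper reports line 2 together with the surrounding text, which is usually enough for an editor to locate the problem without reading the whole page.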

Project Scope

 * 1) Finding issues using a Parsoid-based linter
    * Build the linter on top of the existing logging infrastructure, which is used to log production errors and for tracing and debugging during development.
    * Emit an event whenever a particular issue is found.
 * 2) Generating fixup information
    * Plan the database structure and create the database.
    * Create an interface that listens to the events generated by the linter and saves them to the database.
 * 3) Feeding this information / providing a web service
    * Create web APIs for CheckWiki so that it can pull data from our database.
    * Create a database sync service that keeps both databases in sync.
 * 4) Filtering and optimization
    * Filter and optimize the process of generating issues.
    * Generate fixup information for hard problems, such as balanced/unbalanced templates, using Parsoid. This will be used to collect statistics about the use of templates in balanced / unbalanced contexts. Such information is useful for categorizing templates into those that almost always produce balanced output and those that often produce unbalanced output.
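Once the linter can tell, for each template use, whether that use produced balanced output, the categorization described in the last scope item reduces to simple bookkeeping. A minimal sketch, assuming the linter hands over `(template name, balanced?)` pairs. How Parsoid decides balance is out of scope here. The function name, the 0.95 threshold and the 20-use minimum are placeholder assumptions that would need tuning against real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_templates(
    observations: Iterable[Tuple[str, bool]],
    threshold: float = 0.95,  # assumed cut-off for "almost always balanced"
    min_uses: int = 20,       # assumed minimum sample size before classifying
) -> Dict[str, str]:
    """Classify templates by how often their output was balanced.

    Each observation is (template_name, produced_balanced_output).
    """
    counts = defaultdict(lambda: [0, 0])  # name -> [balanced uses, total uses]
    for name, balanced in observations:
        counts[name][1] += 1
        if balanced:
            counts[name][0] += 1

    result = {}
    for name, (ok, total) in counts.items():
        if total < min_uses:
            result[name] = "insufficient-data"
        elif ok / total >= threshold:
            result[name] = "balanced"
        else:
            result[name] = "often-unbalanced"
    return result
```

The minimum-sample guard keeps rarely used templates from being labelled on the strength of one or two uses.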

Architecture
The broad architecture looks like this:

    [Pages]
       |
       V
    Logger emits issues
       |
       V
    [DB with list of pages to fix + what to fix]
       |
       +<-- Bots fixing issues
       |
       V
    Webapp
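The middle of this pipeline (logger events flowing into a database that bots update and the webapp reads) can be sketched in a few functions. This is a minimal sketch under assumptions. It supposes the logger delivers each issue as a dict with `page`, `category`, `start` and `end` keys, which is a hypothetical event shape. It uses SQLite only because it ships with Python, since the proposal does not fix a database. The table layout and the names `on_lint_event`, `mark_fixed` and `pages_to_fix` are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS issues (
    id           INTEGER PRIMARY KEY,
    page         TEXT    NOT NULL,           -- page title
    category     TEXT    NOT NULL,           -- kind of broken wikitext
    start_offset INTEGER NOT NULL,           -- source range of the problem
    end_offset   INTEGER NOT NULL,
    fixed        INTEGER NOT NULL DEFAULT 0, -- set to 1 once a bot fixes it
    UNIQUE (page, category, start_offset, end_offset)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def on_lint_event(conn: sqlite3.Connection, event: dict) -> None:
    """Listener: store one issue event emitted by the logger.

    Repeated reports of the same issue are ignored via the UNIQUE constraint.
    """
    conn.execute(
        "INSERT OR IGNORE INTO issues (page, category, start_offset, end_offset) "
        "VALUES (:page, :category, :start, :end)",
        event,
    )
    conn.commit()


def mark_fixed(conn: sqlite3.Connection, page: str, category: str) -> int:
    """Called by a bot after fixing issues; returns how many rows changed."""
    cur = conn.execute(
        "UPDATE issues SET fixed = 1 WHERE page = ? AND category = ? AND fixed = 0",
        (page, category),
    )
    conn.commit()
    return cur.rowcount


def pages_to_fix(conn: sqlite3.Connection) -> list:
    """What the webapp would show: open issue counts per page."""
    return conn.execute(
        "SELECT page, COUNT(*) FROM issues WHERE fixed = 0 "
        "GROUP BY page ORDER BY page"
    ).fetchall()
```

Keeping a `fixed` flag instead of deleting rows preserves a history of what was fixed, which could later feed statistics. That is a design choice of this sketch, not a requirement from the proposal.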

Deliverables

 * 1) A Parsoid-based linter that uses the logger to generate an event for each issue
 * 2) A robust backend to store issues
 * 3) Flexible APIs for bots and CheckWiki to pull information from the backend
 * 4) A database sync service to keep both databases in sync
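The sync service in deliverable 4 could start as a periodic one-way reconciliation. A minimal sketch, assuming our database is the source of truth and that an issue is identified by its `(page, category)` pair. Both are assumptions, and `sync_plan` is a hypothetical name. A real two-way sync would also need timestamps or a change log to resolve conflicting edits on either side.

```python
from typing import Iterable, Set, Tuple

IssueKey = Tuple[str, str]  # (page title, issue category): assumed issue identity


def sync_plan(
    local: Iterable[IssueKey], remote: Iterable[IssueKey]
) -> Tuple[Set[IssueKey], Set[IssueKey]]:
    """Compute how to bring the remote store in line with the local one.

    Returns (to_add, to_remove): issues the remote side is missing, and
    issues it still lists although the linter no longer reports them
    (for example because a bot or an editor fixed the page).
    """
    local_set, remote_set = set(local), set(remote)
    to_add = local_set - remote_set
    to_remove = remote_set - local_set
    return to_add, to_remove
```

Computing the difference first and applying it as a separate step makes each sync run easy to log and to dry-run before touching the other database.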

Estimated project timeline

 * Community Bonding Period (2-3 weeks)
    * Study the logger code and familiarize myself with its structure.
    * Lay down the modular design of the project.
    * Discuss the project design with the community.
    * Fix some bugs along the way and get my hands dirty.


 * Logger Integration (2 weeks)
    * Plan which events the logger needs to generate.
    * Get the logger up and running.


 * Data Model and event listeners (2 weeks)
    * Build the data models.
    * Build event listeners for each event emitted by the logger.


 * Community feedback period (1-2 weeks)
    * I'd like to share my work with the community and subject it to feedback.
    * This gives me time to interact with the community, explain the progress of my project and incorporate popular suggestions.

Milestone: Prototype of the fixup information generator.


 * APIs + database sync service (2-3 weeks)
    * Build APIs for bots and CheckWiki.
    * Build a database sync service.

Milestone: Working project prototype, ready for integration testing.


 * 2 weeks: Add unit tests.
 * 1 week: Thorough testing using demo pages on a sandbox.
 * 2 weeks: Testing and documentation.

About you
I am Hardik Juneja, a B.Tech Computer Science and Engineering student at JIIT, Noida, India. I love building things that are useful to people, fun to build and offer a lot to learn. I mostly code in JavaScript and Python. I love automating things and making life easier. I got interested in this project after seeing it in the list of ideas on the GSoC ideas page. It excites me to work on functionality that will make life easier for many Wikipedia editors.

Participation
I stay online on IRC during my working hours and can be found on #mediawiki and #mediawiki-parsoid. For community feedback and discussion, I use the mailing lists (Wikitech-l and Wikidata-l). I will try to keep a copy of my work on my GitHub. For development, I will use local installations of Parsoid and MediaWiki, and I'll commit early and often to my branch. Documentation is an important part of a project, so I will document my work as I go and test it regularly.

Past open source experience
I am an active member of the Open Source Developers Club at my university. Ever since I was introduced to open source, everything I develop and use has been open source. As a contributor, I've also attended a few open source meetups, including PyCon India 2013, JSConf Delhi 2013 and a few local meetups of the Linux user group, Firefox, etc.
 * I have contributed a few patches to the Mozilla AMO project.
 * I have also contributed a patch (https://gerrit.wikimedia.org/r/#/q/owner:hardikjuneja.hj%2540gmail.com,n,z) to the Parsoid project.
 * I have also created a plugin (https://github.com/hardikj/GGU-SL) for the Sublime Text editor.
 * I am also a contributor to the Eden project of the Sahana Software Foundation; here are some of my patches.
 * All my other projects can be found on my GitHub profile.

Any other info

 * Notes related to the project - Notes
 * Check Wiki - Project_Check_Wiki