User:YuviPanda/GSoC/PoC

From mediawiki.org

A Proof of Concept (PoC) to be built before the official start of GSoC's coding period, so that I don't go down the wrong rabbit hole.

Goal

Pick one WikiProject and write enough code to save its assessment data to a database whenever that data changes.

Partial Use Case (for PoC)

  1. Someone who is part of the WikiProject wants to change the assessment of an article (importance or class).
  2. They go to the article's talk page, edit the assessment template to set the new assessment, and hit save. That's all they have to do.
  3. The assessment data is automatically updated in the database.

Tasks

  1. Pick the WikiProject to use. Picked: w:en:Wikipedia:WikiProject_India. Criteria for selection:
    1. Moderately large, so I can test with all of its articles on my desktop system. It has 83k articles in total, 65k of them assessed.
    2. A 'typical' assessment template, reasonably similar to other assessment templates. w:en:Template:WP_India is reasonably complex.
    3. A somewhat active community to help with testing and to kick my lazy ass. They can physically kick me, and I personally know lots of active testers.
  2. Write the 'Assessment Template Processor'. It takes a representation of the wikitext as input and outputs the assessment data. For the PoC it only needs to work for the picked WikiProject's assessment template. Status: hacky version up.
  3. Figure out a preliminary schema for storing the data, enough to handle the picked WikiProject. Done.
  4. Write a hook that runs on page save and updates the database with the new data. Done.
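A minimal sketch of the Assessment Template Processor from task 2, written in Python purely for illustration (the real processor would live in the PHP extension). It assumes the banner is invoked as {{WP India|class=...|importance=...}}; the template name, its accepted spellings, and the parameter names are assumptions that have not been checked against w:en:Template:WP_India. It walks the wikitext with a brace counter rather than a regex, so nested templates (such as a banner shell) and [[links|with pipes]] inside parameter values don't break it. Redirects to the template, {{{triple-brace}}} parameters, comments and <nowiki> are not handled.

```python
from typing import Dict, List, Optional

# Hypothetical: names the banner is assumed to be invoked under, already
# normalized by _normalize(). Redirects to the template are not handled.
BANNER_NAMES = {"wp india"}


def _find_templates(text: str) -> List[str]:
    """Return the inner text of every {{...}} in text, nested ones included.

    Inner templates are returned before the templates that enclose them.
    """
    found: List[str] = []
    starts: List[int] = []  # stack of offsets just past each open "{{"
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            starts.append(i + 2)
            i += 2
        elif pair == "}}" and starts:
            found.append(text[starts.pop():i])
            i += 2
        else:
            i += 1
    return found


def _split_top_level(inner: str) -> List[str]:
    """Split template contents on '|' that is not nested inside {{ }} or [[ ]]."""
    parts: List[str] = []
    buf: List[str] = []
    depth = 0
    i = 0
    while i < len(inner):
        two = inner[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            buf.append(two)
            i += 2
        elif two in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(two)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(inner[i])
            i += 1
    parts.append("".join(buf))
    return parts


def _normalize(name: str) -> str:
    # Simplification: MediaWiki only ignores case on the first letter,
    # but here the whole name is lowercased; underscores become spaces.
    return " ".join(name.replace("_", " ").split()).lower()


def extract_assessment(wikitext: str) -> Optional[Dict[str, str]]:
    """Return {'class': ..., 'importance': ...} from the first banner found."""
    for inner in _find_templates(wikitext):
        parts = _split_top_level(inner)
        if _normalize(parts[0]) not in BANNER_NAMES:
            continue
        params: Dict[str, str] = {}
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        return {
            "class": params.get("class", "").lower(),
            "importance": params.get("importance", "").lower(),
        }
    return None
```

For example, extract_assessment("{{WP India|class=B|importance=Mid}}") returns {'class': 'b', 'importance': 'mid'}, and the same result comes back when that banner is wrapped inside another template.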

Non-Tasks

These, while relevant to the project, will not be done for the PoC:

  1. Anything with UI on it.
  2. Logging.

Data Model

Data model with just enough things to make the PoC work:

  1. Project (has: name, wiki page name). Not necessary for the PoC, since we're dealing with only one project.
  2. Rating (has: importance (+ last modification), quality (+ last modification), article title, revision, and a link to the project)
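A preliminary schema for the data model above, sketched with Python's built-in sqlite3 for illustration only (the real tables would be MediaWiki DB tables). All table and column names are hypothetical, and "last mod" is assumed to mean the timestamp at which that particular field last changed. save_rating therefore only moves a field's timestamp forward when its value actually differs from what is stored.

```python
import sqlite3

# Hypothetical table and column names; "*_mod_ts" assumes "last mod"
# means the time the field last changed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS project (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    wikipage  TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS rating (
    project_id         INTEGER NOT NULL REFERENCES project(id),
    article_title      TEXT NOT NULL,
    rev_id             INTEGER NOT NULL,
    quality            TEXT,
    quality_mod_ts     TEXT,
    importance         TEXT,
    importance_mod_ts  TEXT,
    PRIMARY KEY (project_id, article_title)
);
"""


def save_rating(conn: sqlite3.Connection, project_id: int, title: str,
                rev_id: int, quality: str, importance: str, ts: str) -> None:
    """Insert or update one rating, keeping per-field last-modified times."""
    row = conn.execute(
        "SELECT quality, quality_mod_ts, importance, importance_mod_ts "
        "FROM rating WHERE project_id = ? AND article_title = ?",
        (project_id, title),
    ).fetchone()
    if row is None:
        # First rating for this article: both fields are "modified" now.
        conn.execute(
            "INSERT INTO rating VALUES (?, ?, ?, ?, ?, ?, ?)",
            (project_id, title, rev_id, quality, ts, importance, ts),
        )
    else:
        old_quality, quality_ts, old_importance, importance_ts = row
        # Only move a timestamp forward if that field's value changed.
        conn.execute(
            "UPDATE rating SET rev_id = ?, quality = ?, quality_mod_ts = ?, "
            "importance = ?, importance_mod_ts = ? "
            "WHERE project_id = ? AND article_title = ?",
            (rev_id,
             quality, ts if quality != old_quality else quality_ts,
             importance, ts if importance != old_importance else importance_ts,
             project_id, title),
        )
    conn.commit()
```

To try it: open a connection with sqlite3.connect(":memory:"), run conn.executescript(SCHEMA), insert the single project row, then call save_rating once per saved talk page. The project table is kept only to show where the "link to project" points, even though the PoC has a single project.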

Hook

Will use Manual:Hooks/ArticleSaveComplete. Tested: it fires for talk pages too (duh!). It should hand the page on to parsing only if the page 'can contain' an assessment (is a talk page, etc.).
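A sketch of the "can contain an assessment" gate and the overall save-hook flow, modeled in Python. The real handler would be PHP code registered for ArticleSaveComplete; the function names and the injected process/store callables here are hypothetical. Treating every positive, odd-numbered namespace as a candidate is an assumption, and a coarse one: it also lets User talk pages through.

```python
from typing import Callable, Dict, Optional


def can_contain_assessment(namespace: int, wikitext: str) -> bool:
    """Cheap pre-check before doing any parsing."""
    # MediaWiki talk namespaces have odd numbers (Talk = 1, User talk = 3, ...);
    # negative ones (Special = -1, Media = -2) never hold saved pages.
    # In Python -1 % 2 == 1, so the sign check is required.
    if namespace <= 0 or namespace % 2 == 0:
        return False
    # No template call at all means no banner.
    return "{{" in wikitext


def on_article_save_complete(
    namespace: int,
    title: str,
    wikitext: str,
    process: Callable[[str], Optional[Dict[str, str]]],
    store: Callable[[str, Dict[str, str]], None],
) -> bool:
    """Model of the save-hook flow; returns True if a rating was stored."""
    if not can_contain_assessment(namespace, wikitext):
        return False
    assessment = process(wikitext)  # e.g. the Assessment Template Processor
    if assessment is None:
        return False
    store(title, assessment)
    return True
```

Keeping the gate this cheap (a namespace check plus a substring test) means ordinary article saves never pay for parsing at all.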

Tried to get at the preprocessor output through the Article's getParserOutput() method. The resulting object has a getTemplates() method that sounded like about what I need. It isn't: it just gives me a list of the templates used, not their parameters. I need access to the preprocessed DOM, which is produced when the page is rendered after the save, not during the save itself (thanks Platonides!). This just got a lot more complicated.

At this point, my options are:

  1. Use the preprocessed DOM at save time. This might be computationally expensive if I can't reuse the one already created for the after-save render; needs profiling (?).
  2. Use the hook to push a job onto the job queue, which preprocesses the page and updates the DB later (so updates are slightly delayed).
  3. Use regexes at hook time to detect the template ('and now you have two problems...').
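To make option 3's "two problems" concrete, here is a hypothetical naive pattern for the banner. It handles the plain case, but as soon as a parameter value contains a nested template, [^}]* stops at the first closing brace and later parameters are silently lost. The pattern and helper name are illustrative only, and re.IGNORECASE over-matches, since MediaWiki only ignores case on the first letter of a title.

```python
import re
from typing import Dict, Optional

# Illustrative naive pattern: banner name, then everything up to the first "}}".
NAIVE_BANNER = re.compile(r"\{\{\s*WP[ _]India\s*\|([^}]*)\}\}", re.IGNORECASE)


def naive_params(wikitext: str) -> Optional[Dict[str, str]]:
    """Pull banner parameters out with a single regex (fragile on purpose)."""
    m = NAIVE_BANNER.search(wikitext)
    if m is None:
        return None
    params: Dict[str, str] = {}
    for part in m.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params


# Works for the plain case: prints {'class': 'B', 'importance': 'Mid'}
print(naive_params("{{WP India|class=B|importance=Mid}}"))
# A nested template ends the match early, so importance is lost:
# prints {'class': 'B', 'comment': '{{tl'}
print(naive_params("{{WP India|class=B|comment={{tl|foo}}|importance=Mid}}"))
```

Patching the pattern for one nesting level just moves the failure to the next one, which is the usual argument for options 1 or 2 (or a brace-counting scanner) over regexes.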

Timeline

TBD