WikidataEntitySuggester/Progress

= Monthly reports =

I'll be dividing each report into "Things done" and "Things to do" sections, the former covering what I did over the month, the latter my immediate goals.

== June ==
Things done:
 * 1) Did some research on techniques to provide recommendations for values.
 * 2) Finished the documentation for the Wikidata Entity Suggester prototype, and am in the middle of posting it on this page and its sub-pages.
 * 3) Set up the Gerrit repository and got access to an m1.large Labs instance, wikidata-suggester.
 * 4) In May I had written MapReduce scripts in Python for use with Disco, to replace the C programs that Byrial shared (which parse the wiki dump and generate a CSV file and database tables), since the C programs sometimes broke when fields exceeded certain limits. Because Disco has an Erlang dependency, I decided to rework the scripts to run on Hadoop via Hadoop Streaming. I've configured Hadoop on the wikidata-suggester server and tested the scripts.
 * 5) Transferred the prototype code to Gerrit. I'm yet to push the PHP client.
 * 6) Made a few changes to my GSoC proposal page to reflect the new developments (addition of the wiki pages, the extension page, etc.)
 * 7) Partially deployed the prototype on the Labs server; this should be finished in a couple of days. The instance now has a public IP, and I've opened a few ports to monitor Hadoop, Myrrix, etc.
 * 8) Created the extension page for entity suggester here.
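To illustrate the Hadoop Streaming approach mentioned in item 4, here is a minimal mapper/reducer sketch in the classic counting style. The tab-separated column layout of the dump rows is an assumption for illustration, not the actual format the scripts parse:

```python
# Sketch of the Hadoop Streaming pattern: a mapper emits tab-separated
# key/value pairs, and a reducer aggregates values for each key
# (Streaming delivers the mapper output to the reducer sorted by key).
# The dump's column layout assumed below is hypothetical.
from itertools import groupby

def map_lines(lines):
    """Emit (property_id, 1) for every statement row in the dump."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:          # guard against short/broken rows
            yield fields[1], 1        # assume column 1 holds the property id

def reduce_pairs(pairs):
    """Sum the counts for each property id (input must be sorted by key)."""
    for prop, group in groupby(pairs, key=lambda kv: kv[0]):
        yield prop, sum(count for _, count in group)
```

In an actual Streaming job these would read stdin and print stdout, launched with something like `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...`; the functional form here just makes the logic easy to test.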

Things to do:
 * 1) Add some functionality to the MapReduce scripts to create database tables.
 * 2) Finish deploying the prototype on labs.
 * 3) Receive feedback, ask for new ideas/features.
 * 4) Do more research on recommendation techniques and case-based reasoning, and write code.

== July ==
Things done:
 * 1) Almost finished the property suggester; two servlets and some code review remain.
 * 2) Wrote tests for the Java-side backend with the property suggester. (Higher-level HTTP-based tests may be written later too.)
 * 3) Added code docs for the Python MapReduce scripts and the Java classes.
 * 4) Made a few small improvements in the Java code and removed value suggestion stuff, extra dependencies.
 * 5) Wrote MapReduce scripts for counting the property frequencies for a) all items and b) source references.
 * 6) Planned how to implement value and qualifier suggestions, discussed in brief the requirements of the MediaWiki PHP-side API to be written for the entity suggester.

Things to do:
 * 1) Write Java servlets for suggesting properties for empty items (items that don't yet have any properties) and for empty source refs. Initially this will be a naive implementation that fetches the top N properties from the two ordered property-frequency lists (see item 5 above). This should complete the property suggester.
 * 2) Work on the MediaWiki API: read up on the docs, figure out what needs to be done, and start on it.
 * 3) Implement value and qualifier suggestions. Adding new features like these should be simple, since one feature of the backend is already done.
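The naive fallback described in item 1 can be sketched in a few lines: for an item or source ref with no properties yet there is nothing to personalize on, so the suggester just returns the N globally most frequent properties from the precomputed ordered list. The list format assumed here (property id, count) is hypothetical:

```python
# Naive suggester for empty items/source refs: take the top N entries
# from an ordered property-frequency list produced by the MapReduce job.
# The (property_id, count) tuple format is an assumption for illustration.

def suggest_for_empty(freq_list, n=10):
    """freq_list: [(property_id, count), ...] sorted by count descending."""
    return [prop for prop, _count in freq_list[:n]]
```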

== August to mid September ==

 * 1) Made extensive changes to the code for the backend Java REST API. The backend can now be trained, can suggest properties for claims, source refs and qualifiers, and can suggest values for a given property.
 * 2) Wrote tests and code-level docs for the Java REST API.
 * 3) Organized the Python scripts into two all-encompassing modules, mapper.py and reducer.py, complete with documentation.
 * 4) Tested all features of the REST API on the wikidata-suggester test instance.
 * 5) The bulk of the coding of the MediaWiki API module is complete; some code review remains.
 * 6) The demo of the PHP API module is not yet finished, and tests still need to be written.
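One simple way to picture the "values for a given property" feature from item 1 is a per-property value-frequency table: count how often each value occurs with each property across the dump, then suggest the most common values for that property. This is only an illustrative sketch of the idea, not the Myrrix-backed implementation the backend actually uses:

```python
# Illustrative per-property value counts (NOT the actual backend logic):
# build a frequency table of values seen with each property, then
# suggest the most common values for a requested property.
from collections import Counter, defaultdict

def build_value_counts(statements):
    """statements: iterable of (property_id, value) pairs from the dump."""
    counts = defaultdict(Counter)
    for prop, value in statements:
        counts[prop][value] += 1
    return counts

def suggest_values(counts, prop, n=5):
    """Return up to n most frequent values seen with the given property."""
    return [v for v, _ in counts.get(prop, Counter()).most_common(n)]
```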