User:Drakefjustin/Fill-in-the-blanks

Comments and feedback are welcome.

Identity
Name: Justin Drake
Email: drakefjustin (gmail.com account)
Project title: Fill-in-the-blanks

Contact/working info
Timezone: UTC+1
Typical working hours: 10:00-13:00; 14:30-19:00; 21:00-22:30
Skype: Randomblue12

Abstract
This project aims to make learning content from Wikipedia articles (and potentially Wiktionary) more interactive, in a very simple way. I propose to develop a fill-in-the-blanks extension for MediaWiki that automatically generates cloze tests. Within a selected portion of an article, the tool would (semi-randomly) pick at most one link per sentence, remove it, and replace it with a blank input box. The key observation is that an article's wikilinks hold pertinent and localized information, ready to be exploited for active learning and quizzing. Automatic correction and scoring will be available, with the option for users to override decisions.

Implementation details
What? MediaWiki extension

Languages: PHP and JavaScript

Links

 * Retrieving links: Relevant classes are ApiQueryLinks, SpecialWhatLinksHere and Linker.
 * Selecting links: Links will be selected at random. Not only is this easy to implement, but it also allows quizzes on the same article to vary. If time permits, I could find heuristics to gauge the "technicality" of a link, for example by counting the links that point to an article. The higher the count, the more likely it is a common concept. The difficulty of the quiz can be altered by using such metrics. Also, the right number of links should be removed. For now, I expect that at most one link per sentence is a good balance. This could be made customizable.
 * Search instances of links: It is Wikipedia convention not to link every instance of a relevant concept within an article. I will therefore have to find the non-linked instances of linked concepts, maybe using ApiQuerySearch.
 * Checking links: Efficient correction can make the difference between a useful tool and a broken tool. As a first easy solution, I intend to allow the user, for each individual answer, to override the automatic correction if they feel their answer is correct. A next step is to improve the automatic correction. For example, for each link, redirects will be counted as valid answers, since synonyms, variant spellings, common misspellings, etc. are often redirected to the corresponding article. The API can retrieve the list of redirects to an article easily.
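
As an illustration of the checking step, here is a minimal JavaScript sketch, not the extension's actual code. The function names `normalize`, `isCorrect` and `grade` are hypothetical, and the list of redirects is passed in as a plain array on the assumption that it has already been retrieved from the API.

```javascript
// Normalize a title or an answer so that case, underscores (as used in
// wiki titles) and extra whitespace do not affect the comparison.
function normalize(text) {
  return text.replace(/_/g, ' ').replace(/\s+/g, ' ').trim().toLowerCase();
}

// Hypothetical helper: an answer is accepted if it matches the link
// target or any redirect pointing to that target.
function isCorrect(answer, target, redirects) {
  const accepted = new Set([target, ...redirects].map(normalize));
  return accepted.has(normalize(answer));
}

// Hypothetical helper: the user's manual override takes precedence over
// the automatic correction.
function grade(answer, target, redirects, userOverride = false) {
  const automatic = isCorrect(answer, target, redirects);
  return {
    correct: automatic || userOverride,
    overridden: !automatic && userOverride,
  };
}
```

For example, `grade('uk', 'United Kingdom', ['UK', 'U.K.'])` is accepted automatically, while `grade('Britain', 'United Kingdom', ['UK', 'U.K.'], true)` is accepted only because the user overrode the result. Stemming or fuzzy matching would replace the exact comparison in `isCorrect`.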

UI

 * Highlighting text: I want users to be able to select the text they want to be quizzed on right from the Wikipedia article. Highlighting text is an easy and natural solution, so ArticleHighlight could be useful.
 * Identifying sentences: I need to identify the sentence structure within an article for two reasons. The first is that at most one link per sentence should be removed. The second is to prevent the user from selecting fragments of sentences: the basic "quiz unit" is a sentence. Some work has been done on sentence-level editing.
 * Input boxes: The blank input boxes should all have the same width. Code from InputBox and RemoveRedlinks could be useful.
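
The "at most one link per sentence" rule can be sketched in JavaScript as below. This is only an illustration under simplifying assumptions: it works on raw wikitext, the sentence splitter is naive (it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations would confuse it), and the names `splitSentences` and `makeCloze` are hypothetical. A fixed-width `____` placeholder stands in for the equal-width input box.

```javascript
// Matches [[Target]] and [[Target|label]] wikilinks.
const WIKILINK = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;

// Naive sentence splitter (an assumption for this sketch): breaks after
// '.', '!' or '?' when followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}

// Blank out at most one randomly chosen link per sentence. The `random`
// parameter can be replaced by a deterministic function for testing.
function makeCloze(wikitext, random = Math.random) {
  const answers = [];
  const sentences = splitSentences(wikitext).map((sentence) => {
    const links = [...sentence.matchAll(WIKILINK)];
    // Index of the link to remove, or -1 if the sentence has no links.
    const chosen = links.length > 0 ? Math.floor(random() * links.length) : -1;
    let index = 0;
    return sentence.replace(WIKILINK, (match, target, label) => {
      if (index++ === chosen) {
        answers.push(target.trim()); // the expected answer is the link target
        return '____';               // same-width blank for every removed link
      }
      return label || target;        // other links are shown as plain text
    });
  });
  return { text: sentences.join(' '), answers };
}
```

Keeping the sentence as the unit also makes it easy to restrict a quiz to the sentences the user highlighted: only those sentences would be passed to `makeCloze`.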

Scoring

 * Instant feedback: Show small icons such as Green tick.svg and Red x.png in the top-right corner using ArticleEmblems.
 * Logging of results: This doesn't have to be secure. That is, users can cheat themselves if they wish. Logging of results can be done locally on each user page, e.g. at User:example/QuizResults/.
 * Show score: Potentially use space in sidebar with SkinBuildSidebar.
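
A minimal JavaScript sketch of the scoring and logging steps follows. The function names, the shape of the `results` array, and the format of the log line are all assumptions made for illustration; the proposal does not fix any of them.

```javascript
// Tally a quiz. Each entry of `results` is assumed to be an object with
// a boolean `correct` field, one per blank.
function scoreQuiz(results) {
  const total = results.length;
  const correct = results.filter((r) => r.correct).length;
  const percent = total === 0 ? 0 : Math.round((100 * correct) / total);
  return { correct, total, percent };
}

// Build one wikitext list item that could be appended to a results
// subpage in the user's namespace (the format is hypothetical).
function formatLogLine(articleTitle, score, isoDate) {
  return `* ${isoDate}: [[${articleTitle}]] ${score.correct}/${score.total} (${score.percent}%)`;
}
```

Since the log is not meant to be secure, a plain list item per quiz is enough, and the same `score` object could feed the sidebar display.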

About me
I'm a fourth-year student in mathematics at the University of Cambridge, UK. I completed my undergrad there, and I am currently enrolled in a Master's program called "Part III". This is probably my last year as a student. In particular, I should be completely free this summer and able to commit fully to this project.

I don't have any experience programming for MediaWiki, but I use MediaWiki through Wikipedia a lot. I am relatively tech-savvy and I know HTML, JavaScript, PHP, and C. Most of my programming experience is in C, from coursework I did. This will certainly be a great learning experience, and I look forward to it.

I am especially enthusiastic as this relates to a team project which grew out of the start-up weekend in Cambridge. Two others and I plan to build a much fuller (and smarter!) automatically generated interactive learning framework for Wikipedia, using other pieces of structured data, such as infobox templates. MediaWiki mentoring through the GSoC would provide a significant boost for our project!

Deliverables
This project (as described in the abstract) is definitely technically doable in the given time frame. Having said that, extra fancy features can make this project "arbitrarily difficult" as required. Fancy features include:
 * 1) Per-sentence selection for quizzing.
 * 2) Improved automatic correction with synonyms and variant names, variant spellings and misspellings, stemming, etc.
 * 3) Automatic recording of quiz results for signed-in users. (E.g. store results in user/Example/Fill-in-the-blanks/Results.)
 * 4) Customization of density of links to remove, level of randomness, instant feedback, skinning, etc.
 * 5) Advanced scoring and statistics.
 * 6) Identify anomalies within an article using global statistics.

In addition to the coding-related difficulties, this project has to produce something pedagogically useful, which I'm not taking for granted. I intend to test the extension periodically and fine-tune it based on the feedback I receive.

Project schedule
First half of summer


 * Extraction of links and mapping of "concept-space" within an article. (Indeed, Wikipedia policy is not to link concepts multiple times, so I will have to retrace every instance of a linked concept.)
 * Have a usable and neat UI, with basic functionality.
 * Testing, tweaking, optimization, and rethinking of best way forward for second half of summer, based on feedback.

Second half of summer


 * Reach a useful product, which is easy to use.
 * Add a selection of more fancy features (from the above list).
 * Testing, tweaking, optimization, and rethinking of best way forward for future, based on feedback.

Mock-ups