User:N3c~mediawikiwiki/GSoC Proposal

GSoC: Prototyping inline comments

 * Public URL: https://www.mediawiki.org/wiki/User:N3c/GSoC_Proposal
 * Bugzilla report link: (Bug 46440) https://bugzilla.wikimedia.org/show_bug.cgi?id=46440
 * Wikitech-L : Could not retrieve URL due to connection problems to http://lists.wikimedia.org/pipermail/wikitech-l/

Name and contact information
 * Name: Anthony Chen
 * Email: cs3245hw4 #at* gmail #dot* com
 * IRC or IM networks/handle(s): I used to have a registered nickname on Rizon, but the server has probably deleted it, as I have used IRC less over the past few years. Do you use Skype?
 * Time Zone: UTC+8
 * Typical working hours: 11am to 11pm usually. The work hours shouldn't be an issue, as previous duties have already shifted my body clock to working at night.

Synopsis
Essentially, the project is an extension that allows any wiki user to select text and then make an inline comment or a reply to an existing inline comment. Imagine: a user lands in a Wikipedia article, selects one sentence and leaves an inline comment that others can optionally read and reply to.

Users can make useful comments on specific parts of articles, and these comments become part of the collaborative work. The key benefit is easier collaboration, because a user can point to something and comment in direct reference to it. It's like pointing your finger at a piece of paper while talking to a friend sitting next to you, which can only be done in person and is currently impossible over the Internet. It is a really powerful feature for collaboration because it turns one of these Internet-impossibles into a possible action.

That covers the insertion of a new comment. For the prototype, replies will likely follow a format similar to threads in a forum.

In summary, it will take references from (and to some extent resemble) the existing programs listed below. Among them, I will most likely use the OKFN Annotator methods, which are open source, for annotating the text.
 * https://en.wikipedia.org/wiki/Stet_%28software%29
 * http://openannotation.org/
 * http://www.openannotation.org/wiki/index.php/Web_Page_Clients_List
 * http://gplv3.fsf.org/comments/lgpl-draft-1.html
 * http://gplv3.fsf.org/comments/gplv3-draft-2.html#1404:1519:2004:1405:1572:1612:1613:1749:1805:1405:1612:1613:1749:1419:1500:1526:1712:
 * http://www.co-ment.com/see/
 * https://lite.co-ment.com/
 * http://okfnlabs.org/annotator/
 * http://hypothes.is/
 * http://homes.cs.washington.edu/~travis/reflect/ (slightly related)
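The comment-and-reply model described in this synopsis could be sketched roughly as follows. This is only an illustration, not the OKFN Annotator API or any MediaWiki interface: the names `Comment`, `InlineAnnotation` and `annotate` are hypothetical, and anchoring a comment by character offsets plus the quoted text is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    """One post in a thread; replies nest the way forum threads do (hypothetical model)."""
    author: Optional[str]  # None stands for an anonymous post
    body: str
    replies: List["Comment"] = field(default_factory=list)

    def reply(self, author: Optional[str], body: str) -> "Comment":
        """Attach a reply to this comment and return it."""
        child = Comment(author, body)
        self.replies.append(child)
        return child

    def count(self) -> int:
        """Number of comments in this thread, including this one."""
        return 1 + sum(r.count() for r in self.replies)


@dataclass
class InlineAnnotation:
    """A selected text range plus the comment thread attached to it."""
    start: int   # character offset where the selection begins (assumed anchoring scheme)
    end: int     # character offset just past the selection
    quote: str   # the selected text, kept so the anchor can be re-checked after edits
    thread: Comment

    def still_matches(self, page_text: str) -> bool:
        """True if the stored offsets still point at the quoted text."""
        return page_text[self.start:self.end] == self.quote


def annotate(page_text: str, start: int, end: int,
             author: Optional[str], body: str) -> InlineAnnotation:
    """Create an inline comment on page_text[start:end]."""
    if not 0 <= start < end <= len(page_text):
        raise ValueError("selection is outside the page text")
    return InlineAnnotation(start, end, page_text[start:end], Comment(author, body))


# Usage: comment on the first sentence, then reply to that comment.
text = "MediaWiki is free software. It powers Wikipedia."
note = annotate(text, 0, 27, None, "Could this sentence cite a source?")
note.thread.reply("Alice", "I can add one.")
```

Storing the quoted text next to the offsets is one simple way to detect that an article edit has moved or removed the commented passage; a real extension would need a more robust anchoring strategy.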

Work Breakdown

 * The stub of the code to be written
 * Basic functions and classes written for inline annotation of text, followed by inline comments
 * Documentation for them
 * Advanced functions and classes written for reply threading, username assignment for replies.
 * Documentation update
 * Advanced functions for increasing intensity of highlighting. (depending on progress)
 * Completed Documentation
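One possible reading of "increasing intensity of highlighting" is that a passage is highlighted more strongly the more comments cover it. A minimal sketch under that assumption follows; the function names, the opacity step and the cap are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def coverage_counts(text_length: int, ranges: List[Tuple[int, int]]) -> List[int]:
    """Count, for every character position, how many commented ranges cover it.

    Each range is (start, end) with 0 <= start <= end <= text_length,
    where end is exclusive.
    """
    # Difference array: +1 where a range starts, -1 where it ends.
    delta = [0] * (text_length + 1)
    for start, end in ranges:
        delta[start] += 1
        delta[end] -= 1

    counts = []
    running = 0
    for i in range(text_length):
        running += delta[i]
        counts.append(running)
    return counts


def highlight_opacity(count: int, step: float = 0.2, cap: float = 0.8) -> float:
    """Map a comment count to a highlight opacity that grows with the count up to a cap."""
    if count <= 0:
        return 0.0
    return min(cap, round(step * count, 2))


# Usage: two overlapping comments on a five-character text.
counts = coverage_counts(5, [(0, 3), (2, 5)])
opacities = [highlight_opacity(c) for c in counts]
```

The cap keeps heavily commented passages readable, since the highlight never becomes fully opaque.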

Milestones and Deadlines
These are the rough milestones and their corresponding deadlines.

May 28-June 16

 * Further understanding the usage needs and the potential use cases that Wikimedia wants to support.
 * Further research on the basic elements and frameworks that can be used.
 * Further familiarization with the Wikimedia framework and the platform it is based on.

June 16-June 22

 * Writing of the stub functions and classes.
 * Reviewing the stub functions with the mentor, and then revising them.
 * Writing some documentation for the function names and their respective return types.

June 22-July 31

 * Creating the basic inline selection and inline annotation features. No user assignment at this point; posts are anonymous.
 * Writing Part 1 of the documentation for the function names and their respective return types required for the above features.

August 1-August 27

 * Creating the advanced features of reply threading, user identification and username assignment for replies.
 * Writing Part 2 of the documentation for the function names and their respective return types required for the above features.

August 27-September 16

 * User testing
 * Final bugfixes
 * Improvement of user interfaces. I will do the advanced function for increasing intensity of highlighting if time permits.
 * Touchup on documentation

September 16-September 27

 * Final review with mentor. Final touchup on documentation
 * Cleaning up of the code

About Me
I am Anthony Chen, currently an undergraduate in Computer Science and Engineering. I read a lot, and I have spent perhaps hundreds of hours reading and using Wikipedia. It has been an awesome resource for me, which is why it was one of the organisations that I looked for projects under.

That was when I saw the inline commenting project. It immediately grabbed my interest. I wanted to take the project on because it is a feature that I intend to implement for a personal side project further down the road. This side project started as a module project this semester, in which participants were required to design prototype apps and implement them. I had already wanted to provide inline commenting in my app, but time constraints kept me from actually coding it. Nevertheless, I thought the inline commenting feature through for a considerable time during the module. I have a picture in my mind of what it will look like and have thought through the backend, which I will talk more about in later sections. The challenge now is to code the backend, tie the frontend to it, make it suitable for the Wikimedia platform, and keep my code maintainable for future developers.

As such, I have a personal stake in this project, both in giving back to Wikipedia and in helping myself progress on my own side project.

Participation
I believe in communicating with mentors frequently and asking questions whenever in doubt. In my past experience, communicating via voice trumps text, as some things are too time-consuming or difficult to write and explain in text.

As such, I plan to have about 2-3 Skype sessions with my mentor every month. In addition, I will provide bi-weekly updates to my mentor on my progress via email.

During the course of the project, I plan to publish my source code on GitHub and make frequent commits. As for mentorship and help, there are three main avenues: first, my mentor; second, the Wikitech-L mailing list, where the Wikimedia community largely resides; third, Google and public code forums.

Platforms & Languages
I have developed on a couple of platforms, including Linux, Titanium mobile, and the web, both for work and for my own projects. I have experience mainly with languages like Python, JavaScript, Perl, C++, and some PHP, as well as frameworks such as FabricJS, NLTK, and Titanium mobile. I also have experience with software architectures and paradigms such as MVC and OOP. Other experience includes developing in Assembly and VHDL, which isn’t exactly relevant to this Wikimedia project. Nevertheless, it has provided good experience and food for thought for architecting software in general.

Domain Knowledge & Research
In the past 2 years, I have mainly been participating in hackathons and doing personal projects to improve my skills in developing on the aforementioned platforms and languages. In fact, my team and I came in first runner-up locally in the NASA International Space Apps Challenge (http://spaceappschallenge.org/awards/) and were nominated for the Global Judging. Through these hackathons and personal projects, I have gained experience with the gotchas and unintended bugs that open source web projects often have.

On top of that, in preparation for Google Summer of Code, I took up modules on data structures and on information processing and retrieval for large data sets. Through them I gained experience in information processing and in using open source frameworks like the NLTK package, and I have some experience building a prototype natural language parser for document and word searches.

With specific relevance to this particular GSoC project, one edge I have over other applicants is that I am over 80% confident that I can successfully implement the user tracking for all the inline comment starters and their subsequent repliers. This is because I have done similar information tracking projects in the past and am able to transfer the relevant domain knowledge to this specific use case.

In addition, I have some experience in managing large data sets for information tracking. With inline commenting, I would expect several thousand highlights per wiki page, with potentially several dozen or even several hundred replies per comment. As such, large data sets will be involved, and they need to be managed effectively so that users do not get a bad, lag-filled experience. Furthermore, this problem will be worsened by the millions of page requests per day or even per hour. A really low-cost method of accessing the data is essential.
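One way to keep data access cheap at the volumes estimated above is to separate a small per-page index of highlights from the full reply threads, and to load a thread only when a reader opens a highlight. The sketch below assumes that design; `CommentStore` and its methods are hypothetical, and the dictionaries stand in for whatever database tables or cache a real extension would use.

```python
from typing import Dict, List, Tuple


class CommentStore:
    """In-memory stand-in for storage, split into a light index and heavy threads."""

    def __init__(self) -> None:
        # page title -> list of (annotation_id, start, end); small, read on every page view
        self._index: Dict[str, List[Tuple[int, int, int]]] = {}
        # annotation_id -> comment bodies; read only when a highlight is opened
        self._threads: Dict[int, List[str]] = {}
        self._next_id = 1

    def add_annotation(self, page: str, start: int, end: int, body: str) -> int:
        """Store a new highlight with its first comment and return its id."""
        annotation_id = self._next_id
        self._next_id += 1
        self._index.setdefault(page, []).append((annotation_id, start, end))
        self._threads[annotation_id] = [body]
        return annotation_id

    def add_reply(self, annotation_id: int, body: str) -> None:
        """Append a reply to an existing thread."""
        self._threads[annotation_id].append(body)

    def highlights(self, page: str) -> List[Tuple[int, int, int, int]]:
        """Cheap call for page rendering: ranges and reply counts, no comment bodies."""
        return [
            (aid, start, end, len(self._threads[aid]) - 1)
            for aid, start, end in self._index.get(page, [])
        ]

    def thread(self, annotation_id: int, offset: int = 0, limit: int = 20) -> List[str]:
        """Fetch one slice of a thread on demand, so long threads load in pages."""
        return self._threads[annotation_id][offset:offset + limit]


# Usage: one highlight with two replies on a hypothetical page.
store = CommentStore()
first = store.add_annotation("Example page", 10, 42, "Is this still accurate?")
store.add_reply(first, "I think so.")
store.add_reply(first, "Added a reference.")
summary = store.highlights("Example page")
```

With this split, rendering a page touches only the index, and the cost of a long thread is paid only by readers who open it.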

Moreover, as mentioned, I have already given this feature considerable thought as part of the design of the prototype app in a university module.

MediaWiki Platform Specific Issues
Although I may be at a slight disadvantage compared with veteran MediaWiki contributors who have contributed for many years, I believe that I will be able to catch up well enough to deliver my work. Compared with contributors who only started contributing to Wikimedia in the past few weeks, I believe the gap between us is small, and based on my past performance I am confident that I can close it.

In the summer of my first year, I did software development work professionally for Micron Semiconductor. Although my experience at that time was limited to one introductory Computer Science module on C, which only required writing simple programs, I was able to learn and adapt quickly to the entirely new language required at Micron. I learned that language, which was completely alien to me at the time, within a week and a half, and I completed my work well within the stipulated timeframe of 9 weeks. I managed to do so even though I had to learn the language independently (as my supervisor was busy) and its syntax and structure were vastly different from anything I had seen or even heard of during the introductory module on C.

In addition, for some projects last year, I managed to pick up another completely new language, Python, in just two days. With my experience in picking things up and making inferences rapidly, as well as my experiences with time pressures in picking up new frameworks in hackathons, I believe I have what it takes to get up to speed.

Other Info
I don’t really like to provide personal particulars on publicly open webpages, as harvesters often scour such pages for that data. I can and will provide my Skype and GitHub handles over a private IRC room.

Mentors
Matt Flaschen will be my mentor.