User:Devayon/GSoC2011

This page outlines my proposal (and subsequent updates) for Semantic MediaWiki for the Google Summer of Code 2011 programme.

Identity
Name: Devayon Das
Email: devayon.das [at] gmail.com
Project title: Improving Semantic Search/Semantic Query usability issues in SMW

Contact/working info
Timezone: GMT+05:30 (IST)
Typical working hours: 11:00 to 17:00 (flexible)

Project summary
This project is based on the idea by Markus Krötzsch with inputs from him and others over at the SMW development mailing list.

Semantic MediaWiki (SMW) holds the promise of one day being deployed successfully in large projects such as Wikipedia. Two major issues need elegant solutions before this happens:
 * Performance improvements for massively large scale wikis
 * Usability improvements.

On the second point, many improvements have been made, several growing into extensions such as the Halo Project extensions and Ask the Wiki. However, as usable as these interfaces are, they have not been able to eliminate the need for Special:Ask, even though Special:Ask is "cluttered and unwieldy, yet cannot represent all features of SMW queries".

My proposal is to improve the user experience in constructing Semantic Queries in SMW.

Present approaches could be categorized as:
 * form-based/query centred (current Special:Ask)
 * form-based/data centred (like Semantic Forms for queries)
 * browsing based (like Drilldown or Exhibit)
 * structured query builder (like in Halo)
 * graphical query builder

But they have disjoint starting points, although (as has been pointed out to me) they share common aspects.

So, what we need is a clean underlying framework/architecture, over which these UIs could be built. My proposal is to rework Special:Ask and separate it into two modules (performance obviously kept in mind):
 * A better version of the query-centred UI, Special:Ask, attempting to make as many underlying features of SMW queries as possible available to the user.
 * A module on which the new Special:Ask will run (future alternative UI modules should be able to plug into this).

This will go a long way in helping people conduct semantic searches, and should have a long-term future as a base on which other query/data explorer modules could be 'plugged in'.

If the future of MediaWiki meshes with the Semantic Web (which I think everyone in the SMW community believes), this project should be an important step.

Project Updates

 * (date forgotten) The SMW rig is up and running. No data in the wiki though. Any ideas on how to populate it quickly?
 * 26 April: This proposal has been accepted for GSoC 2011. Officially, my mentor is Benedikt Kämpgen. That doesn't mean I'll spare the rest of the SMW community. Look for my frantic emails in your Inbox.
 * 26 April: Minor Mixup. Mentor will be Markus Krötzsch. He's sent documentation.
 * 27 April: Waiting for Commit Access
 * 1 May: got the wiki-dump for my test rig. Still no commit access
 * 6 May: Some activity. Chatted a bit with my mentor. Got commit access. Importing wikis is still troublesome. Maybe we should just write a test to check this for every version of SMW. Hanging around at #semanticmediawiki. Also, the folks at Google emailed about administrative business.
 * 6 May: Wasted too much time trying to import the entire wiki. Headed over to the SMW sandbox and just imported some pages. For testing, will import more if necessary. There!
 * Sometime at the end of May: Had interesting discussions with Markus and others. Now have some clear ideas for what should be done in weeks 1 and 2.
 * June week 1 and half: Excuses! Net connections were down, had to run around town looking for a proper solution. Unbelievable how much time has been wasted not coding. This is slow even for India. And I'm hopelessly behind schedule! :(
 * June 12: Am working on splitting the result generation from the query entry in Special:Ask. The work involves less coding and more removing of irrelevant lines. Which is good, because one is forced to understand what each and every line does before removing, preserving or adding to it. Code already works, and I'll put in a commit by the end of tomorrow (to stave off potential goof-ups).
 * June 13: Apart from a minor commit error (sorry for that, Jeroen) caused by an earlier commit yesterday, things are going well. Markus has assigned some bugs to me (which gives me plenty of reason to stick around with SMW long after GSoC ends). I wonder if that's a best practice for GSoC, assigning bugs?
 * June 13 [Later]: As the day ends, committed a strictly "working" version of a query creator. Frankly, I am not happy with the results. The code which generates the form in Special:Ask and the code that prints the results are so tightly coupled that I was forced to undo many of the changes I had made in the last hour or so. The next commit should be of cleaner code (hopefully). Much of what is committed was written yesterday anyway. It's as if my commits are a day behind me.
 * June 16: The work goes well, although I haven't made any commits these last few days. Trying to separate the query form from the result generator throws up interesting ideas on how certain parts could be reused. The JavaScript auto-complete is broken, but because of an existing bug (the page loads up all the property names for autocomplete, which may cause a timeout), I don't mind. The plan is to implement a new autocomplete using the MediaWiki API. That should be useful and reusable in case others want to make more UIs.
 * June 20: Fixed some warnings in QueryCreator. Oddly, my editor inserted some blank lines between methods in Special:Ask. There is no side-effect, but it's just plain weird.
 * June 23: This week, I read through the Html class for SMW (which wasn't much work, really). Am writing a helper class which other UIs can use to query and generate results quickly, and without writing too much code (if they don't want to!)
 * June 25: It seems the PHP backend for the JavaScript autocomplete (Milestone 4) is already provided by the MW API. Updated the autocompletion system in Special:Ask (and solved bug 27687 in the process) to use this instead. Will now add the same to Search by property and Special:Browse (if found necessary).
 * June 26: Added the autosuggest JavaScript to SMW's Special:Browse and Search by property.
 * July 7: It's been a while since I posted here. The mid-evals are coming up and I'm racing to finish a part of my project. Meanwhile, interesting discussions are going on about Selenium on the dev mailing list. Perhaps making Selenium tests for my work will not happen.
 * July 8: Mid-evaluations are coming and my progress looks odd. While Milestones 2-4 are done, Milestone 1 seems problematic. And related to evaluations, there happens to be a guide book for mentors of GSoC. After reading it I strongly advise future applicants not to make the same mistake. There are few things in the world as depressing.
 * July 13: This is turning out to be a good week. I'm finally satisfied with the rate at which I'm committing code. Not much time is spent reading documentation. And I'm maybe only about one week (or less!) behind schedule. A little more speed and a minor change to the testing deliverables, and I might be able to finish ahead of time! Maybe it's the mid-evals!
 * July 21: The last week went well. Passed the mid-evals. I've added an RSS link for any results returned by QueryCreator, as well as included some updates that Jeroen made to Special:Ask. The SMW release candidate is out, so I'm trying to be careful not to introduce any bugs in the code. Benedikt Kämpgen has released Selenium tests for SMW, which I'll be looking at (to make tests for my work) by next week.
 * August 21: Time flies when you are having fun! The last month has been more fun than the first two of GSoC. The interface, QueryCreator, is looking good and is easier to use, less cluttered, and has better support (random sorting and printout parameter support) than Special:Ask. I'm running a bit late though, not in GSoC terms (am on schedule) but in terms of the SMW release schedule. QC is not included in the current minor version of SMW which came out today (1.6.1), but I'm quite sure it will be in SMW 1.6.2, which should be out in another 2 months or so, hopefully. All my major issues with the old interface have been addressed, but Markus has a list of things he wants to see done in QC which I'm implementing now. Today's my last day of development under the GSoC programme, but the last month has been so much fun that I'll be sticking around, improving the interfaces in other places in SMW.
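
The API-based autocomplete mentioned in the June 16 and June 25 updates can be illustrated with a minimal sketch. It assumes the MediaWiki API's `opensearch` module is the one being queried (the updates do not say which module was actually used) and that SMW properties live in namespace 102, which is the usual default but may differ per wiki. The function names and the example URL are hypothetical.

```python
from typing import List
from urllib.parse import urlencode

# Assumption: 102 is the default ID of SMW's "Property" namespace;
# a wiki may be configured differently.
PROPERTY_NAMESPACE = 102


def build_autocomplete_url(api_url: str, typed: str,
                           namespace: int = PROPERTY_NAMESPACE,
                           limit: int = 10) -> str:
    """Build an `opensearch` request for titles starting with `typed`."""
    params = {
        "action": "opensearch",
        "search": typed,
        "namespace": namespace,
        "limit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_suggestions(response: list,
                        namespace_prefix: str = "Property:") -> List[str]:
    """Pull suggestion labels out of an opensearch reply.

    The reply has the shape [query, [titles], [descriptions], [urls]];
    only the titles are kept, with the namespace prefix stripped.
    """
    titles = response[1] if len(response) > 1 else []
    labels = []
    for title in titles:
        if title.startswith(namespace_prefix):
            title = title[len(namespace_prefix):]
        labels.append(title)
    return labels
```

Because only the text typed so far is sent to the server, the page no longer has to load every property name up front, which is the timeout problem described in the June 16 update.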

About me
I am a student of CS in Assam, India. When using any new software, I try to approach it as the most clueless customer would, which usually points out any UI design issues. As a programmer, when I design an architecture or code a file, I like it to be simple to understand, which makes maintenance easier.

This minor obsession with wondering what makes us 'get' a system's design drew me to MediaWiki (because it stores useful information in one place) and SMW (because the Semantic Web is a godsend for clueless people).

I had come across Semantic Queries in SMW a while ago, while discussing the Semantic Web in class. It bothered me that I needed to go through the documentation to get usable results. Months later, when I came across the same issue posted as a GSoC idea, and by a core member of SMW at that, it was good to know I wasn't imagining the problem. Whew!

Getting this right is very important simply because Semantic Queries, if not improved, could soon become a bottleneck for SMW's future. Or maybe I'm just imagining it. I'm currently doing research for a project related to data privacy in my department. My other hobbies include working with Processing and visualization of datasets. When I scrape out extra free time, I also try to do some reading and gardening, which stems from appreciating good design.

Deliverables
I wanted the deliverables list to be a little flexible, because I expect the number of different classes I create to vary depending on how much cohesion there is in the existing code + my work. Also, I want to pester my mentor(s) to add to this list (so we can have a better product)

Required deliverables

 * Separate JavaScript auto-complete for every token type (categories, properties, etc.)
 * Server pages to provide data to each of the above scripts.
 * A UI for Special:Ask. It should have at least the same usability as the current version, or better.
 * Selenium test cases for created components.
 * Documentation for all components created.
 * Documentation for existing code for the class SMWAskPage

If time permits

 * A graphical UI for browsing categories, using interface created above.
 * OR, alternative deliverables based on further work assigned by mentor(s).

Pre-coding period

 * Set up the development environment (already done)
 * Read documentation (already started)
 * Bother people on the mailing list (started, and continuing)
 * Document existing code
 * Set up local testing environment (Selenium)
 * Get some dummy data to fill my test server.

From 24th May, 2011
Although officially I'm supposed to start from this date, I expect to begin earlier, just to get a head start. Also, I'll have departmental presentations in Week 1 (the semester ends at the end of Week 1), so I'll start work early to make up for this.
 * Milestone 1 Expected time: Week 1 and 2
 * Separate SMW_SpecialAsk.php into UI and non-UI components.
 * Non-UI components to be moved into a separate class.
 * Test.
 * Will also end up expanding documentation currently available.
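
A rough sketch of the kind of split Milestone 1 describes: one class that only knows about the query, and one that only knows about the form. This is an illustration in Python rather than the PHP of SMW_SpecialAsk.php, and the class names, method names and form field names are all hypothetical, not the ones in SMW.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class QueryRequest:
    """Non-UI part: holds the pieces of a query and knows nothing about HTML."""
    conditions: str                                  # e.g. "[[Category:City]]"
    printouts: list = field(default_factory=list)    # e.g. ["Population"]
    parameters: dict = field(default_factory=dict)   # e.g. {"limit": "10"}

    def to_ask_string(self) -> str:
        """Join conditions, printouts and parameters with '|', as in an #ask call."""
        parts = [self.conditions]
        parts += ["?" + name for name in self.printouts]
        parts += [key + "=" + value for key, value in self.parameters.items()]
        return "|".join(parts)


class QueryFormUI:
    """UI part: renders a form for a QueryRequest and nothing else."""

    def render(self, request: QueryRequest) -> str:
        conditions = escape(request.conditions)
        printouts = escape("\n".join("?" + name for name in request.printouts))
        return (
            '<form method="get">'
            '<textarea name="conditions">' + conditions + "</textarea>"
            '<textarea name="printouts">' + printouts + "</textarea>"
            '<input type="submit" value="Find results">'
            "</form>"
        )
```

With this shape, another interface (a drilldown browser or a graphical builder, say) could build a `QueryRequest` directly and never touch the form code.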


 * Milestone 2 Expected time: Week 3 and 4
 * Simplify UI layout.
 * Add/remove page elements to simplify look for the newbie user.
 * Test.
 * Will take so much time because I want to make multiple mock-ups of the UI.


 * Milestone 3 Expected time: Week 5
 * Javascript Autocomplete.
 * Separate the existing auto-complete into specialized one for different query tokens.
 * Create PHP code to feed the JavaScript.
 * Tests for PHP are expected to fail.


 * Milestone 4 Expected time: Week 6
 * PHP feeds to JavaScript
 * Fix PHP code from the previous milestone to return a list of categories ranked in a custom order (alphabetic, best match, or height in the category tree).
 * Useful for building other UIs.
 * This should take less time if I can leverage existing query printers to return JSON. If not, it may spill into the next week. Trouble expected for ranking according to height in the category tree/graph.
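
The three orderings named in Milestone 4 could look roughly like the sketch below. It is written in Python for brevity rather than PHP, and it simplifies the category graph to a tree in which each category has at most one parent; real category graphs allow several parents, which is presumably where the expected trouble lies. The function names and the `parent_of` mapping are hypothetical.

```python
from typing import Dict, List, Optional


def category_depth(category: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Count the steps from `category` up to a root (a category with no parent)."""
    depth = 0
    seen = {category}  # guards against cycles in the data
    current = parent_of.get(category)
    while current is not None and current not in seen:
        seen.add(current)
        depth += 1
        current = parent_of.get(current)
    return depth


def rank_categories(candidates: List[str], typed: str,
                    order: str = "alphabetic",
                    parent_of: Optional[Dict[str, Optional[str]]] = None) -> List[str]:
    """Return `candidates` sorted by one of the three proposed orderings."""
    if order == "alphabetic":
        return sorted(candidates, key=str.lower)
    if order == "best_match":
        needle = typed.lower()

        def match_key(name: str):
            position = name.lower().find(needle)
            # Non-matches last, then earliest match, then shortest name.
            return (position < 0, position, len(name), name.lower())

        return sorted(candidates, key=match_key)
    if order == "height":
        parents = parent_of or {}
        # Categories closest to the root come first.
        return sorted(candidates,
                      key=lambda name: (category_depth(name, parents), name.lower()))
    raise ValueError("unknown order: " + order)
```

Keeping the ordering as a parameter means one feed can serve several UIs, each asking for the ranking that suits it.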


 * Milestone 5 Expected time: Week 7
 * Other Stuff.
 * Code Clean-up.
 * Add more documentation.
 * Improve the UI (add keyboard short-cuts)


 * Milestone 6 Expected time: Week 8
 * More JavaScript
 * Create more PHP code to return data for JavaScripts for other tokens.
 * Code from the earlier PHP feed work (Milestones 3 and 4) can be modified to achieve this. Therefore, work should proceed fast on this.


 * Milestone 7 Expected time: Week 9 and 10
 * Tests!
 * Write Selenium Test cases for Special:Ask.
 * Although pages by now are expected to work for valid queries, invalid queries are expected to cause trouble. Code. Rinse. Repeat.


 * Milestone 8 Expected time: Week 11 and 12
 * Example UI
 * Hopefully, I should be done by now. Therefore, will build a sample category selector based on the above work with JS InfoVis Toolkit.
 * The code is expected to be a bit hacky but will serve as a good example of how to build on my GSoC work.
 * Alternatively, mentor(s) may (they have suggested this) put me to work on UI improvements elsewhere on SMW.


 * Of course, this time may also be spent fixing code in case I'm running behind.

Participation
I code in long stretches, usually at night, but also sometimes during the day. I tend to document my code as I write, although I come back and re-factor variables/class names later during code clean-up. Progress on my code can be viewed as bi-weekly SVN commits. Rather than blog about my work (I'd rather code than blog), I'll put up small updates on my profile page on mediawiki.org about my progress. I expect that to be more useful for future GSoC aspirants.

SMW-related queries will be emailed to the SMW development mailing list, and to Markus Krötzsch, Neill Mitchell and Jeroen De Dauw, since they have shown interest in (co)mentoring. General MediaWiki questions (although they seem unlikely) will be forwarded to the MediaWiki mailing list. I also expect to view code and have questions on other open-source projects such as Exhibit. Questions will be posted to them if necessary and any insights will be summarised on this page.

Quick questions will end up on the IRC channels of MediaWiki and SMW.

Past open source experience
I confess to being a FOSS newbie (everyone is a newbie sometime, right?) but am passionate about getting this right. Just to clarify, I've been working with PHP, MySQL and JavaScript for quite some time, but if selected for GSoC, this would be the first time I'd be contributing.

I've been informally helping a friend with the iCub Simulator and YARP, but have not committed any code.

And of course, just like everyone else, I use a lot of FOSS software/libraries in my work.

UI ideas

 * Search at GitHub. A simple, clean search page, good for the novice but also great for people who want to build complex queries.
 * More GitHub. Graphics-intensive UIs should have keyboard shortcuts for the power user.
 * Gephi. If we ever wanted to 'surf' the semantic data as a graph, Gephi sure knows how to draw them!
 * A tree-drawing script built with the JavaScript InfoVis Toolkit.

Other SMW UI browsers
These are listed here to show that each has its strengths, but none has eliminated the need for a better Special:Ask.
 * Semantic Drilldown
 * Ask the Wiki
 * Exhibit
 * The Halo Project