User:Apsdehal/GSoC2014 Proposal

Annotation tool that extracts information from books and feeds it to Wikidata
Public Url:


 * (https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Annotation_tool_that_extracts_statements_from_books_and_feed_them_on_Wikidata)

Announcement of Proposal:


 * Announcement 1


 * Announcement 2

Name and Contact Information
Name:


 * Amanpreet Singh

Email:


 * amanpreet.iitr2013@gmail.com

IRC Nick:


 * apsdehal

Web Page / Blog / Microblog:


 * Spookout

Location:


 * Roorkee, Uttarakhand, India

Typical Working Hours:


 * 10:00-13:00, 15:30-19:00, 22:00-03:00 (IST); 04:30-07:30, 10:00-13:30, 16:30-21:30 (UTC)

Synopsis
The project rests on the belief that user interaction with Wikidata can be improved. It creates a tool that, when a statement is highlighted, provides a GUI to fix its structure and then feeds it to Wikidata. Wikidata is a free knowledge base that can be read and edited by both humans and machines. It centralizes access to data and manages it in a structured way, so that every piece of data is easily available and accessible. With this extension, people can save their important notes and quotes directly to Wikidata, making them more accessible.

Possible Mentors

 * 1) Cristian Consonni
 * 2) Andrea Zanni
 * 3) The Pundit team

Use cases

 * 1) You are at home, reading a book on Wikisource. As when taking notes on paper, you can annotate important quotes and data and feed them automatically, together with their source, into the Wikidata knowledge base. Furthermore, users who come after you will be able to see your notes, which saves them time. All of this is done just by activating the plugin.
 * 2) You are at a presentation or seminar at work. An important fact or data point is shared during the presentation, e.g. your national statistical institute has just released the latest population data on its website. You can annotate it, click, and it is on Wikidata.
 * 3) You are reading the news on your tablet in your browser, and a new prime minister is being nominated. You can select the relevant text and insert this information into Wikidata.
 * 4) Given a statement from Wikidata (or another source), we can use this tool to mark up a reference and import that reference into Wikidata. This could help provide references for the millions of statements that currently don't have one. The more people annotate with this tool, the more references are added to Wikidata, and in this way many claims can be converted into proper statements.

Glossary of Wikidata terms used:

 * Item: A page in the Wikidata main namespace representing a real-life topic, concept, or subject. Items are identified by a prefixed id, by a sitelink to an external page, or by a unique combination of multilingual label and description.


 * Property: A descriptor of a value for a particular item. In other words, it is an attribute of an item.


 * Statement: A piece of data about an item, recorded on the item's page. A statement consists of a claim (a property-value pair such as "Season: Winter", together with optional qualifiers), supported by optional references (giving the source for the claim).
 * Claim: A statement without references.
 * Value: A piece of information about an item that explains something about it.
 * Qualifier: A part of the claim that says something about the specific claim, often in a descriptive way.

The picture at the side explains the terms, using an item named London.
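The glossary terms can be pictured as a small data model. The following is only an illustrative sketch in Python (the plugin itself is planned in JavaScript). The class names `Claim` and `Statement` and the method `is_sourced` are assumptions made for this example and are not part of Wikidata or Pundit. The IDs Q84 (London), P17 (country) and Q145 (United Kingdom) serve as sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Claim:
    """A property-value pair about an item, with optional qualifiers."""
    item_id: str       # the item the claim is about, e.g. "Q84"
    property_id: str   # the property, e.g. "P17"
    value: str         # the value, here another item id
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Statement:
    """A claim together with the references that support it."""
    claim: Claim
    references: List[str] = field(default_factory=list)

    def is_sourced(self) -> bool:
        # A claim only counts as a referenced statement once it has a source.
        return len(self.references) > 0


claim = Claim(item_id="Q84", property_id="P17", value="Q145")
statement = Statement(claim=claim)
print(statement.is_sourced())  # False until a reference is attached
```

Keeping the references outside the claim mirrors the glossary: the same claim becomes a proper statement simply by attaching a source to it.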

How will it work?
I am going to create a MediaWiki extension for this project that will offer a GUI when a sentence is highlighted.

This GUI will use the Pundit software to analyze the statement as a triple (subject, predicate, object), offer a screen for changing it, and then feed the result to Wikidata by linking it to Wikidata items and properties. The tool will offer suggestions based on the existing properties and items on Wikidata. For the whole process, we are going to use Wikidata's regularly improving API. In this way, the data I save or search for will be shared with the whole world.

The following steps show how the extension will work in detail:


 * First, we will use the API to check whether the user is logged in and, if not, redirect to the login page. The user can still annotate text anonymously, just as an anonymous user edits pages on MediaWiki.
 * Pundit, integrated with the Wikidata vocabulary (fetched from Wikidata as needed), will be packaged as a browser plugin and as a bookmarklet.
 * We will provide a GUI so that the user can annotate text.
 * Next, the interface should propose to:
 * choose a subject (i.e. an item)
 * choose a predicate (i.e. a property)
 * choose an object (i.e. a data value, or a statement)
 * The proposed predicate should already exist on Wikidata. If it does not, we will present the user with an interface titled 'Can't find what you are looking for? Propose a property' and then send them to the property proposal page. At the end of this step, the annotation has become a claim.
 * In the next step we will gather the sources of the annotation, such as the website URL, the book's name (on Wikisource) and more. If we can't find the sources, we will provide an interface for the user to enter them, so that the references convert the claim into a statement.
 * Pundit will analyze the annotation as subject, predicate and object, pack it as a statement and then save it on the Pundit server.
 * JavaScript will be run to update the item's page on Wikidata with the necessary information about the statement created. In some cases this will also be done through the Wikibase API.
 * The flow will be unidirectional: the user creates annotations, saves them on the Pundit server, and then they are synchronized with Wikidata.
 * Possible further extensions to this project are bidirectionality and making the extension independent of the Pundit server.
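The core of the steps above (triple, then claim, then statement) can be sketched as follows. This is a minimal illustration in Python, not the plugin code, which is planned in JavaScript on top of Pundit. The function name `build_statement`, the dictionary layout, the `known_properties` lookup and the object id `Q_EXAMPLE` are all hypothetical. P6 (head of government) and Q145 (United Kingdom) are used as sample IDs.

```python
from typing import Dict, Optional

PROPOSE_PROMPT = "Can't find what you are looking for? Propose a property"


def build_statement(subject_id: str, predicate_label: str, object_value: str,
                    known_properties: Dict[str, str],
                    source_url: Optional[str] = None) -> dict:
    """Turn an annotated triple into a claim or, if a source is known, a statement."""
    property_id = known_properties.get(predicate_label.strip().lower())
    if property_id is None:
        # The predicate does not exist yet: point the user to the proposal page.
        return {"status": "propose_property", "prompt": PROPOSE_PROMPT}

    claim = {"subject": subject_id, "property": property_id, "object": object_value}
    if source_url is None:
        # No source was gathered automatically: the user still has to supply one.
        return {"status": "claim", "claim": claim}
    return {"status": "statement", "claim": claim, "references": [source_url]}


# Assumed sample vocabulary; in the plugin this would be fetched from Wikidata.
properties = {"head of government": "P6"}
result = build_statement("Q145", "Head of government", "Q_EXAMPLE", properties,
                         source_url="https://example.org/news")
print(result["status"])  # statement
```

Returning a status rather than raising an error keeps the three outcomes (propose a property, ask for a source, ready to save) explicit, so the GUI can decide which screen to show next.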

Tools to be used
1. Wikibase API: I am going to use the API client for Wikidata provided by addwiki for all interaction with Wikidata; it is currently stable and is the most regularly maintained client. I will interact with Wikidata item pages through this API. Its second job is to retrieve items, values and properties from Wikidata and present them to the user so that he/she can create statements. The login status of the user will also be checked through this API.

2. Pundit: Pundit is free, open source software for augmenting web pages with semantically structured annotations. I am going to use it to analyze the structure of the annotated sentence as subject, predicate and object, and afterwards fill these with properties, items and values from Wikidata. I chose Pundit because it is open source, well established and regularly maintained. In addition, the creators of this beautiful software are ready to help in case I need it. The example of how Pundit works explains in detail the process of annotating with it.
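To make the two jobs of the Wikibase API more concrete (looking up vocabulary and checking the login status), here is a minimal sketch that talks to the HTTP API directly rather than through the addwiki client. It is written in Python for illustration only; the helper names `search_url` and `is_logged_in` are assumptions. The sketch relies on the `wbsearchentities` action and on the `userinfo` query, whose reply carries an `anon` key for users who are not logged in.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(text, entity_type="item", language="en"):
    """Build a wbsearchentities URL for items (type=item) or properties (type=property)."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": entity_type,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# URL to request when asking the API who the current user is.
USERINFO_URL = API_URL + "?" + urlencode(
    {"action": "query", "meta": "userinfo", "format": "json"}
)


def is_logged_in(userinfo_response):
    """Interpret the parsed JSON reply of the userinfo query."""
    info = userinfo_response.get("query", {}).get("userinfo", {})
    # Anonymous users are reported with id 0 and an "anon" key.
    return "anon" not in info and info.get("id", 0) != 0
```

For example, `search_url("head of government", "property")` yields the request the plugin would send to suggest a predicate, and `is_logged_in` decides whether to redirect to the login page or continue anonymously.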

Required Deliverables

 * Create a plugin that provides the function of annotating text and then feeds the annotated text to Wikidata.
 * The plugin must use existing properties available on Wikidata and, if none is available, ask for a new one to be created.
 * New items, however, can be created on the fly.
 * Show users who have the plugin activated the annotations made by previous users.

Optional Deliverables

 * Create a MediaWiki extension for the same functionality, thus increasing its reach.

Details on Timeline

 * Task 1:


 * I have had MediaWiki running on my machine since I started contributing to MediaWiki last year. I have also set up Pundit on my machine as part of the minor task I completed, through which I also became familiar with the Wikibase API and with making requests to it from JavaScript. I am in regular contact with my mentors through a Google group, where we discuss the topic and I post questions when I have doubts. We also hold voice calls on Google Hangouts, which makes the communication more effective.


 * Task 2:


 * Since I am creating a plugin that can also easily be saved as a bookmark, this part won't take much time to implement: Pundit already has the functionality of packaging itself as a bookmarklet, so most of the time will be spent setting up the plugin. The initial code will therefore focus on setting up Pundit to work in synchronization with MediaWiki. The login functionality through MediaWiki will also be implemented in this phase.


 * Task 3:


 * Writing unit tests and then testing the code is an essential and integral part of this project, so this will be done at many stages of the project. I will be using jQuery's QUnit to test my code, so that the code can regularly be extended with unit test coverage.


 * Task 4:


 * I have to modify the current GUI provided by Pundit and blend it into the traditional look of MediaWiki, so I will be writing CSS and JavaScript to create and style the GUI throughout this period.


 * Task 5:


 * Again, unit tests written with QUnit will exercise the whole code to find out whether anything is broken, improving the overall stability of the code. If any errors are found, they will be fixed during this period.


 * Task 6:
 * This task is about bringing the Wikidata vocabulary from its server to the user's frontend and suggesting properties and values to the user based on the data received. Since the Wikidata API is publicly available for requests and returns JSON, this job will be done through JavaScript AJAX requests. I will make GET requests for JSON to the server and handle the data received with my JavaScript functions on the client, thus making the required Wikidata vocabulary available to Pundit.


 * Task 7:
 * In this task, statements (annotations) made by the user will be saved on the Wikidata page of the specific item present in the annotation. This will be done through JavaScript AJAX requests to the Wikidata server to edit the page. If the task turns out to be complicated, POST requests can also be made to PHP scripts hosted on another server, in order to make use of the Wikibase API client available in PHP. In this task we will also store annotations on the Pundit server, so that users who come after the annotator can see what was annotated before them.


 * Task 8:
 * This will involve writing documentation for the JavaScript objects (representing classes) created during the project, adding comments to the source code, and packaging the plugin for public use.


 * Task 9:
 * Again, write tests with QUnit, provided by the jQuery Foundation, to test the whole code and fix any bugs found that could break the plugin.


 * Task 10:
 * Open the plugin to the public and actively work on fixing any bugs they find.


 * Task 11:
 * Submit the source code to Google for final evaluation and officially open the plugin for public use, thereby finishing the project.


 * Optional Task:


 * If time permits, I will create a MediaWiki extension for the plugin and host a test setup on Wikimedia Labs, thus increasing the reach of the project.
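Tasks 6 and 7 in the timeline above come down to reading JSON from the Wikidata API and sending edit requests back. The sketch below, in Python for illustration only (the plugin would do the same from JavaScript), shows one possible shape for both directions. The helper names and the sample reply are assumptions. `wbcreateclaim` and `wbsetreference` are Wikibase API actions, P854 is the "reference URL" property, and `csrf_token` stands for an edit token that has to be obtained from the API beforehand.

```python
import json

# Hand-written sample shaped like a trimmed wbsearchentities reply (not live data).
SAMPLE_SEARCH_REPLY = (
    '{"search": [{"id": "Q84", "label": "London", '
    '"description": "sample description"}], "success": 1}'
)


def to_suggestions(raw_json, limit=5):
    """Task 6: reduce a search reply to the fields the GUI needs."""
    hits = json.loads(raw_json).get("search", [])
    return [
        {"id": h["id"], "label": h.get("label", h["id"]),
         "description": h.get("description", "")}
        for h in hits[:limit]
    ]


def create_claim_params(entity_id, property_id, target_item_id, csrf_token):
    """Task 7: POST parameters for wbcreateclaim with an item as the value."""
    value = {"entity-type": "item", "numeric-id": int(target_item_id[1:])}
    return {
        "action": "wbcreateclaim",
        "entity": entity_id,
        "property": property_id,
        "snaktype": "value",
        "value": json.dumps(value),
        "token": csrf_token,
        "format": "json",
    }


def set_reference_params(statement_guid, source_url, csrf_token):
    """Task 7: POST parameters for wbsetreference, citing a URL via P854."""
    snaks = {"P854": [{"snaktype": "value", "property": "P854",
                       "datavalue": {"type": "string", "value": source_url}}]}
    return {
        "action": "wbsetreference",
        "statement": statement_guid,
        "snaks": json.dumps(snaks),
        "token": csrf_token,
        "format": "json",
    }
```

Creating the claim first and attaching the reference in a second request matches the distinction made in the glossary: the first call produces a claim, and the second turns it into a referenced statement.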

Participation
For me, it was and always will be: Sharing is Caring. This has always helped me get along well with the Wikimedia community. I will publish my completion reports on my blog weekly. All source code I write will be published to my GitHub repo and will also be pushed to a branch in the Pundit repo to enable collaboration. I always try to stay online on IRC and reply to emails regularly, which helps me blend into the community. Testing and documentation will be added to the Wikitech Mail page. I am mostly available on #mediawiki, #wikimedia-dev and #wikidata during my working hours. I usually hold hangouts with my mentors to discuss ideas, and I post our discussions and questions on our Google group, which is free for everyone to join.

About Me
I am a 19-year-old second-year student currently enrolled in Electrical Engineering (four-year course) at IIT Roorkee. I developed a passion for programming and web development in my freshman year. I have been contributing regularly to MediaWiki since November 2013. I am an active member of SDSLabs at IIT Roorkee. I am currently proficient in JavaScript, PHP, Python and Node.js. I have been using Linux for the past two years, and it was my initial source of inspiration for open source. I open source all the projects I do individually so that everyone can gain something from them. I have been developing apps regularly at SDSLabs; we code late at night in our lab and we all enjoy it. SDSLabs GitHub profile. I usually work from 4:00 p.m. to 11:00 p.m. on weekdays and from 11:00 a.m. to 11:00 p.m. on weekends; the rest of my time goes to my studies as usual.

My summer vacation runs from the end of April to mid-July, so I expect to be able to complete my project on time, and I will continue developing it further once GSoC is completed. Coming from a remote village in the valleys of Himachal Pradesh, I love the idea of open source, believe that 'Sharing is Caring', and hope that this idea will spread further through communities like MediaWiki and programs like GSoC.

I am eagerly looking forward to my project. I selected it because it involves the idea of sharing, i.e. collaborating on data, and because it involves my favourite language, JavaScript, along with some PHP on the server end. The project is interesting because it aims to connect data around the world with its sources and to help people save their important data, so I am excited about it.

Past Projects

 * 1) GitHub profile.
 * 2) Built a web app for a local startup at IIT Roorkee, Roorkee Delivers.
 * 3) Created a code-sharing website, OpenCode.
 * 4) A web app that makes matches on the basis of common interests between two people.
 * 5) jQuery plugins for a shopping cart (jCart) and cookies (jCookie).
 * 6) Created an application for alumni to share their experiences at IIT Roorkee.
 * 7) Contributions to MediaWiki (Gerrit repo).
 * 8) I have mostly worked on improving the Multimedia Viewer extension.
 * 9) I have also contributed to the open source project Moodle.
 * 10) Worked on our own lab music player in Node.js, based on Play by GitHub.