User:Dheerajjoshi1991/Auto suggestion of categories

=Auto suggestion of categories using Semantic Analysis & Semi-Supervised Learning=
 * Public URL: http://www.mediawiki.org/wiki/User:Dheerajjoshi1991
 * Bugzilla report: https://bugzilla.wikimedia.org/show_bug.cgi?id=47871
 * Announcement: http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/068949.html

Name and contact information

Name: Dheeraj Joshi

Email: dheerajjoshi1991@gmail.com

IRC or IM networks/handle(s): djadmin22 (IRC)

Location: Chennai, India

Typical working hours: 8 hours a day, 5-6 days a week (or more if the situation demands it).

Synopsis

The key idea is to implement a plugin that recommends categories to the user based on the content of articles and talk pages on Wikipedia, MediaWiki and other wikis. As soon as the user finishes writing the article or talk page, the category tags will be populated automatically via Ajax requests.

This is very useful on wiki pages, since the automatically generated categories help users when specifying which categories a page belongs to.

The existing extensions only come close: one creates tags with user information and the posting date (http://www.mediawiki.org/wiki/Extension:DiscussionThreading), and the other is category suppression (http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Category_Suppression/Participants). Both are somewhat relevant to my idea, but neither provides automatic category suggestion.

Advantages

·Automatic category suggestion.

·Carries out the task with predefined vocabularies.

·Ajax-based, fast auto-suggestion of categories, populated on the fly by an efficient algorithm with minimal time and space complexity.

·Natural language processing and a cluster-based approach for higher accuracy of results.

·No manual adding of categories.
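As a rough illustration of content-based suggestion, the sketch below scores an article against per-category word profiles using TF-IDF weights and cosine similarity. This is only one possible baseline and not the algorithm the project will finally use; the function names, the tiny stop-word list and the example data are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "on"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_profiles(labelled_articles):
    """Merge the token counts of all example articles of each category."""
    profiles = {}
    for text, category in labelled_articles:
        profiles.setdefault(category, Counter()).update(tokenize(text))
    return profiles


def idf_weights(profiles):
    """Smoothed inverse document frequency; each category profile is one document."""
    n = len(profiles)
    doc_freq = Counter()
    for counts in profiles.values():
        doc_freq.update(counts.keys())
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in doc_freq.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_categories(text, profiles, top_n=3):
    """Return up to top_n categories with a non-zero similarity to the text."""
    idf = idf_weights(profiles)

    def weigh(counts):
        # Terms never seen in any profile get weight 0 and do not affect the score.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    query = weigh(Counter(tokenize(text)))
    scored = [(cosine(query, weigh(counts)), cat) for cat, counts in profiles.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cat for score, cat in scored[:top_n] if score > 0]


profiles = build_profiles([
    ("Python is a programming language used for software", "Programming"),
    ("The lion is a large cat living in Africa", "Animals"),
    ("Football is a team sport played with a ball", "Sports"),
])
print(suggest_categories("A new programming language for writing software", profiles))
```

In the plugin, a function like `suggest_categories` would sit behind the Ajax endpoint, with the profiles built from pages that already carry categories.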

Deliverables

May 1 - May 20

In this period I’d like to discuss my idea with the mentors, gather feedback from the community and revise the idea accordingly. Since the project relies heavily on semantic analysis, an active research topic that involves NLP and machine learning, I’ll also use this time to consult professors in the field to arrive at the best solution. I will also need to familiarize myself with the MediaWiki coding standards.

May 20 - May 31

Finalize the best possible text mining algorithm, using semi-supervised learning methodologies.
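To make "semi-supervised" concrete, here is a minimal sketch of self-training, one common semi-supervised scheme: a nearest-centroid classifier is trained on a few labelled pages, labels the unlabelled pages it is confident about, and is retrained on the enlarged set. The choice of self-training, the bag-of-words vectors, the `min_margin` threshold and all names are assumptions for illustration; the final algorithm is still to be decided in this period.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector of lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labelled):
    """Sum the vectors of every labelled text per category."""
    cents = {}
    for text, label in labelled:
        cents.setdefault(label, Counter()).update(vectorize(text))
    return cents


def predict(text, cents):
    """Return (best label, margin of its score over the runner-up)."""
    vec = vectorize(text)
    scores = sorted(((cosine(vec, c), label) for label, c in cents.items()), reverse=True)
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def self_train(labelled, unlabelled, min_margin=0.2, max_rounds=5):
    """Grow the labelled set with confident predictions on unlabelled texts.

    `labelled` must be non-empty. Returns the final centroids and the texts
    that never reached the confidence margin.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled)
        confident, remaining = [], []
        for text in pool:
            label, margin = predict(text, cents)
            (confident if margin >= min_margin else remaining).append((text, label))
        if not confident:
            break  # nothing new was learned in this round
        labelled.extend(confident)
        pool = [text for text, _ in remaining]
    return centroids(labelled), pool


seed = [("python programming language code", "Programming"),
        ("lion tiger wild animal", "Animals")]
unlabelled = ["java programming language compiler",
              "elephant wild animal herd",
              "quantum physics"]
model, leftover = self_train(seed, unlabelled)
print(predict("java compiler", model)[0], leftover)
```

The point of the design is that words such as "java" or "elephant", absent from the seed examples, become usable evidence once the confidently labelled pages are folded back into the training set.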

June 1- June 14

Code the algorithm using suitable data structures and design the front end.

June 15 - June 30

Code the wrapper around the corpus/database used for synonym extraction, using either WordNet or another suitable repository.
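One way such a wrapper might be shaped is sketched below: the rest of the code talks to a small `synonyms` interface, so the backing repository can be swapped later. The `SynonymSource` protocol, the `InMemorySynonymSource` stand-in and `expand_terms` are hypothetical names invented for this sketch; a WordNet-backed class exposing the same method could replace the in-memory one, and is not shown here.

```python
from typing import Dict, Iterable, Protocol, Set


class SynonymSource(Protocol):
    """Anything that can return the synonyms of a word (hypothetical interface)."""

    def synonyms(self, word: str) -> Set[str]: ...


class InMemorySynonymSource:
    """Stand-in repository built from explicit synonym groups.

    This only imitates what a WordNet-backed source would provide.
    """

    def __init__(self, groups: Iterable[Iterable[str]]):
        self._index: Dict[str, Set[str]] = {}
        for group in groups:
            words = {w.lower() for w in group}
            for w in words:
                # Every word maps to the other members of its group.
                self._index.setdefault(w, set()).update(words - {w})

    def synonyms(self, word: str) -> Set[str]:
        return set(self._index.get(word.lower(), set()))


def expand_terms(terms: Iterable[str], source: SynonymSource) -> Set[str]:
    """Return the lower-cased terms together with all their known synonyms."""
    expanded: Set[str] = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= source.synonyms(term)
    return expanded


source = InMemorySynonymSource([["car", "automobile", "motorcar"], ["film", "movie"]])
print(sorted(expand_terms(["Car", "review"], source)))
```

Expanding the terms of a page this way lets "automobile" in an article match a category vocabulary that only lists "car".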

July Period

Integrate all the modules together. Do dry runs and eliminate preliminary bugs in the code.

August 2

Mid-term evaluation. By this time a working prototype of the project will be ready.

August 2 - August 15

Code improvements: optimization and cleanup, plus changes based on feedback received during the mid-term evaluation.

Before final deadline

Complete documentation and any other pending work.

Final Submission

At the end of the final submission we will have a complete working module with the auto-tagging feature implemented.

About you

I’m a student who loves to learn and understand, and who wants to become strong enough to hold my own in any competitive scenario. My research interests lie in internet and web programming, databases and cloud computing. Recently I have been using these technologies to build online learning environments and exploring ways to improve the quality of the experience these environments offer. I’m enthusiastic about the web and wikis. I want to contribute to the wiki with my technical skills, so that my experience and hard work can make it better. I just need a push start through GSoC to begin working with this organization, and I intend to stay as long as I can to contribute and make it more awesome.

Participation

I am a person who likes to work in an environment that encourages originality and innovative thinking. Since I have done some research in this project area and have implemented a similar feature before, I believe I will face fewer problems than if I were starting from scratch. I have been working on machine learning, semantic analysis and Web 2.0, and I am in contact with good researchers, whom I’d call my mentors, so they would definitely help me out if needed. I’ll use the Git version control system for the project and will upload my source code to GitHub (if allowed).

Past open source experience

I have worked on Drupal code modification for one module: Automatic Integration. Although I have never done bug hunting on open source projects, I have participated in various bug bounty programs and patched bugs in web applications. I’d like to contribute to open source.

Any other info

-Internship at IIIT-Delhi, May-July 2012: worked on the Smart Room System project. We developed a prototype system for monitoring several parameters inside a room, e.g. motion, temperature, current and voltage.

-Worked in a startup, Medibeep (a social venture), Dec-Jan 2012: developed an online registration system.

Academic Projects and Achievements:

-Developed the college research portal.

-Case study of the campus LAN.

-Websites for various conferences.

-Technical Director at Computer Society of India, VIT Chapter.