User:AalekhN/GSoC proposal 2014

Multilingual, usable and effective captchas

 * Public URL: https://www.mediawiki.org/wiki/User:AalekhN/GSoC_proposal_2014
 * Bugzilla report:
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=32695
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=5309


 * Announcement: https://www.mediawiki.org/wiki/Summer_of_Code_2014#Multilingual.2C_usable_and_effective_captchas.

Name and contact information
 * Name: Aalekh Nigam
 * Email: aalekh1993@rediffmail.com
 * IRC or IM networks/handle(s): aalekhN
 * Location: New Delhi, India
 * Timezone: Kolkata, India, UTC+5:30
 * Typical working hours: 12:00 PM to 2:00 AM until August, 5:00 PM to 2:00 AM after August (Indian Standard Time)

Synopsis
The CAPTCHAs currently implemented in many Wikimedia projects are mostly broken, lack localization, and are often vulnerable. This project aims to design a multilingual, usable and effective captcha that is more secure (difficult for bots to solve), user friendly (easier for humans to solve) and multilingual for MediaWiki projects. The new solution will also be usable by blind and visually impaired users.

Scope of Work
Three approaches appear best suited to produce the required result for the project:


 * Captcha based on selection of a particular object: An image of the following type will be shown:

The only valid answer to the given captcha is to select the images in increasing numerical order, as shown below:

Other possible questions for the captcha could be "Select the images in which the man is wearing sunglasses", as shown below:

A demo demonstrating a raw implementation of this type of captcha is given here. If the selected combination is wrong, the captcha reloads itself and shows a captcha with different images.

The answer to the given question will be shown as follows: For better security, I plan to provide 8 options for the captcha rather than 4.
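As a rough illustration of the ordering variant described above, the check and the reload-on-failure behaviour could be sketched as follows. This is only a minimal sketch under stated assumptions: the integer labels stand in for the numbered images, and the names `make_challenge`, `is_correct`, `check_or_reload` and `blind_guess_probability` are hypothetical, not part of any existing extension.

```python
import math
import random


def make_challenge(labels, rng=random):
    """Return the labels in a shuffled display order (hypothetical helper)."""
    shown = list(labels)
    rng.shuffle(shown)
    return shown


def is_correct(selected, labels):
    """Accept only a selection that lists every label in increasing order."""
    return list(selected) == sorted(labels)


def check_or_reload(selected, labels, fresh_labels, rng=random):
    """Return (passed, next_challenge).

    On a wrong answer a new challenge is built from a different set of
    labels, mirroring the "reload with different images" behaviour.
    """
    if is_correct(selected, labels):
        return True, None
    return False, make_challenge(fresh_labels, rng)


def blind_guess_probability(n_options):
    """Chance that a uniformly random ordering of n distinct options is right."""
    return 1 / math.factorial(n_options)
```

Under these assumptions, a blind guess at the ordering succeeds with probability 1 in 24 for 4 options and 1 in 40320 for 8 options, which is one way to see why 8 options would be harder for a bot. The figure only covers random guessing on the ordering question; it says nothing about bots that can read the images.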
 * Ask the user to click on the same image as provided: The question for this type of captcha looks like:

 * For blind and visually impaired users: We can use an audio captcha system that asks the user to select a number as it is spoken in the audio. For example, the visual equivalent of the audio question will be:

When the audio asks the user to select the number "0", the user moves across the blocks with the arrow keys, like a slide show, with a different voice speaking out each option, and presses Enter to select the number that was asked for, thereby verifying that the user is human. The image shown above is only the visual equivalent; the actual image visible will be:

Based on the user research conducted, I would have the audio captcha system play the question twice, pause for 5 seconds, and then give the user a new captcha if the user does not select any option or selects the wrong option. This can help against bot attacks. We can reserve a key, say "a", as a virtual cursor so that users can reload the captcha if the sound is not audible.
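The audio flow described in this section (arrow keys to move between spoken options, Enter to select, the question played twice followed by a 5-second pause, and a reserved "a" key to reload) could be modelled roughly as below. This is a sketch of the control logic only, with no audio playback; the class `AudioChallenge`, the functions `handle_key` and `should_reissue`, and the key names are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_PLAYS = 2       # the question is spoken twice
PAUSE_SECONDS = 5   # idle time allowed after the last playback
RELOAD_KEY = "a"    # reserved key for reloading an inaudible captcha


@dataclass
class AudioChallenge:
    target: int     # the number the audio asks for
    options: list   # numbers spoken as the user moves between blocks
    cursor: int = 0 # index of the block currently focused


def handle_key(challenge, key):
    """Return (status, spoken_option) for one key press.

    status is "moved", "passed", "reload" or "ignored".
    """
    if key in ("left", "right"):
        step = 1 if key == "right" else -1
        challenge.cursor = (challenge.cursor + step) % len(challenge.options)
        return "moved", challenge.options[challenge.cursor]
    if key == "enter":
        chosen = challenge.options[challenge.cursor]
        # A wrong selection triggers a fresh captcha rather than a retry.
        return ("passed" if chosen == challenge.target else "reload"), chosen
    if key == RELOAD_KEY:
        return "reload", None
    return "ignored", None


def should_reissue(plays_done, seconds_idle):
    """True once the question was played twice and the pause has elapsed."""
    return plays_done >= MAX_PLAYS and seconds_idle >= PAUSE_SECONDS
```

Treating a wrong answer and a timeout the same way (both yield a new challenge) keeps a bot from retrying the same question repeatedly, which matches the intent stated in the proposal.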

Since any project of mass scale requires localization to be successful, I propose the use of an indexing system that would filter out the unsuitable and irrelevant images from those retrieved from Wikimedia Commons. A simple illustration of how the proposed indexing system works is demonstrated here:
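The proposal does not fix how the indexing system decides which images to keep, so the following is only one possible sketch. It assumes each retrieved image record carries a list of categories and a language-independent concept tag; the record fields, the blocked category names and the functions `is_suitable` and `build_index` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical blocklist; the real criteria would come from user research.
BLOCKED_CATEGORIES = {"Nudity", "Violence"}


def is_suitable(image):
    """Keep an image only if none of its categories is blocked."""
    return not (set(image["categories"]) & BLOCKED_CATEGORIES)


def build_index(images):
    """Group suitable image titles by concept tag.

    Because the concept tag is language independent, only the question
    text would need translating for a localized captcha.
    """
    index = defaultdict(list)
    for image in images:
        if is_suitable(image):
            index[image["concept"]].append(image["title"])
    return dict(index)
```

For example, given two records tagged "cat" where one is in a blocked category, `build_index` would return an index holding only the other title under "cat".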

Deliverables
The project aims to develop an effective captcha using the Wikimedia Commons API, which would make Wikimedia Commons act as a database for our captcha images.
 * Develop a proper indexing system for the images to be used for the captcha, which would filter out images that are not friendly to the user.
 * Develop effects for the captcha using PHP's GD library.
 * Develop a user-friendly audio captcha for blind and visually impaired users by introducing various virtual cursors to the screen reader.
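To make the first deliverable more concrete, here is a small sketch of how candidate images might be requested from the Wikimedia Commons API. It only builds the request URL and reads a response that has already been decoded from JSON, so no network access is needed. The category name, the thumbnail width and the function names `build_query_url` and `extract_thumbnails` are assumptions for illustration; the query parameters are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_query_url(category, limit=8, thumb_width=120):
    """Build a URL that lists files in a Commons category with thumbnails."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",  # iterate over category members
        "gcmtitle": f"Category:{category}",
        "gcmtype": "file",               # files only, no subcategories
        "gcmlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url",
        "iiurlwidth": thumb_width,       # ask for a scaled thumbnail URL
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def extract_thumbnails(response):
    """Map file titles to thumbnail URLs from a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        info = page.get("imageinfo")
        if info and "thumburl" in info[0]:
            result[page["title"]] = info[0]["thumburl"]
    return result
```

The 8-image limit mirrors the plan to show 8 options; the resulting titles would then pass through the indexing system before being used in a captcha.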

Project schedule

 * Community bonding period.
 * Extensive user research to explore more possibilities for developing a secure captcha.
 * Lay down a modular design for the project.
 * Achieve the above by working with the mentor(s) and other community members interested in the project.


 * 3-4 weeks: Develop an indexing system.
 * Integrate the Commons database with the captcha using the Wikimedia Commons API.
 * Merge it with the indexing system.
 * Develop the captcha for "Selection of Object" type questions.

Milestone 1: Prototype for the first category of extension ready.


 * 3-4 weeks: Develop the audio captcha and effects for the captcha.
 * Add a virtual cursor for the screen reader and provide appropriate back-end processing.
 * Develop effects using PHP's GD library to produce effect-based captchas.

Milestone 2: Working integrated prototype, ready for integration testing.


 * 2 weeks: Add unit tests.
 * 1 week: Ensure proper integration and working with MediaWiki.
 * 2 weeks: Testing and documentation.

The above plan may proceed as expected, or the time may be redistributed among the tasks as needed.

Participation
As a regular follower of MediaWiki, I regularly hang out at #mediawiki-i18n and #mediawiki and will continue to do so for the period I am working. If I face doubts or need advice, I will head over to the mailing list. I will also post weekly updates about the project on my blog here. I will use a local environment running MediaWiki for development and will commit every feature I make to Gerrit on a day-to-day basis. If needed, I will host a discussion at https://www.mediawiki.org/wiki/Talk:CAPTCHA

Past open source experience
Ever since my introduction to open source I have been an admirer of it, and I have worked with MediaWiki, WordPress and Flask. I'm an active contributor to the Open Source Developers Club in my college and have helped build the website for the open source conference organized by the club. I have also contributed to Bug 4365 and Bug 56504 (merged).
 * Developed a MediaWiki extension for importing VideoJsPlayer into MediaWiki, and have contributed a few patches here and here for various projects with solutions.
 * Contributed a patch to VideoJs.
 * Contributed a few patches to a website under development by Codecademy users.
 * Developed a jQuery plugin for slideshows with Fullscreen API support, available at the jQuery Plugin Repository.
 * Developed a WordPress plugin to port my jQuery plugin to WordPress.
 * Other projects I have contributed to can be found here.

I've also frequently attended various open source meetups including Software Freedom Day, local meetups of Linux User Group, Firefox, etc.

Any other info
To demonstrate the captcha based on selection of a particular object, a demonstration has been prepared here. Apart from this, I have also been looking for inspiration for the audio captcha and have found this particularly helpful.