User:AalekhN/GSoC proposal 2014

Multilingual, usable and effective Captchas

 * Public URL: https://www.mediawiki.org/wiki/User:AalekhN/GSoC_proposal_2014
 * Bugzilla report:
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=32695
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=5309


 * Announcement: https://www.mediawiki.org/wiki/Summer_of_Code_2014#Multilingual.2C_usable_and_effective_captchas.

Name and contact information
 * Name: Aalekh Nigam
 * Email: aalekh1993@rediffmail.com
 * IRC or IM networks/handle(s): aalekhN
 * Location: New Delhi, India
 * Timezone: Kolkata, India, UTC+5:30
 * Typical working hours: 12:00 PM to 2:00 AM until August, 05:00 PM to 2:00 AM after August (Indian Standard Time)

Synopsis
The CAPTCHAs currently used on many Wikimedia projects are largely broken, lack localization, and are often vulnerable to automated solving. This project aims to design a multilingual, usable and effective Captcha for MediaWiki projects: more secure (harder for bots to solve), more user friendly (easier for humans to solve), and localized. The new solution will also be useful for blind and visually impaired users.

Scope of Work
Four approaches look most promising for producing the required result:

Captcha on the basis of Selection of Particular Object
An image of the following type will be shown:



The answer to the given Captcha is shown below:

Other possible questions for the Captcha could be "Select the images in which the man is wearing sunglasses", as shown below:

If the selected combination is wrong, the Captcha reloads itself and shows a different set of images.
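
The check-and-reload behaviour described above can be sketched as follows. This is only a minimal illustration, not the planned ConfirmEdit implementation: the function names, the dictionary layout, and the default of 8 images with 3 correct ones are all assumptions made for the example.

```python
import random

def new_challenge(matching, others, size=8, n_match=3, rng=random):
    """Build a challenge: n_match images that fit the question plus decoys.

    `matching` and `others` are hypothetical lists of image names; in the
    proposal they would come from Wikimedia Commons.
    """
    shown = rng.sample(matching, n_match) + rng.sample(others, size - n_match)
    rng.shuffle(shown)
    # Positions of the images the user is expected to select.
    answer = {i for i, name in enumerate(shown) if name in matching}
    return {"shown": shown, "answer": answer}

def check_selection(challenge, selected, matching, others, rng=random):
    """Return (passed, next_challenge).

    A wrong combination yields a freshly drawn challenge of the same size,
    mirroring the "reload with different images" rule.
    """
    if set(selected) == challenge["answer"]:
        return True, None
    fresh = new_challenge(
        matching, others,
        size=len(challenge["shown"]),
        n_match=len(challenge["answer"]),
        rng=rng,
    )
    return False, fresh
```

Requiring the whole set of selected positions to match, rather than a single click, keeps the chance of a blind guess low as the number of images grows.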

Ask User to click on the same image as provided
The question for this type of Captcha looks like:

The answer to the given question will be shown as follows. For better security, I plan to provide 8 options for the Captcha rather than 4. We can also apply multiple effects to an option, for example:



The file above was produced by applying a pencil sketch effect and a photo booth effect to the image below, and it is not recognized by large image search services such as Google Images and the TinEye API.

Annotations Based Captcha
This type of captcha asks the user to identify the objects in the image from the given options. The question for this type of captcha will be as shown below:



The correct options for the given questions are 'Cat' and 'Mountain' respectively.

Image Rotation Based Captcha
This type of captcha was proposed by Google Research in an article here. The question asks the user to rotate each image to its upright position, as shown here:



For images that were uploaded in an improper orientation or are symmetric, we can rely on the selections made by people over time.
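
One way to use the selections made by people over time is sketched below. It is a rough illustration under stated assumptions: past answers are stored as angles in degrees, an image is only used once enough people agree on its orientation, and the thresholds (5 votes, 0.8 agreement, 20 degrees tolerance) are placeholder values rather than figures from the Google Research article.

```python
import math

def circular_mean_deg(angles):
    """Return (mean angle in degrees, agreement in [0, 1]) for a list of angles."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    mean = math.degrees(math.atan2(s, c)) % 360
    agreement = math.hypot(s, c) / len(angles)  # 1.0 means everyone agrees
    return mean, agreement

def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def is_usable(past_angles, min_votes=5, min_agreement=0.8):
    """Symmetric or ambiguous images show low agreement and are skipped."""
    if len(past_angles) < min_votes:
        return False
    _, agreement = circular_mean_deg(past_angles)
    return agreement >= min_agreement

def check_rotation(submitted, past_angles, tolerance=20):
    """Accept the answer if it is close to the orientation most people chose."""
    mean, _ = circular_mean_deg(past_angles)
    return angular_diff(submitted, mean) <= tolerance
```

A circular mean is used instead of a plain average so that answers such as 355 and 5 degrees are treated as close to each other.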

For blind and visually impaired users
Rather than developing a completely new type of captcha, the approach could be to improve the interface of the pre-existing audio Captcha. Possible improvements are:

1) Assigning a key to replay the audio. The same sound can be replayed up to 3 times, after which the voice and the number spoken will change.

2) Assigning a key to delay the captcha voice for a short duration.

3) Assigning a key to reload a specific portion or number of the captcha, rather than reloading the whole captcha again.
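
The replay limit and the per-number reload from points 1 and 3 can be sketched as a small state holder. This is an illustrative assumption of how the state might be tracked, not the existing audio Captcha code: the class name, the voice labels, and the default length of 5 digits are hypothetical, and actual audio playback and key bindings are left out.

```python
import random

class AudioCaptcha:
    MAX_REPLAYS = 3  # the same sound may be heard at most 3 times

    def __init__(self, length=5, voices=("voice_a", "voice_b"), rng=random):
        self.rng = rng
        self.voices = list(voices)
        self.digits = [rng.randrange(10) for _ in range(length)]
        self.voice = [rng.choice(self.voices) for _ in range(length)]
        self.plays = [0] * length

    def reload_digit(self, i):
        """Replace only position i with a different number and voice."""
        self.digits[i] = self.rng.choice(
            [d for d in range(10) if d != self.digits[i]])
        other_voices = [v for v in self.voices if v != self.voice[i]]
        if other_voices:
            self.voice[i] = self.rng.choice(other_voices)
        self.plays[i] = 0

    def play(self, i):
        """Return (voice, digit) for position i, enforcing the replay limit."""
        if self.plays[i] >= self.MAX_REPLAYS:
            self.reload_digit(i)
        self.plays[i] += 1
        return self.voice[i], self.digits[i]
```

Each key press would call `play` or `reload_digit` for the position under the cursor, so the rest of the captcha stays unchanged.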

More information about improving the captcha can be found here.

The effects applied to the photographs can be chosen carefully so that the images remain recognizable to humans. As an added advantage, manipulated images can help protect against automated programs, as mentioned here.

Any project at this scale needs localization to succeed. For this project, I propose an indexing system that filters unsuitable and irrelevant images out of those retrieved from Wikimedia Commons. A simple version of the indexing system I propose is demonstrated here:

We can use the Apertium API to make our captcha multilingual; for example, it can be used to retrieve the names of various objects in different languages.

Deliverables
The project aims to develop a Captcha plugin for the ConfirmEdit extension using the Wikimedia Commons API, so that Wikimedia Commons acts as the database for our Captcha images.
 * Develop a proper indexing system for the Captcha images that filters out images that are not user friendly.
 * Develop effects for the Captcha with the use of PHP's ImageMagick library.
 * Use the Wikimedia Commons API to retrieve images based on their category.
 * Use the Apertium API to make the captcha multilingual.
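
As a rough sketch of the category-based retrieval, the request below uses the standard MediaWiki `list=categorymembers` query against the Commons API endpoint. The helper names, the limit of 50, and the example category are assumptions for illustration; the snippet only builds the URL and parses a response, and the plugin itself would do the equivalent from PHP.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def category_files_url(category, limit=50):
    """Build a query URL listing the files in a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "file",   # only files, no subcategories or pages
        "cmlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

def file_titles(response):
    """Extract file titles from a decoded JSON response."""
    members = response.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members]
```

For example, `category_files_url("Cats")` gives a URL whose JSON result can be passed to `file_titles` to obtain candidate images for the indexing system.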

If Time Permits:
 * Improve the audio Captcha for blind and visually impaired users by introducing new virtual cursors to the screen reader.

Project schedule

 * Community bonding period.
 * Extensive user research to find the best of the suggested approaches for developing a secure Captcha.
 * Lay down a modular design for the project.
 * Achieve the above by working with the mentor(s) and other community members interested in the project.


 * 3-4 weeks: Develop an indexing system.
 * Integrate the Commons database with the Captcha using the Wikimedia Commons API.
 * Merge it with the indexing system.
 * Develop the "Selection of Object" and "Annotation based" Captchas and develop user friendly effects with the ImageMagick library.

Milestone 1: Prototypes of the first and third captcha categories ready.


 * 3-4 weeks: Make the "Selection of Object" and "Annotation based" Captchas multilingual.
 * Integrate the "Selection of Object" and "Annotation based" Captchas with Wiktionary to make them multilingual.
 * Develop the "Rotation based" Captcha and integrate it with the indexing system.

Milestone 2: Working, integrated prototype of the 1st, 3rd and 4th captcha categories ready.


 * 1 week: Develop the "Same image click" Captcha.
 * 1 week: Add unit tests.
 * 1 week: Ensure proper integration and working with MediaWiki.
 * 2 weeks: Testing and documentation.

The plan above may go as expected, or time may be redistributed among the tasks as needed.

Participation
As a regular follower of MediaWiki, I hang out at #mediawiki-i18n and #mediawiki and will continue to do so for the period I am working on the project. If I have doubts or need advice, I will head over to the mailing list. I will also post weekly updates about the project on my blog here. I will use a local environment running MediaWiki for development and will commit every feature I build to Gerrit on a day-to-day basis. If needed, I will host discussion at https://www.mediawiki.org/wiki/Talk:CAPTCHA

About Me
I am Aalekh Nigam, a B.Tech Electronics and Communication Engineering student at JIIT, Noida, India. Since being introduced to FOSS at a local Linux User Group meetup, I have mostly been involved in web development and Android app development. As an advantage for this project, I have worked with various image APIs, including those of Flickr and Instagram. My first interaction with MediaWiki was about five months ago, when I started building an extension for MediaWiki. This project holds special importance for me, since it would make life easier for thousands of editors and users while making MediaWiki much more secure.

Past open source experience
Ever since my introduction to open source I have been an admirer of it, and I have worked with MediaWiki, WordPress and the Flask framework. I am an active contributor to the Open Source Developers Club in my college and have helped build the website for an open source conference organized by the club. I have also contributed to Bug 4365 and Bug 56504 (merged).
 * Developed a MediaWiki extension for importing VideoJsPlayer to MediaWiki and have contributed a few patches, here and here, to various projects.
 * Contributed to a patch to VideoJs.
 * Contributed a few patches to a website under development by Codecademy users.
 * Developed a jQuery plugin for slideshows with Fullscreen API support, available at the jQuery Plugin Repository.
 * Developed a WordPress plugin to port my jQuery plugin to WordPress.
 * Other projects I have contributed to can be found here.

I've also frequently attended various open source meetups including Software Freedom Day, local meetups of Linux User Group, Firefox, etc.

Any other info
Apart from this, I have been looking for inspiration for the audio Captcha and have found this especially helpful. Other research papers that were helpful in preparing the effective captcha described in this proposal are: Socially Adjusted CAPTCHAs, 2D Captchas from 3D Models, Usable Audio Captchas.