User:AalekhN/GSoC proposal 2014

Multilingual, usable and effective Captchas

 * Public URL: https://www.mediawiki.org/wiki/User:AalekhN/GSoC_proposal_2014
 * Bugzilla report:
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=32695
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=5309


 * Announcement: https://www.mediawiki.org/wiki/Summer_of_Code_2014#Multilingual.2C_usable_and_effective_captchas.

Name and contact information
 * Name: Aalekh Nigam
 * Email: aalekh1993@rediffmail.com
 * IRC or IM networks/handle(s): aalekhN
 * Location: New Delhi, India
 * Timezone: Kolkata, India, UTC+5:30
 * Typical working hours: 12:00 PM to 2:00 AM until August, 5:00 PM to 2:00 AM after August (Indian Standard Time)

Synopsis
The CAPTCHAs currently implemented in many Wikimedia projects are mostly broken, lack localization and are often vulnerable. This project aims to design a multilingual, usable and effective Captcha that is more secure (difficult for bots to solve), user-friendly (easier for humans to solve) and localized for MediaWiki projects.

Scope of Work
The project aims to implement the "Captcha for identifying Odd one out" approach. There are also some alternative approaches mentioned here. An Image Indexing System is also planned, which would improve the performance of the captcha system over time.

Captcha for identifying Odd one out
The question for this type of Captcha will be:



The answer to the given Captcha is:



Other possible questions for the Captcha could be "Select the images in which the man is wearing sunglasses", as shown below:



If the Captcha is answered incorrectly, it reloads itself and shows a new Captcha with different images. A demo of a raw working prototype of this type of captcha is given here:

Advantage over bots: Logical "select the odd one out" questions are hard for bots to solve. In addition, with the indexing system (described below) we can combine unrelated items as options; for example, combining a human and a cat is a better choice than combining a cat and a tiger, since the latter two closely resemble each other.
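The odd-one-out mechanism described above can be sketched as follows. This is only a minimal illustration, not the planned ConfirmEdit plugin (which would be written in PHP): the function names `build_challenge` and `check_answer`, the dictionary of image lists and the default of four options are all assumptions made for the example.

```python
import random


def build_challenge(images_by_category, majority, odd, n_options=4, rng=None):
    """Return (options, answer_index) for one odd-one-out question.

    `images_by_category` is a hypothetical mapping of category name to a
    list of image file names. All names here are illustrative assumptions.
    """
    rng = rng or random.Random()
    if majority == odd:
        raise ValueError("majority and odd categories must differ")
    # n_options - 1 distinct images from the majority category...
    options = rng.sample(images_by_category[majority], n_options - 1)
    # ...plus exactly one image from the unrelated category.
    odd_image = rng.choice(images_by_category[odd])
    answer_index = rng.randrange(n_options)
    options.insert(answer_index, odd_image)
    return options, answer_index


def check_answer(answer_index, clicked_index):
    """True if the user clicked the odd image; otherwise the caller should
    build a fresh challenge with different images."""
    return clicked_index == answer_index
```

On a wrong click the caller would simply call `build_challenge` again, which matches the reload behaviour described above.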

Categories Selection for Images
To keep the categories unrelated, we can group them into super-sets. For example, the categories "artists" and "astronauts" can be placed under the super-category "humans". Similarly, an array of unrelated categories such as "people", "animals", "machines", etc. can be built. This array can be modified by wiki administrators to add more categories according to their needs.
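A small sketch of the super-category idea, under stated assumptions: the table `SUPER_CATEGORIES`, its contents and the helper names are hypothetical, and in practice administrators would maintain the table through wiki configuration rather than in code.

```python
import random

# Hypothetical, administrator-editable table: super-category -> member categories.
SUPER_CATEGORIES = {
    "people": ["artists", "astronauts"],
    "animals": ["cats", "tigers"],
    "machines": ["cars", "locomotives"],
}


def super_category_of(category, super_categories=SUPER_CATEGORIES):
    """Return the super-category containing `category`, or None if unknown."""
    for parent, children in super_categories.items():
        if category in children:
            return parent
    return None


def pick_unrelated_pair(super_categories=SUPER_CATEGORIES, rng=None):
    """Pick two categories that belong to two different super-categories."""
    rng = rng or random.Random()
    parent_a, parent_b = rng.sample(sorted(super_categories), 2)
    return (rng.choice(super_categories[parent_a]),
            rng.choice(super_categories[parent_b]))
```

Because the two parents are sampled without replacement, a pair such as "cats" and "tigers" can never be returned together.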

Captcha for Mobile Device
The Captcha for mobile devices will be shown as a mobile frontend overlay on top of the page.

For the "Odd one out" type question the layout of the question will be as shown:



If the user clicks on a wrong option, all the image options of the captcha will be reloaded and a new Captcha question will be presented.

Image Indexing System
Any project of mass scale requires localization to be successful. For this project, I therefore propose an indexing system that would filter the unsuitable and irrelevant images out of those retrieved from Wikidata.

A simple illustration of how the proposed indexing system works is shown here:



Other functions of the proposed Image Indexing System:

 * Prevent the reuse of images that have been reloaded many times.
 * Provide users with options that are not closely related to each other.
 * Provide user-friendly effects for the images.

Advantage:
The indexing system is designed to improve over time, thus providing users with images that are globally acceptable and easily recognizable by humans.
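One way such an index could improve over time is by recording, per image, how often it was shown and how often the Captcha containing it was solved. The sketch below is an assumption about how this might work, not the proposed design itself: the class names, the thresholds `max_uses`, `min_solve_rate` and `min_samples`, and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageStats:
    shown: int = 0   # how many times the image appeared in a Captcha
    solved: int = 0  # how many of those Captchas were answered correctly

    @property
    def solve_rate(self):
        return self.solved / self.shown if self.shown else 0.0


@dataclass
class ImageIndex:
    max_uses: int = 100          # retire images shown this many times
    min_solve_rate: float = 0.5  # drop images humans rarely recognize
    min_samples: int = 10        # judge solve rate only after enough data
    stats: dict = field(default_factory=dict)

    def record(self, image, solved):
        """Record one appearance of `image` and whether it was solved."""
        entry = self.stats.setdefault(image, ImageStats())
        entry.shown += 1
        if solved:
            entry.solved += 1

    def is_usable(self, image):
        """Decide whether `image` may still be offered as an option."""
        entry = self.stats.get(image)
        if entry is None:
            return True  # unseen images get a chance
        if entry.shown >= self.max_uses:
            return False  # overused, likely known to bots
        if entry.shown >= self.min_samples and entry.solve_rate < self.min_solve_rate:
            return False  # too hard for humans to recognize
        return True
```

A production system would persist these counters (for example in a database table) rather than keep them in memory.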

Translation of questions asked in the captcha
Translation of the questions used in the Captcha can be performed using translatewiki.net.

Deliverables
The project aims to develop a Captcha plugin for the ConfirmEdit extension using the Wikidata API, which would make Wikidata act as a database for our Captcha images.
 * Develop a plugin for the current ConfirmEdit extension.
 * Develop a proper indexing system for the Captcha images, which would filter out images that are not user-friendly.
 * Develop effects for the Captcha using PHP's ImageMagick library, which keep images easily recognizable by humans while at the same time making the Captcha secure against bots.
 * Use the Wikidata API to retrieve images by category; the translations of terms available on Wikidata can also help make the Captcha multilingual.
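As a rough sketch of the last deliverable, the Wikidata `wbgetentities` API module can return an item's labels in several languages together with its claims, where property P18 holds the image file name. The helper names below are hypothetical, the item ID is only an example, and `SAMPLE_RESPONSE` is a hand-written stand-in that mimics the shape of a response rather than real API output. Python is used only for illustration; the plugin itself would be PHP.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, languages):
    """Build a wbgetentities request URL for labels and claims of one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|claims",
        "languages": "|".join(languages),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_labels_and_images(response, item_id):
    """Pull {language: label} and the list of P18 image file names
    out of an already decoded JSON response."""
    entity = response["entities"][item_id]
    labels = {lang: data["value"]
              for lang, data in entity.get("labels", {}).items()}
    images = []
    for claim in entity.get("claims", {}).get("P18", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            images.append(snak["datavalue"]["value"])
    return labels, images


# Hand-written sample mimicking the response shape (not real API output).
SAMPLE_RESPONSE = {
    "entities": {
        "Q146": {
            "labels": {
                "en": {"language": "en", "value": "cat"},
                "fr": {"language": "fr", "value": "chat"},
            },
            "claims": {
                "P18": [{"mainsnak": {"snaktype": "value",
                                      "datavalue": {"value": "Example cat.jpg"}}}],
            },
        }
    }
}
```

The labels could feed the localized question text, while the file names identify the images to load from Wikimedia Commons.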

If time permits:
 * Improve the audio Captcha for blind and visually impaired users by introducing new virtual cursors to the screen reader, which would help blind users reload the page.

Project schedule
Achieve the above by working with the mentor(s) and other community people interested in the project.
 * Community bonding period.
 * Extensive user research to explore the best possibilities among those suggested for developing a secure Captcha.
 * Get familiar with the Wikidata API.
 * Lay down a modular design for the project.


 * 3-4 weeks: Develop the Captcha and integrate it with the Wikidata API.
 * Develop Captchas for "Selection of Odd One Out" and "Annotation based Captcha".
 * Integrate the Wikidata database with the Captcha using the Wikidata API.

Milestone 1: Prototype for the extension ready.


 * 3 weeks: Develop the indexing system for the Captcha.
 * Develop an indexing system and integrate it with the prototype of the extension.
 * Provide localization support for the Captcha.

Milestone 2: Indexing system built and working.


 * 2 weeks: Develop user-friendly effects using PHP's ImageMagick library.
 * 1 week: Add unit tests.
 * 1 week: Ensure proper integration and working with MediaWiki.
 * 2 weeks: Testing and documentation.

The above plan could go as expected, or the time may need to be redistributed among the tasks.

Participation
As a regular follower of MediaWiki, I regularly hang out at #mediawiki-i18n and #mediawiki and will continue to do so during the project period and beyond. If I have doubts or need advice, I will head over to the mailing list. I will also post weekly updates about the project on my blog here. I will use a local environment running MediaWiki for development and will commit every feature I build to Gerrit on a day-to-day basis. If needed, I will host discussion at https://www.mediawiki.org/wiki/Talk:CAPTCHA

About Me
I am Aalekh Nigam, a B.Tech Electronics and Communication Engineering student at Jaypee Institute of Information Technology, Noida, India. Since my introduction to FOSS at a local Linux User Group meetup, I have been mostly involved in web development and Android app development. As an advantage for this project, I have worked with various image APIs, including those of Flickr and Instagram. My first interaction with MediaWiki was about five months ago, when I started building an extension for MediaWiki. This project holds special importance for me, since it would make life easier for thousands of editors and users, along with making MediaWiki much more secure.

Past open source experience
Ever since my introduction to open source I have been an admirer of it and have worked with MediaWiki, WordPress and Flask. I'm an active contributor to the Open Source Developers Club in my college and have helped build the website for an open source conference organized by the club. My patches for Bug 4365, Bug 56504 and Bug 35486 have also been merged.
 * Developed a MediaWiki extension for importing VideoJsPlayer to MediaWiki and have contributed by working on about 10 bugs here and here for various projects, with solutions.
 * Contributed a patch to VideoJs.
 * Contributed a few patches to a website under development by Codecademy users.
 * Developed a jQuery plugin for slideshows with Fullscreen API support, available at the jQuery Plugin Repository.
 * Developed a WordPress plugin to port my jQuery plugin to WordPress.
 * Other projects I have contributed to can be found here on my GitHub account.

I've also frequently attended various open source meetups including Software Freedom Day, local meetups of Linux User Group, Firefox, etc.

Any other info
Apart from this, I have also been looking for inspiration for the audio Captcha and have found this specifically helpful.

Research Paper
The demo and micro-task created for the project can be found on this page. Research papers that were helpful in preparing the approaches to an effective captcha mentioned in this proposal are listed here:
 * Socially Adjusted CAPTCHAS
 * 2D Captchas from 3D Models
 * Usable Audio Captchas