CAPTCHA/Image completion captchas

Image completion captchas

 * Public URL: https://www.mediawiki.org/wiki/User:RuchirangaW/Image_completion_captchas
 * Announcement: https://www.mediawiki.org/wiki/Summer_of_Code_2014#Multilingual.2C_usable_and_effective_captchas

Name and contact information

 * Name: Thanuditha Ruchiranga Wickramasinghe
 * Email: truchiranga@gmail.com
 * IM networks: Google Talk: truchiranga@gmail.com; Google+: truchiranga@gmail.com; Skype: Thanuditha Ruchiranga
 * LinkedIn: Thanuditha Ruchiranga
 * GitHub: https://github.com/Ruchiranga
 * Location: Galle, Sri Lanka
 * Typical working hours: weekdays, 6 pm to 12 midnight; weekends, full day
 * Until the end of June I will be somewhat busy with academic work because of a semester-end examination during that period. After that I can work full days on the project and catch up on any work that has fallen behind before 18 August.

Synopsis
The CAPTCHAs currently implemented in Wikimedia projects are mostly broken and may expose the projects to security threats and vulnerabilities. Furthermore, CAPTCHAs that rely on English letters can hardly be considered multilingual. Users also find it very frustrating when one or more letters in a CAPTCHA are hard to identify. My project idea focuses on designing multilingual CAPTCHAs that are very easy for a human to solve but very difficult for a bot to solve.

Proposed idea

The captcha can be built up in several phases of increasing complexity, as described below. Each phase offers a different level of security, and the optimum approach will be discussed and selected during the Community Bonding period. The optimum approach is the one that gives a sufficient level of security for the least overhead in preparing the image captcha.

Phase 1:

The captcha to solve will be an incomplete image as shown here.



The position of the missing square in the original image can be selected at random. The removed piece can be placed under the remaining image along with some other non-matching images (let us consider 4 non-matching images for the moment). The user has to choose the piece that matches the removed part of the original image.
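The Phase 1 steps can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the image is a plain 2D list of pixel values rather than a real image file, and the function name `make_challenge`, its parameters, and the use of `None` to mark the removed square are all hypothetical choices, not part of any existing MediaWiki code.

```python
import random


def make_challenge(image, patch, decoys, rng=None):
    """Build a Phase 1 style challenge (illustrative sketch only).

    image  : 2D list of pixel values (assumed representation)
    patch  : side length of the square to remove
    decoys : list of non-matching patch x patch pieces
    Returns (incomplete_image, choices, index_of_correct_choice).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # Pick a random top-left corner so the square fits inside the image.
    top = rng.randrange(height - patch + 1)
    left = rng.randrange(width - patch + 1)

    # Cut out the piece and blank the same area in a copy of the image.
    piece = [row[left:left + patch] for row in image[top:top + patch]]
    incomplete = [row[:] for row in image]
    for y in range(top, top + patch):
        for x in range(left, left + patch):
            incomplete[y][x] = None  # None marks the removed square

    # Shuffle the decoys and insert the correct piece at a random slot.
    choices = list(decoys)
    rng.shuffle(choices)
    answer = rng.randrange(len(choices) + 1)
    choices.insert(answer, piece)
    return incomplete, choices, answer
```

With 4 decoys this yields 5 choices, so a bot guessing blindly would succeed about one time in five, which is one reason the later phases add further obstacles.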

Phase 2:

Since image puzzle solving algorithms have been appearing lately, the above approach might be vulnerable. As a solution, the candidate pieces can be attached to the original image and the whole captcha displayed as one single image. This way, it would be difficult for a bot to identify the possible pieces to be placed. The pieces do not need to be of the same size, which makes identifying them much harder for a computer program.
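One way to picture the Phase 2 composition is the sketch below. It assumes the same 2D-list image representation, a white background value of 255, and a hypothetical helper named `compose` that lays the pieces in a row under the incomplete image. A real implementation would work on actual image data and would likely vary the layout.

```python
BACKGROUND = 255  # assumed pixel value for the white background


def compose(incomplete, pieces, gap=1):
    """Merge the incomplete image and the choice pieces into one canvas.

    Pieces may have different sizes; they are placed left to right in a
    strip below the image, separated by `gap` background pixels.
    """
    strip_height = max(len(p) for p in pieces)
    strip_width = sum(len(p[0]) for p in pieces) + gap * (len(pieces) - 1)
    width = max(len(incomplete[0]), strip_width)

    # Copy the image rows, padding them to the canvas width if needed.
    canvas = [row + [BACKGROUND] * (width - len(row)) for row in incomplete]
    # Add a separator gap plus the rows that will hold the pieces.
    canvas += [[BACKGROUND] * width for _ in range(gap + strip_height)]

    top = len(incomplete) + gap
    x = 0
    for piece in pieces:
        for dy, row in enumerate(piece):
            canvas[top + dy][x:x + len(row)] = row
        x += len(piece[0]) + gap
    return canvas
```

Because the result is a single pixel grid, a program can no longer read the candidate pieces as separate files and must first segment them out of the image.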



Phase 3:

This idea can be made even more secure by placing the choice pieces not only below the image but also randomly above the original incomplete image, once again producing one final single image. This too makes it much harder for a bot to identify the choices.



Phase 4:

The level of security can be increased further by modifying the incomplete image as well. We can lay some text across the missing patch in the incomplete image so that it becomes difficult to find the gradients at the edges of the missing part. Additionally, we can make the user type the text that is laid over the remaining image. That text can also be made hard for a bot to read, just like the words in current captchas, by making it wavy and adding other effects to it.

Phase 5:

The detail in the whole captcha image can also be reduced by laying white stripes continuously over the final captcha image. This makes processing the captcha image much harder for a bot or a computer program.
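A minimal sketch of the striping step is given below. The proposal does not say how the stripes are oriented or spaced, so the horizontal orientation, the `stripe_height` and `gap` parameters, and the function name `add_stripes` are assumptions made purely for illustration.

```python
WHITE = 255  # assumed pixel value for white


def add_stripes(image, stripe_height=1, gap=3, offset=0):
    """Overlay horizontal white stripes on a 2D list of pixel values.

    Every `stripe_height` rows of white are followed by `gap` untouched
    rows; `offset` shifts where the pattern starts.
    """
    striped = [row[:] for row in image]  # leave the input unchanged
    period = stripe_height + gap
    for y, row in enumerate(striped):
        if (y + offset) % period < stripe_height:
            striped[y] = [WHITE] * len(row)
    return striped
```

The spacing would have to be tuned so that a human can still recognise the picture while enough detail is removed to disturb automated matching.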



When choosing pictures for this, we can try to select pictures with somewhat noticeable gradients (i.e. some noticeable edges). This is because, for a plain picture such as one showing the blue sky, it would be difficult even for a human to find the exact match for a certain position.
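As a rough illustration of such a check, the sketch below scores a grayscale image by the mean absolute difference between neighbouring pixels and rejects images that score too low. The metric, the names `edge_score` and `has_enough_detail`, and the threshold of 10.0 are assumptions chosen for this example; the actual filter could use any suitable edge detection technique.

```python
def edge_score(image):
    """Mean absolute difference between horizontally and vertically
    adjacent pixels of a grayscale image given as a 2D list."""
    height, width = len(image), len(image[0])
    total = 0
    count = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # right-hand neighbour
                total += abs(image[y][x + 1] - image[y][x])
                count += 1
            if y + 1 < height:  # neighbour below
                total += abs(image[y + 1][x] - image[y][x])
                count += 1
    return total / count if count else 0.0


def has_enough_detail(image, threshold=10.0):
    """Return True if the image is not too plain (threshold is a guess)."""
    return edge_score(image) >= threshold
```

A uniformly coloured image, such as a clear blue sky, scores 0 under this measure and would be filtered out.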


 * Possible mentors: Pau Giner, User:Emufarmers

Deliverables

 * This project aims to develop effective, much less vulnerable, highly secure and user-friendly captchas by using Wikimedia Commons as a database of original images for captcha generation.


 * Develop a more effective and secure captcha generation system than the one currently in use.


 * Develop a system for selecting images from the Wikimedia Commons image database that meet the requirements for an image to be eligible for use in a captcha. This might include certain image processing algorithms.


 * Develop a system to bring plain text to a desired level of unreadability. Once again, image processing techniques will be used.


 * Develop a system that produces a final captcha image by combining all of the aforementioned strategies.

Project Schedule

 * Community bonding period - Planning and deciding on the best-suited approach with the mentors and other interested community members.


 * 2 weeks - Getting familiar with the Wikidata API and collecting the necessary information on how to use Wikimedia Commons as a database for generating captchas.


 * 1 week - Working out how to test the work done


 * 2 weeks - Develop a filter for selecting the most suitable images for a captcha out of all the images in Wikimedia Commons


 * 3 - 4 weeks - Develop a system that produces a captcha image with the desired level of security from a given image


 * 2 weeks - Ensuring proper integration of the developed systems with MediaWiki


 * 2 weeks - Testing and documentation work

Participation
I can communicate through Skype and Gmail. I will maintain a repository on GitHub where I will regularly commit what I develop. I will also maintain good communication with the mentors and on the Wikimedia developer mailing lists.

About you
I am Ruchiranga Wickramasinghe, a first-year undergraduate at the University of Moratuwa, Sri Lanka, studying Computer Science and Engineering. I have mastered Java, C and C++, I have basic knowledge of HTML, and I am currently learning JavaScript, PHP and CSS. My interests are mainly in Image Processing, Algorithms and Artificial Intelligence. I also have a great passion for contributing to the open source community.

For more information, you can refer to my user page on MediaWiki: Ruchiranga Wickramasinghe

I first heard about this program through a GSoC meetup held at our university (GSoC Sri Lanka Meetup), and also through the DecadeOf GSoC program held at our university (University of Moratuwa) in the presence of:

 * Mr. Chris DiBona, Director of Social Impact and Open Source, Google Inc.
 * Ms. Mary Radomile, Program Manager, Google Inc.
 * Ms. Stephanie Taylor, Program Manager, Google Inc.
 * Mr. Rohan Jayaweera, Sri Lanka Country Consultant for Google Inc.

Since we do not have a summer or a summer vacation in Sri Lanka, my university academic work has to be carried out alongside the program. However, I am confident that I can manage both workloads and meet the requirements on time as planned. I will have a semester-end examination lasting about 4 weeks, from the end of May until the end of June. After that I can work the whole day and catch up on any work that has fallen behind.

Past experience
I am honestly a newbie to the FOSS world. As far as I understand, this movement will make a remarkable impact on the quality of life of millions of people around the globe. So I see no reason why I should not be part of something as big as this and serve the betterment of the future of mankind. I hope that this notion will keep me focused on contributing to FOSS projects throughout my career.

As for projects I have worked on by myself, I have created a web application using JSP that lets a user enter data through a web page and stores the entered data in a MySQL server.

The project can be found in my GitHub repository.