CAPTCHAs (short for "Completely Automated Public Turing test to tell Computers and Humans Apart") are used on Wikimedia wikis, via the ConfirmEdit extension, ostensibly to prevent spam and deter spammers. On most wikis, a user may encounter a CAPTCHA when creating an account, creating a new page, or adding an external link to a page.
There are a number of problems with the current CAPTCHA implementation.
- They are only available in English (bugzilla:5309): the words used by our CAPTCHAs, however they are created, should be in the user's language. As a result, an unknown number of potential new users and edits from non-English speakers are lost.
- They violate accessibility principles (bugzilla:4845).
- They don't effectively prevent bots from spamming.
Alternatives that might be implemented in the future
Image CAPTCHAs
Image CAPTCHAs do not require text input, which helps with mobile use and internationalisation. Some ideas based on images:
- Find the different one (view prototype). Several images from the same category (e.g., people) are shown mixed with one image from a different category (e.g., a cat). Humans should be able to recognise which one is different. Note that in this case the question is always the same (find the different one), and the categories used are not exposed to the user.
- Find all images of a kind (view prototype). Images from two or more categories are presented together. The user is explicitly asked to find all the images of a given type (e.g., all images of people wearing glasses).
- Tag images (view prototype). The user is presented with images that contain some tagged elements, along with options to pick the correct tag (e.g., is it a bird? is it a plane?).
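The first idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ConfirmEdit code: the category pool, image ids, `OddOneOutChallenge`, `make_challenge` and `check_answer` are all hypothetical, and a real deployment would also need a supply of labelled images and a way to keep the answer on the server.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class OddOneOutChallenge:
    image_ids: list[str]  # shown to the user in this order
    answer_index: int     # position of the odd image; never sent to the client


def make_challenge(pool: dict[str, list[str]], size: int = 6,
                   rng: random.Random | None = None) -> OddOneOutChallenge:
    """Build a hypothetical 'find the different one' challenge.

    `pool` maps a category name (e.g. "people") to image ids. The category
    names stay on the server, so the user only ever sees the images.
    """
    rng = rng or random.SystemRandom()              # OS-backed randomness by default
    majority, odd = rng.sample(sorted(pool), 2)     # two distinct categories
    images = rng.sample(pool[majority], size - 1)   # needs at least size-1 images
    answer_index = rng.randrange(size)              # where the odd image goes
    images.insert(answer_index, rng.choice(pool[odd]))
    return OddOneOutChallenge(images, answer_index)


def check_answer(challenge: OddOneOutChallenge, selected_index: int) -> bool:
    """The user passes only by picking the odd image's position."""
    return selected_index == challenge.answer_index
```

A user who guesses at random passes with probability 1/size (1 in 6 with the default), so a single round is weak on its own; chaining several rounds or limiting retries would probably be needed. The "find all images of a kind" variant fits the same structure if `answer_index` becomes a set of positions.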
Replacing CAPTCHA with a honeypot
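One possibility for avoiding localization issues with the CAPTCHA is simply to remove it and replace it with a honeypot: a form field hidden from human users that automated spambots tend to fill in, thereby revealing themselves.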
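A honeypot needs very little code. The sketch below is a minimal illustration, assuming a hypothetical field name (`website`) and hypothetical helpers `render_honeypot` and `looks_like_bot`; none of these are part of ConfirmEdit or MediaWiki.

```python
from __future__ import annotations

from html import escape

# Hypothetical field name. A plausible name works best, because naive
# spambots fill in every input they find in the form.
HONEYPOT_FIELD = "website"


def render_honeypot() -> str:
    """Return HTML for a field that human visitors never see or reach.

    display:none hides it visually and from screen readers, tabindex=-1 keeps
    it out of keyboard navigation, and the label covers browsers without CSS.
    """
    return (
        '<div style="display:none" aria-hidden="true">'
        '<label>Leave this field empty <input type="text" '
        f'name="{escape(HONEYPOT_FIELD)}" tabindex="-1" autocomplete="off">'
        '</label></div>'
    )


def looks_like_bot(form: dict[str, str]) -> bool:
    """Treat the submission as spam if the hidden field was filled in."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```

Because there is nothing to read or type, this approach sidesteps the language and accessibility problems of the current CAPTCHA. The trade-off is that it only stops bots that blindly fill every field; a spammer who targets a specific wiki can simply leave the field empty.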
A homegrown reCAPTCHA clone
Write a version of reCAPTCHA, WikiCAPTCHA, that uses document images that have been processed by MediaWiki's ProofreadPage extension for Wikisource. In other words, a CAPTCHA whose answers feed back into ProofreadPage to augment its OCR processing. You might build on existing code. It is worth noting that "reCAPTCHA hold no specific patents for the technology behind their text CAPTCHA algorithms (At least none they discuss on their website or are able to be found on the US Patents & Trademark Office site)", according to one blogger.
Filed as bug 32695.
Also discussed at Wikimania 2012 in the presentation "Wikicaptcha: a ReCAPTCHA-like solution for Wikisource".
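The core of such a scheme, in the way reCAPTCHA has been publicly described, pairs a word whose transcription is already known (the control) with a word the OCR could not read. Only the control decides whether the user passes; the answer for the unknown word is counted as a vote, and a reading is accepted once enough users who passed agree on it. The sketch below is hypothetical: the class, field names and vote threshold are illustrative assumptions, and fetching word images from ProofreadPage pages or writing accepted text back to Wikisource is left out.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


def normalise(text: str) -> str:
    """Collapse runs of whitespace so stray spaces do not split the votes."""
    return " ".join(text.split())


@dataclass
class WordPairCaptcha:
    """Hypothetical core of a WikiCAPTCHA-style check.

    Each challenge shows two word images: a control word whose text is already
    known, and an unknown word from a scanned page that OCR could not read.
    """
    votes_needed: int = 3                                     # agreeing readings required
    known: dict[str, str] = field(default_factory=dict)       # control word id -> text
    votes: dict[str, Counter] = field(
        default_factory=lambda: defaultdict(Counter))         # unknown word id -> readings
    accepted: dict[str, str] = field(default_factory=dict)    # unknown word id -> text

    def submit(self, control_id: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        # Only the control word decides whether the user passes
        # (compared case-insensitively here).
        expected = self.known[control_id]
        if normalise(control_answer).casefold() != normalise(expected).casefold():
            return False
        # Readings of the unknown word are counted only from users who passed.
        reading = normalise(unknown_answer)
        self.votes[unknown_id][reading] += 1
        if self.votes[unknown_id][reading] >= self.votes_needed:
            self.accepted[unknown_id] = reading
            self.known[unknown_id] = reading  # may now serve as a control word
        return True
```

The vote threshold trades speed against accuracy: a higher value needs more passing users per word, but makes it harder for a few careless or malicious users to push a wrong reading into Wikisource.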
Accessibility
The accessibility of our current CAPTCHA is extremely bad. For users with impaired eyesight or who rely on a screen reader, the text-based CAPTCHA is almost entirely inaccessible. A handful of our larger wikis work around this with a volunteer-run account request system. Alternatives such as image CAPTCHAs still violate accessibility principles (bugzilla:4845), so an alternative such as an audio CAPTCHA should be considered.
See also
- Admin tools development, the field of Wikimedia Engineering responsible for this and other tools
- Bug 38640
- Research:Account creation UX/CAPTCHA
- TEDxCMU -- Luis von Ahn -- Duolingo: The Next Chapter in Human Computation
- Recent discussions
  - Captchas and non-English speakers I and II
  - Wikipedia CAPTCHA repair (2011-11-03): «Now that the Wikipedia CAPTCHA has been comprehensively broken by Bursztein et al. in their paper "Text-based CAPTCHA Strengths and Weaknesses" [...] I've reworked the 2005-era CAPTCHA-image-generating Python script in the CAPTCHA engine» (the code is still waiting for reviewers)
  - Suggestion: replace CAPTCHA with better approaches (July 2012)