Core Platform Team/Initiative/Captcha/Initiative Description

Summary

We will improve our Captcha system to reduce both false negatives (humans who cannot pass the test) and false positives (bots that can). We will also extend the Captcha system to generate microcontributions.

Significance and Motivation

The Captcha system on Wikimedia sites was designed and implemented in the mid-2000s. Since then, the industry has moved on, with both better defenses and better attacks on Captcha systems.

The issue has reached the WMF Board level, with some community response to the long-standing failure to adapt the Captcha system for visually impaired people.

Additionally, recent spam attacks have shown that attackers can circumvent the Captcha system relatively easily, which places an undue burden on our communities to revert undesired contributions manually. In 2014, Tim Starling concluded that our Captcha system was trivial to circumvent with off-the-shelf open-source tools.

Localisation of interfaces has proven a key strength of the movement. However, the Captcha system uses only English vocabulary for its tests. Some users will be unable to recognize the words in the image because they are unfamiliar with the English language or the Latin script.

Modern Captcha systems, like reCAPTCHA, have moved away from meaningless tasks toward tasks that generate value. For example, reCAPTCHA includes image classification tasks that improve self-driving car AI. As the Foundation considers options for microcontributions, the Captcha presents an opportunity to generate value out of users' work.

The W3C's recommendations on Captchas provide guidance on the best ways to implement these kinds of systems. Improving our Captcha to hew closer to these guidelines would make us better participants on the Web.

Outcomes

More human beings can pass the Captcha (fewer false negatives)
Fewer bots can pass the Captcha (fewer false positives)
The Captcha generates microcontributions
Visually impaired new users are explicitly welcomed
Non-English-speakers are explicitly welcomed
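
To make the first two outcomes measurable, the error rates could be computed roughly as in the sketch below. It assumes that Captcha attempts can somehow be labeled as human or bot after the fact, which this page does not establish. The class and function names (CaptchaCounts, false_negative_rate, false_positive_rate) and the sample counts are hypothetical and for illustration only; none of them exist in ConfirmEdit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptchaCounts:
    """Hypothetical tally of labeled Captcha attempts (illustrative only)."""
    humans_passed: int
    humans_failed: int
    bots_passed: int
    bots_failed: int


def false_negative_rate(c: CaptchaCounts) -> float:
    """Share of human attempts that the Captcha wrongly rejected."""
    total = c.humans_passed + c.humans_failed
    return c.humans_failed / total if total else 0.0


def false_positive_rate(c: CaptchaCounts) -> float:
    """Share of bot attempts that the Captcha wrongly accepted."""
    total = c.bots_passed + c.bots_failed
    return c.bots_passed / total if total else 0.0


# Made-up sample numbers, not measurements from any Wikimedia site.
counts = CaptchaCounts(humans_passed=900, humans_failed=100,
                       bots_passed=30, bots_failed=70)
print(f"false negative rate: {false_negative_rate(counts):.2f}")
print(f"false positive rate: {false_positive_rate(counts):.2f}")
```

Under this definition, "lower false negatives" means the first rate falls and "lower false positives" means the second rate falls.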

Baseline Metrics

The ConfirmEdit extension has no mechanism that lets visually impaired people pass the Captcha (false negative, a11y)
The ConfirmEdit extension uses only English vocabulary (false negative, i18n)
The ConfirmEdit extension is too easy to crack with off-the-shelf tools (false positive)
The ConfirmEdit extension's tasks are make-work and miss the opportunity to generate microcontributions

Target Metrics

Fewer spam attacks (TODO: quantify)
Higher participation rates (TODO: quantify)

Stakeholders

Security
Growth
(???) Is there a stakeholder for a11y?
Language
Scoring

Known Dependencies/Blockers

Captchas have been the subject of patents, such as US7929805B2.
At least one ticket, T6845, has been open for 13 years. It may be hard to get organizational buy-in to close this ticket.