Core Platform Team/Initiative/Captcha/Initiative Description

From mediawiki.org

Summary

We will improve our Captcha system to reduce both false negatives (humans who cannot pass) and false positives (bots that can). Additionally, we will extend the Captcha system to generate microcontributions.

Significance and Motivation

The Captcha system on Wikimedia sites was designed and implemented in the mid-2000s. Since then, the industry has moved on, both in better defenses and in better attacks on Captcha systems.

The issue has reached the WMF Board level, with some community members responding to the long-standing failure to adapt the Captcha system for visually impaired people.

Additionally, recent spam attacks have shown that attackers can circumvent the Captcha system relatively easily, which puts an undue burden on our communities to revert undesired contributions manually. In 2014, Tim Starling concluded that our Captcha system was trivial to circumvent with off-the-shelf open-source tools.

Localisation of interfaces has proven a key strength of the movement. However, the Captcha system uses only English vocabulary for its tests. Some users will be unable to recognize the words in the image because they are unfamiliar with the English language or with Latin characters.

Modern Captcha systems, like reCAPTCHA, have moved away from meaningless tasks toward tasks that generate value. For example, reCAPTCHA includes image classification tasks that improve self-driving car AI. As the Foundation considers options for microcontributions, the Captcha presents an opportunity to generate value from users' work.

The W3C's recommendations on Captchas provide guidance on the best ways to implement these kinds of systems. Improving our Captcha to hew closer to these guidelines would make us better participants on the Web.

Outcomes
  • More human beings can pass the captcha (lower false negatives)
  • Fewer bots can pass the captcha (lower false positives)
  • Microcontributions are generated by Captcha
  • Visually-impaired new users are explicitly welcomed
  • Non-English-speakers are explicitly welcomed
Baseline Metrics
  • The ConfirmEdit extension has no mechanism that allows visually impaired people to confirm their edits (false negative, a11y)
  • The ConfirmEdit extension uses only English vocabulary (false negative, i18n)
  • The ConfirmEdit extension is too easy to crack with off-the-shelf tools (false positive)
  • The ConfirmEdit extension is make-work and doesn't take advantage of the opportunity to do microcontributions
Target Metrics
  • Fewer spam attacks (TODO: quantify)
  • Higher participation rates (TODO: quantify)
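
One way the TODO items above might be quantified is through the false negative and false positive rates named under Outcomes. The following is a minimal sketch, assuming Captcha attempts can be labeled as human or bot after the fact. The class, its field names, and the sample counts are hypothetical illustrations, not part of ConfirmEdit and not real Wikimedia data.

```python
from dataclasses import dataclass


@dataclass
class CaptchaOutcomes:
    """Hypothetical tallies of Captcha attempts (illustrative, not real data)."""
    humans_passed: int
    humans_failed: int
    bots_passed: int
    bots_failed: int

    def false_negative_rate(self) -> float:
        # Share of human attempts that the Captcha wrongly rejected.
        total = self.humans_passed + self.humans_failed
        return self.humans_failed / total if total else 0.0

    def false_positive_rate(self) -> float:
        # Share of bot attempts that the Captcha wrongly accepted.
        total = self.bots_passed + self.bots_failed
        return self.bots_passed / total if total else 0.0


# Made-up counts, used only to show the arithmetic.
sample = CaptchaOutcomes(humans_passed=90, humans_failed=10,
                         bots_passed=25, bots_failed=75)
print(sample.false_negative_rate())  # 0.1
print(sample.false_positive_rate())  # 0.25
```

Tracking both rates before and after a change would show whether an improvement for humans came at the cost of letting more bots through, or the reverse.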
Stakeholders
  • Security
  • Growth
  • (???) Is there a stakeholder for a11y?
  • Language
  • Scoring
Known Dependencies/Blockers
  • Captchas have been the subject of patents, such as US7929805B2.
  • At least one ticket, T6845, has been open for 13 years. It may be hard to get organizational buy-in to close it.