Talk:CAPTCHA

Concept: Digitizing for Wikisource

Amgine

One thing we might consider is a variation on reCAPTCHA's early goal: let's use WMF captchas to help digitize scanned texts for Wikisource.


  • Scanned texts (DjVu?) are processed through OCR.
  • OCR problems are identified (e.g. a scanned word that spell check flags as a misspelling), and the image region containing it is clipped for use in a captcha.
  • One of the two images presented in the captcha is drawn from this pool of OCR problems. The 'solution' for this image should either match a word in a spelling dictionary or fuzzy-match the OCR text. Solutions are stored until a statistically significant percentage of them are exactly the same.
  • The other image must match its known solution exactly.
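The steps above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not anything WMF runs. All names (`UnknownWord`, `plausible`, `record`, `check_captcha`) are hypothetical. The thresholds (`MIN_RESPONSES`, `AGREEMENT_RATIO`) are a simple stand-in for "statistically significant", not a real significance test. The fuzzy match uses `difflib` similarity as an assumed choice. OCR and image clipping are out of scope: each unknown word carries only its OCR text.

```python
import difflib
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical parameters: the proposal only asks for "a statistically
# significant percentage" of identical answers; this is a plain threshold.
MIN_RESPONSES = 5      # never decide on fewer answers than this
AGREEMENT_RATIO = 0.8  # share of answers that must be exactly identical
FUZZY_CUTOFF = 0.6     # difflib similarity needed to count as a fuzzy match


@dataclass
class UnknownWord:
    """A clipped image region whose OCR text was flagged as a likely error."""
    ocr_text: str
    responses: Counter = field(default_factory=Counter)
    accepted: Optional[str] = None


def plausible(answer: str, ocr_text: str, dictionary: Set[str]) -> bool:
    """Accept an answer if it is a dictionary word or resembles the OCR text."""
    if answer.lower() in dictionary:
        return True
    similarity = difflib.SequenceMatcher(None, answer.lower(), ocr_text.lower()).ratio()
    return similarity >= FUZZY_CUTOFF


def record(word: UnknownWord, answer: str) -> None:
    """Store an answer; accept it once enough responses agree exactly."""
    word.responses[answer] += 1
    total = sum(word.responses.values())
    top_answer, top_count = word.responses.most_common(1)[0]
    if total >= MIN_RESPONSES and top_count / total >= AGREEMENT_RATIO:
        word.accepted = top_answer


def check_captcha(control_solution: str, control_answer: str,
                  unknown: UnknownWord, unknown_answer: str,
                  dictionary: Set[str]) -> bool:
    """Pass or fail on the control image; only then harvest the unknown answer."""
    # The control image must match exactly (only surrounding whitespace is ignored).
    if control_answer.strip() != control_solution:
        return False  # failed the known image, so the other answer is not trusted
    answer = unknown_answer.strip()
    # Keep collecting answers only until a solution has been accepted.
    if unknown.accepted is None and plausible(answer, unknown.ocr_text, dictionary):
        record(unknown, answer)
    return True


# Usage: OCR misread "the" as "tbe"; five users type "the" for the unknown image.
word = UnknownWord(ocr_text="tbe")
dictionary = {"the", "and", "of"}
for _ in range(5):
    check_captcha("Wikisource", "Wikisource", word, "the", dictionary)
print(word.accepted)  # the
```

In this sketch the pass/fail decision rests only on the control image, and an implausible answer for the unknown image is simply not recorded. The proposal doesn't say whether such an answer should also fail the user, so that is left as an open design choice.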
