Thread:Talk:CAPTCHA/Concept: Digitizing for Wikisource

One thing we might consider is a variation on reCAPTCHA's early goal: use WMF captchas to help digitize scanned texts for Wikisource.

Model:
 * Scanned texts (DjVu?) are processed through OCR.
 * OCR issues are identified (e.g. a scanned 'word' is flagged by spell check as a misspelling, and its image region is clipped for use in a captcha).
 * One of the two images presented in the captcha is drawn from the pool of OCR issues. The 'solution' entered for this image should match a spelling dictionary or fuzzy-match the OCR text. Solutions are stored until a statistically significant percentage of them are exactly the same.
 * The other image presented in the captcha has a known solution, and the response must match it exactly.
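
To make the model concrete, here is a rough Python sketch of the checking and vote-collecting steps. Everything named here is hypothetical: `OcrIssue`, `plausible`, `check_captcha`, and the thresholds `MIN_VOTES`, `MIN_AGREEMENT` and `FUZZY_THRESHOLD` are placeholders picked for illustration, not existing MediaWiki code or tuned values. A simple vote-share cutoff stands in for a real statistical significance test.

```python
from collections import Counter
from difflib import SequenceMatcher

# Assumed values for illustration only; real thresholds would need tuning.
MIN_VOTES = 5          # answers required before a consensus is considered
MIN_AGREEMENT = 0.8    # share of answers that must be exactly the same
FUZZY_THRESHOLD = 0.6  # minimum similarity to the OCR text


def plausible(answer, ocr_text, dictionary):
    """Accept an answer that is a dictionary word or close to the OCR text."""
    answer = answer.strip().lower()
    if answer in dictionary:
        return True
    ratio = SequenceMatcher(None, answer, ocr_text.lower()).ratio()
    return ratio >= FUZZY_THRESHOLD


class OcrIssue:
    """One clipped image region whose OCR text failed spell check."""

    def __init__(self, ocr_text):
        self.ocr_text = ocr_text
        self.votes = Counter()

    def record(self, answer):
        self.votes[answer.strip().lower()] += 1

    def consensus(self):
        """Return the agreed transcription, or None if there is none yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= MIN_AGREEMENT else None


def check_captcha(control_answer, control_solution,
                  unknown_answer, issue, dictionary):
    """Pass the captcha only if the known image is solved exactly and the
    answer for the unknown image is plausible; then store that answer."""
    if control_answer != control_solution:
        return False
    if not plausible(unknown_answer, issue.ocr_text, dictionary):
        return False
    issue.record(unknown_answer)
    return True
```

Under these assumptions, an issue whose OCR text is 'rnodern' would resolve to 'modern' once five users who also solved the known image have entered 'modern' for it. Answers from users who fail the known image are never stored.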