One thing we might consider is a variation on reCAPTCHA's early goal: use WMF captchas to help digitize scanned texts for Wikisource.
- Scanned texts (DjVu?) are processed through OCR.
- OCR issues are identified (e.g. a scanned word is flagged by a spell checker as a misspelling), and the corresponding image region is clipped for use in a captcha.
- One of the two images presented in the captcha is drawn from the pool of OCR issues. The 'solution' for this image should match a spelling dictionary or fuzzy-match the OCR text. Solutions are stored until a statistically significant percentage of results are exactly the same.
- The other image presented in the captcha has a known solution, and the response must match that solution exactly.
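
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the names (`UnknownWord`, `is_plausible`, `check_captcha`) and the thresholds are hypothetical, a fixed agreement share stands in for a real statistical significance test, and the captcha is assumed to fail when the answer for the unknown image is implausible.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical thresholds; the proposal does not specify any values.
FUZZY_THRESHOLD = 0.6  # minimum similarity between an answer and the OCR text
MIN_VOTES = 3          # answers required before consensus is considered
AGREEMENT = 0.75       # share of stored answers that must be identical


def is_plausible(answer, ocr_text, dictionary):
    """Accept an answer for the unknown image if it is a dictionary word
    or is similar enough to what the OCR engine produced."""
    answer = answer.strip().lower()
    if answer in dictionary:
        return True
    ratio = SequenceMatcher(None, answer, ocr_text.lower()).ratio()
    return ratio >= FUZZY_THRESHOLD


class UnknownWord:
    """An image region drawn from the pool of OCR issues."""

    def __init__(self, ocr_text):
        self.ocr_text = ocr_text
        self.votes = Counter()  # answer -> number of times it was submitted

    def record(self, answer):
        self.votes[answer.strip()] += 1

    def consensus(self):
        """Return the agreed transcription, or None if there is none yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count / total >= AGREEMENT else None


def check_captcha(control_solution, control_answer,
                  unknown, unknown_answer, dictionary):
    """Return True if the user passes the captcha.

    The control image must be answered exactly. The answer for the
    unknown image only has to be plausible, and is stored as a vote.
    """
    if control_answer != control_solution:
        return False
    if not is_plausible(unknown_answer, unknown.ocr_text, dictionary):
        return False
    unknown.record(unknown_answer)
    return True
```

Once `consensus()` returns a value for a word, that transcription could be written back to the text, and the image could then presumably move into the pool of control images.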