Topic on Talk:CAPTCHA

Concept: Digitizing for Wikisource

Amgine (talk | contribs)

One thing we might consider is a variation on reCAPTCHA's early goal: let's use WMF captchas to help digitize scanned texts for Wikisource.

Model:

Scanned texts (DjVu?) are processed through OCR.
OCR issues are identified, e.g. a scanned word is caught by spell check as a misspelling, and the corresponding image region is clipped for use in a captcha.
One of the two images presented in the captcha is drawn from the pool of OCR issues. The 'solution' for this image should match a spelling dictionary or fuzzy-match the OCR text. Solutions are stored until a statistically significant percentage of the results are exactly the same.
The other of the two images has a known solution, and the response must match that solution exactly.
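
To make the model a bit more concrete, here is a rough Python sketch of how the steps above might fit together. Everything in it is an assumption for illustration: the names (`OcrIssue`, `find_ocr_issues`, `is_plausible`, `submit`), the thresholds (`MIN_VOTES`, `MIN_AGREEMENT`, the fuzzy `cutoff`), and the choice to reject a captcha whose unknown-word answer is implausible. A fixed vote count plus agreement ratio stands in for a real statistical-significance test, and no actual OCR or image handling is shown.

```python
import difflib
from collections import Counter
from dataclasses import dataclass, field

MIN_VOTES = 5        # assumed: minimum answers before consensus is checked
MIN_AGREEMENT = 0.8  # assumed: share of identical answers needed


@dataclass
class OcrIssue:
    """A word the OCR output got (probably) wrong, plus user answers so far."""
    ocr_text: str
    votes: Counter = field(default_factory=Counter)

    def consensus(self):
        """Return the agreed transcription, or None if there is no consensus yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count / total >= MIN_AGREEMENT else None


def find_ocr_issues(ocr_words, dictionary):
    """Flag OCR words that fail a simple spell check (dictionary lookup)."""
    return [OcrIssue(w) for w in ocr_words if w.lower() not in dictionary]


def is_plausible(answer, issue, dictionary, cutoff=0.6):
    """Accept answers that are dictionary words or fuzzy-match the OCR text."""
    if answer.lower() in dictionary:
        return True
    ratio = difflib.SequenceMatcher(
        None, answer.lower(), issue.ocr_text.lower()
    ).ratio()
    return ratio >= cutoff


def submit(control_solution, control_answer, issue, issue_answer, dictionary):
    """Check one captcha response; record the unknown-word answer if it passes."""
    if control_answer != control_solution:   # control image: exact match only
        return False
    if not is_plausible(issue_answer, issue, dictionary):
        return False
    issue.votes[issue_answer] += 1           # stored until consensus is reached
    return True
```

Once `consensus()` returns a value for an issue, that value could be written back to the Wikisource text and the image could even be moved into the pool of control images.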