Transkribus Enabled Wikisources

From mediawiki.org

This page lists the Wikisources on which Transkribus has been enabled as an OCR engine, alongside Google and Tesseract.

The models (and corresponding model IDs) available on each Wikisource
| No. | Wikisource | Model name | Model ID |
|---|---|---|---|
| 1 | Balinese | Balinese palm-leaf manuscripts 16th century | bali |
| 2 | Bengali | Bengali printed books | ben-print |
| 3 | German | Dutch_XVII_Century | de-17 * |
| | | Transkribus Dutch Handwriting | de-hd-m1 |
| | | Transkribus German Handwriting | ger-hd-m1 |
| | | 15-16th Century German | ger-15 |
| 4 | English | Transkribus B2022 English Model M4 | en-b2022 * |
| | | Transkribus English Handwriting M3 | en-handwritten-m3 |
| | | Transkribus Print M1 | en-print-m1 |
| | | Transkribus Typewriter | en-typewriter |
| 5 | Spanish | Diario de Madrid 1788-1825 | es-md * |
| | | SpanishRedonda_sXVI-XVII_extended_v1.2 | es-redonda-extended-v1_2 |
| 6 | Finnish | NLF_Newseye_GT_FI_M2+ | fin |
| 7 | French | Transkribus French Model 1 | fr-m1 |
| 8 | Italian | Transkribus Italian Handwriting M1 | it-hd-m1 |
| 9 | Polish | Transkribus Polish M2 | pl-m2 |
| 10 | Russian | Russian generic handwriting 2 | rus-hd-m2 |
| | | Russian print of the 18th century | rus-print |
| 11 | Sanskrit | Devanagari Mixed M1A | san |
| 12 | Swedish | Stockholm Notaries 1700 2.1 | swe-2.1 |
| | | The Swedish Lion I | swe-lion-i |
| 13 | Yiddish | The Dybbuk for Yiddish Handwriting | yi-hd |
| 14 | Hindi | Devanagari Mixed M1A | dev |
| 15 | Czech | Old Czech Handwriting (with spaces) | cs-space * |
| | | Old Czech Handwriting (without spaces) | cs-no-space |
| 16 | Danish | 19th century Danish Gothic handwriting v.1.1 | da-goth * |
| | | Danish gothic print 1859-1888 v4 | da-goth-print |
| | | Gjentofte 1881-1913 Denmark | da-gjen |
| 17 | Greek | Ligorio 0.3 PyL | el-ligo * |
| | | Noscemus GM 6 | el-print |
| 18 | Estonian | Estonian Court Records 19thC | et-court |
| 19 | Hebrew | Hebrew DiJeSt 2.0 | he-dijest |
| 20 | Hungarian | Hungarian handwriting 19th–20th cent. | hu-hand-19 |
| 21 | Latin | Carolingian Minuscule Model CMM 9th-11th c. | la-caro |
| | | UCL–University of Toronto #7 | la-med |
| | | Pylaia_NeoLatin_Ravenstein | la-neo |
| 22 | Dutch | Admiraliteit Zeeland 1605-1609 compleet | nl-1605 * |
| | | Dutch Mountains (18th Century) | nl-mount |
| | | Dutch newspapers 17th century | nl-news |
| 23 | Norwegian | NorHand 1820-1940 | no-1820 * |
| | | Sunnhordland Partition Protocols | no-1874 |
| 24 | Portuguese | General Portuguese M1 | pt-m1 * |
| | | SPJCL17C V4.2 | pt-17 |
| 25 | Romanian | RTA2 (Romanian Transition Alphabet) | ro-print |
| 26 | Slovenian | Slovenian 18th century manuscript | sl-hand-18 |
| 27 | Slovak | Handwritten Glagolitic | sk-hand |

For Wikisources with more than one model listed, the model marked with * is the one currently active. All models, however, remain available in the OCR tool until a model selector is integrated into the wiki side itself (see Phabricator task T279405).
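The selection rule above can be expressed as a small lookup. The sketch below is a hypothetical illustration, not part of Wikimedia OCR: the `MODELS` dictionary copies only a few rows of the table, and the names `active_model` and `all_model_ids` are invented for this example. Note that the Russian, Swedish and Latin rows list several models without any * marker, so this page does not say which of them is active; the sketch returns `None` in that case rather than guessing.

```python
from typing import Optional

# Hypothetical in-memory copy of a few rows of the table above:
# Wikisource -> list of (model name, model ID, marked with * on this page).
MODELS: dict[str, list[tuple[str, str, bool]]] = {
    "Balinese": [("Balinese palm-leaf manuscripts 16th century", "bali", False)],
    "English": [
        ("Transkribus B2022 English Model M4", "en-b2022", True),
        ("Transkribus English Handwriting M3", "en-handwritten-m3", False),
        ("Transkribus Print M1", "en-print-m1", False),
        ("Transkribus Typewriter", "en-typewriter", False),
    ],
    "Russian": [
        ("Russian generic handwriting 2", "rus-hd-m2", False),
        ("Russian print of the 18th century", "rus-print", False),
    ],
}


def active_model(wikisource: str) -> Optional[str]:
    """Return the ID of the active model for a Wikisource, or None if not stated.

    A single listed model is the one in use; with several models, the one
    marked * is active. If several are listed and none is marked, or the
    Wikisource is not in the table, return None.
    """
    models = MODELS.get(wikisource, [])
    if len(models) == 1:
        return models[0][1]
    starred = [model_id for _, model_id, marked in models if marked]
    return starred[0] if len(starred) == 1 else None


def all_model_ids(wikisource: str) -> list[str]:
    """Every model ID listed for a Wikisource (all are offered in the OCR tool)."""
    return [model_id for _, model_id, _ in MODELS.get(wikisource, [])]


if __name__ == "__main__":
    print(active_model("Balinese"))  # bali
    print(active_model("English"))   # en-b2022
    print(active_model("Russian"))   # None
    print(all_model_ids("Russian"))  # ['rus-hd-m2', 'rus-print']
```

Keeping the * marker as a boolean per row, rather than a separate "active" field per Wikisource, mirrors how the table itself records it and makes the unmarked multi-model cases visible instead of silently picking the first row.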


The Transkribus pilot project is being undertaken in collaboration with IIIT Hyderabad and the Balinese community. We have successfully integrated the Balinese OCR model created by IIIT Hyderabad into Wikimedia OCR.