User:TJones (WMF)/Notes/Language Detection Evaluation/Corpus Info

Language Identification Corpus Information
%lang is each language's share of the 775 language-tagged queries; %total is its share of all 1452 zero-result queries.

%lang   %total   lang
77.3%   41.3%    English
 5.5%    3.0%    Spanish
 2.6%    1.4%    Chinese
 2.5%    1.3%    Portuguese
 1.3%    0.7%    Arabic
 1.3%    0.7%    French
 1.2%    0.6%    Tagalog
 1.0%    0.6%    German
 0.8%    0.4%    Malay
 0.6%    0.3%    Russian
 0.6%    0.3%    Turkish
 0.5%    0.3%    Indonesian
 0.5%    0.3%    Persian
 0.5%    0.3%    Swahili
 0.4%    0.2%    Korean
 0.3%    0.1%    Bengali
 0.3%    0.1%    Bulgarian
 0.3%    0.1%    Hindi
 0.3%    0.1%    Italian
 0.3%    0.1%    Norwegian
 0.1%    0.1%    Croatian
 0.1%    0.1%    Dutch
 0.1%    0.1%    Estonian
 0.1%    0.1%    Finnish
 0.1%    0.1%    Greek
 0.1%    0.1%    Hmong
 0.1%    0.1%    Japanese
 0.1%    0.1%    Kannada
 0.1%    0.1%    Latin
 0.1%    0.1%    Polish
 0.1%    0.1%    Serbian
 0.1%    0.1%    Somali
 0.1%    0.1%    Swedish
 0.1%    0.1%    Tamil
 0.1%    0.1%    Thai
 0.1%    0.1%    Uzbek
 * 1452 zero-result queries
 * 775 (53.4%) are tagged as being in some language
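The two percentage columns are related through the corpus sizes above. The sketch below assumes (based on the numbers, not on a stated definition) that %lang is relative to the 775 tagged queries and %total to all 1452 zero-result queries; the helper names `pct` and `total_share` are hypothetical and only for illustration.

```python
# Corpus sizes stated on this page.
TOTAL_QUERIES = 1452   # zero-result queries in the corpus
TAGGED_QUERIES = 775   # queries tagged as being in some language


def pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


def total_share(lang_share_pct: float) -> float:
    """Convert a %lang value into a %total value.

    Assumption: %lang is a share of the tagged queries and %total is a
    share of all zero-result queries, so %total = %lang * 775 / 1452.
    """
    return round(lang_share_pct * TAGGED_QUERIES / TOTAL_QUERIES, 1)


print(pct(TAGGED_QUERIES, TOTAL_QUERIES))  # 53.4 (tagged share of the corpus)
print(total_share(77.3))                  # 41.3 (English)
print(total_share(1.3))                   # 0.7 (Arabic)
```

Because the %lang column is itself rounded, recomputing %total from it can be off by 0.1 for some rows (German, for example, comes out at 0.5% rather than the listed 0.6%).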