User:Tommaso Petrolito/sandbox

From mediawiki.org

Wordnets in the World

Language Resource name Developer(s) Contact Online Browsing License Point of Access
Afrikaans Afrikaans WordNet North-West University, South Africa Gerhard van Huyssteen Ané Bekker NO OPEN FOR ACADEMIC USE Afrikaans WordNet
Albanian AlbaNet Vlora University, Vlora, Albania Ervin Ruci YES OPEN (GPL) AlbaNet
Arabic Arabic WordNet Arabic WordNet Horacio Rodriguez NO OPEN AWN - ArabicWN
Multilingual (Arabic/English/Malaysian/Indonesian/Finnish/Hebrew/Japanese/Persian/Thai/French) Open Multilingual Wordnet Linguistics and Multilingual Studies, NTU Francis Bond NO OPEN Open Multilingual Wordnet
Multilingual (Hindi/Indonesian/Japanese/Lao/Mongolian/Burmese/Nepali/Sinhala/Thai/Vietnamese) Asian WordNet National Electronics and Computer Technology Center (NECTEC) - Thai Computational Linguistics Laboratory (TCL) - NICT, Kyoto, Japan Virach Sornlertlamvanich at tcllab/Virach Sornlertlamvanich at nectec Hitoshi Isahara YES OPEN BSD (only Thai WordNet downloadable right now), BROWSE ONLINE ONLY Asian WordNet Browse Online
Multilingual (Malaysian/Indonesian) Wordnet Bahasa Linguistics and Multilingual Studies, NTU wn-msa-developers Francis Bond YES OPEN Wordnet Bahasa
Multilingual (Bantu languages) African WordNet University of South Africa (UNISA) in Pretoria - North-West University, South Africa Sonja Bosch Handré Groenewald N/A N/A N/A
Multilingual (English/Spanish/Catalan/Basque/Italian) Web EuroWordnet Interface 0.2 (by LSI-UPC) University of the Basque Country - Department of Software, Technical University of Catalonia (UPC) Eneko Agirre Arantza Díaz-Ilarraza German Rigau YES BROWSE ONLINE ONLY Web EuroWordnet Interface 0.2 (by LSI-UPC)
Bulgarian BulNet Institute for Bulgarian Language (IBL), Bulgarian Academy of Sciences, Sofia, Bulgaria Svetla Koeva NO RESTRICTED BulNet - Bulgarian WordNet WebSite ELDA/ELRA Bulgarian WordNet (BulNet)
Multilingual (Bulgarian/Czech/Greek/Romanian/Serbian/Turkish) BalkaNet DATABASE LABORATORY, COMPUTER ENGINEERING AND INFORMATICS DEPARTMENT UNIVERSITY OF PATRAS Dimitrios N. Christodoulakis George Atanassov Totkov YES (Not Working) N/A BalkaNet
Chinese Academia Sinica Bilingual Ontological Wordnet Academia Sinica, Taipei, Republic of China (Taiwan) Chu-Ren Shu-Kai Hsieh YES BROWSE ONLINE ONLY Academia Sinica Bilingual Ontological Wordnet
Croatian Croatian WordNet (CroWN) University of Zagreb, Faculty of Humanities and Social Sciences, Department/Institute of Linguistics Ida Raffaelli Marko Tadic NO OPEN CC-BY-NC-SA META-SHARE - Croatian WordNet (CroWN)
Czech Czech WordNet Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague Karel Pala NO OPEN OLD VERSION: CC-BY-NC-SA - NEW VERSION: RESTRICTED Czech WordNet 1.9 PDT ELDA/ELRA Czech WordNet
Danish DanNet Center for Sprogteknologi, Københavns Universitet, Copenhagen, Denmark - Society for Danish Language and Literature, DSL, Denmark Bolette Sandford Pedersen Jørg Asmussen YES OPEN DanNet
Dutch EuroWordNet Dutch University of Amsterdam Piek Vossen NO RESTRICTED ELDA/ELRA EuroWordNet Dutch
Dutch Combinatorial and Relational Network as Toolkit for Dutch Language Technology (Cornetto) Language and Communication, Faculty of Arts, Vrije Universiteit Amsterdam - Nederlandse Taalunie Piek Vossen YES RESTRICTED (Browse Online Free) Cornetto WebSite - BROWSE ONLINE - TST Centrale Cornetto
English WordNet 3.01 Princeton University Christiane D. Fellbaum YES OPEN WordNet
English EuroWordNet English University of Sheffield Yorick Wilks NO RESTRICTED ELDA/ELRA EuroWordNet English Addition to English WordNet
Estonian Estonian Wordnet (EstWN) University of Tartu Heili Orav YES OPEN CC-BY-NC Estonian Wordnet (EstWN) META-SHARE Estonian WordNet (EstWordNet) ELDA/ELRA EuroWordNet Estonian
Finnish FinnWordNet University of Helsinki Krister Lindén YES OPEN CC-BY FinnWordNet – the Finnish WordNet
French EuroWordNet French Université d’Avignon et des Pays du Vaucluse (AVI) Laboratoire d’Informatique Memodata, Caen Marc Elbeze Dominique Dutoit NO RESTRICTED ELDA/ELRA EuroWordNet French
French WOLF ALPAGE team: INRIA, Université Paris Diderot - Paris 7 Benoît Sagot NO OPEN Cecill-C WOLF
German GermaNet Universität Tübingen Erhard Hinrichs NO OPEN FOR ACADEMIC USE GermaNet
German EuroWordNet German Universität Tübingen Erhard Hinrichs NO RESTRICTED ELDA/ELRA EuroWordNet German
Hebrew Hebrew WordNet University of Haifa, Israel Shuly Wintner YES OPEN Hebrew WordNet BROWSE ONLINE (on MultiWordNet)
Hindi Hindi WordNet Indian Institute of Technology Bombay Powai, Mumbai Pushpak Bhattacharyya YES OPEN GNU FDL Hindi WordNet
Hungarian Hungarian WordNet (HuWN) University of Szeged et al Veronika Vincze NO OPEN MS-NC-NoReD Hungarian WordNet (HuWN)
Icelandic NordicNet Viki, Iceland Heidar Jon Hannesson NO N/A N/A
Italian ItalWordNet (EuroWordNet Italian) Istituto di Linguistica Computazionale C.N.R., Pisa, Italy Nicoletta Calzolari Adriana Roventini Rita Marinelli YES RESTRICTED ItalWordNet ELDA/ELRA
Multilingual (Italian/Spanish/Portuguese/Hebrew/Romanian/Latin) MultiWordNet Fondazione Bruno Kessler, Center for Communication and Information Technology, Human Language Technology Group, Trento, Italy Christian Girardi MultiWordNet Staff YES OPEN CC-BY 3.0 (Filled Request Form required) MultiWordNet
Irish Líonra Séimeantach na Gaeilge (LSG) Irish Language Semantic Network Saint Louis University, Missouri, USA Kevin Scannell YES (3D Browser) GNU GPL v3 Líonra Séimeantach na Gaeilge Webpage wordnet-gaeilge on Google Code
Japanese Japanese WordNet National Institute of Information and Communications Technology of Japan (NICT), Kyoto, Japan Hitoshi Isahara (Kyoto) Francis Bond YES OPEN JP EN Japanese WordNet
Multilingual (Japanese/Chinese/German) Multi-Lingual Semantic Network project Darren Cook Darren Cook YES MIT License (equivalent to the BSD and LGPL) Multi-Lingual Semantic Network project
Multilingual (Hindi/Assamese/Bengali/Bodo/Gujarati/Kannada/Kashmiri/Konkani/Malayalam/Meitei/Marathi/Nepali/Sanskrit/Tamil/Telugu/Punjabi/Urdu/Oriya) IndoWordNet Center for Indian Language Technology - CFILT, Indian Institute of Technology Bombay, Powai, Mumbai Prof. Pushpak Bhattacharyya YES BROWSE ONLINE ONLY IndoWordNet
Kannada IndianNet N/A Senthil Nathan N/A N/A N/A
Korean N/A N/A Seo Jung Yun N/A N/A N/A
Korean KorLex (Korean WordNet) Pusan National University Aesun Yoon YES (Not Working) BROWSE ONLINE ONLY KorLex (Korean WordNet) (KorLex) Browse Online (login as ‘guest’)
French Alexandria MEMODATA, CAEN (FRANCE) Dominique Dutoit YES BROWSE ONLINE ONLY Memodata - Alexandria Browse Online
Latin MultiWordNet Latin University of Verona Stefano Minozzi YES CC-BY 3.0 (Filled Request Form required) MultiWordNet
Latvian N/A Institute of Mathematics and Computer Science, University of Latvia Normunds Gruzitis N/A N/A N/A
Macedonian Macedonian WordNet Ss. Cyril and Methodius University & Staffordshire University Martin Saveski & Igor Trajkovski NO CC BY-NC 3.0 (Upon request to Dr. Saveski) Automatic Construction of Wordnets by Using Machine Translation and Language Modeling
Maltese N/A University of Malta Michael Spagnol N/A N/A N/A
Marathi Marathi Wordnet Indian Institute of Technology Bombay Powai, Mumbai Pushpak Bhattacharyya YES BROWSE ONLINE ONLY Marathi WordNet
Moldavian BalkaNet Kishinev, Moldova, Institute of Mathematics of Academy of Sciences of Moldova Valentina Demidova N/A N/A BalkaNet
Nepali N/A Kathmandu University Niraj Shrestha N/A N/A N/A
Norwegian Norwegian wordnet Bergen University, Norway Helge Dyvik N/A N/A N/A
Oriya N/A Utkal University Sanghamitra Mohanty N/A N/A N/A
Persian PersiaNet PersiaNet New Jersey, USA F. Keyvan N/A N/A N/A
Persian FarsNet Shahid Beheshti University, Tehran, Iran, and Iran Telecommunication Research Center (ITRC) Mehrnoush Shamsfard YES OPEN FOR ACADEMIC USE FarsNet Web Application
Persian Persian Wordnet University of Tehran, NLP Lab, Tehran, Iran Heshaam Faili Mortaza Montazery NO OPEN Persian Wordnet
Polish plWordNet (Slowosiec) Wroclaw University of Technology Dr. Maciej Piasecki YES OPEN FOR ACADEMIC USE plWordNet (Slowosiec) Browse Online
Polish Polish WordNet Adam Mickiewicz University (Poznań, Poland) Dr. Prof. Z. Vetulani NO Not available yet, but will be Polish WordNet
Portuguese WordNet.PT – Portuguese WordNet Centro de Linguística da Universidade de Lisboa Palmira Marrafa YES BROWSE ONLINE ONLY WordNet.PT Browse Online
Portuguese OpenWN-PT (Brazilian Portuguese Wordnet) Fundação Getúlio Vargas University, Escola de Matemática Aplicada, Rio de Janeiro, Brazil Alexandre Rademaker YES OPEN OpenWN-PT on GitHub
Romanian BalkaNet Romanian Alexandru Ioan Cuza University, Iasi, Romania and Institute for Artificial Intelligence, Romanian Academy, Bucharest Dan Cristea Dan Tufis YES OPEN Browse Online BalkaNet Romanian at MultiWordNet BalkaNet
Romanian Romanian WordNet Institute for Artificial Intelligence, Romanian Academy, Bucharest Dan Tufis YES BROWSE ONLINE ONLY Romanian WordNet Browser
Russian Russian Wordnet (Русский Wordnet) Wordnet.ru Ilya Gelfenbeyn NO OPEN Wordnet.ru
Russian RussNet University of Saint Petersburg, Russia Irina Azarova NO RESTRICTED (sample available for free) RussNet
Russian Russian WordNet Center for Information Research at the Computer Center of the Moscow State University Tatyana Yudina N/A N/A N/A
Sanskrit N/A Utkal University Sanghamitra Mohanty N/A N/A ilts-utkal.org
Sanskrit Sanskrit Wordnet Indian Institute of Technology, Centre for Indian Language Technology Bombay Powai, Mumbai Malhar Kulkarni Pushpak Bhattacharyya YES OPEN GNU FDL (Filled Request Form required) Sanskrit Wordnet Browse Online
Serbian BalkaNet Serbian Faculty of Mathematics, University of Belgrade, Serbia Gordana Pavlovic-Lazetic Dusko Vitas N/A N/A BalkaNet
Serbian Serbian Wordnet (SrpWN) Faculty of Mathematics, University of Belgrade, Serbia Gordana Pavlovic-Lazetic Cvetana Krstev NO OPEN CC-BY-NC Serbian Wordnet (SrpWN)
Slovenian sloWNet Dept. of Translation, Faculty of Arts, University of Ljubljana and Dept. for Knowledge Technologies, Jozef Stefan Institute Darja Fiser Cvetana Krstev YES OPEN Browse Online or email Darja Fiser for a download link
Spanish EuroWordNet Spanish UNED/UPC/UB Felisa Verdejo NO RESTRICTED ELDA/ELRA
Swedish N/A Dept. of Swedish University of Goteborg Maria Toporowska Gronostaj N/A N/A N/A
Swedish N/A Department of Linguistics & Phonetics, Lund University Åke Viberg N/A N/A N/A
Tamil Tamil Wordnet AU-KBC Research Centre MIT Campus of Anna University Chromepet Lalitha Devi Sobha NO OPEN Tamil WordNet project Download Tamil WordNet
Turkish BalkaNet Turkish Center for Turkish Language and Speech Processing Kemal Oflazer YES (Not Working) N/A Browse Online (Not Working) BalkaNet

Wordnet Annotated Corpora in the World

Language Name SemCor Words Taggable Tagged Developer Contact Online Browsing License Point of Access
Bulgarian BulSemCor NonSemCor 101,062 N/A 99,480[1] Department of Computational Linguistics, Bulgarian Academy of Sciences, Sofia, Bulgaria Svetla Koeva YES BROWSE ONLINE ONLY (downloadable excerpts freely under META-SHARE NoRedistribution Non-Commercial license) BulSemCor
Basque EPEC Eusemcor (Basque Semcor) NonSemCor 300,000 N/A N/A University of the Basque Country, IXA Group, Natural Language Processing Eneko Agirre Mikel Esnaola YES BROWSE ONLINE ONLY EPEC Eusemcor Improving the Basque WordNet by corpus annotation.
Spanish spsemcor NonSemCor 850,000 N/A 23,307 University of the Basque Country, IXA Group, Natural Language Processing German Rigau YES BROWSE ONLINE ONLY spsemcor Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses.
Dutch DutchSemCor NonSemCor 500 million N/A 282,503[2] Language and Communication, Faculty of Arts, Vrije Universiteit Amsterdam - Tilburg centre for Creative Computing, Faculty of Arts, University of Tilburg - ISLA, Faculty of Science, University of Amsterdam Piek Vossen NO N/A (downloadable excerpts and statistics free) DutchSemCor
English SemCor3.0-all SemCor 359,732 N/A 192,639 Princeton University Christiane D. Fellbaum NO OPEN SemCor
English SemCor3.0-verbs SemCor 316,814 N/A 41,497 Princeton University Christiane D. Fellbaum NO OPEN SemCor
English Princeton WordNet Gloss Corpus NonSemCor 1,621,129 656,066 449,355 Princeton University Christiane D. Fellbaum NO OPEN Princeton WordNet Gloss Corpus
English MASC NonSemCor 504,299 N/A 100,000 Vassar College, Department of Computer Science, Columbia University, Center for Computational Learning Systems, International Computer Science Institute, Berkeley Nancy Ide Rebecca J. Passonneau Collin F. Baker NO OPEN (distributed without license or other restrictions.) About MASC Download MASC
English Senseval NonSemCor 5,000 2,212 2,212 University of Pennsylvania Nancy Ide Benjamin Snyder Martha Palmer NO OPEN (distributed without license or other restrictions at the Senseval-3 website) Senseval-3 Data, "English all words" task
Japanese Jsemcor SemCor 380,000 150,000 58,000 National Institute of Information and Communications Technology of Japan (NICT), Kyoto, Japan Francis Bond NO OPEN Japanese WordNet: Current Release & Downloads
Multilingual (English/Chinese/Indonesian/Japanese) NTU-MC NonSemCor English (115,843) Chinese (105,879) Indonesian (55,865) Japanese (49,144) English (62,619) Chinese (67,159) Indonesian (36,712) Japanese (20,049) English (51,147) Chinese (36,173) Indonesian (27,796) Japanese (15,395) Nanyang Technological University, Division of Linguistics and Multilingual Studies, Singapore Francis Bond NO OPEN CC BY Tagging is still underway snapshots available here
German WebCaGe NonSemCor N/A N/A 10,750 Universität Tübingen Erhard Hinrichs NO OPEN BY-SA WebCaGe
Multilingual (English/Romanian) SemCor En-Ro corpus SemCor Romanian (175,603) English (178,499) Romanian (88,874) Romanian (48,392) Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy Dan Tufiş YES (at MultiSemCor Browser) OPEN MS Commons-BY-NC-ND META-SHARE SemCor-Ro
Romanian RoSemCor SemCor N/A N/A N/A Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy Dan Tufiş Dan Cristea NO N/A ELDA/ELRA RoSemCor
Multilingual (English/Italian) MultiSemCor+ SemCor English (258,499) Italian (268,905) Italian (121,175)[3] English (119,802) Italian (92,820) Fondazione Bruno Kessler, Center for Communication and Information Technology, Human Language Technology Group, Trento, Italy Christian Girardi YES OPEN CC-BY 3.0 (Filled Request Form required) MultiSemCor Webpage MultiSemCor Browser
Italian ISST (Italian Syntactic-Semantic Treebank) NonSemCor 305,547 N/A 81,236 National Research Council, Institute of Computational Linguistics, Pisa, Italy Simonetta Montemagni NO OPEN FOR ACADEMIC USE ISST (Italian Syntactic-Semantic Treebank)

Citations

WordNets

  • als Ervin Ruci (2008) On the current state of Albanet and related applications, Technical Report, University of Vlora
  • arb Horacio Rodríguez, David Farwell, Javi Farreres, Manuel Bertran, Musa Alkhalifa, M. Antonia Martí, William Black, Sabri Elkateb, James Kirk, Adam Pease, Piek Vossen, and Christiane Fellbaum. Arabic WordNet: Current State and Future Extensions. In: Proceedings of the Fourth International Global WordNet Conference - GWC 2008, Szeged, Hungary, January 22-25, 2008
  • cat, eus, glg, spa Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012) Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012) Matsue, Japan.
  • eng Christiane Fellbaum. (ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press; George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM Vol. 38, No. 11: 39-41.
  • fre Benoît Sagot and Darja Fišer (2008) Building a free French wordnet from multilingual resources. In European Language Resources Association (ELRA) (ed.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
  • heb Noam Ordan and Shuly Wintner (2007) Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58, 2007
  • ita Emanuele Pianta, Luisa Bentivogli and Christian Girardi. (2002) MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
  • ind, zsm Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011) Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), pages 258–267. Singapore
  • jpn Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008) Development of Japanese WordNet. In LREC-2008, Marrakech.
  • fas Montazery, Mortaza and Heshaam Faili (2010) Automatic Persian WordNet Construction. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 846–850
  • fin Lindén K., Carlson L. (2010) FinnWordNet — WordNet på finska via översättning, LexicoNordica — Nordic Journal of Lexicography, 17:119–140
  • rom Dan Tufiş, Verginica Barbu Mititelu, Dan Ştefănescu, Radu Ion, The Romanian Wordnet in a Nutshell. Language Resources and Evaluation, Springer, Vol. 47, no. 2, 2013, ISSN 1574-020X, DOI: 10.1007/s10579-013-9230-7
  • pol Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda. (2009) A Wordnet from the Ground Up. Wroclaw: Oficyna Wydawnicza Politechniki Wroclawskiej, Poland.
  • por Valeria de Paiva and Alexandre Rademaker (2012) Revisiting a Brazilian wordnet. In Proceedings of Global Wordnet Conference, Matsue. Global Wordnet Association. (also with Gerard de Melo's contribution)
  • tha Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009) Thai Wordnet Construction, Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP) Suntec, Singapore

Wordnet Annotated Corpora

References

  1. Both lexical and function words were subject to annotation
  2. 282,503 tagged manually by two annotators, 400,000+ by at least one annotator, and millions automatically
  3. According to Bentivogli and Pianta (2005), 23.4% of Italian words still need to be tagged. The 92,820 tagged words therefore represent 76.6% of the taggable words, which puts the estimated taggable total at 121,175.
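
The estimate in note 3 can be checked with a short calculation. The sketch below is only an illustration of that arithmetic: it takes the two figures quoted in the note (92,820 tagged words, 23.4% still untagged) as given, and the variable names are made up for this example. Percentages are held as integer per-mille values to avoid floating-point surprises.

```python
# Figures quoted in note 3; variable names are illustrative assumptions.
tagged_italian = 92_820          # Italian words already tagged
untagged_share_permille = 234    # 23.4% of taggable words still untagged

# Share of taggable words that is already tagged: 100% - 23.4% = 76.6%
tagged_share_permille = 1000 - untagged_share_permille

# If 92,820 words are 76.6% of the total, the total is 92,820 / 0.766
estimated_taggable = round(tagged_italian * 1000 / tagged_share_permille)

print(estimated_taggable)  # prints 121175
```

The result matches the taggable figure given for Italian in the MultiSemCor+ row.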