User:Tommaso Petrolito/sandbox
Wordnets in the World
Wordnet Annotated Corpora in the World
Citations
WordNets
- als Ervin Ruci (2008) On the current state of Albanet and related applications, Technical Report, University of Vlora
- arb Horacio Rodríguez, David Farwell, Javi Farreres, Manuel Bertran, Musa Alkhalifa, M. Antonia Martí, William Black, Sabri Elkateb, James Kirk, Adam Pease, Piek Vossen, and Christiane Fellbaum. Arabic WordNet: Current State and Future Extensions. In Proceedings of the Fourth International Global WordNet Conference - GWC 2008, Szeged, Hungary, January 22-25, 2008
- cat, eus, glg, spa Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012) Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue, Japan.
- eng Christiane Fellbaum. (ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press; George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM Vol. 38, No. 11: 39-41.
- fre Benoit Sagot and Darja Fišer (2008) Building a free French wordnet from multilingual resources. In European Language Resources Association (ELRA) (ed.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
- heb Noam Ordan and Shuly Wintner (2007) Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58, 2007
- ita Emanuele Pianta, Luisa Bentivogli and Christian Girardi. (2002) MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
- ind, zsm Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011) Creating the Open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), pages 258–267. Singapore
- jpn Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008) Development of Japanese WordNet. In LREC-2008, Marrakech.
- fas Mortaza Montazery and Heshaam Faili (2010) Automatic Persian WordNet Construction. In the 23rd International Conference on Computational Linguistics, pp. 846–850
- fin Lindén K., Carlson L. (2010) FinnWordNet - WordNet på finska via översättning. LexicoNordica - Nordic Journal of Lexicography, 17:119–140
- rom Dan Tufiş, Verginica Barbu Mititelu, Dan Ştefănescu, Radu Ion. The Romanian Wordnet in a Nutshell. Language Resources and Evaluation, Springer, Vol. 47, no. 2, 2013, ISSN 1574-020X, DOI: 10.1007/s10579-013-9230-7
- pol Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda. (2009) A Wordnet from the Ground Up. Wroclaw: Oficyna Wydawnicza Politechniki Wroclawskiej, Poland.
- por Valeria de Paiva and Alexandre Rademaker (2012) Revisiting a Brazilian wordnet. In Proceedings of Global Wordnet Conference, Matsue. Global Wordnet Association. (also with Gerard de Melo's contribution)
- tha Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009) Thai Wordnet Construction, Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP) Suntec, Singapore
Wordnet Annotated Corpora
- bul Svetla Koeva, Svetlozara Leseva, Ekaterina Tarpomanova, Borislav Rizov, Tsvetana Dimitrova, Hristina Kukova. Bulgarian Sense Annotated Corpus - Results and Achievements. In Tadić, M., Dimitrova-Vulchanova, M. and Koeva, S. (eds.): Proceedings of the 7th International Conference of Formal Approaches to South Slavic and Balkan Languages (FASSBL-7), 4-6 October 2010, Dubrovnik, Croatia, pp. 41-48. ISBN 978-953-55375-2-6.
- baq Eneko Agirre, Izaskun Aldezabal, Jone Etxeberria, Eli Izagirre, Karmele Mendizabal, Eli Pociello, and Mikel Quintian. 2006. Improving the Basque WordNet by corpus annotation. In Proceedings of the Third International WordNet Conference, pages 287-290.
- dut Vossen P., Görög A., Laan F., Van Gompel M., Izquierdo R., Van den Bosch A. (2011). DutchSemCor: building a semantically annotated corpus for Dutch. In: Proceedings of Electronic Lexicography in the 21st century: New Applications for new users (eLEX2011), Bled, Slovenia, November 10-12, 2011
- eng George A. Miller, Martin Chodorow, Shari Landes, Claudia Leacock, and Robert G. Thomas. 1994. Using a semantic concordance for sense identification. In Proceedings of the ARPA Human Language Technology Workshop, pages 240-243; George A. Miller, Claudia Leacock, Randee Tengi, and Ross T. Bunker. 1993. A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology
- eng Ide, N. (2012). MultiMASC: An Open Linguistic Infrastructure for Language Research. Proceedings of the Fifth Workshop on Building and Using Comparable Corpora, held in conjunction with LREC 2012, Istanbul.
- eng Benjamin Snyder and Martha Palmer (2004) The English All-Words Task. In Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3)
- ger Verena Henrich, Erhard Hinrichs, and Tatiana Vodolazova: WebCAGe -- A Web-Harvested Corpus Annotated with GermaNet Senses. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, April 2012, pp. 387-396.
- ita Luisa Bentivogli, Emanuele Pianta and Marcello Ranieri. MultiSemCor: an English Italian aligned corpus with a shared inventory of senses. In Proceedings of the Meaning Workshop 2005, Trento, Italy, February 3-4, 2005, p. 90; Luisa Bentivogli and Emanuele Pianta. 2005. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor corpus. Natural Language Engineering, 11(3):247-261.
- ita Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Antonio Zampolli, Francesca Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, Rodolfo Delmonte, 2003. "Building the Italian Syntactic-Semantic Treebank", in Anne Abeillé (ed.), Building and using Parsed Corpora, Language and Speech series, Kluwer, Dordrecht, pp. 189-210; Stefano Dei Rossi, Giulia Di Pietro, Maria Simi. Evalita 2011: Description and Results of the SuperSense Tagging Task
- jpn Francis Bond, Timothy Baldwin, Richard Fothergill and Kiyotaka Uchimoto (2012) Japanese SemCor: A Sense-tagged Corpus of Japanese. In The 6th International Conference of the Global WordNet Association (GWC-2012), Matsue.
- rum Monica Lupu, Diana Trandabat and Maria Husarciuc. A Romanian SemCor aligned to the English and Italian MultiSemCor. In 1st ROMANCE FrameNet Workshop at EUROLAN 2005 Summer School, Proceedings, pages 20-27, Cluj-Napoca, Romania, July 2005.
- spa Castellón I., Climent S., Coll-Florit M., Lloberes M. and Rigau G. Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses. Proceedings of the 6th Global WordNet Conference (GWC'12), Matsue, Japan. January, 2012.
References
- ↑ Both lexical and function words were subject to annotation
- ↑ 282,503 tagged manually by two annotators, 400,000+ by at least one annotator, and millions automatically
- ↑ According to Bentivogli and Pianta (2005), 23.4% of the Italian words still need to be tagged. Since the 92,820 words tagged so far therefore make up 76.6% of the total, the number of taggable words can be estimated at 92,820 / 0.766 ≈ 121,175
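
The estimate in the last note can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic: the function name `estimate_total` and the variable names are hypothetical, and the only inputs are the two figures quoted in the note (92,820 tagged words and a 23.4% share still untagged).

```python
def estimate_total(tagged: int, tagged_share: float) -> int:
    """Estimate a total from a partial count and the share of the total it covers."""
    if not 0 < tagged_share <= 1:
        raise ValueError("tagged_share must be in the interval (0, 1]")
    return round(tagged / tagged_share)


# Figures quoted in the note (Bentivogli and Pianta 2005).
untagged_share = 0.234   # share of Italian words still to be tagged
tagged_words = 92_820    # words already tagged

# The tagged words cover the remaining share, 1 - 0.234 = 0.766.
taggable_words = estimate_total(tagged_words, 1 - untagged_share)
print(taggable_words)
```

The result is rounded to the nearest whole word, so it is an approximation of the taggable total rather than an exact count.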