Heili Orav & Kadri Vider


Concerning the Difference Between a Conception and its Application in the Case of the Estonian WordNet
Heili Orav & Kadri Vider (heili@ut.ee; kadriv@ut.ee)

Estonian WordNet Today
Estonian wordnet contains:

Resources
Monolingual resources:
- EKSS (Explanatory Dictionary of Estonian)
- Dictionaries of synonyms and antonyms via complex query of KeeleWeb (http://ee.www.ee/)
Bilingual resources:
- Estonian-English Dictionary
- English-Estonian Dictionary
Other tools of Estonian language technology

Word Sense Disambiguation
A new challenge for EstWordNet:
- 100,000 tokens from the Corpus of Estonian Literary Language
- Morphologically disambiguated text
- Manual sense-tagging
=> PROBLEMS ARISE =>

Word senses in EstWN: too broad or too narrow?
Over-generalisation, e.g.:
- kuduma * weave, tissue [of textiles; create a piece of cloth by interlacing strands, such as wool or cotton]
- kuduma * knit [make textiles by knitting]
Translation equivalents are a useful test.
Over-differentiation, e.g.:
- kool 1 school [polysemous sense that applies both to the institution and the building]
  => kool 2 schoolhouse [school building]
  => kool 3 school [educational institution]
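The translation-equivalent test can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the dictionaries, the concept labels and the function names `too_broad` and `too_narrow` are hypothetical, and the toy entries stand in for EstWN senses and an English wordnet rather than reproducing either. The checks only propose candidates for manual review.

```python
from itertools import combinations

# Hypothetical toy data: Estonian sense -> English translation equivalents.
SENSES = {
    "kuduma": {"weave", "knit"},   # imagined single broad sense, before splitting
    "kool 1": {"school"},
    "kool 2": {"schoolhouse"},
    "kool 3": {"school"},
}

# Hypothetical stand-in for an English wordnet: word -> concept labels.
ENGLISH_CONCEPTS = {
    "weave": {"interlace-strands"},
    "knit": {"make-by-knitting"},
    "school": {"institution", "building"},
    "schoolhouse": {"building"},
}


def too_broad(equivalents, concepts):
    """Over-generalisation candidate: some pair of translation
    equivalents shares no English concept."""
    return any(
        not (concepts[a] & concepts[b])
        for a, b in combinations(sorted(equivalents), 2)
    )


def too_narrow(sense_a, sense_b, senses):
    """Over-differentiation candidate: two senses share at least
    one translation equivalent."""
    return bool(senses[sense_a] & senses[sense_b])


if __name__ == "__main__":
    print(too_broad(SENSES["kuduma"], ENGLISH_CONCEPTS))
    print(too_narrow("kool 1", "kool 3", SENSES))
```

With the toy data, the broad kuduma sense is flagged because weave and knit share no concept, and kool 1 and kool 3 are flagged because both translate as school.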

Metaphors
Two main types of knowledge can be distinguished in the comprehension of a text (Õim, 1983): semantic knowledge is knowledge of extralinguistic reality; pragmatic knowledge is knowledge regulating communication (social norms, conventions). Because EstWN is based on existing traditional dictionaries and a text corpus (which provides usage information), one might suppose that the semantic information in the database reflects semantic knowledge. Adding metaphors would turn the thesaurus into one that combines semantic and pragmatic knowledge. It would also increase the size of the thesaurus considerably. For this reason we have so far tried to avoid adding metaphors.

CWC = Conceptual word combinations in EstWordNet
- Phraseological units: the meaning is not the sum of the constituents; the unit constitutes a conceptual whole
- Other word combinations: the meaning of the whole is the sum of the constituents
How to recognise them in text?
What is useful to annotate in the WSD task?

CWCs of different types
(1) Synonyms, e.g. {meenutama, meelde tuletama} ‘recall, remember’
(2) Specific hierarchical nodes, e.g. {ruumiline omadus} ‘spatial characteristic’, {üleloomulik olend} ‘supernatural creature’
(3) Technical terms, e.g. {ilmaütlev kääne} ‘abessive case’, {kreeka tähestik} ‘Greek alphabet’
(4) Explanations, e.g. {kultiveerima, kultuurina kasvatama} ‘cultivate, grow as a culture’
Only (1) and (3) are relevant from the perspective of the WSD task
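One way to picture how CWCs of types (1) and (3) might be marked in a text before sense-tagging is a longest-match lookup over a token sequence. The sketch below is an assumption-laden illustration, not the EstWN procedure: the lexicon layout, the name `tag_cwcs` and the restriction to contiguous, already normalised tokens are all hypothetical.

```python
# Hypothetical CWC lexicon: tuple of normalised tokens -> CWC type (1-4).
CWC_LEXICON = {
    ("meelde", "tuletama"): 1,        # synonym
    ("ruumiline", "omadus"): 2,       # specific hierarchical node
    ("ilmaütlev", "kääne"): 3,        # technical term
    ("kreeka", "tähestik"): 3,        # technical term
    ("kultuurina", "kasvatama"): 4,   # explanation
}

# Only types (1) and (3) are treated as units for WSD.
WSD_RELEVANT_TYPES = {1, 3}


def tag_cwcs(tokens, lexicon=CWC_LEXICON, relevant=WSD_RELEVANT_TYPES):
    """Join contiguous tokens that form a WSD-relevant CWC,
    preferring the longest match; other tokens pass through."""
    max_len = max(len(key) for key in lexicon)
    units = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if lexicon.get(candidate) in relevant:
                match = candidate
                break
        if match:
            units.append(" ".join(match))
            i += len(match)
        else:
            units.append(tokens[i])
            i += 1
    return units


if __name__ == "__main__":
    print(tag_cwcs(["ta", "meelde", "tuletama", "kreeka", "tähestik"]))
```

Combinations of types (2) and (4) are left as separate tokens, so their constituents would be sense-tagged individually. Discontinuous combinations are outside the scope of this sketch.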

References
Kaalep, H.-J., & Muischnek, K. (2003). Inconsistent selectional criteria in semi-automatic multi-word unit extraction. In Complex 2003, 7th conference on computational lexicography and corpus research (pp. 27-36). Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences.
Kahusk, N., Orav, H., & Õim, H. (2001). Sensiting inflectionality: Estonian task for SENSEVAL-2. In Proceedings of SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguating Systems (pp. 25-28). Toulouse, France: CNRS-Institut de Recherche en Informatique de Toulouse, and Université des Sciences Sociales.
Kahusk, N., & Vider, K. (2002). Estonian wordnet benefits from word sense disambiguation. In Proceedings of the 1st international global wordnet conference (pp. 26-31). Mysore, India: Central Institute of Indian Languages.
Miller, G. A. (1979). Images and models, similes and metaphors. In A. Ortony (Ed.), Metaphor and thought (2nd ed.). Cambridge University Press.
Õim, A. (1993). Fraseoloogiasõnaraamat. Tallinn, Estonia: Eesti Teaduste Akadeemia Keele ja Kirjanduse Instituut.
Õim, H. (1983). Semantika i teoria ponimania jazyka. Analiz leksiki i tekstov direktivnogo obwenia estonskogo jazyka [Semantics and language understanding theory. Analysis of lexicon and texts of Estonian directive communication]. Doctoral dissertation, University of Tartu.