Language Data Resources: Word senses, Sense tagged corpora

Lev V. Ščerba: And indeed, every sufficiently complex word must actually become the subject of a scientific monograph; therefore it is hard to expect in the near future the completion of a good dictionary.

Word sense disambiguation
– The different meanings of a polysemous word are known as “senses”; the process of deciding which sense is being used in a particular context is called “word sense disambiguation” (WSD).

Lexical Acquisition Bottleneck
– Many NLP systems do not perform in practice as well as they could with adequate dictionary resources, because these resources are costly to produce, adapt, and maintain
– Solutions
  – Reusing existing dictionaries and ontologies as lexicons
  – Deriving disambiguation information directly from corpora

Usefulness of WSD
– NLP tools:
  – Systems: carry out some task of “interest for its own sake” (e.g. MT, IR); applications potentially interesting for non-linguists
  – Components: interesting for linguists and language engineers (e.g. WSD)

Early approaches
– Preference semantics (1970s)
  – Selectional constraints (e.g. ANIMATE for the subject of “to drink”)
– Word experts (1980s)
  – Hand-crafted disambiguators constructed for each word separately
  – Limited applicability
– Polaroid words
  – Gradual disambiguation (grammar, parser, lexicon, semantic interpreter, knowledge representation language)
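The selectional-constraint idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the feature names other than ANIMATE, the sense labels, and the helper `senses_allowed`. It also treats the constraint as a hard filter, which is a simplification.

```python
# Toy noun lexicon: all entries are invented for illustration.
NOUN_FEATURES = {
    "man": {"ANIMATE", "HUMAN"},
    "horse": {"ANIMATE"},
    "car": {"INANIMATE"},
}

# Each verb sense names the feature its subject must carry (hypothetical labels).
VERB_SENSES = {
    "drink": [
        {"sense": "drink/ingest-liquid", "subject": "ANIMATE"},
        {"sense": "drink/consume-fuel", "subject": "INANIMATE"},
    ],
}


def senses_allowed(verb, subject):
    """Return the senses of `verb` whose subject constraint `subject` satisfies."""
    features = NOUN_FEATURES.get(subject, set())
    return [
        entry["sense"]
        for entry in VERB_SENSES.get(verb, [])
        if entry["subject"] in features
    ]


print(senses_allowed("drink", "horse"))
print(senses_allowed("drink", "car"))
```

A noun missing from the toy lexicon has no features, so no sense survives the filter; a real system would need a fallback for that case.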

Dictionary Based Approaches
– Since the 1980s, dictionary publishers have produced “Machine Readable Dictionaries” (now “machine tractable dictionaries”)
– Wider polysemy than in the systems described so far

Two claims about sense distribution
– One sense per discourse
  – There is a very strong tendency for multiple uses of a word to share the same sense in a well-written discourse
– One sense per collocation
  – With a high probability an ambiguous word has only one sense in a given collocation
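One way the “one sense per discourse” claim could be exploited is as a post-processing step over an existing tagger's output. The sketch below assumes that use; the function name, the input format (a list of word/sense pairs from a single document), and the sense labels are all hypothetical.

```python
from collections import Counter


def one_sense_per_discourse(tagged):
    """Relabel each word with its majority sense within one document.

    `tagged` is a list of (word, sense) pairs produced by some earlier tagger.
    """
    votes = {}
    for word, sense in tagged:
        votes.setdefault(word, Counter())[sense] += 1
    majority = {word: counts.most_common(1)[0][0] for word, counts in votes.items()}
    return [(word, majority[word]) for word, _ in tagged]


document = [
    ("plant", "factory"),
    ("plant", "factory"),
    ("plant", "flora"),
    ("bank", "river"),
]
print(one_sense_per_discourse(document))
```

Majority voting is only one possible design choice; it overrides minority labels even when they were correct, so it helps only to the extent that the claim holds for the text at hand.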

Taxonomy of WSD Algorithms
– Knowledge based
– Corpus based
  – Tagged corpora
  – Untagged corpora
– Hybrid approaches

Word Senses and Lexicons
– Sense tagging = attaching senses from some lexicon to words in text
– Sense-enumerative dictionary

Deficiencies of dictionaries
– Omissions and oversights
– Coverage of names
– Ghost words, e.g. “dord” = density (from a misread “D or d”)
– Differentiating senses
  – P. Hanks: A serious problem for computer applications is that dictionaries compiled for human users focus on giving lists of meanings for each entry, without saying much about how one meaning may be distinguished from another in text

Two levels of sense distinction
– Homography
  – Two senses of a word are homographic when there is no obvious semantic relation between them (e.g. a ball: a dance or a rounded object)
  – Risk of amateur etymology
– Polysemy

Distinguishing senses
– P. Hanks: No generally agreed criteria exist for what counts as a sense, or for how to distinguish one sense from another
– Zeugma: Arthur and his driving license expired last Thursday.
– Polysemy vs. vagueness (e.g. mountain)

The Bank Model
– Assumption A: Words have a finite set of clearly distinct, well-defined senses
– Assumption B: Native speakers of … know instantly and effortlessly which meaning applies in a given situation
– Criticism of the bank model: Kilgarriff (“I don’t believe in word senses”), Pustejovsky (Generative lexicon), and many others…

NLP Lexicons
– Longman Dictionary of Contemporary English (LDOCE): three-level embedded structure for sense distinctions (homographs, senses, optional subsenses)
– Roget’s Thesaurus
– Cambridge International Dictionary of English
– COBUILD English Language Dictionary
– WordNet

Thesaurus

Ontology

There is little agreement on what an ontology is…
– In general, an ontology can be described as an inventory of the objects, processes, etc. in a domain, as well as a specification of (some of) the relations that hold among them.
– Aristotle: genus (the category to which something belongs) and differentiae (the properties that uniquely distinguish the category members from their parent and from one another)
– Nodes (concepts) in the hierarchy are related by subsumption
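A hierarchy of concepts related by subsumption can be sketched as a child-to-parent map. The concepts and the function names `ancestors` and `subsumes` are invented for illustration, and the sketch assumes each concept has a single parent, which real ontologies often do not guarantee.

```python
# Toy is-a hierarchy: each concept points to its single parent (assumption).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": "entity",
    "oak": "tree",
    "tree": "plant",
    "plant": "entity",
}


def ancestors(concept):
    """Return the chain of concepts above `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    return general == specific or general in ancestors(specific)


print(ancestors("dog"))
print(subsumes("animal", "dog"), subsumes("dog", "animal"))
```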

Ontologies in different traditions
– Philosophical
– Cognitive
– Artificial intelligence
– Lexical semantics
– Lexicography
– Information science

Princeton WordNet
– Lexical semantic network structured around the notion of synsets
– Synset: a group of literals of the same part of speech that are mutually interchangeable in a certain context (“set of synonyms”)
– Inspired by psycholinguistic theories of human lexical memory
– Strengths: broad coverage, rich lexical information, freely available
– Weakness: too fine-grained for practical NLP tasks
– Relations between two synsets: hyponymy, hyperonymy, meronymy …
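The synset idea can be made concrete with a small data-structure sketch. This is not the WordNet database format or any real WordNet API: the `Synset` class, the identifiers, and the literals listed are toy assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Synset:
    ident: str                      # hypothetical identifier
    pos: str                        # part of speech shared by all literals
    literals: Tuple[str, ...]       # words interchangeable in some context
    hypernym: Optional[str] = None  # identifier of the more general synset


# Toy inventory, invented for illustration.
SYNSETS = [
    Synset("ball-1", "n", ("ball", "globe", "orb"), "object-1"),
    Synset("ball-2", "n", ("ball", "formal dance"), "party-1"),
]


def senses_of(literal, pos):
    """Return every synset of the given part of speech that contains `literal`."""
    return [s for s in SYNSETS if s.pos == pos and literal in s.literals]


print([s.ident for s in senses_of("ball", "n")])
```

A polysemous or homographic literal such as “ball” appears in several synsets, so the lookup returns one candidate per sense; choosing among them is the disambiguation task.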

EuroWordNet (i)
– Multilingual database containing several monolingual wordnets structured along the same lines as the Princeton WordNet 1.5
– English, Dutch, German, Spanish, French, Italian, Czech, Estonian
– Inter-Lingual-Index

EuroWordNet (ii)
– Princeton WordNet 1.5: note, observe, make a remark, remark
– EuroWordNet:
  – prohodit, poznamenat, připomenout
  – anmerken, bemerken...
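The linking of wordnets through the Inter-Lingual-Index can be sketched as a lookup keyed by a shared record. The record identifier, the language codes, and the `translate` helper are hypothetical, and the German list contains only the two literals shown before the slide's ellipsis.

```python
# One toy index record; the key "ili-remark" is a made-up identifier.
ILI = {
    "ili-remark": {
        "en": ["note", "observe", "make a remark", "remark"],
        "cs": ["prohodit", "poznamenat", "připomenout"],
        "de": ["anmerken", "bemerken"],  # the slide's list is truncated here
    },
}


def translate(word, source, target):
    """Return target-language literals that share an index record with `word`."""
    found = []
    for record in ILI.values():
        if word in record.get(source, []):
            found.extend(record.get(target, []))
    return found


print(translate("remark", "en", "cs"))
```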

Sense tagged corpora
– “interest” corpus
  – About 2,000 sentences containing the word “interest”
– SENSEVAL
  – WSD evaluation exercise, first run in 1998
– SEMCOR
  – Subset of the English Brown corpus, about 700,000 words
  – More than 200,000 words sense-tagged according to Princeton WordNet 1.6

Final remarks
– Similarity of POS- and sense tagging
– Mapping lexical resources