Page 1: SenDiS
Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
Project co-financed by the European Regional Development Fund
General Word Sense Disambiguation System applied to the Romanian and English Languages (SenDiS)
Andrei Mincă
SenDiS: WSD model, components, algorithms, methods & results
Page 2: SenDiS WSD model
Page 3: SenDiS system components
Page 4: WSD phases
- Order Lexicon Network (OLN)
- Build Meaning Semantic Signatures (BMSS)
- Compare Meaning Semantic Signatures (CMSS)
- Compute WSD Variants (CwsdV)
Page 5: OLN algorithms
Input: an unordered lexicon network
Lexicon network optimizations consider:
- the number of edges
- loops or strongly connected components
- the number of roots and leaves
- the number of levels (when the LN is levelled)
Output: an ordered lexicon network
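The ordering step can be illustrated with a minimal Python sketch: Kahn-style levelling of a directed graph, which assigns each node its longest-path level from the roots and, as a by-product, exposes the nodes trapped in loops or strongly connected components. The edge-list representation and the function name are illustrative, not SenDiS's actual data model.

```python
from collections import defaultdict, deque

def level_lexicon_network(edges):
    """Level a directed lexicon network (level 0 = roots, i.e. nodes
    with no incoming edge); nodes left unlevelled sit on a cycle.
    `edges` is a list of (source, target) pairs -- a toy stand-in
    for the real lexicon-network representation."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    level = {n: 0 for n in nodes if indeg[n] == 0}   # roots
    queue = deque(level)
    longest = defaultdict(int)                       # longest path seen so far
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            longest[v] = max(longest[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:                        # all predecessors levelled
                level[v] = longest[v]
                queue.append(v)
    cyclic = sorted(nodes - set(level))              # on or behind a cycle
    return level, cyclic
```

Nodes that never reach in-degree zero are exactly those on (or only reachable through) a loop or strongly connected component, which is why the slide lists them among the optimization concerns.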
Page 6: BMSS algorithms
Input:
- a lexicon network (not necessarily ordered)
- a meaning (ID)
Builds a semantic interpretation for the specified meaning over the lexicon network, as:
- spanning trees
- sets of nodes
- sequences of edges
- combinations of the above
Output: a semantic interpretation (signature) for the meaning
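One of the signature shapes listed above, a set of nodes, can be sketched as a bounded reachability set over the network. The `succ` adjacency mapping and the depth cut-off are assumptions made for the sketch, not SenDiS's actual BMSS algorithm.

```python
def node_set_signature(succ, meaning_id, depth=2):
    """Build a set-of-nodes signature: all nodes reachable from
    `meaning_id` in at most `depth` steps over the lexicon network.
    `succ` maps node -> list of successor nodes (toy representation)."""
    frontier = {meaning_id}
    signature = {meaning_id}
    for _ in range(depth):
        # expand one step, keeping only nodes not yet in the signature
        frontier = {v for u in frontier for v in succ.get(u, [])} - signature
        signature |= frontier
    return signature
```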
Page 7: CMSS algorithms
Input: two or more semantic signatures
The comparison depends on the nature of the semantic signatures.
Output: degrees of similarity
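For set-of-nodes signatures, one plausible degree of similarity is set overlap (Jaccard). As the slide notes, the comparison depends on the signature's nature, so this is only one example of a CMSS measure, not the system's actual choice.

```python
def jaccard(sig_a, sig_b):
    """Degree of similarity between two node-set signatures:
    |intersection| / |union|, in [0, 1]."""
    if not sig_a and not sig_b:
        return 0.0          # convention: two empty signatures share nothing
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```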
Page 8: CwsdV algorithms
Input: a matrix of degrees of similarity between the senses of the context words
Output: one or several WSD variants with the highest cost
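A brute-force sketch of the variant search: pick one sense per context word, score each assignment by the sum of pairwise similarities, and return every variant tied for the highest cost. The `frozenset`-keyed similarity dict is a toy stand-in for the similarity matrix; real CwsdV algorithms would avoid the exponential enumeration.

```python
from itertools import product

def best_wsd_variants(words_senses, sim):
    """Exhaustively score every assignment of one sense per word.
    `words_senses` is a list of candidate-sense lists, one per context
    word; `sim` maps frozenset({sense_a, sense_b}) -> similarity."""
    best, best_cost = [], float("-inf")
    for variant in product(*words_senses):
        cost = sum(sim.get(frozenset((a, b)), 0.0)
                   for i, a in enumerate(variant)
                   for b in variant[i + 1:])
        if cost > best_cost:
            best, best_cost = [variant], cost
        elif cost == best_cost:
            best.append(variant)    # keep ties: "one or several" variants
    return best, best_cost
```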
Page 9: WSD methods
Input:
- text
- a list of meanings
- a lexicon network
Computing:
- tokenization of the text
- annotation of text tokens with meaning interpretations
- selection of a window-text for WSD
- other context filters or topologies
- building meaning semantic signatures for each word sense
- comparing meaning semantic signatures and filling the matrix
- computing the best WSD variants
Output: one or more WSD variants, with one or more meaning interpretations for each text token
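The steps above can be glued into a toy end-to-end pipeline. Whitespace tokenization, bounded-reachability signatures, set-overlap comparison, and exhaustive variant search are all naive stand-ins for SenDiS's actual algorithms; every name and parameter here is illustrative.

```python
from itertools import product

def wsd_pipeline(text, inventory, succ, window=3):
    """Toy WSD method. `inventory` maps a word to its candidate
    meaning ids; `succ` is the lexicon network as node -> successors."""
    tokens = text.lower().split()                             # tokenization
    targets = [t for t in tokens if t in inventory][:window]  # window-text

    def signature(mid, depth=2):                              # BMSS step
        seen, frontier = {mid}, {mid}
        for _ in range(depth):
            frontier = {v for u in frontier for v in succ.get(u, [])} - seen
            seen |= frontier
        return seen

    def compare(a, b):                                        # CMSS step
        return len(a & b) / len(a | b) if a | b else 0.0

    sigs = {m: signature(m) for t in targets for m in inventory[t]}
    best, best_cost = None, float("-inf")                     # CwsdV step
    for variant in product(*(inventory[t] for t in targets)):
        cost = sum(compare(sigs[a], sigs[b])
                   for i, a in enumerate(variant) for b in variant[i + 1:])
        if cost > best_cost:
            best, best_cost = variant, cost
    return dict(zip(targets, best)) if best else {}
```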
Page 10: General WSD requirements
- tokenization
- part-of-speech tagging
- lemmatization
- sense interpretations
- chunking
- parsing
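A minimal sketch of the per-token annotation record that such a preprocessing chain produces; all field names are illustrative, not SenDiS's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    """One annotated text token after the preprocessing chain above."""
    surface: str                                 # from tokenization
    pos: str = ""                                # from part-of-speech tagging
    lemma: str = ""                              # from lemmatization
    senses: list = field(default_factory=list)   # candidate sense interpretations
    chunk: str = ""                              # from chunking / parsing
```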
Page 11: Testing WSD
Performance indicators:
- P (precision) = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
- R (recall) = noCorrectlyDisambiguated_TargetWords / noTargetWords
- F-measure = 2 * P * R / (P + R)
State-of-the-art results (F-measure):
- lexical sample task: coarse-grained ~90%, fine-grained ~73%
- all-words task: coarse-grained ~83%, fine-grained ~65%
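The three indicators translate directly into code (with guards against empty denominators):

```python
def wsd_scores(n_correct, n_disambiguated, n_targets):
    """Precision, recall and F-measure exactly as defined above."""
    p = n_correct / n_disambiguated if n_disambiguated else 0.0
    r = n_correct / n_targets if n_targets else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, 60 correct out of 80 disambiguated target words from a total of 100 target words gives P = 0.75, R = 0.6 and F-measure = 2/3.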
Page 12: Testing SenDiS
A test configuration for SenDiS consists of:
- a meaning inventory
- a lexicon network
- an OLN algorithm
- a BMSS algorithm
- a CMSS algorithm
- a CwsdV algorithm
- a WSD method
- a test corpus
Number of configurations: nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
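The size of the test matrix is simply the product of the choices along each axis:

```python
from math import prod

def n_test_configurations(counts):
    """Number of SenDiS test configurations: the product of the number
    of choices per axis (meaning inventories, lexicon networks,
    OLN/BMSS/CMSS/CwsdV algorithms, WSD methods, test corpora)."""
    return prod(counts)
```

For example, 2 meaning inventories, 3 lexicon networks, 1 OLN, 2 BMSS, 2 CMSS, 1 CwsdV algorithm, 2 WSD methods and 3 test corpora already yield 144 configurations, which is why exhaustive testing grows expensive quickly.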
Page 13: Results

Senseval 2 (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
224 | WN_ex | | | | | meaning interpretations only for recognized lemmas
225 | WN_ex | | | | | % coverage for GRAALAN Inflection Form Entries
225 | WN_ex | | | | | % IFEs + corpus target words lemmas tags

Senseval 3 (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
254 | WN_ex | | | | | no IFEs
265 | WN_ex | | | | | % IFEs
256 | WN_ex | | | | | % IFEs + corpus target words lemmas tags

Semcor (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
33,855 | WN_ex | | | | | % IFEs
33,866 | WN_ex | | | | | % IFEs + corpus target words lemmas tags
Page 14: Tagged glosses as a test corpus

WN_ex (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
206,941 | WN_ex | | | | | only corpus target words lemmas tags
158,378 | WN_ex | | | | | % IFEs
158,667 | WN_ex | | | | | % IFEs + corpus target words lemmas tags

LLR_99% (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
106,899 | LLR_99% | | | | | no IFEs
110,596 | LLR_99% | | | | | % IFEs
110,635 | LLR_99% | | | | | % IFEs + corpus target words lemmas tags

LLE_2% (no POS tagging)
No. Texts | LexNet | P | R | F-measure | Time (h) | Observations
2,927 | LLE_2% | | | | | no IFEs
3,125 | LLE_2% | | | | | % IFEs
3,071 | LLE_2% | | | | | % IFEs + corpus target words lemmas tags
Page 15