JHU WORKSHOP 2003 – July 30th, 2003
Semantic Annotation – Week 3


Semantic Annotation – Week 3
Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Bontcheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev, Fred Jelinek

Our Hypotheses
● Transforming a corpus by replacing words and phrases with coarse semantic categories will help overcome the data-sparseness problem encountered in language modeling
● Semantic category information will also help improve machine translation
● An initially noun-centric approach will allow bootstrapping to other syntactic categories

An Example
● Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday
● Humans aboard space_vehicle dodge satellite timeref
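The replacement itself is a lexicon lookup over noun phrases. A minimal sketch, with an invented lexicon for this one sentence (the real mapping came from LDOCE codes and human annotation, not a hand-written dictionary):

```python
# Hypothetical word-to-category lexicon, for illustration only.
LEXICON = {
    "astronauts": "human",
    "endeavor": "space_vehicle",
    "satellite": "satellite",
    "friday": "timeref",
}

def transform(tokens):
    """Replace known nouns with their coarse semantic category."""
    return [LEXICON.get(t.lower(), t) for t in tokens]

print(transform(["Astronauts", "dodge", "a", "satellite", "Friday"]))
# → ['human', 'dodge', 'a', 'satellite', 'timeref']
```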

Our Progress – Preparing the Data (Pre-Workshop)
● Identify a tag set
● Create a human-annotated corpus
● Create a doubly annotated corpus
● Process all data for named-entity and noun-phrase recognition using GATE tools
● Develop algorithms for mapping target categories to WordNet synsets to support the tag-set assessment

The Semantic Classes for Annotators
● A subset of the classes available in the electronic version of the Longman Dictionary of Contemporary English (LDOCE)
● Rationale:
  The number of semantic classes is small
  The classes are somewhat reliable, since a team of lexicographers used them to code noun senses, adjective preferences, and verb preferences

Semantic Classes
● Abstract (T)
● Concrete (C), divided into Animate (Q) and Inanimate (I)
  Animate: Human (H), Animal (A), Plant (P)
  Inanimate: Liquid (L), Gas (G), Solid (S); solids are Movable (N) or Non-movable (J)
● Further codes in the scheme: B, D, F, M
● Target classes with annotated evidence also include PhysQuant (4) and Organic (5)

More Categories
● U: Collective
● K: Male
● R: Female
● W: Not animate
● X: Not concrete or animal
● Z: Unmarked
● We allowed annotators to choose “none of the above” (shown as ? in the slides that follow)

Our Progress – Data Preparation
● Assess the annotation format; define uniform descriptions for irregular phenomena and normalize them
● Determine the distribution of the tag set in the training corpus
● Analyze inter-annotator agreement
● Determine a reliable set of tags, T
● Parse all training data

Doubly Annotated Data
● Instances (headwords):
  8,950 instances without question marks
  8,446 of those are marked the same
● Inter-annotator agreement is 94% (83% including question marks)
● Recall: these are non-named-entity noun phrases
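The agreement figure follows directly from the counts on this slide:

```python
# 8,446 identically marked instances out of 8,950 without question marks.
same, total = 8_446, 8_950
print(f"{same / total:.0%}")  # → 94%
```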

Distribution of the Doubly Annotated Data [chart]

Agreement of Doubly Marked Instances [chart]

Inter-annotator Agreement for Each Category [chart]

Category Distribution Among the Agreed Part (69%) [chart]

A Few Statistics on the Human-Annotated Data
● Total annotated: 262,230 instances
  48,175 with ?
  214,055 with a category
● Of those with a category: Z 0.5%; W and X 0.5%; 4 and 5 1.6%
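As a sanity check, the counts on this slide are internally consistent: the "?" instances and the categorized instances partition the total.

```python
total, with_qmark, with_category = 262_230, 48_175, 214_055
assert with_qmark + with_category == total

# Share of instances that received a category:
print(round(with_category / total, 3))  # → 0.816
```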

Our Progress – Baselines
● Determine baselines for automatic tagging of noun phrases
● Baselines for tagging observed words in new contexts (new instances of known words)
● Baselines for tagging unobserved words:
  Unseen words – not in the training material, but in the dictionary
  Novel words – in neither the training material nor the dictionary/WordNet

Overlap of Dictionary and Head Nouns (in the BNC)
● 85% of NPs covered
● Only 33% of the vocabulary (both in LDOCE and in WordNet) appears in the NPs covered

Preparation of the Test Environment
● Selected the blind portion of the human-annotated data for late evaluation
● Divided the remaining corpus into training and held-out portions:
  Random division of files
  Unambiguous words for training, ambiguous ones for testing

Baselines Using Only (Target) Words

Error rate  Unseen words marked with  Method      Valid training instances  Blame
15.1%       the first class           MaxEntropy  count ≥ 3                 Klaus
12.6%       most frequent class       MaxEntropy  count ≥ 3                 Jerry
16%         most frequent class       VFI         all                       Fabio
13%         most frequent class       NaiveBayes  all                       Fabio

Baselines Using Only (Target) Words and Preceding Adjectives

Error rate  Unseen words marked with  Method      Valid training instances  Blame
13%         most frequent class       MaxEntropy  count ≥ 3                 Jerry
13.2%       most frequent class       MaxEntropy  all                       Jerry
12.7%       most frequent class       MaxEntropy  count ≥ 3                 Jerry

Baselines Using Multiple Knowledge Sources
● Experiments in Sheffield
● Unambiguous tagger (assigns the only available semantic category)
● Bag-of-words tagger (IR-inspired):
  window size of 50 words
  nouns and verbs
● Frequency-based tagger (assigns the most frequent semantic category)
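A minimal sketch of the bag-of-words tagger: each semantic category keeps a bag of context words harvested from training, and a test instance is tagged with the category whose bag overlaps its context window most. The slide fixes only the 50-word window and the noun/verb restriction, so the additive overlap score below is an assumption:

```python
from collections import Counter

def build_bags(training):
    """training: list of (context_words, semantic_tag) pairs, where
    context_words would be the nouns and verbs in a 50-word window."""
    bags = {}
    for words, tag in training:
        bags.setdefault(tag, Counter()).update(words)
    return bags

def tag_instance(window, bags):
    """Pick the category whose bag overlaps the window most."""
    return max(bags, key=lambda tag: sum(bags[tag][w] for w in window))
```

For example, bags built from `[(["orbit", "launch"], "I"), (["crew", "astronaut"], "H")]` would tag a window containing "launch" and "orbit" as I (Inanimate).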

Baselines Using Multiple Knowledge Sources (cont’d)
● Frequency-based tagger: 16-18% error rate
● Bag-of-words tagger: 17% error rate
● Combined architecture: % error rate

Bootstrapping to Unseen Words
● Problem: automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data
● Basic idea: use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words

Bootstrapping: Statistics
● 6,656 different unambiguous lemmas in the (visible) human-tagged corpus
● ...these contribute 166,249 instances of data
● ...134,777 instances were considered correct by the annotators!
● Observation: unambiguous words can be used in the corpus in an “unforeseen” way
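The bootstrapping idea can be sketched as follows: contexts observed around unambiguous words supply the evidence for tagging words never seen in the annotated data. This sketch assumes a single previous-word context feature and invented data; the workshop's actual features were richer:

```python
from collections import Counter, defaultdict

def collect_contexts(tagged_corpus, unambiguous):
    """tagged_corpus: list of (word, tag); unambiguous: set of words with
    exactly one possible tag. Returns previous-word -> tag counts."""
    evidence = defaultdict(Counter)
    for i, (word, tag) in enumerate(tagged_corpus):
        if word in unambiguous:
            prev = tagged_corpus[i - 1][0] if i > 0 else "<s>"
            evidence[prev][tag] += 1
    return evidence

def tag_unseen(prev_word, evidence, default="T"):
    """Tag a new word by the majority tag seen after this previous word;
    fall back to the most frequent overall class (Abstract, T)."""
    counts = evidence.get(prev_word)
    return counts.most_common(1)[0][0] if counts else default
```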

Bootstrapping Baselines
● Test instances (instances of ambiguous words): 62,853

Method                                                          % correctly labelled instances
Assigning the most frequent semantic tag (i.e. Abstract)        52%
One previous word (adjective, noun, or verb), Naive Bayes       45% (reliably tagged instances), 44.3% (all instances)
1 previous and 1 following word (adj., noun, or verb), N. Bayes 46.8% (reliably tagged instances), 44.5% (all instances)
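A Naive Bayes classifier of the kind behind these baselines, reduced to a single previous-word feature; the add-one smoothing and tie-breaking details below are assumptions, not the workshop's exact setup:

```python
import math
from collections import Counter, defaultdict

class NBTagger:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # add-alpha smoothing
        self.tag_counts = Counter()           # P(tag) statistics
        self.feat_counts = defaultdict(Counter)  # P(prev_word | tag)
        self.vocab = set()

    def fit(self, examples):
        """examples: list of (prev_word, tag) pairs."""
        for feat, tag in examples:
            self.tag_counts[tag] += 1
            self.feat_counts[tag][feat] += 1
            self.vocab.add(feat)

    def predict(self, feat):
        total = sum(self.tag_counts.values())
        v = len(self.vocab) or 1
        def score(tag):
            prior = math.log(self.tag_counts[tag] / total)
            likelihood = math.log(
                (self.feat_counts[tag][feat] + self.alpha)
                / (self.tag_counts[tag] + self.alpha * v)
            )
            return prior + likelihood
        return max(self.tag_counts, key=score)
```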

Metrics for Intrinsic Evaluation
● Need to take into account the hierarchical structure of the target semantic categories
● Two fuzzy measures, based on:
  dominance between categories
  edge distance in the category tree/graph
● Results w.r.t. inter-annotator agreement are almost identical to exact match
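The edge-distance measure can be sketched over a toy fragment of the category tree; the fragment below encodes only a few of the Concrete/Animate codes and is illustrative, not the full hierarchy:

```python
# child -> parent for a toy fragment: Human and Animal under Animate,
# Animate and Inanimate under Concrete; Abstract (T) is a separate root.
PARENT = {"H": "Q", "A": "Q", "Q": "C", "I": "C", "C": None, "T": None}

def path_to_root(tag):
    path = []
    while tag is not None:
        path.append(tag)
        tag = PARENT.get(tag)
    return path

def edge_distance(a, b):
    """Number of edges between two categories in the tree."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    if not common:
        return len(pa) + len(pb)  # no shared ancestor in this fragment
    lca = min(common, key=lambda t: pa.index(t))
    return pa.index(lca) + pb.index(lca)
```

Exact match gives distance 0, sibling categories (e.g. Human vs. Animal) distance 2, and credit can be made to decay with distance.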

What’s Next
● Investigate the respective contributions of (independent) features
● Incorporate syntactic information
● Refine some coarse categories:
  using subject codes
  using genus terms
  re-mapping via WordNet

What’s Next (cont’d)
● Reduce the number of features/values via external resources:
  lexical vs. semantic models of the context
  use selectional preferences
● Concentrate on complex cases (e.g. unseen words)
● Prepare test data for extrinsic evaluation (MT)