Slide 1: Semantic Annotation – Week 3
JHU Workshop 2003, July 30th, 2003
Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Boncheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev, Fred Jelinek
Slide 2: Our Hypotheses
● A transformation of a corpus to replace words and phrases with coarse semantic categories will help overcome the data sparseness problem encountered in language modeling
● Semantic category information will also help improve machine translation
● An initially noun-centric approach will allow bootstrapping to other syntactic categories
Slide 3: An Example
● Original: "Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday"
● Transformed: "Humans aboard space_vehicle dodge satellite timeref"
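As a rough illustration of the kind of substitution the example shows, the sketch below replaces words and phrases with coarse categories via a hypothetical lookup table; the `coarse_tags` dictionary and its category names are assumptions made for illustration, not the project's actual resources (which derive categories from LDOCE and GATE-based NP/NE recognition).

```python
# A minimal sketch of the corpus transformation, assuming a hypothetical
# word/phrase-to-category lookup table (coarse_tags); the real system derives
# categories from LDOCE and GATE-based NP/NE recognition, not a hand-made dict.
coarse_tags = {
    "astronauts": "Humans",
    "space shuttle endeavor": "space_vehicle",
    "friday": "timeref",
}

def coarsen(sentence: str) -> str:
    """Replace known words/phrases with their coarse semantic category."""
    text = sentence.lower()
    # Substitute longer phrases first so multi-word entries win over single words.
    for phrase, tag in sorted(coarse_tags.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(phrase, tag)
    return text

print(coarsen("Astronauts aboard the space shuttle Endeavor "
              "were forced to dodge a derelict Air Force satellite Friday"))
# -> "Humans aboard the space_vehicle were forced to dodge a derelict air force satellite timeref"
```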
Slide 4: Our Progress – Preparing the Data (Pre-Workshop)
● Identify a tag set
● Create a human-annotated corpus
● Create a doubly annotated corpus
● Process all data for named entity and noun phrase recognition using the GATE tools
● Develop algorithms for mapping target categories to WordNet synsets to support the tag set assessment
Slide 5: The Semantic Classes for Annotators
● A subset of the classes available in the electronic version of Longman's Dictionary of Contemporary English (LDOCE)
● Rationale:
  - The number of semantic classes is small
  - The classes are somewhat reliable, since a team of lexicographers used them to code noun senses, adjective preferences, and verb preferences
Slide 6: Semantic Classes (hierarchy diagram of target classes with annotated-evidence counts)
● Recoverable classes and codes: Abstract (T), Concrete (C), Animate (Q), Inanimate (I), Human (H), Animal (A), Plant (P), Movable (N), Non-movable (J), Solid (S), Liquid (L), Gas (G), PhysQuant (4), Organic (5); codes B, D, F, and M also appear in the diagram
● The diagram appears to place Human, Animal, and Plant under Animate, and Solid, Liquid, and Gas under Inanimate, with Animate and Inanimate under Concrete
Slide 7: More Categories
● U: Collective
● K: Male
● R: Female
● W: Not animate
● X: Not concrete or animal
● Z: Unmarked
● We allowed annotators to choose "none of the above" ("?" in the slides that follow)
Slide 8: Our Progress – Data Preparation
● Assess the annotation format, define uniform descriptions for irregular phenomena, and normalize them
● Determine the distribution of the tag set in the training corpus
● Analyze inter-annotator agreement
● Determine a reliable set of tags, T
● Parse all training data
Slide 9: Doubly Annotated Data
● Instances (headwords): 10,960
● 8,950 instances carry no question marks
● 8,446 of those are marked the same by both annotators
● Inter-annotator agreement is 94% (83% when question-marked instances are included)
● Recall: these are noun phrases that are not named entities
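The headline agreement figure follows directly from the counts on this slide; a quick arithmetic check:

```python
# Quick check of the agreement figures quoted on the slide.
total_instances = 10_960        # doubly annotated headwords
no_question_marks = 8_950       # instances neither annotator marked with "?"
marked_the_same = 8_446         # identical labels among those

print(f"{marked_the_same / no_question_marks:.1%}")   # ~94.4%, the quoted 94%
# The 83% figure uses all 10,960 instances as the denominator and also counts
# agreements involving "?"; that numerator is not given on the slide.
```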
Slide 10: Distribution of the Doubly Annotated Data (chart)
Slide 11: Agreement of doubly marked instances (chart)
Slide 12: Inter-annotator agreement for each category (chart)
Slide 13: Category distribution among the agreed instances (chart; one category accounts for 69%)
Slide 14: A Few Statistics on the Human-Annotated Data
● Total annotated: 262,230 instances, of which 48,175 carry a "?"
● 214,055 instances received a category; of those, Z accounts for 0.5%, W and X for 0.5%, and categories 4 and 5 for 1.6%
Slide 15: Our Progress – Baselines
● Determine baselines for automatic tagging of noun phrases
● Baselines for tagging observed words in new contexts (new instances of known words)
● Baselines for tagging unobserved words:
  - Unseen words: not in the training material, but in the dictionary
  - Novel words: in neither the training material nor the dictionary/WordNet
Slide 16: Overlap of the Dictionary and Head Nouns (in the BNC)
● 85% of NPs are covered
● Only 33% of the vocabulary (words both in LDOCE and in WordNet) appears in the covered NPs
Slide 17: Preparation of the Test Environment
● Selected the blind portion of the human-annotated data for late evaluation
● Divided the remaining corpus into training and held-out portions:
  - Random division of files
  - Unambiguous words for training, ambiguous words for testing
Slide 18: Baselines Using Only (Target) Words

| Error rate | Unseen words marked with | Method     | Valid training instances | Blame |
|------------|--------------------------|------------|--------------------------|-------|
| 15.1%      | the first class          | MaxEntropy | count 3                  | Klaus |
| 12.6%      | most frequent class      | MaxEntropy | count 3                  | Jerry |
| 16%        | most frequent class      | VFI        | all                      | Fabio |
| 13%        | most frequent class      | NaiveBayes | all                      | Fabio |
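Several of the baselines in the table amount to predicting, for each head noun, the class it received most often in training, and marking unseen words with the most frequent class overall. A minimal sketch of that word-only baseline, assuming simple (word, class) training pairs; the MaxEnt and VFI variants are not reproduced here:

```python
from collections import Counter, defaultdict

def train_most_frequent(tagged_instances):
    """tagged_instances: iterable of (headword, semantic_class) pairs."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, cls in tagged_instances:
        per_word[word][cls] += 1
        overall[cls] += 1
    model = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    fallback = overall.most_common(1)[0][0]   # most frequent class overall
    return model, fallback

def tag(word, model, fallback):
    # Unseen words are marked with the most frequent class, as in the table.
    return model.get(word, fallback)

# Illustrative usage with made-up training pairs:
model, fallback = train_most_frequent(
    [("satellite", "I"), ("satellite", "I"), ("astronaut", "H")]
)
print(tag("satellite", model, fallback), tag("rocket", model, fallback))  # I I
```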
Slide 19: Baselines Using Only (Target) Words and Preceding Adjectives

| Error rate | Unseen words marked with | Method     | Valid training instances | Blame |
|------------|--------------------------|------------|--------------------------|-------|
| 13%        | most frequent class      | MaxEntropy | count 3                  | Jerry |
| 13.2%      | most frequent class      | MaxEntropy | all                      | Jerry |
| 12.7%      | most frequent class      | MaxEntropy | count 3                  | Jerry |
Slide 20: Baselines Using Multiple Knowledge Sources
● Experiments in Sheffield
● Unambiguous tagger (assigns the only available semantic category)
● Bag-of-words tagger (IR-inspired; 50-word window of nouns and verbs; sketched below)
● Frequency-based tagger (assigns the most frequent semantic category)
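The slide describes the bag-of-words tagger only at a high level (IR-inspired, a 50-word window restricted to nouns and verbs). The sketch below is one plausible reading, scoring each candidate category by how often the context words co-occurred with it in training; the scoring scheme and data structures are assumptions, not the Sheffield implementation:

```python
from collections import Counter, defaultdict

WINDOW = 50  # context window size in words, as stated on the slide

class BagOfWordsTagger:
    """IR-inspired tagger: score each semantic category by the overlap between
    the target's context (nouns and verbs within WINDOW words) and a
    per-category word profile collected from the training corpus."""

    def __init__(self):
        self.profiles = defaultdict(Counter)   # category -> context word counts

    def train(self, instances):
        # instances: iterable of (context_words, category) pairs, where
        # context_words are the nouns and verbs in the WINDOW-word window.
        for context_words, category in instances:
            self.profiles[category].update(context_words)

    def tag(self, context_words):
        # Pick the category whose profile best matches the observed context.
        def score(category):
            profile = self.profiles[category]
            return sum(profile[w] for w in context_words)
        return max(self.profiles, key=score)
```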
Slide 21: Baselines Using Multiple Knowledge Sources (cont'd)
● Frequency-based tagger: 16-18% error rate
● Bag-of-words tagger: 17% error rate
● Combined architecture: 14.5-15% error rate (one possible combination is sketched below)
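The slide does not say how the three taggers were combined; one plausible reading is a back-off chain in which the unambiguous tagger answers when it can, the bag-of-words tagger handles ambiguous cases with enough context, and the frequency-based tagger is the last resort. A sketch under that assumption, reusing the `BagOfWordsTagger` from the previous sketch:

```python
def combined_tag(word, context_words, unambiguous_lexicon, bow_tagger, most_frequent_class):
    """Back-off combination of the three taggers (an assumed architecture,
    not a description of the actual Sheffield system)."""
    # 1. Words with a single possible semantic category need no disambiguation.
    if word in unambiguous_lexicon:
        return unambiguous_lexicon[word]
    # 2. Otherwise let the bag-of-words tagger decide from the context window.
    if context_words:
        return bow_tagger.tag(context_words)
    # 3. Fall back to the most frequent semantic category.
    return most_frequent_class
```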
Slide 22: Bootstrapping to Unseen Words
● Problem: automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data
● Basic idea: use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words
Slide 23: Bootstrapping Statistics
● 6,656 different unambiguous lemmas occur in the (visible) human-tagged corpus
● These contribute 166,249 instances of data
● 134,777 of those instances were considered correct by the annotators
● Observation: unambiguous words can be used in the corpus in an "unforeseen" way
Slide 24: Bootstrapping Baselines

| Method                                                                                   | % correctly labelled instances                          |
|------------------------------------------------------------------------------------------|---------------------------------------------------------|
| Assigning the most frequent semantic tag (i.e. Abstract)                                   | 52%                                                     |
| One previous word (adjective, noun, or verb), Naive Bayes classifier                       | 45% (reliably tagged instances), 44.3% (all instances)  |
| One previous and one following word (adjective, noun, or verb), Naive Bayes classifier     | 46.8% (reliably tagged instances), 44.5% (all instances) |

● Test instances (instances of ambiguous words): 62,853
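The context baselines in the table feed the adjective, noun, or verb immediately before (and optionally after) the target noun into a Naive Bayes classifier trained on the unambiguous instances. A minimal sketch of that setup; the feature encoding and the add-one smoothing are assumptions:

```python
import math
from collections import Counter, defaultdict

class ContextNaiveBayes:
    """Naive Bayes over simple context features, e.g. the adjective, noun, or
    verb immediately before and after the target, trained on the unambiguous
    instances and applied to the ambiguous ones."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                            # add-one smoothing (assumed)
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)    # class -> feature counts
        self.vocabulary = set()

    def train(self, instances):
        # instances: iterable of (features, semantic_class) pairs, e.g.
        # ([("prev", "derelict"), ("next", "friday")], "I")
        for features, cls in instances:
            self.class_counts[cls] += 1
            for f in features:
                self.feature_counts[cls][f] += 1
                self.vocabulary.add(f)

    def classify(self, features):
        total = sum(self.class_counts.values())
        def log_prob(cls):
            lp = math.log(self.class_counts[cls] / total)
            denom = sum(self.feature_counts[cls].values()) + self.alpha * len(self.vocabulary)
            for f in features:
                lp += math.log((self.feature_counts[cls][f] + self.alpha) / denom)
            return lp
        return max(self.class_counts, key=log_prob)
```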
Slide 25: Metrics for Intrinsic Evaluation
● Need to take into account the hierarchical structure of the target semantic categories
● Two fuzzy measures, based on:
  - dominance between categories
  - edge distance in the category tree/graph (sketched below)
● Results with respect to inter-annotator agreement are almost identical to exact match
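The edge-distance measure can be pictured as giving partial credit that decays with the number of tree edges between the predicted and the gold category. The sketch below uses a small fragment of the hierarchy as reconstructed from slide 6 and an illustrative 1/(1+distance) scoring function; both the fragment and the scoring are assumptions, not the workshop's exact measure:

```python
# Fragment of the category hierarchy as reconstructed from slide 6; the full
# LDOCE-derived tree used at the workshop is larger.
PARENT = {
    "Concrete": None,
    "Animate": "Concrete", "Inanimate": "Concrete",
    "Human": "Animate", "Animal": "Animate", "Plant": "Animate",
    "Solid": "Inanimate", "Liquid": "Inanimate", "Gas": "Inanimate",
}

def path_to_root(category):
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path

def edge_distance(a, b):
    """Number of tree edges between two categories."""
    pa, pb = path_to_root(a), path_to_root(b)
    lca = min(set(pa) & set(pb), key=pa.index)   # lowest common ancestor
    return pa.index(lca) + pb.index(lca)

def fuzzy_credit(predicted, gold):
    """Partial credit that decays with tree distance (the scoring is an assumption)."""
    return 1.0 / (1.0 + edge_distance(predicted, gold))

print(edge_distance("Human", "Animal"))   # 2
print(fuzzy_credit("Human", "Liquid"))    # 1 / (1 + 4) = 0.2
```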
Slide 26: What's Next
● Investigate the respective contributions of (independent) features
● Incorporate syntactic information
● Refine some coarse categories:
  - using subject codes
  - using genus terms
  - re-mapping via WordNet
Slide 27: What's Next (cont'd)
● Reduce the number of features/values via external resources:
  - lexical vs. semantic models of the context
  - use selectional preferences
● Concentrate on complex cases (e.g. unseen words)
● Prepare test data for extrinsic evaluation (MT)