Published by Sheena Hawkins. Modified over 9 years ago.
1
Lecture 13 Information Extraction
Topics
  Named Entity Recognition
  Relation detection
  Temporal and Event Processing
  Template Filling
Readings: Chapter 22
February 27, 2013
CSCE 771 Natural Language Processing
2
– 2 – CSCE 771 Spring 2013 Slide from Speech and Language Processing -- Jurafsky and Martin
Overview
Last Time
  Dialogues
  Human conversations
Today
  Slides from Lecture 24
  Dialogue systems
  Dialogue Manager Design
    Finite State, Frame-based, Initiative: User, System, Mixed
  VoiceXML
  Information Extraction
Readings
  Chapter 24, Chapter 22
3
Information extraction
Information extraction turns unstructured information buried in texts into structured data
  Extract proper nouns: “named entity recognition”
  Reference resolution
    Named entity mentions
    Pronoun references
  Relation detection and classification
  Event detection and classification
  Temporal analysis
  Template filling
4
Template Filling
Example template for “airfare raise”
5
Figure 22.1 List of Named Entity Types
6
Figure 22.2 Examples of Named Entity Types
7
Figure 22.3 Categorical Ambiguities
8
Figure 22.4 Categorical Ambiguity
9
Figure 22.5 Chunk Parser for Named Entities
10
Figure 22.6 Features used in Training NER
Gazetteers: lists of place names
  www.geonames.com
  www.census.gov
11
Figure 22.7 Selected Shape Features
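Since the figure itself is not reproduced in this transcript, here is a minimal sketch of how a word-shape feature is commonly computed: uppercase letters map to X, lowercase letters to x, digits to d, and other characters are kept. The function name `word_shape` and the `collapse` option are illustrative assumptions, not taken from the slides.

```python
import re


def word_shape(token: str, collapse: bool = False) -> str:
    """Map a token to a shape string (X = upper, x = lower, d = digit)."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation and other symbols are kept as-is
    result = "".join(shape)
    if collapse:
        # Short shape: squeeze each run of identical symbols into one symbol.
        result = re.sub(r"(.)\1+", r"\1", result)
    return result


if __name__ == "__main__":
    for tok in ["DC10-30", "Airlines", "I.M.F."]:
        print(tok, word_shape(tok), word_shape(tok, collapse=True))
```

The collapsed form trades detail for generality: tokens of different lengths, such as "Airlines" and "United", share the same short shape.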
12
Figure 22.8 Feature encoding for NER
13
Figure 22.9 NER as sequence labeling
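Treating NER as sequence labeling usually means giving every token a B (begin), I (inside), or O (outside) tag. The sketch below converts entity spans to IOB tags and back; the helper names `to_iob` and `from_iob` and the span format (start, end-exclusive, type) are assumptions made for illustration.

```python
def to_iob(tokens, spans):
    """Encode entity spans (start, end_exclusive, type) as per-token IOB tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def from_iob(tags):
    """Decode IOB tags back into (start, end_exclusive, type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = ["American", "Airlines", ",", "a", "unit", "of", "AMR", "Corp.", ","]
    spans = [(0, 2, "ORG"), (6, 8, "ORG")]
    tags = to_iob(tokens, spans)
    print(list(zip(tokens, tags)))
    print(from_iob(tags))
```

A sequence classifier then only has to predict one tag per token, and the decoder recovers the entity spans from its output.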
14
Figure 22.10 Statistical Seq. Labeling
15
Evaluation of Named Entity Rec. Sys.
Recall terms from information retrieval
  Recall = # correctly labeled / total # that should be labeled
  Precision = # correctly labeled / total # labeled
F-measure, where β weights preferences
  F_β = (β² + 1)PR / (β²P + R)
  β=1 balanced
  β>1 favors recall
  β<1 favors precision
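The definitions above can be checked with a few lines of code. This is a minimal sketch; the function name `precision_recall_f` and the counts in the usage example are made up for illustration.

```python
def precision_recall_f(num_correct, num_labeled, num_gold, beta=1.0):
    """Return (precision, recall, F_beta) from raw counts.

    num_correct: entities the system labeled correctly
    num_labeled: all entities the system labeled
    num_gold:    all entities that should have been labeled
    """
    precision = num_correct / num_labeled if num_labeled else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    denom = beta ** 2 * precision + recall
    f = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Hypothetical counts: 8 correct out of 10 labeled, 16 in the gold standard.
    for beta in (0.5, 1.0, 2.0):
        p, r, f = precision_recall_f(8, 10, 16, beta=beta)
        print(f"beta={beta}: P={p:.2f} R={r:.2f} F={f:.3f}")
```

With precision 0.8 and recall 0.5, F moves toward 0.8 as β drops below 1 and toward 0.5 as β rises above 1, which is what "favors precision" and "favors recall" mean in practice.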
16
NER performance revisited
Recall, Precision, F
High-performance systems
  F ~.92 for PERSONS and LOCATIONS and ~.84 for ORG
Practical NER
Make several passes on the text
1. Start by using the highest-precision rules (maybe at the expense of recall); make sure what you get is right
2. Search for substring matches of previously detected names using probabilistic string-matching metrics (Chap 19)
3. Name lists focused on the domain
4. Probabilistic sequence labeling techniques using previous tags
17
Relation Detection and classification
Consider sample text:
Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PERSON Tim Wagner] said. [ORG United Airlines], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco].
After identifying named entities, what else can we extract?
  Relations
18
Fig 22.11 Example semantic relations
19
Figure 22.12 Example Extraction
20
Figure 22.13 Supervised Learning Approaches to Relation Analysis
Algorithm: two-step process
1. Identify whether a pair of named entities is related
2. A classifier is trained to label the relations
21
Factors used in Classifying
Features of the named entities
  Named entity types of the two arguments
  Concatenation of the two entity types
  Headwords of the arguments
  Bag-of-words from each of the arguments
Words in text
  Bag-of-words and bag-of-bigrams
  Stemmed versions
  Distance between named entities (words / named entities)
Syntactic structure
  Parse-related structures
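A rough sketch of how the entity-based and word-based features listed above might be collected for one candidate pair. The function name `relation_features`, the dictionary layout of the arguments, and the use of the last mention token as a stand-in for the headword are all simplifying assumptions; syntactic features are omitted because they need a parser.

```python
def relation_features(tokens, arg1, arg2):
    """Collect simple features for a candidate entity pair.

    arg1 and arg2 are dicts with 'start', 'end' (exclusive) and 'type';
    arg1 is assumed to precede arg2 in the sentence.
    """
    between = tokens[arg1["end"]:arg2["start"]]
    return {
        "type1": arg1["type"],
        "type2": arg2["type"],
        "type_concat": arg1["type"] + arg2["type"],
        # Assumption: the last token of a mention approximates its headword.
        "head1": tokens[arg1["end"] - 1],
        "head2": tokens[arg2["end"] - 1],
        "bow1": sorted(set(tokens[arg1["start"]:arg1["end"]])),
        "bow2": sorted(set(tokens[arg2["start"]:arg2["end"]])),
        "words_between": sorted(set(between)),
        "bigrams_between": list(zip(between, between[1:])),
        "word_distance": len(between),
    }


if __name__ == "__main__":
    tokens = ("American Airlines , a unit of AMR Corp. , immediately matched "
              "the move , spokesman Tim Wagner said .").split()
    org = {"start": 0, "end": 2, "type": "ORG"}
    person = {"start": 15, "end": 17, "type": "PERSON"}
    for name, value in relation_features(tokens, org, person).items():
        print(name, value)
```

In the two-step setup, a feature dictionary like this would feed first a related/unrelated classifier and then a relation-type classifier.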
22
Figure 22.14 a-part-of relation
23
Figure 22.15 Sample features Extracted
24
Bootstrapping Example “Has a hub at”
Consider the pattern / * has a hub at * /
Google search results
  (22.4) Milwaukee-based Midwest has a hub at KCI
  (22.5) Delta has a hub at LaGuardia
  …
Two ways to fail
1. False positive: e.g. a star topology has a hub at its center
2. False negative: a relevant sentence is simply missed, e.g. (22.11) No frill rival easyJet, which has established a hub at Liverpool
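Both failure modes are easy to reproduce with a literal regular expression. This is only a sketch of the untyped pattern; the names `HUB_PATTERN` and `match_hub` are illustrative.

```python
import re

# Untyped surface pattern corresponding to / * has a hub at * /.
HUB_PATTERN = re.compile(r"(?P<left>.+?) has a hub at (?P<right>.+)")


def match_hub(sentence):
    """Return the (left, right) pair captured by the pattern, or None."""
    m = HUB_PATTERN.search(sentence)
    return (m.group("left"), m.group("right")) if m else None


if __name__ == "__main__":
    sentences = [
        "Milwaukee-based Midwest has a hub at KCI",
        "Delta has a hub at LaGuardia",
        "a star topology has a hub at its center",  # false positive
        "No frill rival easyJet, which has established a hub at Liverpool",  # missed
    ]
    for s in sentences:
        print(match_hub(s), "<-", s)
```

The star-topology sentence matches even though it names no airline, and the easyJet sentence is missed because its wording differs from the literal pattern.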
25
Figure 22.16 Bootstrapping Relation Extraction
26
Using Features to restrict patterns
(22.13) Budget airline Ryanair, which uses Charleroi as a hub, scrapped all weekend flights
/ [ORG], which uses [LOC] as a hub /
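One way to picture a type-restricted pattern is to run it over text that already carries named entity brackets, in the same [TYPE text] notation used in the sample text earlier in the lecture. The sketch below fits such a pattern to example 22.13; the regular expression and the name `match_typed_hub` are assumptions for illustration, and the entity tags in the example are added by hand.

```python
import re

# Pattern over entity-tagged text: the airline slot must be an ORG
# and the hub slot must be a LOC.
TYPED_HUB = re.compile(
    r"\[ORG (?P<org>[^\]]+)\], which uses \[LOC (?P<loc>[^\]]+)\] as a hub"
)


def match_typed_hub(tagged_sentence):
    """Return (org, loc) if the typed pattern matches, else None."""
    m = TYPED_HUB.search(tagged_sentence)
    return (m.group("org"), m.group("loc")) if m else None


if __name__ == "__main__":
    tagged = ("Budget airline [ORG Ryanair], which uses [LOC Charleroi] "
              "as a hub, scrapped all weekend flights")
    print(match_typed_hub(tagged))
    # No ORG/LOC entities here, so the typed pattern does not fire.
    print(match_typed_hub("a star topology, which uses its center as a hub"))
```

Requiring entity types on both slots filters out matches like the star-topology sentence, at the cost of depending on the accuracy of the NER step.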
27
Semantic Drift
Note that it will be difficult (impossible) to get annotated materials for training
Accuracy of the process is heavily dependent on the initial seeds
Semantic drift: occurs when erroneous patterns (seeds) lead to the introduction of erroneous tuples
28
Fig 22.17 Temporal and Durational Expressions
  Absolute temporal expressions
  Relative temporal expressions
29
Fig 22.18 Temporal lexical triggers
30
Fig 22.19 MITRE’s TempEx tagger (Perl)
31
Fig 22.20 Features used to train IOB
32
Figure 22.21 TimeML temporal markup
33
Temporal Normalization
ISO 8601: standard for encoding temporal values
YYYY-MM-DD
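A small sketch of what normalization to ISO 8601 can look like for a handful of relative expressions, resolved against a document date. The function name `normalize` and the tiny expression table are assumptions; a real normalizer would cover far more cases.

```python
from datetime import date, timedelta


def normalize(expression, anchor):
    """Resolve a few relative expressions against the document date `anchor`.

    Returns an ISO 8601 string, or None for expressions this sketch
    does not handle.
    """
    offsets = {"today": 0, "yesterday": -1, "tomorrow": 1}
    key = expression.lower()
    if key in offsets:
        return (anchor + timedelta(days=offsets[key])).isoformat()
    if key == "this week":
        year, week, _weekday = anchor.isocalendar()
        return f"{year}-W{week:02d}"  # ISO 8601 week notation
    return None


if __name__ == "__main__":
    doc_date = date(2013, 2, 27)  # the lecture date, used as the anchor
    for expr in ["today", "yesterday", "tomorrow", "this week", "last fall"]:
        print(expr, "->", normalize(expr, doc_date))
```

The anchor matters: the same word "yesterday" normalizes to a different value for every document date.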
34
Figure 22.22 Sample ISO Patterns
35
Event Detection and Analysis
Event detection and classification
36
Fig 22.23 Features for Event Detection
Features used in rule-based and statistical techniques
37
Fig 22.24 Allen’s 13 temporal Relations
38
Figure 22.24 continued
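The figure is not reproduced here, but the 13 relations can be enumerated by comparing interval endpoints. The sketch below assumes intervals given as (start, end) pairs with start < end; the relation names are the commonly used ones and may differ slightly from the labels in the figure.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only partial overlap without shared endpoints remains.
    return "overlaps" if a1 < b1 else "overlapped-by"


if __name__ == "__main__":
    pairs = [((1, 3), (4, 6)), ((1, 4), (4, 6)), ((1, 5), (3, 8)),
             ((1, 3), (1, 6)), ((2, 4), (1, 6)), ((4, 6), (1, 6)),
             ((1, 6), (1, 6))]
    for a, b in pairs:
        print(a, allen_relation(a, b), b)
```

Seven of the names describe a relation from a to b; six of them have an inverse obtained by swapping the arguments, and "equals" is its own inverse, which gives 13 in total.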
39
Figure 22.25 Example from TimeBank Corpus
40
Template Filling
41
Figure 22.26 Templates produced by FASTUS 1997
42
Figure 22.27 Levels of processing in FASTUS
43
Figure 22.28 FASTUS Stage 2
44
Figure 22.29 The 5 Partial Templates of FASTUS
45
Figure 22.30 Articles in PubMed
46
Figure 22.31 Biomedical classes of named entities