CSCE 590 Web Scraping – Information Retrieval
Topics: Information Retrieval, Named Entities. Readings: Speech and Language Processing (SLP) by Jurafsky and Martin, Chapter 22. March 2, 2017
Information Retrieval
Information retrieval and knowledge extraction from natural language. Speech and Language Processing, Chapter 22 (available online)
Information extraction
Information extraction turns unstructured information buried in texts into structured data:
- Named entity recognition – extracting proper nouns
- Reference resolution – resolving named entity mentions and pronoun references
- Relation detection and classification
- Event detection and classification
- Temporal analysis
- Template filling
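As a rough illustration of the first step, a toy recognizer can propose runs of capitalized tokens as candidate named entities. Real NER systems use trained sequence labelers (as the later slides show); the regex and sentence here are illustrative only:

```python
import re

# Toy candidate generator: runs of capitalized words are proposed as
# candidate named entities. This is a crude stand-in for real NER.
def find_candidate_entities(text):
    pattern = r"\b(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*\b"
    return re.findall(pattern, text)

sentence = "United Airlines said Friday it increased fares, Tim Wagner said."
print(find_candidate_entities(sentence))  # ['United Airlines', 'Friday', 'Tim Wagner']
```

Note that "Friday" is proposed even though it is a time expression, not an entity; this is exactly the categorical ambiguity that trained NER models must resolve.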
Template Filling – example template for an “airfare raise” event
Figure 22.1 List of Named Entity Types
Figure 22.2 Examples of Named Entity Types
Figure 22.3
Figure 22.4 Categorical Ambiguity
Figure 22.5 Chunk Parser for Named Entities
Figure 22.6 Features used in Training NER
Gazetteers – lists of place names
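A gazetteer feature reduces to a set lookup. The tiny place list below is a stand-in for a real gazetteer, which holds millions of geographic names:

```python
# Minimal gazetteer lookup feature for NER; the set below is a toy
# stand-in for a real gazetteer of place names.
GAZETTEER = {"chicago", "dallas", "denver", "san francisco"}

def in_gazetteer(phrase):
    # Case-insensitive membership test, used as a binary NER feature.
    return phrase.lower() in GAZETTEER

print(in_gazetteer("Denver"))  # True
print(in_gazetteer("fares"))   # False
```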
Figure 22.7 Selected Shape Features
Figure 22.8 Feature encoding for NER
Figure 22.9 NER as sequence labeling
Figure 22.10 Statistical Sequence Labeling
Evaluation of Named Entity Recognition Systems
Recall these terms from information retrieval:
Recall = # correctly labeled / total # that should be labeled
Precision = # correctly labeled / total # labeled
F-measure: F_β = (1 + β²)PR / (β²P + R), where β weights the preference: β = 1 is balanced, β > 1 favors recall, β < 1 favors precision
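The F-measure, F_β = (1 + β²)PR / (β²P + R), can be computed directly; a minimal sketch (β = 1 gives the balanced F1 used in most NER evaluations):

```python
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
# beta = 1 balances precision and recall (the standard F1).
def f_measure(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_measure(0.8, 0.6), 3))  # 0.686 (balanced F1)
```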
Relation Detection and Classification
Consider the sample text: Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PERSON Tim Wagner] said. [ORG United Airlines], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco]. After identifying the named entities, what else can we extract? Relations.
Figure 22.11 Example relations
Figure 22.12 Example Extraction
Figure 22.13 Supervised Learning Approaches to Relation Analysis
The algorithm is a two-step process:
1. Identify whether a pair of named entities is related
2. A classifier trained on labeled examples assigns the relation type
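A sketch of that two-step pipeline, with `has_relation` and `classify_relation` as hypothetical stand-ins for the two trained classifiers:

```python
from itertools import combinations

# Two-step relation pipeline. has_relation and classify_relation are
# hypothetical stand-ins for trained classifiers.
def extract_relations(entities, has_relation, classify_relation):
    relations = []
    for e1, e2 in combinations(entities, 2):
        # Step 1: decide whether this entity pair is related at all.
        if has_relation(e1, e2):
            # Step 2: a second classifier labels the relation type.
            relations.append((e1, classify_relation(e1, e2), e2))
    return relations

ents = [("United Airlines", "ORG"), ("UAL Corp.", "ORG"), ("Friday", "TIME")]
rels = extract_relations(
    ents,
    has_relation=lambda a, b: a[1] == "ORG" and b[1] == "ORG",
    classify_relation=lambda a, b: "PART-OF",
)
print(rels)  # [(('United Airlines', 'ORG'), 'PART-OF', ('UAL Corp.', 'ORG'))]
```

Splitting detection from classification keeps the (cheap, high-recall) pair filter separate from the (expensive, feature-rich) relation labeler.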
Factors used in Classifying
Features of the named entities:
- Named entity types of the two arguments
- Concatenation of the two entity types
- Headwords of the arguments
- Bag-of-words from each of the arguments
Words in the text:
- Bag-of-words and bag-of-bigrams
- Stemmed versions
- Distance between the named entities (in words / intervening named entities)
Syntactic structure:
- Parse-related structures
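A sketch of extracting some of these features for one candidate pair; the `(start, end, type)` span encoding over a token list is an assumption of this example:

```python
# Feature extraction for one candidate entity pair. Spans are
# (start, end, type) over a token list -- an encoding assumed here.
def pair_features(tokens, e1, e2):
    between = tokens[e1[1]:e2[0]]            # words between the entities
    return {
        "type1": e1[2],                      # entity type of argument 1
        "type2": e2[2],                      # entity type of argument 2
        "type_concat": e1[2] + "-" + e2[2],  # concatenated types
        "head1": tokens[e1[1] - 1],          # headword ~ last token of span 1
        "head2": tokens[e2[1] - 1],          # headword ~ last token of span 2
        "bow_between": sorted(set(between)), # bag-of-words in between
        "distance": len(between),            # distance in words
    }

toks = "American Airlines a unit of AMR Corp.".split()
feats = pair_features(toks, (0, 2, "ORG"), (5, 7, "ORG"))
print(feats["type_concat"], feats["distance"])  # ORG-ORG 3
```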
Figure 22.14
Figure 22.15 Sample features Extracted
Figure 22.16 Bootstrapping Relation Extraction
Note: it can be difficult (or impossible) to obtain annotated material for training, and the accuracy of the bootstrapping process is heavily dependent on the initial seeds. Semantic drift occurs when erroneous patterns (or seeds) lead to the introduction of erroneous tuples.
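A toy bootstrapping loop in the spirit of Figure 22.16: seed tuples induce string patterns, and the patterns harvest new tuples from the corpus. The two-sentence corpus and comma-based matching below are illustrative only; real systems generalize patterns and score their confidence, which is exactly where semantic drift can creep in:

```python
import re

# Toy corpus; in practice this is a large text collection.
corpus = [
    "United Airlines, a unit of UAL Corp., raised fares.",
    "American Airlines, a unit of AMR Corp., matched the move.",
]

def induce_patterns(seed_pairs, sentences):
    # For each sentence containing both seed arguments, take the text
    # between them as a pattern (real systems also keep surrounding
    # context and assign each pattern a confidence score).
    patterns = set()
    for x, y in seed_pairs:
        for sent in sentences:
            if x in sent and y in sent:
                start = sent.index(x) + len(x)
                patterns.add(sent[start:sent.index(y)])
    return patterns

def extract_tuples(patterns, sentences):
    # Match each pattern against the corpus to harvest new tuples.
    # For this toy corpus, the second argument ends at the next comma.
    found = set()
    for pat in patterns:
        regex = "(.+?)" + re.escape(pat) + "(.+?),"
        for sent in sentences:
            m = re.search(regex, sent)
            if m:
                found.add((m.group(1), m.group(2)))
    return found

seeds = {("United Airlines", "UAL Corp.")}
patterns = induce_patterns(seeds, corpus)   # {", a unit of "}
tuples = extract_tuples(patterns, corpus)
print(tuples - seeds)  # {('American Airlines', 'AMR Corp.')}
```

If a noisy seed induced an over-general pattern (say, ", a "), the next iteration would harvest spurious tuples, which in turn induce worse patterns: that runaway feedback is semantic drift.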
Figure 22.17 Temporal and Event
Absolute temporal expressions (e.g., “March 2, 2017”) and relative temporal expressions (e.g., “Friday”, “Thursday”)
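Relative expressions must be anchored to a reference time; a minimal sketch resolving a weekday name ("Friday") to the most recent such day on or before a document's date (the anchoring rule chosen here, "on or before", is one simple convention among several):

```python
from datetime import date, timedelta

# Resolve a relative weekday mention against an anchor (document) date.
# Absolute expressions like "March 2, 2017" need no anchor.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def most_recent_weekday(anchor, name):
    # Most recent occurrence of the named weekday on or before anchor.
    target = WEEKDAYS.index(name.lower())
    delta = (anchor.weekday() - target) % 7
    return anchor - timedelta(days=delta)

doc_date = date(2017, 3, 2)  # the lecture date, a Thursday
print(most_recent_weekday(doc_date, "Friday"))  # 2017-02-24
```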
Figure 22.18
Figure 22.19
Figure 22.20
Figure 22.21
Figure 22.22
Figure 22.23
Figure 22.24
Figure continued
Figure 22.25
Figure 22.26
Figure 22.27
Figure 22.28
Figure 22.29
Figure 22.30
Figure 22.31