Published by Eustacia Regina Sparks. Modified over 8 years ago.
1
Natural Language Processing: Information Extraction
Jim Martin (slightly modified by Jason Baldridge)
2
9/26/2016 Speech and Language Processing - Jurafsky and Martin 2
Information Extraction
It turns out that the partial parsing and chunking methods we used for syntax also give us the means to address shallow semantic problems:
- Figure out the entities (the players, props, instruments, locations, etc.) in a text.
- Figure out how they’re related.
- Figure out what kinds of events/activities they’re all up to.
- And do each of these tasks in a loosely coupled, data-driven manner.
3
Information Extraction
Ordinary newswire text is often used in typical examples, and there’s an argument that there are useful applications there.
But the real interest (and money) is in specialized domains:
- Bioinformatics
- Electronic medical records
- Stock market analysis
- Intelligence analysis
- Social media
4
Example App
5
Information Extraction
CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York.
11
NER
Find and classify all the named entities in a text.
What’s a named entity? A reference to an entity via the mention of its name, e.g. Colorado Rockies.
Name mentions are only a subset of the possible mentions of an entity: Rockies, the team, it, they...
Find means identify the exact span of the mention.
Classify means determine the category of the entity being referred to.
12
NER Approaches
As with partial parsing and chunking, there are two basic approaches (and hybrids):
- Rule-based (regular expressions)
  - Lists of names
  - Patterns to match things that look like names
  - Patterns to match the environments that classes of names tend to occur in
- ML-based approaches
  - Get annotated training data
  - Extract features
  - Train systems to replicate the annotation
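To make the rule-based option concrete, here is a minimal Python sketch that combines the three kinds of rules on a sentence from the airline passage. The gazetteer entries, the regular expressions, and the function name `rule_based_ner` are hypothetical illustrations, not rules taken from any particular system.

```python
import re

# Hypothetical rules for illustration only; real systems use far larger resources.
# 1. A list of names (gazetteer) mapping each known name to its class.
GAZETTEER = {"Chicago": "LOC", "Dallas": "LOC", "Atlanta": "LOC", "Denver": "LOC"}

# 2. A pattern for things that look like names: a capitalized word plus "Airlines".
AIRLINE_NAME = re.compile(r"\b[A-Z][a-z]+ Airlines\b")

# 3. Patterns for environments that classes of names tend to occur in.
AFTER_SPOKESMAN = re.compile(r"\bspokesman ((?:[A-Z][a-z]+ )*[A-Z][a-z]+)")  # a person
AFTER_UNIT_OF = re.compile(r"\ba unit of ([A-Z]{2,})\b")  # a parent-company acronym


def rule_based_ner(text):
    """Return sorted (start, end, label) character spans proposed by the rules."""
    spans = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((m.start(), m.end(), label))
    for m in AIRLINE_NAME.finditer(text):
        spans.append((m.start(), m.end(), "ORG"))
    for m in AFTER_SPOKESMAN.finditer(text):
        spans.append((m.start(1), m.end(1), "PER"))  # group 1 is the name itself
    for m in AFTER_UNIT_OF.finditer(text):
        spans.append((m.start(1), m.end(1), "ORG"))
    return sorted(spans)


text = "American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said."
for start, end, label in rule_based_ner(text):
    print(label, text[start:end])
# ORG American Airlines
# ORG AMR
# PER Tim Wagner
```

Each rule family is easy to read and extend, which is the appeal of this approach; the cost is that every new name, or every new context, needs another hand-written rule.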
13
ML Approach
14
Encoding for Sequence Labeling
We can use the same IOB encoding here that we used for chunking:
- For N classes we have 2*N+1 tags: a B and an I tag for each class, and an O tag for outside any class.
- Each token in a text gets a tag.
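A minimal sketch of the encoding, assuming entities arrive as flat, non-overlapping token spans `(start, end, label)` with an exclusive end; the helper names `iob_tag_set` and `iob_encode` are hypothetical.

```python
def iob_tag_set(classes):
    """A B- and an I- tag for each class, plus one O tag: 2*N+1 tags in total."""
    return ["O"] + [f"{prefix}-{c}" for c in classes for prefix in ("B", "I")]


def iob_encode(n_tokens, spans):
    """Give every token a tag from (start, end, label) token spans; end is exclusive.

    Assumes spans do not overlap (flat, non-nested entities).
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return tags


tokens = "American Airlines , a unit of AMR , immediately matched the move , spokesman Tim Wagner said .".split()
spans = [(0, 2, "ORG"), (6, 7, "ORG"), (14, 16, "PER")]
print(len(iob_tag_set(["PER", "ORG", "LOC"])))  # 7
for tok, tag in zip(tokens, iob_encode(len(tokens), spans)):
    print(f"{tok}\t{tag}")
```

The B tag is what lets two adjacent entities of the same class stay separate; with only I and O tags they would merge into one span.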
15
NER Features
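The feature table for this slide did not survive extraction. As a stand-in, the sketch below computes a few token-level features commonly used for NER: the word itself, its shape, short prefixes and suffixes, capitalization, membership in a name list, and the neighboring words. The exact feature set, the feature names, and the `gazetteer` argument are assumptions for illustration.

```python
import re


def word_shape(word):
    """Collapse character classes: 'Wagner' -> 'Xx', 'AMR' -> 'X', '$6' -> '$d'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)  # squeeze runs of the same symbol


def token_features(tokens, i, gazetteer=frozenset()):
    """Features for token i; `gazetteer` is a hypothetical list of known names."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "shape": word_shape(word),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "in_gazetteer": word in gazetteer,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
    }


tokens = ["spokesman", "Tim", "Wagner", "said"]
print(token_features(tokens, 2, gazetteer={"Wagner"}))
```

Shape features let a model generalize to names it never saw in training, since an unseen surname still has the shape `Xx`.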
16
NER as Sequence Labeling
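Treating NER as sequence labeling means the tagger emits one IOB tag per token, so the entities have to be read back off the tag sequence. A minimal decoding sketch follows; `iob_decode` is a hypothetical helper, and the way it repairs an I- tag with no matching open entity (by starting a new entity) is one common convention among several.

```python
def iob_decode(tags):
    """Recover (start, end, label) token spans (end exclusive) from IOB tags.

    An I- tag that does not continue an open entity of the same class is
    treated as the start of a new entity.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing O closes a final entity
        prefix, _, cls = tag.partition("-")
        continues = prefix == "I" and cls == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, cls  # open a new entity
    return spans


tags = ["B-ORG", "I-ORG", "O", "O", "O", "O", "B-ORG", "O", "I-PER", "I-PER"]
print(iob_decode(tags))  # [(0, 2, 'ORG'), (6, 7, 'ORG'), (8, 10, 'PER')]
```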
17
NER Evaluation
Suppose you employ this scheme. What’s the best way to measure performance?
Probably not the per-tag accuracy we used for POS tagging. Why? It’s not measuring what we care about.
We need a metric that looks at the chunks, not the tags.
18
Example
Suppose we were looking for Location mentions. If the system simply said O all the time, it would do pretty well on a per-label basis, since most words lie outside any location.
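The point can be checked with a few lines of Python on the first sentence of the airline passage, assuming whitespace tokenization and treating CHICAGO as the only location mention.

```python
# First sentence of the passage, whitespace-tokenized (the dash after the dateline is dropped).
sentence = ("CHICAGO ( AP ) Citing high fuel prices , United Airlines said Friday it has "
            "increased fares by $6 per round trip on flights to some cities also served "
            "by lower-cost carriers .")
tokens = sentence.split()
gold_tags = ["B-LOC"] + ["O"] * (len(tokens) - 1)  # CHICAGO is the only location
gold_entities = {(0, 1, "LOC")}

# The trivial system: tag every token O, so it proposes no entities at all.
pred_tags = ["O"] * len(tokens)
pred_entities = set()

accuracy = sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(tokens)
recall = len(gold_entities & pred_entities) / len(gold_entities)
print(f"per-tag accuracy: {accuracy:.3f}")  # per-tag accuracy: 0.970
print(f"entity recall:    {recall:.3f}")    # entity recall:    0.000
```

Per-tag accuracy is 32 out of 33 tokens, yet the system found none of the locations, which is exactly what an entity-level metric exposes.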
19
Precision/Recall/F
Precision: the fraction of entities the system returned that were right. “Right” means the boundaries and the label are both correct, given some labeled test set.
Recall: the fraction of the entities the system should have found that it actually found.
F: the simple harmonic mean of those two numbers, F = 2PR/(P + R).
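A minimal sketch of entity-level scoring under the exact-match definition above, assuming entities are represented as `(start, end, label)` token spans, so that set intersection counts only mentions whose boundaries and label both match. The spans in the usage example are made up for illustration.

```python
def prf(gold, pred):
    """Entity-level precision, recall and F1 with exact span-and-label matching."""
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f


gold = {(0, 2, "ORG"), (6, 7, "ORG"), (14, 16, "PER")}
# One boundary error (15 instead of 14) and one spurious LOC.
pred = {(0, 2, "ORG"), (6, 7, "ORG"), (15, 16, "PER"), (10, 12, "LOC")}
p, r, f = prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Note that the near-miss `(15, 16, "PER")` earns no credit: it counts once as a false positive and once as a missed entity.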
20
Entity-level evaluation
We need to evaluate P/R/F at the entity level.
But we may not care equally about all kinds of entities, so we might weight them differently in the evaluation routine,
or consider the P/R/F for each one separately.
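One way to realize both options, sketched under the same assumption that entities are `(start, end, label)` spans: score each entity type on its own, then, if a single number is needed, combine the per-type F scores with weights. The weights and the function names are hypothetical.

```python
def prf_for(gold, pred):
    """P/R/F1 for two sets of (start, end, label) spans."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def per_type_prf(gold, pred):
    """P/R/F computed separately for each entity type."""
    types = sorted({e[2] for e in gold} | {e[2] for e in pred})
    return {
        t: prf_for({e for e in gold if e[2] == t}, {e for e in pred if e[2] == t})
        for t in types
    }


def weighted_f(scores, weights):
    """Weighted average of per-type F1; the weights are hypothetical priorities."""
    total = sum(weights[t] for t in scores)
    return sum(weights[t] * scores[t][2] for t in scores) / total


gold = {(0, 2, "ORG"), (6, 7, "ORG"), (14, 16, "PER")}
pred = {(0, 2, "ORG"), (6, 7, "ORG"), (15, 16, "PER"), (10, 12, "LOC")}
scores = per_type_prf(gold, pred)
for t, (p, r, f) in scores.items():
    print(f"{t}: P={p:.2f} R={r:.2f} F={f:.2f}")
print(f"{weighted_f(scores, {'ORG': 2.0, 'PER': 1.0, 'LOC': 1.0}):.3f}")  # 0.500
```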
21
Relations
Once you have captured the entities in a text, you might want to ascertain how they relate to one another.
Here we’re just talking about explicitly stated relations.
23
Relation Types
As with named entities, the list of relations is application-specific. For generic news texts...
24
Relations
By relation we really mean sets of tuples. Think about populating a database.
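To make the database view concrete, here is a minimal sketch that stores relation tuples read off the airline passage in an in-memory SQLite table and then queries them. The relation names (`subsidiary-of`, `employee-of`) and the table layout are hypothetical choices, not a standard relation inventory.

```python
import sqlite3

# Hypothetical relation tuples (relation, arg1, arg2) extracted from the passage.
tuples = [
    ("subsidiary-of", "American Airlines", "AMR"),
    ("subsidiary-of", "United", "UAL"),
    ("employee-of", "Tim Wagner", "American Airlines"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (name TEXT, arg1 TEXT, arg2 TEXT)")
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", tuples)

# Once populated, the extracted relations can be queried like any other data.
rows = conn.execute(
    "SELECT arg1, arg2 FROM relation WHERE name = 'subsidiary-of' ORDER BY arg1"
).fetchall()
print(rows)  # [('American Airlines', 'AMR'), ('United', 'UAL')]
conn.close()
```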
25
Relation Analysis
We can divide relation analysis into two parts:
- Determining whether two entities are related
- And if they are, classifying the relation
There are two reasons to do this:
- Cutting down on training time for classification by eliminating most pairs
- Producing separate feature sets that are appropriate for each task
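The two-stage split can be sketched as a pipeline in which a cheap detector prunes the candidate pairs and only the survivors reach the classifier. In a real system both stages would be trained classifiers; here `is_related` and `classify` are hypothetical stand-ins that look only at entity types, and the type schema and relation labels are assumptions.

```python
from itertools import combinations

# Hypothetical schema: type pairs that can take part in at least one relation.
RELATABLE_TYPES = {("ORG", "ORG"), ("ORG", "PER"), ("PER", "ORG")}


def is_related(e1, e2):
    """Stage 1 stand-in for a trained binary classifier: a cheap type filter."""
    return (e1[1], e2[1]) in RELATABLE_TYPES


def classify(e1, e2):
    """Stage 2 stand-in for a trained multi-class classifier (hypothetical labels)."""
    return "subsidiary-of" if (e1[1], e2[1]) == ("ORG", "ORG") else "employee-of"


def extract_relations(entities):
    candidates = list(combinations(entities, 2))
    kept = [(a, b) for a, b in candidates if is_related(a, b)]  # most pairs dropped here
    relations = [(classify(a, b), a[0], b[0]) for a, b in kept]
    return len(candidates), len(kept), relations


entities = [("United", "ORG"), ("UAL", "ORG"), ("Chicago", "LOC"), ("Dallas", "LOC"), ("Atlanta", "LOC")]
n_candidates, n_kept, relations = extract_relations(entities)
print(n_candidates, n_kept, relations)  # 10 1 [('subsidiary-of', 'United', 'UAL')]
```

Because the classifier only ever sees pairs that survive stage 1, its training data shrinks accordingly, and each stage can be given its own feature set.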
26
Relation Analysis
Let’s just worry about named entities within the same sentence.
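Restricting attention to a single sentence shrinks the candidate set considerably. A minimal sketch, assuming each mention carries the index of the sentence it came from; the mentions are taken from the airline passage, and the tuple layout and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def same_sentence_pairs(mentions):
    """mentions: (sentence_id, text, type) tuples. Return unordered pairs sharing a sentence."""
    by_sentence = defaultdict(list)
    for m in mentions:
        by_sentence[m[0]].append(m)
    pairs = []
    for sid in sorted(by_sentence):
        pairs.extend(combinations(by_sentence[sid], 2))
    return pairs


mentions = [
    (0, "United Airlines", "ORG"),
    (1, "American Airlines", "ORG"), (1, "AMR", "ORG"), (1, "Tim Wagner", "PER"),
    (2, "United", "ORG"), (2, "UAL", "ORG"),
]
pairs = same_sentence_pairs(mentions)
print(len(pairs), "same-sentence pairs instead of", len(mentions) * (len(mentions) - 1) // 2)
# 4 same-sentence pairs instead of 15
```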
27
Features
We can group the features (for both tasks) into three categories:
- Features of the named entities involved
- Features derived from the words between and around the named entities
- Features derived from the syntactic environment that governs the two entities
28
Features
Features of the entities:
- Their types
- Concatenation of the types
- Headwords of the entities (e.g., Bridge in George Washington Bridge)
- Words in the entities
Features between and around:
- Particular positions to the left and right of the entities (+/- 1, 2, 3)
- Bag of words between
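A minimal sketch of the entity-based and word-based features for one candidate pair, run on the American Airlines / Tim Wagner pair from the passage. It assumes token-level spans with an exclusive end, with the first entity preceding the second, and takes each mention's last word as its headword; that head rule is a simplification, and the feature names are hypothetical.

```python
def pair_features(tokens, e1, e2, window=3):
    """Features for a candidate pair; e1, e2 are (start, end, type) token spans,
    end exclusive, with e1 occurring before e2 in the sentence."""
    (s1, end1, type1), (s2, end2, type2) = e1, e2
    feats = {
        # Features of the entities themselves.
        "type1": type1,
        "type2": type2,
        "types": f"{type1}-{type2}",
        # Assumption: the headword is the mention's last word ("George Washington Bridge" -> "Bridge").
        "head1": tokens[end1 - 1],
        "head2": tokens[end2 - 1],
        "words1": tokens[s1:end1],
        "words2": tokens[s2:end2],
        # Bag of words between the two mentions (order discarded).
        "between": sorted({w.lower() for w in tokens[end1:s2]}),
    }
    # Words at fixed positions -1..-window and +1..+window around each entity.
    for name, start, end in (("e1", s1, end1), ("e2", s2, end2)):
        for k in range(1, window + 1):
            left, right = start - k, end - 1 + k
            feats[f"{name}[-{k}]"] = tokens[left] if left >= 0 else "<S>"
            feats[f"{name}[+{k}]"] = tokens[right] if right < len(tokens) else "</S>"
    return feats


tokens = ("American Airlines , a unit of AMR , immediately matched the move , "
          "spokesman Tim Wagner said .").split()
feats = pair_features(tokens, (0, 2, "ORG"), (14, 16, "PER"))
print(feats["types"], feats["head1"], feats["head2"])  # ORG-PER Airlines Wagner
print(feats["e2[-1]"], feats["e2[+1]"])  # spokesman said
```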
29
Features
Syntactic environment:
- Constituent path through the tree from one to the other
- Base syntactic chunk sequence from one to the other
- Dependency path
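Of the syntactic features, the dependency path is the easiest to sketch without a full parser. The parse below is hand-written and hypothetical (UD-style labels, one `(head, relation)` pair per token, -1 for the root), not the output of any particular parser, and each entity is represented by its last word as an assumption. The path climbs from one entity to the lowest common ancestor and then descends to the other.

```python
# Hypothetical hand-written dependency parse of the example sentence.
tokens = ("American Airlines , a unit of AMR , immediately matched the move , "
          "spokesman Tim Wagner said .").split()
HEADS = [
    (1, "compound"), (9, "nsubj"), (1, "punct"), (4, "det"), (1, "appos"),
    (6, "case"), (4, "nmod"), (1, "punct"), (9, "advmod"), (16, "ccomp"),
    (11, "det"), (9, "obj"), (16, "punct"), (15, "compound"), (15, "compound"),
    (16, "nsubj"), (-1, "root"), (16, "punct"),
]


def ancestors(heads, i):
    """Token indices from i up to the root, inclusive."""
    chain = [i]
    while heads[chain[-1]][0] != -1:
        chain.append(heads[chain[-1]][0])
    return chain


def dependency_path(tokens, heads, a, b):
    """Path from token a up to the lowest common ancestor, then down to token b."""
    up, down = ancestors(heads, a), ancestors(heads, b)
    lca = next(n for n in up if n in down)
    up = up[: up.index(lca) + 1]    # a ... lca
    down = down[: down.index(lca)]  # b ... child of lca on b's side
    parts = [tokens[a]]
    for child, parent in zip(up, up[1:]):
        parts += [f"-{heads[child][1]}->", tokens[parent]]
    for child in reversed(down):
        parts += [f"<-{heads[child][1]}-", tokens[child]]
    return " ".join(parts)


# Heads of "American Airlines" (index 1) and "Tim Wagner" (index 15).
print(dependency_path(tokens, HEADS, 1, 15))
# Airlines -nsubj-> matched -ccomp-> said <-nsubj- Wagner
```

The resulting string can be used directly as a single categorical feature, or split into its labeled arcs for a sparser but more general representation.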
30
Example
For the following example, we’re interested in the possible relation between American Airlines and Tim Wagner.
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.