Published by Eustace Riley. Modified over 9 years ago.
1 CERATOPS: Center for Extraction and Summarization of Events and Opinions in Text. Janyce Wiebe, U. Pittsburgh; Claire Cardie, Cornell U.; Ellen Riloff, U. Utah
2 Overview Rapidly re-trainable, robust components for: 1. Information extraction of facts and entities related to events from text 2. Extraction of opinions and motivations expressed in text 3. Tracking, linking, and summarizing events and opinions and their progressions over time
3 Motivation for Event IE Systems Rapid semantic processing of large volumes of unstructured text Automatic merging of facts and entity relationships across sets of documents Automatic population of large databases with factual information from many text sources
4 Information Extraction from Text. Source text: After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis. Extracted OUTBREAK template: Disease: avian flu / bird flu; Victims: poultry; Location: Fraser Valley poultry farms / 36 commercial premises; Country: Canada; Status: confirmed; Containment: depopulation
5 Information Extraction of Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures Keywords and named entity recognition are not sufficient. Researchers have discovered how anthrax toxin destroys cells and rapidly causes death... Troops were vaccinated against anthrax, cholera, …
6 Processing stages: Syntactic Analysis, Extraction, Coreference Resolution, Template Generation. Syntactic Analysis: [3 chickens] SUBJ [died] VP [from avian flu] PP. Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu. Coreference Resolution: "3 chickens died from avian flu. The birds were found in Canada." Template Generation: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada
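The final stage on this slide, template generation, can be sketched as merging extracted facts with coreference chains. This is a minimal illustration under stated assumptions: the function name `generate_template` and the data layout are hypothetical and are not taken from the CERATOPS system, and the normalization of "avian flu" to "bird flu" shown on the slide is not modeled.

```python
def generate_template(event_type, facts, coref_chains):
    """Merge (role, filler) facts into one event template.

    Fillers that belong to a coreference chain are replaced by the
    whole chain, joined with " / " as on the slide.
    """
    # Map every mention to a printable form of its chain.
    chain_label = {}
    for chain in coref_chains:
        for mention in chain:
            chain_label[mention] = " / ".join(chain)

    template = {"Event": event_type}
    for role, filler in facts:
        template[role] = chain_label.get(filler, filler)
    return template


# Hypothetical inputs mirroring the slide's example sentences.
FACTS = [("Victim", "3 chickens"), ("Disease", "avian flu"), ("Country", "Canada")]
CHAINS = [["3 chickens", "the birds"]]

if __name__ == "__main__":
    print(generate_template("Outbreak", FACTS, CHAINS))
```

Keeping coreference as a separate input means the extraction step can stay sentence-local while the template still collects mentions from several sentences.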
7 New Approach: Role-Identifying Nouns Lexically role-identifying nouns are defined by the role that the noun plays in an event: kidnapper, arsonist, assassin -> agent (perpetrator); casualty, fatality, victim -> theme (victim). Semantically role-identifying nouns strongly evoke one event role in a domain based on semantics (intuition from Grice’s Maxim of Relevance): Disease Reports: toddler, girl, boy -> victim; Crime Reports: restaurant, store, hotel -> location
8 Bootstrapped Learning of Role-Identifying Nouns. Bootstrapping loop over Unannotated Texts: nouns (Ex: assassin, arsonist, kidnapper) -> Best Extraction Patterns (Ex: "was arrested", "killed by") -> Best Extractions (Nouns) (Ex: murderer, sniper, criminal)
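The bootstrapping loop on this slide can be sketched as follows. This is a toy illustration, not the authors' algorithm: the pattern score (fraction of a pattern's extractions already in the lexicon, times a log of the hit count) and the vote-based noun ranking are simplifying assumptions of this sketch, and `TOY_EXTRACTIONS` is hypothetical data standing in for patterns applied to unannotated texts.

```python
import math
from collections import defaultdict


def bootstrap(extractions, seeds, iterations=2, top_patterns=2, new_per_iter=1):
    """Grow a noun lexicon by alternating pattern and noun selection.

    extractions: dict mapping a pattern string to the set of nouns it extracts.
    """
    lexicon = set(seeds)
    for _ in range(iterations):
        # Score each pattern by how strongly it extracts known lexicon nouns.
        scored = {}
        for pattern, nouns in extractions.items():
            hits = len(nouns & lexicon)
            if hits:
                scored[pattern] = (hits / len(nouns)) * math.log2(hits + 1)
        best = sorted(scored, key=lambda p: (-scored[p], p))[:top_patterns]

        # Let the best patterns vote for nouns that are not yet in the lexicon.
        votes = defaultdict(float)
        for pattern in best:
            for noun in extractions[pattern] - lexicon:
                votes[noun] += scored[pattern]
        ranked = sorted(votes, key=lambda n: (-votes[n], n))
        lexicon.update(ranked[:new_per_iter])
    return lexicon


# Hypothetical pattern-to-extraction table.
TOY_EXTRACTIONS = {
    "was arrested": {"kidnapper", "murderer", "mayor"},
    "killed by": {"assassin", "arsonist", "sniper", "murderer"},
    "visited": {"mayor", "museum"},
}
SEEDS = {"assassin", "arsonist", "kidnapper"}

if __name__ == "__main__":
    print(sorted(bootstrap(TOY_EXTRACTIONS, SEEDS)))
```

Adding only the top-ranked noun per iteration is a conservative choice in this sketch: a weakly supported noun such as "mayor" stays out as long as better-supported candidates exist.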
9 Role-Identifying Expressions Typically, a verb refers to an event and the verb’s arguments identify the role players: [victim] was kidnapped by [perpetrator] in [location]. But sometimes, a verb identifies a role player in an event without identifying the event: [perpetrator] participated; [perpetrator] was implicated
10 Bootstrapped Learning of Role-Identifying Expressions Basilisk event nouns STEP 1 relevant event nouns STEP 2 event extraction patterns AutoSlog Candidate RIE Pattern Generator candidate RIE patterns
11 Learning to Extract Perpetrators [Phillips & Riloff, RANLP-07] Role-Identifying Patterns: "EVENT was perpetrated by", "was involved in EVENT". Role-Identifying Nouns: assailants, attackers, cell, culprits, extremists, hitmen, kidnappers, militiamen, MRTA, narco-terrorists, sniper. Event-Specific Patterns: "was kidnapped by", "was killed by"
12 Decoupling Relevant Region Identification and Extraction …the explosion ripped through the busy neighborhood in New Delhi. A bomb was found under a parked car… Local pattern matching has two drawbacks: — Facts can be missed if they do not occur with the event description. — False hits can be generated from irrelevant contexts. Solution: 1) Identify relevant text regions. 2) Apply general, but semantically appropriate patterns
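The two-step solution on this slide can be sketched as below. Everything here is an assumption for illustration: the cue-word test stands in for a trained relevant-region classifier, `LOCATION_PATTERN` is a hypothetical "general but semantically appropriate" pattern, and the third sentence of `TEXT` is invented to show a false hit from an irrelevant context.

```python
import re

# Hypothetical cue words; a stand-in for a learned relevance classifier.
EVENT_CUES = {"explosion", "exploded", "bomb", "attack"}

# A general pattern: "in" followed by a capitalized phrase.
LOCATION_PATTERN = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def split_sentences(text):
    """Naive sentence splitter on end-of-sentence punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def is_relevant(sentence):
    """Step 1: decide whether a sentence belongs to a relevant region."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & EVENT_CUES)


def extract_locations(text, use_regions=True):
    """Step 2: apply the general pattern, optionally only in relevant regions."""
    found = []
    for sentence in split_sentences(text):
        if use_regions and not is_relevant(sentence):
            continue
        found.extend(LOCATION_PATTERN.findall(sentence))
    return found


TEXT = (
    "The explosion ripped through the busy neighborhood in New Delhi. "
    "A bomb was found under a parked car. "
    "The minister later spoke in Paris."  # invented irrelevant sentence
)

if __name__ == "__main__":
    print(extract_locations(TEXT))
    print(extract_locations(TEXT, use_regions=False))
```

Because relevance is decided per region rather than per pattern, the extraction pattern itself does not need to mention the event.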
13 IE Pattern Learning with Relevant Regions and Semantic Affinity [Patwardhan & Riloff, EMNLP-07] Relevant Region Classifier IE System Relevant Sentences IE Patterns Extractions Semantic Affinity Pattern Learner Self-training SVM Classifier relevant & irrelevant texts pattern
14 Learned Extraction Patterns Target Victim Weapon PerpOrg destroyed barrels of shattered was damaged blew up murder of assassination of killing of question murdered exploded planted fired was planted explosion of claimed panama from kidnapped by command of wing of PerpInd Disease Victim blew up attacked identity of bands of gangs of cases of spread of outbreak of died
15 Text Extraction and Data Visualization for Animal Health Surveillance Collaborative project between CERATOPS, PURVAC, and the Veterinary Information Network (VIN), with funding from LLNL. Goal: proof-of-concept of an end-to-end NLP-based visual analytics system for unstructured text.
16 Animal Health Surveillance Monitoring animal health is important to DHS’ mission: –73% of emerging infectious diseases are zoonotic in origin. –Pets can provide early warning signs of disease outbreaks and exposures to toxic substances. –Adverse pet reactions can be early indicators of food chain contamination.
17
18 The Veterinary Information Network VIN is the largest on-line community, information resource, and on-line continuing education source for veterinarians. Over half of all veterinarians in the U.S. use VIN! VIN hosts message boards where veterinarians discuss what they are seeing in their practices. 15 years of message board data have been archived! VIN built a database of semantic information associated with pet health to support search. Paul Pion, DVM, President and co-founder of VIN, served as our consultant.
19 NLP-based Visual Analytics
20 Prototype System for Used the VIN database (248,108 entries) to create 3 new dictionaries for text analysis: –syntactic and semantic lexicon –phrasal lexicon –synonym dictionary Enhanced the template generation process to use new types of semantic information. Converted our IE templates into a format appropriate for Purdue’s visualization system. We produced a prototype IE system to extract and visualize diseases, victims, dates, and locations from ProMed-mail disease outbreak reports.
21 ProMed-mail Visualization Output
22 NLP-based Visual Analytics for Animal Health Surveillance Future Goals: Rapid identification of new disease outbreaks. Trends or spikes in disease outbreaks. Unusual symptoms or clusters of symptoms. Statistical associations between foods & adverse pet reactions. Improved diagnostic tools to associate symptoms with diseases and external events.
23 Semantic Class Learning from the Web [Kozareva, Riloff, & Hovy, ACL-08] Goal: automatically create semantic dictionaries. Use a doubly-anchored hyponym pattern: "<class name> such as <class member> and *". Construct pattern linkage graphs to capture the popularity and productivity of candidate terms and rank them. Produces very accurate results with truly minimal supervision (class name and one seed).
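A doubly-anchored pattern fixes both the class name and one known member and lets the wildcard capture a new candidate. The sketch below assumes the pattern has the form "<class name> such as <seed> and *" and applies it to a small hypothetical list of text snippets (`SNIPPETS`) in place of Web search results; the pattern-linkage-graph ranking step is not implemented here.

```python
import re


def doubly_anchored(class_name, seed):
    """Build a regex for '<class name> such as <seed> and *'."""
    return re.compile(
        rf"\b{re.escape(class_name)} such as {re.escape(seed)} and (\w+)",
        re.IGNORECASE,
    )


def harvest(class_name, seed, snippets):
    """Collect the words that fill the wildcard position, in first-seen order."""
    pattern = doubly_anchored(class_name, seed)
    members = []
    for snippet in snippets:
        for match in pattern.findall(snippet):
            if match not in members:
                members.append(match)
    return members


# Hypothetical snippets standing in for Web text.
SNIPPETS = [
    "Countries such as France and Germany signed the treaty.",
    "He toured countries such as France and Italy last year.",
    "Cheeses such as Brie and Camembert are popular.",
    "countries such as Spain and Portugal",
]

if __name__ == "__main__":
    print(harvest("countries", "France", SNIPPETS))
```

Anchoring on the seed as well as the class name is what keeps the last two snippets from contributing: one names a different class and the other a different seed.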
24 Semantic Class Learning Results
25 Coreference Resolution Links entities, events, and opinions within and across documents Chain1: Chain2: Chain3: Chain4: U.S. State Dept. President Bush NIH Inspector General
26 Build on Prior Work in NP Coreference Resolution (e.g., Ng & Cardie, ACL 2002) Classification –given a description of two noun phrases, NP_i and NP_j, classify the pair as coreferent or not coreferent Clustering –coordinates pairwise coreference decisions Example: [Queen Elizabeth] set about transforming [her] [husband], [King George VI], … Each pair of NPs is classified (coref?), and the clustering algorithm groups Queen Elizabeth with her, and husband with King George VI.
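The classify-then-cluster scheme can be sketched with a union-find structure that merges every pair the classifier accepts. This is a minimal sketch under assumptions: `toy_classifier` is a hypothetical lookup table standing in for a learned pairwise classifier, and merging every positive pair (single-link clustering) is one possible way to coordinate the pairwise decisions.

```python
from itertools import combinations


def cluster(mentions, is_coreferent):
    """Single-link clustering: merge any two mentions classified as coreferent."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())


# Hypothetical pairwise decisions for the slide's example sentence.
COREFERENT_PAIRS = {
    frozenset({"Queen Elizabeth", "her"}),
    frozenset({"husband", "King George VI"}),
}


def toy_classifier(np_i, np_j):
    return frozenset({np_i, np_j}) in COREFERENT_PAIRS


MENTIONS = ["Queen Elizabeth", "her", "husband", "King George VI"]

if __name__ == "__main__":
    print(cluster(MENTIONS, toy_classifier))
```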
27 Partially Supervised Clustering for Source Coreference Resolution [Stoyanov & Cardie, EMNLP 2006] Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given. Labels for non-source NPs are unavailable
28 Partially Supervised Clustering Extend rule-learning algorithm to learn pairwise classification function in the context of single-link clustering. –Exploit complex structure of coreference resolution During rule construction, consider the effect of the rule on the overall clustering of items –Compute transitive closure including the unlabelled pairs –Calculate performance ignoring the unlabelled pairs
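The last two bullets can be illustrated with a small scoring routine: links are closed transitively over all items, labelled or not, and pairwise precision and recall are then computed only over pairs whose labels are both known. This is a sketch of that evaluation idea only, not the rule learner; the function names and the toy items are hypothetical.

```python
from itertools import combinations


def closure_clusters(items, links):
    """Transitive closure of the links over ALL items, including unlabelled ones."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in links:
        parent[find(b)] = find(a)
    return {x: find(x) for x in items}


def pairwise_scores(items, links, gold):
    """Precision and recall over labelled pairs only.

    gold maps labelled items to an entity id; unlabelled items are absent.
    """
    predicted = closure_clusters(items, links)
    labelled = [x for x in items if x in gold]
    tp = fp = fn = 0
    for a, b in combinations(labelled, 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        tp += same_pred and same_gold
        fp += same_pred and not same_gold
        fn += same_gold and not same_pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: "u" is an unlabelled NP that can still connect labelled ones.
ITEMS = ["a", "b", "c", "u"]
GOLD = {"a": 1, "b": 1, "c": 2}

if __name__ == "__main__":
    print(pairwise_scores(ITEMS, [("a", "u"), ("u", "b")], GOLD))
    print(pairwise_scores(ITEMS, [("a", "u"), ("u", "c")], GOLD))
```

In the first call the two labelled items are linked only through the unlabelled one, which is why the closure has to include unlabelled pairs even though they never enter the score.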
29 State-of-the-Art Coreference Resolution Cornell, Utah, & LLNL are collaboratively building a state-of-the-art coreference resolver based on the best features identified in prior work. We plan to make the system publicly available. On-going work and future plans include: –systematic evaluations of coreference subproblems –incorporating external knowledge about entities –non-anaphoric NP identification –unsupervised, automatic training –topic coreference for opinion analysis
30 THE END
31 Overview Text analysis to support a broad range of knowledge discovery tasks Automatic annotators that assign semantic and conceptual labels to words, phrases, and documents Automatically extracting, summarizing and tracking information about events and opinions
33 headlines
34 Event-oriented IE. Goal of IE system: extract facts associated with events from unstructured text. Document Text: One person was killed when a small bomb exploded at a police station in Basra town in Iraq's politically volatile southern region on Wednesday, residents said. The bomb was the first to hit an urban area since the riots in the southern city on May 16. The terrorist group Al Qaeda claimed responsibility for the attack, and… Event: Weapon: a small bomb; Location: Basra; Victim: One person; Perpetrator Org: Al Qaeda; Physical Target: police station
35 Full
36 Information Extraction for Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures Keywords and named entity recognition are not sufficient. Researchers have discovered how anthrax toxin destroys cells and rapidly causes death... Troops were vaccinated against anthrax, cholera, …
37 Fact Extraction: Example. Source text: After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis. Extracted OUTBREAK template: Disease: avian flu / bird flu; Victims: poultry; Location: Fraser Valley poultry farms / 36 commercial premises; Country: Canada; Status: confirmed; Containment: depopulation
38 Processing stages: Text, Syntactic Analysis, Semantic Extraction, Coreference Resolution, Relation and Event Analysis. Syntactic Analysis: [3 chickens] SUBJ [died] VP [from avian flu] PP. Semantic Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu. Coreference Resolution: "3 chickens died from avian flu. The birds were found in Canada." Relation and Event Analysis: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada
39 Extraction of Opinions
40 IE Pattern Bootstrapping Process keywords Relevant Region Identifier Pattern Learner Patterns & Statistics Pattern Learner (More) Patterns & Statistics
41 Semantic Dictionary Bootstrapping. Bootstrapping loop over Unannotated Texts: nouns (Ex: anthrax, ebola, cholera, flu, plague) -> Best Extraction Pattern(s) (Ex: "outbreak of") -> Extractions (Nouns) (Ex: smallpox, tularemia, botulism)
42 Semantic Learning Case Study Input to Basilisk: 10 common disease names Of the top 200 words hypothesized to be diseases: 89 were already in the UMLS metathesaurus (32,000 names of diseases and organisms), but 111 were not! Including: adenomatosis tularaemia tularamia diarrhoea diphtheriae enterovirus-71 fibropapillomas gastroeneteritis flu kawasaki mad-cow-disease smut pertussis pleuro-pneumonia polioencephalomyelitis poliovirus h5n1 h7n3 ev71 yf jyf nvcjd pepmv wsmv
43 Learning Subjective Phrases Using Information Extraction techniques
44 Extractions. expressed: condolences, hope, grief, views, worries; indicative of: compromise, desire, thinking; inject: vitality, hatred; reaffirmed: resolve, position, commitment; voiced: outrage, support, skepticism, opposition, gratitude, indignation; show of: support, strength, goodwill, solidarity; was shared: anxiety, view, niceties, feeling
45 Subjective Expressions as IE Patterns
PATTERN              FREQ   P(Subj | Pattern)
asked                 128   0.63
was asked              11   1.00
was expected           45   0.42
was expected from       5   1.00
talk                   28   0.71
talk of                10   0.90
is talk                 5   1.00
put                   187   0.67
put end                10   0.90
is fact                38   1.00
fact is                12   1.00
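The table pairs each pattern's frequency with the estimated probability that a sentence containing it is subjective. A minimal sketch of how such figures might be computed and used to pick reliable patterns follows; the thresholds `min_freq` and `min_prob` are hypothetical values chosen for illustration and are not stated on the slide.

```python
def p_subj_given_pattern(subjective_count, freq):
    """Fraction of a pattern's occurrences that fall in subjective sentences."""
    return subjective_count / freq


# (pattern, FREQ, P(Subj | Pattern)) rows as given on the slide.
ROWS = [
    ("asked", 128, 0.63),
    ("was asked", 11, 1.00),
    ("was expected", 45, 0.42),
    ("was expected from", 5, 1.00),
    ("talk", 28, 0.71),
    ("talk of", 10, 0.90),
    ("is talk", 5, 1.00),
    ("put", 187, 0.67),
    ("put end", 10, 0.90),
    ("is fact", 38, 1.00),
    ("fact is", 12, 1.00),
]


def select_patterns(rows, min_freq=10, min_prob=0.9):
    """Keep patterns that are both frequent enough and strongly subjective."""
    return [p for p, freq, prob in rows if freq >= min_freq and prob >= min_prob]


if __name__ == "__main__":
    print(select_patterns(ROWS))
```

The rows show why the longer patterns matter: "talk" alone is only moderately associated with subjectivity, while "talk of" and "is talk" are much more reliable.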
46 Conclusions Rapidly re-trainable, robust components for: 1. Information extraction of facts and entities 2. Extraction of opinions 3. Tracking, linking, and summarizing events and opinions and their progressions over time
47 Current Work: Topics Topic coreference resolution –Treat as an NP coreference resolution task –Modify our existing NP coref approach –Initial results look promising