1 CERATOPS Center for Extraction and Summarization of Events and Opinions in Text Janyce Wiebe, U. Pittsburgh Claire Cardie, Cornell U. Ellen Riloff, U.


1 1 CERATOPS Center for Extraction and Summarization of Events and Opinions in Text Janyce Wiebe, U. Pittsburgh Claire Cardie, Cornell U. Ellen Riloff, U. Utah

2 2 Overview Rapidly re-trainable, robust components for: 1. Information extraction of facts and entities related to events from text 2. Extraction of opinions and motivations expressed in text 3. Tracking, linking, and summarizing events and opinions and their progressions over time

3 3 Motivation for Event IE Systems Rapid semantic processing of large volumes of unstructured text Automatic merging of facts and entity relationships across sets of documents Automatic population of large databases with factual information from many text sources

4 4 Information Extraction from Text
The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis. After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms.
OUTBREAK
Disease: avian flu / bird flu
Victims: poultry
Location: Fraser Valley poultry farms / 36 commercial premises
Country: Canada
Status: confirmed
Containment: depopulation

5 5 Information Extraction of Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures Keywords and named entity recognition are not sufficient. Researchers have discovered how anthrax toxin destroys cells and rapidly causes death... Troops were vaccinated against anthrax, cholera, …

6 6
Syntactic Analysis: 3 chickens (SUBJ) died (VP) from avian flu (PP).
Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu
Coreference Resolution: 3 chickens died from avian flu. The birds were found in Canada.
Template Generation: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada

7 7 New Approach: Role-Identifying Nouns
Lexically role-identifying nouns are defined by the role that the noun plays in an event.
kidnapper, arsonist, assassin -> agent (perpetrator)
casualty, fatality, victim -> theme (victim)
Semantically role-identifying nouns strongly evoke one event role in a domain based on semantics. (Intuition from Grice’s Maxim of Relevance)
Disease Reports: toddler, girl, boy -> victim
Crime Reports: restaurant, store, hotel -> location

8 8 Bootstrapped Learning of Role-Identifying Nouns
Unannotated Texts
Ex: assassin, arsonist, kidnapper -> Best Extraction Patterns (Ex: was arrested; killed by) -> Best Extractions (Nouns) (Ex: murderer, sniper, criminal)
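The bootstrapping cycle on this slide can be sketched roughly as follows. This is a toy illustration, not the CERATOPS implementation: the function name `bootstrap_nouns`, the (pattern, noun) input format, both scoring rules, and the sample data are assumptions made for the example.

```python
import math
from collections import defaultdict


def bootstrap_nouns(instances, seeds, iterations=2, patterns_per_iter=2, nouns_per_iter=2):
    """Grow a noun lexicon from seeds.

    instances: iterable of (pattern, noun) pairs harvested from unannotated text.
    The scoring below is a simplified stand-in, chosen only for illustration.
    """
    by_pattern = defaultdict(set)
    for pattern, noun in instances:
        by_pattern[pattern].add(noun)

    lexicon = set(seeds)
    for _ in range(iterations):
        # Score each pattern by how reliably it extracts nouns already in the lexicon.
        scored = []
        for pattern, nouns in by_pattern.items():
            hits = len(nouns & lexicon)
            if hits:
                scored.append(((hits / len(nouns)) * math.log2(hits + 1), pattern))
        best_patterns = [p for _, p in sorted(scored, reverse=True)[:patterns_per_iter]]

        # Score unseen nouns by how many of the best patterns extract them.
        votes = defaultdict(int)
        for pattern in best_patterns:
            for noun in by_pattern[pattern] - lexicon:
                votes[noun] += 1
        ranked = sorted(votes, key=lambda n: (-votes[n], n))
        lexicon.update(ranked[:nouns_per_iter])
    return lexicon


# Hypothetical pattern instances; a real system would harvest these from a corpus.
INSTANCES = [
    ("was arrested", "assassin"), ("was arrested", "kidnapper"), ("was arrested", "murderer"),
    ("killed by", "arsonist"), ("killed by", "sniper"), ("killed by", "assassin"),
    ("said", "mayor"), ("said", "assassin"), ("said", "spokesman"), ("said", "witness"),
]
SEEDS = {"assassin", "arsonist", "kidnapper"}

learned = bootstrap_nouns(INSTANCES, SEEDS)
print(sorted(learned))
```

In this toy run the generic pattern "said" scores low because most of what it extracts is not in the lexicon, so "mayor" and "spokesman" are never added.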

9 9 Role-Identifying Expressions
Typically, a verb refers to an event and the verb’s arguments identify the role players: [victim] was kidnapped by [perpetrator] in [location]
But sometimes, a verb identifies a role player in an event without identifying the event: [perpetrator] participated; [perpetrator] was implicated

10 10 Bootstrapped Learning of Role-Identifying Expressions Basilisk event nouns STEP 1 relevant event nouns STEP 2 event extraction patterns AutoSlog Candidate RIE Pattern Generator candidate RIE patterns

11 11 Learning to Extract Perpetrators [Phillips & Riloff, RANLP-07]
Role-Identifying Patterns: EVENT was perpetrated by; was involved in EVENT
Role-Identifying Nouns: assailants, attackers, cell, culprits, extremists, hitmen, kidnappers, militiamen, MRTA, narco-terrorists, sniper
Event-Specific Patterns: was kidnapped by; was killed by

12 12 Decoupling Relevant Region Identification and Extraction
…the explosion ripped through the busy neighborhood in New Delhi. A bomb was found under a parked car…
Local pattern matching has two drawbacks:
–Facts can be missed if they do not occur with the event description.
–False hits can be generated from irrelevant contexts.
Solution: 1) Identify relevant text regions. 2) Apply general, but semantically appropriate patterns
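A minimal sketch of the two-step idea on this slide, under loud assumptions: the keyword set `EVENT_KEYWORDS` is a hypothetical stand-in for a trained relevant-region classifier, the one-sentence `window` is an arbitrary choice, and the single regular expression stands in for a set of learned general patterns.

```python
import re

# Hypothetical stand-in for a trained relevant-region classifier.
EVENT_KEYWORDS = {"explosion", "bombing", "attack"}

# A general pattern that would over-generate if applied to every sentence.
GENERAL_PATTERN = re.compile(r"\b(?:a|an|the)\s+(\w+)\s+was found\b", re.IGNORECASE)


def is_relevant(sentence):
    """Flag a sentence that mentions the event directly."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & EVENT_KEYWORDS)


def relevant_regions(sentences, window=1):
    """Step 1: keep flagged sentences plus neighbors within `window` sentences."""
    flagged = [is_relevant(s) for s in sentences]
    kept = []
    for i, sentence in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        if any(flagged[lo:hi]):
            kept.append(sentence)
    return kept


def extract(sentences):
    """Step 2: apply the general pattern only inside relevant regions."""
    return [
        match.group(1).lower()
        for sentence in relevant_regions(sentences)
        for match in GENERAL_PATTERN.finditer(sentence)
    ]


event_doc = [
    "The explosion ripped through the busy neighborhood in New Delhi.",
    "A bomb was found under a parked car.",
]
other_doc = [
    "The festival opened on Sunday.",
    "A wallet was found under a parked car.",
]
print(extract(event_doc), extract(other_doc))
```

The second sentence of `event_doc` does not mention the event itself, but it is kept because it sits next to a relevant sentence; the same pattern produces no hit in `other_doc`.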

13 13 IE Pattern Learning with Relevant Regions and Semantic Affinity [Patwardhan & Riloff, EMNLP-07] Relevant Region Classifier IE System Relevant Sentences IE Patterns Extractions Semantic Affinity Pattern Learner Self-training SVM Classifier relevant & irrelevant texts pattern

14 14 Learned Extraction Patterns Target Victim Weapon PerpOrg destroyed barrels of shattered was damaged blew up murder of assassination of killing of question murdered exploded planted fired was planted explosion of claimed panama from kidnapped by command of wing of PerpInd Disease Victim blew up attacked identity of bands of gangs of cases of spread of outbreak of died

15 15 Text Extraction and Data Visualization for Animal Health Surveillance Collaborative project between CERATOPS, PURVAC, and the Veterinary Information Network (VIN), with funding from LLNL. Goal: proof-of-concept of an end-to-end NLP-based visual analytics system for unstructured text.

16 16 Animal Health Surveillance Monitoring animal health is important to DHS’ mission: –73% of emerging infectious diseases are zoonotic in origin. –Pets can provide early warning signs of disease outbreaks and exposures to toxic substances. –Adverse pet reactions can be early indicators of food chain contamination.

17 17

18 18 The Veterinary Information Network VIN is the largest on-line community, information resource, and on-line continuing education source for veterinarians. Over half of all veterinarians in the U.S. use VIN! VIN hosts message boards where veterinarians discuss what they are seeing in their practices. 15 years of message board data has been archived! VIN built a database of semantic information associated with pet health to support search. Paul Pion, DVM, President and co-founder of VIN, served as our consultant.

19 19 NLP-based Visual Analytics (diagram: NLP producing facts)

20 20 Prototype System for Used the VIN database (248,108 entries) to create 3 new dictionaries for text analysis: –syntactic and semantic lexicon –phrasal lexicon –synonym dictionary Enhanced the template generation process to use new types of semantic information. Converted our IE templates into a format appropriate for Purdue’s visualization system. We produced a prototype IE system to extract and visualize diseases, victims, dates, and locations from ProMed-mail disease outbreak reports.

21 21 ProMed-mail Visualization Output

22 22 NLP-based Visual Analytics for Animal Health Surveillance Future Goals: Rapid identification of new disease outbreaks. Trends or spikes in disease outbreaks. Unusual symptoms or clusters of symptoms. Statistical associations between foods & adverse pet reactions. Improved diagnostic tools to associate symptoms with diseases and external events.

23 23 Semantic Class Learning from the Web [Kozareva, Riloff, & Hovy, ACL-08] Goal: automatically create semantic dictionaries. Use a doubly-anchored hyponym pattern: <class name> such as <class member> and *. Construct pattern linkage graphs to capture the popularity and productivity of candidate terms and rank them. Produces very accurate results with truly minimal supervision (class name and one seed).
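A rough sketch of harvesting with a doubly-anchored pattern, under stated assumptions: matching is done over a local list of text snippets rather than the Web, the function names are invented for this example, and ranking by in-degree in the linkage graph is only one simple reading of "popularity".

```python
import re
from collections import defaultdict, deque


def doubly_anchored_matches(class_name, member, snippets):
    """Return terms filling the * slot of '<class name> such as <member> and *'."""
    pattern = re.compile(
        rf"\b{re.escape(class_name)}\s+such\s+as\s+{re.escape(member)}\s+and\s+(\w+)",
        re.IGNORECASE,
    )
    found = []
    for text in snippets:
        found.extend(match.group(1).lower() for match in pattern.finditer(text))
    return found


def harvest(class_name, seed, snippets):
    """Bootstrap from one seed and rank terms by in-degree in the linkage graph."""
    graph = defaultdict(set)  # edge u -> v: using u as the anchor discovered v
    seen = {seed.lower()}
    queue = deque([seed.lower()])
    while queue:
        anchor = queue.popleft()
        for term in doubly_anchored_matches(class_name, anchor, snippets):
            if term != anchor:
                graph[anchor].add(term)
            if term not in seen:
                seen.add(term)
                queue.append(term)

    in_degree = defaultdict(int)
    for discovered in graph.values():
        for term in discovered:
            in_degree[term] += 1
    return sorted(seen, key=lambda t: (-in_degree[t], t))


# Hypothetical snippets standing in for search-engine results.
SNIPPETS = [
    "diseases such as cholera and anthrax are monitored",
    "Diseases such as anthrax and plague were reported",
    "diseases such as plague and anthrax",
    "cities such as cholera and nothing",
]
ranked = harvest("diseases", "cholera", SNIPPETS)
print(ranked)
```

Anchoring on both the class name and a known member is what keeps the last snippet from contributing: it matches the member but not the class name.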

24 24 Semantic Class Learning Results

25 25 Coreference Resolution Links entities, events, and opinions within and across documents Chain1: Chain2: Chain3: Chain4: U.S. State Dept. President Bush NIH Inspector General

26 26 Build on Prior Work in NP Coreference Resolution
Classification
–given a description of two noun phrases, NP_i and NP_j, classify the pair as coreferent or not coreferent
Clustering
–coordinates pairwise coreference decisions
Example: [Queen Elizabeth], set about transforming [her] [husband], [King George VI], …
E.g., Ng & Cardie, ACL 2002
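The classify-then-cluster scheme can be sketched as below. The pairwise decision function here is a hand-written lookup (`toy_classifier`), which is purely an assumption for illustration; a real system would use a learned classifier over features of the two noun phrases. Single-link clustering via union-find is likewise one simple way to coordinate the pairwise decisions, not necessarily the one used in the cited work.

```python
from itertools import combinations


def single_link_clusters(mentions, is_coreferent):
    """Merge mentions into chains: any positive pairwise decision links two clusters."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


# Hypothetical classifier output for the example sentence on the slide.
POSITIVE_PAIRS = {
    frozenset(("Queen Elizabeth", "her")),
    frozenset(("husband", "King George VI")),
}


def toy_classifier(np_i, np_j):
    return frozenset((np_i, np_j)) in POSITIVE_PAIRS


mentions = ["Queen Elizabeth", "her", "husband", "King George VI"]
chains = single_link_clusters(mentions, toy_classifier)
print(chains)
```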

27 27 Partially Supervised Clustering for Source Coreference Resolution Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given. Labels for non-source NPs are unavailable [Stoyanov & Cardie, EMNLP 2006]

28 28 Partially Supervised Clustering Extend rule-learning algorithm to learn pairwise classification function in the context of single-link clustering. –Exploit complex structure of coreference resolution During rule construction, consider the effect of the rule on the overall clustering of items –Compute transitive closure including the unlabelled pairs –Calculate performance ignoring the unlabelled pairs
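The last two points can be illustrated with a small sketch: links proposed by a rule are closed transitively over all items, including unlabelled ones, and the resulting clustering is then scored only on pairs where both items carry a label. The function names, the `None` convention for unlabelled items, and the use of pairwise precision and recall as the performance measure are assumptions made for this example.

```python
from itertools import combinations


def closure_clusters(n_items, links):
    """Transitive closure of pairwise links over all items (labelled or not)."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n_items)]


def pairwise_scores(gold, links):
    """Precision and recall over labelled pairs only.

    gold[i] is a cluster id, or None when item i is unlabelled.
    """
    predicted = closure_clusters(len(gold), links)
    labelled = [i for i, label in enumerate(gold) if label is not None]
    tp = fp = fn = 0
    for i, j in combinations(labelled, 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Item 1 is unlabelled but can still bridge labelled items through the closure.
gold = ["A", None, "A", "B"]
good_rule_links = [(0, 1), (1, 2)]
bad_rule_links = [(0, 1), (1, 3)]
print(pairwise_scores(gold, good_rule_links), pairwise_scores(gold, bad_rule_links))
```

The example shows why the closure has to include unlabelled items: a rule that links two labelled items only indirectly, through an unlabelled one, still changes the score.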

29 29 State-of-the-Art Coreference Resolution Cornell, Utah, & LLNL are collaboratively building a state-of-the-art coreference resolver based on the best features identified in prior work. We plan to make the system publicly available. On-going work and future plans include: –systematic evaluations of coreference subproblems –incorporating external knowledge about entities –non-anaphoric NP identification –unsupervised, automatic training –topic coreference for opinion analysis

30 30 THE END

31 31 Overview Text analysis to support a broad range of knowledge discovery tasks Automatic annotators that assign semantic and conceptual labels to words, phrases, and documents Automatically extracting, summarizing and tracking information about events and opinions

32 32 (diagram: NLP producing facts; Images)

33 33 headlines

34 34 Event-oriented IE
Goal of IE system: extract facts associated with events from unstructured text
Document Text: One person was killed when a small bomb exploded at a police station in Basra town in Iraq's politically volatile southern region on Wednesday, residents said. The bomb was the first to hit an urban area since the riots in the southern city on May 16. The terrorist group Al Qaeda claimed responsibility for the attack, and…
Event
Weapon: a small bomb
Location: Basra
Victim: One person
Perpetrator Org: Al Qaeda
Physical Target: police station

35 35 Full

36 36 Information Extraction for Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures Keywords and named entity recognition are not sufficient. Researchers have discovered how anthrax toxin destroys cells and rapidly causes death... Troops were vaccinated against anthrax, cholera, …

37 37 Fact Extraction: Example
After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis.
OUTBREAK
Disease: avian flu / bird flu
Victims: poultry
Location: Fraser Valley poultry farms / 36 commercial premises
Country: Canada
Status: confirmed
Containment: depopulation

38 38
Text: 3 chickens died from avian flu.
Syntactic Analysis: 3 chickens (SUBJ) died (VP) from avian flu (PP).
Semantic Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu
Coreference Resolution: 3 chickens died from avian flu. The birds were found in Canada.
Relation and Event Analysis: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada

39 39 Extraction of Opinions

40 40 IE Pattern Bootstrapping Process keywords Relevant Region Identifier Pattern Learner Patterns & Statistics Pattern Learner (More) Patterns & Statistics

41 41 Semantic Dictionary Bootstrapping
Unannotated Texts
Ex: anthrax, ebola, cholera, flu, plague -> Best Extraction Pattern(s) (Ex: outbreak of) -> Extractions (Nouns) (Ex: smallpox, tularemia, botulism)

42 42 Semantic Learning Case Study Input to Basilisk: 10 common disease names Of the top 200 words hypothesized to be diseases: 89 were already in the UMLS metathesaurus (32,000 names of diseases and organisms), but 111 were not! Including: adenomatosis tularaemia tularamia diarrhoea diphtheriae enterovirus-71 fibropapillomas gastroeneteritis flu kawasaki mad-cow-disease smut pertussis pleuro-pneumonia polioencephalomyelitis poliovirus h5n1 h7n3 ev71 yf jyf nvcjd pepmv wsmv

43 43 Learning Subjective Phrases Using Information Extraction techniques

44 44 Extractions
expressed: condolences, hope, grief, views, worries
indicative of: compromise, desire, thinking
inject: vitality, hatred
reaffirmed: resolve, position, commitment
voiced: outrage, support, skepticism, opposition, gratitude, indignation
show of: support, strength, goodwill, solidarity
was shared: anxiety, view, niceties, feeling

45 45 Subjective Expressions as IE Patterns
PATTERN             FREQ   P(Subj | Pattern)
asked               128    0.63
was asked           11     1.00
was expected        45     0.42
was expected from   5      1.00
talk                28     0.71
talk of             10     0.90
is talk             5      1.00
put                 187    0.67
put end             10     0.90
is fact             38     1.00
fact is             12     1.00
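The two numeric columns can be read as a frequency count and a conditional relative frequency. A minimal sketch of that computation follows, assuming each pattern occurrence has been paired with a subjective/objective label for its sentence; the function name, the input format, and the `min_freq` cutoff are assumptions. The sample counts are back-computed from two rows of the table (for "talk of", 10 x 0.90 = 9 subjective occurrences), not taken from the underlying data.

```python
from collections import Counter


def pattern_subjectivity(observations, min_freq=1):
    """Estimate P(Subj | Pattern) as subjective occurrences / total occurrences.

    observations: iterable of (pattern, is_subjective) pairs, one per occurrence.
    Returns {pattern: (frequency, probability)} for patterns seen at least min_freq times.
    """
    freq, subjective = Counter(), Counter()
    for pattern, is_subjective in observations:
        freq[pattern] += 1
        if is_subjective:
            subjective[pattern] += 1
    return {
        pattern: (count, subjective[pattern] / count)
        for pattern, count in freq.items()
        if count >= min_freq
    }


# Occurrence lists consistent with two table rows (counts inferred, see lead-in).
observations = (
    [("talk of", True)] * 9 + [("talk of", False)] * 1 + [("was asked", True)] * 11
)
stats = pattern_subjectivity(observations)
print(stats)
```

A frequency cutoff such as `min_freq` matters in practice because a probability of 1.00 estimated from 5 occurrences is much weaker evidence than the same value from 38.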

46 46 Conclusions Rapidly re-trainable, robust components for 1. Information extraction of facts and entities 2. Extraction of opinions 3. Tracking, linking, and summarizing events and opinions and their progressions over time

47 47 Current Work: Topics Topic coreference resolution –Treat as an NP coreference resolution task –Modify our existing NP coref approach –Initial results look promising

