1 Using Predicate-Argument Structure for Topic- and Event-based Distillation
Elizabeth Boschee, Michael Levit*, Marjorie Freedman
BBN Technologies
*now affiliated with ICSI
2 Outline
Introduction
Approach
Proposition Trees
– Generation
– Augmentation
– Scoring
– Usage
Conclusions and Future Work
3 Introduction: Distillation Templates
Distillation operates over queries formulated according to fixed "templates", for example:
– Describe the prosecution of [PERSON] for [CRIME]
– List facts about [EVENT]
– Find statements about [EVENT/TOPIC] made by [PERSON]
– Describe reaction of [COUNTRY] to [EVENT/TOPIC]
– Etc.
To answer templated queries, a system must be able to decide whether text contains a reference to a query argument
– For entity-like arguments (e.g. [PERSON]), standard extraction/coreference techniques are effective
– Identifying references to topics/events (formulated in natural language) requires a different approach
4 Introduction: Sample Distillation Query
Distillation query: List facts about [the widespread looting of Iraqi museums after the US invasion].
– On-topic information can be conveyed without significant overlap with the query terms
  "Many works of art were stolen from Baghdad galleries in 2003"
– Presence of query terms does not guarantee on-topic information
  "Iraqi museums presented an exhibit on looting in Afghanistan after the US invasion"
– Some words are more important than others
  "looting" vs. "widespread"
– Some phrases are more important than others
  "Iraqi museums" vs. "after the US invasion"
Approach: Use query predicate-argument structure and extraction information to perform accurate topic identification
5 Approach
Represent logical structure of query argument and candidate responses using "proposition trees"
– Augment proposition trees using additional synonym and extraction information
Define similarity metric over proposition trees
– Account for structural transformations, relative importance of query terms, paraphrases and re-wordings, and omissions and additions
Use similarity metric to
– Measure relevance of candidate responses to query
– Identify redundancy between two candidate responses
This strategy was used in our GALE Year 1 evaluation system and achieved excellent results compared to human participants.
6 Proposition Tree Generation
Syntactic parser generates parse tree for each sentence
Rule-based proposition finder transforms parses into simple predicate-argument propositions
– Each proposition consists of a noun/verb predicate and zero or more arguments, each with a "role" label
  Identifies logical subject and logical object as roles
  Prepositional roles are not resolved/analyzed (role label remains "of", "in", "at", etc.)
– This process includes trace resolution
A set of proposition trees is created for each sentence
– Nodes are either predicates or arguments
– Branches are "roles" (e.g. "subject" or "of")
Example: "his arrest in Baghdad"
  arrest
  ├─ possessive: his
  └─ in: Baghdad
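A minimal sketch (not from the original slides) of how such a proposition tree could be represented in code; the class and field names are illustrative assumptions, not BBN's implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RoleEdge:
    """A branch in a proposition tree, labeled with a role such as
    'subject', 'possessive', or an unresolved preposition like 'of'/'in'."""
    role: str
    child: "PropNode"

@dataclass
class PropNode:
    """A node in a proposition tree: a predicate or argument head word."""
    head: str
    children: List[RoleEdge] = field(default_factory=list)

# The slide's example, "his arrest in Baghdad":
arrest_tree = PropNode("arrest", [
    RoleEdge("possessive", PropNode("his")),
    RoleEdge("in", PropNode("Baghdad")),
])
```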
7 Proposition Tree Augmentation
Automatically supplement document proptrees with names or descriptors obtained through in-document coreference
Example: "his arrest in Baghdad"
Before augmentation:
  arrest
  ├─ possessive: his
  └─ in: Baghdad
After augmentation:
  arrest
  ├─ possessive: his (Abu Abbas, Abbas, the PLF leader)
  └─ in: Baghdad (the Iraqi capital)
8 Proposition Tree Augmentation
Automatically supplement all proptrees with synonyms
– WordNet
– Nominalization table
– BBN "equivalent name" algorithm
  Misspellings
  Transliteration variants
  Aliases
  Acronyms
Before augmentation:
  arrest
  ├─ possessive: his (Abu Abbas, Abbas, the PLF leader)
  └─ in: Baghdad (the Iraqi capital)
After augmentation:
  arrest (capture, apprehend, apprehension, detain)
  ├─ possessive: his (Abu Abbas, Abbas, the PLF leader, Mohammed Abbas, Abul Abbas)
  └─ in: Baghdad (the Iraqi capital, Baghhdad, Bagadad, Baghdag, Bagdad)
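A toy version of the augmentation step, reusing the PropNode/RoleEdge sketch from the generation slide. The lookup tables are stand-ins for WordNet, the nominalization table, the in-document coreference chains, and BBN's equivalent-name algorithm; they are assumptions for illustration only.

```python
# Toy alternative-form tables (illustrative, not the real resources).
SYNONYMS = {"arrest": {"capture", "apprehend", "apprehension", "detain"}}
COREF_MENTIONS = {"his": {"Abu Abbas", "Abbas", "the PLF leader"}}
EQUIVALENT_NAMES = {"Baghdad": {"the Iraqi capital", "Baghhdad", "Bagadad", "Baghdag", "Bagdad"}}

def alternatives_for(head):
    alts = set()
    for table in (SYNONYMS, COREF_MENTIONS, EQUIVALENT_NAMES):
        alts |= table.get(head, set())
    return alts

def augment(node):
    """Attach alternative surface forms to every node in the tree."""
    node.alternatives = alternatives_for(node.head)
    for edge in node.children:
        augment(edge.child)

augment(arrest_tree)   # arrest_tree from the generation sketch above
```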
9 Proposition Tree Scoring
Two proptrees are similar if one can be transformed into the other at minimal cost
– "his arrest" → "Abbas was captured"
  Substitution (synonym): arrest → captured
  Substitution (label): possessive → object
  Substitution (coreference): his → Abbas
Match score for these two trees: ~80%
– Despite zero percent token overlap
  arrest ─possessive→ his
  captured ─object→ Abbas
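A toy cost-based match in the spirit of this slide, building on the PropNode/RoleEdge sketch above. The cost values and the positional pairing of children are invented simplifications, not BBN's actual scoring algorithm.

```python
SUB_COST = {("arrest", "captured"): 0.1,     # synonym substitution
            ("his", "Abbas"): 0.1}           # coreference substitution
ROLE_COST = {("possessive", "object"): 0.2}  # label substitution

def head_cost(a, b):
    if a == b:
        return 0.0
    return SUB_COST.get((a, b), SUB_COST.get((b, a), 1.0))

def tree_match(a, b):
    """1 minus the normalized cost of transforming tree a into tree b."""
    cost, steps = head_cost(a.head, b.head), 1
    for ea, eb in zip(a.children, b.children):   # simplistic: pair children by position
        cost += ROLE_COST.get((ea.role, eb.role), 0.0 if ea.role == eb.role else 0.5)
        cost += 1.0 - tree_match(ea.child, eb.child)
        steps += 2
    return max(0.0, 1.0 - cost / steps)

his_arrest = PropNode("arrest", [RoleEdge("possessive", PropNode("his"))])
abbas_captured = PropNode("captured", [RoleEdge("object", PropNode("Abbas"))])
print(tree_match(his_arrest, abbas_captured))   # about 0.87 with these toy costs
```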
10 Scoring: Cost Structure
Different tree transformations have different costs
The cost for replacing a word with its synonym is based on the estimated reliability of the synonym
– "United Nations" → "UN" is very reliable
– "plant" → "works" is less reliable
Certain role substitutions are more costly than others
– Changing the role from "in" to "premodifier" is cheap
  "the plant in Cernavoda" → "the Cernavoda plant"
– Changing the role from "in" to "by" is expensive
  "the attack in Iraq" → "the attack by Iraq"
The cost for omitting a word/phrase increases the closer the word/phrase is to the root of the proptree
– Because it causes a larger subtree to be omitted
– In "the shutdown of the Cernavoda nuclear plant", "nuclear" can be omitted more easily than "plant"
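An illustrative sketch of the cost principles on this slide. The numeric values are invented; only their relative ordering reflects the slide (reliable synonyms are cheap, omissions near the root are expensive).

```python
SYNONYM_RELIABILITY = {("United Nations", "UN"): 0.95,   # very reliable -> cheap substitution
                       ("plant", "works"): 0.60}         # less reliable -> more expensive

def synonym_cost(a, b):
    reliability = SYNONYM_RELIABILITY.get((a, b), SYNONYM_RELIABILITY.get((b, a), 0.0))
    return 1.0 - reliability

def omission_cost(depth, tree_depth):
    """Omitting a node costs more the closer it sits to the root, because a
    larger subtree is dropped (e.g. 'plant' vs. 'nuclear' in
    'the shutdown of the Cernavoda nuclear plant')."""
    return 1.0 - depth / (tree_depth + 1)
```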
11 Scoring: Additions/Omissions
Matching of proptrees is actually non-symmetric
– Additions are free; omissions are not
  "The shutdown of the Cernavoda nuclear plant by the authorities" is a perfect match to the query argument
  "The shutdown of the plant" is not
– When comparing query topic to candidate response, use only one direction of similarity
– When comparing two candidate responses for redundancy, look at both
Names can only be omitted if they appear somewhere else nearby in the document (i.e. are still in focus)
– Eliminates matches to "the shutdown of the nuclear plant" when the document is about Chernobyl rather than Cernavoda
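A sketch of how the non-symmetric match might be used in the two settings this slide describes. Here `match(a, b)` is assumed to score how well tree b covers tree a, with additions in b free and omissions from a penalized; taking the minimum of both directions for redundancy is my own simplification, since the slide only says to look at both.

```python
def relevance(query_tree, response_tree, match):
    # One direction only: the response may add detail, but must cover the query.
    return match(query_tree, response_tree)

def redundancy(response_a, response_b, match):
    # Both directions: two responses are redundant only if each covers the other.
    return min(match(response_a, response_b), match(response_b, response_a))
```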
12 Scoring: Examples
QUERY: "The arrest of Abu Abbas in Iraq"
– The US arrest of Palestinian hardline leader Abu Abbas in Baghdad (0.872619)
– The capture of Abu Abbas in Iraq (0.8)
– Abbas' capture in Iraq by U.S. military forces (0.739286)
– the exile Palestinian radical leader who was arrested near Baghdad in Iraq (0.733333)
– Abu Abbas was arrested by US troops near the Iraqi capital of Baghdad. (0.686905)
– US troops on Tuesday captured Abu Abbas (0.615476)
13 Scoring Variations: Subtree
Subtree similarity:
– To measure "subtree" similarity of proptree A to proptree B, break proptree A into a set of weighted subtrees
  Subtree weight based on size and position within original tree
– Score each subtree with respect to proptree B; calculate weighted sum of subtree scores
– Gives relatively high score to "The Cernavoda plant was quiet after the shutdown of the plant and all its operations"
Example:
  shutdown
  └─ of: plant
     ├─ premod: Cernavoda
     └─ premod: nuclear
Subtrees:
  shutdown ─of→ plant
  plant ─premod→ nuclear
  plant ─premod→ Cernavoda
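A sketch of subtree similarity as described on this slide. The helpers are assumptions: `enumerate_subtrees` yields (subtree, weight) pairs with weights based on size and position, and `match` is the cost-based tree match from the scoring slides.

```python
def subtree_similarity(tree_a, tree_b, enumerate_subtrees, match):
    """Weighted sum of how well each subtree of A matches somewhere in B."""
    weighted = list(enumerate_subtrees(tree_a))          # [(subtree, weight), ...]
    total = sum(w for _, w in weighted)
    score = sum(w * match(sub, tree_b) for sub, w in weighted)
    return score / total if total else 0.0
```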
14 Scoring Variations: Node
Node similarity:
– To measure "node" similarity of proptree A to proptree B, break proptree A into a set of weighted nodes
  Ignore tree structure and role labels
– Score each node with respect to proptree B; calculate weighted sum of node scores
  Still uses cost structure for synonyms, coreference, etc.
– Gives high score to "In Cernavoda, the plant was quiet after the shutdown of nuclear operations"
Example:
  shutdown
  └─ of: plant
     ├─ premod: Cernavoda
     └─ premod: nuclear
Nodes: shutdown, plant, nuclear, Cernavoda
15 Proposition Trees in Answer Finding
In Year 1, proposition tree matching was used primarily in answer selection patterns
A pattern might specify that the candidate answer must have at least 80% similarity with the query event/topic argument
– Desired similarity can be specified using any or all of the three variations of proposition tree similarity (full, subtree, and node)
  For instance, a pattern might specify that either a minimum 60% full-tree match or a minimum 90% node match is acceptable
– Similarity score can also optionally consider local context
  The similarity score for a sentence becomes a smoothed combination of the raw scores of nearby sentences
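A sketch of an answer-selection check and of local-context smoothing. The 60%/90% thresholds come from the slide's example; the neighbor weighting below is an illustrative choice, since the slide only says the raw scores of nearby sentences are combined.

```python
def passes_pattern(full_score, node_score):
    # Accept if either the full-tree or the node-level similarity is high enough.
    return full_score >= 0.60 or node_score >= 0.90

def smoothed_score(raw_scores, i, window=1, neighbor_weight=0.25):
    """Blend sentence i's raw score with its neighbors' scores."""
    total, weights = 0.0, 0.0
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(raw_scores):
            w = 1.0 if offset == 0 else neighbor_weight
            total += w * raw_scores[j]
            weights += w
    return total / weights
```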
16 Proposition Trees in Redundancy Detection
Proptree matching is also used in redundancy detection
– E.g. "the plant was closed" is found to be redundant with "the shutdown of the plant"
Expected to be very useful for redundancy in Year 2
– Redundancy between specific nuggets of information must be identified and removed
  "The plant was closed in August" and "the shutdown of the plant due to drought" will be partially redundant
  Pinpoint identification of redundancy will allow for combination into: "The plant was closed in August due to drought"
17 Conclusions and Future Work
Proposition trees provide an effective way to
– Identify appropriate response nuggets in text
– Identify and remove redundancy among responses
Strategy used successfully in Year 1 GNG evaluation
Future work:
– Better proposition tree augmentation from OntoNotes
  Coreference for all noun phrases (not just ACE entities)
  Word sense disambiguation (not just blind use of WordNet)
– Expand to Chinese (first) and Arabic (second)
– Improved weighting on nodes and branches
– Investigation of proposition tree algorithms for topic-based document retrieval