Identifying Opinion Holders for Question Answering in Opinion Texts
Soo-Min Kim and Eduard Hovy
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Marina del Rey, CA {skim,
Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo
Date: 2007/08/16
AAAI
Introduction 1/2
Question answering in opinion texts: "Who strongly believes in Y?"
A system to recognize the holder of opinion Y
Application: stock market predictors
Earlier work (Kim and Hovy, 2004) focused on identifying opinion expressions within text; this work goes a step further and identifies the opinion holder
Example: "Doraemon thinks dorayaki is delicious."
Opinion holder: Doraemon
Opinion expression: thinks
Opinion: dorayaki is delicious
Introduction 2/2
Define the opinion holder as an entity who expresses, explicitly or implicitly, the opinion contained in a sentence
Entity = (person, country, organization, or special group of people)
Each opinion expression corresponds to one holder
"A thinks B's criticism of T is wrong"
B is the holder of "the criticism of T"
A is the person who holds the opinion that B's criticism is wrong
Difficulties in identifying opinion holders
1. The opinion sentence contains more than one likely holder entity
"Russia's defense minister said Sunday that his country disagrees with the U.S. view of Iraq, Iran and North Korea as an 'axis of evil'."
Candidate holders: "Russia", "Russia's defense minister", "U.S.", "Iraq", "Iran", "North Korea"
2. There is more than one opinion in a sentence
"In relation to Bush's axis of evil remarks, the German Foreign Minister also said, Allies are not satellites, and the French Foreign Minister caustically criticized that the United States' unilateral, simplistic worldview poses a new threat to the world."
Proposed approach
An automatic method for identifying opinion holders (OH):
1. Identify all possible opinion holder entities in a sentence
Use existing tools to extract the named entities and noun phrases in the sentence
2. Apply the Maximum Entropy (ME) ranking algorithm to select the most probable entity
(A sketch of the overall pipeline follows.)
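As a rough illustration, the two steps can be written as a tiny pipeline. This is a sketch, not the authors' code; `extract_candidates` and `me_rank` are hypothetical helpers, sketched on later slides.

```python
# Minimal sketch of the two-step method (not the authors' implementation).
# extract_candidates() and me_rank() are hypothetical helpers, sketched below.

def identify_holder(sentence, expression):
    """Return the most probable opinion holder for one opinion expression."""
    candidates = extract_candidates(sentence)         # step 1: NEs + NPs
    if not candidates:
        return None
    return me_rank(candidates, expression, sentence)  # step 2: ME ranking
```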
System architecture
Holder candidate set
Named entities (NE): using BBN's named entity tagger IdentiFinder
Noun phrases (NP): using Charniak's parser
For example (a stand-in sketch follows)
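A minimal sketch of step 1, assuming spaCy as a freely available stand-in for IdentiFinder (named entities) and Charniak's parser (noun phrases), which the paper actually used:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # stand-in; the paper used IdentiFinder + Charniak

def extract_candidates(sentence):
    """Collect holder candidates: named entities plus noun phrases."""
    doc = nlp(sentence)
    candidates = []
    for ent in doc.ents:                       # named entities
        if ent.label_ in {"PERSON", "ORG", "GPE", "LOC"}:
            candidates.append((ent.text, ent.label_))
    for np in doc.noun_chunks:                 # base noun phrases
        candidates.append((np.text, "NP"))
    return candidates
```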
Maximum Entropy ranking algorithm
A machine learning approach: Maximum Entropy modeling
Classification: selects every candidate marked as true as an answer, and selects no candidate if every one is marked as false → poor performance
Ranking: selects the single most probable candidate as the answer, by maximizing a given conditional probability distribution
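The ranking step can be sketched with plain numpy: score each candidate under an ME model, normalize over all candidates of the sentence, and pick the argmax. `featurize()` and the trained weight vector are assumptions here, not the paper's code.

```python
import numpy as np

def me_rank(candidates, expression, sentence, weights):
    """Pick argmax_c P(c | sentence, expression) under an ME model."""
    # featurize() is a hypothetical helper producing the f1..f9 feature vector.
    feats = np.array([featurize(c, expression, sentence) for c in candidates])
    scores = feats @ weights
    # P(c | ...) = exp(w.f(c)) / sum_c' exp(w.f(c'))  -- softmax over candidates
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return candidates[int(np.argmax(probs))]
```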
Training data
MPQA corpus (Wiebe et al., 2003): 535 documents (10,657 sentences)
An annotation example (opinion and holder) is shown; only sentences whose opinion strength is marked high or extreme are selected
Training procedure
Feature selection for ME
1. Full parsing features (f2, f3, f4, f6)
2. Partial parsing features (f7, f8, f9)
3. Others (f1, f5)
Full parsing features 1/5
Using Charniak's parser
For example: holder candidate "China's official Xinhua news agency" and opinion expression "accusing", from MPQA; opinion expressions come from the earlier system (Kim and Hovy, 2004)
Full parsing features 2/5
Full parsing features 3/5
To express the tree structure for ME training, the label path from the holder to the expression is encoded as a string, e.g. "NP S VP S S VP VBG"
This single long path causes a data sparseness problem
Full parsing features 4/5
Solution: split the path into three sub-paths (f2, f3, f4)
For example: "NP_H S_HE VP_E S_E S_E VP_E VBG_E", where subscript H marks nodes on the holder side, HE the common ancestor, and E the expression side
Full parsing features 5/5
f6: the top two levels below a child node of the common ancestor (HE) on the path toward the holder (H)
For example:
P1 = "NP_H PP_H NP_H"
P2 = "NP_H NP_H PP_H VP_H NP_H PP_H NP_H"
P1 and P2 are treated as the same because they share "PP_H NP_H" at the top
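A sketch of how such paths could be computed, using an NLTK constituency tree in place of Charniak output. Splitting the path at the common ancestor gives the H-side, ancestor, and E-side pieces (f2, f3, f4), and the last two H-side labels approximate f6; the function name and tree example are ours, not the paper's.

```python
from nltk import Tree

def path_features(tree, h_leaf, e_leaf):
    """Label path between two leaves, split at their common ancestor."""
    leaves = tree.leaves()
    ph = tree.leaf_treeposition(leaves.index(h_leaf))
    pe = tree.leaf_treeposition(leaves.index(e_leaf))
    k = 0                                  # length of the common prefix =
    while ph[k] == pe[k]:                  # position of the common ancestor
        k += 1
    up = [tree[ph[:i]].label() for i in range(len(ph) - 1, k, -1)]  # H side (f2)
    anc = tree[ph[:k]].label()                                      # ancestor (f3)
    down = [tree[pe[:i]].label() for i in range(k + 1, len(pe))]    # E side (f4)
    f6 = up[-2:]   # top two levels on the H side, to reduce sparseness
    return up, anc, down, f6

t = Tree.fromstring("(S (NP (NNP China)) (VP (VBG accusing) (NP (NNP Taiwan))))")
print(path_features(t, "China", "accusing"))
# (['NNP', 'NP'], 'S', ['VP', 'VBG'], ['NNP', 'NP'])
```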
Partial parsing features
Using the CASS partial parser
f7: (vgp …)
f8: (c …)
f9: Yes or No
Other features
Non-structural features:
f1: the type of the candidate, with values NP, PERSON, ORGANIZATION, and LOCATION; this feature lets ME determine the most probable type automatically
f5: the distance between the candidate and the expression, counted in words in the parse tree
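These two features are straightforward to compute; a sketch, assuming word offsets for the candidate and the expression are already known:

```python
def non_structural_features(cand_type, cand_index, expr_index):
    """f1 as a one-hot type indicator, f5 as word distance."""
    f1 = [int(cand_type == t)
          for t in ("NP", "PERSON", "ORGANIZATION", "LOCATION")]
    f5 = abs(cand_index - expr_index)   # distance counted in words
    return f1 + [f5]
```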
Answer selection for evaluation 1/2
Strict selection: the system answer must match the gold holder exactly
For example: gold answer "Doraemon", system answer "Doraemon"
Lenient selection: a partial match is accepted
For example: gold answer "Michel Sidibe, Director of the Country and Regional Support Department of UNAIDS", system answer "Michel Sidibe"
Accept candidates with priority 1 & 2 & 3
Answer selection for evaluation 2/2
Threshold 1 = 0.5: a candidate is accepted as an answer if at least half of the words in the gold holder also appear in the candidate
Threshold 2 = 4: the average number of words in human-annotated holders is 3.71
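A sketch of the lenient criterion. Threshold 1 is the word-overlap ratio stated above; how threshold 2 interacts with it is not fully spelled out on the slide, so capping the effective gold-holder length at 4 words is our assumption.

```python
def lenient_match(candidate, gold_holder, threshold1=0.5, threshold2=4):
    """Accept if enough of the gold holder's words appear in the candidate."""
    cand_words = set(candidate.lower().split())
    gold_words = gold_holder.lower().split()
    overlap = sum(w in cand_words for w in gold_words)
    # ASSUMPTION: threshold2 caps the effective gold-holder length at 4 words.
    return overlap / min(len(gold_words), threshold2) >= threshold1
```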
Experiments 1/3
961 (expression, holder) pairs: 863 for training, 98 for testing
Baseline: the system chooses the candidate closest to the expression as the holder, without any ME decision
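The baseline amounts to a one-liner; a sketch, assuming each candidate carries its word offset:

```python
def baseline_holder(candidates, expr_index):
    """Closest candidate to the expression, with no ME decision."""
    # candidates: list of (text, word_index) pairs
    return min(candidates, key=lambda c: abs(c[1] - expr_index))[0]
```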
Experiments 2/3
Experiments 3/3
Conclusions
The importance of opinion holder identification has been recognized, yet it has not been much studied to date, partly because of the lack of annotated data.
Maximum Entropy ranking is used to select the most probable holder among multiple candidates.
Adding parsing features significantly improved system performance.