Adaptive Subjective Triggers for Opinionated Document Retrieval
Kazuhiro Seki, Organization of Advanced Science & Technology, Kobe University
Kuniaki Uehara, Graduate School of Engineering, Kobe University
2/10/2009
Background
- User-generated content (UGC) on the web is increasing and often contains personal subjective opinions.
- Such opinions can be helpful for personal and corporate decision making, creating demand for retrieving personal opinions about a given entity.
- Traditional IR aims to find documents relevant to a given topic (entity) but is not concerned with subjectivity.
- Aim: retrieve documents that are not only pertinent to a given entity but also contain subjective opinions.
An (existing) approach
- Lexicon-based (Mishne, 2006; Zhang et al., 2008; etc.): look for subjective words/phrases.
  - "like" conveys favorable feelings: "I like the movie."
- Potential drawback: words/phrases taken in isolation, apart from context, do not reliably indicate subjectivity.
  - "It looks like a cat."
  - "She likes singing."
Another approach considering wider context
- n-gram language model: estimate word occurrence probabilities from the prior context or history, i.e., the preceding (n − 1) words.
  - bigram: P(w_i | w_{i−1})
  - trigram: P(w_i | w_{i−2}, w_{i−1})
- Generally, n is set to 2 or 3.
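As an illustration of these estimates, here is a minimal bigram model with add-alpha smoothing; the toy corpus, tokenization, and smoothing choice are assumptions for this sketch, not the authors' setup.

```python
from collections import defaultdict

def train_bigram(sentences):
    """Count unigrams and bigrams to estimate P(w_i | w_{i-1})."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    for tokens in sentences:
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return unigram, bigram

def p_bigram(unigram, bigram, prev, cur, vocab_size, alpha=1.0):
    """Add-alpha smoothed conditional probability P(cur | prev)."""
    return (bigram[(prev, cur)] + alpha) / (unigram[prev] + alpha * vocab_size)

corpus = [["i", "like", "the", "movie"], ["i", "like", "singing"]]
uni, bi = train_bigram(corpus)
vocab = {w for s in corpus for w in s}
print(p_bigram(uni, bi, "i", "like", len(vocab)))  # "i" often precedes "like" here
```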
Trigger models (Lau et al., 1993)
- Incorporate long-distance dependencies that cannot be handled by n-gram models.
- Trigger pairs: word pairs such that one tends to bring about the occurrence of the other.
  - nor → either (syntactic dependency)
  - memory → GB (semantic dependency)
- Used by linearly interpolating with an n-gram model: (1 − λ)·P_B(w|h) + λ·P_T(w|h), where P_B is the n-gram model and P_T the trigger model.
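The interpolation itself is straightforward; the sketch below assumes p_ngram and p_trigger are caller-supplied functions returning probabilities, which is my framing rather than anything specified in the slides.

```python
def interpolated_prob(w, history, p_ngram, p_trigger, lam=0.2):
    """Linear interpolation (1 - lam) * P_B(w|h) + lam * P_T(w|h),
    as in Lau et al. (1993); lam (lambda) would be tuned on held-out data."""
    return (1.0 - lam) * p_ngram(w, history) + lam * p_trigger(w, history)
```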
Identifying trigger pairs (Tillmann et al., 1996)
- From the corpus, build an n-gram model P(w|h); from the vocabulary, enumerate potential trigger pairs.
- For each candidate pair a → b, form a trigger model P_T(w|h) and combine it with the n-gram model into an extended model P_E(w|h).
- Evaluate each pair by the log-likelihood difference Δ_{a→b} = Σ_i {log P_E(w_i|h_i) − log P(w_i|h_i)} (sketched below).
- Pairs with P(b|h) < t are called low-level triggers.
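A minimal sketch of the selection criterion, assuming the corpus is streamed as (history, word) pairs and that both models return smoothed (nonzero) probabilities; the function names are hypothetical.

```python
import math

def log_likelihood_gain(corpus_stream, p_base, p_extended):
    """Delta_{a->b} = sum_i [log P_E(w_i|h_i) - log P(w_i|h_i)].
    corpus_stream yields (history, word) pairs; p_base and p_extended
    return smoothed probabilities. Candidate pairs with the largest
    positive gain would be kept as triggers."""
    return sum(math.log(p_extended(w, h)) - math.log(p_base(w, h))
               for h, w in corpus_stream)
```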
Building trigger model P_T
1. For each identified trigger pair (a → b), compute an association score α(b|a) based on their co-occurrences.
2. Define the trigger model P_T by using α(·): P_T(w|h) is the average association score between the words in history h and the word w.
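A sketch of step 2, assuming α is stored as a dictionary of per-pair scores; the slide does not spell out the exact form of α(b|a), only that it is co-occurrence based, so treating unseen pairs as contributing zero is an assumption here.

```python
def p_trigger(w, history, alpha):
    """P_T(w|h): average association score alpha(w|a) over the words a
    in the history. alpha is a dict {(a, w): score}; pairs without a
    stored score contribute zero (an assumption for this sketch)."""
    if not history:
        return 0.0
    return sum(alpha.get((a, w), 0.0) for a in history) / len(history)
```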
Subjective trigger model
- Assumption: a personal subjective opinion consists of two main components.
  - The subject of the opinion (e.g., "I", "you") or the object the opinion is about (e.g., "The Curious Case of Benjamin Button").
  - A subjective expression (e.g., "like", "feel").
- Treat them as triggering and triggered words, respectively; triggering words are expressed as pronouns.
- Empirical finding: proximity of pronouns and subjective expressions to objects is an effective measure of opinionatedness (Zhou et al., 2007; Yang et al., 2007).
Identifying "subjective" trigger pairs
- Pronouns considered: I, my, you, it, its, he, his, she, her, we, our, they, their, this.
- History h: the preceding words in the same sentence.
- Corpus: 5,000 customer reviews from Amazon.com. A sketch of the candidate-pair extraction follows.
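A hedged sketch of the candidate-generation step under the slide's definitions (pronouns as triggering words, history limited to the preceding words of the sentence); this covers only candidate extraction, prior to the log-likelihood selection described earlier.

```python
PRONOUNS = {"i", "my", "you", "it", "its", "he", "his",
            "she", "her", "we", "our", "they", "their", "this"}

def candidate_pairs(sentence_tokens):
    """Yield (pronoun, word) candidates: a pronoun can trigger each word
    that follows it in the same sentence, so that the pronoun lies in
    the word's history under the slide's definition."""
    for i, tok in enumerate(sentence_tokens):
        if tok in PRONOUNS:
            for later in sentence_tokens[i + 1:]:
                yield (tok, later)

print(list(candidate_pairs(["i", "like", "the", "movie"])))
# [('i', 'like'), ('i', 'the'), ('i', 'movie')]
```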
Identifying "subjective" trigger pairs (cont.)
- Low-level triggers (pairs with P(w|h) < t) cause a problem.
- Remedy: penalize a frequent word w paired with an infrequent history h.
Opinion retrieval
- Pipeline: for a query q, retrieve documents d with an IR model (INM), yielding P_INM(q|d), then rerank them with the subjective language model P_E(w|h).
- Score: the probability that d is relevant to q AND subjective, i.e., the product of P_INM(q|d) and P_E(d) = ∏_i P_E(w_i|h_i).
- Two issues: P_E(d) is smaller for longer d, and P_INM(q|d) and P_E(d) may have largely different variances.
- Remedy: normalize P_E(d) by the document length m and take a weighted sum of the logs (a sketch follows below).
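One plausible reading of "normalize by length m and take a weighted sum of logs", with β as the mixing weight (matching the β swept against MAP later); the exact combination formula is my reconstruction, not quoted from the slides.

```python
def opinion_score(log_p_inm, token_log_probs, beta=0.5):
    """Weighted sum of logs:
    (1 - beta) * log P_INM(q|d) + beta * (1/m) * sum_i log P_E(w_i|h_i).
    The 1/m normalization keeps long documents from being penalized by
    the product of many per-token probabilities."""
    m = len(token_log_probs)
    return (1.0 - beta) * log_p_inm + beta * sum(token_log_probs) / m
```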
Dynamic model adaptation
- Motivation: language models created from Amazon reviews may not be effective for some types of entities.
- Procedure:
  1. Carry out a keyword search for the given topic.
  2. Use the k top-ranked blog posts to identify new trigger pairs (a → b) and compute α'(·), the association scores for the new triggers.
  3. Update the trigger model by using the new trigger pairs (one possible update is sketched below).
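The slides say only that the trigger model is "updated" with the new pairs; the sketch below shows one hypothetical update, interpolating the corpus-level scores α with the topic-specific α' using an assumed weight γ (gamma).

```python
def adapt_alpha(alpha, alpha_new, gamma=0.5):
    """Blend corpus-level association scores alpha with topic-specific
    scores alpha_new computed from the top-k retrieved posts. The
    interpolation weight gamma is hypothetical, not from the slides."""
    merged = dict(alpha)
    for pair, score in alpha_new.items():
        merged[pair] = (1.0 - gamma) * merged.get(pair, 0.0) + gamma * score
    return merged
```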
Empirical evaluation
- Data: TREC Blog track test collection 2006.
  - 3 million blog posts crawled from Dec 2005 to Feb 2006.
  - 50 "topics" (user information needs).
  - Relevant and opinionated posts are explicitly labeled.
- Two types of assessment: evaluation of the language models themselves, and their effect on opinion retrieval.
Evaluation of language models
- Perplexity: the uncertainty of a language model L in predicting a word sequence (d = w_1, ..., w_m); a minimal computation is sketched below.
- Created two hypothetical documents from the Blog track collection:
  - concatenation of all the opinionated posts → d_O
  - concatenation of all the relevant (but non-opinionated) posts → d_N
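For concreteness, the standard perplexity computation over per-token log probabilities, which the evaluation presumably relies on; the natural-log base is an arbitrary choice for this sketch.

```python
import math

def perplexity(token_log_probs):
    """Perplexity of a document d = w_1..w_m under a model L:
    exp(-(1/m) * sum_i log P_L(w_i|h_i)). Lower is better, i.e.,
    the model is less 'surprised' by the text."""
    m = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / m)
```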
Perplexity results
- Higher-order n-grams monotonically decrease perplexity, irrespective of language model and document type.
- The opinionated document d_O leads to lower perplexity.
- The subjective language model P_E produces lower perplexity than the n-gram model P_B.
Relation between parameter β and MAP
[Figure: MAP as a function of the weighting parameter β; the best setting yields a +22.0% improvement.]
Improvement for individual topics
[Figure: per-topic change in average precision.]
Analysis on individual topics
- Topics with notable improvement:
  - "MacBook Pro" (laptop, +0.22)
  - "Heineken" (company and brand name, +0.20)
  - "Shimano" (company and brand name, +0.19)
  - "Board chess" (board game, +0.13)
  - "Zyrtec" (medication, a product name, +0.12)
  - "Mardi Gras" (final day of carnival, +0.11)
- Most of these are products: the model learned from Amazon reviews is effective for products in general, including beer and medication.
- It is also effective for other types of entities.
Analysis on individual topics (cont.)
- Topics with performance decline:
  - "Jim Moran" (congressman, –0.15)
  - "World Trade Org." (international organization, –0.05)
  - "Cindy Sheehan" (anti-war activist, –0.03)
  - "Ann Coulter" (political commentator, –0.01)
  - "West Wing" (TV drama set in the White House, –0.01)
  - "Sonic food industry" (fast-food restaurant chain, –0.01)
- Are politics- and organization-related topics inherently difficult to improve? Counterexamples exist: Bruce Bartlett (+0.07), Jihad (+0.06), McDonalds (+0.03), Qualcomm (+0.02).
Results for dynamic model adaptation
- Moderately improved performance overall.
- For "Zyrtec", average precision improved by 47.7%.
Results for model adaptation on difficult topics
- For most topics, average precision improved slightly but consistently.
Conclusions
- Proposed subjective trigger models reflecting subjective opinions: two assumptions plus a modification for low-level triggers.
- Combined with an IR model for opinion retrieval: a 22.0% improvement in MAP over INM; effective for most topics, with a slight drop for topics concerning politics and organizations.
- Dynamic model adaptation: positive effect overall (+25.0% over the initial search), and moderately effective for politics- and organization-related topics.
Future work
- Use a larger corpus of customer reviews.
- Use the labeled data in the Blog track test collection.
- Refine the approach to model adaptation.
References
- Mishne, G.: Multiple Ranking Strategies for Opinion Retrieval in Blogs, Proceedings of the 15th Text Retrieval Conference (2006).
- Zhang, M. and Ye, X.: A Generation Model to Unify Topic Relevance and Lexicon-Based Sentiment for Opinion Retrieval, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 411–418 (2008).
- Lau, R., Rosenfeld, R. and Roukos, S.: Trigger-Based Language Models: A Maximum Entropy Approach, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 45–48 (1993).
- Tillmann, C. and Ney, H.: Selection Criteria for Word Trigger Pairs in Language Modeling, in Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Computer Science, pp. 95–106, Springer Berlin / Heidelberg (1996).
- Zhou, G., Joshi, H. and Bayrak, C.: Topic Categorization for Relevancy and Opinion Detection, Proceedings of the 16th Text Retrieval Conference (2007).
- Yang, K., Yu, N. and Zhang, H.: WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs, Proceedings of the 16th Text Retrieval Conference (2007).
- Zhang, W., Yu, C. and Meng, W.: Opinion Retrieval from Blogs, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 831–840 (2007).
Questions?
Comparative experiments (TREC 2006, MAP)
  TREC best: 0.1885
  Zhang et al.: 0.2726
  Ours w/ our baseline: 0.2398
  Ours w/ stronger baseline (baseline 0.3022): 0.3221
Comparative experiments (TREC 2007, MAP)
  TREC best: 0.4341
  TREC 2nd: 0.3453
  TREC 3rd: 0.3264
  Ours w/ our baseline (baseline 0.2508): 0.3072
  Ours w/ stronger baseline (baseline 0.3784): 0.4054
Comparative experiments (TREC 2008, MAP; same baseline)
  TREC best: 0.4067
  TREC 2nd: 0.4006
  TREC 3rd: 0.3964
  Ours w/ stronger baseline (baseline 0.3822): 0.3996
Comparative experiments (TREC 2008 polarity task; same baseline)
  TREC best (ours): 0.1448
  TREC 2nd: 0.1348
  TREC 3rd: 0.1129