Published by Arabella Park. Modified over 9 years ago.
1
CIKM 2007
Opinion Retrieval from Blogs
Wei Zhang (1), Clement Yu (1), Weiyi Meng (2)
wzhang@cs.uic.edu, yu@cs.uic.edu, meng@cs.binghamton.edu
(1) Department of Computer Science, University of Illinois at Chicago
(2) Department of Computer Science, Binghamton University
2
Outline
- Overview of the opinion retrieval
- Topic retrieval
- Opinion identification
- Ranking documents by opinion similarity
- Experimental results
3
Overview of the Opinion Retrieval
- Opinion retrieval: given a query, find documents that have subjective opinions about the query
- Example query: “book”
  - Relevant: “This is a very good book.”
  - Irrelevant: “This book has 123 pages.”
4
Overview of the Opinion Retrieval
- Introduced at TREC 2006 Blog Track
  - 14 groups, 57 submitted runs in TREC 2006
  - 20 groups, 104 runs in TREC 2007 (ongoing)
- Key problems
  - Opinion features
  - Query-related opinions
  - Rank the retrieved documents
5
Our Algorithm
[Diagram: Query + Document set -> Retrieved documents -> Opinionative documents -> Query-related opinionative documents]
6
Topic Retrieval
- Retrieve query-relevant documents
- No opinion involved
- Features
  - Phrase recognition
  - Query expansion
  - Two document-query similarities
7
Topic Retrieval – Phrase Recognition
- Semantic relationship among the words
- For phrase similarity calculation purpose
- 4 types
  - Proper noun: “University of Lisbon”
  - Dictionary phrase: “computer science”
  - Simple phrase: “white car”
  - Complex phrase: “small white car”
8
Topic Retrieval – Query Expansion
- Find the synonyms
  - “wto” -> “world trade organization”
  - Same importance
- Add additional terms
  - “wto” -> negotiate, agreements, tariffs, ...
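As a minimal sketch of the expansion step on this slide, the Python below builds a weighted query. The lookup tables `SYNONYMS` and `RELATED_TERMS`, the function name `expand_query`, and the weight of 0.5 for added terms are illustrative assumptions; the slide only states that synonyms receive the same importance as the original term.

```python
# Hypothetical lookup tables; the slides do not say how these are built.
SYNONYMS = {"wto": ["world trade organization"]}
RELATED_TERMS = {"wto": ["negotiate", "agreements", "tariffs"]}

def expand_query(terms, related_weight=0.5):
    """Return {term: weight} for the original terms plus their expansions."""
    weights = {}
    for term in terms:
        weights[term] = 1.0
        # Synonyms get the same importance as the original term.
        for synonym in SYNONYMS.get(term, []):
            weights[synonym] = 1.0
        # Additional terms get a lower weight (assumed value, not from the slides).
        for extra in RELATED_TERMS.get(term, []):
            weights.setdefault(extra, related_weight)
    return weights

expanded = expand_query(["wto"])
```

Here `expanded` holds the original term, its synonym at the same weight, and the three added terms at the assumed lower weight.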
9
Topic Retrieval - Similarity
- Sim(Query, Doc) = (Sim_P, Sim_T)
- Phrase similarity (Sim_P)
  - Having or not having a phrase
  - Sim_P = sum ( idf(P_i) )
- Term similarity (Sim_T)
  - Sum of the Okapi scores of all the query terms
- Document ranking
  - D1 is ranked higher than D2 if (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
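The ranking rule on this slide is a lexicographic comparison: phrase similarity decides first, and term similarity only breaks ties. A minimal sketch, assuming each document arrives with a precomputed Okapi term score; the names `phrase_similarity`, `rank_documents`, and the sample idf values are hypothetical:

```python
def phrase_similarity(query_phrases, doc_phrases, idf):
    """Sum the idf of every query phrase that the document contains."""
    return sum(idf[p] for p in query_phrases if p in doc_phrases)

def rank_documents(docs, query_phrases, idf):
    """Sort by (phrase similarity, term similarity), highest first.

    docs: list of dicts with 'id', 'phrases' (set of phrases found in the
    document) and 'term_score' (sum of Okapi scores, assumed precomputed).
    """
    def key(doc):
        sim_p = phrase_similarity(query_phrases, doc["phrases"], idf)
        return (sim_p, doc["term_score"])  # tuples compare lexicographically
    return sorted(docs, key=key, reverse=True)

idf = {"computer science": 3.2}  # made-up idf value
docs = [
    {"id": "d1", "phrases": {"computer science"}, "term_score": 1.0},
    {"id": "d2", "phrases": set(), "term_score": 9.0},
    {"id": "d3", "phrases": {"computer science"}, "term_score": 2.0},
]
ranking = [d["id"] for d in rank_documents(docs, ["computer science"], idf)]
```

Note that `d2` ends up last despite its high term score, because it lacks the query phrase.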
10
Opinion Identification
[Diagram: Subjective training data and Objective training data -> Feature Selection -> SVM classifier. Retrieved documents (from topic retrieval) -> SVM classifier -> opinionative documents (to opinion ranking).]
11
Opinion Identification – Training Data
- Subjective training data
  - Review web sites
  - Documents having opinionative phrases
- Objective training data
  - Dictionary entries
  - Documents not having opinionative phrases
12
Opinion Identification – Feature Selection
- The words expressing opinions
- Pearson’s Chi-square test
  - Test of the independence between subjectivity label and words via contingency table
  - Count the number of sentences
- Unigrams and bigrams
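The chi-square test on this slide can be sketched with a 2x2 contingency table of sentence counts (feature present or absent, sentence subjective or objective). This is a minimal illustration, not the authors' implementation; the function names, whitespace tokenization, and toy sentences are assumptions.

```python
def ngrams(sentence):
    """Return the set of unigrams and bigrams in a lowercased sentence."""
    words = sentence.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(bigrams)

def chi_square(feature, subjective, objective):
    """Pearson's chi-square for a 2x2 table of sentence counts."""
    a = sum(feature in ngrams(s) for s in subjective)  # subjective, has feature
    b = sum(feature in ngrams(s) for s in objective)   # objective, has feature
    c = len(subjective) - a                            # subjective, lacks feature
    d = len(objective) - b                             # objective, lacks feature
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom

subjective = ["good book", "good movie"]      # toy training sentences
objective = ["book pages", "movie length"]
score_good = chi_square("good", subjective, objective)
score_book = chi_square("book", subjective, objective)
```

A feature that appears only in subjective sentences ("good") scores high, while one spread evenly over both classes ("book") scores zero, so ranking features by this score keeps the opinion-bearing ones.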
13
Opinion Identification – Classifier
- A support vector machine (SVM) classifier
[Diagram (training): Objective sentences and Subjective sentences, together with Features -> Feature vector representation -> Training -> SVM classifier]
14
Opinion Identification – Classifier
- Apply the SVM classifier
[Diagram: Document -> Sentence 1, Sentence 2, …, Sentence n -> SVM classifier -> Label 1: objective, Label 2: subjective, …, Label n: objective]
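The application step on this slide (split a document into sentences, label each one) can be sketched as follows. The trained SVM is not reproduced here: `stand_in_classifier` is a hypothetical keyword rule used only so the sketch runs, and the regex-based sentence splitter is likewise an assumption.

```python
import re

OPINION_WORDS = {"good", "bad", "great", "terrible"}  # placeholder lexicon

def stand_in_classifier(sentence):
    """Placeholder for the trained SVM: flags sentences with a lexicon word."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return "subjective" if words & OPINION_WORDS else "objective"

def label_sentences(document, classify=stand_in_classifier):
    """Split a document into sentences and label each one."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    return [(s, classify(s)) for s in sentences]

labels = label_sentences("This is a very good book. This book has 123 pages.")
```

Any callable that maps a sentence to a label can be passed as `classify`, so a real trained classifier could replace the placeholder without changing `label_sentences`.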
15
Opinion Similarity - Query-Related Opinions
- Find the query-related opinions
[Diagram labels: document, query, opinionative sentence, text window]
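One way to read the text-window idea on this slide is that an opinionative sentence counts as query-related when a query term occurs close to it. The sketch below assumes the window is measured in sentences and uses naive substring matching; both choices, the `window` default, and the function name are assumptions, since the slide does not specify them.

```python
def query_related_opinions(labeled_sentences, query_terms, window=2):
    """Return subjective sentences with a query term within `window` sentences.

    labeled_sentences: list of (sentence, label) pairs in document order.
    """
    lowered = [s.lower() for s, _ in labeled_sentences]
    # Indices of sentences that mention any query term (substring match).
    hits = [i for i, s in enumerate(lowered)
            if any(term.lower() in s for term in query_terms)]
    related = []
    for i, (sentence, label) in enumerate(labeled_sentences):
        if label == "subjective" and any(abs(i - h) <= window for h in hits):
            related.append(sentence)
    return related

labeled = [
    ("I bought a book.", "objective"),
    ("It is wonderful.", "subjective"),
    ("The weather was cold.", "objective"),
    ("Lunch was awful.", "subjective"),
]
near = query_related_opinions(labeled, ["book"], window=1)
```

With a window of one sentence, only the opinion directly after the query mention is kept; the opinion about lunch is too far away to be attributed to the query.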
16
Opinion Similarity – Similarity 1
- Assumption 1: Higher topic relevance -> Higher rank
- OSim_ir = Sim(Query, Doc)
17
Opinion Similarity – Similarity 2
- Assumption 2: More query-related opinions -> Higher rank
- OSim_stcc: total number of sentences
- OSim_stcs: total score of sentences
18
Opinion Similarity – Similarity 3
- A linear combination of 1 and 2
  - a * OSim_ir + (1-a) * OSim_stcc
  - b * OSim_ir + (1-b) * OSim_stcs
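The two combined similarity functions can be sketched in a few lines. This assumes that the sentence count and the sentence score total are taken over the query-related opinionative sentences of one document, and that each such sentence carries a classifier score; the function names, the sample scores, and the weight of 0.5 are hypothetical.

```python
def osim_stcc(sentence_scores):
    """Total number of query-related opinionative sentences."""
    return len(sentence_scores)

def osim_stcs(sentence_scores):
    """Total classifier score of those sentences."""
    return sum(sentence_scores)

def combined(osim_ir, osim_opinion, weight):
    """weight * OSim_ir + (1 - weight) * opinion-based similarity."""
    return weight * osim_ir + (1 - weight) * osim_opinion

scores = [0.5, 1.5, 2.0]  # hypothetical per-sentence classifier scores
by_count = combined(2.0, osim_stcc(scores), weight=0.5)
by_score = combined(2.0, osim_stcs(scores), weight=0.5)
```

In practice the weights a and b would be tuned, and the two components may need to be brought to comparable scales before mixing; the slides do not say how either was done.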
19
Opinion Similarity – Experimental Results
- TREC 2006 Blog Track data
  - 50 queries, 3.2 million blog documents
- UIC at TREC 2006 Blog Track
  - Title-only queries: ranked first
- 28% - 32% higher than the best TREC 2006 scores
- Good things learned
  - More training data
  - Combined similarity function
20
Conclusions
- Designed and implemented an opinion retrieval system.
- IR + text classification for opinion retrieval
- The best known retrieval effectiveness on TREC 2006 blog data
- Extend to polarity classification: positive/negative/mixed
- Plan to improve feature selection
21
Questions?
wzhang@cs.uic.edu
http://www.cs.uic.edu/~wzhang/