Opinion Retrieval from Blogs (CIKM 2007)

Opinion Retrieval from Blogs
Wei Zhang (1), Clement Yu (1), Weiyi Meng (2)
wzhang@cs.uic.edu, yu@cs.uic.edu, meng@cs.binghamton.edu
(1) Department of Computer Science, University of Illinois at Chicago
(2) Department of Computer Science, Binghamton University

Outline
Overview of opinion retrieval; topic retrieval; opinion identification; ranking documents by opinion similarity; experimental results.

Overview of Opinion Retrieval
Opinion retrieval: given a query, find documents that express subjective opinions about the query. Example for the query “book”:
Relevant: “This is a very good book.”
Irrelevant: “This book has 123 pages.” (on topic, but purely factual)

Overview of Opinion Retrieval (cont.)
Introduced at the TREC 2006 Blog Track: 14 groups submitted 57 runs in TREC 2006, and 20 groups submitted 104 runs in TREC 2007 (ongoing).
Key problems: identifying opinion features, finding query-related opinions, and ranking the retrieved documents.

Our Algorithm
Query + document set → topic retrieval → retrieved documents → opinion identification → opinionative documents → opinion similarity ranking → query-related opinionative documents.
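The three-stage flow on this slide can be pictured as a function that chains pluggable stages. This is a minimal sketch, not the authors' implementation: the names `opinion_retrieval_pipeline`, `topic_score`, `is_opinionative`, and `opinion_score` are hypothetical, and the toy scorers in the usage lines only stand in for the components described on the following slides.

```python
from typing import Callable, Dict, List, Tuple

def opinion_retrieval_pipeline(
    query: str,
    documents: Dict[str, str],
    topic_score: Callable[[str, str], float],    # hypothetical stage 1 scorer
    is_opinionative: Callable[[str], bool],      # hypothetical stage 2 classifier
    opinion_score: Callable[[str, str], float],  # hypothetical stage 3 scorer
    top_k: int = 1000,
) -> List[Tuple[str, float]]:
    # Stage 1: topic retrieval (no opinion involved); keep documents with a positive score.
    topic = {d: topic_score(query, text) for d, text in documents.items()}
    retrieved = sorted((d for d in topic if topic[d] > 0), key=lambda d: topic[d], reverse=True)[:top_k]
    # Stage 2: opinion identification filters the retrieved set.
    opinionative = [d for d in retrieved if is_opinionative(documents[d])]
    # Stage 3: rank the opinionative documents by query-related opinion similarity.
    ranked = [(d, opinion_score(query, documents[d])) for d in opinionative]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Toy usage: substring counting and a one-word "classifier" (illustration only).
docs = {
    "d1": "This is a very good book.",
    "d2": "This book has 123 pages.",
    "d3": "Nice weather today.",
}
count = lambda q, text: float(text.lower().count(q))
result = opinion_retrieval_pipeline("book", docs, count, lambda t: "good" in t.lower(), count)
print(result)  # [('d1', 1.0)]
```

Keeping each stage as a separate callable mirrors the slide's separation of topic retrieval from opinion identification, so either stage can be swapped without touching the other.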

Topic Retrieval
Retrieve query-relevant documents; no opinion is involved at this stage. Features: phrase recognition, query expansion, and two document-query similarities.

Topic Retrieval – Phrase Recognition
Phrases capture semantic relationships among words and are used to compute phrase similarity. There are four types:
Proper noun: “University of Lisbon”
Dictionary phrase: “computer science”
Simple phrase: “white car”
Complex phrase: “small white car”
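One way to picture the four phrase types is a rule-based labeler. The rules below (capitalization for proper nouns, a lookup table for dictionary phrases, word count for simple versus complex phrases) are hypothetical heuristics chosen only to reproduce the slide's examples; the slide does not say how the authors recognize phrases. `FUNCTION_WORDS` and the one-entry `dictionary` are illustrative stand-ins.

```python
from typing import Set

FUNCTION_WORDS = {"of", "the", "and", "for", "in"}  # assumption: ignored when checking capitalization

def phrase_type(phrase: str, dictionary: Set[str]) -> str:
    """Assign one of the four phrase types using simple, hypothetical heuristics."""
    words = phrase.split()
    if len(words) < 2:
        return "not a phrase"
    content = [w for w in words if w.lower() not in FUNCTION_WORDS]
    if content and all(w[0].isupper() for w in content):
        return "proper noun"        # every content word capitalized
    if phrase.lower() in dictionary:
        return "dictionary phrase"  # listed in a phrase dictionary
    if len(words) == 2:
        return "simple phrase"      # two-word phrase
    return "complex phrase"         # longer phrase

dictionary = {"computer science"}  # stand-in for a real phrase dictionary
for p in ["University of Lisbon", "computer science", "white car", "small white car"]:
    print(p, "->", phrase_type(p, dictionary))
```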

Topic Retrieval – Query Expansion
Find synonyms: “wto” → “world trade organization”; a synonym has the same importance as the original term.
Add additional terms: “wto” → negotiate, agreements, tariffs, …
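A minimal sketch of the two expansion steps: synonyms enter the query with the same weight as the original term, and additional terms enter with a lower weight. The synonym and related-term tables are hard-coded here; where the authors obtained them, and the 0.5 weight for added terms, are assumptions of this sketch rather than details given on the slide.

```python
from typing import Dict, List

def expand_query(
    terms: List[str],
    synonyms: Dict[str, List[str]],
    related: Dict[str, List[str]],
    related_weight: float = 0.5,  # assumption: added terms count less than original terms
) -> Dict[str, float]:
    """Return a weighted expanded query: term -> weight."""
    weights: Dict[str, float] = {}
    for t in terms:
        weights[t] = 1.0
        for s in synonyms.get(t, []):
            weights[s] = 1.0  # synonyms get the same importance as the original term
    for t in terms:
        for r in related.get(t, []):
            weights.setdefault(r, related_weight)  # never downgrade an original term or synonym
    return weights

expanded = expand_query(
    ["wto"],
    synonyms={"wto": ["world trade organization"]},
    related={"wto": ["negotiate", "agreements", "tariffs"]},
)
print(expanded)
```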

Topic Retrieval - Similarity
Sim(Query, Doc) has two parts:
Phrase similarity: a query phrase either occurs in the document or it does not; Sim_P = sum(idf(P_i)) over the query phrases P_i that occur in the document.
Term similarity: Sim_T = sum of the Okapi scores of all query terms.
Document ranking: D1 is ranked higher than D2 if Sim_P1 > Sim_P2, or if Sim_P1 = Sim_P2 and Sim_T1 > Sim_T2.
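The two similarities and the ranking rule can be sketched as follows. Assumptions not stated on the slide: documents are pre-tokenized, a phrase "occurs" when its words appear contiguously, idf(P) = log(N / df(P)), and term scores use Okapi BM25 with k1 = 1.2 and b = 0.75 and a non-negative idf variant. Using the pair (Sim_P, Sim_T) as a sort key gives exactly the lexicographic rule: phrase similarity decides, and term similarity only breaks ties.

```python
import math
from typing import Dict, List, Sequence, Tuple

def contains_phrase(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """True if the phrase occurs as a contiguous token sequence (assumed matching rule)."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == tuple(phrase) for i in range(len(tokens) - n + 1))

def rank_documents(
    query_terms: List[str],
    query_phrases: List[Tuple[str, ...]],
    docs: Dict[str, List[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[str]:
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs

    def term_idf(term: str) -> float:
        df = sum(1 for toks in docs.values() if term in toks)
        # BM25 idf with +1 inside the log so it never goes negative (a common variant).
        return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

    def phrase_idf(phrase: Tuple[str, ...]) -> float:
        df = sum(1 for toks in docs.values() if contains_phrase(toks, phrase))
        return math.log(n_docs / df)

    scores: Dict[str, Tuple[float, float]] = {}
    for d, toks in docs.items():
        # Phrase similarity: binary presence of each query phrase, weighted by its idf.
        sim_p = sum(phrase_idf(p) for p in query_phrases if contains_phrase(toks, p))
        # Term similarity: sum of Okapi BM25 scores of the query terms.
        sim_t = 0.0
        for t in query_terms:
            tf = toks.count(t)
            if tf:
                norm = tf + k1 * (1 - b + b * len(toks) / avgdl)
                sim_t += term_idf(t) * tf * (k1 + 1) / norm
        scores[d] = (sim_p, sim_t)
    # Tuples compare lexicographically: Sim_P first, Sim_T breaks ties.
    return sorted(docs, key=lambda d: scores[d], reverse=True)

docs = {
    "d1": "the world trade organization met today".split(),
    "d2": "trade world organization trade world organization".split(),
    "d3": "weather report".split(),
}
ranking = rank_documents(["world", "trade", "organization"], [("world", "trade", "organization")], docs)
print(ranking)  # ['d1', 'd2', 'd3']
```

In the toy data, d2 contains every query word twice but never as the contiguous phrase, so d1 still ranks first.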

Opinion Identification
Subjective and objective training data → feature selection → SVM classifier.
Retrieved documents (from topic retrieval) → SVM classifier → opinionative documents (to opinion ranking).

Opinion Identification – Training Data
Subjective training data: review web sites; documents containing opinionative phrases.
Objective training data: dictionary entries; documents not containing opinionative phrases.

Opinion Identification – Feature Selection
Goal: find the words that express opinions. Pearson’s chi-square test checks the independence between the subjectivity label and each word using a contingency table of sentence counts. Candidate features are unigrams and bigrams.
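The contingency-table test can be sketched as below. Each cell counts sentences, as on the slide, and features are unigrams and bigrams. The direction filter (keeping only features that lean subjective) and the top-k cutoff are assumptions of this sketch; the slide does not say how many features are kept or how objective-leaning words are handled.

```python
from typing import List, Sequence

def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].
    a: subjective sentences containing the feature, b: objective sentences containing it,
    c: subjective sentences without it, d: objective sentences without it."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def ngrams(tokens: Sequence[str]) -> set:
    """Unigram and bigram features of one sentence."""
    return set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}

def select_features(subj: List[List[str]], obj: List[List[str]], top_k: int) -> List[str]:
    subj_sets = [ngrams(s) for s in subj]
    obj_sets = [ngrams(s) for s in obj]
    scored = []
    for f in set().union(*subj_sets, *obj_sets):
        a = sum(f in s for s in subj_sets)
        b = sum(f in s for s in obj_sets)
        # Assumption: keep only features relatively more frequent in subjective sentences,
        # since chi-square alone also rewards strongly objective words.
        if a * len(obj) <= b * len(subj):
            continue
        scored.append((chi_square(a, b, len(subj) - a, len(obj) - b), f))
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first, ties alphabetical
    return [f for _, f in scored[:top_k]]

subj = [["very", "good", "book"], ["good", "movie"]]
obj = [["book", "has", "pages"], ["movie", "has", "pages"]]
print(chi_square(10, 0, 0, 10))       # 20.0
print(select_features(subj, obj, 2))  # ['good', 'good book']
```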

Opinion Identification – Classifier
A support vector machine (SVM) classifier. Training: objective and subjective sentences are converted to feature-vector representations over the selected features, and the SVM classifier is trained on these vectors.
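The slide does not name an SVM package, so as a self-contained stand-in this sketch trains a linear SVM with the Pegasos stochastic sub-gradient method (L2-regularized hinge loss) on binary feature-presence vectors. The labels +1 (subjective) and -1 (objective), the regularization constant, the epoch count, and the helper names `to_vector`, `decision`, and `train_linear_svm` are assumptions; an off-the-shelf package such as LIBSVM or SVMlight could be used instead.

```python
import random
from typing import List, Sequence

def to_vector(tokens: Sequence[str], features: Sequence[str]) -> List[float]:
    """Binary presence vector over the selected unigram/bigram features."""
    present = set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    return [1.0 if f in present else 0.0 for f in features]

def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w . [x, 1]; the last weight acts as a (regularized) bias."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.1,
                     epochs: int = 100, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on L2-regularized hinge loss, y in {+1, -1}."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violated = y[i] * decision(w, X[i]) < 1   # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 term
            if violated:
                xa = list(X[i]) + [1.0]
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xa)]
    return w

features = ["good", "has"]  # e.g. output of feature selection
sentences = [["very", "good", "book"], ["good", "movie"], ["book", "has", "pages"], ["movie", "has", "pages"]]
labels = [1, 1, -1, -1]     # +1 subjective, -1 objective
X = [to_vector(s, features) for s in sentences]
w = train_linear_svm(X, labels)
predictions = [1 if decision(w, x) > 0 else -1 for x in X]
print(predictions)
```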

Opinion Identification – Classifier (cont.)
Applying the SVM classifier: each sentence of a document (sentence 1, sentence 2, …, sentence n) is labeled separately, e.g. sentence 1: objective, sentence 2: subjective, …, sentence n: objective.
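Applying a trained linear classifier sentence by sentence might look like the sketch below. The regex sentence splitter, the tokenizer, the feature list, and the weight vector (last entry acting as a bias) are hypothetical stand-ins; in a real run the weights would come from training. The rule that a document is opinionative when at least one sentence is labeled subjective is also an assumption, not something the slide specifies.

```python
import re
from typing import List, Sequence, Tuple

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def label_sentences(text: str, features: Sequence[str], w: Sequence[float]) -> List[Tuple[str, str, float]]:
    """Label each sentence with a linear classifier; w[-1] is the bias weight."""
    results = []
    for sent in split_sentences(text):
        tokens = re.findall(r"[a-z0-9']+", sent.lower())
        present = set(tokens) | {" ".join(p) for p in zip(tokens, tokens[1:])}
        x = [1.0 if f in present else 0.0 for f in features] + [1.0]
        score = sum(wj * xj for wj, xj in zip(w, x))
        results.append((sent, "subjective" if score > 0 else "objective", score))
    return results

def is_opinionative(labels: List[Tuple[str, str, float]], min_subjective: int = 1) -> bool:
    # Assumption: a document is opinionative if it has at least `min_subjective` subjective sentences.
    return sum(1 for _, label, _ in labels if label == "subjective") >= min_subjective

features = ["good", "has"]
w = [1.0, -1.0, -0.5]  # hypothetical trained weights for: good, has, bias
labels = label_sentences("This book has 123 pages. This is a very good book.", features, w)
for sent, label, score in labels:
    print(f"{label:10s} {score:+.1f}  {sent}")
print(is_opinionative(labels))  # True
```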

Opinion Similarity – Query-Related Opinions
Find the query-related opinions: an opinionative sentence in a document is related to the query when the query appears within a text window around that sentence.
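A minimal sketch of the window test, assuming the window is measured in sentences (the slide gives neither the unit nor the size) and that query terms are single words matched after stripping punctuation. The subjectivity flags would come from the sentence classifier; here they are given directly, and `query_related_opinions` is a hypothetical name.

```python
from typing import List, Sequence

def query_related_opinions(
    sentences: Sequence[str],
    subjective: Sequence[bool],
    query_terms: Sequence[str],
    window: int = 1,  # assumption: number of neighboring sentences on each side
) -> List[int]:
    """Indices of subjective sentences whose surrounding window mentions a query term."""
    query = {q.lower() for q in query_terms}
    related = []
    for i, is_subj in enumerate(subjective):
        if not is_subj:
            continue
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        words = {tok.strip(".,!?;:\"'") for tok in " ".join(sentences[lo:hi]).lower().split()}
        if query & words:
            related.append(i)
    return related

sentences = ["I read a new book.", "It is wonderful.", "The weather was awful.", "Rain all day.", "Nothing else."]
subjective = [False, True, True, False, False]
print(query_related_opinions(sentences, subjective, ["book"]))            # [1]
print(query_related_opinions(sentences, subjective, ["book"], window=2))  # [1, 2]
```

With a one-sentence window, "The weather was awful." is an opinion but is not tied to the query, so it is excluded; widening the window admits it, which shows the precision/recall trade-off the window size controls.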

Opinion Similarity – Similarity 1
Assumption 1: higher topic relevance → higher rank.
OSim_ir = Sim(Query, Doc)

Opinion Similarity – Similarity 2
Assumption 2: more query-related opinions → higher rank.
OSim_stcc: total number of query-related opinionative sentences.
OSim_stcs: total score of these sentences.
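The two measures can be sketched as follows, assuming each query-related opinionative sentence carries a positive classifier score (for instance an SVM decision value) and that "total score" means the plain sum of those scores; the slide does not define the sentence score. The toy numbers are made up to show that the two measures can order documents differently.

```python
from typing import Dict, List, Tuple

def opinion_similarities(related_scores: Dict[str, List[float]]) -> Dict[str, Tuple[int, float]]:
    """doc -> (OSim_stcc, OSim_stcs) from the scores of its query-related opinionative sentences."""
    return {doc: (len(scores), sum(scores)) for doc, scores in related_scores.items()}

related_scores = {"d1": [0.5, 1.2], "d2": [2.0], "d3": []}  # hypothetical sentence scores
sims = opinion_similarities(related_scores)
by_count = sorted(sims, key=lambda d: sims[d][0], reverse=True)  # rank by OSim_stcc
by_score = sorted(sims, key=lambda d: sims[d][1], reverse=True)  # rank by OSim_stcs
print(by_count)  # ['d1', 'd2', 'd3']
print(by_score)  # ['d2', 'd1', 'd3']
```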

Opinion Similarity – Similarity 3
A linear combination of Similarities 1 and 2:
a * OSim_ir + (1 - a) * OSim_stcc
b * OSim_ir + (1 - b) * OSim_stcs
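A sketch of the combination, with two assumptions the slide does not state: OSim_ir is treated as a single number per document, and both inputs are min-max normalized to [0, 1] per query before mixing, because raw retrieval scores and sentence counts live on different scales. The weights a and b would be tuned on training queries; 0.5 below is arbitrary.

```python
from typing import Dict

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def combine(osim_ir: Dict[str, float], osim_opinion: Dict[str, float], a: float) -> Dict[str, float]:
    """a * OSim_ir + (1 - a) * OSim_opinion, computed on normalized scores."""
    ir, op = min_max(osim_ir), min_max(osim_opinion)
    return {d: a * ir[d] + (1 - a) * op[d] for d in ir}

osim_ir = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
osim_stcc = {"d1": 0.0, "d2": 4.0, "d3": 2.0}
final = combine(osim_ir, osim_stcc, a=0.5)
ranking = sorted(final, key=final.get, reverse=True)
print(final)    # {'d1': 0.5, 'd2': 0.75, 'd3': 0.25}
print(ranking)  # ['d2', 'd1', 'd3']
```

The same function covers the second formula by passing OSim_stcs and b.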

Opinion Similarity – Experimental Results
Data: TREC 2006 Blog Track, 50 queries, 3.2 million blog documents.
UIC at the TREC 2006 Blog Track, title-only queries: scored first; 28% to 32% higher than the best TREC 2006 scores.
Lessons learned: more training data and the combined similarity function both helped.

Conclusions
Designed and implemented an opinion retrieval system that combines IR with text classification.
Achieved the best known retrieval effectiveness on the TREC 2006 blog data.
Future work: extend to polarity classification (positive/negative/mixed) and improve feature selection.

Questions?
wzhang@cs.uic.edu
http://www.cs.uic.edu/~wzhang/
