Introduction, Product Facet Identification, Subtopic Summarization, Discussion and Conclusion.


Product Review Summarization from a Deeper Perspective
Ly Duy Khang
Supervisor: A/P KAN Min Yen
CS4101 B.COMP. DISSERTATION

Outline
1. Introduction
  Motivation
  Related work
  Problem statement & Our approach
2. Product Facet Identification
  Preliminaries
  Methodology
  Evaluation
  Improvement
3. Subtopic Summarization
  Preliminary
  Methodology
  Evaluation
4. Discussion and Conclusion

1. Introduction

Product review
A medium commonly provided by online merchants for customers to review and express opinions on the products that they have purchased.

Product reviews are an important source of information:
1. More and more people are shopping online as a result of the expansion of e-commerce.
2. Reviews enable customers to find opinions about products easily, as well as to share them with their peers.
3. Reviews allow producers to get a certain degree of feedback.

Problems
1. The number of reviews is often too large, and is still growing rapidly.
2. It is difficult to locate and capture opinions effectively.

Product review summarization system
1. Automatically process a large collection of reviews.
2. Identify topics and opinions in the reviews.
3. Aggregate all information and present a concise summary to the user.

Summarization
The task of extracting and presenting the most important information from the inputs.
  News headline
  Program agenda
  Scientific paper abstract
  …

Review Summarization
Focus on opinions (techniques from Sentiment Analysis):
  Thumbs-up/Thumbs-down indication: [Turney02]
  Facet-based summary: [Hu04a], [Hu04b], [Popescu05]
  Comparative summary: [Hu05]

Product facet examples:
1. Camera: “battery life”, “lens”, “flash”, “resolution”, etc.
2. Music player: “sound”, “weight”, “size”, “storage”, etc.

[Figure: screenshots of Google Product and Bing Shopping]

Problem statement
Produce a facet-based summary of product reviews that captures:
  Opinions of users.
  Evidence that supports those opinions.

Approach and Contribution
Two main components:
1. Product Facet Identification
  Re-implement the baseline from [Hu04a].
  Contribute a new heuristic that improves accuracy.
2. Subtopic Summarization
  Propose a sentence clustering solution.
  Make the necessary modifications to the sentence semantic similarity measurement (adopted from [Li06] and [Kong07]).

2. Product Facet Identification

Why do we want to automate this task?
1. It is hard or even impossible to obtain a complete list of facets, e.g., iPhone’s alarm function.
2. Users and manufacturers/sellers use different sets of words to describe the same facet, e.g., Price vs. Value; Body vs. Case.
3. Manufacturers may not want to list the weak facets of their products, e.g., the iPhone is unable to play Flash on the Web.

Explicit/Implicit product facet
Product facets can be expressed explicitly or implicitly.
1. The pictures of this camera are very clear. (explicit)
2. The camera fits nicely into my palm. (implicit)
We only consider explicit facets, i.e., those that appear as a noun/noun phrase in the sentence.

Architecture Overview
[Figure: architecture diagram of the Product Facet Identification component]

a/ Preprocessing
1. Process each input sentence with a Part-of-Speech (POS) tagger to obtain the POS label for each word.
2. Remove stop words from the result.
3. Stem each word to obtain its root form.
4. Only nouns/noun phrases are fed to the next module.
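The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the dissertation's implementation: it assumes the sentence has already been POS-tagged with Penn Treebank-style tags, and the stop-word list `STOP_WORDS` and the toy stemmer `crude_stem` are hypothetical stand-ins for real resources. Only single nouns are handled; noun-phrase chunking is left out.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "this", "my", "is", "are", "very"}


def crude_stem(word: str) -> str:
    """Toy stemmer (assumption): strips a trailing plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def candidate_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return stemmed nouns from one POS-tagged sentence."""
    result = []
    for word, tag in tagged:
        token = word.lower()
        if token in STOP_WORDS:
            continue  # step 2: drop stop words
        if tag.startswith("NN"):  # step 4: keep nouns only
            result.append(crude_stem(token))  # step 3: stem
    return result


sentence = [("The", "DT"), ("pictures", "NNS"), ("of", "IN"), ("this", "DT"),
            ("camera", "NN"), ("are", "VBP"), ("very", "RB"), ("clear", "JJ")]
print(candidate_nouns(sentence))  # ['picture', 'camera']
```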

b/ Frequent Mining
Identify all frequent nouns/noun phrases that satisfy the minimum support, defined as the minimum number of sentences that must contain the noun/noun phrase.
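As a rough sketch of the support test described above, the snippet below counts, for each candidate, the number of sentences that contain it and keeps those that reach the threshold. It assumes noun phrases have already been extracted per sentence; the function name `frequent_facets` and the sample data are illustrative only, and the original baseline may mine multi-word candidates differently.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_facets(sentences: Iterable[List[str]], min_support: int) -> Set[str]:
    """Return candidates that occur in at least `min_support` sentences."""
    support = Counter()
    for candidates in sentences:
        # set() so a candidate repeated within one sentence counts once
        support.update(set(candidates))
    return {c for c, n in support.items() if n >= min_support}


reviews = [
    ["battery life", "camera"],
    ["battery life", "lens"],
    ["camera", "camera", "flash"],
]
print(sorted(frequent_facets(reviews, min_support=2)))  # ['battery life', 'camera']
```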

c/ Post Processing (1/2)
1. Usefulness pruning: remove single-word facets that are likely to be meaningless, e.g., life → battery life.
2. Compactness pruning: remove facet phrases that are not compact, e.g., sample photo → photo.
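One way to picture usefulness pruning is to count how often a single-word facet appears on its own, that is, in sentences that do not also contain a longer facet phrase including it. The sketch below follows that idea; the threshold `min_p_support=3` and the helper names are assumptions for illustration, not values taken from the slides. Compactness pruning is not shown.

```python
from typing import List, Set


def p_support(facet: str, sentences: List[Set[str]], facets: Set[str]) -> int:
    """Count sentences that contain `facet` but no longer facet phrase including it."""
    longer = {f for f in facets if f != facet and facet in f.split()}
    return sum(1 for s in sentences if facet in s and not (s & longer))


def usefulness_prune(facets: Set[str], sentences: List[Set[str]],
                     min_p_support: int = 3) -> Set[str]:
    """Drop single-word facets that rarely occur outside a longer phrase."""
    kept = set()
    for f in facets:
        single_word = " " not in f
        if single_word and p_support(f, sentences, facets) < min_p_support:
            continue
        kept.add(f)
    return kept


# Each sentence is the set of candidate nouns/noun phrases found in it.
sents = [{"life", "battery life"}] * 3 + [{"lens"}] * 3
print(sorted(usefulness_prune({"life", "battery life", "lens"}, sents)))
# ['battery life', 'lens']
```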

c/ Post Processing (2/2)
3. Infrequent facet discovery: helps discover genuine facets that are not mentioned often.
  Gather the opinion words that modify frequent facets.
  For each sentence that contains no frequent facet but one or more opinion words, include the nearest noun/noun phrase as a facet.
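The infrequent facet step can be sketched as below. The example assumes POS-tagged input with Penn Treebank-style tags, measures "nearest" as token distance, and handles single-word nouns only; all of these are simplifying assumptions, and the names `nearest_noun` and `infrequent_facets` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs


def nearest_noun(tagged: Tagged, opinion_idx: int) -> Optional[str]:
    """Return the noun closest (in tokens) to the opinion word, if any."""
    noun_positions = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("NN")]
    if not noun_positions:
        return None
    best = min(noun_positions, key=lambda i: abs(i - opinion_idx))
    return tagged[best][0].lower()


def infrequent_facets(sentences: List[Tagged], frequent: Set[str],
                      opinion_words: Set[str]) -> Set[str]:
    """Collect nouns near opinion words in sentences without a frequent facet."""
    found = set()
    for tagged in sentences:
        words = [w.lower() for w, _ in tagged]
        if any(w in frequent for w in words):
            continue  # sentence is already covered by a frequent facet
        for i, w in enumerate(words):
            if w in opinion_words:
                noun = nearest_noun(tagged, i)
                if noun is not None:
                    found.add(noun)
    return found


data = [
    [("The", "DT"), ("software", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
    [("The", "DT"), ("picture", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
]
print(infrequent_facets(data, frequent={"picture"}, opinion_words={"amazing"}))
# {'software'}
```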

d/ Sentence Extraction
Sentences that contain any of the product facets that we have discovered are labeled with the corresponding facet.
Only opinionated sentences are sent down to the next component.
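A minimal sketch of the labeling step follows. The slides do not say how a sentence is judged opinionated, so this example assumes a sentence counts as opinionated when it contains a word from a given opinion lexicon; facet matching is a crude substring test. Both choices, and the name `label_sentences`, are illustrative assumptions.

```python
from typing import Dict, List, Set


def label_sentences(sentences: List[str], facets: Set[str],
                    opinion_words: Set[str]) -> Dict[str, List[str]]:
    """Map each facet to the opinionated sentences that mention it."""
    labeled: Dict[str, List[str]] = {}
    for s in sentences:
        text = s.lower()
        tokens = set(text.replace(".", " ").replace(",", " ").split())
        if not tokens & opinion_words:
            continue  # assumed test: no opinion word means not opinionated
        for facet in facets:
            if facet in text:  # crude substring match
                labeled.setdefault(facet, []).append(s)
    return labeled


reviews = ["The battery life is great.", "I bought it in May.", "The lens is poor."]
result = label_sentences(reviews, {"battery life", "lens"}, {"great", "poor"})
print(result)
```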

a/ Experimental Data
From the same dataset as in [Hu04a]:
  1 Digital Camera (45 reviews)
  1 DVD Player (99 reviews)
  1 Cell phone (41 reviews)

b/ Evaluation Measure
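The formulas on this slide did not survive in the transcript. Since the results are reported as Recall, Precision, and F, the sketch below assumes the standard set-based definitions, comparing extracted facets against a manually annotated gold list. Treat it as an assumption about the measure, not a record of the slide; the sample facet sets are invented for illustration.

```python
from typing import Set, Tuple


def precision_recall_f(extracted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Standard precision, recall, and balanced F over two facet sets."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


extracted = {"battery life", "lens", "hand", "time"}
gold = {"battery life", "lens", "flash"}
p, r, f = precision_recall_f(extracted, gold)
print(p, r, f)
```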

c/ Experimental Result (Baseline)
[Table: Recall, Precision, and F of the baseline for Camera, Phone, DVD, and their average; the values are not preserved in this transcript]

Improvement - Syntactic Role (1/2)
The baseline yields many noisy results such as “light”, “hand”, “time”, “month”, “hour”, etc.
These are filtered by considering the word’s syntactic role in the sentence.

Improvement - Syntactic Role (2/2)
During the preprocessing step, nouns/noun phrases that do not appear as subject or object in the sentence are not passed down to the next module.
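The filter described above can be sketched as follows, assuming each noun phrase arrives with a grammatical role label from a dependency parser. The label set `KEPT_ROLES` uses common dependency names (`nsubj`, `dobj`, and so on) as an assumption; the parser actually used in the dissertation, and its label inventory, are not stated on the slides.

```python
from typing import List, Tuple

# Assumed role labels for subjects and objects, as a dependency parser might emit.
KEPT_ROLES = {"nsubj", "nsubjpass", "dobj", "iobj"}


def filter_by_role(noun_roles: List[Tuple[str, str]]) -> List[str]:
    """Keep only noun phrases that act as subject or object of the sentence."""
    return [noun for noun, role in noun_roles if role in KEPT_ROLES]


# "I held the camera in my hand for an hour."
parsed = [("camera", "dobj"), ("hand", "pobj"), ("hour", "pobj")]
print(filter_by_role(parsed))  # ['camera']
```

Here “hand” and “hour” are objects of prepositions rather than of the verb, so they are dropped before frequent mining.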

Experimental Result (Baseline with Syntactic Role)
[Table: Recall, Precision, and F-measure, each reported as Baseline vs. Improve, for Camera, Phone, DVD, and their average; the values are not preserved in this transcript]

3. Subtopic Summarization


How often do subtopics exist?

Camera    Subtopics
Memory    3
LCD       6
Lens      7
…         …
Average   5.125

Phone     Subtopics
Radio     3
Headset   4
Signal    3
…         …
Average   3.5

DVD       Subtopics
Price     1
Remote    4
Format    1
…         …
Average   2


Architecture Overview
[Figure: architecture diagram of the Subtopic Summarization component]

a/ Preprocessing
1. General entity pruning
  Product class name: “camera”, “DVD”, “phone”, etc.
  Brand name: “Nikon”, “Canon”, “iPod”, “Kingston”, etc.
2. Similarity pruning ([Kong07])
  “picture” vs. “image”, “photo”
  “display” vs. “monitor”
  “Megapixel” vs. “Resolution”

b/ Sentence representation & Semantic similarity measurement (1/2)
Adopted from [Li06]: each sentence is represented with a scalable vector formulation, and the semantic similarity of two sentences is measured by the cosine of the angle between their vectors.

b/ Sentence representation & Semantic similarity measurement (2/2)
S1 = The battery of my camera is very impressive.
S2 = This camera always has a long battery life.
Joint Concept Vector: C = {battery, camera, impressive, long, battery life}
V1 = { 1.0, 1.0, 1.0, 0.25, 0.5 }
V2 = { 0.5, 1.0, 0.25, 1.0, 1.0 }
$$\mathrm{sim}(S_1, S_2) = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}$$
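To make the worked example concrete, the snippet below computes the plain cosine of the two vectors given on the slide. The numeric result shown on the original slide is not preserved, and the dissertation applies its own modifications to the measure, so the value printed here is only what unmodified cosine similarity gives for V1 and V2.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


# Vectors over C = {battery, camera, impressive, long, battery life}
v1 = [1.0, 1.0, 1.0, 0.25, 0.5]
v2 = [0.5, 1.0, 0.25, 1.0, 1.0]
print(round(cosine_similarity(v1, v2), 4))  # 0.7547
```

The dot product is 2.5 and both vectors have squared norm 3.3125, so the ratio is 2.5 / 3.3125.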

c/ Sentence clustering (1/2)
1. Hierarchical clustering
2. Non-hierarchical clustering

c/ Sentence clustering (2/2)
To estimate the number of clusters, we adopt the graph-based algorithm proposed in [Hat01].

d/ Compact presentation
1. Sentences are now grouped into subtopics.
2. Determine the orientation of every sentence in the cluster.
3. For each positive/negative partition P, select the sentence with the maximum representative power for display.
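The slides do not define "representative power". One plausible reading, assumed here purely for illustration, is the total similarity of a sentence to the other sentences in its partition, so the most central sentence is chosen. The word-overlap function `jaccard` is a hypothetical placeholder for the semantic similarity measure described earlier.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity (assumption): word-set overlap of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def most_representative(partition: List[str],
                        sim: Callable[[str, str], float] = jaccard) -> str:
    """Pick the sentence with the highest summed similarity to the others."""
    def power(i: int) -> float:
        return sum(sim(partition[i], partition[j])
                   for j in range(len(partition)) if j != i)
    best = max(range(len(partition)), key=power)
    return partition[best]


positive = ["battery lasts long", "battery lasts very long", "battery lasts long time"]
print(most_representative(positive))  # battery lasts long
```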

a/ Experimental Data
From the same dataset used in the previous component, we extract a subset of the facets with high frequency in each product.
  Camera: 8 facets
  Phone: 8 facets
  DVD: 6 facets

Experiment Results – Number of subtopics (average)
[Table: Manual subtopics vs. SenSim ([Li06]) vs. SenSim (+ADJ) for Camera, Phone, and DVD; the values are not preserved in this transcript]

b/ Evaluation Measure (1/2)
1. Purity: rewards the clustering solution that introduces less noise in each cluster.
2. Inverse Purity: rewards the clustering solution that gathers more elements (of the same cluster in the gold standard) into a corresponding cluster.
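The formulas for these two measures are missing from the transcript. The sketch below uses the commonly cited definitions, in which each cluster is matched to the gold-standard class it overlaps most, and inverse purity swaps the roles of clusters and classes. This is an assumption about what the slide showed; the sentence IDs are invented for illustration.

```python
from typing import List, Set


def purity(clusters: List[Set[str]], gold: List[Set[str]]) -> float:
    """Fraction of items that fall in the best-matching gold class of their cluster."""
    total = sum(len(c) for c in clusters)
    if total == 0 or not gold:
        return 0.0
    return sum(max(len(c & g) for g in gold) for c in clusters) / total


def inverse_purity(clusters: List[Set[str]], gold: List[Set[str]]) -> float:
    """Purity computed from the gold classes' point of view."""
    return purity(gold, clusters)


# Over-splitting: every cluster is clean, but one gold class is scattered.
system = [{"s1"}, {"s2"}, {"s3", "s4", "s5"}]
reference = [{"s1", "s2"}, {"s3", "s4", "s5"}]
print(purity(system, reference), inverse_purity(system, reference))  # 1.0 0.8
```

The example shows why both measures are needed: splitting sentences into many tiny clusters maximizes purity, and only inverse purity penalizes it.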

b/ Evaluation Measure (2/2)
F-measure: the harmonic mean of purity and inverse purity (α = 0.5).
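The slide's formula was lost in extraction. A common way to write a weighted harmonic mean of the two measures is van Rijsbergen's form below, which is assumed here to be the one intended; P and IP are shorthand for purity and inverse purity.

```latex
F_{\alpha} = \frac{1}{\dfrac{\alpha}{P} + \dfrac{1-\alpha}{IP}},
\qquad
F_{0.5} = \frac{2 \cdot P \cdot IP}{P + IP}

% Worked example with illustrative values P = 0.8, IP = 0.6:
F_{0.5} = \frac{2 \cdot 0.8 \cdot 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686
```

With α = 0.5 both measures carry equal weight, so a clustering cannot score well by trading one entirely for the other (for example, by placing every sentence in its own cluster).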

c/ Experiment Results – Performance using SenSim (+ADJ)
[Table: Purity, I-Purity and F(0.5) for three clustering settings: Random (200), Hierarchical, Non-hierarchical (200). Most cell values and the column alignment are not recoverable; the surviving percentage figures are listed per product in their original order.]
Camera: +32.63%, +33.80%, +36.21%, +34.13%, +38.89%
Phone: +32.00%, +18.74%, +8.63%, +24.64%, +17.16%
DVD: +27.72%, +22.73%, +8.33%, +19.34%, +15.94%

Outline
1. Introduction: Motivation, Related work, Approach
2. Product Facet Identification: Preliminaries, Methodology, Evaluation, Improvement
3. Subtopic Summarization: Overview, Methodology, Evaluation
4. Discussion and Conclusion

Limitation and Future work
1. We did not conduct a human evaluation of the effectiveness of the newly proposed summary compared to the current ones.
2. Automatic sentiment analysis module integration.
3. Better sentence semantic similarity measurement with deep analysis.
4. Implicit facet handling.
5. Sentence reformulation for summary output.
6. Extend subtopics to other review summarization settings.

Conclusion
1. We designed a complete summarization system targeting the domain of product reviews.
2. We introduced an effective heuristic rule using syntactic role to improve the process of identifying product facets.
3. We showed the existence of subtopics within the discussion of product facets and addressed this limitation of current summarization systems with our proposed clustering component.
4. We extended the sentence semantic similarity measurement with sentiment information.

References
[Barzilay02] Barzilay, R., Elhadad, N., & McKeown, K. (2002). Inferring strategies for sentence ordering in multidocument news summarization. Journal of Artificial Intelligence Research, 17, 35–55.
[Car98b] Carbonell, J., & Goldstein, J. (1998). The use of MMR, Diversity-based Re-ranking for Reordering Documents and Producing Summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, 335–336.
[Ding08] Ding, X., Liu, B., & Yu, P. S. (2008). A Holistic Lexicon-based Approach to Opinion Mining. Proceedings of the international conference on Web search and web data mining (WSDM).
[Hat01] Hatzivassiloglou, V., Klavans, J. L., Holcombe, M. L., Barzilay, R., Kan, M.-Y., & McKeown, K. R. (2001). SimFinder: A flexible clustering tool for summarization. Proceedings of the NAACL Workshop on Automatic Summarization.
[Hat97] Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the Semantic Orientation of Adjectives. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics.
[Hovy01] Hovy, E. H. (2001). Automated text summarization. Handbook of computational linguistics. Oxford University Press, Oxford.
[Knight00] Knight, K., & Marcu, D. (2000). Statistics-based summarization-step one: Sentence compression. Proceedings of the National Conference on Artificial Intelligence, 703–710.
[Barzilay99] Barzilay, R., McKeown, K. R., & Elhadad, M. (1999). Information fusion in the context of multi-document summarization. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 550–557.
[Hu04b] Hu, M., & Liu, B. (2004b). Mining Opinion Features in Customer Reviews. Proceedings of the National Conference on Artificial Intelligence.
[Hu05] Liu, B., Hu, M., & Cheng, J. (2005). Opinion observer: Analyzing and comparing opinions on the web. Proceedings of the 14th international conference on World Wide Web.
[Kim06] Kim, S. M., & Hovy, E. (2006). Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. Computational Linguistics.
[Li06] Li, Y., McLean, D., Bandar, Z. A., O'Shea, J. D., & Crockett, K. (2006). Sentence Similarity Based on Semantic Nets and Corpus Statistics. IEEE Trans. on Knowledge and Data Engineering, 18(8).
[Liu09] Liu, B. (2009). Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, 1–38.
[Popescu05] Popescu, A. M., & Etzioni, O. (2005). Extracting Product Features and Opinions from Reviews. Computational Linguistics.
[Radev04] Radev, D., Jing, H., Styś, M., & Tam, D. (2004). Centroid-based summarization of multiple documents. Information Processing and Management, 40(6), 919–938.
[Turney02] Turney, P. D., & Littman, M. L. (2002). Unsupervised Learning of Semantic Orientation From a Hundred-Billion-Word Corpus.
[Wiebe99] Wiebe, J. M., Bruce, R. F., & O'Hara, T. P. (1999). Development and Use of a Gold-standard Data Set for Subjectivity Classifications. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
[Ye05] Ye, S., Qiu, L., Chua, T., & Kan, M. Y. (2005). NUS at DUC 2005: Understanding Documents via Concept Links. Document Understanding Conference (DUC).
[Yu03] Yu, H., & Hatzivassiloglou, V. (2003). Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. Proceedings of the conference on Empirical methods in natural language processing.

Q & A

Thank you for your attention