1
Product Review Summarization from a Deeper Perspective
Ly Duy Khang
Supervisor: A/P KAN Min Yen
CS4101 B.COMP. DISSERTATION
2
Outline
1. Introduction
  Motivation
  Related work
  Problem statement & Our approach
2. Product Facet Identification
  Preliminaries
  Methodology
  Evaluation
  Improvement
3. Subtopic Summarization
  Preliminary
  Methodology
  Evaluation
4. Discussion and Conclusion
3
1. Introduction
  Motivation
  Related work
  Problem statement & Our approach
4
Product review
A medium commonly provided by online merchants for customers to review and express opinions on the products they have purchased.
5
Product reviews are an important source of information:
1. More and more people are shopping online as e-commerce expands.
2. Reviews enable customers to find opinions about products easily and to share them with their peers.
3. Reviews allow producers to get a certain degree of feedback.
Problems
1. The number of reviews is often too large, and is still growing rapidly.
2. It is difficult to locate and capture opinions effectively.
6
Product review summarization system
1. Automatically process a large collection of reviews.
2. Identify topics and opinions in the reviews.
3. Aggregate all information and present a concise summary to the user.
7
Summarization
The task of extracting and presenting the most important information from the inputs.
  News headline
  Program agenda
  Scientific paper abstract
  …
8
Review Summarization
Focus on opinions (techniques from Sentiment Analysis):
  Thumbs-up/Thumbs-down indication: [Turney02]
  Facet-based summary: [Hu04a], [Hu04b], [Popescu05]
  Comparative summary: [Hu05]
Product facet examples:
1. Camera: “battery life”, “lens”, “flash”, “resolution”, etc.
2. Music player: “sound”, “weight”, “size”, “storage”, etc.
9
Google Product
Bing Shopping
10
Problem statement
Produce a facet-based summary of product reviews that captures:
  Opinions of users.
  Evidence that supports those opinions.
11
Approach and Contribution
Two main components:
1. Product Facet Identification
  Re-implement the baseline from [Hu04a].
  Contribute a new, effective heuristic to improve the accuracy.
2. Subtopic Summarization
  Introduce a sentence clustering solution.
  Make the necessary modifications to the sentence semantic similarity measurement (adopted from [Li06] and [Kong07]).
12
2. Product Facet Identification
  Preliminaries
  Methodology
  Evaluation
  Improvement
13
Why do we want to automate this task?
1. It is hard or even impossible to obtain a complete list of facets. e.g., iPhone’s alarm function
2. Users and manufacturers/sellers use different sets of words to describe the same facet. e.g., Price vs. Value; Body vs. Case
3. The manufacturer may not want to list the weak facets of its product. e.g., iPhone is unable to play Flash on the Web
14
Explicit/Implicit product facet
Product facets can be expressed explicitly or implicitly.
1. The pictures of this camera are very clear. (explicit: “pictures”)
2. The camera fits nicely into my palm. (implicit: size)
We only consider explicit facets, which appear as nouns/noun phrases in the sentence.
15
Architecture Overview
16
a/ Preprocessing
1. Process each input sentence with a Part-of-Speech (POS) tagger to obtain the POS label for each word.
2. Remove stop words from the result.
3. Stem each word to obtain its root form.
4. Feed only nouns/noun phrases to the next module.
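The four preprocessing steps can be sketched as follows. This is a minimal illustration rather than the system's implementation: it assumes sentences arrive already POS-tagged with Penn Treebank style tags, and the stop-word list, the crude plural-stripping stemmer, and the function names are hypothetical stand-ins for a real tagger, stop list, and stemmer.

```python
# Hypothetical stand-in: a real system would use a full stop list.
STOP_WORDS = {"the", "of", "this", "is", "are", "my", "a", "very"}

def crude_stem(word):
    """Toy stemmer (assumption): lowercase and strip a plural 's'.

    It over-stems words such as 'lens'; a real stemmer would replace it.
    """
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def noun_phrases(tagged_sentence):
    """Return stemmed noun/noun-phrase candidates from (word, POS) pairs.

    Adjacent noun tokens (tags starting with 'NN') are merged into one phrase.
    """
    phrases, current = [], []
    for word, tag in tagged_sentence:
        if word.lower() in STOP_WORDS:
            # Stop words are dropped and also end any phrase in progress.
            if current:
                phrases.append(" ".join(current))
                current = []
            continue
        if tag.startswith("NN"):
            current.append(crude_stem(word))
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

tagged = [("The", "DT"), ("pictures", "NNS"), ("of", "IN"), ("this", "DT"),
          ("camera", "NN"), ("are", "VBP"), ("very", "RB"), ("clear", "JJ")]
print(noun_phrases(tagged))
```

Only the noun candidates survive; the adjective "clear" is dropped here and would be handled later as an opinion word.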
17
b/ Frequent Mining
Identify all frequent nouns/noun phrases that satisfy the minimum support, defined as the minimum number of sentences that must contain the noun/noun phrase.
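The minimum-support test can be illustrated with a short sketch. It is a simplification under stated assumptions: each sentence is already reduced to a list of noun/noun-phrase candidates, whole phrases are counted directly rather than mined with a full frequent-itemset algorithm, and the function name and threshold value are hypothetical.

```python
from collections import Counter

def frequent_candidates(sentences, min_support):
    """Return {candidate: sentence count} for candidates found in
    at least `min_support` sentences.

    `sentences` is a list of candidate lists, one list per review sentence.
    """
    support = Counter()
    for candidates in sentences:
        # Count each candidate at most once per sentence.
        support.update(set(candidates))
    return {c: n for c, n in support.items() if n >= min_support}

sentences = [
    ["battery life", "camera"],
    ["camera", "lens"],
    ["battery life"],
    ["camera", "camera", "flash"],
]
print(frequent_candidates(sentences, min_support=2))
```

Support is counted per sentence, so the repeated "camera" in the last sentence contributes only once.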
18
c/ Post Processing (1/2)
1. Usefulness pruning: Remove single-word facets that are likely to be meaningless. e.g. life → battery life
2. Compactness pruning: Remove facet phrases that are not compact. e.g. sample photo → photo
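Usefulness pruning can be sketched as below. The slide does not state the exact rule, so this is an assumption in the spirit of the baseline [Hu04a]: a single-word facet is dropped when it rarely occurs outside a longer facet phrase that contains it. The data layout (each sentence lists both phrases and their constituent words), the function names, and the threshold are hypothetical.

```python
def pure_support(word, sentences, facets):
    """Count sentences that contain `word` but no longer facet phrase including it."""
    supersets = [f for f in facets if f != word and word in f.split()]
    count = 0
    for candidates in sentences:
        if word in candidates and not any(s in candidates for s in supersets):
            count += 1
    return count

def usefulness_prune(facets, sentences, min_pure_support):
    """Drop single-word facets that mostly appear inside a longer facet phrase."""
    kept = []
    for f in facets:
        is_single_word = " " not in f
        has_superset = any(g != f and f in g.split() for g in facets)
        if (is_single_word and has_superset
                and pure_support(f, sentences, facets) < min_pure_support):
            continue  # e.g. "life" is only meaningful as part of "battery life"
        kept.append(f)
    return kept

facets = ["life", "battery life", "lens"]
sentences = [
    ["battery life", "life"],
    ["battery life", "life"],
    ["life"],
    ["lens"],
    ["lens"],
]
print(usefulness_prune(facets, sentences, min_pure_support=2))
```

Here "life" stands alone in only one sentence, so it is pruned, while "battery life" and "lens" are kept.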
19
c/ Post Processing (2/2)
3. Infrequent facet discovery: helps discover genuine facets that are not mentioned often.
  Gather opinion words that modify frequent facets.
  For each sentence that contains no frequent facet but one or more opinion words, include the nearest noun/noun phrase as a facet.
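The nearest-noun rule can be sketched as follows. The sketch assumes the opinion words have already been gathered, that sentences are lists of (word, POS) pairs, and that facets are single words; the function name and inputs are hypothetical, and distance is measured in token positions.

```python
def infrequent_facets(tagged_sentences, frequent, opinion_words):
    """Collect the noun nearest to each opinion word in sentences
    that mention no frequent facet."""
    found = set()
    for tokens in tagged_sentences:
        words = [w.lower() for w, _ in tokens]
        if any(w in frequent for w in words):
            continue  # sentence is already covered by a frequent facet
        noun_positions = [i for i, (_, tag) in enumerate(tokens)
                          if tag.startswith("NN")]
        if not noun_positions:
            continue
        for i, w in enumerate(words):
            if w in opinion_words:
                nearest = min(noun_positions, key=lambda j: abs(j - i))
                found.add(words[nearest])
    return found

tagged_sentences = [
    [("The", "DT"), ("strap", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("Great", "JJ"), ("picture", "NN")],
    [("It", "PRP"), ("works", "VBZ")],
]
print(infrequent_facets(tagged_sentences, frequent={"picture"},
                        opinion_words={"great"}))
```

The second sentence is skipped because it already contains the frequent facet "picture"; the third has no noun to attach to.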
20
d/ Sentence Extraction
Sentences that contain any of the discovered product facets are labeled with the corresponding facet.
Only opinionated sentences are passed on to the next component.
21
a/ Experimental Data
From the same dataset as in [Hu04a]:
  1 Digital Camera (45 reviews)
  1 DVD Player (99 reviews)
  1 Cell phone (41 reviews)
22
b/ Evaluation Measure
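The formula on this slide did not survive extraction. Since the results are reported as Recall, Precision, and F, the sketch below assumes the standard set-based definitions over extracted versus manually annotated facets; the function name and the sample facet sets are hypothetical.

```python
def precision_recall_f(extracted, gold):
    """Standard precision, recall, and balanced F over two facet sets."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

extracted = {"lens", "flash", "hand", "battery life"}
gold = {"lens", "flash", "battery life", "resolution", "size"}
p, r, f = precision_recall_f(extracted, gold)
print(p, r, f)
```

In this toy case three of the four extracted facets are correct, and three of the five annotated facets are found.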
23
c/ Experimental Result (Baseline)
Baseline          Recall   Precision   F
Camera     79     0.822    0.747       0.783
Phone      67     0.761    0.718       0.739
DVD        49     0.797    0.793       0.795
Avg.       65     0.793    0.753       0.772
24
Improvement - Syntactic Role (1/2)
Many noisy results such as: “light”, “hand”, “time”, “month”, “hour”, etc.
These are filtered by considering the word’s syntactic role in the sentence.
Improvement - Syntactic Role (2/2)
During the preprocessing step, nouns/noun phrases that do not appear as subject or object in the sentence are not passed down to the next module.
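The syntactic-role filter can be sketched as below. It assumes each token has already been annotated by a dependency parser as a (word, POS, role) triple; the role labels ("nsubj", "nsubjpass", "dobj") and the choice to exclude objects of prepositions are assumptions made for illustration, not details given on the slide.

```python
# Assumed role labels for "subject/object"; a different parser may use other names.
SUBJECT_OBJECT_ROLES = {"nsubj", "nsubjpass", "dobj"}

def filter_by_role(parsed_sentence, allowed_roles=SUBJECT_OBJECT_ROLES):
    """Keep only nouns whose syntactic role is subject or object."""
    return [word for word, pos, role in parsed_sentence
            if pos.startswith("NN") and role in allowed_roles]

# "I held the camera in my hand for an hour"
parsed = [
    ("I", "PRP", "nsubj"), ("held", "VBD", "root"), ("the", "DT", "det"),
    ("camera", "NN", "dobj"), ("in", "IN", "prep"), ("my", "PRP$", "poss"),
    ("hand", "NN", "pobj"), ("for", "IN", "prep"), ("an", "DT", "det"),
    ("hour", "NN", "pobj"),
]
print(filter_by_role(parsed))
```

Under these assumptions the noisy candidates "hand" and "hour" are dropped because they occur only inside prepositional phrases, while "camera" is kept as the direct object.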
25
Experimental Result (Baseline with Syntactic Role)
          Recall               Precision                     F-measure
          Baseline/Improve     Baseline   Improve            Baseline   Improve
Camera    0.822                0.747      0.802              0.783      0.812
Phone     0.761                0.718      0.785              0.739      0.773
DVD       0.797                0.793      0.867              0.795      0.831
Avg.      0.793 (+0%)          0.753      0.818 (+8.6%)      0.772      0.805 (+4.3%)
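The reported gains are relative improvements of the averaged scores over the baseline. A small check, using only the averages shown on this slide (the helper name is hypothetical):

```python
def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0

precision_gain = round(relative_gain(0.753, 0.818), 1)
f_gain = round(relative_gain(0.772, 0.805), 1)
print(precision_gain, f_gain)
```

Recall is unchanged, so its gain is 0%; the filter trades no recall for the precision improvement on this dataset.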
26
3. Subtopic Summarization
  Overview
  Methodology
  Evaluation
28
How often do subtopics exist?
Camera     Subtopics
Memory     3
LCD        6
Lens       7
…          …
Average    5.125

Phone      Subtopics
Radio      3
Headset    4
Signal     3
…          …
Average    3.5

DVD        Subtopics
Price      1
Remote     4
Format     1
…          …
Average    2
30
Architecture Overview
31
a/ Preprocessing
1. General entity pruning
  Product class name: “camera”, “DVD”, “phone”, etc.
  Brand name: “Nikon”, “Canon”, “iPod”, “Kingston”, etc.
2. Similarity pruning ([Kong07])
  “picture” vs. “image”, “photo”
  “display” vs. “monitor”
  “Megapixel” vs. “Resolution”
32
b/ Sentence representation & Semantic similarity measurement (1/2)
Adopted from [Li06], a scalable vector formulation is used to represent each sentence; the semantic similarity of two sentences is then measured as the cosine similarity between their vectors.
33
b/ Sentence representation & Semantic similarity measurement (2/2)
S1 = The battery of my camera is very impressive.
S2 = This camera always has a long battery life.
Joint Concept Vector: C = {battery, camera, impressive, long, battery life}
V1 = { 1.0, 1.0, 1.0, 0.25, 0.5 }
V2 = { 0.5, 1.0, 0.25, 1.0, 1.0 }
sim(S1, S2) = (V1 · V2) / (|V1| |V2|) = 2.5 / 3.3125 ≈ 0.75
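The similarity value on this slide can be checked by taking the cosine of the two vectors V1 and V2. The sketch below covers only that final step; how the entries of V1 and V2 are derived from the joint concept vector follows [Li06] and is not reproduced here. The function name is hypothetical.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm1 * norm2)

V1 = [1.0, 1.0, 1.0, 0.25, 0.5]
V2 = [0.5, 1.0, 0.25, 1.0, 1.0]
similarity = cosine_similarity(V1, V2)
print(round(similarity, 2))
```

The dot product is 2.5 and both vectors have squared length 3.3125, so the result rounds to the 0.75 shown on the slide.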
34
c/ Sentence clustering (1/2)
1. Hierarchical clustering
2. Non-hierarchical clustering
35
c/ Sentence clustering (2/2)
To estimate the number of clusters, we adopt the graph-based algorithm proposed in [Hat01].
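The details of the two clustering algorithms did not survive extraction. As a generic illustration of the hierarchical variant, the sketch below performs average-link agglomerative clustering over a pairwise sentence-similarity matrix. It is an assumption, not the system's algorithm: the linkage criterion and function name are hypothetical, and the number of clusters k is passed in directly rather than estimated with the graph-based method of [Hat01].

```python
def agglomerative(sim, k):
    """Merge the two most similar clusters (average link) until k remain.

    `sim` is a symmetric matrix of pairwise sentence similarities.
    Returns clusters as sorted lists of sentence indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > max(k, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_link(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(agglomerative(sim, k=2))
```

With this toy matrix, sentences 0 and 1 form one subtopic and sentences 2 and 3 form another.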
36
d/ Compact presentation
1. Sentences are now grouped into subtopics.
2. Determine the orientation of every sentence in the cluster.
3. For each positive/negative partition P, select the sentence with the maximum representative power to display.
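The definition of "representative power" did not survive extraction. One plausible reading, shown here purely as an assumption, is the total similarity of a sentence to the other sentences in its partition, so that the most central sentence is displayed. The function name and the similarity matrix are hypothetical.

```python
def most_representative(partition, sim):
    """Return the index in `partition` with the highest summed similarity
    to the other sentences of the same partition.

    `partition` is a non-empty list of sentence indices; `sim` is a
    symmetric pairwise similarity matrix.
    """
    def power(i):
        return sum(sim[i][j] for j in partition if j != i)

    return max(partition, key=power)

sim = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.3],
    [0.6, 0.3, 1.0],
]
print(most_representative([0, 1, 2], sim))
```

Sentence 0 is selected because it is the most similar to both other sentences in the partition.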
37
a/ Experimental Data
From the same dataset used in the previous component, we extract a subset of the high-frequency facets of each product.
  Camera: 8 facets
  Phone: 8 facets
  DVD: 6 facets
38
Experiment Results – Number of subtopics (average)
          Manual subtopics   SenSim ([Li06])   SenSim (+ADJ)
Camera    5.125              1.875             3.0
Phone     3.5                1.5               2.5
DVD       2                  1.167             1.5
39
b/ Evaluation Measure (1/2)
1. Purity: rewards the clustering solution that introduces less noise in each cluster.
2. Inverse Purity: rewards the clustering solution that gathers more elements (of the same cluster in the gold standard) into a corresponding cluster.
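The formulas for these two measures did not survive extraction. The sketch below assumes the commonly used definitions: purity averages, over system clusters, the largest overlap with any gold cluster, and inverse purity is the same quantity with the roles of system and gold clusters swapped. Function names and sample data are hypothetical.

```python
def purity(clusters, gold):
    """Fraction of items that fall in the best-matching gold cluster
    of their system cluster.

    `clusters` and `gold` are lists of sets covering the same items.
    """
    n = sum(len(c) for c in clusters)
    return sum(max(len(c & g) for g in gold) for c in clusters) / n

def inverse_purity(clusters, gold):
    """Purity computed from the gold clusters' point of view."""
    return purity(gold, clusters)

system = [{1, 2, 3, 4, 5}]          # everything lumped into one cluster
gold = [{1, 2}, {3, 4, 5}]          # two manual subtopics
print(purity(system, gold), inverse_purity(system, gold))
```

Lumping all sentences into one cluster yields perfect inverse purity but low purity, which is why the two measures are reported together.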
40
b/ Evaluation Measure (2/2)
F-measure: the harmonic mean of purity and inverse purity (α = 0.5).
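The formula itself is missing from the extracted slide. Assuming the usual weighted harmonic mean, F = 1 / (α/Purity + (1 − α)/InversePurity), the case α = 0.5 reduces to the balanced harmonic mean. A minimal sketch with a hypothetical function name:

```python
def f_measure(purity, inverse_purity, alpha=0.5):
    """Weighted harmonic mean of purity and inverse purity."""
    if purity == 0 or inverse_purity == 0:
        return 0.0
    return 1.0 / (alpha / purity + (1.0 - alpha) / inverse_purity)

score = f_measure(0.6, 1.0)
print(score)
```

For a purity of 0.6 and an inverse purity of 1.0 the score is 0.75, below the arithmetic mean of 0.8 because the harmonic mean penalizes the weaker of the two values.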
41
c/ Experiment Results – Performance using SenSim (+ADJ)
                             Random (200)                 Hierarchical                 Non-hierarchical (200)
          Manual subtopics   Purity   I-Purity   F(0.5)   Purity   I-Purity   F(0.5)   Purity   I-Purity   F(0.5)
Camera    5.125              0.524    0.617      0.542    0.676    0.819      0.725    0.714    0.828      0.753
  vs. Random                                              +29.02%  +32.63%    +33.80%  +36.21%  +34.13%    +38.89%
Phone     3.5                0.647    0.593      0.604    0.682    0.783      0.717    0.702    0.739      0.707
  vs. Random                                              +5.54%   +32.00%    +18.74%  +8.63%   +24.64%    +17.16%
DVD       2                  0.825    0.622      0.682    0.904    0.795      0.837    0.894    0.743      0.791
  vs. Random                                              +9.60%   +27.72%    +22.73%  +8.33%   +19.34%    +15.94%
42
4. Discussion and Conclusion
43
Limitation and Future work
1. We did not conduct a human evaluation of the effectiveness of the newly proposed summary compared with the current ones.
2. Integration of an automatic sentiment analysis module.
3. Better sentence semantic similarity measurement with deep analysis.
4. Handling of implicit facets.
5. Sentence reformulation for summary output.
6. Extension of subtopics to other review summarization settings.
44
Conclusion
1. We designed a complete summarization system targeting the domain of product reviews.
2. We introduced an effective heuristic rule using syntactic role to improve the identification of product facets.
3. We showed the existence of subtopics within the discussion of product facets and addressed this limitation of current summarization systems with our proposed clustering component.
4. We extended the sentence semantic similarity measurement with sentiment information.
45
References
[Barzilay02] Barzilay, R., Elhadad, N., & McKeown, K. (2002). Inferring strategies for sentence ordering in multidocument news summarization. Journal of Artificial Intelligence Research, 17, 35–55.
[Car98b] Carbonell, J., & Goldstein, J. (1998). The use of MMR, Diversity-based Re-ranking for Reordering Documents and Producing Summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, 335–336.
[Ding08] Ding, X., Liu, B., & Yu, P. S. (2008). A Holistic Lexicon-based Approach to Opinion Mining. Proceedings of the international conference on Web search and web data mining – WSDM.
[Hat01] Hatzivassiloglou, V., Klavans, J. L., Holcombe, M. L., Barzilay, R., Kan, M.-Y., & McKeown, K. R. (2001). SimFinder: A flexible clustering tool for summarization. In Proceedings of the NAACL Workshop on Automatic Summarization, 41–49.
[Hat97] Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the Semantic Orientation of Adjectives. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, 174–181.
[Hovy01] Hovy, E. H. (2001). Automated text summarization. Handbook of computational linguistics. Oxford University Press, Oxford.
46
References
[Knight00] Knight, K., & Marcu, D. (2000). Statistics-based summarization-step one: Sentence compression. Proceedings of the National Conference on Artificial Intelligence, 703–710.
[Barzilay99] Barzilay, R., McKeown, K. R., & Elhadad, M. (1999). Information fusion in the context of multi-document summarization. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 550–557.
[Hu04b] Hu, M., & Liu, B. (2004b). Mining Opinion Features in Customer Reviews. Proceedings of the National Conference on Artificial Intelligence, 755–760.
[Hu05] Liu, B., Hu, M., & Cheng, J. (2005). Opinion observer: Analyzing and comparing opinions on the web. Proceedings of the 14th international conference on World Wide Web.
[Kim06] Kim, S. M., & Hovy, E. (2006). Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. Computational Linguistics.
[Li06] Li, Y., McLean, D., Bandar, Z. A., O'Shea, J. D., & Crockett, K. (2006). Sentence Similarity Based on Semantic Nets and Corpus Statistics. IEEE Trans. on Knowledge and Data Engineering, 18(8), 1138–1150.
47
References
[Liu09] Liu, B. (2009). Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, 1–38.
[Popescu05] Popescu, A. M., & Etzioni, O. (2005). Extracting Product Features and Opinions from Reviews. Computational Linguistics, 339–346.
[Radev04] Radev, D., Jing, H., Styś, M., & Tam, D. (2004). Centroid-based summarization of multiple documents. Information Processing and Management, 40(6), 919–938.
[Turney02] Turney, P., C., & Littman, M. (2002). Unsupervised Learning of Semantic Orientation From a Hundred-Billion-Word Corpus.
[Wiebe99] Wiebe, J. M., Bruce, R. F., & O'Hara, T. P. (1999). Development and Use of a Gold-standard Data Set for Subjectivity Classifications. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 246–253.
[Ye05] Ye, S., Qiu, L., Chua, T., & Kan, M. Y. (2005). NUS at DUC 2005: Understanding Documents via Concept Links. Document Understanding Conference (DUC).
[Yu03] Yu, H., & Hatzivassiloglou, V. (2003). Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. Proceedings of the conference on Empirical methods in natural language processing, 129–136.
48
Q & A
49
Thank you for your attention