Sentiment Analysis Using Common- Sense and Context Information Basant Agarwal 1,2, Namita Mittal 2, Pooja Bansal 2, and Sonal Garg 2 1 Department of Computer.

Slides:

Advertisements

Similar presentations

CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.

Advertisements

Entity-Centric Topic-Oriented Opinion Summarization in Twitter Date : 2013/09/03 Author : Xinfan Meng, Furu Wei, Xiaohua, Liu, Ming Zhou, Sujian Li and.

Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.

TEMPLATE DESIGN © Identifying Noun Product Features that Imply Opinions Lei Zhang Bing Liu Department of Computer Science,

Linear Model Incorporating Feature Ranking for Chinese Documents Readability Gang Sun, Zhiwei Jiang, Qing Gu and Daoxu Chen State Key Laboratory for Novel.

Title Course opinion mining methodology for knowledge discovery, based on web social media Authors Sotirios Kontogiannis Ioannis Kazanidis Stavros Valsamidis.

Sentiment Analysis An Overview of Concepts and Selected Techniques.

Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam

Creating a Bilingual Ontology: A Corpus-Based Approach for Aligning WordNet and HowNet Marine Carpuat Grace Ngai Pascale Fung Kenneth W.Church.

Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University

Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.

Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text Soo-Min Kim and Eduard Hovy USC Information Sciences Institute 4676.

Adding Common Sense into Artificial Intelligence Common Sense Computing Initiative Software Agents Group MIT Media Lab.

Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.

Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.

Mining and Summarizing Customer Reviews

Opinion mining in social networks Student: Aleksandar Ponjavić 3244/2014 Mentor: Profesor dr Veljko Milutinović.

(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence

Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification on Reviews Peter D. Turney Institute for Information Technology National.

Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.

Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.

Exploiting Wikipedia as External Knowledge for Document Clustering Sakyasingha Dasgupta, Pradeep Ghosh Data Mining and Exploration-Presentation School.

1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.

1 Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor:

Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.

Knowledge representation

Name : Emad Zargoun Id number : EASTERN MEDITERRANEAN UNIVERSITY DEPARTMENT OF Computing and technology “ITEC547- text mining“ Prof.Dr. Nazife Dimiriler.

Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL

PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND PROCEEDINGS OF THE.

Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.

Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.

This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.

Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.

WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.

10/22/2015ACM WIDM'20051 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis Voutsakis.

A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:

14/12/2009ICON Dipankar Das and Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University, Kolkata , India ICON.

HyperLex: lexical cartography for information retrieval Jean Veronis Presented by: Siddhanth Jain( ) Samiulla Shaikh( )

Opinion Mining of Customer Feedback Data on the Web Presented By Dongjoo Lee, Intelligent Databases Systems Lab. 1 Dongjoo Lee School of Computer Science.

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.

1/21 Automatic Discovery of Intentions in Text and its Application to Question Answering (ACL 2005 Student Research Workshop )

Software Quality in Use Characteristic Mining from Customer Reviews Warit Leopairote, Athasit Surarerks, Nakornthip Prompoon Department of Computer Engineering,

Blog Summarization We have built a blog summarization system to assist people in getting opinions from the blogs. After identifying topic-relevant sentences,

1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.

Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏

Information Retrieval using Word Senses: Root Sense Tagging Approach Sang-Bum Kim, Hee-Cheol Seo and Hae-Chang Rim Natural Language Processing Lab., Department.

1 Evaluating High Accuracy Retrieval Techniques Chirag Shah,W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.

Commonsense Reasoning in and over Natural Language Hugo Liu, Push Singh Media Laboratory of MIT The 8 th International Conference on Knowledge- Based Intelligent.

1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:

Emotion Recognition from Text Using Situational Information and a Personalized Emotion Model Yong-soo Seol 1, Han-woo Kim 1, and Dong-joo Kim 2 1 Department.

From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:

Knowledge Structure Vijay Meena ( ) Gaurav Meena ( )

2/10/2016Semantic Similarity1 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis.

Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)

Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.

NTNU Speech Lab 1 Topic Themes for Multi-Document Summarization Sanda Harabagiu and Finley Lacatusu Language Computer Corporation Presented by Yi-Ting.

Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.

A Document-Level Sentiment Analysis Approach Using Artificial Neural Network and Sentiment Lexicons Yan Zhu.

2016/9/301 Exploiting Wikipedia as External Knowledge for Document Clustering Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou Proceeding.

1 Commonsense Reasoning in and over Natural Language Hugo Liu Push Singh Media Laboratory Massachusetts Institute of Technology Cambridge, MA 02139, USA.

Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance Hello everyone,

Sentiment analysis algorithms and applications: A survey

University of Computer Studies, Mandalay

Presented by: Prof. Ali Jaoua

Review-Level Aspect-Based Sentiment Analysis Using an Ontology

Hierarchical, Perceptron-like Learning for OBIE

Presentation transcript:

Sentiment Analysis Using Common- Sense and Context Information Basant Agarwal 1,2, Namita Mittal 2, Pooja Bansal 2, and Sonal Garg 2 1 Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan (SKIT) 2 Department of Computer and Engineering, Malaviya National Institute of Technology (MNIT) Computational Intelligence and Neuroscience 2015 報告者：劉憶年 2016/3/22

Outline Introduction ConceptNet Related Work Proposed Methodology Experiments and Results Conclusions 2

Introduction (1/4) The field of sentiment classification is an exciting new research direction due to the large number of real world applications where discovering people’s opinion is important for better decision-making. The automatic analysis of online contents to extract opinion requires deep understanding of natural text by the machine; capabilities of most of the existing models are known to be unsatisfactory. 3

Introduction (2/4) An opinion lexicon is a dictionary containing opinion words with their polarity value to indicate the positive or negative sentiments, for example, “happy,” “excellent,” “bad,” “boring,” and so forth. In the literature, several opinion lexicons are publically available like SentiWordNet, General Inquirer, SenticNet, and so forth. Opinion words change their polarity value depending on the context. Therefore, it is necessary to obtain the accurate polarity value of a word with the help of contextual information of opinion word. 4

Introduction (3/4) First important task in sentiment analysis is to identify the opinion targets (aspects, entities, and topic identification problems) about which some opinion is expressed. And, the second is to construct the opinion lexicon (good, excellent, etc.), for example, “I am very glad with the ambience of this restaurant.” “Ambience” is the opinion target of the writer and “glad” is the opinion word in this example. In the proposed work, overall polarity is computed at document level considering aspect level polarity of various aspects. Polarity of various aspects are computed and aggregated. Proposed model would be able to incorporate the importance of the features into consideration in determining the overall sentiment of the document. 5

Introduction (4/4) Common-sense reasoning is often performed through common-sense ontologies and the use of reasoning algorithms, such as predicate logic and machine learning, to reach a conclusion. Concept-level text analysis focuses on a semantic analysis of text through the use of web ontologies or semantic networks, which allow the aggregation of conceptual and affective information associated with natural language opinions. With the help of the domain specific ontology consisting of the common-sense knowledge produces only useful features. Further, polarity of an opinion word is determined with the help of contextualized sentiment lexicon. 6

ConceptNet (1/2) ConceptNet is a large semantic network consisting of large number of common-sense concepts. It is the largest machine usable commonsense resource consisting of more than 250,000 relations. It is the largest publicly available common-sense knowledge base which can be used to mine various inferences from the text. It consists of nodes (concepts) connected by edges (relations between concepts). Some of the relationships between concepts in the ConceptNet are IsA, EffectOf, CapableOf, MadeOf, DesireOf, and so forth. InConceptNet, an assertion is defined by five properties, that is, language, relation, concept 1, concept 2, and frequency. 7

ConceptNet (2/2) The ConceptNet semantic graph represents the information from the Open Mind corpus as a directed graph, in which the nodes are concepts and the labeled edges are commonsense assertions that interconnect them. 8

Related Work (1/5) Techniques employed by sentiment analysis models can be broadly categorized into machine learning and semantic orientation approaches. In corpus based approach, polarity value is computed based on the cooccurrences of the term with other positive or negative seed words in the corpus; there are various methods reported in the literature to determine the polarity value of the term, whereas, dictionary based approaches utilize the predeveloped polarity lexicons like SentiWordNet, WordNet, SenticNet, and so forth. 9

Related Work (2/5) Initially, sentiment-rich features are extracted from the unstructured text. Further, semantic orientations of these sentiment-rich features are determined based on corpus or dictionary. Finally, overall polarity of the document is determined by aggregating the semantic orientations of all the features. Opinion target extraction is more important in product reviews to identify the product features because each product may have large number of features and it is very difficult to make an explicit list of such features for every product. Therefore, it is necessary to have such mechanism, which may automatically identify the product features from the text. 10

Related Work (3/5) Feature-specific sentiment analysis finds out the opinion expressed with respect to a specific feature of the product. To determine the accurate polarity of the features is the key phase for building efficient sentiment analysis model. Most of the sentiment analysis models use domain- independent sentiment dictionary to determine the polarity of the features. 11

Related Work (4/5) In the literature, varieties of sentiment lexicons are freely available, but those lexicons do not provide contextual information in determining polarity value of a feature. Most of the sentiment lexicons available publically like SentiWordNet, General Inquirer, SenticNet, and so forth are domain-independent and sentiment analysis problem is domain dependent problem. Semantic orientation of a word changes according to the domain or the context in which that word is being used. In the proposed approach, we combined various sentiment lexicons present in the literature and also build contextualized sentiment lexicon. 12

Related Work (5/5) ConceptNet as a knowledge resource is also used for sentiment analysis. Our work is different from these approaches in the sense that we expanded the ontology with the help of WordNet for better coverage of the product features and also we used contextual polarity lexicon developed by us to determine the polarity of the extracted features. 13

Proposed Methodology 14

Proposed Methodology -- Construction of Domain Specific Ontology from ConceptNet. (1/3) Ontology can be described as a connection of concepts with semantic relations. Constructed ontology can be considered as a commonsense knowledge base consisting of the domain specific concepts and relation among them. In the proposed approach, initially, we construct the domain specific ontology with the help of ConceptNet. Next, we expand the ontology by merging the ontologies of the synonyms of the domain name for better coverage of domain specific features. 15

Proposed Methodology -- Construction of Domain Specific Ontology from ConceptNet. (2/3) We extract the concepts from the ConceptNet upto level 4.The level of the ontology is set empirically. It is observed that as we increase the levels in ConceptNet, irrelevant concepts are extracted. 16

Proposed Methodology -- Construction of Domain Specific Ontology from ConceptNet. (3/3) 17

Proposed Methodology -- Aspect Extraction. Aspect is a term about which an opinion is expressed. To identify the aspect term, first of all, reviews are part- of- speech tagged and all the noun and noun + noun terms are extracted. Stanford POS-tagger is used for tagging the review documents. After the extraction of noun and noun + noun based aspects, these are matched with the domain specific ontology constructed in earlier section to eliminate all the unimportant and irrelevant aspects. All the irrelevant nouns-based aspects are pruned with the help of ontology. 18

Proposed Methodology -- Feature-Specific Opinion Extraction. Users generally express opinion on multiple features of product. It is important to get the association of the opinion target (i.e., aspect) and opinion words. To detect the association between opinion target and opinion words dependency parsing is used. Stanford dependency parser is used for extracting dependency rules from the review documents. 19

Proposed Methodology -- Construction of Contextual Polarity Lexicon. (1/3) A polarity lexicon is a dictionary containing the terms/words with their polarity value. Further, we determine the ambiguous terms from this sentiment lexicon and determine their polarity depending on the context in which they appear. Finally, polarity of the opinion word is retrieved from sentiment lexicon and contextual polarity lexicon. 20

Proposed Methodology -- Construction of Contextual Polarity Lexicon. (2/3) –SenticNet. It is a lexical resource constructed by clustering the vector space model of affective commonsense knowledge extracted from ConceptNet. This dictionary produces a list of concepts with their polarity value. Polarity value of the concepts given in this dictionary is computed by exploiting artificial intelligence (AI) and semantic web techniques. –SentiWordNet. SentiWordNet is a sentiment dictionary containing polarity score of opinion words. SentiWordNet was built using WordNet and a ternary classifier. The classifier is based on “bag of synset” model which uses manually disambiguated glosses available from the Princeton WordNet Gloss Corpus. 21

Proposed Methodology -- Construction of Contextual Polarity Lexicon. (3/3) –General Inquirer. The General Inquirer (GI) was one of the first sentiment dictionaries, publicly available dictionary, used for automatic text analysis. 22

Proposed Methodology -- Construction of Contextual Polarity Lexicon. -- Ambiguous Term Determination. The terms, which occur dominantly in positive context and occur very less in negative context, are most likely to be unambiguous terms. In contrast, the terms, which occur equally in positive and negative contexts may be ambiguous terms. 23

Proposed Methodology -- Construction of Contextual Polarity Lexicon. -- Context Term Determination. Context terms are cooccurring terms with the ambiguous terms, which can change the polarity of the term. We need to consider the context term to determine the accurate polarity of the ambiguous terms. Context terms are considered as all the nouns, adjectives, adverbs, and verbs that occurred in two sentences before and after the occurrence of the ambiguous term in all the review documents. 24

Proposed Methodology -- Sentiment Aggregation. First of all, features are extracted from the review document, and then it is matched in the ontology. The level of ontology where it is located determines the importance of the feature. Initially, polarity of an opinion word is retrieved from ambiguous polarity list, if the word is present. Contribution of the opinion word in determining the overall polarity is taken as polarity ∗ (height of ontology). Finally, the overall polarity of the document is determined by summing up the contribution of each term. 25

Experiments and Results -- Dataset and Evaluation. First is the restaurant review dataset. This corpus consists of 4,488 reviews. Polarity of the review documents is classified as positive or negative. Second corpus is the movie review dataset; it consists of 2000 reviews of equal number of negative and positive reviews. And the third dataset is the software review dataset. It consists of 1000 positive and 915 negative review documents. Accuracy is used as an evaluation measure. A simple negation handling method is used in all the experiments. In this method, negation terms (no, not, and never) are added to all the adjectives present in the sentence to reverse the polarity of each term. 26

Experiments and Results -- Experiments. First is to investigate the effectiveness of the ontology used to prune the irrelevant features extracted from the review documents. It is done by selecting only domain specific important features based on ConceptNet. Second is to investigate the effectiveness of incorporating the importance of the features in determining the overall sentiment of the document. Third is to evaluate the contextual polarity lexicon in determining the polarity of the opinion words. 27

Experiments and Results -- Experiments. -- Baseline. In this approach, a sentiment lexicon is taken to retrieve the polarity value of all the features extracted from the review document. Then, it sums up the total positive and negative polarity values of all the words of the document; if total positive polarity is greater than total negative polarity value, then the positive sentiment was assigned to document and vice versa. 28

Experiments and Results -- Experiments. -- Domain Specific Ontology BasedMethod. In this experimental setting, we extracted the features with noun and (noun + noun) combinations from the review documents; further, extracted features are matched in the ontology to select only domain specific important features. Then, dependency parsing is used to get the opinion words corresponding to the features extracted in first phase. Then, all the three lexicons are used to get the polarity value of the opinion words. Finally, polarity value of all the opinion words in a document is summed to get the final polarity of the document. 29

Experiments and Results -- Experiments. -- Considering the Importance of the Features. This approach is similar to previous approach, that is, domain specific ontology based method except in polarity computation; we consider the importance of the feature by looking at the level of match of the feature in the domain specific ontology. 30

Experiments and Results -- Experiments. -- Contextual Sentiment Lexicon. In this experiment, we add the contextual polarity lexicon, in addition to all the three lexicons. 31

Experiments and Results -- Experiments. -- Considering Contextual Information and Importance of the Feature. In this experiment, we investigate the effect of the context information, importance of the features, and domain specific ontology, all together to determine the efficiency of the proposed sentiment analysis model. 32

Experiments and Results -- Results. Both methods are unsupervised; we only require a predefined ontology and prebuilt polarity lexicons. It is observed from the experiments that the effect of common-sense knowledge based ontology slightly depends on the type of the dataset. It improves the performance for the domains, which have more possible aspects. 33

Experiments and Results -- Comparison with Related Work. Our work is different from these approaches in the sense that we expanded the ontology with the help of WordNet for better coverage of the product features and also we used contextual polarity lexicon developed by us to determine the polarity of the extracted features. 34

Conclusions In this paper, we investigate the effect of three factors, that is, domain specific ontology, importance of the features, and contextual information all together in determining the overall sentiment of text. The proposed approach including all these information significantly improves the performance of the sentiment analysis model over the baseline method. All the experiments are performed on various review datasets, namely, software, movie, and restaurants. Experimental results show the effectiveness of the proposed approach. Future work involves discovering methods to enrich the knowledge base. Along with using ConceptNet, how other ontologies can help to enrich the concept mining process is also a big task to deal with in future. 35