A Survey of Opinion Mining Dongjoo Lee Intelligent Database Systems Lab. Dept. of Computer Science and Engineering Seoul National University Good Morning. My name is Dongjoo Lee. I am a member of the Intelligent Database Systems Lab. I am here to present a survey of opinion mining.
Introduction Related Areas The Web contains a wealth of opinions about products, politics, and more in newsgroup posts, review sites, and other web sites A few problems What is the general opinion on the proposed tax reform? How is popular opinion on the presidential candidates evolving? Which of our customers are unsatisfied? Why? Opinion Mining (OM) a recent discipline at the crossroads of information retrieval and computational linguistics which is concerned not with the subject of a document, but with opinion it expresses Related Areas Data Mining(DM), Information Retrieval (IR), Text Classification (TC), Text Summarization (TS) The web contains a wealth of opinions about products, politics, and more in newsgroup posts, review sites, and elsewhere. Many researchers and businesses try to obtain useful information from these opinion resources. These are examples of problems to be solved using opinion resources. Noticeably, many research papers deal with obtaining customer opinion. Opinion mining is a recent discipline that tries to solve these problems. And it is at the crossroads of information retrieval and computational linguistics. It is concerned with the opinion a document expresses, not with the subject of a document. Many techniques used in related areas, such as IR, data mining, text classification and text summarization have been used to solve OM problems. Center for E-Business Technology
Agenda Introduction Development of Linguistic Resource Conjunction Method PMI Method WordNet Expansion Method Gloss Use Method Sentiment Classification Machine Learning Method NLP Combined Method Extracting and Summarizing Opinion Expression Statistical Approach NLP Based Approach Discussion I divided my presentation into three sections. First, I’ll introduce methods for developing linguistic resources. Next, I’ll introduce sentiment classification methods. Then I’ll introduce several systems for extracting and summarizing opinion expressions. Finally, I’ll present my discussion.
Development of Linguistic Resource (1) Linguistic resources can be used to extract opinion and to classify the sentiment of text Appraisal Theory Sentiment related properties are well-defined A framework of linguistic resources which describes how writers and speakers express inter-subjective and ideological position underlying linguistic foundation of OM Tasks Determining the subjectivity of a term Determining term orientation Determining the strength of term attitude Example Objective: vertical, yellow, liquid Subjective Positive: good < excellent Negative: bad < terrible Linguistic resources can be used to extract opinion and to classify sentiment. Sentiment related properties are well-defined in Appraisal Theory. Appraisal Theory is a framework of linguistic resources which describes how writers and speakers use words to express their positions. The tasks in developing a linguistic resource are as follows: determining the subjectivity of a term, determining term orientation, and determining the strength of term attitude. In the example, vertical, yellow, and liquid are objective terms, while good, excellent, bad, and terrible are subjective terms. Excellent and terrible are more intense than good and bad.
Development of Linguistic Resource (2) Conjunction Method PMI Method Orientation Subjectivity WordNet Expansion Method Gloss Use Method SentiWordNet I investigated four methods for developing linguistic resources. The conjunction method and the WordNet expansion method were used only for determining term orientation. The PMI method and the gloss use method were used for determining both term subjectivity and orientation. The gloss use method was also used for determining attitude strength. SentiWordNet was constructed as a result of the gloss use method.
Conjunction Method - overview Hatzivassiloglou and McKeown, 1997 Hypothesis Adjectives in ‘and’ conjunctions usually have similar orientation, while ‘but’ is used with opposite orientation. Process Randomly selected adjectives with known positive or negative orientation were used as seed terms to predict orientation. All conjunctions of adjectives are extracted from the corpus. A log-linear regression model combines information from different conjunctions to determine whether each pair of conjoined adjectives has the same or different orientation. A clustering algorithm separates the adjectives into two subsets of different orientation. It places as many words of the same orientation as possible into the same subset. The average frequencies in each group are compared and the group with the higher frequency is labeled as positive. [Figure: positive and negative seed terms; corpus; ‘and’ / ‘but’ conjunctions] Hatzivassiloglou and McKeown used term conjunctions to determine the orientation of adjectives, based on the hypothesis that adjectives in ‘and’ conjunctions usually have similar orientation, while ‘but’ is used with opposite orientation. The orientation of terms is determined as follows. First, all conjunctions of adjectives are extracted from the corpus. Next, a log-linear regression model combines information from different conjunctions to determine whether each pair of conjoined adjectives has the same or different orientation. Then a clustering algorithm separates the adjectives into two subsets of different orientation, placing as many words of the same orientation as possible into the same subset. Finally, the average frequencies in each group are compared and the group with the higher frequency is labeled as positive. They used randomly selected adjectives as seed terms.
Conjunction Method – objective function and constraints Select pmin that minimizes Φ(p) |Ci| : the cardinality of cluster i d(x, y) : the dissimilarity between adjectives x, y Dissimilarity between adjectives in the same cluster is minimized and dissimilarity between adjectives in different clusters is maximized. Experiments HM term set : 1,336 adjectives 657 positive, 679 negative terms Methods to improve performance of orientation prediction But rule : most conjunctions link adjectives of the same orientation, while adjectives linked by ‘but’ almost always have opposite orientation log-linear regression model morphological relationship adequate-inadequate or thoughtful-thoughtless log-linear model with morphological relationship : 82.5% accuracy When clustering the adjectives, the objective is to select the partition p that minimizes this objective function. This means that dissimilarity among adjectives in the same cluster is minimized and dissimilarity among adjectives in different clusters is maximized. For the experiments, the HM term set was created. The authors experimented with these three methods to improve the accuracy of orientation prediction. When using the log-linear model with morphological relationships, they obtained the best overall accuracy.
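The formula for Φ(p) did not survive on the slide, so the following is only a minimal sketch. It assumes Φ sums, over the two clusters, the pairwise dissimilarities inside each cluster divided by the cluster size |Ci|, which is consistent with the symbols defined above but is not confirmed by the slide. Each unordered pair is counted once here, which changes Φ only by a constant factor. The word list and the dissimilarity values are hypothetical, and the brute-force search stands in for the clustering algorithm only for a toy-sized input.

```python
from itertools import combinations

def phi(partition, d):
    """Assumed objective: per-cluster sum of pairwise dissimilarities / |Ci|."""
    total = 0.0
    for cluster in partition:
        if len(cluster) < 2:
            continue  # a singleton cluster has no pairs
        pair_sum = sum(d[frozenset(p)] for p in combinations(sorted(cluster), 2))
        total += pair_sum / len(cluster)
    return total

def best_two_way_partition(words, d):
    """Try every split into two non-empty clusters and keep the minimum of phi."""
    words = sorted(words)
    first, rest = words[0], words[1:]
    best = None
    # Fix the first word in cluster 1 so each split is visited exactly once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            c1 = {first, *extra}
            c2 = set(words) - c1
            score = phi((c1, c2), d)
            if best is None or score < best[0]:
                best = (score, c1, c2)
    return best

# Hypothetical dissimilarities: low within an orientation, high across.
words = ["good", "nice", "bad", "poor"]
same = {frozenset(("good", "nice")), frozenset(("bad", "poor"))}
d = {frozenset(pair): (0.1 if frozenset(pair) in same else 0.9)
     for pair in combinations(words, 2)}

score, c1, c2 = best_two_way_partition(words, d)
print(score, sorted(c1), sorted(c2))
```

With these toy values the search separates the two positive adjectives from the two negative ones. Deciding which cluster is the positive one is a separate step (the frequency comparison described on the previous slide).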
PMI Method - overview Pointwise Mutual Information (PMI) a measure of association used in information theory and statistics Orientation Turney and Littman, 2003 terms with similar orientation tend to co-occur in documents Subjectivity Baroni and Vegnaduzzo, 2004 subjective adjectives tend to occur near other subjective adjectives Pointwise Mutual Information is a measure of association. It is widely used in information theory and statistics. PMI between two words is calculated through this equation. Turney and Littman used PMI to determine term orientation, based on the hypothesis that terms with similar orientation tend to co-occur in documents. Baroni and Vegnaduzzo used PMI to determine term subjectivity, based on the hypothesis that subjective adjectives tend to occur near other subjective adjectives.
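The equation itself is not reproduced in this transcript. As a minimal sketch, the standard definition is PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), estimated below from raw counts. The function name, the parameter names, and the example counts are hypothetical.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from counts.

    count_xy: number of contexts containing both x and y
    count_x, count_y: number of contexts containing x (resp. y)
    total: total number of contexts
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: the pair co-occurs ten times more often than chance.
print(pmi(count_xy=10, count_x=100, count_y=100, total=10_000))
```

A positive value means the two words co-occur more often than they would if they were independent, a value near zero means they look independent, and a negative value means they tend to avoid each other.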
PMI Method – predicting semantic orientation Modified PMI was measured using the number of results returned by the AltaVista search engine with NEAR operator Predicting semantic orientation of a term SO(t) t : target term ti : paradigmatic term Experiments With HM term set and three corpora With small corpus, accuracy isn’t higher than conjunction method. With large corpus, accuracy is higher than conjunction method.
Corpus    Approx. # of words in corpus    Accuracy
AV-ENG    1×10^11                         87.13%
AV-CA     2×10^9                          80.31%
TASA      1×10^7                          61.83%
Turney and Littman measured a modified PMI using the number of results returned by the AltaVista search engine with the NEAR operator. Using this modified PMI with positive and negative seed term sets, they determined the orientation of a term. If the SO value of a term t is greater than zero, it is a positive term; if it is less than zero, it is a negative term. They experimented with the HM term set and three corpora of different sizes. With a small corpus, accuracy is no higher than that of the conjunction method. With a large corpus, however, accuracy is higher than that of the conjunction method.
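The SO(t) formula is not reproduced in this transcript. The sketch below assumes the common form in which SO(t) is the sum of PMI(t, ti) over the positive paradigmatic terms minus the sum over the negative ones, with PMI estimated from hit counts. All counts, the seed words, the `smooth` constant, and the function name are hypothetical, and no search engine is queried.

```python
import math

def so_pmi(term, pos_seeds, neg_seeds, hits, near_hits, total, smooth=0.01):
    """Assumed form: SO(t) = sum PMI(t, positive seeds) - sum PMI(t, negative seeds).

    hits[w]: number of documents containing w
    near_hits[frozenset((a, b))]: number of documents where a occurs NEAR b
    total: corpus size used to turn counts into probabilities
    smooth: small constant that avoids log(0) for unseen pairs
    """
    def pmi(a, b):
        joint = near_hits.get(frozenset((a, b)), 0) + smooth
        return math.log2(joint * total / (hits[a] * hits[b]))

    positive = sum(pmi(term, p) for p in pos_seeds)
    negative = sum(pmi(term, n) for n in neg_seeds)
    return positive - negative

# Hypothetical hit counts for two target terms and one seed of each polarity.
hits = {"superb": 1000, "awful": 1000, "good": 50_000, "bad": 40_000}
near_hits = {
    frozenset(("superb", "good")): 400, frozenset(("superb", "bad")): 20,
    frozenset(("awful", "good")): 20, frozenset(("awful", "bad")): 400,
}
for t in ("superb", "awful"):
    so = so_pmi(t, ["good"], ["bad"], hits, near_hits, total=1_000_000)
    print(t, "positive" if so > 0 else "negative")
```

Because the target term's own hit count appears in both sums, it cancels out when the two seed sets have the same size, so only the NEAR counts and the seed counts influence the sign.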
WordNet Expansion Method Hu et al., 2004 used synonym and antonym relationship between words Hypothesis adjectives usually share the same orientation as their synonyms and opposite orientation as their antonyms By using a set of seed adjectives, the orientation of all adjectives in WordNet can be assigned through a procedure that explores the cluster graphs. Hu et al. used the synonym and antonym relationships between words in WordNet to determine the orientation of terms. They assume that adjectives usually share the same orientation as their synonyms and the opposite orientation to their antonyms. Starting from a set of seed adjectives, the orientation of all adjectives in WordNet can be assigned through a procedure that explores the cluster graphs. This method was used in their opinion analysis system.
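The propagation idea can be sketched as a breadth-first walk over synonym and antonym links. This is an illustration of the hypothesis only, not the authors' procedure: the tiny synonym and antonym tables below are hypothetical stand-ins for WordNet, and the function name is an assumption.

```python
from collections import deque

def propagate_orientation(seeds, synonyms, antonyms):
    """Spread seed orientations (+1 / -1) along synonym and antonym links.

    A synonym inherits the orientation of the word it was reached from,
    an antonym receives the opposite one. Words are labeled once, by the
    first path that reaches them.
    """
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        neighbours = [(w, 1) for w in synonyms.get(word, ())]
        neighbours += [(w, -1) for w in antonyms.get(word, ())]
        for other, sign in neighbours:
            if other not in orientation:
                orientation[other] = sign * orientation[word]
                queue.append(other)
    return orientation

# Hypothetical miniature thesaurus (a real run would read these from WordNet).
synonyms = {"good": ["fine"], "bad": ["poor"]}
antonyms = {"good": ["bad"]}

result = propagate_orientation({"good": 1}, synonyms, antonyms)
print(result)
```

Words that cannot be reached from any seed stay unlabeled, which is one reason coverage is limited to what the thesaurus connects.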
Gloss Use Method - overview Esuli et al., 2005, 2006 Hypothesis Orientation terms with similar orientation have similar glosses Subjectivity terms without orientation have non-oriented glosses SentiWordNet All words in WordNet have three scores: positivity, negativity, and objectivity Each term sense is positioned in an inverted triangle good: that which is pleasing or valuable or useful; agreeable or pleasing beautiful: aesthetically pleasing pretty: pleasing by delicacy or grace; not imposing yellow: similar to the color of an egg yolk vertical: at right angles to the plane of the horizon or a base line Esuli et al. used term glosses to determine the orientation and subjectivity of a term. They determined term orientation based on the hypothesis that terms with similar orientation have similar glosses. They determined term subjectivity based on this hypothesis and a second one, that terms without orientation have non-oriented glosses. Finally, they constructed SentiWordNet, in which all words in WordNet have three scores: positivity, negativity, and objectivity. Each term sense is positioned in an inverted triangle, as in this figure.
Gloss Use Method – classification process Step 1. A seed set (Lp, Ln) is provided as input. Step 2. Lexical relations (e.g. synonymy) from a thesaurus, or online dictionary, are used to extend the seed set. Once added to the original ones, the new terms yield two new, richer sets Trp and Trn; together they form the training set for the learning phase of Step 4. Step 3. For each term ti in Trp∪Trn or in the test set, a textual representation of ti is generated by collating all the glosses of ti as found in a machine-readable dictionary. Each such representation is converted into vectorial form by standard text indexing techniques. Step 4. A binary text classifier is trained on the terms in Trp∪Trn and then applied to the terms in the test set. Experiments Classifier : NB, SVM, PrTFIDF 87.38% Accuracy The term classification process is described in this figure. A seed term set is provided as input, and lexical relations from a thesaurus are used to extend the seed set. Once added to the original ones, the new terms yield two new, richer sets, Trp and Trn. Each term ti in the extended seed set or in the test set is represented in vectorial form. A binary text classifier is trained on the extended seed set and applied to the terms in the test set. With Naïve Bayes, SVM, and PrTFIDF classifiers, about eighty-seven percent accuracy was obtained.
Development of Linguistic Resource - Summary
Conjunction. Intuition: adjectives in ‘and’ conjunctions usually have similar orientation, though ‘but’ is used with opposite orientation. Accuracy: 78.08%. Characteristics: the first try; test data: 1336 adjectives.
PMI method. Intuition: terms with similar orientation tend to co-occur in documents. Accuracy: 87.13%. Characteristics: no limitation; much time required.
WordNet Expansion. Intuition: adjectives usually share the same orientation as their synonyms and opposite orientation as their antonyms. Accuracy: N/A. Characteristics: limited to WordNet.
Gloss Use. Intuition: terms with similar orientation have similar glosses; terms without orientation have non-oriented glosses. Accuracy: 87.38%. Characteristics: SentiWordNet (all words in WordNet); accuracy depends on the quality of the thesaurus.
This is a summary of the methods for developing linguistic resources. As shown in this table, the gloss use method demonstrated the best accuracy in classifying term orientation.
Sentiment Classification The process of identifying the sentiment, or polarity, of a piece of text or a document. Document-level Sentence-level, phrase-level Feature-level Define target of the opinion and assign the sentiment of the target Document-level Sentiment Classification Method PMI method Machine Learning Method Default Classifiers Enhanced Classifier NLP Combined Method A Two-Step Classification Combining Appraisal Theory Sentiment classification is the process of identifying the sentiment of a piece of text or a document. It can be divided into document-level, sentence-level, phrase-level, and feature-level classification. For feature-level sentiment classification, the target of the opinion is defined and the sentiment toward that target is assigned. Now, I’ll introduce three document-level sentiment classification methods.
PMI Method Turney et al., 2002 Process Only two-word phrases containing adjectives or adverbs are extracted Semantic orientation of a phrase SO(phrase) = PMI(phrase, “excellent”) – PMI(phrase, “poor”) Semantic orientation of a document is the average semantic orientation of its phrases Experiments 410 reviews from Epinions (epinion.com): 170 positive, 240 negative Calculating the PMI of 10,658 phrases from 410 reviews consumed about 30 hours
Domain of review        Accuracy
Automobiles             84.00%
- Honda Accord          83.78%
- Volkswagen Jetta      84.21%
Banks                   80.00%
- Bank of America       78.33%
- Washington Mutual     81.67%
Movies                  65.83%
- The Matrix            66.67%
- Pearl Harbor          65.00%
Travel Destination      70.53%
- Cancun                64.41%
- Puerto Vallarta       80.56%
Turney et al. used the PMI method to classify a document according to its polarity. They used only two-word phrases containing adjectives or adverbs. The semantic orientation of a phrase is determined by calculating its PMI with the terms excellent and poor, and the semantic orientation of a document is the average semantic orientation of its phrases. They experimented with four hundred and ten reviews from epinion dot com. The difference in classification accuracy within the same domain isn’t large, except in the travel domain. The difference between automobiles and movies, however, is large. It seems to be caused by the words used in each domain.
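As a minimal sketch of the two formulas on this slide: when PMI is estimated from hit counts, the phrase's own count cancels in the difference, leaving SO(phrase) = log2( hits(phrase NEAR excellent) · hits(poor) / (hits(phrase NEAR poor) · hits(excellent)) ). The hit counts, the `smooth` constant, and the function names below are hypothetical, and no search engine is queried.

```python
import math

def phrase_so(near_excellent, near_poor, hits_excellent, hits_poor, smooth=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor") from hit counts.

    smooth avoids division by zero when a phrase never occurs near a seed word.
    """
    return math.log2(((near_excellent + smooth) * hits_poor)
                     / ((near_poor + smooth) * hits_excellent))

def classify_review(phrase_orientations):
    """A review is positive when the average SO of its phrases is above zero."""
    average = sum(phrase_orientations) / len(phrase_orientations)
    return ("positive" if average > 0 else "negative"), average

# Hypothetical counts for three extracted two-word phrases of one review.
orientations = [
    phrase_so(near_excellent=80, near_poor=10, hits_excellent=1000, hits_poor=500),
    phrase_so(near_excellent=5, near_poor=40, hits_excellent=1000, hits_poor=500),
    phrase_so(near_excellent=60, near_poor=6, hits_excellent=1000, hits_poor=500),
]
label, average = classify_review(orientations)
print(label, round(average, 2))
```

Every phrase needs its own pair of NEAR queries, which is in line with the long running time reported on the slide.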
ML - Default Classifier Pang and Lee, 2002 A special case of text categorization with sentiment- rather than topic-based categories Document modeling standard bag-of-features framework Experiments Data : movie reviews (Internet Movie Database), rating -> negative, neutral, positive Naïve Bayes, Maximum Entropy, Support Vector Machine In terms of relative performance, Naïve Bayes tends to do the worst and SVM tends to do the best, although the differences aren’t very large.
Columns: Features, # of features, Frequency or presence?, NB, ME, SVM
unigrams 16165 freq. 78.7 N/A 72.8
unigrams pres. 81.0 80.4 82.9
unigrams+bigrams 32330 80.6 80.8 82.7
bigrams 77.3 77.4 77.1
unigrams+POS 16695 81.5 81.9
adjectives 2633 77.0 77.7 75.1
top 2633 unigrams 80.3 81.4
unigrams+position 22430 80.1 81.6
Pang and Lee considered document-level sentiment classification as a special case of text categorization with sentiment-based rather than topic-based categories. They used the standard bag-of-features framework, so that a document is expressed as a feature vector. Three classifiers were used: Naïve Bayes, maximum entropy, and support vector machine. They experimented with movie reviews from the Internet Movie Database. They selected only reviews where the author’s rating was expressed either with stars or with a numerical value. Ratings were automatically extracted and converted into one of three categories: positive, negative, or neutral. In terms of relative performance, Naïve Bayes tended to do the worst and SVM tended to do the best, although the differences aren’t very large.
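The "frequency or presence" column in the table refers to how each unigram feature is valued. A minimal sketch of the two encodings follows; the vocabulary, the example text, and the whitespace tokenizer are hypothetical simplifications.

```python
from collections import Counter

def unigram_features(text, vocabulary, presence=True):
    """Encode a document over a fixed vocabulary.

    presence=True  -> 1 if the word occurs at all, else 0
    presence=False -> number of times the word occurs
    """
    counts = Counter(text.lower().split())
    if presence:
        return [1 if counts[word] > 0 else 0 for word in vocabulary]
    return [counts[word] for word in vocabulary]

vocabulary = ["great", "boring", "plot"]
review = "Great great plot"
print(unigram_features(review, vocabulary, presence=True))
print(unigram_features(review, vocabulary, presence=False))
```

Either vector can then be passed to any of the three classifiers. In the unigram rows of the table, the presence encoding scores higher than the frequency encoding for both NB and SVM.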
ML - Using Only Subjective Sentences Pang and Lee, 2004 improved polarity classification by removing objective sentences A subjectivity detector determines whether each sentence is subjective or not Standard subjectivity classifier Subjectivity classifier using proximity relationship The use of subjectivity extracts can improve polarity classification, or at least causes no loss of accuracy. After their experiments with default polarity classifiers, Pang and Lee improved polarity classification by removing objective sentences. A subjectivity detector determines whether each sentence is subjective or not, and only the subjective sentences are provided to the default classifier as input. They implemented two subjectivity classifiers. One is the standard subjectivity classifier, which classifies each sentence in isolation. The other uses the proximity relationship between sentences. Their experiments showed that the use of subjectivity extracts can improve polarity classification, or at least causes no loss of accuracy.
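The pipeline on this slide can be sketched as a filter placed in front of the polarity classifier. Only the structure is taken from the slide: the keyword-based detector and polarity scorer below are hypothetical toys standing in for the trained classifiers, and they classify each sentence in isolation (the proximity-based variant is not shown).

```python
def subjective_extract(sentences, is_subjective):
    """Keep only the sentences the subjectivity detector accepts."""
    return [s for s in sentences if is_subjective(s)]

def classify_with_extract(sentences, is_subjective, polarity_classifier):
    """Run the polarity classifier on the subjective extract, not the full review."""
    extract = subjective_extract(sentences, is_subjective)
    return polarity_classifier(" ".join(extract))

# Hypothetical stand-ins for the two trained classifiers.
POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "boring"}

def toy_is_subjective(sentence):
    words = set(sentence.lower().rstrip(".").split())
    return bool(words & (POSITIVE | NEGATIVE))

def toy_polarity(text):
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"

review = [
    "The film was shot in Paris.",
    "I love the great soundtrack.",
    "The ending is boring.",
]
print(subjective_extract(review, toy_is_subjective))
print(classify_with_extract(review, toy_is_subjective, toy_polarity))
```

Passing the detector and the classifier as arguments keeps the filter independent of how either one is trained, so the default classifiers from the previous slide can be reused unchanged.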
NLP Combined Method – A Two-Step Classification Wilson et al., 2005 A Two-Step Contextual Polarity Classification employ machine learning and 28 linguistic features document polarity : the average polarity of phrases Step 1. Neutral-polar classifier classifies each phrase containing a clue as neutral or polar Step 2. Polarity classifier takes all phrases marked in step 1 as polar and disambiguates their contextual polarity (positive, negative, both, or neutral). 28 Features : were extracted using NLP techniques with a dependency parser 4 Word Features, 8 Modification Features, 11 Structure Features, 3 Sentence Features, 1 Document Feature Experiments Data : Multi-perspective Question Answering (MPQA) Opinion Corpus
Neutral-polar classification (%)
Features          Accuracy
Word token        73.6
Word+priorpol     74.2
28 features       75.9
Polarity classification (%)
Features          Accuracy
Word token        61.7
Word+priorpol     63.0
10 features       65.7
Wilson et al. combined NLP techniques in document-level sentiment classification. They used a two-step contextual polarity classification employing machine learning techniques and 28 linguistic features. Document polarity is the average polarity of the phrases in the document. In step one, the neutral-polar classifier classifies each phrase containing a clue as neutral or polar. In step two, the polarity classifier takes all polar phrases and disambiguates their contextual polarity. The 28 features were extracted using NLP techniques with a dependency parser. Wilson et al. experimented with the MPQA Corpus. In both neutral-polar classification and polarity classification, linguistic features improved the classification performance.
NLP Combined Method - Combining Appraisal Theory Whitelaw et al., 2005 applied the appraisal theory to the machine learning methods of Pang and Lee Structure of an appraisal An example “not very happy” Experiments a lexicon of 1329 appraisal entities was produced semi-automatically from 400 seed terms in around twenty man-hours combining attitude type and orientation : accuracy 90.2%. Whitelaw et al. applied appraisal theory to the machine learning methods of Pang and Lee. They defined the structure of an appraisal, encompassing sentiment related properties, as in this figure. The appraisal has four attributes: attitude, graduation, orientation, and polarity. In the example “not very happy”, the appraisal of each word is defined and then extended to the whole phrase. They semi-automatically constructed a lexicon of over thirteen hundred appraisal entities. When combining attitude type and orientation, they obtained the best accuracy, about ninety percent.
Sentiment Classification - Summary
PMI Method. Characteristics: uses phrase PMI. Pros: simple; needs no prior polarity dictionary. Cons: loss of contextual meaning; slow (time to get PMI).
Machine Learning Method. Characteristics: bag of words; unigram to bigram or n-gram; SVM, NB, MaxEnt. Cons: needs a learning phase.
NLP Combined Method. Characteristics: based on ML; parsing or syntactic analysis; prior polarity to contextual polarity. Pros: considers contextual meaning; easily extendible for various purposes. Cons: needs a prior polarity dictionary; syntactic analysis overhead.
I think the NLP combined method is the best method for document-level sentiment classification because it considers contextual polarity. However, it needs a dictionary of terms whose prior polarity has been determined, and it also brings syntactic analysis overhead.
Extracting and Summarizing Opinion Expression Goal Extract opinion expressions from large collections of reviews and present them in an effective way Tasks Feature Extraction Sentiment classification at the feature-level requires the extraction of features that are the target of opinion words Sentiment Assignment Each feature is usually classified as being either favorable or unfavorable. Visualization Extracted opinion expressions are summarized and visualized. Methods Statistical Approaches ReviewSeer (2003) Opinion Observer (2004) Red Opal (2007) NLP-Based Approaches Kanayama System (2004) WebFountain (2005) OPINE (2005) [Figure: product reviews; Extract Features; Assign Sentiment; Summarize] I investigated six systems whose goal was to extract opinion expressions from large collections of reviews on the Web and present them in an effective way. Usually these systems perform three tasks: feature extraction, sentiment assignment, and visualization. Sentiment classification at the feature-level requires the extraction of features that are the target of opinion words. Each feature is usually classified as being either favorable or unfavorable. For effective presentation, the extracted opinion expressions are summarized and visualized. I divided the systems into two groups according to their underlying foundation, and I’ll introduce two representative systems.
Opinion Observer - Overview Hu and Liu, 2005 Extract and summarize opinion expressions from customer reviews on the Web. Only mines the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative Overall process Review crawling Feature extraction Sentiment assignment Opinion word extraction Opinion orientation identification Summary generation Opinion Observer extracts and summarizes opinion expressions from customer reviews on the Web. It only mines the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. This figure shows the overall process of Opinion Observer. First, reviews are crawled from the Web. Then, frequent features are extracted. Afterward, opinion words are extracted and the orientation of the words is identified. During this phase, infrequent features are identified. Finally, the orientation of opinion sentences is identified and a summary is generated.
Opinion Observer - Tasks Feature Extraction Product features are extracted from nouns or noun phrases by the association miner CBA Compactness pruning, redundancy pruning Sentiment Assignment Opinion sentence : a sentence that contains one or more product features and one or more opinion words Adjectives are the only opinion words Prior polarity of adjectives was identified by the WordNet expansion method with seed terms Infrequent features are extracted by using frequent opinion words Polarity of a sentence is assigned as the dominant orientation Extracted form : (product feature, # of positive sentences, # of negative sentences) Experiments Large collection of reviews of 15 electronic products 86.3% recall, 84.0% precision Product features are extracted from nouns or noun phrases by the association miner, CBA. Compactness pruning and redundancy pruning are applied to remove unlikely features. In this system, an opinion sentence is defined as a sentence that contains one or more product features and one or more opinion words. Adjectives are the only opinion words, and their prior polarity was identified by the WordNet expansion method with seed terms. Infrequent features are extracted by using frequent opinion words. The polarity of a sentence is assigned as the dominant orientation of its opinion words. Finally, each opinion expression is extracted in the following form: product feature, the number of positive sentences, and the number of negative sentences. In their experiments with a large collection of reviews of fifteen electronic products, eighty-six percent recall and eighty-four percent precision were obtained.
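The sentiment assignment step and the extracted form can be sketched as follows. This is a simplified illustration, not the system's implementation: it assumes single-word features, whitespace tokenization, and a hypothetical prior-polarity dictionary, and it takes "dominant orientation" to mean the sign of the summed prior polarities of the opinion words in the sentence.

```python
from collections import defaultdict

def sentence_orientation(words, prior_polarity):
    """Dominant orientation: sign of the summed prior polarities in the sentence."""
    score = sum(prior_polarity.get(word, 0) for word in words)
    return 1 if score > 0 else -1 if score < 0 else 0

def summarize(sentences, features, prior_polarity):
    """Return (product feature, # of positive sentences, # of negative sentences)."""
    counts = defaultdict(lambda: [0, 0])
    for sentence in sentences:
        words = sentence.lower().split()
        orientation = sentence_orientation(words, prior_polarity)
        for feature in features:
            if feature in words:
                if orientation > 0:
                    counts[feature][0] += 1
                elif orientation < 0:
                    counts[feature][1] += 1
    return [(feature, pos, neg) for feature, (pos, neg) in sorted(counts.items())]

# Hypothetical reviews, features, and prior polarities of adjectives.
sentences = [
    "the lens is great",
    "the battery is poor",
    "great battery and great lens",
]
prior_polarity = {"great": 1, "poor": -1}
summary = summarize(sentences, ["battery", "lens"], prior_polarity)
print(summary)
```

The per-feature counts are what the bar-graph interface would normalize, for example as pos / (pos + neg), to compare products.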
Opinion Observer - Visualization Features of products are compared using a bar graph The numbers of positive and negative sentences for each feature are normalized [Figure: bar graph showing the positive portion and the negative portion of each feature] The numbers of positive and negative sentences for each feature are normalized and visualized using a bar graph. With this interface, customers can compare several products at the same time.
Web Fountain - Overview Yi et al., 2005 Extracts target features of the sentiment from various resources and assigns polarity to the features System Architecture Sentiment Miner Analyzes grammatical sentence structures and phrases by using NLP techniques Web Fountain extracts the target features of the sentiment from various resources and assigns polarity to the features. The Sentiment Miner of Web Fountain analyzes grammatical sentence structures and phrases by using NLP techniques.
Web Fountain – Tasks Feature Extraction Candidate features: a part-of relationship with the given topic; an attribute-of relationship with the given topic; an attribute-of relationship with a known feature of the given topic. bBNP (Beginning definite Base Noun Phrase) heuristic is used Select bnp (base noun phrase) that has high likelihood ratio Experiments Precision - digital camera: 97%, music reviews: 100% Sentiment Assignment Parse and traverse with two linguistic resources Sentiment lexicon: defines the sentiment polarity of terms Sentiment pattern database: contains the sentiment assignment patterns of predicates Product review Recall 56%, Precision 87% In this system, the bBNP heuristic is used to extract candidate features. Candidate features satisfy one of these relationships. Base noun phrases that have a high likelihood ratio are selected as features. Their experiments with digital camera and music reviews showed high precision. In order to assign sentiment to the extracted features, reviews are parsed and traversed with two linguistic resources. The sentiment lexicon defines the sentiment polarity of terms, and the sentiment pattern database contains the sentiment assignment patterns of predicates. In the experiment with product reviews, fifty-six percent recall and eighty-seven percent precision were obtained.
Web Fountain – Visualization Web interface listing sentiment bearing sentences about a given product Web Fountain has a web interface listing sentiment bearing sentences about a given product.
Extracting and Summarizing Opinion Expression - Summary
Statistical
ReviewSeer (2003). Feature extraction: N/A. Sentiment assignment: probabilistic model, Naïve Bayes (Accuracy: 85.3%). Visualization: lists each feature term and its score and shows sentences containing the feature term.
Opinion Observer (2004). Feature extraction: CBA miner; infrequent feature selection. Sentiment assignment: WordNet expansion; prior polarity of adjectives. Visualization: graph. Recall: 86.3%, Precision: 84.0%.
Red Opal (2007). Feature extraction: frequent nouns and noun phrases (Precision: 85%). Sentiment assignment: uses user’s rating (Precision: 80%). Visualization: product list ordered by the score of each feature; the confidence of the scoring.
NLP-based
Kanayama’s system. Sentiment unit; modifying the machine translation framework. Recall: 43%, Precision: 89%.
WebFountain (2005). Feature extraction: bBNP heuristics; likelihood ratio (Precision: 97%). Sentiment assignment: sentiment lexicon; sentiment pattern database (Recall: 56%, Precision: 87%). Visualization: listing sentiment bearing sentences of a product.
OPINE. Feature extraction: Web PMI (Recall: 76%, Precision: 79%). Sentiment assignment: relaxation labeling (Recall: 89%, Precision: 86%).
This is a summary of the six systems that I investigated. ReviewSeer extracts feature terms and has a web interface which lists each feature term and its score and shows sentences containing the feature term. Opinion Observer uses the CBA miner to extract product features and uses the prior polarity of adjectives to assign sentiment to the features. It enables the comparison of products with its bar graph interface. Red Opal is a recent system. It uses the user’s rating to assign a score to each extracted feature, and it lists products in descending order based on the score of the selected feature. Kanayama et al. defined the sentiment unit and extracted sentiment units by modifying a machine translation framework. High precision was obtained. In WebFountain, bBNP heuristics were used to extract product features, and the sentiment of each feature was assigned using NLP techniques with two linguistic resources. OPINE uses Web PMI to extract product features, and relaxation labeling to assign sentiment.
It is hard to compare the performance of these systems due to the lack of standard test data. In general, however, statistical approaches showed higher recall than NLP-based approaches, and NLP-based approaches showed higher precision than statistical approaches.
Discussion OM is a growing research discipline related to various research areas, such as IR, computational linguistics, TC, TS, and DM. Surveyed three topics and summarized them. For Korean OM? There isn’t any published research into Korean OM. Language differences may impose some limits on the methods used in the OM subtasks. Structural differences between English and Korean may mean that the same heuristics cannot be applied to extract features from text The lack of a Korean thesaurus similar to WordNet limits the methods of obtaining the prior polarity of words for the PMI or conjunction methods. Research into Korean OM must be conducted in conjunction with other related areas. The final section of my presentation is a discussion. Opinion mining is a growing research discipline related to various research areas, such as IR, computational linguistics, TC, TS, and DM. I investigated three topics of opinion mining and summarized them. There hasn’t been any published research using the Korean language yet. Language differences may impose some limits on the methods used in the OM subtasks. For example, structural differences between English and Korean may mean that the same heuristics cannot be applied to extract features from text. The lack of a Korean thesaurus similar to WordNet limits the methods of obtaining the prior polarity of words for the PMI or conjunction methods. For these reasons, I think that research into Korean OM must be conducted in conjunction with other related areas.
Discussion - Research Map of OM This research map shows the research flow of opinion mining. I introduced four methods for developing linguistic resources and three methods for sentiment classification. I investigated six systems that extract and summarize opinion expressions from reviews. Statistical approaches show higher recall, and NLP-based approaches show higher precision.
Thank you My presentation is over. Thank you very much. Do you have any questions?