A Document-Level Sentiment Analysis Approach Using Artificial Neural Network and Sentiment Lexicons Yan Zhu
Agenda Overview Objective The Proposed Method Experiment and Result Conclusion
Overview Online consumers can use various Web 2.0 media, such as message forums, blogs, and review sites, to express their opinions and to access opinions expressed by others. Research has confirmed that online consumer reviews can be a good proxy for word-of-mouth and can ultimately influence the purchase decisions of other potential buyers who explore the Internet for product-related information
Overview Sentiment analysis, or opinion mining, is a recent research area that attempts to classify opinionated texts according to their polarity (positive or negative). Different solutions have been proposed for sentiment analysis of online text data, including machine learning, dictionary-based, statistical, and semantic approaches
Objective In this paper, we propose a sentiment classification model using a back-propagation artificial neural network (BPANN). Information gain and three popular sentiment lexicons are used to extract sentiment-representing features. The proposed approach exploits the classification performance of BPANN and utilizes domain knowledge from the sentiment lexicons for document-level sentiment analysis
Need of BPANN for Sentiment Analysis Machine learning techniques have achieved better accuracy than semantic and lexicon-based methods of sentiment analysis. However, the performance of machine learning approaches depends heavily on the selected features, the quality and quantity of training data, and the domain of the dataset
Need of BPANN for Sentiment Analysis The additional learning time required by machine learning techniques is also a prominent issue, since lexicon- and semantic-based approaches need no training time
The Proposed Method The opinionated text documents are collected and then preprocessed. The Vector Space Model (VSM) is used to generate the bag-of-words representation for each document. Stemming reduces words to their basic root or stem. Stop words are discarded, but we have consciously preserved some useful sentiment-expressing terms such as "ok" and "not".
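A minimal sketch of this preprocessing pipeline, assuming NLTK's Porter stemmer and English stop-word list; the tokenizer, the example sentence, and the exact exception set are illustrative, not the authors' configuration:

```python
# Preprocessing sketch: tokenize, drop stop words while preserving
# sentiment-bearing exceptions such as "ok" and "not", then stem.
# Requires: nltk.download("stopwords")
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
KEEP = {"ok", "not"}                            # sentiment terms to preserve
STOP = set(stopwords.words("english")) - KEEP   # stop-word list minus exceptions

def preprocess(document):
    tokens = re.findall(r"[a-z']+", document.lower())  # simple tokenizer
    return [stemmer.stem(t) for t in tokens if t not in STOP]

print(preprocess("The movie was not good, but the soundtrack was ok."))
# -> ['movi', 'not', 'good', 'soundtrack', 'ok']
```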
The Proposed Method To compute a numerical representation from the user-generated opinionated text data, the remaining tokens are ranked by their frequency of occurrence across the whole document set. Information-gain-based feature selection is effective for sentiment-based text classification. Hence, the top n-ranked features are selected using information gain together with the most frequent words in the sentiment lexicons.
Information Gain Feature Selection Information gain (IG) is a feature-goodness criterion that has performed well for sentiment-feature selection. A feature is selected based on its impact on decreasing the overall entropy. The attributes ranked highest by IG score minimize the overall information necessary to classify instances into the predefined classes
Information Gain Feature Selection The information gain of a feature w over all classes is given by

IG(w) = -\sum_{i=1}^{m} P(c_i) \log P(c_i) + P(w) \sum_{i=1}^{m} P(c_i \mid w) \log P(c_i \mid w) + P(\bar{w}) \sum_{i=1}^{m} P(c_i \mid \bar{w}) \log P(c_i \mid \bar{w})

where P(c_i) is the probability that a randomly selected document belongs to class c_i; P(w) is the probability that the feature w occurs in a randomly selected document; P(c_i | w) is the probability that a randomly selected document belongs to class c_i given that it contains w; and \bar{w} denotes the absence of w.
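Since IG(w) equals H(C) - H(C | w), it can be computed directly from document and class counts. A compact Python sketch of IG-based feature ranking on a toy binary-labeled corpus (the documents, labels, and the cutoff value are illustrative, not the paper's data):

```python
# Information-gain ranking sketch for binary sentiment classes:
# IG(w) = H(C) - H(C | w), computed from document/class counts.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(docs, labels, feature):
    n = len(docs)
    base = entropy([labels.count(c) / n for c in set(labels)])  # H(C)
    conditional = 0.0
    for subset in ([l for d, l in zip(docs, labels) if feature in d],
                   [l for d, l in zip(docs, labels) if feature not in d]):
        if subset:  # weight each branch by P(w) or P(not w)
            conditional += (len(subset) / n) * entropy(
                [subset.count(c) / len(subset) for c in set(subset)])
    return base - conditional

# Toy corpus: documents as token sets, labels 1 = positive, 0 = negative.
docs = [{"good", "plot"}, {"bad", "plot"}, {"good", "cast"}, {"dull", "cast"}]
labels = [1, 0, 1, 0]
vocab = set().union(*docs)
top_n = sorted(vocab, key=lambda w: information_gain(docs, labels, w),
               reverse=True)[:2]
print(top_n)  # the two highest-IG features; 'good' ranks first here
```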
Back-Propagation Artificial Neural Network (BPANN) An artificial neural network (ANN) is a classifier that can be adopted for linear and non-linear text categorization problems. Advantages: adaptive learning, parallelism, pattern learning, sequence recognition, fault tolerance, and generalization. ANNs can be feed-forward or feedback networks
Back-Propagation Artificial Neural Network (BPANN) Among the various training algorithms for ANNs, error back-propagation (BP) is the best known. BP is an iterative gradient algorithm that minimizes the mean square error (MSE), a measure of the difference between the actual and desired output of a multilayer feed-forward neural network
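For concreteness, the MSE that BP minimizes can be written as follows (standard definition; the notation is ours, and some texts scale by 1/2 or average per output node):

\mathrm{MSE} = \frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{K} (t_{jk} - o_{jk})^2

where n is the number of training samples, K the number of output nodes, t_{jk} the desired output, and o_{jk} the actual network output.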
Back-Propagation Artificial Neural Network (BPANN) The BP algorithm works in two passes: the forward pass obtains the inputs and activation values; the backward pass adjusts the weights of the network nodes to minimize the MSE. These two passes repeat iteratively until the network converges.
Back-Propagation Artificial Neural Network (BPANN) BPANN can be represented by a network diagram of nodes connected by directed edges and arranged in layers. Each layer consists of processing elements, and all neurons interact through weighted connections. The weight associated with each directed link is determined by minimizing a global error function through backward error propagation in a gradient-descent learning process.
Back-Propagation Artificial Neural Network (BPANN) In the forward pass, each neuron computes a weighted sum of its inputs and then applies an activation function to this sum to calculate its output. The input signal progresses layer by layer through the non-linear function; the sigmoid transfer function is a popular choice of activation. During training, the weights of each layer's neurons are modified to minimize the global error
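A minimal NumPy sketch of the two passes on a single-hidden-layer network with sigmoid activations; the layer sizes, learning rate, toy data, and the omission of bias terms are all simplifications for illustration, not the paper's configuration:

```python
# Two-pass back-propagation sketch: one hidden layer, sigmoid activations,
# plain gradient descent on the MSE. Sizes, data, and the missing bias
# terms are simplifications for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 4))                                  # 8 samples, 4 features
y = (X.sum(axis=1, keepdims=True) > 2).astype(float)    # toy binary targets

W1 = rng.normal(scale=0.5, size=(4, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(500):
    # Forward pass: weighted sums followed by the sigmoid activation.
    h = sigmoid(X @ W1)
    o = sigmoid(h @ W2)
    # Backward pass: propagate the output error and adjust the weights.
    delta_o = (o - y) * o * (1 - o)           # gradient at the output layer
    delta_h = (delta_o @ W2.T) * h * (1 - h)  # gradient at the hidden layer
    W2 -= lr * (h.T @ delta_o)
    W1 -= lr * (X.T @ delta_h)

print("final MSE:", float(np.mean((o - y) ** 2)))
```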
Experimental Evaluation Movie reviews dataset: 1,000 positive and 1,000 negative reviews. Hotel reviews dataset: 501 positive and 501 negative user-generated reviews
Experimental Evaluation Three popular lexicons (sentiment dictionaries) have been used in this study. HM lexicon: 1,336 adjectives (657 positive, 679 negative). GI lexicon: 3,596 adjectives, adverbs, nouns, and verbs (1,614 positive, 1,982 negative). Opinion Lexicon: 2,006 positive and 4,783 negative words
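Lexicon-driven feature selection can be sketched as intersecting the corpus vocabulary with the lexicon entries and keeping the most frequent matches; the function and the one-word-per-line file format below are hypothetical stand-ins:

```python
# Sketch: keep only vocabulary terms that appear in a sentiment lexicon,
# ranked by corpus frequency. The file path/format is a hypothetical
# stand-in (one word per line).
from collections import Counter

def lexicon_features(tokenized_docs, lexicon_path, n):
    with open(lexicon_path, encoding="utf-8") as f:
        lexicon = {line.strip().lower() for line in f if line.strip()}
    freq = Counter(t for doc in tokenized_docs for t in doc)
    return [w for w, _ in freq.most_common() if w in lexicon][:n]
```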
Performance Evaluation Overall accuracy (OA), precision, and recall are used as the performance evaluation metrics
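These follow the standard confusion-matrix definitions, where TP, TN, FP, and FN denote true/false positives and negatives:

\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}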
Experimental Design and Results This study used a feed-forward BPANN with a single hidden layer. The number of nodes in the input layer was set between 50 and 1000, as in prior research studies. The hidden layer had 15 nodes and the output layer 2 nodes, for binary classification. The BPANN was trained for 500 iterations with a learning rate of 0.01 and momentum of 0.8, using 66% of the dataset for training and 33% for testing
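The stated setup maps naturally onto a stock feed-forward implementation. A sketch using scikit-learn's MLPClassifier with the hyper-parameters listed above; scikit-learn and the synthetic data are stand-ins for the authors' actual code and document features:

```python
# Feed-forward network matching the stated setup: one hidden layer of 15
# nodes, SGD with learning rate 0.01 and momentum 0.8, up to 500
# iterations, and a 66%/33% train/test split. The synthetic data below
# stands in for the document-feature matrix of top-n IG/lexicon features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(15,), activation="logistic",
                    solver="sgd", learning_rate_init=0.01, momentum=0.8,
                    max_iter=500)
clf.fit(X_train, y_train)
print("overall accuracy:", clf.score(X_test, y_test))
```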
Experimental Design and Results The BPANN trained with the gradient-descent method often fails to converge. This study therefore selected model parameters by training each candidate model multiple times (3 or more), each time initialized with a different set of randomly generated weights; repeated training with random initial weights overcomes the non-convergence problem. Iterations were restricted as an early-stopping procedure to avoid the risk of overfitting
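The restart procedure can be sketched as training the same architecture from several random initializations and keeping the best-scoring candidate, with the iteration cap acting as early stopping. The restart count of 3 follows the slide; scoring on the held-out split is our assumption, and a separate validation split would normally drive the selection:

```python
# Random-restart sketch: train the same architecture from several random
# weight initializations and keep the best candidate; the iteration cap
# (max_iter) doubles as the early-stopping safeguard. Scoring on the
# held-out split is an assumption; a separate validation split would
# normally drive the selection.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)

best_model, best_acc = None, -1.0
for seed in range(3):  # "3 and more" restarts, per the slide
    clf = MLPClassifier(hidden_layer_sizes=(15,), solver="sgd",
                        learning_rate_init=0.01, momentum=0.8,
                        max_iter=500, random_state=seed)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    if acc > best_acc:
        best_model, best_acc = clf, acc
print("best accuracy over restarts:", best_acc)
```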
Movie Result
Hotel Result
Results The model complexity of the BPANN is controlled by restricting the number of hidden layers (usually just one) and the number of neurons in each hidden layer. Selecting more than 1000 features based on the sentiment lexicons and information gain did not improve the classification results. The BPANN can be sensitive to the total number of training and test instances when experimenting with different-sized datasets
Conclusion The results clearly indicate that BPANN is suitable for sentiment-based classification. Information gain (IG) has outperformed lexicon-based feature selection and succeeded in reducing the dimensionality