
1 Sentiment Analysis CMPT 733 (Computing Science / Apala Guha)

2 Outline
- What is sentiment analysis?
- Overview of approach
- Feature representation
  - Term Frequency – Inverse Document Frequency (TF-IDF)
  - Word2Vec skip-gram
- Model training
  - Linear regression
- Assignment 2

3 Outline (section divider: What is sentiment analysis?)

4 What is sentiment analysis?
Wikipedia: aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.
Examples:
- "Full of zany characters and richly applied satire, and some great plot twists": is this a positive or a negative review?
- Public opinion on the stock market mined from tweets
- What do people think about a political candidate or issue?
- Can we predict election outcomes or market performance from sentiment analysis?

5 Outline (section divider: Overview of approach)

6 Overview of Approach
Running example: sentiment analysis of Amazon reviews.
- Each Amazon review consists of both text and a rating.
- We learn the relationship between the text content and the rating.

7 Overview of Approach
[Pipeline diagram] Review text ("I purchased one of these from Walmart ...") → Feature extraction → Feature representation → Linear regression → Score in [1-5]

8 Outline (section divider: Feature representation, TF-IDF)

9 TF-IDF
Term Frequency (TF): the number of times each word appears in a review.
Review: "My small cat loves this carrier. It is very soft inside and it has a small window that my cat can use to look outside."
What are the potential problems with this representation?
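A minimal sketch of raw term-frequency counting in Python (the tokenizer and variable names here are illustrative assumptions, not part of the slides):

```python
from collections import Counter
import re

review = ("My small cat loves this carrier. It is very soft inside and "
          "it has a small window that my cat can use to look outside.")

# Naive tokenization: lowercase, keep alphabetic tokens only (assumption).
tokens = re.findall(r"[a-z]+", review.lower())

tf = Counter(tokens)           # raw term frequency per word
print(tf["cat"], tf["small"])  # both appear twice in this review
```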

10 TF-IDF
- Raw term frequency gives too much weight to terms used in long reviews; we should give equal importance to each review.
- Some words are ubiquitous but carry little meaning, yet they receive unnecessary importance: common words usually occur 1-2 orders of magnitude more often than uncommon words.
- We need to suppress less significant, ubiquitous words while enhancing more significant, rare words.

11 TF-IDF
tf(term, review) = termFreqInReview(term, review) / totalTermsInReview(review)
- How does this solve the problem of variable-length reviews?
idf(term) = log((totalReviews + 1) / (reviewsContainingTerm(term) + 1))
- How does this solve the problem of ubiquitous versus rare words?
tf-idf(term, review) = tf(term, review) * idf(term)
- How does this reflect the importance of a particular term in a particular review?
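A short Python sketch of these formulas over a toy corpus (the corpus, function names, and tokenization are illustrative assumptions):

```python
import math

reviews = [
    "my small cat loves this carrier".split(),
    "the carrier is soft inside".split(),
    "my cat looks outside the window".split(),
]

def tf(term, review):
    # Term count normalized by review length, so long reviews get no extra weight.
    return review.count(term) / len(review)

def idf(term, corpus):
    # +1 smoothing on numerator and denominator, matching the slide's formula.
    reviews_containing = sum(term in review for review in corpus)
    return math.log((len(corpus) + 1) / (reviews_containing + 1))

def tf_idf(term, review, corpus):
    return tf(term, review) * idf(term, corpus)

print(tf_idf("loves", reviews[0], reviews))  # rare word: boosted
print(tf_idf("my", reviews[0], reviews))     # common word: suppressed
```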

12 TF-IDF
[Figure: TF-IDF illustration; no further content recoverable from the transcript]

13 Can you spot any problems with the TF-IDF representation?

14 TF-IDF
- Pays no attention to word semantics:
  - words with similar meanings are treated as separate features, and
  - a word with different meanings in different contexts is treated as a single feature.
- It would be nice to incorporate some word-semantics information into the feature representation.

15 Outline (section divider: Word2Vec skip-gram)

16 Word2Vec
A word's semantics are determined by its context, i.e. nearby words. Example:
- I love having cereal in the morning for breakfast.
- My breakfast is usually jam with butter.
- The best part of my day is the morning's fresh coffee with a hot breakfast.
'cereal', 'jam', 'butter', and 'coffee' are related.
We want to represent each word such that similar words have similar representations.

17 Word2Vec: Skip-gram
Sentence: "Insurgents killed in ongoing fighting."
Bi-grams: {insurgents killed, killed in, in ongoing, ongoing fighting}
2-skip-bi-grams: {insurgents killed, insurgents in, insurgents ongoing, killed in, killed ongoing, killed fighting, in ongoing, in fighting, ongoing fighting}
Tri-grams: {insurgents killed in, killed in ongoing, in ongoing fighting}
2-skip-tri-grams: {insurgents killed in, insurgents killed ongoing, insurgents killed fighting, insurgents in ongoing, insurgents in fighting, insurgents ongoing fighting, killed in ongoing, killed in fighting, killed ongoing fighting, in ongoing fighting}
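A small Python sketch that reproduces these lists; the function name and the "at most k total skipped words" reading of k-skip-n-grams are assumptions consistent with the examples above:

```python
from itertools import combinations

def skip_grams(tokens, n, k):
    """All n-grams whose positions skip at most k words in total."""
    grams = []
    for idx in combinations(range(len(tokens)), n):
        if idx[-1] - idx[0] - (n - 1) <= k:  # total skips within the window
            grams.append(tuple(tokens[i] for i in idx))
    return grams

tokens = "insurgents killed in ongoing fighting".split()
print(skip_grams(tokens, 2, 2))  # the 9 2-skip-bi-grams listed above
print(skip_grams(tokens, 3, 2))  # the 10 2-skip-tri-grams listed above
```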

18 Word2Vec: skip-gram
- A neural network is trained on the context of each word: given a word, it predicts the word's context.
- The resulting representation is used as the feature vector of a particular word in a review.
- We then need to combine the feature vectors of the words in a review to get the overall feature vector of the review.
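A minimal training sketch using the gensim library (a library choice assumed here, not named in the slides); sg=1 selects the skip-gram objective:

```python
from gensim.models import Word2Vec

sentences = [
    "i love having cereal in the morning for breakfast".split(),
    "my breakfast is usually jam with butter".split(),
    "the best part of my day is fresh morning coffee with a hot breakfast".split(),
]

# Train a skip-gram model on the toy corpus (dimensions are assumptions).
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

vec = model.wv["cereal"]                       # 50-dim vector for 'cereal'
print(model.wv.most_similar("cereal", topn=3)) # nearby words by cosine similarity
```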

19 Word2Vec: skip-gram
[Network diagram: a 1×V one-hot input (V = #distinct words) is multiplied by a V×N weight matrix to give a 1×N hidden layer, which an N×V weight matrix maps to 1×V output vectors; example input word 'cereal' with context words 'I', 'love', 'for', 'morning']

20 Word2Vec: skip-gram
- The input layer selects a single word among the V words.
- The output layer gives C vectors (C = size of the context), each of which selects one word among the V words.
- A weight matrix W of dimension V×N transforms the input vector into a 1×N vector. N can informally be thought of as the number of characteristics of a word; the value at each position reflects how strongly a particular characteristic is present.
- A weight matrix W′ of dimension N×V is associated with each output word vector and transforms the projection layer into the output layer: we are checking which output word at a particular skip position best matches the features of the input word.
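A numpy sketch of this forward pass with assumed toy dimensions; the softmax over the output scores is the predicted distribution over context words:

```python
import numpy as np

V, N = 10, 4                     # toy vocabulary and hidden sizes (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))      # input-to-hidden weights (V x N)
W_out = rng.normal(size=(N, V))  # hidden-to-output weights (N x V)

x = np.zeros(V)
x[3] = 1.0                       # 1 x V one-hot input: word with index 3
h = x @ W                        # 1 x N projection (the word's embedding)
u = h @ W_out                    # 1 x V scores over possible context words
p = np.exp(u - u.max())
p /= p.sum()                     # softmax: predicted context-word distribution
print(p.round(3))
```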

21 Word2Vec: skip-gram
- Unsupervised learning
- Semantic representation: 'cat' will be close to 'kitten'

22 Word2Vec: skip-gram
Suggest some ways to combine the feature vectors of the words appearing in a review to get the overall feature vector of the review.

23 Word2Vec: skip-gram
[Diagram: review r = "My cat loves this carrier"; the word vectors are added together and averaged to form the review's vector]

24 Word2Vec: skip-gram
[Diagram: words plotted in embedding space, forming clusters of similar words (cat, kitten, kitty, pet; love, like, favor; dog, puppy, pup, doggy; happy, satisfied, fulfilled); example cluster-presence vectors 01001000 and 00100001]

25 Word2Vec: skip-gram
[Diagram: the words "My cat loves this carrier" are mapped through Word2Vec to vectors (0100, 0010, 1000, 0100, 0001), which are then either averaged or turned into a cluster-based representation]

26 Word2Vec: skip-gram
- We need to represent an overall review with a feature vector, not just individual words.
- Option 1: average the feature vectors of the individual words in the review.
- Option 2: cluster the words in the corpus, and use the degree of presence of each cluster in a review as the feature vector.
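A sketch of both options, assuming word vectors already exist (random stand-ins here) and using KMeans as one possible corpus-level clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed: an embedding lookup produced by a trained Word2Vec model.
rng = np.random.default_rng(0)
vocab = "my cat loves this carrier dog puppy happy".split()
word_vecs = {w: rng.normal(size=8) for w in vocab}

review = "my cat loves this carrier".split()

# Option 1: average the word vectors of the review.
avg_vec = np.mean([word_vecs[w] for w in review], axis=0)

# Option 2: cluster all corpus words, then count each cluster's presence.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    np.stack([word_vecs[w] for w in vocab]))
cluster_of = dict(zip(vocab, km.labels_))
bag_of_clusters = np.bincount([cluster_of[w] for w in review], minlength=3)

print(avg_vec.shape, bag_of_clusters)  # (8,) and a length-3 cluster histogram
```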

27 Outline (section divider: Model training, linear regression)

28 Model Training
- Find the relationship between review feature vectors and rating scores.
- Train a linear regression model.
- Use the model to predict the rating of a test review.
- Broader classification, such as positive/negative, is also possible by applying a threshold to the predicted score.
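A minimal scikit-learn sketch (the library choice, random features, and threshold value are assumptions): fit ratings against review feature vectors, then threshold predictions for a coarser positive/negative label:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed inputs: one feature vector per review (e.g. TF-IDF or averaged
# Word2Vec) and the 1-5 star rating attached to each review.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))     # 100 reviews, 8 features
y_train = rng.integers(1, 6, size=100)  # ratings in [1, 5]

model = LinearRegression().fit(X_train, y_train)

X_test = rng.normal(size=(5, 8))
scores = model.predict(X_test)          # predicted ratings
labels = np.where(scores >= 3.0, "positive", "negative")  # threshold at 3
print(scores.round(2), labels)
```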

29 Outline (section divider: Assignment 2)

30 Assignment 2
- Build the TF-IDF representation
- Train a linear regression model
- Train a Word2Vec representation
- Extract the average Word2Vec vector for each review
- Cluster the Word2Vec features
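An end-to-end sketch tying these steps together with scikit-learn and gensim (library choices are assumptions; the course may prescribe different tools, and the clustering step is sketched under slide 26 above):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from gensim.models import Word2Vec

texts = ["my cat loves this carrier", "terrible quality broke in a day",
         "soft inside and my cat loves it", "would not buy this again"]
ratings = [5, 1, 4, 2]

# Steps 1-2: TF-IDF features, then linear regression on the ratings.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(texts)
reg_tfidf = LinearRegression().fit(X.toarray(), ratings)

# Steps 3-4: skip-gram Word2Vec, averaged per review.
tokenized = [t.split() for t in texts]
w2v = Word2Vec(tokenized, vector_size=16, window=3, min_count=1, sg=1)
X_avg = np.stack([np.mean([w2v.wv[w] for w in toks], axis=0)
                  for toks in tokenized])
reg_w2v = LinearRegression().fit(X_avg, ratings)

print(reg_w2v.predict(X_avg).round(2))  # predicted ratings on the toy data
```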

