Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Sentiment Analysis

Similar presentations


Presentation on theme: "Introduction to Sentiment Analysis"— Presentation transcript:

1 Introduction to Sentiment Analysis
Workership Introduction to Sentiment Analysis CS 410, UMass Boston April 11, 2019 WORKERSHIP Boston, MA Website:

2 introduction: sentiment analysis
Extracting attitudes within text Converting unstructured information into structured data Part of the larger Natural Language Processing field Topology of Affective States: Scherer (1984) Attitudes: liking, loving, hating, valueing, desiring Emotion: angry, sad, joyful, fearful, ashamed, proud Mood: cheerful, gloomy, irritable, listless, depressed, buoyant Interpersonal Stance: friendly, cold, warm, supportive Personality Traits: nervous, anxious, reckless, morose, hostile, envious, jealous

3 introduction: sentiment analysis

4 introduction: SENTIMENT ANALYSIS
Tasks Related to Sentiment Analysis Classification Positive, Negative, Neutral Strength Score on a scale of -1 to 1 Detect the target of the sentiment Classify into more complex affective states Example Text: About 401K plan One problem here is that the company has only two enrollment dates for the 401K plan, January 1 & July 1. In my case, I started working in January so I wasn't eligible for enrollment until July of the following year. I had to wait 18 months before they started contributing. There is no valid reason to only have two enrollment dates. An employee should be enrolled as soon as they have completed their eligibility requirements.  However, a 6% employer contribution is pretty good, but it would be a hell of a lot better if was on total pay and not just base pay.

5 introduction: SENTIMENT ANALYSIS
Tasks Related to Sentiment Analysis Classification Positive, Negative, Neutral Strength Score on a scale of -1 to 1 Detect the target of the sentiment Classify into more complex affective states Example Text: About 401K plan One problem here is that the company has only two enrollment dates for the 401K plan, January 1 & July 1. In my case, I started working in January so I wasn't eligible for enrollment until July of the following year. I had to wait 18 months before they started contributing. There is no valid reason to only have two enrollment dates. An employee should be enrolled as soon as they have completed their eligibility requirements.  However, a 6% employer contribution is pretty good, but it would be a hell of a lot better if was on total pay and not just base pay.

6 introduction: why is sentiment analysis useful?
Data and Scalability People don’t like voting. They like expressing views in their own words Allows for real-time, granular analysis Creation of metrics and comparisons

7 introduction: approaches
Rule Based Approach Lexicon of Positive and Negative Words Manually Crafted Rules based on linguistic expertise

8 introduction: approaches
Machine Learning Approach Supervised Learning Comment Tagged Sentiment The 24 hour rule is the right thing ….. Positive I strongly advise that the contract ……. Negative You are correct in your assertion…. Has anyone given thought to…. If the union is involved in this process …. Neutral …..

9 introduction: approaches
Machine Learning Approach Supervised Learning Features Comment …. ….. Tagged Sentiment The 24 hour rule is the right thing ….. Positive I strongly advise that the contract ……. Negative You are correct in your assertion…. Has anyone given thought to…. If the union is involved in this process …. Neutral Build a prediction model to classify any text into sentiment classes Rule based techniques useful for feature generation

10 Make a predictions based on the training model
introduction: approaches Machine Learning Approach Supervised Learning Features Comment …. ….. Sentiment Prediction Comment that needs to be classified…. ? Make a predictions based on the training model

11 process: SENTIMENT ANALYSIS process
Data Cleaning and Tokenization Feature Extraction Training Testing and Accuracy

12 process: data cleaning and tokenization
Examples of data cleaning HTML Text Cleaning Capitalization Repeated Characters Punctuations, whitespace, stop words Platform specific symbols Negation Regular Expressions are very helpful to search by patterns Break apart the text into tokens

13 process: feature extraction
Rule Based Features Bag of Words, N-grams Dimension Reduction Techniques Word Embeddings and Word2Vec

14 process: feature extraction
Bag of Words Approach Create a document term matrix Columns are all the unique terms in all the comments Each comment becomes a vector of term counts Using TF-IDF (term frequency-inverse document frequency) to weight importance of a word in the collection of documents The columns (all unique terms) act as features Comment I Love Dogs hate …… Tagged Sentiment I like dogs, and dogs are great 1 2 …. Positive I hate dogs Negative …….. …., …..

15 process: feature extraction
N-grams Contiguous sequence of n-tokens Bigrams: I like, like dogs, dogs and…. Trigrams: I like dogs, like dogs and…. N-grams take more word context into account Comment I like Like dogs Dogs and I hate …… Tagged Sentiment I like dogs and dogs are great 1 …. Positive I hate dogs Negative …….. …., …..

16 process: feature extraction
Dimension Reduction Techniques Singular Value Decomposition can help get rid of redundant data by reducing the dimension of the document-feature matrix A = original document-term matrix U = document-concept matrix D = concept strength matrix V = term-concept matrix

17 Latent Semantic Analysis
process: feature extraction Dimension Reduction Techniques Singular Value Decomposition can help get rid of redundant data by reducing the dimension of the document-feature matrix 1 2 4 -0.21 -0.44 -0.87 -0.45 -0.89 7.93 3.17 -0.57 -0.7 Document-Term Document-Concept Concept Strength Concept-Term Latent Semantic Analysis Words used in similar contexts have similar meanings

18 process: feature extraction
Word2Vec Uses a 2 layer neural network to create word vectors Set the hidden layer size to the dimension of the word vectors Train using all of the text in the corpus to predict surrounding words

19 process: feature extraction
Word2Vec Create a word vector space using the weights for the hidden layer Compare similarity between words using the word vectors Use word embedding vector space as features when you train

20 process: training Build a prediction model based on the documents and features Logistics Regression Naïve Bayes Decision Trees Support Vector Machine Neural Network Decisions about which training model to use depends on number of examples, features, computational complexity

21 process: TESTING Apply the model generated by the training process to predict sentiment for any text To compute accuracy, divide tagged dataset into training and test sets Testing Metrics Accuracy: (True Positive + True Negative)/Total Precision: % of selected items that are correct True Positive/(True Positive + False Positive) Recall: % of correct items that are selected True Positive/ (True Positive + False Negative)

22 Big picture: sentiment analysis and Natural language processing
Converting unstructured data to structured data Language Ambiguity Combining linguistics with computational statistics

23 Introduction to Sentiment Analysis
Workership Introduction to Sentiment Analysis CS 410, UMass Boston April 11, 2019 WORKERSHIP Boston, MA Website:


Download ppt "Introduction to Sentiment Analysis"

Similar presentations


Ads by Google