Introduction to Sentiment Analysis

Slides:



Advertisements
Similar presentations
Problem Semi supervised sarcasm identification using SASI
Advertisements

Sentiment Analysis An Overview of Concepts and Selected Techniques.
A Survey on Text Categorization with Machine Learning Chikayama lab. Dai Saito.
Jean-Eudes Ranvier 17/05/2015Planet Data - Madrid Trustworthiness assessment (on web pages) Task 3.3.
Bag-of-Words Methods for Text Mining CSCI-GA.2590 – Lecture 2A
Three kinds of learning
SI485i : NLP Set 12 Features and Prediction. What is NLP, really? Many of our tasks boil down to finding intelligent features of language. We do lots.
Classifiers, Part 3 Week 1, Video 5 Classification  There is something you want to predict (“the label”)  The thing you want to predict is categorical.
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Text mining.
1 A study on automatically extracted keywords in text categorization Authors:Anette Hulth and Be´ata B. Megyesi From:ACL 2006 Reporter: 陳永祥 Date:2007/10/16.
 The most intelligent device - “Human Brain”.  The machine that revolutionized the whole world – “computer”.  Inefficiencies of the computer has lead.
Lecture 6 Hidden Markov Models Topics Smoothing again: Readings: Chapters January 16, 2013 CSCE 771 Natural Language Processing.
Learn to Comment Lance Lebanoff Mentor: Mahdi. Emotion classification of text  In our neural network, one feature is the emotion detected in the image.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Text mining. The Standard Data Mining process Text Mining Machine learning on text data Text Data mining Text analysis Part of Web mining Typical tasks.
Today Ensemble Methods. Recap of the course. Classifier Fusion
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
1Ellen L. Walker Category Recognition Associating information extracted from images with categories (classes) of objects Requires prior knowledge about.
Carolyn Penstein Rosé Language Technologies Institute Human-Computer Interaction Institute School of Computer Science With funding from the National Science.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Recognizing Stances in Online Debates Unsupervised opinion analysis method for debate-side classification. Mine the web to learn associations that are.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Data Mining and Decision Support
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.

Sentiment Analysis CMPT 733. Outline What is sentiment analysis? Overview of approach Feature Representation Term Frequency – Inverse Document Frequency.
Making Sense of Large Volumes of Unstructured Responses K. M. P. N. Jayathilaka Department of Statistics University of Colombo.
Oracle Advanced Analytics
A Simple Approach for Author Profiling in MapReduce
Dimensionality Reduction and Principle Components Analysis
Sentiment Analysis of Twitter Messages Using Word2Vec
A Straightforward Author Profiling Approach in MapReduce
Sentiment analysis algorithms and applications: A survey
Text Mining CSC 600: Data Mining Class 20.
Information Retrieval: Models and Methods
Natural Language Processing (NLP)
Reading: Pedro Domingos: A Few Useful Things to Know about Machine Learning source: /cacm12.pdf reading.
Sentiment Analysis Study
Natural Language Processing of Knee MRI Reports
Vector-Space (Distributional) Lexical Semantics
Efficient Estimation of Word Representation in Vector Space
Neural Language Model CS246 Junghoo “John” Cho.
Classifying enterprises by economic activity
Text Categorization Assigning documents to a fixed set of categories
An Overview of Concepts and Selected Techniques
Word Embedding Word2Vec.
iSRD Spam Review Detection with Imbalanced Data Distributions
CSCI 5832 Natural Language Processing
Text Mining & Natural Language Processing
Statistical n-gram David ling.
Predicting Prevalence of Influenza-Like Illness From Geo-Tagged Tweets
Text Mining & Natural Language Processing
CS246: Information Retrieval
Text Mining CSC 576: Data Mining.
MTBI Personality Predictor using ML
Information Retrieval
Natural Language Processing (NLP)
Word2Vec.
University of Illinois System in HOO Text Correction Shared Task
Word embeddings (continued)
Introduction to JMP Text Explorer Platform
Text Analytics Solutions with Azure Machine Learning
From Unstructured Text to StructureD Data
Word representations David Kauchak CS158 – Fall 2016.
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Extracting Why Text Segment from Web Based on Grammar-gram
Natural Language Processing (NLP)
Presentation transcript:

Introduction to Sentiment Analysis Workership Introduction to Sentiment Analysis CS 410, UMass Boston April 11, 2019 WORKERSHIP Boston, MA Email: abhi@workership.com Website: www.workership.com

introduction: sentiment analysis Extracting attitudes within text Converting unstructured information into structured data Part of the larger Natural Language Processing field Topology of Affective States: Scherer (1984) Attitudes: liking, loving, hating, valueing, desiring Emotion: angry, sad, joyful, fearful, ashamed, proud Mood: cheerful, gloomy, irritable, listless, depressed, buoyant Interpersonal Stance: friendly, cold, warm, supportive Personality Traits: nervous, anxious, reckless, morose, hostile, envious, jealous

introduction: sentiment analysis

introduction: SENTIMENT ANALYSIS Tasks Related to Sentiment Analysis Classification Positive, Negative, Neutral Strength Score on a scale of -1 to 1 Detect the target of the sentiment Classify into more complex affective states Example Text: About 401K plan One problem here is that the company has only two enrollment dates for the 401K plan, January 1 & July 1. In my case, I started working in January so I wasn't eligible for enrollment until July of the following year. I had to wait 18 months before they started contributing. There is no valid reason to only have two enrollment dates. An employee should be enrolled as soon as they have completed their eligibility requirements.  However, a 6% employer contribution is pretty good, but it would be a hell of a lot better if was on total pay and not just base pay.

introduction: SENTIMENT ANALYSIS Tasks Related to Sentiment Analysis Classification Positive, Negative, Neutral Strength Score on a scale of -1 to 1 Detect the target of the sentiment Classify into more complex affective states Example Text: About 401K plan One problem here is that the company has only two enrollment dates for the 401K plan, January 1 & July 1. In my case, I started working in January so I wasn't eligible for enrollment until July of the following year. I had to wait 18 months before they started contributing. There is no valid reason to only have two enrollment dates. An employee should be enrolled as soon as they have completed their eligibility requirements.  However, a 6% employer contribution is pretty good, but it would be a hell of a lot better if was on total pay and not just base pay.

introduction: why is sentiment analysis useful? Data and Scalability People don’t like voting. They like expressing views in their own words Allows for real-time, granular analysis Creation of metrics and comparisons

introduction: approaches Rule Based Approach Lexicon of Positive and Negative Words Manually Crafted Rules based on linguistic expertise

introduction: approaches Machine Learning Approach Supervised Learning Comment Tagged Sentiment The 24 hour rule is the right thing ….. Positive I strongly advise that the contract ……. Negative You are correct in your assertion…. Has anyone given thought to…. If the union is involved in this process …. Neutral …..

introduction: approaches Machine Learning Approach Supervised Learning Features Comment …. ….. Tagged Sentiment The 24 hour rule is the right thing ….. Positive I strongly advise that the contract ……. Negative You are correct in your assertion…. Has anyone given thought to…. If the union is involved in this process …. Neutral Build a prediction model to classify any text into sentiment classes Rule based techniques useful for feature generation

Make a predictions based on the training model introduction: approaches Machine Learning Approach Supervised Learning Features Comment …. ….. Sentiment Prediction Comment that needs to be classified…. ? Make a predictions based on the training model

process: SENTIMENT ANALYSIS process Data Cleaning and Tokenization Feature Extraction Training Testing and Accuracy

process: data cleaning and tokenization Examples of data cleaning HTML Text Cleaning Capitalization Repeated Characters Punctuations, whitespace, stop words Platform specific symbols Negation Regular Expressions are very helpful to search by patterns Break apart the text into tokens

process: feature extraction Rule Based Features Bag of Words, N-grams Dimension Reduction Techniques Word Embeddings and Word2Vec

process: feature extraction Bag of Words Approach Create a document term matrix Columns are all the unique terms in all the comments Each comment becomes a vector of term counts Using TF-IDF (term frequency-inverse document frequency) to weight importance of a word in the collection of documents The columns (all unique terms) act as features Comment I Love Dogs hate …… Tagged Sentiment I like dogs, and dogs are great 1 2 …. Positive I hate dogs Negative …….. …., …..

process: feature extraction N-grams Contiguous sequence of n-tokens Bigrams: I like, like dogs, dogs and…. Trigrams: I like dogs, like dogs and…. N-grams take more word context into account Comment I like Like dogs Dogs and I hate …… Tagged Sentiment I like dogs and dogs are great 1 …. Positive I hate dogs Negative …….. …., …..

process: feature extraction Dimension Reduction Techniques Singular Value Decomposition can help get rid of redundant data by reducing the dimension of the document-feature matrix A = original document-term matrix U = document-concept matrix D = concept strength matrix V = term-concept matrix

Latent Semantic Analysis process: feature extraction Dimension Reduction Techniques Singular Value Decomposition can help get rid of redundant data by reducing the dimension of the document-feature matrix 1 2 4 -0.21 -0.44 -0.87 -0.45 -0.89 7.93 3.17 -0.57 -0.7 Document-Term Document-Concept Concept Strength Concept-Term Latent Semantic Analysis Words used in similar contexts have similar meanings

process: feature extraction Word2Vec Uses a 2 layer neural network to create word vectors Set the hidden layer size to the dimension of the word vectors Train using all of the text in the corpus to predict surrounding words

process: feature extraction Word2Vec Create a word vector space using the weights for the hidden layer Compare similarity between words using the word vectors Use word embedding vector space as features when you train

process: training Build a prediction model based on the documents and features Logistics Regression Naïve Bayes Decision Trees Support Vector Machine Neural Network Decisions about which training model to use depends on number of examples, features, computational complexity

process: TESTING Apply the model generated by the training process to predict sentiment for any text To compute accuracy, divide tagged dataset into training and test sets Testing Metrics Accuracy: (True Positive + True Negative)/Total Precision: % of selected items that are correct True Positive/(True Positive + False Positive) Recall: % of correct items that are selected True Positive/ (True Positive + False Negative)

Big picture: sentiment analysis and Natural language processing Converting unstructured data to structured data Language Ambiguity Combining linguistics with computational statistics

Introduction to Sentiment Analysis Workership Introduction to Sentiment Analysis CS 410, UMass Boston April 11, 2019 WORKERSHIP Boston, MA Email: abhi@workership.com Website: www.workership.com