Published by Aubrie Davidson. Modified over 8 years ago.
1
Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree
Oct 10th, 2014
Bo-Hyun Kim, Sr. Software Engineer
With Lina Chen, Sr. Software Engineer
HP Big Data Business Unit
#GHC14
2
What to Expect
Sentiment Analysis
− What is it?
− Why is it interesting?
− How HP Vertica Pulse works
− Achieving greater accuracy
− A different point of view using the most-mentioned word tree
3
What I Expect
A 5-star rating on the GHC app
I just expect you to enjoy and learn!
4
Sentiment Analysis
In plain English
− Sentiment analysis, the process of automatically detecting whether a text segment contains emotional or opinionated content and determining its polarity (e.g., “thumbs up” or “thumbs down”), is a field of research that has received significant attention in recent years, both in academia and in industry. [Wright, 2009]
5
Gimme Examples!
Also known as:
− Opinion Mining
− Text Mining
Determine people’s general opinion
− “I just got a new car, and I’m loving it”
− “My new car isn’t as fast as I thought.”
6
Why are we interested?
Increasing (every minute!) web usage
− Articles
− Blogs
− Comments
Power of Social Media
− Online Shopping
− Customer Reviews
− Recommended products on Amazon
− How other people feel about the product
7
Product Review
8
Data… Data… Data…
9
HP Vertica Pulse
10
How to Analyze?
Lexicon-based approach – HP Labs [Zhang et al., 2011]
Choose a product, person, event, organization, or topic [Hu and Liu, 2004] whose opinion you want to analyze
Determine the Semantic Orientation score of opinion lexicons

Word        Semantic Orientation Value
Fabulous    +3
Good        +1
Bad
Nasty       -3
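The lexicon lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not HP Vertica Pulse's implementation: `LEXICON` holds only the three values that survive on the slide (the value for "Bad" is not given, so it is left out), and `opinion_words` is a hypothetical helper name.

```python
import re

# Semantic orientation (SO) values copied from the slide's table.
# "Bad" is omitted because its value is not given in the source.
LEXICON = {"fabulous": 3, "good": 1, "nasty": -3}

def opinion_words(text):
    """Return (word, SO value) pairs for every lexicon word found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]

print(opinion_words("The screen is fabulous but the keyboard feels nasty."))
```

A real lexicon would contain thousands of entries; the lookup itself stays the same.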
11
Sentiment Scoring
Input: text or sentence
Output: for each attribute or entity, a sentiment score ranging from -1 to 1
− -1: Negative sentiment
− 0: Neutral sentiment
− 1: Positive sentiment
Entity-level lexicon-based sentiment scoring
12
Limitation
Semantic Orientation value(‘missed’) = -1
Opinion words located closer to the entity are given more weight
Accuracy can suffer
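To see why proximity weighting can hurt, here is a sketch of one common aggregation from the lexicon-based literature: each opinion word contributes its SO value divided by its token distance to the entity. The slides do not state the exact formula Pulse uses, so the normalization to [-1, 1] (dividing by the sum of absolute contributions), the `entity_score` name, and the SO value for "love" are all assumptions made for illustration.

```python
def entity_score(tokens, entity, lexicon):
    """Distance-weighted sentiment for `entity`, normalized to [-1, 1]."""
    positions = [i for i, tok in enumerate(tokens) if tok == entity]
    if not positions:
        return 0.0
    total = 0.0
    weight_sum = 0.0
    for i, tok in enumerate(tokens):
        so = lexicon.get(tok)
        if so is None or tok == entity:
            continue
        # Distance to the nearest mention of the entity (always >= 1 here).
        dist = min(abs(i - p) for p in positions)
        total += so / dist
        weight_sum += abs(so) / dist
    return total / weight_sum if weight_sum else 0.0

# 'missed' = -1 comes from the slide; 'love' = +1 is an assumed value.
lexicon = {"missed": -1, "love": 1}
tokens = "i missed my phone so much because i really love it".split()
print(entity_score(tokens, "phone", lexicon))
```

The writer is clearly fond of the phone, yet the nearby "missed" outweighs the more distant "love" and the score comes out negative.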
13
Improve accuracy
Accuracy is what we strive for!
More robust pre-processing
− Prune data to fit different types of user opinion (e.g. Twitter vs. YouTube comments)
Naïve Bayes Classifier Training
Tune accordingly
14
Data Set
Test dataset
− Collected by Stanford students
− In 2009
− Over 3 million tweets with tested scores
− Analyzed 3,500 tweets
Collected dataset
− HP Vertica Pulse Twitter Connector
− In 2014
− Total of 1.2 million tweets
15
Data Pruning
Remove
− Job postings: #job, #jobs, #tweetmyjob
− Links: http://this.is/nogood
− Duplicates
− Twitter-specific characters: RT, @, #
− Emoticons: “I hate my life :-)” (sarcasm is a widespread disease)
After pruning
− ~287,000 tweets, 24% of the 1.2 million tweets
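The pruning steps listed on this slide could look roughly like the following. This is a sketch under assumptions: the regular expressions cover only a small set of emoticons and the three job hashtags named on the slide, and `prune` is a hypothetical function name, not part of Pulse.

```python
import re

JOB_RE = re.compile(r"#(?:jobs?|tweetmyjob)\b", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")
# A deliberately small emoticon pattern: eyes, optional nose, mouth.
EMOTICON_RE = re.compile(r"[:;=][-^']?[)(DPp/\\|\]\[]")
RT_RE = re.compile(r"\bRT\b")

def prune(tweets):
    """Drop job postings and duplicates; strip links, RT/@/#, and emoticons."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        if JOB_RE.search(tweet):
            continue  # job posting
        text = URL_RE.sub("", tweet)      # links first, so "://" is not read as an emoticon
        text = EMOTICON_RE.sub("", text)
        text = RT_RE.sub("", text)
        text = text.replace("@", "").replace("#", "")
        text = " ".join(text.split())     # collapse leftover whitespace
        key = text.lower()
        if key and key not in seen:       # skip empty results and duplicates
            seen.add(key)
            cleaned.append(text)
    return cleaned

sample = [
    "RT @bob: I hate my life :-) http://this.is/nogood",
    "Hiring now #jobs http://this.is/nogood",
    "RT @bob: I hate my life :-)",
    "Love my #car",
]
print(prune(sample))
```

Duplicates are detected after cleaning, so two retweets that differ only by a link collapse into one entry.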
16
Naïve Bayes Classifier
17
Naïve Bayes Classifier
Results:
− Final accuracy: 0.788
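The slides do not describe the features or training setup behind this result, so the following is only a minimal sketch of the general technique: a multinomial naive Bayes classifier over bag-of-words features with add-one smoothing. The training sentences are made up for illustration, and this toy does not reproduce the reported 0.788 accuracy.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(docs)

# Made-up training data, for illustration only.
train_docs = ["love this phone", "great battery great screen",
              "hate this phone", "terrible battery awful screen"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great phone"), model.predict("awful phone"))
```

In practice the classifier would be trained on the labeled test dataset and evaluated on held-out tweets with `accuracy`.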
18
Tuning Pulse
− Positive words
− Negative words
− Neutral words
− White lists
− Stop words
− Synonym mappings
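How such tuning lists might interact with a lexicon can be sketched as follows. This is a hypothetical illustration, not Pulse's actual mechanism or API: `tune_lexicon` and `normalize` are invented names, overrides are assumed to use +1 and -1, and white lists are not modeled.

```python
def tune_lexicon(base, positive=(), negative=(), neutral=()):
    """Return a copy of `base` with user-defined overrides applied."""
    tuned = dict(base)
    tuned.update({w: 1 for w in positive})   # assumed override value: +1
    tuned.update({w: -1 for w in negative})  # assumed override value: -1
    for w in neutral:
        tuned.pop(w, None)  # neutral words carry no sentiment
    return tuned

def normalize(tokens, stop_words=(), synonyms=None):
    """Drop stop words and map synonyms to a canonical form."""
    synonyms = synonyms or {}
    return [synonyms.get(t, t) for t in tokens if t not in stop_words]

base = {"sick": -1, "missed": -1}
tuned = tune_lexicon(base, positive=["sick"], neutral=["missed"])
tokens = normalize(["the", "fone", "is", "sick"],
                   stop_words={"the", "is"}, synonyms={"fone": "phone"})
print(tuned, tokens)
```

The point of the example is that domain slang ("sick" as praise) and ambiguous words ("missed") can be corrected without touching the base lexicon.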
19
Accuracy Comparison
Sentiment scores generated for each phase
20
Trend/Targeted Analysis
Targeted dataset analysis can help improve accuracy
Identify the most-mentioned words
− Use the most-recurrent words to narrow the scope of analysis
Find new trends
− Government healthcare (2009) vs. Obamacare (2014)
Are we looking at the targeted data?
− “Solve healthcare challenges with technology!”
− “Healthcare After ObamaCare”
− “Get affordable healthcare at HealthCare.gov”
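Identifying the most-mentioned words around an entity amounts to a frequency count over the tweets that mention it. The sketch below assumes a tiny hand-picked stop-word list and counts each word once per tweet; `most_mentioned` is a hypothetical name, and the fourth tweet is made up so that one word clearly stands out.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"the", "a", "with", "at", "get", "after", "and", "to", "of"}

def most_mentioned(tweets, entity, top_n=3):
    """Count words co-occurring with `entity`, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z]+", tweet.lower())
        if entity in tokens:
            # Count each word once per tweet so repeats do not inflate it.
            counts.update(set(tokens) - STOP_WORDS - {entity})
    return counts.most_common(top_n)

tweets = [
    "Solve healthcare challenges with technology!",
    "Healthcare After ObamaCare",
    "Get affordable healthcare at HealthCare.gov",
    "ObamaCare changed healthcare",  # made-up tweet, not from the slides
]
print(most_mentioned(tweets, "healthcare", top_n=1))
```

The top words can then be used to narrow the analysis to the tweets that are actually about the intended topic.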
21
Generating Tree
Increase the relevance of the sentiment score by running sentiment analysis on the entity as well as on the most-recurrent words, to identify:
− Homonyms that machines do not understand
− More accurate scores based on user interest
Generate the tree using text search
− Merge stemmed word forms, e.g. query, queries, querying…
− Lucene (Apache open source)
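Merging stemmed word forms before counting can be illustrated with a deliberately crude suffix stripper. This is an assumption-laden stand-in: the talk uses Lucene for text search, and a real stemmer would replace `crude_stem`, which handles only a few English suffixes and will mangle many words.

```python
from collections import Counter

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix, repl in (("ies", "y"), ("ing", ""), ("s", "")):
        # Keep at least three characters of the original word.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + repl
    return word

def merge_counts(words):
    """Count words after collapsing each one to its crude stem."""
    return Counter(crude_stem(w.lower()) for w in words)

print(merge_counts(["query", "queries", "querying", "tree"]))
```

With the variants merged, "query" is counted three times instead of appearing as three separate low-frequency words.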
22
Tree View
[Figure: word tree rooted at “healthcare”, with nodes obamacare, !(Obamacare), obama, !(Obama), health, !(health)]
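The exact shape of the tree in the figure is not recoverable from the transcript. One plausible reading is that each level splits the tweets into those that mention a frequent word and those that do not, the `!(word)` branch. The sketch below builds such a tree under that assumption; `build_tree` and the `count` field are hypothetical.

```python
import re

def build_tree(tweets, words):
    """Recursively split tweets by presence or absence of each word."""
    node = {"count": len(tweets)}
    if not words:
        return node
    word, rest = words[0], words[1:]
    with_word, without = [], []
    for tweet in tweets:
        # Match whole tokens so "obama" does not match "obamacare".
        tokens = set(re.findall(r"[a-z]+", tweet.lower()))
        (with_word if word in tokens else without).append(tweet)
    node[word] = build_tree(with_word, rest)
    node[f"!({word})"] = build_tree(without, rest)
    return node

tweets = [
    "Solve healthcare challenges with technology!",
    "Healthcare After ObamaCare",
    "Get affordable healthcare at HealthCare.gov",
]
print(build_tree(tweets, ["obamacare"]))
```

Each branch holds a narrower set of tweets, so sentiment can be scored per branch rather than over the whole "healthcare" set.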
23
Thank you
Questions?
bohyun@hp.com
bohyun.j.kim@gmail.com
Many thanks to*:
Tim Donar, Solution Engineer
Beth Favini, Tech Pubs Sr. Manager
Judith Plummer, Tech Pubs Editor in Chief
* In alphabetical order
24
Got Feedback?
Rate and review the session using the GHC Mobile App
To download, visit www.gracehopper.org