Published by Rhoda Hopkins. Modified over 9 years ago.
1
Extracting Hidden Components from Text Reviews for Restaurant Evaluation
Juanita Ordonez
Data Mining Final Project
Instructor: Dr. Shahriar Hossain
Computer Science, University of Texas at El Paso
2
Overall Process
Extract Reviews, Pre-process Data, Sentiment Model, Restaurant Grouping, Terms Analysis
3
Extract Reviews
The review dataset was filtered using the business category feature: businesses in the "Restaurants" category were selected and their business IDs extracted.
Reviews with those business IDs were then extracted.
A polar (binary) target was created: three-star reviews were removed, one- and two-star reviews were labeled negative, and four- and five-star reviews were labeled positive.
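A minimal sketch of the filtering and labeling step, assuming business and review records shaped like the Yelp dataset's JSON objects (fields `business_id`, `categories`, `stars`, `text`, `date`). The function names are hypothetical, and `categories` is accepted either as a list or as a comma-separated string because the exact format is an assumption here.

```python
from typing import Dict, Iterable, List, Optional, Set


def restaurant_ids(businesses: Iterable[Dict]) -> Set[str]:
    """Collect IDs of businesses whose categories include "Restaurants"."""
    ids = set()
    for business in businesses:
        categories = business.get("categories") or []
        # Assumption: categories may be a list or a comma-separated string.
        if isinstance(categories, str):
            categories = [c.strip() for c in categories.split(",")]
        if "Restaurants" in categories:
            ids.add(business["business_id"])
    return ids


def polar_label(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a polar label; 3 stars returns None (dropped)."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None


def labeled_reviews(reviews: Iterable[Dict], ids: Set[str]) -> List[Dict]:
    """Keep restaurant reviews that get a polar label, along with their dates."""
    out = []
    for review in reviews:
        if review["business_id"] not in ids:
            continue  # not a restaurant
        label = polar_label(review["stars"])
        if label is None:
            continue  # three-star reviews are removed
        out.append({
            "business_id": review["business_id"],
            "text": review["text"],
            "date": review["date"],
            "label": label,
        })
    return out
```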
4
Extract Reviews
The dataset was unbalanced: 20% of the reviews were negative and 80% were positive.
An equal number of negative and positive examples was selected.
The date of each example was also extracted.
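One way to select an equal number of examples per class is random undersampling of the majority class. The sketch below assumes that approach (the slides do not say how examples were chosen) and assumes each example is a dict with a `label` key; `balance_by_label` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List


def balance_by_label(examples: List[Dict], seed: int = 0) -> List[Dict]:
    """Randomly undersample every class down to the size of the smallest one."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)  # avoid a block of one class followed by the other
    return balanced
```

With a 20/80 split, 100 examples would yield 40 balanced examples (20 per class); the other 60 positives are discarded.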
5
Pre-Process Data
Removed stop words, except descriptive nouns and negations
Removed nonsensical words, except common slang words
Removed punctuation and numbers
Removed hyperlinks and invalid inputs
Applied a spelling corrector and stemming
Converted all words to lower case
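A minimal sketch of several of these cleanup steps, assuming a tiny hypothetical stop-word list and a crude suffix-stripping stemmer that stands in for a real stemmer such as Porter's. It keeps negations, expands "n't" contractions to "not" (an assumption), drops hyperlinks, punctuation, and digits, and lower-cases everything. Spelling correction, the nonsensical-word filter, and keeping descriptive nouns need dictionaries or a part-of-speech tagger, so they are not shown.

```python
import re
from typing import List

# Hypothetical, abbreviated stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "it", "this", "that", "i", "we"}
# Negations are kept even though many stop-word lists include them.
NEGATIONS = {"not", "no", "never", "nor"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]")  # punctuation and digits


def crude_stem(word: str) -> str:
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    text = URL_RE.sub(" ", text)            # remove hyperlinks
    text = re.sub(r"n't\b", " not", text)   # wasn't -> was not (assumption)
    text = NON_ALPHA_RE.sub(" ", text)      # remove punctuation and numbers
    tokens = text.lower().split()           # lower case
    kept = [t for t in tokens if t not in STOP_WORDS or t in NEGATIONS]
    return [crude_stem(t) for t in kept]
```

Note that lower-casing here discards the all-caps signal, so any all-caps marking has to be applied to the tokens before this function runs.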
6
Pre-Process Data
Symbols were used to represent word modifiers:
Negated words are prefixed with "~", for example: not great = ~great
Words written in all caps are prefixed with "!", for example: HATE = !hate
Bigrams were used as terms, for example: "service slow food nasty no so great" yields "service slow", "slow food", "food nasty", "nasty ~so", "~so great"
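The marking and bigram steps can be sketched as follows. The negation list is a hypothetical assumption (the slides only show "not" and "no"), and the negation word itself is folded into the next token, which reproduces the slide's example. The all-caps check looks at the original token, so this has to run before lower-casing.

```python
from typing import List

# Hypothetical negation list; the slides do not give the exact set.
NEGATIONS = {"not", "no", "never"}


def mark_tokens(tokens: List[str]) -> List[str]:
    """Apply the "~" (negated) and "!" (all caps) markers, dropping the negation word."""
    marked = []
    negate_next = False
    for token in tokens:
        if token.lower() in NEGATIONS:
            negate_next = True  # the negation is folded into the next word
            continue
        word = token.lower()
        if token.isalpha() and token.isupper() and len(token) > 1:
            word = "!" + word  # all caps, e.g. HATE -> !hate
        if negate_next:
            word = "~" + word  # negated, e.g. not great -> ~great
            negate_next = False
        marked.append(word)
    return marked


def bigrams(tokens: List[str]) -> List[str]:
    """Adjacent token pairs joined by a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
```

For example, `bigrams(mark_tokens("service slow food nasty no so great".split()))` gives the five bigrams listed on the slide.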
7
Sentiment Model
Naive Bayes classifier
Classes: negative and positive
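A minimal multinomial Naive Bayes sketch over token lists, written from scratch for illustration. Add-one (Laplace) smoothing and the class name are assumptions, since the slides only name the classifier. `predict_proba` normalizes the log scores into class probabilities, which is the kind of output the restaurant grouping step relies on.

```python
import math
from collections import Counter
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_scores(self, doc: List[str]) -> Dict[str, float]:
        # log P(c) + sum of log P(w | c); tokens outside the vocabulary are skipped.
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * v
            score = self.log_prior[c]
            for w in doc:
                if w in self.vocab:
                    score += math.log((self.counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc: List[str]) -> str:
        scores = self._log_scores(doc)
        return max(scores, key=scores.get)

    def predict_proba(self, doc: List[str]) -> Dict[str, float]:
        # Softmax of the log scores, shifted by the max for numerical stability.
        scores = self._log_scores(doc)
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}
```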
8
Sentiment Model
NBSVM (Naive Bayes Support Vector Machine)
Has not been run on the Yelp dataset
MATLAB implementation available online [4]
Feature vector and SVM model from the authors of [4]
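A sketch of the NBSVM feature construction described in [4]: binarized features are scaled element-wise by the log-count ratio r = log((p/||p||_1) / (q/||q||_1)), where p and q are smoothed binarized feature counts over positive and negative documents. Labels are assumed to be +1/-1 and the function names are hypothetical. Training the linear SVM itself is left to a standard solver; `interpolate` shows the weight interpolation w' = (1 - β) w̄ + β w from [4], where w̄ is the mean absolute weight.

```python
import math
from typing import Dict, List, Sequence


def log_count_ratio(docs: Sequence[Sequence[str]], labels: Sequence[int],
                    vocab: Sequence[str], alpha: float = 1.0) -> Dict[str, float]:
    """r = log((p / ||p||_1) / (q / ||q||_1)) using binarized counts; labels are +1/-1."""
    p = {w: alpha for w in vocab}
    q = {w: alpha for w in vocab}
    for doc, y in zip(docs, labels):
        for w in set(doc):  # binarize: count each word at most once per document
            if w in p:
                if y == 1:
                    p[w] += 1
                else:
                    q[w] += 1
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {w: math.log((p[w] / p_norm) / (q[w] / q_norm)) for w in vocab}


def nbsvm_features(doc: Sequence[str], r: Dict[str, float], vocab: Sequence[str]) -> List[float]:
    """Feature vector r * f_hat, where f_hat is the binary presence indicator."""
    present = set(doc)
    return [r[w] if w in present else 0.0 for w in vocab]


def interpolate(w: Sequence[float], beta: float) -> List[float]:
    """w' = (1 - beta) * w_bar + beta * w, with w_bar = ||w||_1 / |V|."""
    w_bar = sum(abs(x) for x in w) / len(w)
    return [(1 - beta) * w_bar + beta * x for x in w]
```

Words that appear mostly in positive reviews get r > 0 and words that appear mostly in negative reviews get r < 0, so the SVM starts from features that already carry Naive Bayes evidence.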
9
Sentiment Evaluation Results
Evaluated with 10-fold cross-validation
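A minimal sketch of 10-fold cross-validation, assuming the examples are shuffled once with a fixed seed and dealt into k near-equal folds. `fit` and `predict` are hypothetical callables standing in for whichever sentiment model is evaluated, and accuracy is used only as an example metric.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of k folds; assumes n >= k."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_val_accuracy(examples: Sequence[Any], labels: Sequence[Any],
                       fit: Callable, predict: Callable, k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k folds; each fold is held out once for testing."""
    scores = []
    for train, test in k_fold_splits(len(examples), k, seed):
        model = fit([examples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, examples[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```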
10
Restaurant Grouping
K-means with K = 2
Attributes: overall sentiment (using the predicted probabilities of 20,000 examples), number of days since the business opened, and average star rating
Clustered 100 businesses, comprising ~4,000 reviews
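A minimal K-means sketch for the grouping step. Each business is assumed to be a row of [overall sentiment, days since opening, average stars], where overall sentiment might be, for example, the mean positive-class probability of its reviews. That aggregation, the z-score scaling, and the farthest-first initialization are assumptions, since the slides do not specify them; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so days-open does not dominate the 0-1 sentiment scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets std 1.0 to avoid division by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int = 2,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm with farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Calling `kmeans(standardize(rows), k=2)` returns one cluster index per business plus the two centroids in standardized units.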
11
Clustering Results
12
References
[1] San Francisco Restaurants, Dentists, Bars, Beauty Salons, Doctors. (n.d.). Retrieved April 2, 2015, from http://www.yelp.com/
[2] Naive Bayes Text Classification, book chapter. Stanford.
[3] Luca, M. (2011). Reviews, reputation, and revenue: The case of Yelp.com (September 16, 2011). Harvard Business School NOM Unit Working Paper, (12-016).
[4] Wang, S., and Manning, C. D. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2. Association for Computational Linguistics.