1
CS 277 Data Mining Project Presentation
Instructor: Prof. Dave Newman
Team: Hitesh Sajnani, Vaibhav Saini, Kusum Kumar
Donald Bren School of Information and Computer Sciences, University of California, Irvine
2
Problem Statement
Classify a given Yelp review text into one or more relevant categories.
Example review: "They have the best happy hours around, the food is good and their service is even better. When its winter, we become regulars. :)"
Categories: Food, Service, Ambience, Discounts, Worthiness. This example review is labeled with several categories at once.
3
Dataset
Reviews:
- Reviews from the Food and Restaurant category
- Number of useful votes > 1
- Total: 10,000 reviews
Classification categories:
- Identified categories using a sample set of 400 random reviews
- Refined the categories using 200 more reviews
- Final categories (5): Food, Ambience, Service, Deals/Discounts, Worthiness
4
Data Annotation
- 10,000 reviews divided into 5 bins (with repetition)
- 6 researchers manually annotated the reviews
- 225 man-hours of work!
- 981 ambiguous reviews had annotation discrepancies and were removed from the analysis
- Remaining 9,019 reviews split into 80% train and 20% test
5
Features – unigrams/bigrams/trigrams
Total of 703 textual features: 375 unigrams, 208 bigrams, 120 trigrams
[Chart: frequency of unigrams, bigrams, and trigrams]
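A minimal sketch of how such n-gram count features could be extracted, assuming scikit-learn; the slides do not state how the 703 features were selected, so the vectorizer settings below are illustrative.

```python
# Sketch: unigram/bigram/trigram count features with scikit-learn.
# How the 703 features were actually selected is not stated in the slides,
# so ngram_range is the only setting taken from them; min_df is illustrative.
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "They have the best happy hours around, the food is good "
    "and their service is even better.",
    "Great food but the service was slow.",
]

vectorizer = CountVectorizer(ngram_range=(1, 3),  # unigrams, bigrams, trigrams
                             min_df=1)            # illustrative frequency cutoff
X_text = vectorizer.fit_transform(reviews)        # sparse document-term count matrix
print(X_text.shape)
```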
6
Features – User ratings
3 nominal features: Good, Moderate, Bad, derived from the review's star rating
[Table: mapping from review stars to the Bad / Moderate / Good feature]
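A small sketch of turning the star rating into the nominal feature; the star cut-offs are not readable from the slide, so the thresholds below are an assumption.

```python
# Sketch: nominal rating feature from review stars.
# The exact cut-offs are not given on the slide; 1-2 = Bad, 3 = Moderate,
# 4-5 = Good is an assumed mapping.
def rating_feature(stars: int) -> str:
    if stars <= 2:
        return "Bad"
    if stars == 3:
        return "Moderate"
    return "Good"

print([rating_feature(s) for s in (1, 3, 5)])  # ['Bad', 'Moderate', 'Good']
```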
7
Approach
A review can be classified into more than one category, as in the example review from the problem statement.
This is not a binary classification problem. It is a multi-label classification problem!
8
Binary classifiers for each category
Learn one binary classifier for each category. The output is the union of the predictions of all binary classifiers.
Original dataset:
  Review 1: {Food, Deals}
  Review 2: {Ambience, Deals}
  Review 3: {Food}
  Review 4: {Service, Ambience, Deals}
Transformed datasets (one binary dataset per category):
  Food:     Review 1 = 1, Review 2 = 0, Review 3 = 1, Review 4 = 0
  Service:  Review 1 = 0, Review 2 = 0, Review 3 = 0, Review 4 = 1
  Ambience: Review 1 = 0, Review 2 = 1, Review 3 = 0, Review 4 = 1
  Deals:    Review 1 = 1, Review 2 = 1, Review 3 = 0, Review 4 = 1
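A minimal sketch of this one-binary-classifier-per-category scheme, assuming scikit-learn and a precomputed feature matrix X; the slides do not name the base classifier, so logistic regression here is only illustrative.

```python
# Sketch of the one-binary-classifier-per-category scheme (binary relevance).
# Assumes a feature matrix X (e.g. n-gram counts) and, for each review, the
# set of gold categories; LogisticRegression is an illustrative base model,
# the slides do not name the classifier actually used.
import numpy as np
from sklearn.linear_model import LogisticRegression

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

def train_binary_relevance(X, label_sets):
    """label_sets[i] is the set of categories annotated for review i."""
    models = {}
    for category in CATEGORIES:
        y = np.array([1 if category in labels else 0 for labels in label_sets])
        models[category] = LogisticRegression(max_iter=1000).fit(X, y)
    return models

def predict_binary_relevance(models, X):
    # Final output is the union of the per-category predictions.
    per_category = {c: models[c].predict(X) for c in CATEGORIES}
    return [{c for c in CATEGORIES if per_category[c][i] == 1}
            for i in range(X.shape[0])]
```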
9
Classifier for each subset of categories
Categories = {Food, Service, Ambience, Deals}
We consider each different "subset of categories" as a single class and learn one multi-class classifier.
Original dataset:
  Review 1: {Food, Deals}
  Review 2: {Ambience, Deals}
  Review 3: {Food}
  Review 4: {Service, Ambience, Deals}
Transformed dataset (bit order: Food, Service, Ambience, Deals):
  Review 1: "1001"
  Review 2: "0011"
  Review 3: "1000"
  Review 4: "0111"
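A minimal sketch of this subset-of-categories (label powerset) transformation, under the same assumptions as the previous sketch: each distinct bit pattern becomes one class of a single multi-class classifier.

```python
# Sketch of the subset-of-categories (label powerset) transformation:
# each distinct bit pattern such as "1001" ({Food, Deals}) becomes one class
# of a single multi-class classifier. Base classifier is illustrative.
from sklearn.linear_model import LogisticRegression

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

def to_bitstring(labels):
    # {Food, Deals} -> "1001", matching the transformed dataset above
    return "".join("1" if c in labels else "0" for c in CATEGORIES)

def train_label_powerset(X, label_sets):
    y = [to_bitstring(labels) for labels in label_sets]
    return LogisticRegression(max_iter=1000).fit(X, y)

def predict_label_powerset(model, X):
    return [{c for c, bit in zip(CATEGORIES, code) if bit == "1"}
            for code in model.predict(X)]
```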
10
Ensemble of subset classifiers
Train one classifier for each subset of categories (here, each subset of size 2).
Training data (bit order: Food, Service, Ambience, Deals):
  Review 1: 0110
  Review 2: 0101
  Review 3: 0001
  Review 4: 0111
  Review 5: 1010
  Review 6: 1110
  Review 7: 0111
Classifiers:
  Classifier 1 for (Food, Service)
  Classifier 2 for (Food, Ambience)
  Classifier 3 for (Food, Deals)
  Classifier 4 for (Service, Ambience)
  Classifier 5 for (Service, Deals)
  Classifier 6 for (Ambience, Deals)
Total: 6 classifiers for subsets of size 2 (4C2 = 6).
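A minimal sketch of training the six pairwise subset classifiers; the helper name and the base classifier are illustrative assumptions, not the project's actual code.

```python
# Sketch of training the ensemble: one small label-powerset classifier per
# 2-category subset, giving 4C2 = 6 classifiers. Helper names and the base
# classifier are illustrative.
from itertools import combinations
from sklearn.linear_model import LogisticRegression

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

def train_subset_ensemble(X, label_sets):
    ensemble = {}
    for pair in combinations(CATEGORIES, 2):  # the 6 subsets of size 2
        # Class label = this pair's 2-bit pattern, e.g. "01" or "11".
        y = ["".join("1" if c in labels else "0" for c in pair)
             for labels in label_sets]
        ensemble[pair] = LogisticRegression(max_iter=1000).fit(X, y)
    return ensemble
```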
11
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
12
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
13
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
  (Food, Deals) classifier → Food = 1, Deals = 0
14
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
  (Food, Deals) classifier → Food = 1, Deals = 0
  (Service, Ambience) classifier → Service = 0, Ambience = 1
15
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
  (Food, Deals) classifier → Food = 1, Deals = 0
  (Service, Ambience) classifier → Service = 0, Ambience = 1
  (Service, Deals) classifier → Service = 0, Deals = 1
16
Ensemble of classifiers: Prediction
Ask each classifier to vote!
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
  (Food, Deals) classifier → Food = 1, Deals = 0
  (Service, Ambience) classifier → Service = 0, Ambience = 1
  (Service, Deals) classifier → Service = 0, Deals = 1
  (Ambience, Deals) classifier → Ambience = 0, Deals = 1
17
Ensemble of classifiers: Prediction
Final prediction: majority vote (>= 2 classifiers)
  (Food, Service) classifier → Food = 0, Service = 1
  (Food, Ambience) classifier → Food = 1, Ambience = 1
  (Food, Deals) classifier → Food = 1, Deals = 0
  (Service, Ambience) classifier → Service = 0, Ambience = 1
  (Service, Deals) classifier → Service = 0, Deals = 1
  (Ambience, Deals) classifier → Ambience = 0, Deals = 1
  Majority vote → Food = 1, Service = 0, Ambience = 1, Deals = 1
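A minimal sketch of the majority-vote combination shown above; it assumes an ensemble built as in the previous sketch, where each pairwise classifier predicts a 2-bit pattern for its own two categories.

```python
# Sketch of the majority-vote combination: each pairwise classifier votes on
# its two categories, and a category is predicted when at least 2 of the 3
# classifiers covering it vote 1 (as on the slide).
from collections import Counter

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

def vote(ensemble, x):
    """x is a feature matrix containing a single review (one row)."""
    votes = Counter()
    for pair, model in ensemble.items():
        bits = model.predict(x)[0]           # e.g. "01" for (Food, Service)
        for category, bit in zip(pair, bits):
            votes[category] += int(bit)
    # Each category appears in 3 of the 6 pairwise classifiers.
    return {c for c in CATEGORIES if votes[c] >= 2}
```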
18
Evaluation measures
Notation:
  Let (x, Y) be a multi-label example, with Y ⊆ L, where L is the set of all categories.
  Let h be a multi-label classifier.
  Let Z = h(x) be the set of labels predicted by h for (x, Y).
Precision = |Y ∩ Z| / |Z|
Recall = |Y ∩ Z| / |Y|
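A small sketch of these example-based measures; averaging the per-review values over the test set is an assumption, since the slide only gives the per-example definitions.

```python
# Sketch of the example-based precision and recall from the slide:
# per review, precision = |Y ∩ Z| / |Z| and recall = |Y ∩ Z| / |Y|.
# Averaging over the whole test set is an assumption.
def precision_recall(gold_sets, predicted_sets):
    precisions, recalls = [], []
    for Y, Z in zip(gold_sets, predicted_sets):
        overlap = len(Y & Z)
        precisions.append(overlap / len(Z) if Z else 0.0)
        recalls.append(overlap / len(Y) if Y else 0.0)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n

print(precision_recall([{"Food", "Deals"}], [{"Food", "Service"}]))  # (0.5, 0.5)
```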
19
Precision & Recall (Train)
20
Precision & Recall (Test)
21
Observation 1: Ensemble gave the best results
22
Observation 2: Data skew
Normalized the skew in the training data by adding selective data.
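One possible reading of "adding selective data" is over-sampling reviews that mention under-represented categories; the sketch below is only an assumption about the strategy, not the project's actual procedure.

```python
# One possible reading of "adding selective data": over-sample training
# reviews whose labels include under-represented categories until each
# category reaches a target count. The actual selection strategy is not
# described on the slide, so everything here is an assumption.
import random
from collections import Counter

def oversample_minority(reviews, label_sets, target_per_category, seed=0):
    rng = random.Random(seed)
    counts = Counter(c for labels in label_sets for c in labels)
    extra_reviews, extra_labels = [], []
    for category, count in counts.items():
        pool = [i for i, labels in enumerate(label_sets) if category in labels]
        for _ in range(max(target_per_category - count, 0)):
            i = rng.choice(pool)              # duplicate an existing review
            extra_reviews.append(reviews[i])
            extra_labels.append(label_sets[i])
    return reviews + extra_reviews, label_sets + extra_labels
```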
23
Precision & Recall (w & w/o category normalization)