CS 277 DataMining Project Presentation Instructor: Prof. Dave Newman Team: Hitesh Sajnani, Vaibhav Saini, Kusum Kumar Donald Bren School of Information.

1 CS 277 Data Mining Project Presentation
Instructor: Prof. Dave Newman
Team: Hitesh Sajnani, Vaibhav Saini, Kusum Kumar
Donald Bren School of Information and Computer Science, University of California, Irvine

2 Problem Statement
Classify a given Yelp review text into one or more relevant categories.
Categories: Food, Service, Ambience, Discount, Worthiness
Example review: "They have the best happy hours around, the food is good and their service is even better. When its winter, we become regulars. :)"
Example labels: 1 1 1 (three of the five categories are marked)

3 Dataset
Reviews
- Reviews from the Food and Restaurant category
- # Useful votes > 1
- Total 10,000 reviews
Classification categories
- Identified categories using a sample set of 400 random reviews
- Refined categories using 200 more reviews
- Final categories: 5 (Food, Ambience, Service, Deals/Discounts, Worthiness)

4 Data Annotation
- 10,000 reviews divided into 5 bins (with repetition)
- 6 researchers manually annotated the reviews
- 225 man-hours of work!
- Discrepancies in 981 ambiguous reviews; these were removed from the analysis
- Total 9,019 reviews, split into 80% train and 20% test
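As a rough illustration of the 80/20 split, here is a minimal sketch. The slide does not say how the split was drawn, so the random shuffle, the seed, and the helper name `train_test_split` are assumptions made only for this example.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and cut it into train and test parts.

    Assumption: a simple random split; the slides do not describe the method.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 10,000 annotated reviews minus the 981 ambiguous ones leaves 9,019.
review_ids = list(range(10_000 - 981))
train_ids, test_ids = train_test_split(review_ids)
print(len(train_ids), len(test_ids))  # 7215 1804
```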

5 Features – unigrams/bigrams/trigrams
- Total 703 textual features: 375 unigrams, 208 bigrams, 120 trigrams
- Frequency: unigrams/bigrams/trigrams
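Counting unigram, bigram, and trigram frequencies might look like the sketch below. The lowercase whitespace tokenizer and the function names are assumptions for illustration; the slides do not describe the team's tokenization or how the 703 features were selected.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens`, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=3):
    """Count unigrams, bigrams, and trigrams in one review.

    Assumption: lowercase + whitespace tokenization, chosen for brevity.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


counts = ngram_counts("the food is good and the service is good")
print(counts["the"], counts["is good"], counts["service is good"])  # 2 2 1
```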

6 Features – User ratings
3 nominal features: Good, Moderate, Bad
Review stars map to a feature: two star values to Bad, one to Moderate, two to Good.
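A mapping of this shape could be sketched as follows. The star thresholds here (1 or 2 = Bad, 3 = Moderate, 4 or 5 = Good) are an assumption: the transcript lost the actual star values, and `rating_feature` is a hypothetical helper name.

```python
def rating_feature(stars):
    """Map a 1-5 star rating to a nominal feature.

    ASSUMED thresholds (not recoverable from the transcript):
    1-2 -> "Bad", 3 -> "Moderate", 4-5 -> "Good".
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a star rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "Bad"
    if stars == 3:
        return "Moderate"
    return "Good"


print([rating_feature(s) for s in range(1, 6)])
```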

7 Approach
Categories: Food, Service, Ambience, Discount, Worthiness
Example review: "They have the best happy hours around, the food is good and their service is even better. When its winter, we become regulars. :)"
Example labels: 1 1 1 (three of the five categories are marked)
- Reviews can be classified into more than one category
- Not a binary classification problem. It is a multi-label classification problem!

8 Binary classifiers for each category
- Learns one binary classifier for each category
- Output is the union of predictions of all binary classifiers
Original dataset
  Reviews    Categories
  Review 1   {Food, Deals}
  Review 2   {Ambience, Deals}
  Review 3   {Food}
  Review 4   {Service, Ambience, Deals}
Transformed datasets (one binary dataset per category, shown here as columns)
  Reviews    Food  Service  Ambience  Deals
  Review 1   1     0        0         1
  Review 2   0     0        1         1
  Review 3   1     0        0         0
  Review 4   0     1        1         1
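The transformation on this slide can be sketched as below, using the four example reviews. The function names `binary_relevance` and `union_of_predictions` are hypothetical, and no actual classifier is trained; the sketch only builds the per-category datasets and shows how per-category outputs would be merged.

```python
CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

reviews = {
    "Review 1": {"Food", "Deals"},
    "Review 2": {"Ambience", "Deals"},
    "Review 3": {"Food"},
    "Review 4": {"Service", "Ambience", "Deals"},
}


def binary_relevance(dataset, categories):
    """Build one binary (0/1) target per review for each category."""
    return {
        category: {rid: int(category in labels) for rid, labels in dataset.items()}
        for category in categories
    }


def union_of_predictions(per_category_flags):
    """Merge the 0/1 outputs of the per-category classifiers into a label set."""
    return {category for category, flag in per_category_flags.items() if flag == 1}


transformed = binary_relevance(reviews, CATEGORIES)
print(transformed["Food"])
print(union_of_predictions({"Food": 1, "Service": 0, "Ambience": 0, "Deals": 1}))
```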

9 Classifier for each subset of categories
Categories = {Food, Service, Ambience, Deals}
We consider each distinct subset of categories as a single class and learn a multi-class classifier.
Original dataset
  Reviews    Categories
  Review 1   {Food, Deals}
  Review 2   {Ambience, Deals}
  Review 3   {Food}
  Review 4   {Service, Ambience, Deals}
Encoding (Food Service Ambience Deals): {Food, Deals} = 1001; {Ambience, Deals} = 0011
Transformed dataset
  Reviews    Categories
  Review 1   "1001"
  Review 2   "0011"
  Review 3   "1000"
  Review 4   "0111"
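The subset-to-class encoding shown on this slide can be sketched as follows. `encode_subset` and `decode_subset` are hypothetical helper names; the multi-class learner itself is left out.

```python
CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

reviews = {
    "Review 1": {"Food", "Deals"},
    "Review 2": {"Ambience", "Deals"},
    "Review 3": {"Food"},
    "Review 4": {"Service", "Ambience", "Deals"},
}


def encode_subset(labels, categories=CATEGORIES):
    """Turn a set of categories into one class name such as "1001"."""
    return "".join("1" if category in labels else "0" for category in categories)


def decode_subset(bits, categories=CATEGORIES):
    """Turn a predicted class name such as "1001" back into a set of categories."""
    return {category for category, bit in zip(categories, bits) if bit == "1"}


transformed = {rid: encode_subset(labels) for rid, labels in reviews.items()}
print(transformed)
print(decode_subset("0111"))
```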

10 Ensemble of subset classifiers
Train one classifier for each subset of categories; each classifier predicts only its own subset.
  Reviews    Food  Service  Ambience  Deals
  Review 1   0     1        1         0
  Review 2   0     1        0         1
  Review 3   0     0        0         1
  Review 4   0     1        1         1
  Review 5   1     0        1         0
  Review 6   1     1        1         0
  Review 7   0     1        1         1
Classifier 1 for (Food, Service)
Classifier 2 for (Food, Ambience)
Classifier 3 for (Food, Deals)
Classifier 4 for (Service, Ambience)
Classifier 5 for (Service, Deals)
Classifier 6 for (Ambience, Deals)
Total 6 classifiers for subsets of size 2 (4C2 = 6)
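Enumerating the six pairs and deriving each classifier's training targets could look like the sketch below. Restricting the full label vector to the pair's two columns is an assumption about how the per-pair targets were built; `project` is a hypothetical helper, and the label strings come from the table on this slide.

```python
from itertools import combinations

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

# All unordered pairs of categories: 4C2 = 6 label subsets.
PAIRS = list(combinations(CATEGORIES, 2))

training_labels = {
    "Review 1": "0110",
    "Review 2": "0101",
    "Review 3": "0001",
    "Review 4": "0111",
    "Review 5": "1010",
    "Review 6": "1110",
    "Review 7": "0111",
}


def project(label_bits, pair, categories=CATEGORIES):
    """Restrict a full label string such as "0110" to the two categories in `pair`."""
    return "".join(label_bits[categories.index(category)] for category in pair)


# Assumed training target for each pair classifier: the projected 2-bit class.
targets = {
    pair: {rid: project(bits, pair) for rid, bits in training_labels.items()}
    for pair in PAIRS
}

print(len(PAIRS))
print(targets[("Food", "Service")])
```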

11-17 Ensemble of classifiers: Prediction
Ask each classifier to vote!
  Prediction from                  Food  Service  Ambience  Deals
  (Food, Service) classifier       0     1        -         -
  (Food, Ambience) classifier      1     -        1         -
  (Food, Deals) classifier         1     -        -         0
  (Service, Ambience) classifier   -     0        1         -
  (Service, Deals) classifier      -     0        -         1
  (Ambience, Deals) classifier     -     -        0         1
  Majority vote                    1     0        1         1
Final prediction: majority vote (a category is predicted when >= 2 classifiers vote for it)
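The voting step can be sketched with the six example predictions from these slides. The `majority_vote` name and the dictionary layout are assumptions for illustration; the threshold of 2 follows the ">= 2 classifiers" rule stated on the slide, where each category receives votes from three pair classifiers.

```python
from collections import defaultdict

CATEGORIES = ["Food", "Service", "Ambience", "Deals"]

# Example output of each pair classifier, in the order of the pair's categories.
pair_predictions = {
    ("Food", "Service"): (0, 1),
    ("Food", "Ambience"): (1, 1),
    ("Food", "Deals"): (1, 0),
    ("Service", "Ambience"): (0, 1),
    ("Service", "Deals"): (0, 1),
    ("Ambience", "Deals"): (0, 1),
}


def majority_vote(predictions, categories=CATEGORIES, threshold=2):
    """Predict a category when at least `threshold` classifiers vote 1 for it."""
    positive_votes = defaultdict(int)
    for pair, bits in predictions.items():
        for category, bit in zip(pair, bits):
            positive_votes[category] += bit
    return [int(positive_votes[category] >= threshold) for category in categories]


print(majority_vote(pair_predictions))  # [1, 0, 1, 1]
```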

18 Evaluation measures
Notation:
- Let (x, Y) be a multi-label example, Y ⊆ L
- Let h be a multi-label classifier
- Let Z = h(x) be the set of labels predicted by h for (x, Y)
Precision:
Recall:
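The formulas did not survive in the transcript. Assuming the standard example-based definitions that match this notation, precision is |Y ∩ Z| / |Z| and recall is |Y ∩ Z| / |Y|, averaged over all examples. The sketch below implements that assumed reading; returning 0.0 for an empty denominator is a further convention chosen here, not something stated on the slide.

```python
def precision(true_labels, predicted_labels):
    """|Y ∩ Z| / |Z| for one example; 0.0 when nothing was predicted (assumed convention)."""
    if not predicted_labels:
        return 0.0
    return len(true_labels & predicted_labels) / len(predicted_labels)


def recall(true_labels, predicted_labels):
    """|Y ∩ Z| / |Y| for one example; 0.0 when there are no true labels (assumed convention)."""
    if not true_labels:
        return 0.0
    return len(true_labels & predicted_labels) / len(true_labels)


def mean_over_examples(measure, examples):
    """Average a per-example measure over (Y, Z) pairs."""
    return sum(measure(y, z) for y, z in examples) / len(examples)


examples = [
    ({"Food", "Service", "Deals"}, {"Food", "Ambience", "Deals"}),
    ({"Food"}, {"Food"}),
]
print(mean_over_examples(precision, examples))
print(mean_over_examples(recall, examples))
```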

19 Precision & Recall (Train)

20 Precision & Recall (Test)

21 Observation 1: Ensemble gave the best results

22 Observation 2: Data Skew Normalized skew in training data by adding selective data

23 Precision & Recall (with and without category normalization)


