
CS 277 Data Mining Project Presentation
Instructor: Prof. Dave Newman
Team: Hitesh Sajnani, Vaibhav Saini, Kusum Kumar
Donald Bren School of Information and Computer Science, University of California, Irvine

Problem Statement
Review categories: Food, Service, Ambience, Discount, Worthiness.
Example review: "They have the best happy hours around, the food is good and their service is even better. When its winter, we become regulars. :)" On the slide, this review is marked with 1 under three of the categories.
Goal: classify a given Yelp review text into one or more relevant categories.

Dataset
Reviews: reviews from the Food and Restaurant category with more than 1 useful vote, 10,000 reviews in total.
Classification categories: identified using a sample of 400 random reviews, then refined using 200 more reviews.
Final categories (5): Food, Ambience, Service, Deals/Discounts, Worthiness.

Data Annotation
The 10,000 reviews were divided into 5 bins (with repetition), and 6 researchers manually annotated them: 225 man-hours of work!
Annotations were inconsistent for 981 ambiguous reviews, which were removed from the analysis.
The remaining 9,019 reviews were split into 80% train and 20% test.
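A minimal sketch of the filtering and splitting step, assuming each review's annotations are available as one label set per annotator (a hypothetical format) and that a review counts as ambiguous whenever its annotators' label sets differ. The function name `filter_and_split` and the fixed shuffle seed are illustrative choices, not the project's code.

```python
import random


def filter_and_split(annotations, train_frac=0.8, seed=0):
    """Drop ambiguous reviews, then split the rest into train and test ids.

    annotations: dict mapping review id -> list of label sets, one per annotator
    (hypothetical format). A review is treated as ambiguous when its annotators
    did not all assign the same label set (assumed definition).
    """
    consistent = sorted(
        rid for rid, label_sets in annotations.items()
        if all(labels == label_sets[0] for labels in label_sets)
    )
    n_ambiguous = len(annotations) - len(consistent)

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(consistent)
    cut = int(len(consistent) * train_frac)
    return consistent[:cut], consistent[cut:], n_ambiguous


# Toy usage: review "r2" gets two different label sets, so it is dropped.
annotations = {
    "r1": [{"Food"}, {"Food"}],
    "r2": [{"Food", "Service"}, {"Food"}],
    "r3": [{"Ambience"}, {"Ambience"}],
}
train_ids, test_ids, n_ambiguous = filter_and_split(annotations)
```

With 9,019 retained reviews, `int(9019 * 0.8)` gives 7,215 training and 1,804 test reviews.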

Features: unigrams/bigrams/trigrams
Total of 703 textual features: 375 unigrams, 208 bigrams, 120 trigrams.
(Figure: frequency of unigrams/bigrams/trigrams)
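One way to build such a vocabulary, sketched under assumptions: reviews are lowercased and split on whitespace, and the n-grams of each order are ranked by raw frequency, keeping 375 unigrams, 208 bigrams and 120 trigrams to match the slide's totals. The slides do not state the tokenizer or the selection criterion, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (assumption; not from the slides).
    return text.lower().split()


def build_vocabulary(reviews, limits=None):
    """Keep the most frequent n-grams of each order as the feature vocabulary.

    limits maps n -> how many n-grams to keep. The defaults mirror the slide's
    375 unigrams, 208 bigrams and 120 trigrams (703 features in total);
    ranking by raw frequency is an assumption.
    """
    if limits is None:
        limits = {1: 375, 2: 208, 3: 120}
    vocabulary = []
    for n, k in limits.items():
        counts = Counter()
        for text in reviews:
            counts.update(ngrams(tokenize(text), n))
        vocabulary.extend(gram for gram, _ in counts.most_common(k))
    return vocabulary


def vectorize(text, vocabulary, orders=(1, 2, 3)):
    """Count how often each vocabulary n-gram occurs in a single review."""
    counts = Counter()
    tokens = tokenize(text)
    for n in orders:
        counts.update(ngrams(tokens, n))
    return [counts[gram] for gram in vocabulary]
```

Each review then becomes a vector of n-gram frequencies of length `len(vocabulary)`.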

Features: user ratings
Review stars are mapped to 3 nominal features: Good, Moderate, Bad.
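A sketch of how star ratings could be turned into the three nominal features. The cut-offs used here (1 to 2 stars for Bad, 3 for Moderate, 4 to 5 for Good) are an illustrative assumption, as are the function names; a one-hot encoding is shown because many classifiers expect numeric inputs.

```python
RATING_VALUES = ("Good", "Moderate", "Bad")


def rating_feature(stars):
    """Map an integer Yelp star rating (1-5) to a nominal rating feature.

    The cut-offs (1-2 -> Bad, 3 -> Moderate, 4-5 -> Good) are an illustrative
    assumption, not values taken from the slides.
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected an integer star rating 1-5, got {stars!r}")
    if stars <= 2:
        return "Bad"
    if stars == 3:
        return "Moderate"
    return "Good"


def one_hot(feature):
    """Encode the nominal feature as three 0/1 columns (Good, Moderate, Bad)."""
    return [int(feature == value) for value in RATING_VALUES]
```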

Approach
Reviews can be classified into more than one category (the example review above falls into three). This is not a binary classification problem: it is a multi-label classification problem!

Binary classifiers for each category
Learn one binary classifier for each category; the output is the union of the predictions of all binary classifiers.
Original dataset:
Review 1: {Food, Deals}
Review 2: {Ambience, Deals}
Review 3: {Food}
Review 4: {Service, Ambience, Deals}
Transformed datasets (one binary dataset per category; 1 = review has the category):
Review     Food  Service  Ambience  Deals
Review 1   1     0        0         1
Review 2   0     0        1         1
Review 3   1     0        0         0
Review 4   0     1        1         1
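The transformation on this slide (often called binary relevance) can be sketched as follows, using the slide's four reviews. The dictionary-based data layout and the function names are assumptions; training the per-category binary classifiers is left out, since the slides do not name the base classifier.

```python
CATEGORIES = ("Food", "Service", "Ambience", "Deals")

# Original multi-label dataset from the slide.
REVIEWS = {
    "Review 1": {"Food", "Deals"},
    "Review 2": {"Ambience", "Deals"},
    "Review 3": {"Food"},
    "Review 4": {"Service", "Ambience", "Deals"},
}


def binary_relevance_datasets(review_labels, categories=CATEGORIES):
    """Turn one multi-label dataset into one binary (0/1) dataset per category."""
    return {
        category: {rid: int(category in labels)
                   for rid, labels in review_labels.items()}
        for category in categories
    }


def union_of_predictions(binary_predictions):
    """Combine per-category 0/1 predictions for one review into a label set."""
    return {category for category, pred in binary_predictions.items() if pred == 1}


datasets = binary_relevance_datasets(REVIEWS)
# datasets["Food"] == {"Review 1": 1, "Review 2": 0, "Review 3": 1, "Review 4": 0}
```

Because each category gets its own classifier, this approach ignores correlations between categories.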

Classifier for each subset of categories
Categories = {Food, Service, Ambience, Deals}. We treat each distinct subset of categories as a single class and learn a multi-class classifier. Each subset is encoded as one bit per category in the order Food, Service, Ambience, Deals, e.g. {Food, Deals} → 1001 and {Ambience, Deals} → 0011.
Original dataset:
Review 1: {Food, Deals}
Review 2: {Ambience, Deals}
Review 3: {Food}
Review 4: {Service, Ambience, Deals}
Transformed dataset:
Review 1: "1001"
Review 2: "0011"
Review 3: "1000"
Review 4: "0111"
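The encoding on this slide (often called label powerset) can be sketched as below. The function names are hypothetical; any multi-class classifier could then be trained with the resulting bit strings as class labels.

```python
CATEGORIES = ("Food", "Service", "Ambience", "Deals")


def encode_labels(labels, categories=CATEGORIES):
    """Encode a label set as one bit per category, e.g. {Food, Deals} -> "1001"."""
    return "".join("1" if category in labels else "0" for category in categories)


def decode_labels(code, categories=CATEGORIES):
    """Invert encode_labels, e.g. "0111" -> {Service, Ambience, Deals}."""
    return {category for category, bit in zip(categories, code) if bit == "1"}


reviews = {
    "Review 1": {"Food", "Deals"},
    "Review 2": {"Ambience", "Deals"},
    "Review 3": {"Food"},
    "Review 4": {"Service", "Ambience", "Deals"},
}
# Each distinct code becomes one class label for an ordinary multi-class classifier.
classes = {rid: encode_labels(labels) for rid, labels in reviews.items()}
```

With 4 categories there are at most 2^4 = 16 classes, and such a classifier can only predict subsets that occur in its training data.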

Ensemble of subset classifiers
Train one classifier per subset of categories; each classifier predicts only the categories in its subset.
Training data:
Review     Food  Service  Ambience  Deals
Review 1   0     1        1         0
Review 2   0     1        0         1
Review 3   0     0        0         1
Review 4   0     1        1         1
Review 5   1     0        1         0
Review 6   1     1        1         0
Review 7   0     1        1         1
Classifier 1 for (Food, Service)
Classifier 2 for (Food, Ambience)
Classifier 3 for (Food, Deals)
Classifier 4 for (Service, Ambience)
Classifier 5 for (Service, Deals)
Classifier 6 for (Ambience, Deals)
Total: 6 classifiers for subsets of size 2, i.e. $\binom{4}{2} = 6$.
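The per-subset training targets can be derived mechanically from the full label vectors, as in this sketch using the slide's seven reviews. The function name `subset_targets` is hypothetical, and how each classifier models its 2-bit target (for example as a small multi-class problem with at most 4 classes) is an assumption the slides leave open.

```python
from itertools import combinations

CATEGORIES = ("Food", "Service", "Ambience", "Deals")


def subset_targets(label_vectors, categories=CATEGORIES, k=2):
    """Build the training targets for one classifier per k-subset of categories.

    label_vectors maps review id -> bit string over `categories` (e.g. "0110").
    For a subset such as ("Food", "Service"), each review's target keeps only the
    bits of those categories, e.g. "0110" -> "01".
    """
    position = {category: i for i, category in enumerate(categories)}
    return {
        subset: {
            rid: "".join(bits[position[c]] for c in subset)
            for rid, bits in label_vectors.items()
        }
        for subset in combinations(categories, k)
    }


# Training data from the slide (bits in the order Food, Service, Ambience, Deals).
TRAIN = {
    "Review 1": "0110", "Review 2": "0101", "Review 3": "0001", "Review 4": "0111",
    "Review 5": "1010", "Review 6": "1110", "Review 7": "0111",
}
targets = subset_targets(TRAIN)  # one entry per pair of categories
```

`itertools.combinations` yields the pairs in the same order as Classifiers 1 to 6 on the slide.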


Ensemble of classifiers: Prediction
Ask each classifier to vote! Final prediction: majority vote (>= 2 classifiers).
Classifier           Food  Service  Ambience  Deals
(Food, Service)      0     1        -         -
(Food, Ambience)     1     -        1         -
(Food, Deals)        1     -        -         0
(Service, Ambience)  -     0        1         -
(Service, Deals)     -     0        -         1
(Ambience, Deals)    -     -        0         1
Majority vote        1     0        1         1
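The voting step can be sketched as below, reproducing the slide's six votes. The function name and the bit-string representation are assumptions; `min_votes=2` matches the slide's ">= 2 classifiers" rule.

```python
CATEGORIES = ("Food", "Service", "Ambience", "Deals")


def majority_vote(subset_predictions, categories=CATEGORIES, min_votes=2):
    """Combine the pairwise classifiers' votes into one bit string.

    subset_predictions maps a category pair, e.g. ("Food", "Service"), to that
    classifier's predicted bits for the pair, e.g. "01". A category is predicted
    when at least `min_votes` classifiers covering it vote 1.
    """
    votes = {category: 0 for category in categories}
    for subset, bits in subset_predictions.items():
        for category, bit in zip(subset, bits):
            votes[category] += int(bit == "1")
    return "".join("1" if votes[c] >= min_votes else "0" for c in categories)


# The six votes from the slide.
predictions = {
    ("Food", "Service"): "01",
    ("Food", "Ambience"): "11",
    ("Food", "Deals"): "10",
    ("Service", "Ambience"): "01",
    ("Service", "Deals"): "01",
    ("Ambience", "Deals"): "01",
}
print(majority_vote(predictions))  # 1011
```

With 4 categories and pairs, each category is covered by exactly 3 of the 6 classifiers, so 2 votes is a strict majority.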

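Evaluation measures
Notation: let $(x, Y)$ be a multi-label example, with $Y \subseteq L$ (the set of all categories); let $h$ be a multi-label classifier; let $Z = h(x)$ be the set of labels predicted by $h$ for $(x, Y)$.
Precision: $\frac{|Y \cap Z|}{|Z|}$
Recall: $\frac{|Y \cap Z|}{|Y|}$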
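A common example-based definition consistent with this notation computes $|Y_i \cap Z_i| / |Z_i|$ (precision) and $|Y_i \cap Z_i| / |Y_i|$ (recall) for each of the $m$ evaluation examples and averages them. The sketch below assumes that averaging, and assumes an empty $Z_i$ or $Y_i$ contributes 0, since the ratio is undefined there; the function name is hypothetical.

```python
def example_based_precision_recall(true_sets, predicted_sets):
    """Average |Y & Z| / |Z| (precision) and |Y & Z| / |Y| (recall) over examples.

    true_sets and predicted_sets are parallel lists of Python sets. Examples with
    an empty Z (or empty Y) contribute 0 to precision (or recall): an assumed
    convention, since the ratio is undefined for empty sets.
    """
    if len(true_sets) != len(predicted_sets) or not true_sets:
        raise ValueError("need the same non-zero number of true and predicted sets")
    precision = recall = 0.0
    for y, z in zip(true_sets, predicted_sets):
        hits = len(y & z)
        precision += hits / len(z) if z else 0.0
        recall += hits / len(y) if y else 0.0
    m = len(true_sets)
    return precision / m, recall / m


true_sets = [{"Food", "Deals"}, {"Service"}]
predicted_sets = [{"Food"}, {"Service", "Ambience"}]
p, r = example_based_precision_recall(true_sets, predicted_sets)
```

Here the first review has precision 1 and recall 0.5, the second precision 0.5 and recall 1, so both averages come out to 0.75.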

Precision & Recall (Train)

Precision & Recall (Test)

Observation 1: The ensemble of subset classifiers gave the best results.

Observation 2: Data skew
We normalized the skew in the training data by selectively adding data.
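The slides do not say how the data was selected, so the following is only a sketch of one plausible approach: random oversampling, which duplicates training reviews of under-represented categories until each category has at least as many positive reviews as the most frequent one had originally. The function name, the `(text, label_set)` data layout and the seed are hypothetical.

```python
import random
from collections import Counter


def oversample_rare_categories(train, categories, seed=0):
    """Duplicate reviews of under-represented categories (assumed technique).

    train is a list of (review_text, label_set) pairs. The target is the
    positive count of the most frequent category in the original data; each
    rarer category gets randomly chosen duplicates until it reaches the target.
    """
    rng = random.Random(seed)
    counts = Counter(cat for _, labels in train for cat in labels)
    target = max(counts[cat] for cat in categories)
    balanced = list(train)  # leave the caller's list untouched
    for cat in categories:
        pool = [example for example in train if cat in example[1]]
        if not pool:
            continue  # no reviews carry this category, so nothing to duplicate
        # Recount on the growing list: earlier duplicates may already carry `cat`.
        current = sum(1 for _, labels in balanced if cat in labels)
        while current < target:
            balanced.append(rng.choice(pool))
            current += 1
    return balanced
```

Because reviews are multi-label, duplicating a review for one category also raises the counts of its other categories, so this reduces rather than removes skew.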

Precision & Recall (with and without category normalization)
