Evaluation Methods and Challenges

2 Evaluation Methods
Ideal method
– Experimental design: run side-by-side experiments on a small fraction of randomly selected traffic, with the new method as treatment and the status quo as control
– Limitation: often expensive, and it is difficult to test a large number of methods
Problem: how do we evaluate methods offline on logged data?
– Goal: maximize clicks/revenue for the entire system, not prediction accuracy
– The cost of predictive inaccuracy varies across instances; e.g., 100% error on a low-CTR article may not matter much if it always co-occurs with a high-CTR article that is predicted accurately

3 Usual Metrics
Predictive accuracy
– Root Mean Squared Error (RMSE)
– Mean Absolute Error (MAE)
– Area under the ROC curve (AUC)
Rank-based measures of retrieval accuracy for the top-k
– Recall in test data: what fraction of the items the user actually liked in the test data were among the top-k recommended by the algorithm (fraction of hits; e.g., Karypis, CIKM 2001)
One flaw in several papers
– Training/test splits are not based on time, causing information leakage; even the Netflix split has this problem to some extent (the time split is per user, not per event)
– For instance, information may leak if models are based on user-user similarity
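
As a concrete illustration of these metrics, here is a minimal sketch in plain numpy; the function names, data layout, and toy numbers are illustrative, not taken from the tutorial.

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Squared Error over rating predictions."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def recall_at_k(recommended, liked, k=10):
    """Fraction of the items the user actually liked that appear among the
    top-k recommendations (fraction of hits)."""
    liked = set(liked)
    if not liked:
        return 0.0
    return len(set(recommended[:k]) & liked) / len(liked)

# Toy usage with made-up ratings and item ids
print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]))              # ~0.645
print(recall_at_k(["a", "b", "c", "d"], ["b", "x"], k=3))  # 0.5
```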

4 Metrics continued
Recall per event, based on the Replay-Match method
– Fraction of clicked events where the top recommended item matches the item that was clicked
– This works well if the logged data was collected under a randomized serving scheme; with biased data it is a problem: we end up inventing algorithms whose recommendations are similar to the current serving scheme, with no reward for novel recommendations
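
A small sketch of this per-event recall, assuming a hypothetical log format in which each clicked event carries its serving-time features and the clicked item, and `recommend(x)` returns the evaluated algorithm's top pick.

```python
def recall_per_event(logged_clicks, recommend):
    """Fraction of clicked events where the algorithm's top recommendation
    matches the item that was actually clicked (Replay-Match style recall)."""
    if not logged_clicks:
        return 0.0
    matches = sum(
        1 for event in logged_clicks
        if recommend(event["features"]) == event["clicked_item"]
    )
    return matches / len(logged_clicks)
```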

5 Details on the Replay-Match method (Li, Langford, et al.)
x: feature vector for a visit
r = [r_1, r_2, …, r_K]: reward vector for the K items in the inventory
h(x): recommendation algorithm to be evaluated
Goal: estimate the expected reward of h(x)
s(x): recommendation scheme that generated the logged data
x_1, …, x_T: visits in the logged data
r_{t,i}: reward for visit t, where i = s(x_t)

6 Replay-Match continued
Estimator: an importance-weighted average of logged rewards over the visits where the evaluated algorithm matches the logged recommendation, i.e. h(x_t) = s(x_t)
– If the importance weights reflect the probabilities of the logging scheme, it can be shown that the estimator is unbiased
– E.g., if s(x) is a random serving scheme, the importance weights are uniform over the item set
– If s(x) is not random, the importance weights have to be estimated through a model
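
A minimal sketch of an estimator in this spirit, assuming a hypothetical log record with fields `x`, `served`, `reward`, and `p_served` (the probability with which s chose the served item, uniform 1/K for a random serving scheme); this is a self-normalized inverse-propensity replay estimate, not necessarily the exact estimator on the slide.

```python
def replay_estimate(logged_visits, h):
    """Importance-weighted replay estimate of the expected reward of policy h."""
    total, weight = 0.0, 0.0
    for v in logged_visits:
        if h(v["x"]) == v["served"]:       # replay: keep only matched visits
            w = 1.0 / v["p_served"]        # importance weight for this visit
            total += w * v["reward"]
            weight += w
    return total / weight if weight > 0 else float("nan")
```

When the logging scheme is uniformly random, all weights are equal and this reduces to the average reward over the matched visits.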

7 Back to Multi-Objective Optimization
The recommender selects editorial content for the front page; clicks on front-page links influence the downstream supply distribution feeding the ad server (premium guaranteed display vs. the cheaper spot market) as well as downstream engagement (time spent)

8 Serving Content on the Front Page: Click Shaping
What do we want to optimize?
– Current approach: maximize clicks (i.e., maximize downstream supply from the front page)
But consider the following
– Article 1: CTR = 5%, utility per click = 5
– Article 2: CTR = 4.9%, utility per click = 10
– By promoting Article 2 we lose 0.1 clicks per 100 visits (5 vs. 4.9) but gain 24 utils (49 vs. 25); see the check below
If we do this over a large number of visits, we lose some clicks but obtain significant gains in utility
– E.g., lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc.)
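
A quick arithmetic check of the example above, per 100 visits (toy numbers from the slide):

```python
visits = 100
ctr1, util1 = 0.05, 5     # Article 1: CTR 5%, utility per click 5
ctr2, util2 = 0.049, 10   # Article 2: CTR 4.9%, utility per click 10

clicks1, clicks2 = ctr1 * visits, ctr2 * visits      # 5.0 vs. 4.9 clicks
utils1, utils2 = clicks1 * util1, clicks2 * util2    # 25.0 vs. 49.0 utils

print(clicks1 - clicks2)   # ~0.1 clicks lost per 100 visits by promoting Article 2
print(utils2 - utils1)     # ~24 utils gained per 100 visits
```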

9 Why call it Click Shaping?
The supply distribution changes from before to after shaping; shaping can happen with respect to any downstream metric (like engagement)

10 Multi-Objective Optimization
n articles A_1, A_2, …, A_n, grouped into K properties (news, finance, omg, …)
m user segments S_1, S_2, …, S_m
p_ij: CTR of user segment i on article j
d_ij: time duration of segment i on article j
p_ij and d_ij are known; the x_ij (the fraction of segment i's visits served article j) are the decision variables

11 Multi-Objective Program
Two formulations: scalarization and goal programming; a scalarized linear-program sketch follows below
– Simplex constraints on the x_ij are always applied
– All constraints are linear
Every 10 minutes, solve for x and use it as the serving scheme for the next 10 minutes
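
A hedged sketch of the scalarized variant as a linear program: maximize a weighted combination of expected clicks and expected time spent, subject to per-segment simplex constraints on the x_ij. The exact objective and constraints of the tutorial's formulation may differ; `alpha`, `v` (segment visit volumes), `p`, and `d` are assumed inputs.

```python
import numpy as np
from scipy.optimize import linprog

def click_shaping_lp(p, d, v, alpha=0.5):
    """p[i, j]: CTR of segment i on article j
    d[i, j]: expected time spent per click
    v[i]   : visit volume of segment i
    alpha  : scalarization weight between clicks and time spent
    Returns x[i, j], the fraction of segment i's visits to serve article j."""
    m, n = p.shape
    clicks = v[:, None] * p               # expected clicks per unit of x_ij
    time_spent = v[:, None] * p * d       # expected time spent per unit of x_ij
    c = -(alpha * clicks + (1 - alpha) * time_spent).ravel()  # linprog minimizes
    # Simplex constraint per segment: sum_j x_ij = 1, with x_ij >= 0
    A_eq = np.zeros((m, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    b_eq = np.ones(m)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
    return res.x.reshape(m, n)

# Toy usage: 2 user segments, 3 articles, made-up numbers
p = np.array([[0.05, 0.03, 0.04], [0.02, 0.06, 0.01]])
d = np.array([[10.0, 40.0, 20.0], [30.0, 15.0, 50.0]])
v = np.array([1000.0, 2000.0])
print(click_shaping_lp(p, d, v, alpha=0.7))
```

In practice such a program would be re-solved every 10 minutes with refreshed p_ij and d_ij estimates, as the slide describes; without additional constraints (e.g., a minimum number of clicks), the solution simply serves each segment its best article under the scalarized objective.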

12 Pareto-optimal solution (more in KDD 2011)

13 Summary
Modern recommendation systems on the web crucially depend on extracting intelligence from massive amounts of data collected on a routine basis
Lots of data and processing power are not enough: the number of things we need to learn grows with the data size
Extracting grouping structures at coarser resolutions, based on similarity (correlations), is important
– ML has a big role to play here
Continuous and adaptive experimentation, done judiciously, is crucial to maximize performance
– Again, ML has a big role to play
Multi-objective optimization is often required, and the objectives are application dependent
– ML has to work in close collaboration with engineering, product, and business execs

Challenges

15 Recall: Some examples
Simple version
– I have an important module on my page; its content inventory comes from a third-party source and is further refined through editorial oversight. Can I algorithmically recommend content on this module? I want to drive up the total CTR on this module.
More advanced
– I got an X% lift in CTR, but I have additional information on other downstream utilities (e.g., dwell time). Can I increase downstream utility without losing too many clicks?
Highly advanced
– There are multiple modules running on my website. How do I take a holistic approach and perform a simultaneous optimization?

16 For the simple version
Multi-position optimization
– Explore/exploit, optimal subset selection
Explore/exploit strategies for a large content pool and high-dimensional problems
– Some work on hierarchical bandits, but more needs to be done
Constructing user profiles from multiple sources with less than full coverage
– A couple of papers at KDD 2011
Content understanding
Metrics to measure user engagement (other than CTR)

17 Other problems
Whole-page optimization
– Incorporating correlations
Incentivizing user-generated content
Incorporating social information for better recommendation
Multi-context learning