Presenting work by various authors, and own work in collaboration with colleagues at Microsoft and the University of Amsterdam.

Presentation transcript:


Example tasks:
- Find the best news articles based on user context; optimize click-through rate.
- Tune ad display parameters (e.g., mainline reserve) to optimize revenue.
- Improve the ranking of query auto-completions (QAC) to optimize suggestion usage.
Typical approach: extensive offline tuning followed by A/B testing.

[Kohavi et al. ’09, ‘12] Example: which search interface results in higher revenue?
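An A/B comparison like this typically reduces to a two-proportion test on the observed rates. A minimal sketch (function name and the example counts are illustrative, not from the talk):

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z-statistic for the difference between two observed rates, e.g.
    purchase rate under interface A vs. interface B. |z| > 1.96 is
    significant at the usual 5% level."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

For example, 100/1000 conversions under A vs. 150/1000 under B gives z ≈ 3.4, a clearly significant difference.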


Addresses the key challenge: how to balance exploration and exploitation – exploring to learn, and exploiting to benefit from what has been learned. This is equivalent to a reinforcement learning problem in which actions do not affect future states.

Example

Both arms are promising, but uncertainty is higher for arm C. Bandit approaches balance exploration and exploitation based on expected payoff and uncertainty.
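The classic UCB1 rule makes this balance concrete: each arm is scored by its estimated payoff plus a bonus that grows with uncertainty, so a less-played arm like C can win a comparison even at equal mean payoff. A minimal sketch (the dict-based bookkeeping and arm names are illustrative):

```python
import math

def ucb1_select(pulls, rewards, t):
    """Pick the arm maximizing mean payoff plus an uncertainty bonus (UCB1).

    pulls[a]   = number of times arm a was played
    rewards[a] = summed payoff of arm a
    t          = current round
    """
    for arm, n in pulls.items():
        if n == 0:
            return arm  # play every arm once before comparing

    def score(arm):
        mean = rewards[arm] / pulls[arm]                   # payoff estimate
        bonus = math.sqrt(2 * math.log(t) / pulls[arm])    # uncertainty bonus
        return mean + bonus

    return max(pulls, key=score)
```

With two arms at the same empirical mean (0.6) but arm C played far less often, the bonus makes UCB1 explore C first.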

[Li et al. ‘12]

Contextual bandits [Li et al. ‘12]. Example results: balancing exploration and exploitation is crucial for good results.

1) Balance exploration and exploitation, to ensure continued learning while applying what has been learned.
2) Explore in a small action space, but learn in a large contextual space.
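The LinUCB algorithm from [Li et al.] is one way to realize this: actions are few, but each is scored through a linear model over a large context space, with a confidence bonus driving exploration. A sketch of the disjoint-model variant (class shape and parameter defaults are my own, not from the slides):

```python
import numpy as np

class LinUCB:
    """Sketch of a LinUCB-style contextual bandit (one linear model per arm).

    Per arm a: A_a = I + sum(x x^T), b_a = sum(r x); pick the arm maximizing
    theta_a^T x + alpha * sqrt(x^T A_a^-1 x), where theta_a = A_a^-1 b_a.
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                       # ridge-regression estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

After enough observations the confidence bonus shrinks and the policy exploits the learned per-arm models.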

Illustrated Sutra of Cause and Effect ("E inga kyo"), artist unknown. Woodblock reproduction published in 1941 by Sinbi-Shoin Co., Tokyo. Licensed under public domain via Wikimedia Commons.

Problem: estimate the effects of mainline reserve changes [Bottou et al. ‘13].

Controlled experiment vs. counterfactual reasoning.

Key idea: estimate what would have happened if a different system (i.e., a different distribution over parameter values) had been used, using importance sampling [Bottou et al. ‘13, Precup et al. ‘00].
Step 1: factorize the joint distribution based on the known causal graph.
Step 2: compute estimates using importance sampling over the factors that change.
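Step 2 can be sketched as a plain importance-sampling estimator: reweight each logged reward by the ratio of the counterfactual to the logging probability of the action taken. The log format and the `target_prob` callback are illustrative assumptions:

```python
def counterfactual_estimate(logs, target_prob):
    """Estimate the expected reward a different parameter distribution would
    have obtained, using only data logged under the deployed distribution.

    logs: iterable of (action, logging_prob, reward) triples, where
    logging_prob is the probability the deployed system gave to that action.
    target_prob(action): probability under the counterfactual distribution.
    """
    weighted = [
        reward * target_prob(action) / p_log   # importance weight ratio
        for action, p_log, reward in logs
    ]
    return sum(weighted) / len(weighted)
```

Setting `target_prob` equal to the logging distribution recovers the plain empirical mean, a useful sanity check.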

[Bottou et al. ‘13] Counterfactual reasoning allows analysis over a continuous range of parameter values.

1) Leverage known causal structure and importance sampling to reason about “alternative realities”.
2) Bound estimator error to distinguish between uncertainty due to low sample size and uncertainty due to insufficient exploration coverage.

Compare two rankings (example: optimize QAC ranking):
1) Generate an interleaved (combined) ranking.
2) Observe user clicks.
3) Credit clicks to the original rankers to infer the outcome.
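One standard instantiation of these steps is team-draft interleaving; a minimal sketch, assuming both rankings contain the same documents (the seed and helper names are illustrative):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=42):
    """Team-draft interleaving: the two rankers take turns picking their
    highest-ranked document not yet in the combined list; a coin flip
    decides who picks first in each round. Returns the interleaved list
    and the team assignment of each position, for click crediting."""
    rng = random.Random(seed)
    interleaved, teams = [], []
    while len(interleaved) < len(ranking_a):
        order = [("a", ranking_a), ("b", ranking_b)]
        rng.shuffle(order)                      # coin flip: who picks first
        for team, ranking in order:
            for doc in ranking:
                if doc not in interleaved:      # best not-yet-chosen doc
                    interleaved.append(doc)
                    teams.append(team)
                    break
    return interleaved, teams

def credit_clicks(teams, clicked_positions):
    """Credit each click to the ranker that contributed that document."""
    wins = {"a": 0, "b": 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins
```

The ranker with more credited clicks, aggregated over many queries, is inferred to be the better one.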

Learning approach [Yue & Joachims ‘09]: dueling bandit gradient descent (DBGD) optimizes a weight vector for weighted-linear combinations of ranking features. Starting from the current best weight vector, it samples the unit sphere to generate a randomly perturbed candidate ranker; relative listwise feedback comparing the two is obtained using interleaving.
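A single DBGD step can be sketched as follows; the `duel` callback stands in for an interleaved online comparison, and the step sizes are illustrative:

```python
import math
import random

def random_unit_vector(dim, rng):
    """Sample uniformly from the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dbgd_update(w, duel, delta=1.0, alpha=0.1, rng=None):
    """One DBGD step: perturb the current best weight vector along a random
    unit direction, compare the two rankers (`duel` returns True when the
    candidate wins the interleaved comparison), and step toward the winner."""
    rng = rng or random.Random()
    u = random_unit_vector(len(w), rng)
    candidate = [wi + delta * ui for wi, ui in zip(w, u)]
    if duel(w, candidate):
        return [wi + alpha * ui for wi, ui in zip(w, u)]
    return w  # candidate lost: keep the current best
```

Iterating this with noisy relative feedback performs gradient-free ascent in the weight space.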

Approach: candidate pre-selection (CPS) [Hofmann et al. ’13c]: generate many candidates and select the most promising one.
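The pre-selection idea can be sketched like this: instead of dueling a single random perturbation, draw many and keep the one that an estimate from historical interaction data rates highest; only that winner is then compared online. The `score_on_history` callback is an assumed stand-in for the estimate computed from logged clicks:

```python
import math
import random

def random_unit_vector(dim, rng):
    """Sample uniformly from the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def preselect_candidate(w, score_on_history, k=32, delta=1.0, rng=None):
    """Candidate pre-selection sketch: draw k perturbed candidate weight
    vectors and return the one scoring highest on historical data."""
    rng = rng or random.Random()
    candidates = []
    for _ in range(k):
        u = random_unit_vector(len(w), rng)
        candidates.append([wi + delta * ui for wi, ui in zip(w, u)])
    return max(candidates, key=score_on_history)
```

Reusing historical comparisons this way makes each online duel count for more, which is where the sample-efficiency gain comes from.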

Results under an informational click model [Hofmann et al. ’13b, ’13c]. From earlier work: learning from relative listwise feedback is robust to noise. Here: adding structure dramatically improves performance further.

1) Avoid the combinatorial action space by exploring in parameter space.
2) Reduce variance using relative feedback.
3) Leverage known structure for sample-efficient learning.

Contextual bandits: a systematic approach to balancing exploration and exploitation; contextual bandits explore in a small action space but optimize in a large context space.
Counterfactual reasoning: leverages causal structure and importance sampling for “what if” analyses.
Online learning to rank: avoids combinatorial explosion by exploring and learning in parameter space; uses known ranking structure for sample-efficient learning.

Applications: assess action and solution spaces in a given application, collect and learn from exploration data, increase experimental agility.
Try this (at home): try the open-source code samples; the Living Labs challenge allows experimentation with online learning and evaluation methods.
Challenge: labs.net/challenge/
Code: /ilps/lerot