Published by Mitchell Gallagher. Modified over 9 years ago.
1
© 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Learning Safe Strategies in Digital Marketing
Mohammad Ghavamzadeh, Adobe Research & INRIA
2
Personalized Ad Recommendation
3
Medical Diagnosis
Agent: healthcare agent
Actions: medical decisions (more tests, more hospitalization)
States: the patient's test results, the patient's record
Cost: cost to the hospital, the patient's unhappiness
What is the optimal diagnosis?
4
Computational Finance
Agent: financial analyst
Actions: buy, sell
States: market indicators, value of the portfolio
Cost: loss
What is the optimal financial strategy?
5
Personalized Ad Recommendation
6
LTV (lifetime value) vs. CTR (click-through rate)
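To make the contrast concrete, here is a minimal sketch, assuming each user is logged as a sequence of visits with a 0/1 click per visit. CTR averages clicks over impressions, while an LTV-style objective sums (discounted) rewards over the user's whole sequence of visits. The function names and the discount factor `gamma` are illustrative assumptions, not the metrics used in the talk.

```python
from typing import List


def ctr(clicks: List[int]) -> float:
    """Click-through rate: fraction of ad impressions that were clicked (0/1 per visit)."""
    return sum(clicks) / len(clicks)


def ltv(rewards: List[float], gamma: float = 0.9) -> float:
    """Discounted long-term value of one user's whole sequence of visits.

    `gamma` is an illustrative discount factor (an assumption, not from the slides).
    """
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical user: no click on the first visit, clicks on the next three.
visits = [0, 1, 1, 1]
print(ctr(visits))             # 0.75
print(ltv(visits, gamma=0.5))  # 0.875
```

A CTR-maximizing policy judges each visit in isolation; the LTV view also credits actions that bring the user back and lead to rewards on later visits.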
7
Problem Formulation
Manager's concern: disaster, bankruptcy, death…
Data → Machine learning algorithm
Baseline performance, confidence level
8
Different Approaches to this Problem
1. Model-free Approach
2. Model-based Approach
3. Online Approach
4. Risk-sensitive Approach
9
Offline Setting
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
10
Model-free Approach: compute directly from the batch of data
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
11
Model-free Approach: compute directly from the batch of data
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
Inputs: historical data, new policy, baseline performance, confidence level → Output: Yes / No
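A minimal sketch of such a Yes / No test, assuming returns are normalized to [0, 1] and the behavior policy's action probabilities were logged. Each trajectory's return is reweighted by the product of probability ratios (new policy over behavior policy), and the test answers Yes only if a high-confidence lower bound on the new policy's mean clears the baseline. The bound here is plain Hoeffding's inequality on truncated estimates, a simpler substitute for whatever tighter bound the cited work uses; all names and the truncation level `clip` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One logged step: (prob. of the logged action under the NEW policy,
#                   prob. of the logged action under the BEHAVIOR policy,
#                   observed reward).
Step = Tuple[float, float, float]


def importance_weighted_return(trajectory: Sequence[Step]) -> float:
    """Unbiased estimate of the new policy's return from one logged trajectory.

    Assumes the behavior policy gives nonzero probability to every action the
    new policy might take, and that returns are normalized to [0, 1].
    """
    weight, ret = 1.0, 0.0
    for p_new, p_behavior, reward in trajectory:
        weight *= p_new / p_behavior
        ret += reward
    return weight * ret


def high_confidence_lower_bound(trajectories: List[Sequence[Step]],
                                delta: float, clip: float) -> float:
    """(1 - delta)-confidence lower bound on the new policy's expected return.

    Estimates are truncated at `clip`; since they are nonnegative, truncation can
    only lower their mean, so Hoeffding's bound on the truncated mean stays valid.
    """
    xs = [min(importance_weighted_return(t), clip) for t in trajectories]
    n = len(xs)
    return sum(xs) / n - clip * math.sqrt(math.log(1.0 / delta) / (2 * n))


def safety_test(trajectories: List[Sequence[Step]], baseline: float,
                delta: float, clip: float) -> bool:
    """Answer Yes (True) only if the lower bound clears the baseline performance."""
    return high_confidence_lower_bound(trajectories, delta, clip) >= baseline


# 100 logged one-step trajectories where both policies agree and the reward is 1.
logs = [[(0.5, 0.5, 1.0)]] * 100
print(safety_test(logs, baseline=0.8, delta=0.05, clip=1.0))  # True (bound is about 0.878)
print(safety_test(logs, baseline=0.9, delta=0.05, clip=1.0))  # False
```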
12
Model-free Approach: compute directly from the batch of data
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
Inputs: historical data, new policy, baseline performance, confidence level → Output: Yes / No
Risk plot
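The slide only names the risk plot. Assuming it shows, for each confidence level, the lower bound on the new policy's performance that the data supports, a sketch of the numbers behind it could look like this (Hoeffding bound on truncated, nonnegative importance-weighted returns; the function name and `clip` are hypothetical):

```python
import math
from typing import Dict, List


def risk_plot(estimates: List[float], clip: float,
              confidences: List[float]) -> Dict[float, float]:
    """Map each confidence level (1 - delta) to a Hoeffding lower bound on the mean.

    `estimates` are nonnegative per-trajectory importance-weighted returns;
    truncating them at `clip` keeps each bound a valid lower bound.
    """
    n = len(estimates)
    mean = sum(min(x, clip) for x in estimates) / n
    return {
        c: mean - clip * math.sqrt(math.log(1.0 / (1.0 - c)) / (2 * n))
        for c in confidences
    }


for level, bound in risk_plot([1.0] * 100, clip=1.0,
                              confidences=[0.5, 0.9, 0.95, 0.99]).items():
    print(f"confidence {level:.2f}: performance >= {bound:.3f}")
```

Reading the curve, a manager can trade the confidence they demand against the performance level they can be promised.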
13
Main Ingredients
Inputs: historical data, new policy, baseline performance, confidence level → Output: Yes / No
Risk plot
Weighted importance return
High-probability lower bound
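Assuming "weighted importance return" refers to importance-weighted return estimates, it helps to contrast ordinary importance sampling with its normalized (weighted) variant. This is a minimal sketch, assuming per-trajectory importance weights and returns have already been computed and the weights are nonnegative with a positive sum; the function names are hypothetical.

```python
from typing import List


def ordinary_is(weights: List[float], returns: List[float]) -> float:
    """Ordinary importance sampling: unbiased, but can leave the range of returns."""
    return sum(w * g for w, g in zip(weights, returns)) / len(weights)


def weighted_is(weights: List[float], returns: List[float]) -> float:
    """Weighted importance sampling: normalizes by the total weight.

    Biased for finite samples but consistent, usually with lower variance, and
    always a convex combination of the observed returns.
    """
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, returns)) / total


w, g = [3.0, 1.0], [1.0, 0.0]
print(ordinary_is(w, g))  # 1.5 (outside the [0, 1] range of returns)
print(weighted_is(w, g))  # 0.75
```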
14
Experimental Results: personalized ad recommendation (real data)
15
Challenges
Finding a tight bound
Knowing the policy(ies) that generated the batch of data (using random data)
Requiring a large number of samples, which makes the approach suitable for digital marketing applications
Sacrificing safety for sample efficiency
Contextual bandit (CTR): line of work from Yahoo Research (Langford, Strehl, Li, Dudik, Kakade, …)
Reinforcement learning (LTV): four papers at AAAI-15, WWW-15, ICML-15, IJCAI-15 (Thomas, Theocharous, MGH)
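For the "using random data" route in the contextual-bandit (CTR) setting, a replay-style estimator in the spirit of that line of work can be sketched as follows, assuming logged actions were chosen uniformly at random and events are i.i.d. Only events where the evaluated policy agrees with the logged action are kept, which also shows why the approach needs many samples: with K actions, roughly 1/K of the log is used. The function name and event layout are hypothetical.

```python
from typing import Callable, List, Tuple

# Logged event: (context, action that was shown, observed reward such as a click).
# Assumption: actions were logged uniformly at random, independently of the context.
Event = Tuple[int, int, float]


def replay_estimate(log: List[Event], policy: Callable[[int], int]) -> float:
    """Average reward over the logged events where `policy` agrees with the log."""
    matched = [reward for context, action, reward in log if policy(context) == action]
    if not matched:
        raise ValueError("policy never matches the logged actions")
    return sum(matched) / len(matched)


log = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0), (1, 1, 1.0)]
print(replay_estimate(log, lambda x: x))  # 1.0
print(replay_estimate(log, lambda x: 0))  # 0.5
```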
16
Offline Setting
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
17
Model-based Approach: build a simulator of the system (with some error), then compute directly from the simulator
Manager's concern: disaster, bankruptcy, death…
Batch of data (generated by a behavior policy)
Baseline performance, confidence level
Main question: Given the simulator and the error in building it, how can we compute a policy that is guaranteed (with a given confidence level) to perform at least as well as a baseline?
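One common way to build a simulator and quantify its error, sketched here under stated assumptions: estimate transition probabilities by counting, and attach to each (state, action) pair an L1 error radius from the Weissman et al. deviation bound, P(‖P − P̂‖₁ ≥ ε) ≤ (2^|S| − 2) exp(−nε²/2). The guarantee holds per (s, a) with probability 1 − delta; a joint guarantee over all pairs would need delta split across them (union bound). The function name and data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

StateAction = Tuple[int, int]


def build_simulator(
    transitions: List[Tuple[int, int, int]], n_states: int, delta: float
) -> Tuple[Dict[StateAction, List[float]], Dict[StateAction, float]]:
    """Maximum-likelihood transition model plus an L1 error radius per (s, a).

    For each visited (s, a), ||P_true(.|s,a) - P_hat(.|s,a)||_1 <= radius
    with probability at least 1 - delta. Requires n_states >= 2.
    """
    counts: Dict[StateAction, Counter] = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    model: Dict[StateAction, List[float]] = {}
    radius: Dict[StateAction, float] = {}
    for sa, c in counts.items():
        n = sum(c.values())
        model[sa] = [c[t] / n for t in range(n_states)]
        radius[sa] = math.sqrt(2.0 / n * math.log((2 ** n_states - 2) / delta))
    return model, radius


data = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 1)]
model, radius = build_simulator(data, n_states=2, delta=0.05)
print(model[(0, 0)])             # [0.25, 0.75]
print(round(radius[(0, 0)], 3))  # 1.358
```

With only four samples the radius is large, which illustrates why this route suits domains where data is scarce but prior knowledge can shrink the feasible set.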
18
Problem Formulation
True model of the system
Simulator
Set of feasible models
Baseline performance
Baseline policy
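The slide's formulas did not survive extraction. One plausible way to write these objects down, with the symbols, the expected-return notation ρ(π, P), and the L1-ball shape of the feasible set all chosen here as assumptions, is:

```latex
\begin{aligned}
P^\star &: \text{true model of the system} \\
\hat{P} &: \text{simulator estimated from the batch of data} \\
\Xi &= \{\, P : \lVert P(\cdot \mid s,a) - \hat{P}(\cdot \mid s,a) \rVert_1 \le e(s,a)\ \ \forall s,a \,\},
  \qquad \Pr[P^\star \in \Xi] \ge 1-\delta \\
\pi_B &: \text{baseline policy}, \qquad \rho(\pi_B, P^\star) : \text{baseline performance} \\
\text{Goal: } & \text{find } \pi \text{ such that } \Pr\big[\rho(\pi, P^\star) \ge \rho(\pi_B, P^\star)\big] \ge 1-\delta
\end{aligned}
```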
19
Standard Optimization (ideal scenario)
If then
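Writing ρ(π, P) for the expected return of policy π in model P, P⋆ for the true model, P̂ for the simulator and π_B for the baseline policy (notation assumed here), a plausible reading of the ideal scenario is: if the simulator were exact, optimizing in it would automatically be safe, because the optimal policy of the true model is at least as good as any baseline.

```latex
\hat{P} = P^\star
\;\Longrightarrow\;
\pi^\star \in \arg\max_{\pi} \rho(\pi, \hat{P})
\ \text{ satisfies } \
\rho(\pi^\star, P^\star) \ge \rho(\pi_B, P^\star).
```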
20
Robust Optimization
Solvable using robust value iteration; the solution is conservative
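Robust optimization picks the policy with the best worst-case return over the feasible models, roughly π_R ∈ argmax_π min_{ξ ∈ Ξ} ρ(π, ξ) in assumed notation. A minimal sketch of robust value iteration, assuming the feasible set is a finite list of candidate transition models combined (s, a)-rectangularly (the adversary picks the worst candidate independently at every state-action pair); the function name and toy models are hypothetical:

```python
from typing import List, Tuple

# Transition model: P[s][a][s_next] = probability; rewards: R[s][a].
Model = List[List[List[float]]]


def robust_value_iteration(
    models: List[Model], rewards: List[List[float]], gamma: float,
    iters: int = 1000, tol: float = 1e-10,
) -> Tuple[List[float], List[int]]:
    """Robust value iteration over a finite, (s, a)-rectangular set of models."""
    n_states, n_actions = len(rewards), len(rewards[0])
    v = [0.0] * n_states
    for _ in range(iters):
        # Worst-case one-step lookahead value for every (s, a).
        q = [
            [
                rewards[s][a] + gamma * min(
                    sum(P[s][a][t] * v[t] for t in range(n_states)) for P in models
                )
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        new_v = [max(row) for row in q]
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return v, policy


# Toy example: action 1 in state 0 pays more, but model B says it leads to an
# absorbing zero-reward state 1.
A = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
B = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.5, 1.0], [0.0, 0.0]]
print(robust_value_iteration([A], R, 0.9)[1])     # nominal (model A only): [1, 0]
print(robust_value_iteration([A, B], R, 0.9)[1])  # robust: [0, 0]
```

The toy run shows the conservatism: once model B is admitted, the robust policy gives up the higher-paying action even if model A is the true one.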
21
Robust Optimization with a Baseline
Computationally more expensive (NP-hard), but the solution is less conservative
No polynomial-time solution method is known
Good approximations exist (still with a less conservative solution than robust optimization)
In preparation; a preliminary version appeared at a NIPS-2014 workshop (joint work with Yinlam Chow & Marek Petrik)
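One plausible form of the baseline-aware objective, in assumed notation, is max_π min_{ξ ∈ Ξ} [ρ(π, ξ) − ρ(π_B, ξ)]: the candidate and the baseline are compared under the same model ξ. The brute-force sketch below only illustrates that objective on toy sizes (finite model set, deterministic policies, exhaustive enumeration, which is exponential in the number of states); it is not the approximation method referred to on the slide, and all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Model = List[List[List[float]]]  # P[s][a][s_next]


def evaluate(policy: Sequence[int], model: Model, rewards: List[List[float]],
             gamma: float, start: int = 0, iters: int = 2000) -> float:
    """Expected discounted return of a deterministic policy from `start`."""
    n = len(rewards)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            rewards[s][policy[s]]
            + gamma * sum(model[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v[start]


def best_baseline_improvement(models: List[Model], rewards: List[List[float]],
                              gamma: float, baseline: List[int]) -> Tuple[List[int], float]:
    """Max over deterministic policies of the worst-case gain over the baseline.

    The SAME model is used for the candidate and the baseline inside the min.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    best, best_gain = list(baseline), 0.0  # the baseline itself guarantees gain 0
    for cand in product(range(n_actions), repeat=n_states):
        gain = min(
            evaluate(cand, m, rewards, gamma) - evaluate(baseline, m, rewards, gamma)
            for m in models
        )
        if gain > best_gain:
            best, best_gain = list(cand), gain
    return best, best_gain


A = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
B = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.5, 1.0], [0.0, 0.0]]
print(best_baseline_improvement([A, B], R, 0.9, baseline=[1, 0]))  # ([1, 0], 0.0)
```

In this toy case, plain robust optimization would switch state 0 to action 0 (worst-case return 5 instead of 1), even though under model A that policy earns 5 while the baseline earns 10; the baseline-aware objective keeps the baseline because no candidate beats it under every model.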
22
Challenges
Solving the optimization problem
Solving the optimization problem efficiently (good approximate solutions)
Scaling up the optimization problem
Building a good simulator of the system: difficult in digital marketing; more suitable in robotics and domains with more prior knowledge; more suitable for domains in which we cannot generate a large amount of data
Constructing error bounds for the learned simulator
23
Online Approach
What is the loss of handling the traffic with our policy instead of the company's policy?
Manager: data, traffic; our policy vs. the company's policy
Fall back to the company's policy if the loss becomes too large
We have some preliminary results in the bandit setting.
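A minimal sketch of one possible online safeguard, in the spirit of conservative bandit rules: the new policy may handle a round only if, even pessimistically, the cumulative reward stays above a (1 − alpha) fraction of what the company's policy would have earned. It assumes the company's policy has a known mean reward per round and that a lower confidence bound on the new policy's reward is available; the function name and the budget `alpha` are hypothetical.

```python
def play_new_policy(reward_so_far: float, new_policy_lcb: float,
                    baseline_mean: float, t: int, alpha: float) -> bool:
    """Decide whether round t (1-indexed) may be handled by our new policy.

    Assumptions (hypothetical): the company's policy earns a known mean reward
    `baseline_mean` per round, `new_policy_lcb` is a lower confidence bound on our
    policy's reward this round, and we may lose at most a fraction `alpha` of the
    reward the company's policy would have collected over rounds 1..t.
    """
    pessimistic_total = reward_so_far + new_policy_lcb
    required = (1.0 - alpha) * baseline_mean * t
    return pessimistic_total >= required


print(play_new_policy(9.0, 0.2, baseline_mean=1.0, t=10, alpha=0.1))  # True
print(play_new_policy(8.0, 0.2, baseline_mean=1.0, t=10, alpha=0.1))  # False
```

When the check fails, the round goes to the company's policy, which rebuilds the reward budget before the new policy is tried again.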
24
Risk-Sensitive Approach
(Figure: a policy generates trajectories 1 to 4, each with its own return)
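The figure's point is that one policy induces a distribution of returns, not a single number. A minimal sketch, assuming trajectories are given as lists of per-step rewards and an illustrative discount `gamma`: the risk-neutral criterion keeps only the mean, while criteria such as mean-variance also look at the spread. The function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Return of one trajectory: discounted sum of its per-step rewards."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def return_stats(trajectories: List[List[float]], gamma: float = 1.0) -> Tuple[float, float]:
    """Mean and (population) variance of the return across trajectories of one policy."""
    returns = [discounted_return(traj, gamma) for traj in trajectories]
    return mean(returns), pvariance(returns)


# Four hypothetical trajectories of the same policy (per-step rewards).
print(return_stats([[1, 1], [0, 0], [2, 0], [1, 0]]))  # (1.25, 0.6875)
```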
25
Risk-Sensitive Approach
(Figure: return distribution; the 0.1 tail and CVaR_0.1 are marked)
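A minimal sketch of empirical VaR and CVaR at level alpha for returns, assuming the risky side is the lower tail (bad outcomes are low returns) and using the convention that the tail contains the worst ceil(alpha · n) samples. The function name and data are hypothetical.

```python
import math
from typing import List, Tuple


def var_cvar(returns: List[float], alpha: float) -> Tuple[float, float]:
    """Empirical VaR and CVaR of the lower (bad) tail of the return distribution.

    VaR_alpha is the alpha-quantile of the returns; CVaR_alpha is the average
    return over the worst ceil(alpha * n) samples.
    """
    xs = sorted(returns)
    # Tiny epsilon guards against float noise in alpha * n
    # (e.g. 0.1 * 3 evaluates to 0.30000000000000004).
    k = max(1, math.ceil(alpha * len(xs) - 1e-12))
    tail = xs[:k]
    return xs[k - 1], sum(tail) / k


returns = [float(r) for r in range(1, 11)]  # hypothetical returns 1.0 .. 10.0
print(var_cvar(returns, 0.1))  # (1.0, 1.0)
print(var_cvar(returns, 0.3))  # (3.0, 2.0)
```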
26
Challenges
Constructing conceptually meaningful and computationally tractable risk-sensitive criteria
The optimal solution of a risk-sensitive criterion is often not a stationary Markovian policy
Finding the best solution among stationary Markovian policies is difficult
Policy gradient optimization for a wide range of risk-sensitive criteria
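As an example of policy gradient for a risk-sensitive criterion, here is a sketch of a sampling-based gradient estimate for the lower-tail CVaR of the return, of the likelihood-ratio form known from the CVaR policy-gradient literature. It assumes each sampled trajectory comes with its return and its score ∇θ log Pθ(trajectory); it is only an illustration, not the algorithm behind the results reported in this talk, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def cvar_gradient(returns: Sequence[float], scores: Sequence[Sequence[float]],
                  alpha: float) -> List[float]:
    """Likelihood-ratio estimate of the gradient of the lower-tail CVaR of the return.

    returns[i] is the return of sampled trajectory i and scores[i] its score
    grad_theta log P_theta(trajectory i). Estimator:
        (1 / (alpha N)) * sum_i (R_i - VaR) * score_i * 1[R_i <= VaR]
    """
    n = len(returns)
    k = max(1, math.ceil(alpha * n - 1e-12))
    var = sorted(returns)[k - 1]  # empirical VaR_alpha (lower alpha-quantile)
    grad = [0.0] * len(scores[0])
    for r, score in zip(returns, scores):
        if r <= var:  # only tail trajectories contribute
            for j, s_j in enumerate(score):
                grad[j] += (r - var) * s_j
    return [g / (alpha * n) for g in grad]


# Hypothetical sample: raising theta makes the two worst trajectories more likely.
print(cvar_gradient([0.0, 1.0, 2.0, 3.0], [[1.0], [1.0], [-1.0], [-1.0]], alpha=0.5))  # [-0.5]
```

The negative gradient says what intuition suggests: pushing probability toward the worst trajectories lowers the CVaR.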
27
Experimental Results: Traffic Signal Control (mean-variance optimization)
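For mean-variance optimization, one basic building block is a likelihood-ratio gradient of J = E[R] − λ·Var[R]. The sketch below uses simple plug-in sample averages (so it is biased for finite samples) and assumes each trajectory's return and score ∇θ log Pθ(trajectory) are given; it is not the algorithm behind the traffic-signal results, and the function name is hypothetical.

```python
from typing import List, Sequence


def mean_variance_gradient(returns: Sequence[float], scores: Sequence[Sequence[float]],
                           lam: float) -> List[float]:
    """Plug-in likelihood-ratio gradient of J = E[R] - lam * Var[R].

    Uses grad E[R] = E[R * score], grad E[R^2] = E[R^2 * score] and
    grad Var[R] = grad E[R^2] - 2 E[R] grad E[R], with sample averages.
    """
    n = len(returns)
    mean_r = sum(returns) / n
    dim = len(scores[0])
    g_mean = [sum(r * s[j] for r, s in zip(returns, scores)) / n for j in range(dim)]
    g_sq = [sum(r * r * s[j] for r, s in zip(returns, scores)) / n for j in range(dim)]
    return [gm - lam * (gs - 2.0 * mean_r * gm) for gm, gs in zip(g_mean, g_sq)]


print(mean_variance_gradient([1.0, 3.0], [[1.0], [0.0]], lam=1.0))  # [2.0]
```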
28
Experimental Results: American Option Pricing (mean-CVaR optimization)
(Figure panels: distribution, histogram)
29
Experimental Results: American Option Pricing (mean-CVaR optimization)
(Figure panel: tail of the distribution)
Papers at NIPS-2013, NIPS-2014, NIPS-2015 (joint work with Prashanth L.A., Yinlam Chow, Aviv Tamar, Shie Mannor)