© 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. Learning Safe Strategies in Digital Marketing Mohammad Ghavamzadeh Adobe research.


1 Learning Safe Strategies in Digital Marketing. Mohammad Ghavamzadeh, Adobe Research & INRIA

2 Personalized Ad Recommendation

3 Medical Diagnosis. HEALTHCARE AGENT. ACTION: medical decisions (more tests, more hospitalization). STATES: patient’s test results, patient’s record. COST: cost for hospital, patient’s unhappiness. What is the optimal diagnosis?

4 Computational Finance. FINANCIAL ANALYST AGENT. ACTION: buy, sell. STATES: market indicators, value of the portfolio. COST: loss. What is the optimal financial strategy?

5 Personalized Ad Recommendation

6 LTV vs. CTR

7 Problem Formulation. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Data; Machine Learning Algorithm; baseline performance; confidence level.

8 Different Approaches to this Problem
1. Model-free Approach
2. Model-based Approach
3. Online Approach
4. Risk-sensitive Approach

9 Offline Setting. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy; ? ?

10 Model-free Approach: Compute Directly from the Batch of Data. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy.

11 Model-free Approach: Compute Directly from the Batch of Data. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy; Yes / No; historical data; new policy; baseline performance; confidence level.

12 Model-free Approach: Compute Directly from the Batch of Data. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy; Yes / No; historical data; new policy; baseline performance; confidence level; Risk Plot.

13 Main Ingredients. Diagram labels: Yes / No; historical data; new policy; baseline performance; confidence level; Risk Plot; Weighted Importance Return; High Probability Lower-Bound.
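The two ingredients named on slide 13 can be sketched as follows. This is a minimal Python illustration, not the method from the cited papers: it assumes ordinary per-trajectory importance sampling for the weighted return and a Hoeffding bound for the high-probability lower bound (the slides do not say which concentration inequality is used). All function names, the trajectory format, and the `upper` bound on the weighted returns are hypothetical.

```python
import math


def importance_weighted_return(trajectory, new_policy, behavior_policy, gamma=1.0):
    """Return of one logged trajectory, reweighted toward the new policy.

    trajectory: list of (state, action, reward) tuples (assumed format).
    new_policy, behavior_policy: callables (state, action) -> probability.
    """
    weight, ret, discount = 1.0, 0.0, 1.0
    for state, action, reward in trajectory:
        # Likelihood ratio of the logged action under the two policies.
        weight *= new_policy(state, action) / behavior_policy(state, action)
        ret += discount * reward
        discount *= gamma
    return weight * ret


def hoeffding_lower_bound(samples, upper, delta):
    """Lower bound on the mean that holds with probability >= 1 - delta.

    Assumes i.i.d. samples lying in [0, upper] (Hoeffding's inequality).
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def passes_safety_test(trajectories, new_policy, behavior_policy,
                       baseline, delta, upper):
    """Yes/No answer: is the new policy above the baseline with confidence 1 - delta?"""
    samples = [importance_weighted_return(t, new_policy, behavior_policy)
               for t in trajectories]
    return hoeffding_lower_bound(samples, upper, delta) >= baseline
```

The inputs mirror the diagram: historical data, a new policy, a baseline performance, and a confidence level go in, and a Yes / No answer comes out. A Hoeffding bound is typically loose when importance weights have a large range, which is consistent with the later remark that finding a tight bound is a challenge.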

14 Experimental Results: Personalized ad recommendation (real data)

15 Challenges
- finding a tight bound
- knowing the policy(ies) that generated the batch of data (using random data)
- requires a large number of samples
- makes it suitable for digital marketing applications
- sacrifice safety for sample efficiency
- Contextual Bandit (CTR)
- line of work from Yahoo research (Langford, Strehl, Li, Dudik, Kakade, …)
- Reinforcement Learning (LTV)
- four papers at AAAI-15, WWW-15, ICML-15, IJCAI-15 (Thomas, Theocharous, MGH)

16 Offline Setting. Diagram labels: Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy; ? ?

17 Model-based Approach. Diagram labels: Build a Simulator of the System; Manager; Disaster, Bankruptcy, Death…; Batch of Data; baseline performance; confidence level; behavior policy; Compute Directly from the Simulator; error. Main question: given the simulator and the error in building it, how can we compute a policy that is guaranteed (with a given confidence level) to perform at least as well as a baseline?

18 Problem Formulation
- True model of the system
- Simulator
- Set of feasible models
- Baseline performance
- Baseline policy

19 Standard Optimization (ideal scenario). If [formula missing] then [formula missing]

20 Robust Optimization. Figure label: return.
- solvable using robust value iteration
- solution is conservative
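As a rough illustration of robust value iteration, the sketch below assumes the set of feasible models is a finite list of transition tables and that the worst case is chosen independently for each state-action pair (a rectangularity assumption). The slides' actual uncertainty set and formulas are not preserved, so the function names, argument layout, and defaults here are all hypothetical.

```python
def worst_case_q(models, rewards, values, gamma):
    """Q-values under the least favourable model, chosen per (state, action).

    models: list of tables, models[m][s][a] is a list of next-state probabilities.
    rewards: rewards[s][a] is the immediate reward.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Smallest expected next-state value over all feasible models.
            worst = min(
                sum(p * v for p, v in zip(model[s][a], values))
                for model in models
            )
            q[s][a] = rewards[s][a] + gamma * worst
    return q


def robust_value_iteration(models, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate the worst-case Bellman backup until the values stop changing."""
    values = [0.0] * len(rewards)
    for _ in range(max_iter):
        q = worst_case_q(models, rewards, values, gamma)
        new_values = [max(row) for row in q]
        gap = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    # Greedy policy with respect to the final worst-case Q-values.
    q = worst_case_q(models, rewards, values, gamma)
    policy = [row.index(max(row)) for row in q]
    return values, policy
```

Because every backup assumes the least favourable model, the resulting values are pessimistic for any single model in the set, which is one way to read the slide's remark that the solution is conservative.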

21 Robust Optimization with a Baseline
- computationally more expensive (NP-hard), but less conservative solution
- non-polynomial solution exists
- good approximations exist (still with less conservative solution than robust)
In preparation; a preliminary version appeared at a NIPS-2014 workshop (joint work with Yinlam Chow & Marek Petrik).

22 Challenges
- Solving the optimization problem
- Solving the optimization problem efficiently (good approximation solutions)
- Scaling up the optimization problem
- Building a good simulator of the system
- difficult in digital marketing
- more suitable in robotics & domains with more prior knowledge
- more suitable for the domains in which we cannot generate a large amount of data
- Constructing error bounds for the learned simulator

23 Online Approach. Loss of handling [symbol missing] of the traffic by [symbol missing] instead of [symbol missing]? Diagram labels: Manager; data; traffic; our policy; company’s policy; if loss [condition missing]. We have some preliminary results in the bandit setting.

24 Risk-Sensitive Approach. Figure labels: Policy; Trajectory 1; Trajectory 2; Trajectory 3; Trajectory 4; Return.

25 Risk-Sensitive Approach. Figure labels: 0.1; CVaR 0.1.
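For readers unfamiliar with CVaR, here is a small empirical sketch. It assumes that the 0.1 on the slide is the tail level and that risk is measured on the lower tail of the return distribution, so CVaR at level 0.1 is the average of the worst 10% of sampled returns; the slide's own definition is not preserved. The function name and sample-based estimator are illustrative choices.

```python
import math


def empirical_cvar(returns, alpha=0.1):
    """Average of the worst alpha-fraction of sampled returns (lower tail).

    Uses at least one sample so that very small batches still give a value.
    """
    ordered = sorted(returns)
    # Number of samples in the tail; the small offset guards against float fuzz.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    tail = ordered[:k]
    return sum(tail) / k
```

Unlike the mean return, this quantity ignores everything except the worst outcomes, which is why it is a natural criterion when a few bad trajectories matter more than average performance.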

26 Challenges
- Construct conceptually meaningful and computationally tractable risk-sensitive criteria
- Optimal solution of risk-sensitive criteria is often not a stationary Markovian policy
- Finding the best solution among the stationary Markovian policies is difficult
- Policy gradient optimization for a wide range of risk-sensitive criteria

27 Experimental Results: Traffic Signal Control (mean-var optimization)

28 Experimental Results: American Option Pricing (mean-CVaR optimization). Figure captions (truncated): Distribution of …; Histogram of …

29 Experimental Results: American Option Pricing (mean-CVaR optimization). Figure caption (truncated): Tail of … Papers at NIPS-2013, NIPS-2014, NIPS-2015 (joint work with Prashanth L.A., Yinlam Chow, Aviv Tamar, Shie Mannor).
