Learning Safe Strategies in Digital Marketing. Mohammad Ghavamzadeh, Adobe Research.

Presentation transcript:

Slide 1: Learning Safe Strategies in Digital Marketing. Mohammad Ghavamzadeh, Adobe Research & INRIA.

Slide 2: Personalized Ad Recommendation.

Slide 3: Medical Diagnosis. The healthcare agent takes actions (medical decisions: more tests, more hospitalization) based on states (the patient's test results and record), incurring costs (cost for the hospital, the patient's unhappiness). What is the optimal diagnosis policy?

Slide 4: Computational Finance. The financial-analyst agent takes actions (buy, sell) based on states (market indicators, the value of the portfolio), incurring a cost (loss). What is the optimal financial strategy?

Slide 5: Personalized Ad Recommendation.

Slide 6: LTV vs. CTR (optimizing customer life-time value vs. click-through rate).

Slide 7: Problem Formulation. [Diagram: a manager hands data, a baseline performance, and a confidence level to a machine-learning algorithm; deploying a bad policy risks disaster, bankruptcy, death…]

Slide 8: Different Approaches to this Problem:
1. Model-free approach
2. Model-based approach
3. Online approach
4. Risk-sensitive approach

Slide 9: Offline Setting. [Diagram: the manager hands a batch of data (generated by a behavior policy), a baseline performance, and a confidence level to the algorithm; the two question marks mark what must be produced in return.]

Slide 10: Model-free Approach: compute the new policy directly from the batch of data. [Same diagram as Slide 9.]

Slide 11: Model-free Approach (continued). The algorithm's interface: inputs are the historical data, a new policy, the baseline performance, and the confidence level; the output is a Yes/No answer (is the new policy safe to deploy?).

Slide 12: Model-free Approach (continued). The same interface, with one additional output: a risk plot.

Slide 13: Main Ingredients: (i) a weighted importance sampling estimate of the new policy's return, and (ii) a high-probability lower bound on that return.
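A minimal sketch of how these two ingredients combine into the Yes/No safety test of Slides 11-12. The function names, the clipping scheme, and the Hoeffding-style bound are illustrative assumptions; the papers cited on Slide 15 use tighter concentration inequalities.

```python
import numpy as np

def wis_estimate(returns, weights):
    """Weighted importance sampling (WIS) estimate of the new policy's
    expected return. weights[i] is the product over the steps of
    trajectory i of pi_new(a|s) / pi_behavior(a|s)."""
    return np.dot(weights, returns) / np.sum(weights)

def high_prob_lower_bound(returns, weights, delta, r_max):
    """Hoeffding-style lower bound that holds with probability 1 - delta.
    Uses ordinary importance sampling with the weighted returns clipped to
    [0, r_max] so each term is bounded (assumes non-negative returns;
    clipping from above biases the estimate downward, which only makes
    the bound more conservative)."""
    x = np.clip(np.asarray(weights) * np.asarray(returns), 0.0, r_max)
    n = len(x)
    return np.mean(x) - r_max * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

def is_safe(returns, weights, baseline, delta, r_max):
    """Yes/No safety test: approve the new policy only if its
    high-probability lower bound beats the baseline performance."""
    return high_prob_lower_bound(returns, weights, delta, r_max) >= baseline
```

The test approves the new policy only when even the pessimistic estimate beats the baseline, which is what makes the approach safe rather than merely accurate on average.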

Slide 14: Experimental Results: personalized ad recommendation (real data).

Slide 15: Challenges.
- Finding a tight bound.
- Knowing the policy (or policies) that generated the batch of data (using random data).
- Requires a large number of samples, which makes the approach suitable for digital marketing applications (where data is plentiful).
- Sacrifices safety for sample efficiency.
Related work:
- Contextual bandits (CTR): a line of work from Yahoo Research (Langford, Strehl, Li, Dudik, Kakade, …).
- Reinforcement learning (LTV): four papers at AAAI-15, WWW-15, ICML-15, and IJCAI-15 (Thomas, Theocharous, Ghavamzadeh).

Slide 16: Offline Setting (same setting and diagram as Slide 9).

Slide 17: Model-based Approach: build a simulator of the system from the batch of data, then compute the new policy directly from the simulator, accounting for the error made in building the simulator. Main question: given the simulator and the error in building it, how do we compute a policy that is guaranteed (with a given confidence level) to perform at least as well as a baseline?

Slide 18: Problem Formulation: the true model of the system, the simulator, the set of feasible models, the baseline performance, and the baseline policy.
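The slide's mathematical symbols were rendered as images and are lost; a plausible reconstruction, in the style of the robust-MDP literature this part of the talk draws on (all symbols below are assumptions, not the slide's originals):

```latex
\begin{align*}
M^{\ast} &: \ \text{true model of the system} \\
\widehat{M} &: \ \text{simulator, estimated from the batch of data} \\
\Xi &: \ \text{set of feasible models, built so that } M^{\ast} \in \Xi \text{ with high probability} \\
\rho(\pi, M) &: \ \text{performance (expected return) of policy } \pi \text{ in model } M \\
\pi_B, \ \rho_B &: \ \text{baseline policy and its performance, } \rho_B = \rho(\pi_B, M^{\ast})
\end{align*}
```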

Slide 19: Standard Optimization (the ideal scenario). If the simulator is exact, then the policy computed on the simulator is guaranteed to perform at least as well as the baseline on the true system.
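In the notation above, a hedged reconstruction of the slide's lost "if … then …" formula:

```latex
\hat{\pi} \in \arg\max_{\pi} \rho(\pi, \widehat{M});
\qquad \text{if } \widehat{M} = M^{\ast} \ \text{then} \ \rho(\hat{\pi}, M^{\ast}) \ \ge\ \rho(\pi_B, M^{\ast}).
```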

Slide 20: Robust Optimization: maximize the worst-case return over the set of feasible models.
- Solvable using robust value iteration.
- The solution is conservative.
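A hedged reconstruction of the objective (this is the standard robust-MDP formulation; the slide's own formula is lost):

```latex
\pi_R \ \in\ \arg\max_{\pi} \ \min_{M \in \Xi} \ \rho(\pi, M)
```

Because the inner minimization lets an adversary pick the worst feasible model, the resulting policy protects against model error but can be far too pessimistic, which is the conservativeness noted on the slide.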

Slide 21: Robust Optimization with a Baseline.
- Computationally more expensive (NP-hard), but yields a less conservative solution.
- An exact (non-polynomial-time) solution exists.
- Good approximations exist (still with a less conservative solution than plain robust optimization).
Paper in preparation; a preliminary version appeared at a NIPS-2014 workshop (joint work with Yinlam Chow & Marek Petrik).
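A hedged reconstruction of the objective, matching the robust baseline-regret formulation from this line of work (the slide's formula is lost):

```latex
\pi \ \in\ \arg\max_{\pi} \ \min_{M \in \Xi} \ \big( \rho(\pi, M) \ -\ \rho(\pi_B, M) \big)
```

If the optimal value is non-negative and the true model lies in the feasible set, the returned policy performs at least as well as the baseline on the true system, while the comparison against the baseline inside the minimization keeps the adversary from forcing the blanket pessimism of plain robust optimization.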

Slide 22: Challenges.
- Solving the optimization problem.
- Solving the optimization problem efficiently (good approximate solutions).
- Scaling up the optimization problem.
- Building a good simulator of the system: difficult in digital marketing; more suitable in robotics and other domains with more prior knowledge, and in domains where we cannot generate a large amount of data.
- Constructing error bounds for the learned simulator.

Slide 23: Online Approach. [Diagram: the manager routes the data traffic between our policy and the company's policy, falling back to the company's policy if there is a loss.] The question: what is the loss of handling the traffic with our policy instead of the company's policy? We have some preliminary results in the bandit setting.

Slide 24: Risk-Sensitive Approach. [Figure: a single policy generates several trajectories (Trajectories 1-4), each with a different return; the return of a policy is a random variable, not a single number.]

Slide 25: Risk-Sensitive Approach. [Figure: the return distribution with its left tail highlighted; CVaR at level 0.1 is the expected return over the worst 10% of outcomes.]
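A minimal numerical illustration of CVaR at level alpha = 0.1 (a hedged sketch; the function and the Gaussian example are mine, not the talk's):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR at level alpha: the average of the worst
    alpha-fraction of outcomes (here, the lowest returns)."""
    returns = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# For a Gaussian return, CVaR_0.1 is roughly mean - 1.755 * std.
rng = np.random.default_rng(0)
print(cvar(rng.normal(loc=1.0, scale=2.0, size=10_000), alpha=0.1))
```

Optimizing CVaR instead of the mean pushes probability mass out of the left tail, trading some expected return for protection against the worst trajectories.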

Slide 26: Challenges.
- Constructing risk-sensitive criteria that are both conceptually meaningful and computationally tractable.
- The optimal solution of a risk-sensitive criterion is often not a stationary Markovian policy.
- Finding the best solution among the stationary Markovian policies is difficult.
- Policy-gradient optimization for a wide range of risk-sensitive criteria (see the objectives sketched below).
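For concreteness, the two criteria used in the experiments that follow can be written as below. This is a hedged reconstruction with generic trade-off and tail parameters lambda, alpha, beta; the slides' own formulas are not in the transcript. R^theta denotes the random return of the policy with parameters theta.

```latex
\max_{\theta} \ \mathbb{E}\big[R^{\theta}\big] \ -\ \lambda \, \mathrm{Var}\big[R^{\theta}\big]
\qquad \text{(mean-variance, Slide 27)}
```

```latex
\max_{\theta} \ \mathbb{E}\big[R^{\theta}\big]
\quad \text{s.t.} \quad \mathrm{CVaR}_{\alpha}\big[R^{\theta}\big] \ \ge\ \beta
\qquad \text{(mean-CVaR, Slides 28-29)}
```

Both kinds of objectives are optimized with stochastic policy-gradient methods, per the last bullet of Slide 26.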

Slide 27: Experimental Results: traffic signal control (mean-variance optimization).

Slide 28: Experimental Results: American option pricing (mean-CVaR optimization). [Figures: a distribution and a histogram; the quantity labels did not survive transcription.]

Slide 29: Experimental Results: American option pricing (mean-CVaR optimization), continued. [Figure: tail of the distribution.] Papers at NIPS-2013, NIPS-2014, and NIPS-2015 (joint work with Prashanth L.A., Yinlam Chow, Aviv Tamar, and Shie Mannor).
