Learning to Trade via Direct Reinforcement


1 Learning to Trade via Direct Reinforcement
John Moody
International Computer Science Institute, Berkeley & J E Moody & Company LLC, Portland
Global Derivatives Trading & Risk Management, Paris, May 2008

2 What is Reinforcement Learning?
RL considers: a goal-directed “learning” agent, interacting with an uncertain environment, that attempts to maximize reward / utility.
RL is an active paradigm: the agent “learns” by “trial & error” discovery; actions result in reinforcement.
RL paradigms:
- Value Function Learning (Dynamic Programming)
- Direct Reinforcement (Adaptive Control)

3 I. Why Direct Reinforcement?
Direct Reinforcement Learning:
- Finds predictive structure in financial data
- Integrates forecasting with decision making
- Balances risk vs. reward
- Incorporates transaction costs
Discover trading strategies!

4 Optimizing Trades based on Forecasts
Indirect approach:
- Two sets of parameters
- Forecast error is not utility
- Forecaster ignores transaction costs
- Information bottleneck

5 Learning to Trade via Direct Reinforcement
Trader properties:
- One set of parameters
- A single utility function U that includes transaction costs
- Direct mapping from inputs to actions

6 Direct RL Trader (USD/GBP): Annualized Return = 15%, Sharpe Ratio = 2.3, Downside Deviation Ratio = 3.3

7 II. Direct Reinforcement: Algorithms & Illustrations
Algorithms:
- Recurrent Reinforcement Learning (RRL)
- Stochastic Direct Reinforcement (SDR)
Illustrations:
- Sensitivity to transaction costs
- Risk-averse reinforcement

8 Learning to Trade via Direct Reinforcement
DR Trader:
- Recurrent policy F_t = F(θ; F_{t−1}, I_t), producing trading signals or portfolio weights
- Takes an action, receives a reward (trading return with transaction costs)
- Causal performance function U_t(R_1, …, R_t), generally path-dependent
- Learns the policy by varying the parameters θ
GOAL: maximize performance U_T or marginal performance (a minimal sketch of such a trader follows).
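As a concrete illustration, a single-layer tanh trader of this recurrent form can be written in a few lines of Python. This is a minimal sketch in the spirit of the RRL papers, not the exact parameterization behind these slides; the names theta and trader_position are assumptions:

```python
import numpy as np

# Minimal sketch of a recurrent Direct Reinforcement trader: the position
# F_t depends on a window of recent returns AND on the previous position
# F_{t-1}. The recurrence is what lets learning account for transaction
# costs, since changing position is what incurs them.

def trader_position(theta, returns_window, prev_position):
    """F_t = tanh(w . r_window + u * F_{t-1} + b), a position in [-1, +1]."""
    w, u, b = theta[:-2], theta[-2], theta[-1]
    return np.tanh(np.dot(w, returns_window) + u * prev_position + b)
```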

9 Recurrent Reinforcement Learning (RRL) (Moody & Wu 1997)
Deterministic gradient (batch):
  dU_T(θ)/dθ = Σ_{t=1..T} (dU_T/dR_t) [ (dR_t/dF_t)(dF_t/dθ) + (dR_t/dF_{t−1})(dF_{t−1}/dθ) ]
with recursion:
  dF_t/dθ = ∂F_t/∂θ + (∂F_t/∂F_{t−1}) (dF_{t−1}/dθ)
Stochastic gradient (on-line):
  dU_t/dθ ≈ (dU_t/dR_t) [ (dR_t/dF_t)(dF_t/dθ) + (dR_t/dF_{t−1})(dF_{t−1}/dθ) ]
with the same stochastic recursion for dF_t/dθ.
Stochastic parameter update (on-line):
  Δθ_t = ρ dU_t/dθ
Constant ρ: adaptive learning. Declining ρ: stochastic approximation.
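A hedged sketch of one on-line RRL step built on these equations, assuming the tanh trader above and the cost model R_t = F_{t−1} r_t − δ|F_t − F_{t−1}| from the next slide. The function rrl_step and its argument names are illustrative; dU_dR stands for the marginal utility dU_t/dR_t supplied by the chosen performance measure:

```python
import numpy as np

def rrl_step(theta, F_prev, dF_prev, r_window, r_t, delta, rho, dU_dR):
    """One on-line RRL update for the tanh trader sketched earlier."""
    w, u, b = theta[:-2], theta[-2], theta[-1]
    pre = np.dot(w, r_window) + u * F_prev + b
    F_t = np.tanh(pre)

    # Trading return net of costs: R_t = F_{t-1} r_t - delta |F_t - F_{t-1}|
    s = np.sign(F_t - F_prev)
    dR_dFt, dR_dFprev = -delta * s, r_t + delta * s

    # Recurrence: dF_t/dtheta = (1 - F_t^2) (dpre/dtheta + u * dF_{t-1}/dtheta)
    dpre = np.concatenate([r_window, [F_prev, 1.0]])
    dF_t = (1.0 - F_t ** 2) * (dpre + u * dF_prev)

    # Stochastic gradient and parameter update: delta_theta = rho * dU_t/dtheta
    grad = dU_dR * (dR_dFt * dF_t + dR_dFprev * dF_prev)
    return theta + rho * grad, F_t, dF_t
```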

10 Structure of Traders
Single asset:
- Price series z_t; return series r_t = z_t − z_{t−1}
Traders:
- Discrete position size F_t ∈ {−1, 0, 1}
- Recurrent policy F_t = F(θ; F_{t−1}, I_t)
Observations: the full system state is not known.
Simple trading returns and profit:
  R_t = F_{t−1} r_t − δ |F_t − F_{t−1}|,   P_T = Σ_t R_t
Transaction costs: represented by the cost rate δ (a sketch of this computation follows).
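A small sketch of the return and profit computation under this cost model; the array alignment (positions[t] and price_returns[t] sharing the same time index) is an assumption:

```python
import numpy as np

# Sketch: per-period trading returns and total profit for a position series
# F_t in {-1, 0, +1} under a proportional transaction-cost rate delta.

def trading_returns(positions, price_returns, delta):
    """R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}|; P_T = sum_t R_t."""
    F = np.asarray(positions, dtype=float)
    r = np.asarray(price_returns, dtype=float)
    R = F[:-1] * r[1:] - delta * np.abs(np.diff(F))
    return R, R.sum()
```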

11 Risk-Averse Reinforcement: Financial Performance Measures
Performance functions:
- Path independent: standard utility functions U(W_T)
- Path dependent: performance ratios
Performance ratios:
- Sharpe Ratio: SR = Average(R_t) / StdDev(R_t)
- Downside Deviation Ratio: DDR = Average(R_t) / DD_T, where DD_T² = Average(min(R_t, 0)²)
For learning:
- Per-period returns R_t
- Marginal performance, e.g. the Differential Sharpe Ratio (see the supplemental slides; both ratios are sketched below).
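The two ratios computed over a vector of per-period returns, in their per-period (unannualized) forms; the function names are mine:

```python
import numpy as np

def sharpe_ratio(R):
    """Average return over its standard deviation."""
    return np.mean(R) / np.std(R)

def downside_deviation_ratio(R):
    """Average return over the downside deviation DD_T, where
    DD_T^2 = mean(min(R_t, 0)^2): only losses contribute to risk."""
    downside = np.sqrt(np.mean(np.minimum(R, 0.0) ** 2))
    return np.mean(R) / downside
```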

12 Long / Short Trader Simulation Sensitivity to Transaction Costs
Learns from scratch and on-line.
Moving average Sharpe Ratio with η = 0.01.

13 Trader Simulation: Transaction Costs vs. Performance
100 runs; costs = 0.2%, 0.5%, and 1.0%.
[Figure panels: Sharpe ratio and trading frequency vs. transaction cost]

14 Minimizing Downside Risk: Artificial Price Series w/ Heavy Tails

15 Comparison of Risk-Averse Traders: Underwater Curves

16 Comparison of Risk-Averse Traders: Draw-Downs

17 III. Direct Reinforcement vs. Dynamic Programming
Algorithms:
- Value Function Method (Q-Learning)
- Direct Reinforcement Learning (RRL)
Illustration:
- Asset allocation: S&P 500 & T-Bills
- RRL vs. Q-Learning

18 RL Paradigms Compared
Value Function Learning:
- Origins: Dynamic Programming
- Learn an “optimal” Q-function, Q: state × action → value
- Solve Bellman’s Equation
- Action selection: “indirect”
Direct Reinforcement:
- Origins: Adaptive Control
- Learn a “good” policy P, P: observations → p(action)
- Optimize via the “policy gradient”
- Action selection: “direct”

19 S&P-500 / T-Bill Asset Allocation: Maximizing the Differential Sharpe Ratio

20 S&P-500: Opening Up the Black Box
85 input series: the learned relationships are nonstationary over time.

21 Closing Remarks
Direct Reinforcement Learning:
- Discovers trading opportunities in markets
- Integrates forecasting with trading
- Maximizes risk-adjusted returns
- Optimizes trading with transaction costs
Direct Reinforcement offers advantages over:
- Trading based on forecasts (supervised learning)
- Dynamic programming RL (value function methods)
Illustrations:
- Controlled simulations
- FX currency trader
- Asset allocation: S&P 500 vs. cash

22 Selected References
[1] John Moody and Lizhong Wu. Optimization of trading systems and portfolios. In Decision Technologies for Financial Engineering, 1997.
[2] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17, 1998.
[3] Jonathan Baxter and Peter L. Bartlett. Direct gradient-based reinforcement learning: Gradient estimation algorithms.
[4] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), July 2001.
[5] Carl Gold. FX trading via recurrent reinforcement learning. In Proceedings of the IEEE CIFEr Conference, Hong Kong, 2003.
[6] John Moody, Y. Liu, M. Saffell and K. J. Youn. Stochastic direct reinforcement: Application to simple games with recurrence. In Artificial Multiagent Learning, Sean Luke et al., eds., AAAI Press, 2004.

23 Supplemental Slides
- Differential Sharpe Ratio
- Portfolio Optimization
- Stochastic Direct Reinforcement (SDR)

24 Maximizing the Sharpe Ratio
Exponential Moving Average Sharpe Ratio:
  S_t = A_t / (B_t − A_t²)^{1/2}
with time scale 1/η and incremental estimates
  A_t = A_{t−1} + η (R_t − A_{t−1}),  B_t = B_{t−1} + η (R_t² − B_{t−1}).
Motivation: the EMA Sharpe ratio emphasizes recent patterns and can be updated incrementally (see the sketch below).
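A sketch of the incremental update. The normalization constant K_η that appears in the published version of this ratio is omitted for brevity, and eta = 0.01 matches the earlier simulation slide:

```python
def ema_sharpe_update(A, B, R_t, eta=0.01):
    """One incremental update of the EMA Sharpe ratio estimate."""
    A = A + eta * (R_t - A)            # EMA of returns
    B = B + eta * (R_t ** 2 - B)       # EMA of squared returns
    S = A / (B - A ** 2) ** 0.5        # moving Sharpe estimate
    return A, B, S
```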

25 Differential Sharpe Ratio for Adaptive Optimization
Expand S_t to first order in η:
  S_t ≈ S_{t−1} + η (dS_t/dη)|_{η=0} + O(η²)
Define the Differential Sharpe Ratio as:
  D_t ≡ dS_t/dη = (B_{t−1} ΔA_t − ½ A_{t−1} ΔB_t) / (B_{t−1} − A_{t−1}²)^{3/2}
where
  ΔA_t = R_t − A_{t−1},  ΔB_t = R_t² − B_{t−1}.

26 Learning with the Differential SR
Evaluate the “marginal utility” gradient:
  dD_t/dR_t = (B_{t−1} − A_{t−1} R_t) / (B_{t−1} − A_{t−1}²)^{3/2}
Motivation for the DSR: it isolates the contribution of R_t to S_t (the “marginal utility” dD_t/dR_t); provides interpretability; adapts to changing market conditions; and facilitates efficient on-line learning (stochastic optimization).
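This marginal utility is exactly the dU_dR quantity consumed by the on-line RRL step sketched earlier; a one-line implementation under the formulas above:

```python
def dsr_marginal_utility(A_prev, B_prev, R_t):
    """dD_t/dR_t = (B_{t-1} - A_{t-1} R_t) / (B_{t-1} - A_{t-1}^2)^{3/2}."""
    return (B_prev - A_prev * R_t) / (B_prev - A_prev ** 2) ** 1.5
```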

27 Trader Simulation: Transaction Costs vs. Performance
100 runs; costs = 0.2%, 0.5%, and 1.0%.
[Figure panels: trading frequency, cumulative profit, and Sharpe ratio vs. transaction cost]

28 Portfolio Optimization (3 Securities)

29 Stochastic Direct Reinforcement: Probabilistic Policies

30 Learning to Trade
Single asset:
- Price series z_t; return series r_t = z_t − z_{t−1}
Trader:
- Discrete position size F_t ∈ {−1, 0, 1}
- Recurrent policy F_t = F(θ; F_{t−1}, I_t)
Observations: the full system state is not known.
Simple trading returns and profit:
  R_t = F_{t−1} r_t − δ |F_t − F_{t−1}|,   P_T = Σ_t R_t
Transaction cost rate: δ.

31 Why does Reinforcement need Recurrence?
Consider a learning agent with a stochastic policy function p_θ(a_t | o_t, …, o_{t−n+1}; a_{t−1}, …, a_{t−m}) whose inputs include recent observations o and past actions a.
Why should past actions (recurrence) be included? Examples:
- Games (the observations o are the opponent’s actions)
- Trading financial markets
In general:
- Model an opponent’s responses o to previous actions a
- Minimize transaction costs and market impact
Recurrence enables the discovery of better policies that capture an agent’s impact on the world (see the sketch below).
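A minimal sketch of such a recurrent stochastic policy for a two-action long/short agent; the sigmoid form and the names are assumptions for illustration:

```python
import numpy as np

# Sketch: p(a_t = +1) depends on recent observations AND the previous
# action, so the agent can learn how its own actions affect the world
# (e.g. the transaction costs incurred by changing position).

def policy_prob(theta, obs_window, a_prev):
    """Probability of taking action a_t = +1 given (o, a_{t-1})."""
    w, u, b = theta[:-2], theta[-2], theta[-1]
    logit = np.dot(w, obs_window) + u * a_prev + b
    return 1.0 / (1.0 + np.exp(-logit))
```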

32 Stochastic Direct Reinforcement (SDR): Maximize Performance
Expected total performance of a sequence of T actions:
  U_T(θ) = E[ U(R_1, …, R_T) ]
Maximize performance via direct gradient ascent:
  Δθ ∝ dU_T(θ)/dθ
Must evaluate the total policy gradient for a policy represented by p_θ(a_t | o, a).

33 Stochastic Direct Reinforcement (SDR): Maximize Performance
The goal of SDR is to maximize the expected total performance U_T(θ) of a sequence of T actions via direct gradient ascent.
Must evaluate dU_T(θ)/dθ for a policy represented by p_θ(a_t | o, a).
Notation: the complete history is denoted H_t; H_t^{(n,m)} is a partial history of length (n, m): the n most recent observations and the m most recent actions.

34 Stochastic Direct Reinforcement: First Order Recurrent Policy Gradient
For first-order recurrence (m = 1), the conditional action probability is given by the policy:
  p_θ(a_t | o_t, a_{t−1})
The probabilities of current actions depend upon the probabilities of prior actions:
  P(a_t) = Σ_{a_{t−1}} p_θ(a_t | o_t, a_{t−1}) P(a_{t−1})
The total (recurrent) policy gradient is computed as:
  dP(a_t)/dθ = Σ_{a_{t−1}} [ (∂p_θ(a_t | o_t, a_{t−1})/∂θ) P(a_{t−1}) + p_θ(a_t | o_t, a_{t−1}) (dP(a_{t−1})/dθ) ]
with partial (naïve) policy gradient ∂p_θ/∂θ (see the sketch below).
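A sketch of one step of this recursion for the two-action sigmoid policy above. P_prev and dP_prev hold P(a_{t−1}) and dP(a_{t−1})/dθ keyed by action; this is a simplified reconstruction, not the SDR reference implementation:

```python
import numpy as np

def sdr_gradient_step(theta, obs_window, P_prev, dP_prev):
    """Propagate P(a_t) and dP(a_t)/dtheta one step (actions in {-1, +1})."""
    w, u, b = theta[:-2], theta[-2], theta[-1]
    actions = (-1.0, 1.0)
    P = {a: 0.0 for a in actions}
    dP = {a: np.zeros_like(theta) for a in actions}
    for a_prev in actions:
        p_plus = 1.0 / (1.0 + np.exp(-(np.dot(w, obs_window) + u * a_prev + b)))
        dlogit = np.concatenate([obs_window, [a_prev, 1.0]])
        dp_plus = p_plus * (1.0 - p_plus) * dlogit   # partial (naive) gradient
        for a_t, p, dp in ((1.0, p_plus, dp_plus), (-1.0, 1.0 - p_plus, -dp_plus)):
            # Total gradient = partial term + recurrence through P(a_{t-1})
            P[a_t] += p * P_prev[a_prev]
            dP[a_t] += dp * P_prev[a_prev] + p * dP_prev[a_prev]
    return P, dP
```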

35 SDR Trader Simulation w/ Transaction Costs

36 Trading Frequency vs. Transaction Costs
[Figure: trading frequency vs. transaction costs, recurrent SDR vs. a non-recurrent trader]

37 Sharpe Ratio vs. Transaction Costs
[Figure: Sharpe ratio vs. transaction costs, recurrent SDR vs. a non-recurrent trader]

