
On-Line Portfolio Selection Using Multiplicative Updates Written by David P. Helmbold (Cal), Robert E. Schapire (Cal), Yoram Singer (AT&T) and Manfred K. Warmuth (Cal) Presented by Ryan M. McCabe

Goal
- Within a menu of a fixed number of stocks, we want to make as much money as possible without relying too much on luck.
- We'll compare our results to how well the best single stock, another form of on-line learning (Cover's universal portfolio), and a batch learner (BCRP) each performed.

Context
- Remember, this is on-line learning.
- Unlike batch learning, the data comes to us in a stream, and we learn from each example.
- Still, we do not want to completely ignore what we have learned from history.

More Context
- We have a bunch of stocks.
- We have some wealth.
- Every day we get a report on the stocks.
- Every day we update our current wealth, based on their performance yesterday.
- Every day we re-allocate our wealth over the stocks.

Preliminaries
- We have N stocks.
- $w$ is a vector of weights over the N stocks: $\sum_{i=1}^{N} w_i = 1$ and every $w_i \ge 0$.
- We have T total time steps (trading days); superscript t denotes a specific day.

Preliminaries
- $w^t$ is the vector of weights at time t; it is chosen at the beginning of day t.
- $x^t$ is the vector of relative performance (price relatives) of all the stocks over the course of day t: $x_i^t$ = closing price of stock i on day t / opening price of stock i on day t.
- During day t our wealth is multiplied by $w^t \cdot x^t$, so after T days it is $\prod_{t=1}^{T} w^t \cdot x^t$ times the starting wealth.
- We change $w^t$ every day in some way.
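To make the bookkeeping concrete, here is a minimal sketch of how wealth evolves under these definitions. The function names `daily_factor` and `final_wealth` are hypothetical, and the price relatives in the example are made up for illustration.

```python
from typing import Sequence


def daily_factor(w: Sequence[float], x: Sequence[float]) -> float:
    """Factor by which wealth is multiplied during one day: the dot product w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def final_wealth(
    weights: Sequence[Sequence[float]],
    relatives: Sequence[Sequence[float]],
    start: float = 1.0,
) -> float:
    """Wealth after T days when portfolio weights[t] is held during day t."""
    wealth = start
    for w, x in zip(weights, relatives):
        if min(w) < 0 or abs(sum(w) - 1.0) > 1e-9:
            raise ValueError("each weight vector must be non-negative and sum to 1")
        wealth *= daily_factor(w, x)
    return wealth


# Made-up price relatives: stock 1 gains 10% on day 1 and loses 10% on day 2,
# stock 2 stays flat. Holding (0.5, 0.5) each day gives 1.05 * 0.95.
x = [[1.1, 1.0], [0.9, 1.0]]
w = [[0.5, 0.5], [0.5, 0.5]]
print(final_wealth(w, x))  # roughly 0.9975
```

Wealth compounds multiplicatively, which is why the algorithms below reason about $\log(w^t \cdot x^t)$ rather than raw wealth.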

Follow-Ups
- If we have time at the end of this presentation, we'll talk about some things of practical importance:
- Transaction costs
- Side information
- Implementation details

Four Types of Portfolio Managers
- (Best) Constant-Rebalanced Portfolio
- Cover's Universal Portfolio
- Exact Exponentiated Gradient (ExactEG(η))
- Approximate Exponentiated Gradient (EG(η))

Constant-Rebalanced Portfolios
- In a CRP the same weight vector is used every day: although the wealth is redistributed (rebalanced) every day over the N stocks, $w^t$ stays the same for t = 1…T.
- $w^*$ denotes the constant $w$ that maximizes final wealth over the given sequence $x^1, \dots, x^T$. Finding it requires looking back over all the data, so this is our batch method.
- $w^*$ is associated with the Best Constant-Rebalanced Portfolio (BCRP).
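A minimal sketch of a CRP and of finding the BCRP in hindsight, assuming just two stocks so that a simple grid search over $w = (p, 1 - p)$ can stand in for an exact maximizer. The function names are hypothetical, and the market is a made-up textbook-style example: "cash" never moves while the other stock alternately doubles and halves.

```python
import math
from typing import List, Sequence, Tuple


def crp_log_wealth(b: Sequence[float], relatives: Sequence[Sequence[float]]) -> float:
    """Log of final wealth of a constant-rebalanced portfolio b (same weights every day)."""
    return sum(math.log(sum(bi * xi for bi, xi in zip(b, x))) for x in relatives)


def best_crp_two_stocks(
    relatives: Sequence[Sequence[float]], steps: int = 1000
) -> Tuple[List[float], float]:
    """Approximate the BCRP for N = 2 by grid search over b = (p, 1 - p), in hindsight."""
    best_b, best_lw = [0.0, 1.0], -math.inf
    for k in range(steps + 1):
        p = k / steps
        b = [p, 1.0 - p]
        lw = crp_log_wealth(b, relatives)
        if lw > best_lw:
            best_b, best_lw = b, lw
    return best_b, math.exp(best_lw)


# Either stock alone ends where it started, but rebalancing to 50/50 every day wins.
market = [[1.0, 2.0], [1.0, 0.5]] * 10
b_star, wealth = best_crp_two_stocks(market)
print(b_star, wealth)  # b_star close to [0.5, 0.5], wealth close to 1.125 ** 10
```

Grid search is only practical for very small N; the point here is that $w^*$ depends on the whole sequence, which an on-line method never sees in advance.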

Cover Universal Portfolio
- Another on-line method; $w^t$ is updated every day.
- $w^t$ is an average over all feasible (constant-rebalanced) portfolios, each weighted by the wealth it would have achieved so far.
- Guarantees the same asymptotic growth rate as BCRP for any given sequence $x^t$.
- Exponential complexity in N.
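A rough sketch of the universal portfolio for N = 2, assuming a uniform prior over CRPs that is discretized into a grid of candidates (Cover's method integrates over the whole simplex instead). The function name and grid size are hypothetical choices. For N stocks such a grid needs on the order of `grid ** (N - 1)` candidates, which illustrates the exponential cost in N.

```python
from typing import Sequence


def universal_portfolio_two_stocks(
    relatives: Sequence[Sequence[float]], grid: int = 200
) -> float:
    """Approximate Cover's universal portfolio for N = 2 and return its final wealth.

    Every candidate CRP b = (p, 1 - p) on a uniform grid (a discretized uniform
    prior) is weighted by the wealth it would have earned so far; each day's
    portfolio is the wealth-weighted average of the candidates.
    """
    ps = [(k + 0.5) / grid for k in range(grid)]  # grid-cell midpoints in (0, 1)
    crp_wealth = [1.0] * grid                      # wealth of each candidate so far
    wealth = 1.0
    for x in relatives:
        p_t = sum(p * s for p, s in zip(ps, crp_wealth)) / sum(crp_wealth)
        wealth *= p_t * x[0] + (1.0 - p_t) * x[1]
        # Let every candidate CRP trade today as well.
        crp_wealth = [s * (p * x[0] + (1.0 - p) * x[1]) for p, s in zip(ps, crp_wealth)]
    return wealth


# Same made-up "cash vs. doubling/halving stock" market as in the BCRP sketch.
market = [[1.0, 2.0], [1.0, 0.5]] * 10
print(universal_portfolio_two_stocks(market))
```

With this weighting, the final wealth equals the average final wealth of the candidate CRPs, so it can never exceed the best candidate; that is the intuition behind tracking the BCRP's growth rate.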

Exact Exponentiated Gradient
- Remember on-line regression?
- $F(w^{t+1}) = \eta \log(w^{t+1} \cdot x^t) - d(w^{t+1}, w^t)$
- Maximize $F(w^{t+1})$ over $w^{t+1}$, given $w^t$ and $x^t$.
- $\log(w^{t+1} \cdot x^t)$ maximizes wealth if $x^t$ stays still, i.e. if the next day's price relatives repeat today's.
- $d(w^{t+1}, w^t)$ penalizes moving too far from $w^t$.
- $\eta$, the learning rate, shifts importance between the two main terms.
- But $F(w^{t+1})$ is difficult to maximize: it has no closed-form maximizer, so it must be solved numerically at every step.
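A minimal sketch of one ExactEG(η) step, assuming N = 2 and the relative-entropy penalty $d(u, v) = \sum_i u_i \log(u_i / v_i)$ from the next slide. Because $F$ is concave in $p$ for $w^{t+1} = (p, 1 - p)$, a ternary search finds its maximizer; this solver is an illustrative choice, not necessarily the one the authors used, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def relative_entropy(u: Sequence[float], v: Sequence[float]) -> float:
    """d(u, v) = sum_i u_i log(u_i / v_i), using the convention 0 log 0 = 0."""
    return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v) if ui > 0)


def objective(w_new: Sequence[float], w_old: Sequence[float],
              x: Sequence[float], eta: float) -> float:
    """F(w^{t+1}) = eta * log(w^{t+1} . x^t) - d(w^{t+1}, w^t)."""
    wx = sum(a * b for a, b in zip(w_new, x))
    return eta * math.log(wx) - relative_entropy(w_new, w_old)


def exact_eg_step_two_stocks(w_old: Sequence[float], x: Sequence[float],
                             eta: float, iters: int = 200) -> List[float]:
    """Maximize F over w = (p, 1 - p) by ternary search (F is concave in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective([m1, 1 - m1], w_old, x, eta) < objective([m2, 1 - m2], w_old, x, eta):
            lo = m1  # the maximizer lies in [m1, hi]
        else:
            hi = m2  # the maximizer lies in [lo, m2]
    p = (lo + hi) / 2
    return [p, 1.0 - p]


# Stock 1 did 20% better today, so the weight shifts slightly toward it.
print(exact_eg_step_two_stocks([0.5, 0.5], [1.2, 1.0], eta=0.05))
```

Note that with $\eta = 0$ the maximizer is just $w^t$ (no movement), which shows how $\eta$ trades off the two terms.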

How do we learn $w^t$?
- So we use an approximation.
- Using a first-order Taylor approximation of the first term at $w^{t+1} = w^t$ and a relative entropy distance measure $d(u, v) = \sum_i u_i \log(u_i / v_i)$ for the second (penalty) term, waving some hands, we get the EG(η) update:

$$w_i^{t+1} = \frac{w_i^t \exp\left(\eta \, x_i^t / (w^t \cdot x^t)\right)}{\sum_{j=1}^{N} w_j^t \exp\left(\eta \, x_j^t / (w^t \cdot x^t)\right)}$$
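The EG(η) update can be sketched directly from the formula. This is a minimal illustration: the names `eg_update` and `run_eg`, the default η = 0.05, and the uniform starting portfolio are assumptions for the example (the slides leave the choice of η and $w^1$ open).

```python
import math
from typing import List, Sequence


def eg_update(w: Sequence[float], x: Sequence[float], eta: float) -> List[float]:
    """One EG(eta) step: w_i <- w_i * exp(eta * x_i / (w . x)), then renormalize."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    unnorm = [wi * math.exp(eta * xi / wx) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run_eg(relatives: Sequence[Sequence[float]], eta: float = 0.05) -> float:
    """Trade with EG(eta) over the whole sequence and return the final wealth."""
    n = len(relatives[0])
    w = [1.0 / n] * n  # assumed starting point: the uniform portfolio
    wealth = 1.0
    for x in relatives:
        wealth *= sum(wi * xi for wi, xi in zip(w, x))  # trade today with w^t
        w = eg_update(w, x, eta)                        # then compute w^{t+1}
    return wealth


print(eg_update([0.5, 0.5], [1.2, 1.0], eta=0.05))  # first weight rises just above 0.5
```

Each step touches every stock a constant number of times, so the cost per day is linear in N, in contrast with the numerical maximization ExactEG(η) needs.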

Exponentiated Gradient Update
- This approximate version performs almost indistinguishably from the original ExactEG(η), which maximizes $F(w^{t+1}) = \eta \log(w^{t+1} \cdot x^t) - d(w^{t+1}, w^t)$ exactly.
- Each update takes time only linear in N.

Quick Recap
- So now we have defined our four methods:
- Best Constant-Rebalanced Portfolio (BCRP)
- Cover Universal On-Line Portfolio
- ExactEG(η)
- EG(η)
- Let's see how they perform under pressure…

The Experiments
- 22 years of NYSE data (T > 5,000 trading days)
- 36 equities (N ∈ {2, 3, …, 36})
- Usually 2- or 3-stock subsets were used.
- Reproduced each of Cover's experiments.
- Stocks chosen for their volatility.
- Found the BCRP, then ran $w^*$ through the data from the beginning.
- Ran EG(η) and ExactEG(η) through the data from the beginning.

Commercial Metals and Kin Ark (Figure 5.1)

IBM and Coca Cola (Figure 5.2)

Gulf, HP, and Schlum (Figure 5.3)

Volatility Elasticity (Table 5.5)

Results Analysis Summary EG(  ) and ExactEG(  ) were always about 1% from each other with EG(  ) running much faster EG(  ) and ExactEG(  ) were always about 1% from each other with EG(  ) running much faster BCRP always did the best BCRP always did the best EG(  ) always outperformed Cover’s Universal Portfolio, despite Cover’s superior analytical worst-case bound EG(  ) always outperformed Cover’s Universal Portfolio, despite Cover’s superior analytical worst-case bound

Talking Points
- "[S]urprisingly, the wealth achieved by the EG(η) update was larger than the wealth achieved by the universal portfolio algorithm. This outcome is contrary to the superior worst-case bounds proved for the universal portfolio algorithm."
- Worst-case bounds on the per-day shortfall in log wealth relative to BCRP:
- Cover: $O\left(\frac{N \log T}{T}\right)$
- EG(η): $O\left(\sqrt{\frac{\log N}{T}}\right)$
- Any ideas why?
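To see what "superior bound" means at the scale of these experiments, the sketch below simply evaluates the two rates with all constant factors dropped, so it compares orders of growth only, not the authors' exact bounds. The helper names are hypothetical; T = 5,000 and N = 3 or 36 come from the experiment slide.

```python
import math


def cover_rate(n: int, t: int) -> float:
    """N log T / T: order of Cover's per-day bound, constants dropped."""
    return n * math.log(t) / t


def eg_rate(n: int, t: int) -> float:
    """sqrt(log N / T): order of the EG(eta) per-day bound, constants dropped."""
    return math.sqrt(math.log(n) / t)


for n in (3, 36):
    print(n, round(cover_rate(n, 5000), 4), round(eg_rate(n, 5000), 4))
# N = 3:  Cover about 0.0051, EG about 0.0148 (Cover's bound is smaller)
# N = 36: Cover about 0.0613, EG about 0.0268 (EG's bound is smaller)
```

For the 2- and 3-stock subsets actually used, Cover's bound is the tighter one, which is what makes EG(η)'s better observed wealth surprising; a worst-case bound says nothing about typical market sequences.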

Talking Points
- So, the size of N affected relative running times, but how did stock volatility affect relative overall wealth?
- Would running time matter in this domain if the algorithms were applied in practice? Why did it matter so much to the authors?

Follow Up
- Transaction costs: Scottrade.com charges $7 per transaction. Would you update every stock every day?
- Side information: one of K finite side-information states is available to the algorithm each day. Computationally this is the same as running K versions in parallel, so no big deal, and it may increase wealth.
- Implementation details: How do we pick η? How do we pick $w^1$?
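The "K parallel versions" idea for side information can be sketched as one EG(η) weight vector per side-information state, where only the vector for today's state trades and gets updated. The class `SideInfoEG`, integer state labels, η = 0.05, and the uniform start for unseen states are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


class SideInfoEG:
    """Hypothetical wrapper: one EG(eta) weight vector per side-information state."""

    def __init__(self, n_stocks: int, eta: float = 0.05):
        self.n = n_stocks
        self.eta = eta
        self.weights: Dict[int, List[float]] = {}

    def portfolio(self, state: int) -> List[float]:
        # States never seen before start from the uniform portfolio (an assumption).
        return self.weights.setdefault(state, [1.0 / self.n] * self.n)

    def update(self, state: int, x: Sequence[float]) -> float:
        """Trade today with this state's portfolio, then update only that state's weights."""
        w = self.portfolio(state)
        wx = sum(wi * xi for wi, xi in zip(w, x))
        unnorm = [wi * math.exp(self.eta * xi / wx) for wi, xi in zip(w, x)]
        z = sum(unnorm)
        self.weights[state] = [u / z for u in unnorm]
        return wx  # factor by which wealth grew today


learner = SideInfoEG(n_stocks=2)
growth = learner.update(state=0, x=[1.2, 1.0])
print(growth, learner.portfolio(0), learner.portfolio(1))
```

Each day costs one EG(η) step regardless of K; the only extra cost is storing K weight vectors.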

Done