On Predictability and Profitability: Would AI induced Trading Rules be Sensitive to the Entropy of time Series Nicolas NAVET – INRIA France nnavet@loria.fr.

Slides:



Advertisements
Similar presentations
Value-at-Risk: A Risk Estimating Tool for Management
Advertisements

Estimating the detector coverage in a negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital Dipankar Dasgupta The University of Memphis.
1 Entropy Rate and Profitability of Technical Analysis: Experiments on the NYSE US 100 Stocks Nicolas NAVET – INRIA France Shu-Heng CHEN.
DECISION TREES. Decision trees  One possible representation for hypotheses.
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Pretests for genetic-programming evolved trading programs : “zero-intelligence” strategies and lottery trading Nicolas NAVET INRIA - AIECON NCCU
Clustering V. Outline Validating clustering results Randomization tests.
G. Alonso, D. Kossmann Systems Group
1 Market Efficiency in the Emerging Securitized Real Estate Markets Felix Schindler Centre for European Economic Research (ZEW) Milan, 26 th of June 2010.
1 Assessing the Risk and Return of Financial Trading Systems - a Large Deviation Approach Nicolas NAVET – INRIA, France René SCHOTT – IECN,
CHAPTER 2 D IRECT M ETHODS FOR S TOCHASTIC S EARCH Organization of chapter in ISSO –Introductory material –Random search methods Attributes of random search.
Learning with Cost Intervals Xu-Ying Liu and Zhi-Hua Zhou LAMDA Group National Key Laboratory for Novel Software Technology Nanjing.
USING A GENETIC PROGRAM TO PREDICT EXCHANGE RATE VOLATILITY Christopher J. Neely Paul A. Weller.
Business plan overview (1)
Data Mining Techniques Outline
Trading Rules and Market Efficiency Fin250f: Lecture 4.3 Fall 2005 Reading: Taylor, chapter 7.
A Heuristic Bidding Strategy for Multiple Heterogeneous Auctions Patricia Anthony & Nicholas R. Jennings Dept. of Electronics and Computer Science University.
On the Constancy of Internet Path Properties Yin Zhang, Nick Duffield AT&T Labs Vern Paxson, Scott Shenker ACIRI Internet Measurement Workshop 2001 Presented.
Evaluating Hypotheses
Genetic Programming and the Predictive Power of Internet Message Traffic James D Thomas Katia Sycara.
Experimental Evaluation
AGEC 622 Mission is prepare you for a job in business Have you ever made a price forecast? How much confidence did you place on your forecast? Was it correct?
The Lognormal Distribution
NEURAL NETWORKS FOR TECHNICAL ANALYSIS: A STUDY ON KLCI 授課教師:楊婉秀 報告人:李宗霖.
1 Terminating Statistical Analysis By Dr. Jason Merrick.
Comparing Systems Using Sample Data Andy Wang CIS Computer Systems Performance Analysis.
Calibration Guidelines 1. Start simple, add complexity carefully 2. Use a broad range of information 3. Be well-posed & be comprehensive 4. Include diverse.
History-Dependent Graphical Multiagent Models Quang Duong Michael P. Wellman Satinder Singh Computer Science and Engineering University of Michigan, USA.
 1  Outline  stages and topics in simulation  generation of random variates.
Irwin/McGraw-Hill 1 Market Risk Chapter 10 Financial Institutions Management, 3/e By Anthony Saunders.
Annealing Paths for the Evaluation of Topic Models James Foulds Padhraic Smyth Department of Computer Science University of California, Irvine* *James.
Chapter 10. Sampling Strategy for Building Decision Trees from Very Large Databases Comprising Many Continuous Attributes Jean-Hugues Chauchat and Ricco.
COMM W. Suo Slide 1. COMM W. Suo Slide 2  Random Walk - stock price change unpredictably  Actually stock prices follow a positive trend.
Chapter 9 – Classification and Regression Trees
Sponsor: Dr. K.C. Chang Tony Chen Ehsan Esmaeilzadeh Ali Jarvandi Ning Lin Ryan O’Neil Spring 2010.
Stochastic Linear Programming by Series of Monte-Carlo Estimators Leonidas SAKALAUSKAS Institute of Mathematics&Informatics Vilnius, Lithuania
Time Series Data Analysis - I Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What are Time Series? How to.
1 Statistical Distribution Fitting Dr. Jason Merrick.
1 Chapter 7 Applying Simulation to Decision Problems.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
Chapter 7 Point Estimation
0 Presentation by: Austin Applegate Michael Cormier Paul Hodulik Carl Nordberg Nikki Zadikoff Global Asset Allocation February, Granite Investments.
Options made very easy. To make money as an option trader you must be: A bargain hunter. Able to identify undervalued and overvalued options. Know when.
ETM 607 – Input Modeling General Idea of Input Modeling Data Collection Identifying Distributions Parameter estimation Goodness of Fit tests Selecting.
Empirical Efficiency Maximization: Locally Efficient Covariate Adjustment in Randomized Experiments Daniel B. Rubin Joint work with Mark J. van der Laan.
Additive Data Perturbation: the Basic Problem and Techniques.
Pattern Discovery of Fuzzy Time Series for Financial Prediction -IEEE Transaction of Knowledge and Data Engineering Presented by Hong Yancheng For COMP630P,
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 13 Multiple Regression Section 13.3 Using Multiple Regression to Make Inferences.
Tests of Random Number Generators
1 8. Back-testing of trading strategies 8.1Bootstrap Brock et al (1992), Davidson & Hinkley (1997), Fusai & Roncoroni (2008). Bootstrap: picking up at.
Mathematical Model for the Law of Comparative Judgment in Print Sample Evaluation Mai Zhou Dept. of Statistics, University of Kentucky Luke C.Cui Lexmark.
Financial Time Series Analysis with Wavelets Rishi Kumar Baris Temelkuran.
"Fast estimation of the partitioning Rent characteristic" Fast estimation of the partitioning Rent characteristic using a recursive partitioning model.
1 Optimizing Decisions over the Long-term in the Presence of Uncertain Response Edward Kambour.
Chapter 13 Sampling distributions
Discovering Interesting Patterns for Investment Decision Making with GLOWER-A Genetic Learner Overlaid With Entropy Reduction Advisor : Dr. Hsu Graduate.
1 Risk Changes Following Ex-Dates of Stock Splits Shen-Syan Chen National Taiwan University Robin K. Chou National Central University Wan-Chen Lee Ching.
TECHNICAL ANALYSIS.  Technical analysis attempts to exploit recurring and predictable patterns in stock prices to generate high investment returns.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Outline Historical note about Bayes’ rule Bayesian updating for probability density functions –Salary offer estimate Coin trials example Reading material:
The Capacity of Trading Strategies Landier (TSE), Simon (CFM), Thesmar (HEC) AFA 2016, San Francisco.
Computer Simulation Henry C. Co Technology and Operations Management,
Market-Risk Measurement
Learning Sequence Motif Models Using Expectation Maximization (EM)
Sequential Pattern Discovery under a Markov Assumption
Memory and Melodic Density : A Model for Melody Segmentation
Constraint Programming and Backtracking Search Algorithms
Stochastic Volatility Models: Bayesian Framework
Feifei Li, Ching Chang, George Kollios, Azer Bestavros
Data Transformations targeted at minimizing experimental variance
Presentation transcript:

On Predictability and Profitability: Would AI induced Trading Rules be Sensitive to the Entropy of time Series Nicolas NAVET – INRIA France nnavet@loria.fr Shu-Heng CHEN – AIECON/NCCU Taiwan chchen@nccu.edu.tw 09/04/2008

Outline Entropy Rate : uncertainty remaining in the next information produced given knowledge of the past  measure of predictability Questions : Do stocks exhibit differing entropy rates? Does low entropy imply profitability of TA? Methodology : NYSE US 100 Stocks – daily data – 2000-2006 TA rules induced using Genetic Programming

Estimating entropy Active field of research in neuroscience Maximum-likehood (“Plug-in”): empirical distribution of fixed length word not suited to capture long/medium term dependencies Compression-based techniques : Lempel-Ziv algorithm, Context-Tree Weighting fast convergence rate – suited to long/medium term dependencies

¤ ¤ 3 = Selected estimator i 6 Kontoyannis et al 1998 ^ h = Ã 1 n X ¤ ! ¡ l o g 2 ¤ i : length of the shortest string that does not appear in the i previous symbols Example: 0 1 1 0 0 1 0 1 1 0 0 ¤ 6 = 3

Performance of the estimator Experiments : Uniformly distributed r.v. in {1,2,..,8} – theoretical entropy Boost C++ random generator Sample of size 10000 = ¡ P 8 i 1 l o g 2 p 3 b . c ^ h S M = 2 : 9 6 Random generator is good – convergence rate is rather fast Note 1 : with sample of size 100000, ^ h S M ¸ 2 : 9 Note 2 : with standard C rand() function and sample size = 10000, ^ h S M = 2 : 7

Preprocessing the data (1/2) Log ratio between closing prices: r t = l n ( p ¡ 1 ) Discretization : f r t g 2 R ! A N 3,4,1,0,2,6,2,…

Preprocessing the data (2/2) Discretization is tricky – 2 problems: How many bins? (size of the alphabet) How many values in each bin? Guideline : maximize entropy with a number of bins in link with the sample size Here : alphabet of size 8 same number of values in each bin (“homogeneous partitioning”)

Entropy of NYSE US 100 stocks – period 2000-2006 Mean = Median = 2.75 Max = 2.79 Min = 2.68 Rand() boost = 2.9 NB : a normal distribution of same mean and standard deviation is plotted for comparison.

Entropy is high but price time series are not random! Randomly shuffled time series Original time series

Stocks under study Highest entropy time series b o l E n t r p O X Y 2 : 7 8 9 V L M R 5 B A W G 6 Highest entropy time series S y m b o l E n t r p T W X 2 : 6 7 M C 9 4 1 J P G 3 Lowest entropy time series

BDS tests: are daily log price changes i.i.d ? Lowest entropy time series m ± T W X E M C J P G 2 1 8 . 6 4 3 9 7 5 m ± O X Y V L M R B A W G 2 1 5 . 6 4 7 9 8 3 Highest entropy time series Null that log price changes are i.i.d. always rejected at 1% level but - whatever BDS parameters - rejection is much stronger for high-entropy stocks

Autocorrelation analysis High entropy stock (OXY) Low entropy stock (C) Up to a lag 100, there are 2.7 x more autocorrelations outside the 99% confidence bands for the lowest entropy stocks than for the highest entropy stocks

Part 2 : does low entropy imply better profitability of TA? Addressed here: are GP-induced rules more efficient on low-entropy stocks ?

Out-of-sample interval GP : the big picture Training interval Validation interval Out-of-sample interval 2000 2002 2004 2006 1 ) Creation of the trading rules using GP 2) Selection of the best resulting strategies Further selection on unseen data - One strategy is chosen for out-of-sample Performance evaluation

GP performance assessment Buy and Hold is not a good benchmark GP is compared with lottery trading (LT) of same frequency : avg nb of transactions same intensity : time during which a position is held Implementation of LT: random sequences with the right characteristics, e.g: 0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,1,1,1,… GP>LT ? LT>GP ? Student’s t-test at 95% confidence level – 20 GP runs / 1000 LT runs

Experimental setup Data preprocessed with 100-days MA Trading systems: Entry (long): GP induced rule with a classical set of functions / terminals Exit: Stop loss : 5% Profit target : 10% 90-days stop Fitness: net return - Initial equity: 100K$ - position sizing : 100%

Results: high entropy stocks ¯ s L T > ? O X Y 1 5 : K $ 4 N V 7 M R 8 B A 2 3 W 6 ¡ GP is always profitable LT is never better than GP (at a 95% confidence level) GP outperforms LT 2 times out of 5 (at a 95% confidence level)

Results: low entropy stocks G P n e t p r o ¯ s L T > ? W X ¡ 9 K $ 1 : 5 N Y E M C 6 8 J GP is never better than LT (at a 95% confidence level) LT outperforms GP 2 times out of 5 (at a 95% confidence level)

Typical low entropy stock (EMC) Explanations (1/2) GP is not good when training period is very different from out-of-sample e.g. Typical low entropy stock (EMC) 2000 2006

Explanations (2/2) The 2 cases where GP outperforms LT : training quite similar to out-of-sample BAX WAG

Conclusions EOD NYSE time series have high but differing entropies There are (weak) temporal dependencies Here, more predictable ≠ less risks GP works well if training is similar to out-of-sample

Perspectives Higher predictability level can be observed at intraday timeframe (what about higher timeframes?) Experiments needed with stocks less similar than the ones from NYSE US 100 Predictability tells us about the existence of temporal patterns – but how easy / difficult to discover them ??

?