Backtesting strategies based on multiple signals Robert Novy-Marx University of Rochester and NBER 1.


Multi-signal Strategies: Proliferation in industry, e.g., the MSCI Quality Index (high ROE, low ROE vol., low leverage) and “smart beta” products (RAFI: weights on sales, CF, BE, and dividends). Increasingly common in academia: Piotroski’s F-score (9 signals); Asness et al. Quality Score (21 signals). 2

Why the increased interest? Because finding “alpha” is hard  And they work great! Impressive backtest performance Too good?  Alpha should be hard to find Lots of smart people looking  Huge incentives to try And even to believe! 3

Issues: Every choice has the potential to bias results. This is a much bigger problem with multiple signals: not just which signals are used, but how they are used! Basic issue: each signal is used so that it individually predicts positive in-sample returns. Seems like a small thing, but it’s not! 4

Types of biases: Snooping means an in-sample aspect of the data guides strategy formation. Two types to worry about: (1) multiple testing bias: consider multiple strategies, show only the best one; (2) overfitting: e.g., ex post MVE SRs are always high, because the MVE strategy buys “winners” and sells “losers”. 5

Examples: Bet on a series of fair coin flips. What if you knew that there were: 1. More heads in the first (or second) half, and could bet on just the early (or late) flips? 2. More heads than tails? What sorts of biases? Do we account for these in finance? 6

First type: multiple testing (or selection). We don’t really account for it formally, but do suspect (know) that people look at more things. Second type: overfitting (bet heads, not tails!). Account for it? One signal: absolutely! The 5% critical t-statistic is 1.96 (two-sided), not 1.65 (one-sided). Multiple signals: no! 7

Thought experiment? 8

Null hypothesis  “Signals” don’t predict differences in average returns E.g., monkeys selecting stocks by throwing darts at the WSJ Performance distribution  t-statistics ~ N(0,1) More or less  Excess kurtosis and heteroscedasticity 9

What if you diversify across the lucky monkeys?  Those with positive alpha Clearly “snooping”  Using in-sample aspect of data to form the strategy How does this bias the results?  Expected t-stat? 10

Get the average return Diversify across their risks Yields a high t-statistic:  Can also frame this in SRs 11
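The lucky-monkey thought experiment can be sketched as a small simulation. This is a minimal illustration, not the deck's own procedure: the function names, the i.i.d. standard normal monthly returns, and the default sizes (20 monkeys, 120 months) are all assumptions made here. Every monkey has zero true alpha, yet the equal-weighted portfolio of the in-sample winners shows a mean t-statistic well above 1.96 (roughly sqrt(N/π) under these assumptions).

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean of a return series against zero."""
    n = len(returns)
    return statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def lucky_monkey_t(n_monkeys=20, n_months=120, rng=None):
    """Backtested t-stat of an equal-weighted portfolio of 'lucky' monkeys.

    Assumption: each monkey's monthly return is i.i.d. N(0, 1), so no monkey
    has any true skill.
    """
    rng = rng or random.Random()
    monkeys = [[rng.gauss(0.0, 1.0) for _ in range(n_months)]
               for _ in range(n_monkeys)]
    # Snooping step: keep only monkeys with positive in-sample average return.
    lucky = [r for r in monkeys if statistics.fmean(r) > 0]
    if not lucky:
        return 0.0
    # Diversify across the lucky monkeys: equal-weight them month by month.
    portfolio = [statistics.fmean(month) for month in zip(*lucky)]
    return t_stat(portfolio)


def mean_lucky_t(trials=300, seed=0, **kwargs):
    """Average backtested t-stat over many independent null samples."""
    rng = random.Random(seed)
    return statistics.fmean(lucky_monkey_t(rng=rng, **kwargs)
                            for _ in range(trials))


if __name__ == "__main__":
    print(round(mean_lucky_t(), 2))
```

Averaging over trials matters here: any single sample is noisy, but the bias shows up in the mean of the backtested t-statistics.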

Same thing (essentially) happens if you use all the signals  But sign them so that they “predict” positive in-sample returns Standard statistics account for this…  If and only if N = 1! Again, strategy has high backtested SR  Question: expect high SR going forward? 12
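A worked version of this point, under assumptions that are not on the slide: n uncorrelated, equal-volatility strategies whose t-statistics are i.i.d. N(0,1) under the null, combined with equal weights after each is signed to have positive in-sample returns.

```latex
% Assumptions (labeled, not from the slide): t_i ~ N(0,1) i.i.d. under the null,
% uncorrelated equal-volatility strategies, equal weights after signing.
t_{\text{signed}} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |t_i|,
\qquad
\mathbb{E}\,|t_i| = \sqrt{2/\pi} \approx 0.80
\;\Longrightarrow\;
\mathbb{E}\big[t_{\text{signed}}\big] = \sqrt{2n/\pi}.
% n = 1:  about 0.80 (the two-sided 1.96 threshold already allows for the sign choice)
% n = 10: about 2.52 (the expected backtest is "significant" with no information at all)
```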

Issues Combine things that backtest well  Get even better backtests Not surprising!  But what do the backtests mean? Biased?  Why? What biases?  If so, by how much? (Quantify!) Other intuitions? 13

Can address these Calculate empirical distributions  When signals are not informative  But multiple signals are used to select stocks Big boot-strapping exercise Derive theoretical distributions  In a simplified model Normal, homoscedastic returns  Use these to develop intuition 14

Strategy Construction 15

[Figure; surviving labels: “Smart beta”, “Market”] 16

Signals Generate individually as pure noise!  Random normal variables Composite signals sum individual signals  Technical reason—mapping to theory Not important for the empirical work Cap multiplier is market equity  Essentially value-weighted strategies Again, not important 17

Best k-of-n strategies “Natural” construction  Investigate n signals  Pick the k “strongest” I.e., with most significant in-sample performance  Combine them how? Bootstrap for k ≤ n ≤ 100  Again, do it 10,000 times  Collect strategy t-statistics 18
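A rough stand-in for the best k-of-n exercise, under the simplified model rather than the bootstrap on actual stock returns described above. Assumptions made here: the n single-signal t-statistics are i.i.d. N(0,1), the k strongest are chosen by absolute t-statistic, signed, and equal-weighted. The function names and the choice of equal weighting are hypothetical.

```python
import math
import random
import statistics


def best_k_of_n_t(k, n, rng):
    """Backtested t-stat of an equal-weighted best k-of-n strategy.

    Assumption: single-signal t-stats are i.i.d. N(0, 1) (pure noise), and the
    selected strategies are uncorrelated with equal volatility.
    """
    abs_t = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    strongest = sorted(abs_t, reverse=True)[:k]  # pick the k "strongest"
    return sum(strongest) / math.sqrt(k)         # equal-weighted combination


def simulate(k, n, trials=10_000, seed=0):
    """Return (mean t-stat, empirical 5% critical value) under the null."""
    rng = random.Random(seed)
    draws = sorted(best_k_of_n_t(k, n, rng) for _ in range(trials))
    mean_t = statistics.fmean(draws)
    crit_5pct = draws[int(0.95 * trials)]  # 95th percentile of the null draws
    return mean_t, crit_5pct


if __name__ == "__main__":
    for k, n in [(1, 1), (1, 20), (3, 20), (20, 20)]:
        mean_t, crit = simulate(k, n)
        print(f"best {k}-of-{n}: mean t = {mean_t:.2f}, 5% crit = {crit:.2f}")
```

The (1, 1) case should recover the familiar two-sided threshold of about 1.96, which gives a quick sanity check on the setup before reading the other rows.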

Two Issues When k < n, selection bias  When k = 1 < n, multiple testing bias Well understood When k > 1, overfitting  Data snooping In-sample aspect of data used to form strategy  Pure overfitting only if k = n Interaction! 19

Special Cases [figure panels: Overfitting only; Multiple-testing only] 20

Pure Selection 21

Pure Overfitting 22

Both Biases 23

General Case What sort of strategies should we worry about?  How do we think researchers design strategies in practice? 3-of-20?  How many signals did MSCI consider for its quality index? 5-of-100? 24

General Case 25

Model (theory) Strategies signal-weight stocks Returns normally dist. (assumption)  Equal volatilities  Uncorrelated Combine signals by averaging  Or weighted averaging  combined strat = portfolio of pure strats So can apply facts from portfolio theory 26
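The portfolio-theory fact this relies on can be written out as follows. The notation (w_i for the weight on pure strategy i, t_i for its t-statistic) is introduced here for illustration and is not from the slides.

```latex
% Assumed notation: w_i = weight on pure strategy i, t_i = its t-statistic;
% the k pure strategies are uncorrelated with equal volatility.
t_{\text{combined}} = \frac{\sum_{i=1}^{k} w_i\, t_i}{\sqrt{\sum_{i=1}^{k} w_i^{2}}}
% Equal weights (w_i = 1):
t_{\text{EW}} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} t_i
% Signal weights (w_i \propto t_i), the in-sample maximum by Cauchy--Schwarz:
t_{\text{SW}} = \sqrt{\sum_{i=1}^{k} t_i^{2}}
```

With k = n and i.i.d. N(0,1) t-statistics, the signal-weighted statistic follows a chi distribution with n degrees of freedom, which is one way to see why that special case is analytic.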

Best k-of-n strategies Yields t-statistic distributions: 28

Critical values Analytic for special cases:  k = 1  k = n, with signal-weighting Generally by numeric integration  Simple computationally But don’t provide much intuition  Also derive good analytic approximations Useful for comparative statics 29
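For the k = 1 special case, a closed form is easy to sketch. Assumptions made here: the n single-signal t-statistics are independent N(0,1) under the null and the best signal is chosen by absolute t-statistic, so P(max |t_i| ≤ τ) = (2Φ(τ) − 1)^n. The function name is hypothetical.

```python
from statistics import NormalDist


def crit_best_1_of_n(n, p=0.05):
    """Critical t-stat at level p for the best 1-of-n strategy.

    Assumption: independent N(0, 1) t-stats, selection on |t|. Solves
    (2 * Phi(tau) - 1) ** n = 1 - p for tau.
    """
    coverage_per_signal = (1.0 - p) ** (1.0 / n)
    return NormalDist().inv_cdf((1.0 + coverage_per_signal) / 2.0)


if __name__ == "__main__":
    for n in (1, 10, 20, 100):
        print(n, round(crit_best_1_of_n(n), 2))
```

With n = 1 this reduces to the usual two-sided 1.96, and the threshold rises slowly as n grows.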

Special Cases 30

Special Cases 31

General Cases 32

General Cases (Empirical) 33

General case, when k ~ n (n = 100) 34

Tension when increasing k: it decreases vol., which improves performance, but it also decreases average signal quality, which lowers returns and impairs performance. Initially the first effect dominates (esp. with large n). “Optimal” use of the worst ~1/2 of signals: throw them away! Mean k/2-of-k t-stats are ~13% higher than k-of-k; mean k-of-2k t-stats are ~59% higher than k-of-k. 37
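The two percentages can be checked against a simple large-sample calculation. This is a sketch under assumptions chosen here, and may not be how the slide's figures were computed: i.i.d. N(0,1) t-statistics, selection of the top fraction q by absolute t-statistic, and equal weighting of the selected strategies. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_abs_t_top_fraction(q):
    """E[|Z| given |Z| is in its top fraction q], for Z ~ N(0, 1)."""
    cutoff = STD_NORMAL.inv_cdf(1.0 - q / 2.0)  # P(|Z| > cutoff) = q
    return 2.0 * STD_NORMAL.pdf(cutoff) / q


def mean_t_per_sqrt_n(q):
    """Limiting mean t-stat divided by sqrt(n) when the top fraction q of
    n signals is equal-weighted (so k = q * n strategies are combined)."""
    return math.sqrt(q) * mean_abs_t_top_fraction(q)


# k/2-of-k vs k-of-k: same n, keep the best half instead of everything.
half_of_k_gain = mean_t_per_sqrt_n(0.5) / mean_t_per_sqrt_n(1.0)

# k-of-2k vs k-of-k: same k, but chosen from twice as many candidates.
k_of_2k_gain = math.sqrt(2.0) * mean_t_per_sqrt_n(0.5) / mean_t_per_sqrt_n(1.0)

if __name__ == "__main__":
    print(f"k/2-of-k vs k-of-k: {half_of_k_gain:.3f}")
    print(f"k-of-2k  vs k-of-k: {k_of_2k_gain:.3f}")
```

The ratios come out near 1.13 and 1.59 in this limit, in line with the ~13% and ~59% quoted above.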

Alternative Quantification Pure multiple-testing bias equivalence  How many single signals would you have to look at to get the same bias? That is, given any critical value τ (i.e., for some best k-of-n strategy), find n* s.t.  38
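One way to make the equivalence concrete, assuming the pure multiple-testing benchmark is the best 1-of-n* strategy with independent N(0,1) t-statistics selected on absolute value. The defining condition used below, (2Φ(τ) − 1)^{n*} = 1 − p, is an assumption made for this sketch and is not recovered from the slide, whose equation did not survive; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def equivalent_single_signals(tau, p=0.05):
    """Number of single signals n* whose best-of-n* critical value at level p
    equals tau (tau > 0).

    Assumption: independent N(0, 1) t-stats, selection on |t|, so that
    (2 * Phi(tau) - 1) ** n_star = 1 - p.
    """
    coverage = 2.0 * NormalDist().cdf(tau) - 1.0
    return math.log(1.0 - p) / math.log(coverage)


if __name__ == "__main__":
    for tau in (1.96, 2.5, 3.0, 4.0):
        print(tau, round(equivalent_single_signals(tau), 1))
```

A critical value of 1.96 maps back to a single signal, and n* grows quickly as τ rises, which is what makes it a useful yardstick for the size of the bias.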

Approximate Power Law: The best k-of-n strategy bias is similar to that from a best 1-of-n^k strategy! Using the analytic approximation, can show that log n* is roughly affine in log n, with slope ≈ k. Can see this graphically. 40

Conclusion: View multi-signal claims skeptically. Multiple good signals imply better performance when combined, but good backtested performance does NOT imply any good signals. “High tech” solution: use different tests. “Low tech”: evaluate signals individually (marginal power of each variable). 42

General Approximation 43

How They Work: Specify the mean and S.D. of the approximating normal. Combine with the p-value to get how far out in the tail. E.g., 5% crit. ≈ mean + two standard deviations. 44

General Approximation Where 45
