Effect of Algorithmic Variables in LSTM's Prediction of Constituent Direction of the S&P 500: as Applied to EMH.

1 Effect of Algorithmic Variables in LSTM's Prediction of Constituent Direction of the S&P 500: as Applied to EMH

2 The Effect of Different Machine Learning Approaches on Predicting Market Trends

3 Efficient Market Hypothesis (EMH)
Formalized by Eugene Fama in the 1960s, with contemporaneous empirical work such as Sharpe's study of mutual fund performance
Proposes that prices perfectly reflect all available information
Malkiel, B. G. (1973). A Random Walk Down Wall Street. Norton.
Fama, E. (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25(2).
Sharpe, W. F. (1966). Mutual Fund Performance. The Journal of Business, 39(1).

4 Brief History of Technical Analysis
Nobel laureate Harry Markowitz addressed portfolio selection in the 1950s
Machine learning was not reliably applied until the 1980s and 1990s, once well-maintained, reliable data and sufficient computational power became available
Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1).
Fox, J., & Sklar, A. (2009). The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street. New York: Harper Business.

5 What is the Status of Current Research?
Three main parties:
Academic economists
Academic computer scientists
The private sector (very successful, but secretive)
Krauss, C., Do, X. A., & Huck, N. (2017). Deep Neural Networks, Gradient-Boosted Trees, Random Forests: Statistical Arbitrage on the S&P 500. European Journal of Operational Research, 259(2).

6 Most effective research is not published

7 Question: How does the accuracy of a Long Short-Term Memory network's prediction of the direction of the return indices of constituents of the S&P 500 vary when algorithmic variables such as the forecast horizon, the size of the hidden layer, and the existence of dropout are changed, compared to the baseline of the vanilla LSTM?

8 What is Machine Learning (ML)?
The process of computational learning: it begins with observations or data, looks for patterns in the data, and extrapolates from past data to make future predictions

9 Machine Learning Processes
Algorithms learn from examples, known as training data
You give the algorithm the data and it runs through a series of computations, each guess getting closer to the goal
One way to conceptualize it: recover the coefficients of $2x^3 + 4x^2 + x - \tfrac{1}{2}$ by fitting $Ax^3 + Bx^2 + Cx + D$ to known samples:
x: -2, -1, 0, 1, 2
y: -2.5, 0.5, -0.5, 6.5, 33.5
Successive guesses: (A, B, C, D) = (2.5, 2, 1, .5) → (3, 4, 2, -1) → (2, 4, 1, -.5)
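
As a rough sketch of that guess-and-refine loop (the code, learning rate, and step count are illustrative, not from the talk), plain gradient descent recovers the coefficients from the table above:

```python
import numpy as np

# Known (x, y) samples from the slide's table, generated by 2x^3 + 4x^2 + x - 0.5.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-2.5, 0.5, -0.5, 6.5, 33.5])

coef = np.array([2.5, 2.0, 1.0, 0.5])  # arbitrary starting guess for (A, B, C, D)
lr = 5e-3                               # learning rate

for step in range(20000):
    pred = coef[0] * x**3 + coef[1] * x**2 + coef[2] * x + coef[3]
    err = pred - y
    # Gradient of mean squared error with respect to each coefficient.
    grad = np.array([
        2 * np.mean(err * x**3),
        2 * np.mean(err * x**2),
        2 * np.mean(err * x),
        2 * np.mean(err),
    ])
    coef -= lr * grad  # step each coefficient closer to the goal

print(coef)  # approaches [2, 4, 1, -0.5]
```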

10 Building Blocks of Learning
[Diagram: a feedforward neural net with an input layer (x_i, x_ii, x_iii, x_iv), one hidden layer, and an output layer (y_i, y_ii), capturing simple linear dependencies]
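
A minimal sketch of such a net (the layer sizes match the diagram; the weights are random placeholders): a feedforward pass is just a chain of matrix multiplies and nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # input layer: x_i .. x_iv

W1 = rng.normal(size=(5, 4))  # input -> hidden weights
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))  # hidden -> output weights
b2 = np.zeros(2)

h = np.tanh(W1 @ x + b1)      # hidden layer activations
y = W2 @ h + b2               # output layer: y_i, y_ii
print(y)
```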

11 How RNNs Work
In feedforward neural nets, the output is based only on the current input
Recurrent Neural Nets (RNNs) pass in both the input data and the state of the previous hidden layer
This connects previous data with the present task by creating a mathematical memory
[Diagram: inputs 1, 2, 3 feeding an input layer and a hidden layer that loops back into itself]
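
A sketch of that recurrence (dimensions are illustrative): the new hidden state depends on the current input and on the previous state, so it carries a memory of earlier inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 1, 8
W_x = rng.normal(size=(n_hidden, n_in)) * 0.5      # input -> hidden
W_h = rng.normal(size=(n_hidden, n_hidden)) * 0.5  # hidden -> hidden (the recurrence)
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)  # state of the previous hidden layer
for x_t in [np.array([0.3]), np.array([-1.2]), np.array([0.7])]:  # inputs 1, 2, 3
    h = np.tanh(W_x @ x_t + W_h @ h + b)  # new state mixes input AND old state
print(h)
```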

12 The Vanishing Gradient, as identified by Yoshua Bengio
[Diagram: an RNN unrolled across timesteps 1, 2, 3, ..., T = 100, each timestep with its own input layer, hidden layer, and output]
Memories become more subtle as they phase into the past
Error signals from previous timesteps do not make it down the line: the vanishing gradient
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning Long-Term Dependencies with Gradient Descent Is Difficult. IEEE Transactions on Neural Networks, 5(2).
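
A quick numerical illustration of the effect (the weight scale and the stand-in derivative value are arbitrary choices): backpropagating through T = 100 steps multiplies the error signal by the recurrent Jacobian 100 times, and its norm collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W_h = rng.normal(size=(n, n)) * 0.3  # recurrent weights (spectral radius < 1)
grad = np.ones(n)                    # error signal at the last timestep

for t in range(100):                 # T = 100 timesteps, as in the diagram
    # tanh'(z) <= 1; use a typical value of 0.5 to stand in for the activations.
    grad = (W_h.T @ grad) * 0.5
    if t in (0, 9, 49, 99):
        print(f"after {t+1:>3} steps: ||grad|| = {np.linalg.norm(grad):.3e}")
```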

13 Popular Solution: LSTMs
Replaces ordinary neurons with memory cells
Maintains all error signals, preserving short-term memories indefinitely
Learns when to make the output call and when to leak the error signals
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8).
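
For reference, the memory-cell update as it is commonly written today (the forget gate was a later addition to the 1997 design): gates decide what to keep, what to write, and when to emit.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate: what to keep}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate: what to write}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate: when to emit}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state: the error-preserving path}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```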

14 Algorithmic Variables Examined
[Diagram: the network architecture from slide 10, annotated with the variables under study]
Size of the hidden layer
Forecast horizon
Existence of dropout (but not a varied amount)
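
One way to organize the comparison (this grid is hypothetical; the talk does not specify how runs were configured): hold the vanilla configuration fixed and vary one variable at a time.

```python
# Hypothetical experiment grid: vary one algorithmic variable at a time
# against the vanilla baseline (daily horizon, 25 hidden units, dropout of .01).
baseline = {"horizon": "daily", "hidden_units": 25, "dropout": 0.01}

experiments = []
for horizon in ["daily", "weekly", "monthly", "yearly"]:   # forecast horizon
    experiments.append({**baseline, "horizon": horizon})
for hidden_units in [5, 25]:                               # medium vs. large hidden layer
    experiments.append({**baseline, "hidden_units": hidden_units})
for dropout in [0.0, 0.01]:                                # dropout off vs. on
    experiments.append({**baseline, "dropout": dropout})
```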

15 Methods ~

16 Training Data Acquisition and Formatting
Reuters Datastream for Office and Reuters Eikon, accessed through the UCSB IRC
Leavers/Joiners function call to get the return index for each current and historical constituent
Normalized the data using studentized-residual normalization, then split it into training sets:
$\frac{\epsilon_i}{\sigma_i} = \frac{X_i - \mu_i}{\sigma_i}$
$\epsilon_i$ = return-index residual for time period $i$; $X_i$ = single-day return at day $i$; $\mu_i$ = standardized mean of all data on the index at time period $i$; $\sigma_i$ = the corresponding standard deviation
Thomson Reuters Eikon. (2018a). [LJ, Daily Total Returns of Every Historical Constituent of the S&P500, ].
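
A sketch of the normalization and windowing under stated assumptions (pandas with a date-indexed return-index series; the function names, the train_end cutoff, and the next-day direction label are illustrative, not from the talk):

```python
import numpy as np
import pandas as pd

def standardize_returns(ri: pd.Series, train_end: str) -> pd.Series:
    """Studentized-residual style normalization: (X_i - mu) / sigma,
    with mu and sigma estimated on the training period only."""
    returns = ri.pct_change().dropna()     # single-day returns X_i from the return index
    train = returns.loc[:train_end]
    mu, sigma = train.mean(), train.std()
    return (returns - mu) / sigma

def make_windows(z: pd.Series, window: int = 240):
    """Slice the standardized series into overlapping 240-step training samples."""
    values = z.to_numpy()
    X = np.stack([values[i : i + window] for i in range(len(values) - window)])
    y = (values[window:] > 0).astype(int)  # next-day direction label (up = 1)
    return X[..., None], y                 # shape: (samples, 240, 1)
```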

17 Why Eikon, Return Indices, and the S&P 500?
Best and most reliable data (the main alternative being Bloomberg); the only company that provided that service to me
Return indices are cumulative-dividend prices that account for all relevant corporate actions and splits
The S&P 500 most accurately represents the US economy
Training on all constituents eliminates survivorship bias
Fischer, T., & Krauss, C. (2018). Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions. European Journal of Operational Research, 270(2).

18 Methods and Training
I define my vanilla LSTM architecture following Fischer & Krauss (2018):
Dense input layer ~ shape: 240 timesteps, 1 feature
LSTM hidden layer of 25 neurons, with dropout of .01
Dense output layer of 2 with a softmax activation (binary classification)
Measured with accuracy metrics
To avoid overfitting or unlearning, I utilize a series of callback functions (see the sketch below)
Fischer, T., & Krauss, C. (2018). Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions. European Journal of Operational Research, 270(2).
Chollet, F. (2015). Keras (Version 2.2.4) [Program documentation]. Retrieved 2018.
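
A minimal Keras sketch of that architecture (Keras 2.x style, matching the cited version; the layer shapes follow the slide, while the optimizer, loss, and specific callbacks are assumptions, since the talk names only "a series of callback functions"):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

model = Sequential([
    # 240 timesteps of 1 feature; 25 memory cells with dropout of .01
    LSTM(25, input_shape=(240, 1), dropout=0.01),
    # dense output layer of 2 with softmax (binary classification)
    Dense(2, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Callbacks to stop training before the model overfits or "unlearns".
callbacks = [
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]
# model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=callbacks)
```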

19 Results and Discussion ~

20 Results and Discussion
Baselines: 55% directional accuracy for daily vanilla LSTM predictions (SS = statistically significant); …% directional accuracy for daily random guesses
~51% directional accuracy when predicting weekly prices (not SS)
59% predicting monthly (not SS)
60% predicting yearly (not SS)
Contrary to Siami-Namini & Namin (2018), the loss function did improve per epoch, and therefore my model was learning
[Plot: loss over training iterations]
Abadi, M., et al. (2016). TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
Siami-Namini, S., & Namin, A. S. (2018). Forecasting Economics and Financial Time Series: ARIMA vs. LSTM.

21 Results and Discussion Continued
57% with a medium hidden layer (n=5) (SS)
~51% with a large hidden layer (n=25) (SS)
Similar accuracy across all epochs with dropout (not SS)
Received the best accuracy during times of high volatility
I reproduced the post-2001 loss in learning identified by Krauss, Do & Huck (2017), in connection with the rise of ML and high-volume trading (HVT)
Krauss, C., Do, X. A., & Huck, N. (2017). Deep Neural Networks, Gradient-Boosted Trees, Random Forests: Statistical Arbitrage on the S&P 500. European Journal of Operational Research, 259(2).

22 Conclusions ~

23 Conclusions on EMH
On short time scales such as the day, the market proves semi-efficient, with anomalies that correct themselves efficiently over longer time steps
Reasons the market is only semi-efficient:
Stocks take time to respond to new information (HVT)
Stock prices can be affected by emotional decisions (HVT)
Green, J., Hand, J. R. M., & Zhang, X. F. (2013). The Supraview of Return Predictive Signals. Review of Accounting Studies, 18(3), 692–730.

24 Conclusions on the LSTM's Algorithmic Variables
An LSTM with a medium-size hidden layer predicting daily direction produces the most accurate predictions, annualized, when trained on American stock data
Hypothetically, my findings can be applied to any time-series prediction

25 Further Work and Acknowledgements ~

26 Suggested Further Work
Bidirectional and convolutional variants; optimal batch size
Different layers (attention, varied dropout amounts, memory)
The best way to format dates; different inputs entirely
Comparisons to other RNNs, or to other algorithms entirely
Whether the same results hold for other time series
Trading strategies based on the predictions
Zheng, A., & Jin, J. (n.d.). Using AI to Make Predictions on Stock Market.

27 Acknowledgements
Jon Jablonski, Ted Suter, Harry Evry, Mark Lowe, Jared Goldberg, Patti McNamara, Dr. Christopher Krauss, Dr. Krister Swanson

