Stock Prediction with ARIMA

Stock Prediction with ARIMA
Kihwan Lee, 8/23/2018

Exploratory Data Analysis
Data source: Quandl
- National unemployment rate
- Apple, Google, and J.P. Morgan stock prices
- Foreign exchange rate (South Korea)
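As a rough illustration of the data-sourcing step, the sketch below pulls Apple end-of-day prices through the Quandl Python API; the dataset code, date range, and API key are placeholders, not necessarily the ones used in the presentation.

```python
# Hypothetical data-loading sketch; dataset code, dates, and API key are placeholders.
import quandl

quandl.ApiConfig.api_key = "YOUR_API_KEY"  # placeholder key

# Apple daily prices from the (now-frozen) WIKI end-of-day dataset
aapl = quandl.get("WIKI/AAPL", start_date="2006-01-01", end_date="2018-08-23")
close = aapl["Adj. Close"]  # adjusted closing price series used in later snippets
```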

Correlation and Bollinger Bands
Correlation with the Apple stock closing price:
- National unemployment rate: -0.5 (noticeable negative correlation)
- Foreign exchange rate with South Korea: -0.73 (stronger negative correlation)
Bollinger Bands provide a relative definition of high and low prices of a market:
- Lower band: the price has reached a relatively low value; a likely time to buy.
- Upper band: the price has reached a relatively high value; a likely time to sell.
The Apple stock prices are well bounded by the Bollinger Bands.
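A minimal sketch of computing the bands with pandas, assuming `close` is a Series of closing prices (for example, loaded as in the previous snippet); the 20-period window and the 2-standard-deviation width are the conventional defaults, not values taken from the slide.

```python
# Bollinger Band sketch; `close` is an assumed pandas Series of closing prices.
import pandas as pd

window = 20                          # conventional 20-period window (assumption)
mid = close.rolling(window).mean()   # middle band: simple moving average
std = close.rolling(window).std()    # rolling standard deviation
upper = mid + 2 * std                # high band: price near here is relatively high
lower = mid - 2 * std                # low band: price near here is relatively low

bands = pd.DataFrame({"close": close, "lower": lower, "mid": mid, "upper": upper})
pct_inside = close.between(lower, upper).mean()  # fraction of days inside the bands
print(f"{pct_inside:.1%} of observations fall inside the bands")
```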

What is ARIMA?
- ARMA: Auto-Regressive Moving Average model for stationary data.
- ARIMA: Auto-Regressive Integrated Moving Average model, a generalization of ARMA. When the data show evidence of non-stationarity, an initial differencing step (applied one or more times) removes the non-stationary part, and the ARMA model is then fit to the differenced data.
- Auto-Regression: y(t) = f(y(t-1), ...); the current value is regressed on its own lagged values.
- Moving Average: the regression error is a linear combination of previous error terms.
- Integrated: the data values have been replaced by differences of consecutive values.
Model orders (see the equations below):
- p = order of the auto-regressive part
- d = order of differencing
- q = order of the moving-average part
- (p, d, q) = non-seasonal components; (P, D, Q)s = seasonal components
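In the notation assumed here (the slide does not spell it out), the three components can be written compactly, with B the backshift operator and epsilon_t a white-noise error term:

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):\quad &
\Big(1 - \sum_{i=1}^{p} \phi_i B^{i}\Big)\,(1-B)^{d}\, y_t
  = c + \Big(1 + \sum_{j=1}^{q} \theta_j B^{j}\Big)\,\varepsilon_t
\end{aligned}
```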

Akaike Information Criterion vs. Bayesian Information Criterion

Akaike information criterion (AIC):
- An estimator of the relative quality of statistical models for a given set of data.
- AIC estimates the relative information lost by a given model: the less information a model loses, the higher the quality of that model.
- The model with the lowest AIC is preferred.
- AIC = 2k - 2 ln(L), where k = number of estimated parameters and L = maximum value of the model's likelihood function.

Bayesian information criterion (BIC), also called the Schwarz criterion (SBC, SBIC):
- It is independent of the prior, and it measures how efficiently the parameterized model predicts the data.
- It penalizes the complexity of the model, where complexity refers to the number of parameters; the model with the lowest BIC is preferred.
- It is based, in part, on the likelihood function: BIC = k ln(n) - 2 ln(L), where k = number of estimated parameters, L = maximum value of the likelihood function, and n = number of observations (the sample size).

Dickey-Fuller test:
- Tests the null hypothesis that a unit root is present in an autoregressive model, i.e., that the series is non-stationary; the alternative hypothesis is that the series is stationary.
- A small p-value is preferred: a small ("surprising") p-value rejects the unit-root null and indicates stationarity, while a large ("non-surprising") p-value is consistent with non-stationarity.
- (Figure: recovery series with a unit root shown in green, without a unit root in red.)

Quantile-quantile (Q-Q) plot:
- The supplied data are plotted against a sample from the normal distribution, with a 45-degree reference line; two samples from the same population fall approximately along this line.
- Data that follow the reference line closely are approximately normally distributed.
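As a sketch of how the unit-root check might be run in Python (the presentation does not show its code), statsmodels' augmented Dickey-Fuller test reports the statistic, p-value, and critical values discussed above; `close` is the assumed price Series from earlier.

```python
# Augmented Dickey-Fuller test sketch; `close` is an assumed pandas Series.
from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit_values, icbest = adfuller(close, autolag="AIC")
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3g}, lags used: {usedlag}")

# Small p-value (statistic below the critical values) -> reject the unit-root
# null -> treat the series as stationary; large p-value -> difference the data.
for level, cv in crit_values.items():
    print(f"  {level} critical value: {cv:.3f}")
```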

Stock Prediction with Auto ARIMA
- Apple stock price prediction for 2016-2018 based on the historical data.
- auto_arima function from pyramid.arima.
- Selected model: p = 2, d = 1, q = 2, with AIC = 1752.
- Is this just a moving average? Cyclical white noise? Is it good enough?
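The slide names the auto_arima function from pyramid.arima; that package has since been renamed pmdarima, so the sketch below assumes the newer import. The search ranges shown are illustrative, not the presentation's exact settings.

```python
# auto_arima sketch; pyramid.arima was renamed pmdarima, so the newer name is assumed.
import pmdarima as pm

model = pm.auto_arima(
    close,                  # assumed Series of Apple closing prices
    start_p=0, start_q=0,   # illustrative search ranges, not the slide's settings
    max_p=5, max_q=5,
    d=None,                 # let the routine pick the differencing order
    seasonal=False,
    stepwise=True,
    trace=True,             # print each candidate's AIC during the search
)
print(model.order, model.aic())   # the slide reports (2, 1, 2) with AIC = 1752
```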

Exploration on the original data

Exploration on the logged data

Looking for Seasonality (original data): plots and Dickey-Fuller tests
- Case A: Original data
- Case B: Original data - shift-by-1 data
- Case C: Original data - shift-by-12 data
- Case D: Shift-by-1 - shift-by-12 data

Case    Lag    P-value    Test statistic
A       14     1          3.715 (> 10% CV)
B       10     2e-5       -5.5 (< 1% CV)
C       12     0.18       -2.3 (~ 10% CV)
D       -      0.071      -2.7 (< 10% CV)
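A sketch of how the four cases above could be constructed and tested with pandas and statsmodels; `y` is the assumed monthly series, and Case D is transcribed literally from the slide's label, which is somewhat ambiguous.

```python
# Differencing-case sketch; `y` is the assumed original (or logged) series.
from statsmodels.tsa.stattools import adfuller

cases = {
    "A: original": y,
    "B: original - shift(1)": y - y.shift(1),     # first difference
    "C: original - shift(12)": y - y.shift(12),   # seasonal (12-step) difference
    "D: shift(1) - shift(12)": y.shift(1) - y.shift(12),  # literal reading of the slide's label
}
for name, series in cases.items():
    stat, pvalue, *_ = adfuller(series.dropna())
    print(f"Case {name}: ADF statistic {stat:.2f}, p-value {pvalue:.3g}")
```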

Looking for Seasonality (logged data): plots and Dickey-Fuller tests
- Case A: Logged data
- Case B: Logged data - shift-by-1 data
- Case C: Logged data - shift-by-12 data
- Case D: Shift-by-1 - shift-by-12 data

Case    Lag    P-value    Test statistic
A       -      0.978      0.309 (> 10% CV)
B       -      2e-30      -17.3 (< 1% CV)
C       12     0.017      -3.25 (< 5% CV)
D       -      0.012      -3.36 (< 5% CV)

ARIMA Models Parameter Summary

Case       (p,d,q)     (P,D,Q,s)      AIC     BIC     QHIC    RSS
Case 1     30, 0, 0    0, 0, 0, 12    1641    1758    1688    516
Case 2     30, 0, 0    0, 0, 2, 12    1664    1788    1714    517
Case 3     30, 0, 2    -              1662    1794    -       510
Case 11    30, 1, 3    0, 1, 3, 12    1547    1683    1601    508
Case 12    0, 0, 10    8, 4, 0, 12    1365    1428    1391    1027
Case 13    2, 2, 2     2, 2, 2, 12    1628    1661    -       591
Case 14    0, 0, 5     25, 2, 0, 12   753     810     776     1452

p = order of the auto-regressive part, d = order of differencing, q = order of the moving-average part
(p, d, q) = non-seasonal components; (P, D, Q)s = seasonal components
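A sketch of how one row of the table could be reproduced with statsmodels' SARIMAX; Case 13's orders are used purely for illustration, and `y` is the assumed training series. The fitted result exposes AIC, BIC, and the Hannan-Quinn criterion, which appear to correspond to the table's columns.

```python
# Seasonal ARIMA fitting sketch; orders are Case 13's, chosen only as an example.
import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(
    y,                              # assumed training series
    order=(2, 2, 2),                # (p, d, q) non-seasonal components
    seasonal_order=(2, 2, 2, 12),   # (P, D, Q, s) seasonal components
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.aic, results.bic, results.hqic)   # information criteria for the fit
```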

Case 1

Case 1
- Dynamic forecast: predicting the 2016-2018 stock price using historical data up to the end of 2015.
- One-step-ahead forecast: predicting each next step using the true data.
- Future forecast from 2018 onward.
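A sketch of the three forecast modes described above, using a fitted SARIMAX results object such as the one from the previous snippet; the start date matches the slide's split, while the calls themselves are illustrative.

```python
# Forecast-mode sketch; `results` is an assumed fitted SARIMAX results object.
import pandas as pd

# One-step-ahead: each point is predicted from the true data up to the previous step.
one_step = results.get_prediction(start=pd.to_datetime("2016-01-01"), dynamic=False)

# Dynamic: from 2016 on, earlier predictions feed back into the model instead of true data.
dynamic = results.get_prediction(start=pd.to_datetime("2016-01-01"), dynamic=True)

# Future forecast: extrapolate beyond the last observation (horizon is illustrative).
future = results.get_forecast(steps=24)

pred_mean = one_step.predicted_mean
conf_int = future.conf_int()   # confidence intervals for the out-of-sample forecast
```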

Case 2

Case 2

Case 3

Case 3

Case 11
- One-step-ahead forecast: predicting each next step using the true data.
- Dynamic forecast: prediction using true data only up to a certain point.
- Future forecast.

Case 12

Case 13

Case 14 Best match with random noise

Case 14: shows a surprisingly good match with the actual data over two years.

Summary
- Performed exploratory data analysis (EDA) on the Apple stock price.
- Demonstrated that the Apple stock price variation stays within the Bollinger Bands for relative high and low prices.
- Showed periodicity in the Apple stock price history.
- Demonstrated different time-series prediction methods using ARIMA and compared them to the actual data.
- Performed a time-series forecast of the Apple stock price using ARIMA.

To do
- Understand the working principles of the ARIMA model: derive the governing equations and recognize their limitations.
- Apply the ARIMA model to a wider range of stock prices.
- Apply correlations to a different set of economic indicators, such as the employment rate, inflation rate, and trade deficit, and see whether PCA can be applied.

Backup

Model Selection Criteria
Akaike Information Criterion and Bayesian Information Criterion: the most commonly used criteria.