Income Forecasting
Introduction For some, budgeting while planning for the future can be difficult. In general, people do not know exactly what income to expect in the future. For this reason, we decided to apply time series prediction algorithms to forecast probable future income.
Abstract
This project is an income forecasting program that estimates the probable future income of a budget from that budget's income history. We looked into several algorithms for forecasting over a time series, including:
- Forward-Backward (HMM)
- Simple moving average model (SMA)
- Autoregressive integrated moving average (ARIMA)
Our goal is to measure the accuracy of their predictions over the time series and to compare their time complexity.
Problem Statement
Natural Language: Given income history, forecast the probable future income using different algorithmic approaches, while measuring their accuracy and execution time.
Formal: The forecasting algorithms are compared by the error between their forecast values and the actual values. For the corresponding period, this difference is expressed as
$E_i = |F_i - Y_i|$
where the error $E_i$ at period $i$ is the absolute value of the forecast $F_i$ minus the actual value $Y_i$. The lower the value of $E_i$, the more accurate the algorithm is for period $i$.
Problem Statement (cont.)
The mean of these absolute error values, known as the mean absolute error (MAE), is given by the equation
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} E_i$
where $n$ is the number of absolute errors calculated and $E_i$ is the absolute error at period $i$. The MAE gives the average accuracy of each algorithm on a given data set, so that the algorithms can be compared.
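The two error measures above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample values are the 2011-2014 income and ARIMA forecast figures from the ARIMA table in this deck.

```python
def absolute_errors(forecasts, actuals):
    """Return E_i = |F_i - Y_i| for each period i."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return [abs(f - y) for f, y in zip(forecasts, actuals)]


def mean_absolute_error(forecasts, actuals):
    """Return the MAE: the mean of the absolute errors."""
    errors = absolute_errors(forecasts, actuals)
    if not errors:
        raise ValueError("at least one period is required")
    return sum(errors) / len(errors)


# 2011-2014 values from the ARIMA table in this deck.
arima_forecast = [226425, 236729, 245369, 252692]
actual_income = [228438, 239493, 246313, 250775]

print(absolute_errors(arima_forecast, actual_income))      # [2013, 2764, 944, 1917]
print(mean_absolute_error(arima_forecast, actual_income))  # 1909.5
```

The result agrees with the ARIMA figure reported on the Error Comparison slide.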
Introducing AR, MA and ARMA
Auto-Regressive: Model in which $Y_t$ depends only on its own past values.
$Y_t = \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \dots + \varphi_p Y_{t-p} + \varepsilon_t$
Moving Average: Model in which $Y_t$ depends only on the random error terms, which follow a white noise process. Earthquake example.
$Y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$
ARMA(p,q) model: Provides a simple tool for time series modeling.
$Y_t = \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \dots + \varphi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$
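As a rough illustration of the three equations, the sketch below computes a one-step ARMA(p,q) prediction from known coefficients. It assumes the coefficients and past error terms are already available and sets the future error term to its mean of zero. The function name and sample numbers are hypothetical.

```python
def arma_one_step(history, residuals, phi, theta):
    """One-step ARMA(p, q) prediction, with the future error term set to 0.

    history   -- past values of Y, most recent last (at least p values)
    residuals -- past error terms, most recent last (at least q values)
    phi       -- AR coefficients [phi_1, ..., phi_p]
    theta     -- MA coefficients [theta_1, ..., theta_q]
    """
    if len(history) < len(phi) or len(residuals) < len(theta):
        raise ValueError("not enough past values for the requested order")
    # AR part: phi_1 * Y(t-1) + ... + phi_p * Y(t-p)
    ar_part = sum(c * history[-i] for i, c in enumerate(phi, start=1))
    # MA part: theta_1 * e(t-1) + ... + theta_q * e(t-q)
    ma_part = sum(c * residuals[-i] for i, c in enumerate(theta, start=1))
    return ar_part + ma_part


history = [1.0, 2.0, 3.0]   # hypothetical past values
residuals = [0.5, -0.2]     # hypothetical past error terms

print(arma_one_step(history, residuals, phi=[0.5, 0.2], theta=[]))     # pure AR(2)
print(arma_one_step(history, residuals, phi=[], theta=[0.4]))          # pure MA(1)
print(arma_one_step(history, residuals, phi=[0.5, 0.2], theta=[0.4]))  # ARMA(2,1)
```

Passing an empty `theta` reduces the model to pure AR, and an empty `phi` reduces it to pure MA, which mirrors how ARMA combines the two.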
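Pure ARIMA Models
A general statistical model that is widely used in time series analysis. Given time series data $Y_t$, the pure model is written mathematically as:
$\left(1 - \sum_{i=1}^{p} \alpha_i L^i\right)(1 - L)^d \, Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t$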
Autoregressive Integrated Moving Average (ARIMA)
ARIMA is an acronym for AutoRegressive Integrated Moving Average. The order of an ARIMA model is usually denoted ARIMA(p,d,q), where:
- p (AR) is the order of the autoregressive part
- d (I) is the order of differencing
- q (MA) is the order of the moving-average process
In the model equation, $L$ is the lag operator, $\alpha_i$ are the parameters of the autoregressive part, $\theta_i$ are the parameters of the moving-average part, and $\varepsilon_t$ are the error terms.
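The differencing order d can be illustrated with a short sketch. The helper names below are hypothetical, and the sample values are the 2005-2010 income figures from the ARIMA table in this deck. Differencing replaces each value with its change from the previous period, and cumulative summation undoes it.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_(t-1)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undifference(start_value, diffs):
    """Invert one round of differencing by cumulative summation."""
    values = []
    level = start_value
    for step in diffs:
        level += step
        values.append(level)
    return values


income = [200969, 213189, 221205, 215543, 204402, 213178]  # 2005-2010

first = difference(income, d=1)
print(first)                           # [12220, 8016, -5662, -11141, 8776]
print(difference(income, d=2))         # [-4204, -13678, -5479, 19917]
print(undifference(income[0], first))  # recovers income[1:]
```

Each round of differencing shortens the series by one value, which is why d is normally kept small.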
Implementing a Time Series Analysis
1) Visualize the time series
2) Stationarize the series
3) Plot ACF/PACF charts and find optimal parameters
4) Build the ARIMA model
5) Make predictions
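Steps 2, 4 and 5 can be sketched in plain Python under strong simplifying assumptions. The order is fixed at ARIMA(1,1,0) rather than chosen from ACF/PACF charts, and the AR(1) coefficient is fitted by ordinary least squares. This is not the tool used to produce the forecasts in this deck, so it should not be expected to reproduce them. All function names are hypothetical.

```python
def fit_ar1(x):
    """Least-squares fit of x_t = c + phi * x_(t-1); returns (c, phi)."""
    if len(x) < 2:
        raise ValueError("need at least two values to fit AR(1)")
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var = sum((a - mean_prev) ** 2 for a in prev)
    if var == 0:
        # Constant input: no slope can be estimated, fall back to the mean.
        return mean_curr, 0.0
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


def forecast_arima_110(series, steps=1):
    """Forecast with a hand-rolled ARIMA(1,1,0): difference, fit AR(1), integrate."""
    if len(series) < 3:
        raise ValueError("need at least three values")
    # Step 2: stationarize by first differencing (d = 1).
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Step 4: build the model on the differenced series (p = 1, q = 0).
    c, phi = fit_ar1(diffs)
    # Step 5: predict the next differences and add them back to the level.
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts


print(forecast_arima_110([1, 2, 3, 4, 5], steps=2))  # [6.0, 7.0]

income = [200969, 213189, 221205, 215543, 204402, 213178]  # 2005-2010
print(forecast_arima_110(income, steps=1))
```

A steadily rising series is continued at the same rate, which is the behavior that differencing is meant to capture.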
Using ARIMA model to forecast values
Year    Income ($)    Forecast ($)
2005    200,969       -
2006    213,189       -
2007    221,205       -
2008    215,543       -
2009    204,402       -
2010    213,178       -
2011    228,438       226,425
2012    239,493       236,729
2013    246,313       245,369
2014    250,775       252,692
2015    -             259,003
RunTime = 0.05153418 secs
Simple Moving Average Model (SMA)
The simple moving average (SMA) model forecasts the next value of a time series as the average of the n most recent values of Y, for some integer n. Its equation for predicting the value of Y at time t+1 from data up to time t is:
$\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + \dots + Y_{t-n+1}}{n}$
where
n = number of periods in the moving average (3, 5 or more, depending on the size of the data set)
Y = the observed value in each period (here, income)
Simple Moving Average Cont.
Year    Income     Forecast
2003    178,694    -
2004    190,253    -
2005    200,969    -
2006    213,189    189,972
2007    221,543    201,470
2008    215,543    211,900
2009    204,402    216,758
2010    213,178    213,829
2011    228,438    211,041
2012    239,493    215,339
2013    246,313    227,036
2014    250,775    238,081
2015    Unknown    245,527
RunTime = 2.706e-06
3 time periods were used for calculations
SMA complexity = O(n)
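The SMA forecasts in the table can be reproduced with a short sketch. The function name is hypothetical, and the income values are those listed in the table for 2003-2014. A running window sum keeps the work at O(n) over the whole series.

```python
def sma_forecasts(values, n=3):
    """Return one-step-ahead simple moving average forecasts.

    Element j of the result is the mean of values[j:j+n], that is, the
    forecast for the period right after values[j+n-1]. The last element
    is the forecast for the first period beyond the data.
    """
    if n <= 0 or len(values) < n:
        raise ValueError("need at least n values and n > 0")
    window_sum = sum(values[:n])
    forecasts = [window_sum / n]
    for j in range(n, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        window_sum += values[j] - values[j - n]
        forecasts.append(window_sum / n)
    return forecasts


income = [178694, 190253, 200969, 213189, 221543, 215543,
          204402, 213178, 228438, 239493, 246313, 250775]  # 2003-2014

forecasts = sma_forecasts(income, n=3)
for year, value in zip(range(2006, 2016), forecasts):
    print(year, round(value))
# First line printed: 2006 189972
# Last line printed:  2015 245527
```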
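Forward-Backward Algorithm
The forward-backward algorithm is an inference algorithm for hidden Markov models (HMMs). The underlying model gives the probability of a sequence of observations occurring when following a given sequence of states. This can be stated as:
$P(x, y) = \pi_{x_1} B_{x_1 y_1} \prod_{j=2}^{k} A_{x_{j-1} x_j} B_{x_j y_j}$
where A (transition probabilities) and B (emission probabilities) are matrices, $\pi$ is the start distribution, and
x = sequence of states
y = sequence of observations
i = current state
k = number of time steps
$P(X_j = i)$ = probability of being in state i at time j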
Forward-Backward (HMM) Cont. A hidden Markov model is one in which you observe a sequence of emissions but do not know the sequence of states the model went through to generate them. Analyses of HMMs seek to recover the sequence of states from the observed data.
Forward-backward implementation
Year    Income     Forecast (p)
2012    239,493    225,245
2013    246,313    240,625
2014    250,775    246,621
2015    P(x,y)     255,682

    states = ('Future', 'Past')
    end_state = 'E'
    observations = ('260', '246', '250')
    # Values copied as given; note that these start probabilities sum to 1.27, not 1.
    start_probability = {'Future': 0.97, 'Past': 0.3}
    transition_probability = {
        'Future': {'Future': 0.69, 'Past': 0.3, 'E': 0.01},
        'Past': {'Future': 0.4, 'Past': 0.59, 'E': 0.01},
    }
    emission_probability = {
        'Future': {'260': 0.5, '246': 0.4, '250': 0.1},
        'Past': {'260': 0.1, '246': 0.3, '250': 0.6},
    }  # (years)

Output:
these are the probabilities of 15 states:
{'Past': 0.00109578, 'Future': 0.0010418399999999998}
{'Past': 0.00394, 'Future': 0.00249}
{'Past': 0.01, 'Future': 0.01}
these are the probabilities:
{'Past': 0.1229889624426741, 'Future': 0.8770110375573259}
{'Past': 0.3767719690490461, 'Future': 0.623228030950954}
{'Past': 0.7890472951586943, 'Future': 0.2109527048413057}
Run Time = 0.0318363 sec
Forward-Backward Cont. Backward probability: assume that we start in a particular state ($X_t = x_i$) and compute the probability of the observations that follow. T is the transition matrix, and the backward probabilities are stored as a column vector.
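The forward and backward passes can be combined as in the minimal sketch below. It is not the project's implementation. It drops the end state for simplicity, so the start and transition probabilities shown are assumed values chosen to sum to 1 over the two states. Only the emission probabilities and the observations are taken from the implementation slide. Each pass is rescaled at every step to avoid numeric underflow.

```python
def normalize(weights):
    """Scale a dict of non-negative weights so that its values sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def forward_backward(observations, states, start_p, trans_p, emit_p):
    """Return, for each time step j, the posterior P(X_j = s | observations)."""
    # Forward pass: alpha_j(s) is proportional to P(y_1..y_j, X_j = s).
    alphas = []
    for j, obs in enumerate(observations):
        if j == 0:
            prior = dict(start_p)
        else:
            prior = {s: sum(alphas[-1][r] * trans_p[r][s] for r in states)
                     for s in states}
        alphas.append(normalize({s: prior[s] * emit_p[s][obs] for s in states}))

    # Backward pass: beta_j(s) is proportional to P(y_(j+1)..y_k | X_j = s).
    betas = [None] * len(observations)
    beta = {s: 1.0 for s in states}
    for j in range(len(observations) - 1, -1, -1):
        betas[j] = beta
        obs = observations[j]
        beta = normalize({r: sum(trans_p[r][s] * emit_p[s][obs] * beta[s]
                                 for s in states)
                          for r in states})

    # Smoothing: the posterior is proportional to alpha_j(s) * beta_j(s).
    return [normalize({s: a[s] * b[s] for s in states})
            for a, b in zip(alphas, betas)]


states = ('Future', 'Past')
observations = ('260', '246', '250')
start_p = {'Future': 0.5, 'Past': 0.5}                  # assumed, not from the slides
trans_p = {'Future': {'Future': 0.7, 'Past': 0.3},      # assumed, end state dropped
           'Past': {'Future': 0.4, 'Past': 0.6}}
emit_p = {'Future': {'260': 0.5, '246': 0.4, '250': 0.1},
          'Past': {'260': 0.1, '246': 0.3, '250': 0.6}}

for posterior in forward_backward(observations, states, start_p, trans_p, emit_p):
    print(posterior)
```

Because the start and transition values differ from those on the slide, the printed posteriors will not match the slide's output.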
Forecast Comparison
Error Comparison
MAE for SMA: 18,380.5
MAE for ARIMA: 1,909.5
MAE for FB: 8,029.67
Conclusion ARIMA was the most accurate of the three algorithms and can be used to produce fairly accurate yearly income predictions. HMM forward-backward was the second-best model; however, its results vary with the initial states and with the probability of reaching the next state. HMM is suitable for other types of predictions, such as weather, shuffling or stocks, since the states change constantly.
Other Work in Income Forecasting
- Digit has an algorithm that “learns a user's spending and earning patterns” [3].
- A website called Buxfer offers personal finance forecasting as well.
- There has been much work comparing the algorithms themselves, such as “Time Series Prediction Algorithms” by Kumara M.P.T.R.
Question Q: What is the difference between the ARMA and ARIMA models? A: The ARIMA model adds a differencing step (the "I", of order d) that converts non-stationary data to stationary data before the ARMA model operates on it.
Question Q: Why is the simple moving average (SMA) less accurate than ARIMA? A: ARIMA takes more constraints into account: central moving averages (CMA), weighted moving averages (WMA), seasonality and lag (failure to maintain a desired pace).
Question Q: What is the Big(O) of HMM? A: The forward-backward algorithm runs in $O(n^2 m)$ time, where n is the number of hidden (latent) states and m is the number of observations in the observed sequence.
Questions?
References
https://en.wikipedia.org/wiki/Forecasting
http://www.slideshare.net/tharindurusira/time-series-prediction-algorithms-literature-review
http://www.bloomberg.com/news/articles/2015-07-14/should-you-let-an-algorithm-do-your-saving-for-you-
https://catalog.data.gov/dataset?groups=finance3432#topic=finance_navigation
https://people.csail.mit.edu/rameshvs/content/hmms.pdf
References cont.
http://web.stanford.edu/class/cs227/Readings/ConstraintsSurveyByKumar.pdf
http://docs.roguewave.com/imsl/java/6.1/manual/WordDocuments/api/com/imsl/stat/AutoARIMAEx2.htm
https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average
http://pages.stern.nyu.edu/~narchak/cmdp_for_budget_optimization.pdf
https://investorjunkie.com/39713/digit-review/
https://www.youtube.com/watch?v=Y2khrpVo6qI