Machine Learning Week 4
Basic Probability
Envision an experiment for which the result is unknown. The collection of all possible outcomes is called the sample space. A set of outcomes, or subset of the sample space, is called an event. A probability space is a triple (Ω, ℱ, Pr), where Ω is a sample space, ℱ is a collection of events from the sample space, and Pr is a probability law that assigns a number to each event in ℱ. For any events A and B, Pr must satisfy:
Pr(Ω) = 1
Pr(A) ≥ 0
Pr(A^c) = 1 − Pr(A)
Pr(A ∪ B) = Pr(A) + Pr(B), if A ∩ B = ∅.
If A and B are events in ℱ with Pr(B) ≠ 0, the conditional probability of A given B is Pr(A | B) = Pr(A ∩ B) / Pr(B).
Random Variables
A random variable is “a number that you don’t know… yet”
- Discrete vs. continuous
- Cumulative distribution function
- Density function
- Probability distribution (mass) function
- Joint distributions
- Conditional distributions
- Functions of random variables
- Moments of random variables
- Transforms and generating functions
Conditioning
Frequently, the conditional distribution of Y given X is easier to find than the distribution of Y alone. If so, evaluate probabilities about Y using the conditional distribution along with the marginal distribution of X:
Pr(Y ∈ A) = Σ_x Pr(Y ∈ A | X = x) Pr(X = x)
Example: Draw 2 balls simultaneously from a jar containing four balls numbered 1, 2, 3 and 4. X = number on the first ball, Y = number on the second ball, Z = XY. What is Pr(Z > 5)?
Key: It may be easier to evaluate Z if X is known.
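A minimal Python enumeration of the jar example, as a sanity check on the conditioning approach (the variable names are ours, not from the slides):

```python
from itertools import permutations

# All ordered draws of two distinct balls from {1, 2, 3, 4}; each of the
# 12 ordered pairs is equally likely when drawing without replacement.
outcomes = list(permutations([1, 2, 3, 4], 2))

# Condition on X: for each value x of the first ball, Pr(Z > 5 | X = x)
# is easy to read off; then weight by the marginal Pr(X = x) = 1/4.
total = 0.0
for x in [1, 2, 3, 4]:
    ys = [y for (first, y) in outcomes if first == x]   # possible second balls
    pr_given_x = sum(1 for y in ys if x * y > 5) / len(ys)
    total += 0.25 * pr_given_x

print(total)  # 0.5, matching direct enumeration over all 12 ordered pairs
```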
Moments of Random Variables
- Expectation = “average”: E[X]
- Variance = “volatility”: Var(X) = E[(X − E[X])²]
- Standard deviation: σ = √Var(X)
- Coefficient of variation: CV = σ / E[X]
Linear Functions of Random Variables
E[aX + bY] = a E[X] + b E[Y]
Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
Correlation: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
If X and Y are independent, then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y).
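A small numpy sketch (seed, sample size, and the coefficients a, b are arbitrary choices) checking the variance identity and that independent draws have near-zero sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)   # generated independently of x
a, b = 2.0, -3.0

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)                # agree up to sampling error
print(np.cov(x, y)[0, 1])      # near 0: independent variables are uncorrelated
```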
Bernoulli Distribution
“Single coin flip”: p = Pr(success); N = 1 if success, 0 otherwise
Pr(N = 1) = p, Pr(N = 0) = 1 − p
Binomial Distribution
“n independent coin flips”: p = Pr(success); N = # of successes
Pr(N = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, …, n
Geometric Distribution
“Independent coin flips”: p = Pr(success); N = # of flips up to and including the first success
Pr(N = k) = (1 − p)^(k−1) p, k = 1, 2, …
Poisson Distribution
“Occurrence of rare events”: λ = average rate of occurrence per period; N = # of events in an arbitrary period
Pr(N = k) = e^(−λ) λ^k / k!, k = 0, 1, 2, …
Uniform Distribution
X is equally likely to fall anywhere within the interval (a, b)
Density: f(x) = 1 / (b − a) for a < x < b
Exponential Distribution
X is nonnegative and most likely to fall near 0
Density: f(x) = λe^(−λx) for x ≥ 0
Also memoryless; more on this later…
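Memorylessness means Pr(X > s + t | X > s) = Pr(X > t). A quick simulation sketch, with the rate λ and the values of s and t chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5                                  # arbitrary rate
x = rng.exponential(scale=1 / lam, size=1_000_000)

s, t = 2.0, 3.0
lhs = np.mean(x[x > s] > s + t)            # Pr(X > s+t | X > s)
rhs = np.mean(x > t)                       # Pr(X > t)
print(lhs, rhs)                            # approximately equal
```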
Normal Distribution
X follows a “bell-shaped” density function
By the central limit theorem, the distribution of a sum of independent and identically distributed random variables approaches a normal distribution as the number of summands goes to infinity.
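To see the central limit theorem at work, a sketch that sums n iid Uniform(0, 1) variables and standardizes the sum (n = 30 and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30                                     # number of summed variables
sums = rng.uniform(0, 1, size=(100_000, n)).sum(axis=1)

# Standardize: Uniform(0,1) has mean 1/2 and variance 1/12.
z = (sums - n * 0.5) / np.sqrt(n / 12)
print(z.mean(), z.std())                   # near 0 and 1
print(np.mean(np.abs(z) < 1.96))           # near 0.95, as for a standard normal
```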
Stochastic Processes
A stochastic process is a random variable that changes over time, or a sequence of numbers that you don’t know yet.
- Poisson process
- Continuous-time Markov chains
Time Series
Autoregressive (AR) and Moving Average (MA) models
Time Series
A time series is a sequence of numerical data in which each item is associated with a particular instant in time.
With current computer technology we have daily series on interest rates, the hourly "telerate" interest rate index, and stock prices by the minute (or even second).
An analysis of a single sequence of data is called univariate time-series analysis.
An analysis of several sets of data for the same sequence of time periods is called multivariate time-series analysis or, more simply, multiple time-series analysis.
Stochastic processes
Time series are an example of a stochastic or random process.
A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic laws'.
Mathematically, a stochastic process is an indexed collection of random variables {X_t}.
Stochastic processes
We are concerned only with processes indexed by time, either discrete-time or continuous-time processes, such as
{X_t : t = 0, 1, 2, …} or {X(t) : t ≥ 0}
Continuous vs. Discrete
We usually base our inference on a single observation or realization of the process over some period of time, say [0, T] (a continuous interval of time) or at a sequence of time points {0, 1, 2, …, T}.
Specification of a process
A simpler approach is to specify only the moments; this is sufficient if all the joint distributions are normal.
The mean and variance functions are given by
μ(t) = E[X_t], σ²(t) = Var(X_t)
Autocovariance
Because the random variables comprising the process are not independent, we must also specify their covariance:
γ(t₁, t₂) = Cov(X_{t₁}, X_{t₂}) = E[(X_{t₁} − μ(t₁))(X_{t₂} − μ(t₂))]
Autocorrelation
It is useful to standardize the autocovariance function (acvf). Considering the stationary case only, where γ depends only on the lag k, the autocorrelation function (acf) is
ρ(k) = γ(k) / γ(0)
Stationarity
Inference is easiest when a process is stationary: its distribution does not change over time. This is strict stationarity.
A process is weakly stationary if its mean and autocovariance functions do not change over time.
Weak stationarity
The mean is constant, and the autocovariance depends only on the time difference, or lag, between the two time points involved:
E[X_t] = μ, Cov(X_t, X_{t+k}) = γ(k) for all t
White noise
This is a purely random process: a sequence of independent and identically distributed random variables.
It has constant mean and variance. Also
γ(k) = Cov(Z_t, Z_{t+k}) = 0 for k ≠ 0
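A sketch that simulates white noise and estimates its acf; sample_acf is our own helper, not a library routine. The estimate should be 1 at lag 0 and near 0 at all other lags:

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=10_000)               # white noise: iid N(0, 1)

def sample_acf(x, max_lag):
    """Sample autocorrelation r(k) = c(k)/c(0) for k = 0..max_lag."""
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)            # sample autocovariance at lag 0
    acf = [1.0]
    for k in range(1, max_lag + 1):
        ck = np.dot(x[:-k], x[k:]) / len(x)
        acf.append(ck / c0)
    return acf

print(sample_acf(z, 5))                   # roughly [1.0, 0, 0, 0, 0, 0]
```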
Several Models for Time Series
(1) a purely random process, (2) a random walk, (3) a moving-average (MA) process, (4) an autoregressive (AR) process, (5) an autoregressive moving-average (ARMA) process, and (6) an autoregressive integrated moving-average (ARIMA) process.
Purely Random Process
Autocovariance function: γ(0) = σ², and γ(k) = 0 for k ≠ 0
Autocorrelation function: ρ(0) = 1, and ρ(k) = 0 for k ≠ 0
Random Walk
{X_t} is a random walk if X_t = X_{t-1} + Z_t, where {Z_t} is purely random.
The mean is constant, but the variance grows with t, so the process is not stationary.
Moving average processes
Start with {Z_t} being white noise or purely random, with mean zero and s.d. σ_Z.
{X_t} is a moving average process of order q (written MA(q)) if, for some constants β₀, β₁, …, β_q, we have
X_t = β₀Z_t + β₁Z_{t-1} + … + β_qZ_{t-q}
Usually β₀ = 1.
Moving average processes
The mean and variance are given by
E[X_t] = 0, Var(X_t) = σ_Z² (β₀² + β₁² + … + β_q²)
The process is weakly stationary because the mean is constant and the covariance does not depend on t.
Moving average processes
If the Z_t's are normal then so is the process, and it is then strictly stationary.
The autocorrelation is
ρ(k) = Σ_{i=0}^{q−k} βᵢ β_{i+k} / Σ_{i=0}^{q} βᵢ² for k = 0, 1, …, q, and ρ(k) = 0 for k > q
Moving average processes
Note the autocorrelation cuts off at lag q.
For the MA(1) process with β₀ = 1:
ρ(0) = 1, ρ(1) = β₁ / (1 + β₁²), ρ(k) = 0 for k ≥ 2
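A simulation sketch checking the MA(1) results above; β = 0.6 and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 0.6                                # illustrative MA(1) coefficient
z = rng.normal(size=200_001)
x = z[1:] + beta * z[:-1]                 # X_t = Z_t + beta * Z_{t-1}

xc = x - x.mean()
c0 = np.dot(xc, xc) / len(xc)
for k in (1, 2, 3):
    print(k, np.dot(xc[:-k], xc[k:]) / len(xc) / c0)   # sample acf at lag k

print("theory, lag 1:", beta / (1 + beta**2))  # ~0.441; lags 2 and 3 near 0
```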
Moving average processes
In order to ensure there is a unique MA process for a given acf, we impose the condition of invertibility. This ensures that when the process is written in series form, the series converges.
For the MA(1) process X_t = Z_t + βZ_{t-1}, the condition is |β| < 1.
Moving average processes
For general processes, introduce the backward shift operator B:
B X_t = X_{t-1}, B^j X_t = X_{t-j}
Then the MA(q) process is given by
X_t = (β₀ + β₁B + β₂B² + … + β_qB^q) Z_t = θ(B) Z_t
Moving average processes
The general condition for invertibility is that all the roots of the equation θ(B) = 0 lie outside the unit circle (have modulus greater than one).
Autoregressive processes
Assume {Z_t} is purely random with mean zero and s.d. σ_Z.
Then the autoregressive process of order p, or AR(p) process, is
X_t = α₁X_{t-1} + α₂X_{t-2} + … + α_pX_{t-p} + Z_t
Autoregressive processes
The first-order autoregression is
X_t = αX_{t-1} + Z_t
Provided |α| < 1, it may be written as an infinite-order MA process.
Using the backshift operator we have
(1 − αB) X_t = Z_t
Autoregressive processes
From the previous equation we have
X_t = (1 − αB)⁻¹ Z_t = (1 + αB + α²B² + …) Z_t = Z_t + αZ_{t-1} + α²Z_{t-2} + …
Autoregressive processes
Then E(X_t) = 0, and if |α| < 1,
Var(X_t) = σ_Z² / (1 − α²), ρ(k) = α^k
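A simulation sketch checking the AR(1) variance and acf formulas; α = 0.7 and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n = 0.7, 200_000                   # illustrative AR(1) coefficient
z = rng.normal(size=n)

x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + z[t]        # X_t = alpha * X_{t-1} + Z_t

print(np.var(x), 1 / (1 - alpha**2))      # both near 1.96: sigma_Z^2 / (1 - alpha^2)
xc = x - x.mean()
c0 = np.dot(xc, xc) / n
for k in (1, 2, 3):
    print(k, np.dot(xc[:-k], xc[k:]) / n / c0, alpha**k)  # sample acf vs alpha^k
```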
Autoregressive processes
The AR(p) process can be written as
(1 − α₁B − α₂B² − … − α_pB^p) X_t = Z_t, i.e. φ(B) X_t = Z_t
so that X_t = φ(B)⁻¹ Z_t
Autoregressive processes
This is
X_t = (1 + β₁B + β₂B² + …) Z_t for some constants β₁, β₂, …
This gives X_t as an infinite MA process, so it has mean zero.
Autoregressive processes
Conditions are needed to ensure that the various series converge, and hence that the variance exists and the autocovariance can be defined. Essentially these are requirements that the βᵢ become small quickly enough for large i.
Autoregressive processes
The βᵢ may be hard to find explicitly, however. The alternative is to work with the αᵢ.
The acf is expressible in terms of the roots λᵢ, i = 1, 2, …, p of the auxiliary equation
λ^p − α₁λ^(p−1) − … − α_p = 0
Autoregressive processes
Then a necessary and sufficient condition for stationarity is that |λᵢ| < 1 for every i.
An equivalent way of expressing this is that the roots of the equation φ(B) = 0 must lie outside the unit circle.
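The root condition can be checked numerically. A sketch using numpy.roots on an illustrative AR(2) (the coefficients are our choice); the same check applied to θ(B) gives MA invertibility:

```python
import numpy as np

# AR(2) with alpha1 = 1.2, alpha2 = -0.35, so phi(B) = 1 - 1.2 B + 0.35 B^2.
# np.roots takes coefficients from the highest power of B downwards.
phi = [0.35, -1.2, 1.0]
roots = np.roots(phi)
print(roots)                               # 2.0 and ~1.43
print(np.all(np.abs(roots) > 1))           # True: both roots lie outside the
                                           # unit circle, so this AR(2) is stationary
```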
ARMA processes
Combine AR and MA processes. An ARMA process of order (p, q) is given by
X_t = α₁X_{t-1} + … + α_pX_{t-p} + Z_t + β₁Z_{t-1} + … + β_qZ_{t-q}
ARMA processes
Alternative expressions are possible using the backshift operator:
φ(B) X_t = θ(B) Z_t
ARMA processes
An ARMA process can be written in pure MA or pure AR form, the operators being possibly of infinite order.
Usually the mixed form requires fewer parameters.
ARIMA processes
General autoregressive integrated moving-average processes are called ARIMA processes. When differenced, say d times, the process is an ARMA process.
Call the differenced process W_t. Then W_t is an ARMA process and
W_t = ∇^d X_t = (1 − B)^d X_t
ARIMA processes
Alternatively, specify the process as
φ(B) (1 − B)^d X_t = θ(B) Z_t
This is an ARIMA process of order (p, d, q).
ARIMA processes
The model for X_t is non-stationary because the AR operator on the left-hand side has d roots on the unit circle.
d is often 1.
Random walk is ARIMA(0,1,0).
Can include seasonal terms; see later.
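A sketch illustrating that a random walk (ARIMA(0,1,0)) is non-stationary while its first difference W_t = (1 − B)X_t is white noise; the seed and sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(size=10_000)
x = np.cumsum(z)                    # random walk: X_t = X_{t-1} + Z_t

w = np.diff(x)                      # W_t = X_t - X_{t-1} = (1 - B) X_t
print(np.var(x[:1000]), np.var(x))  # sample variance keeps growing: non-stationary
print(w.mean(), w.var())            # near 0 and 1: differenced series is white noise
```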
Non-zero mean
We have assumed that the mean is zero in the ARIMA models. There are two alternatives:
- mean-correct all the W_t terms in the model
- incorporate a constant term in the model