Time Series Analysis Topics in Machine Learning Fall 2011 School of Electrical Engineering and Computer Science.

Time Series Discussions
Overview
Basic definitions
Time domain
Forecasting
Frequency domain
State space

Why Time Series Analysis? Sometimes the concept we want to learn is the relationship between points in time

What is a time series?
Time series: a sequence of measurements over time
A sequence of random variables $x_1, x_2, x_3, \ldots$

Time Series Examples
Definition: A sequence of measurements over time
Finance, Social science, Epidemiology, Medicine, Meteorology, Speech, Geophysics, Seismology, Robotics

Three Approaches
Time domain approach
–Analyze dependence of current value on past values
Frequency domain approach
–Analyze periodic sinusoidal variation
State space models
–Represent state as collection of variable values
–Model transition between states

Sample Time Series Data Johnson & Johnson quarterly earnings/share,

Sample Time Series Data Yearly average global temperature deviations

Sample Time Series Data Speech recording of aaa…hhh, sampled at 10,000 points per second

Sample Time Series Data NYSE daily weighted market returns

Not all time data will exhibit strong patterns… LA annual rainfall

…and others will be apparent Canadian Hare counts

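Definitions
Mean: $\mu = E[X]$, estimated by the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $\sigma^2 = E[(X - \mu)^2]$, estimated by the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$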

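Definitions
Covariance: $\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Correlation: $r = \operatorname{cov}(X, Y) / (\sigma_X \sigma_Y)$, with $-1 \le r \le 1$
Figure panels: r = 0.0, r = 0.7, r = 1.0, r = -1.0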

Correlation
Figure: scatter plots of Y against X with r = -1, r = -0.6, r = 0, r = +0.3, r = +1, r = 0
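The covariance and correlation definitions can be checked numerically. The following is a minimal sketch, assuming the sample versions with an n - 1 denominator; the function names `covariance` and `correlation` and the test data are illustrative, not taken from the slides.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance of two equal-length series (n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / np.sqrt(covariance(x, x) * covariance(y, y))

# Illustrative data: one perfectly increasing and one perfectly decreasing line.
x = np.arange(10.0)
r_up = correlation(x, 2.0 * x + 1.0)
r_down = correlation(x, -3.0 * x + 5.0)
print(r_up, r_down)
```

A perfectly linear increasing relationship should give r close to +1 and a decreasing one r close to -1, matching the extreme panels of the scatter plots.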

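Redefined for Time
Mean function: $\mu_X(t) = E[X_t]$
Autocovariance: $\gamma_X(s, t) = E[(X_s - \mu_X(s))(X_t - \mu_X(t))]$
Autocorrelation: $\rho_X(s, t) = \gamma_X(s, t) / \sqrt{\gamma_X(s, s)\,\gamma_X(t, t)}$
Lag: the separation $h = s - t$ between the two time points
Ergodic?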

Autocorrelation Examples Positive lag Negative lag
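A sample autocorrelation function makes the lag idea concrete. This is a sketch under the assumption that the series is stationary, so the autocorrelation depends only on the lag h; `sample_acf` and both test series are hypothetical names and data chosen for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag, using the usual 1/n scaling."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma0 = np.sum(d * d) / n              # lag-0 autocovariance
    acf = []
    for h in range(max_lag + 1):
        gamma_h = np.sum(d[h:] * d[:n - h]) / n
        acf.append(gamma_h / gamma0)
    return np.array(acf)

rng = np.random.default_rng(0)
t = np.arange(500)
noise = rng.normal(size=500)                          # white noise
seasonal = np.sin(2 * np.pi * t / 20) + 0.1 * noise   # period of 20 time points

acf_noise = sample_acf(noise, 20)
acf_seasonal = sample_acf(seasonal, 20)
print(np.round(acf_seasonal[[0, 10, 20]], 2))
```

White noise should show autocorrelation near zero at every nonzero lag, while the periodic series is strongly negative at half a period (lag 10) and strongly positive at a full period (lag 20).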

Stationarity – When there is no relationship with time
$\{X_t\}$ is stationary if
–$\mu_X(t)$ is independent of t
–$\gamma_X(t+h, t)$ is independent of t for each h
In other words, properties of each section are the same
Special case: white noise

Linear Regression
Fit a line $y = \beta x + \alpha$ to the data
Ordinary least squares
–Minimize the sum of squared vertical distances between the points and the line
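Ordinary least squares for a single predictor has a closed form. The sketch below assumes the line is written as slope times x plus intercept; `fit_line` and the sample points are illustrative only.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form OLS: returns (slope, intercept) minimizing squared vertical distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Illustrative points scattered around a rising line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

The result should agree with `np.polyfit(x, y, 1)`, which solves the same least-squares problem.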

$R^2$: Evaluating Goodness of Fit
Least squares minimizes the combined residual, the residual sum of squares: $RSS = \sum_i (y_i - \hat{y}_i)^2$
Explained sum of squares is the difference between line and mean: $ESS = \sum_i (\hat{y}_i - \bar{y})^2$
Total sum of squares is the total of these two: $TSS = ESS + RSS = \sum_i (y_i - \bar{y})^2$
$R^2$, the coefficient of determination: $R^2 = ESS / TSS = 1 - RSS / TSS$, with $0 \le R^2 \le 1$
Regression minimizes RSS and so maximizes $R^2$
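The three sums of squares and the coefficient of determination can be computed directly from a fitted line. This is a minimal sketch; `goodness_of_fit` is a hypothetical helper and the noisy data are generated only for illustration.

```python
import numpy as np

def goodness_of_fit(x, y):
    """Fit a least-squares line and return (rss, ess, tss, r_squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    rss = np.sum((y - fitted) ** 2)          # residual: points vs. line
    ess = np.sum((fitted - y.mean()) ** 2)   # explained: line vs. mean
    tss = np.sum((y - y.mean()) ** 2)        # total: points vs. mean
    return rss, ess, tss, ess / tss

rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=10)   # illustrative noisy line
rss, ess, tss, r2 = goodness_of_fit(x, y)
print(round(r2, 3))
```

For a least-squares line with an intercept, the residual and explained sums add up to the total, so the two ways of writing the coefficient (explained over total, or one minus residual over total) coincide.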

Linear Regression
Can report:
–Direction of trend (>0, <0, 0)
–Steepness of trend (slope)
–Goodness of fit to trend ($R^2$)

Examples

What if a linear trend does not fit my data well?
Could be no relationship
Could be too much local variation
–Want to look at longer-term trend
–Smooth the data
Could have periodic or seasonal effects
–Add seasonal components
Could be a nonlinear relationship

Moving Average
Compute an average of the last m consecutive data points
4-point moving average is $\hat{x}_t = (x_t + x_{t-1} + x_{t-2} + x_{t-3}) / 4$
Smooths white noise
Can apply higher-order MA
Exponential smoothing
Kernel smoothing
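Two of the smoothers named above can be sketched in a few lines. The code assumes a trailing (last m points) moving average and simple exponential smoothing with weight `alpha`; both function names and the parameter `alpha` are illustrative choices, not taken from the slides.

```python
import numpy as np

def moving_average(x, m):
    """Trailing m-point moving average; the output has len(x) - m + 1 values."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(m) / m, mode="valid")

def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1]
    return s

ma4 = moving_average([1, 2, 3, 4, 5, 6], 4)   # averages of 1..4, 2..5, 3..6

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)                 # white noise to be smoothed
smoothed = moving_average(noise, 4)
print(ma4, noise.var(), smoothed.var())
```

Averaging four independent white-noise values should cut the variance to roughly a quarter, which is the sense in which the moving average smooths white noise.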

Power Load Data 5 week 53 week

Piecewise Aggregate Approximation Segment the data into linear pieces Interesting paper
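A sketch of piecewise aggregate approximation follows. It assumes the common formulation in which the series is cut into roughly equal-length segments and each segment is summarized by its mean, giving a piecewise-constant approximation; the slide does not spell out the details, and `paa` is a hypothetical name.

```python
import numpy as np

def paa(x, n_segments):
    """Reduce a series to n_segments values, one mean per (near) equal-length segment."""
    x = np.asarray(x, dtype=float)
    segments = np.array_split(x, n_segments)   # handles lengths not divisible by n_segments
    return np.array([segment.mean() for segment in segments])

reduced = paa([1, 2, 3, 4, 5, 6], 3)   # three segments of two points each
print(reduced)
```

The reduced series can stand in for the original when comparing or indexing long series, at the cost of losing variation inside each segment.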

Nonlinear Trend Examples

Nonlinear Regression

Fit Known Distributions

ARIMA: Putting the pieces together
Autoregressive model of order p: AR(p)
–$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$, where $w_t$ is white noise
Moving average model of order q: MA(q)
–$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}$
ARMA(p,q)
–A time series is ARMA(p,q) if it is stationary and $x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q}$
Figure: AR(1)
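Simulating an ARMA process shows how the autoregressive and moving-average parts combine. The sketch assumes the standard form in which the current value is a weighted sum of the previous p values plus the current white-noise term plus a weighted sum of the previous q noise terms; `simulate_arma`, the coefficients 0.9 and 0.5, and the burn-in length are illustrative assumptions.

```python
import numpy as np

def simulate_arma(phi, theta, n, seed=0, burn_in=200):
    """Simulate n values of an ARMA(p, q) series with standard normal white noise.

    phi   -- AR coefficients (length p), applied to past values of the series
    theta -- MA coefficients (length q), applied to past noise terms
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in                      # discard a warm-up stretch at the start
    w = rng.normal(size=total)
    x = np.zeros(total)
    for t in range(total):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * w[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x[t] = ar + w[t] + ma
    return x[burn_in:]

ar1 = simulate_arma(phi=[0.9], theta=[], n=2000)           # AR(1)
ma1 = simulate_arma(phi=[], theta=[0.5], n=2000, seed=1)   # MA(1)
lag1_ar = np.corrcoef(ar1[:-1], ar1[1:])[0, 1]
lag1_ma = np.corrcoef(ma1[:-1], ma1[1:])[0, 1]
print(round(lag1_ar, 2), round(lag1_ma, 2))
```

For an AR(1) series the lag-1 autocorrelation should sit near the AR coefficient, and for an MA(1) series near theta / (1 + theta^2), which is 0.4 for theta = 0.5.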

ARMA
Start with AR(1) sequence
This means
Which we can solve given roots $z_i$

ARIMA (AutoRegressive Integrated Moving Average)
ARMA only applies to stationary processes
Apply differencing to obtain stationarity
–Replace each value by the incremental change from the last value
A process $x_t$ is ARIMA(p,d,q) if
–AR(p)
–MA(q)
–Differenced d times
Also known as Box-Jenkins
Differenced   x1   x2      x3           x4
1 time             x2-x1   x3-x2        x4-x3
2 times                    x3-2x2+x1    x4-2x3+x2
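The differencing table can be reproduced with `np.diff`. This is a small sketch on an illustrative quadratic series, chosen because one difference leaves a linear trend and two differences leave a constant.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # illustrative series with a quadratic trend

d1 = np.diff(x, n=1)   # x2-x1, x3-x2, ...          -> [3, 5, 7, 9]
d2 = np.diff(x, n=2)   # x3-2x2+x1, x4-2x3+x2, ...  -> [2, 2, 2]

# The second difference written out term by term, as in the table.
d2_manual = x[2:] - 2.0 * x[1:-1] + x[:-2]
print(d1, d2, d2_manual)
```

Each round of differencing shortens the series by one value, which is why the table rows start one column later each time.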

Express Data as Fourier Frequencies
Time domain
–Express present as function of the past
Frequency domain
–Express present as function of oscillations, or sinusoids

Time Series Definitions
Frequency, $\omega$, measured in cycles per time point
J&J data
–1 cycle each year
–4 data points (time points) each cycle
–0.25 cycles per data point
Period of a time series, $T = 1/\omega$
–J&J, T = 1/0.25 = 4
–4 data points per cycle
–Note: Need at least 2 data points per cycle

Fourier Series
Time series is a mixture of oscillations
–Can describe each by amplitude, frequency and phase
–Can also describe as a sum of amplitudes at all time points
–(or magnitudes at all frequencies)
–If we allow for mixtures of periodic series then $x_t = \sum_{k=1}^{q} A_k \cos(2\pi \omega_k t + \varphi_k)$

Example

How to Compute Parameters?
Regression
Discrete Fourier Transform
–DFTs represent amplitude and phase of series components
–Can use redundancies to speed it up (FFT)

Breaking down a DFT
For a complex DFT coefficient $d = a + ib$:
Amplitude: $|d| = \sqrt{a^2 + b^2}$
Phase: $\tan^{-1}(b / a)$
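Amplitude and phase can be read off the complex DFT coefficients. The sketch below builds a single cosine with assumed amplitude 3, 5 cycles over 100 points, and phase 0.5 radians (all illustrative values), then recovers those numbers with NumPy's FFT.

```python
import numpy as np

n = 100
t = np.arange(n)
x = 3.0 * np.cos(2 * np.pi * 5 * t / n + 0.5)   # illustrative single oscillation

X = np.fft.rfft(x)                # complex coefficients for frequencies 0/n .. (n/2)/n
amplitude = 2.0 * np.abs(X) / n   # the 2/n scaling is valid for 0 < k < n/2
phase = np.angle(X)               # angle of each complex coefficient, in radians

k = int(np.argmax(amplitude[1:])) + 1   # dominant nonzero frequency index
print(k, amplitude[k], phase[k])
```

The dominant index k corresponds to a frequency of k/n cycles per time point, so k = 5 here means 0.05 cycles per point, or a period of 20 points.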

Example
Figure: GBP series, six panels labelled 1 frequency, 2 frequencies, 3 frequencies, 5 frequencies, 10 frequencies, 20 frequencies

Periodogram
Measure of squared correlation between
–Data and
–Sinusoids oscillating at frequency of j/n
–Compute quickly using FFT

Example P(6/100) = 13, P(10/100) = 41, P(40/100) = 85
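The periodogram values quoted in the example can be reproduced with an FFT. This is a sketch under assumptions: the slide does not give the underlying series, so the code builds a hypothetical one with n = 100 points from three sinusoids whose cosine and sine coefficients (2, 3), (4, 5) and (6, 7) are chosen so that the squared amplitudes are 13, 41 and 85. The scaling 4/n^2 (a "scaled periodogram") is also an assumption.

```python
import numpy as np

n = 100
t = np.arange(1, n + 1)

def component(j, a, b):
    """Sinusoid at frequency j/n with cosine coefficient a and sine coefficient b."""
    return a * np.cos(2 * np.pi * t * j / n) + b * np.sin(2 * np.pi * t * j / n)

# Hypothetical series: coefficients picked so that a^2 + b^2 = 13, 41, 85.
x = component(6, 2, 3) + component(10, 4, 5) + component(40, 6, 7)

def scaled_periodogram(x):
    """P(j/n) = (4 / n^2) * |DFT_j|^2 for j = 0..n/2; equals a^2 + b^2 for 0 < j < n/2."""
    n = len(x)
    X = np.fft.fft(x)
    return 4.0 * np.abs(X[: n // 2 + 1]) ** 2 / n ** 2

P = scaled_periodogram(x)
print(np.round(P[[6, 10, 40]], 6))
```

All other frequencies should come out at essentially zero, since the series contains no oscillation there.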

Wavelets
Can break series up into segments
–Called wavelets
–Analyze a window of time separately
–Variable-sized windows

State Space Models
Current situation represented as a state
–Estimate state variables from noisy observations over time
–Estimate transitions between states
Kalman Filters
–Similar to HMMs
–HMMs model discrete variables
–Kalman filters model continuous variables

Conceptual Overview
Lost on a 1-dimensional line
Receive occasional sextant position readings
–Possibly incorrect
Position $x(t)$, velocity $\dot{x}(t)$

Conceptual Overview
Current location distribution is Gaussian
Transition model is linear Gaussian
The sensor model is linear Gaussian
Sextant measurement at $t_i$: Mean = $\mu_i$ and Variance = $\sigma_i^2$
Measured velocity at $t_i$: Mean = $\mu_i$ and Variance = $\sigma_i^2$
Noisy information

Kalman Filter Algorithm
Start with current location (Gaussian)
Predict next location
–Use current location
–Use transition function (linear Gaussian)
–Result is Gaussian
Get next sensor measurement (Gaussian)
Correct prediction
–Weighted mean of previous prediction and measurement
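The predict and correct steps can be sketched for the one-dimensional position example. The code assumes a known constant velocity per time step, a scalar position state, and fixed noise variances; `kalman_1d` and every parameter name and value here are illustrative assumptions rather than the slides' notation.

```python
import numpy as np

def kalman_1d(measurements, meas_var, process_var, velocity=0.0, mean0=0.0, var0=1e3):
    """Scalar Kalman filter; returns a list of (mean, variance) after each correction."""
    mean, var = mean0, var0
    estimates = []
    for z in measurements:
        # Predict: apply the linear transition and add transition noise.
        mean = mean + velocity
        var = var + process_var
        # Correct: weighted mean of prediction and measurement.
        gain = var / (var + meas_var)     # weight given to the measurement
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates

rng = np.random.default_rng(1)
true_pos = np.arange(1, 51, dtype=float)            # moves 1 unit per step
z = true_pos + rng.normal(scale=2.0, size=50)       # noisy readings, variance 4
est = kalman_1d(z, meas_var=4.0, process_var=0.01, velocity=1.0)
print(est[-1], true_pos[-1])
```

The gain decides how far the corrected mean moves toward the measurement: a very uncertain prediction gives a gain near 1, a very noisy sensor gives a gain near 0.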

Conceptual Overview
We generate the prediction for the next time step; the prediction is Gaussian
GPS measurement at that step: Gaussian, with its own mean and variance
They do not match the prediction
Figure label: measurement at i

Conceptual Overview
Corrected mean is the new optimal estimate of position
New variance is smaller than either of the previous two variances
Figure labels: measurement, corrected estimate, prediction
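The claim about the corrected variance can be made explicit for two scalar Gaussian estimates. The symbols below are assumptions introduced for illustration: the prediction has mean $\mu_1$ and variance $\sigma_1^2$, the measurement has mean $\mu_2$ and variance $\sigma_2^2$, and primes mark the corrected values.

```latex
\mu' = \frac{\sigma_2^2\,\mu_1 + \sigma_1^2\,\mu_2}{\sigma_1^2 + \sigma_2^2},
\qquad
\frac{1}{\sigma'^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}
\;\Longrightarrow\;
\sigma'^2 = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2}
< \min\left(\sigma_1^2, \sigma_2^2\right).
```

The corrected mean is a weighted mean in which each estimate is weighted by the other's variance, so the more certain estimate pulls harder. Because the precisions (inverse variances) add, the corrected variance is always below both inputs.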

Updating Gaussian Distributions
One-step predicted distribution is Gaussian
After new (linear Gaussian) evidence, updated distribution is Gaussian
Equation labels: Prior, Transition, Previous step, New measurement

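Why Is Kalman Great?
The method, that is…
Representation of state-based series with general continuous variables grows without bound
With linear Gaussian models the updated distribution stays Gaussian, so the representation stays a fixed size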

Why Is Time Series Important?
Time is an important component of many processes
Do not ignore time in learning problems
ML can benefit from, and in turn benefit, these techniques
–Dimensionality reduction of series
–Rule discovery
–Cluster series
–Classify series
–Forecast data points
–Anomaly detection