CSC 558 – Data Analytics II, Spring, 2018

1 CSC 558 – Data Analytics II, Spring, 2018
Time Series Data Analysis. Overview notes from Ch. 10 of Kotu and Deshpande’s Predictive Analytics and Data Mining. Their slides are pathetic.

2 Data Patterns Spread Across Multiple Instances
An instance gives cross-sectional data: one instance is a slice of a data timeline.
Data-driven temporal modeling techniques aggregate data across temporal intervals.
Model-driven temporal modeling techniques include time as an independent variable (i.e., as a non-target attribute or attributes).

3 Data-Driven Approaches
Naïve forecast just uses the most recent instance as a predictor, without aggregating attribute(s) across instances. We will start assignment 3 this way.
Simple average averages attribute(s) over all preceding instances.
Moving average averages attribute(s) over the preceding instances that fall within a temporal window.
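The three forecasts above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the function names and the sample `readings` list are made up, and the series is assumed to be a single numeric attribute already sorted in time order.

```python
def naive_forecast(series):
    """Forecast the next value as the most recent observation."""
    return series[-1]


def simple_average_forecast(series):
    """Forecast the next value as the mean of all preceding observations."""
    return sum(series) / len(series)


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


# Hypothetical readings of one numeric attribute, oldest first.
readings = [10.0, 12.0, 11.0, 13.0, 14.0]

print(naive_forecast(readings))              # 14.0
print(simple_average_forecast(readings))     # 12.0
print(moving_average_forecast(readings, 3))  # mean of 11, 13, 14
```

Here the window is counted in instances; a window measured in time units would first need the instances selected by timestamp.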

4 Data-Driven Approaches continued
Weighted moving average applies a decay formula that deemphasizes earlier instances.
F_{n+1} = (a*y_n + b*y_{n-1} + c*y_{n-2}) / (a + b + c), where a > b > c
Exponential smoothing:
F_{n+1} = a*y_n + (1 - a)*F_n, with a in the range [0.0, 1.0]
Similar to predictive CPU burst estimation, see slides
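A minimal Python sketch of both formulas follows. The function names are hypothetical, and one detail is an assumption the slide does not state: the exponential smoother is seeded with the first observation as its initial forecast unless another value is supplied.

```python
def weighted_moving_average(series, weights):
    """Forecast F_{n+1} from the most recent values.

    weights[0] applies to the newest value, weights[1] to the one before,
    and so on, e.g. (a, b, c) with a > b > c.
    """
    recent = list(reversed(series))[:len(weights)]
    used = weights[:len(recent)]
    return sum(w * y for w, y in zip(used, recent)) / sum(used)


def exponential_smoothing(series, alpha, initial=None):
    """Forecast F_{n+1} = alpha*y_n + (1 - alpha)*F_n over the whole series.

    Assumption: the first forecast defaults to the first observation.
    """
    forecast = series[0] if initial is None else initial
    for y in series:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


readings = [10.0, 12.0, 11.0, 13.0, 14.0]
print(weighted_moving_average(readings, (3, 2, 1)))  # (3*14 + 2*13 + 1*11) / 6
print(exponential_smoothing(readings, 0.5))          # 13.0
```

With a close to 1.0 the forecast tracks the latest value; with a close to 0.0 it changes slowly and keeps more history.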

5 Data-Driven Approaches continued
Holt’s Two-Parameter Exponential Smoothing
If the series shows changing trends, then we need to compute the slope of recent value changes.
F_{n+1} = L_n + T_n, where:
L_n = a*y_n + (1 - a)*(L_{n-1} + T_{n-1})
T_n = b*(L_n - L_{n-1}) + (1 - b)*T_{n-1}
I would maintain L_n and T_n as separate derived attributes.
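The level and trend recurrences can be sketched as below, keeping the level and trend as separate lists in the spirit of separate derived attributes. The function name is hypothetical, and the initialization (first level equals the first observation, first trend equals the first difference) is one common choice assumed here, not something the slide specifies.

```python
def holt_smoothing(series, alpha, beta):
    """Return (forecast, levels, trends) for Holt's two-parameter smoothing.

    Assumed initialization: L_1 = y_1 and T_1 = y_2 - y_1.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    levels = [series[0]]
    trends = [series[1] - series[0]]
    for y in series[1:]:
        # L_n = a*y_n + (1 - a)*(L_{n-1} + T_{n-1})
        level = alpha * y + (1 - alpha) * (levels[-1] + trends[-1])
        # T_n = b*(L_n - L_{n-1}) + (1 - b)*T_{n-1}
        trend = beta * (level - levels[-1]) + (1 - beta) * trends[-1]
        levels.append(level)
        trends.append(trend)
    # F_{n+1} = L_n + T_n
    return levels[-1] + trends[-1], levels, trends


forecast, levels, trends = holt_smoothing([2.0, 4.0, 6.0, 8.0], alpha=0.5, beta=0.3)
print(forecast)  # close to 10.0 for this perfectly linear series
```

On a perfectly linear series the trend stays at the constant slope, so the forecast simply continues the line.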

6 Data-Driven Approaches continued
Holt’s Three-Parameter Exponential Smoothing (commonly called Holt-Winters) adds a variable for seasonal cycles.
In assignment 3 we will use my Python script timeSeriesFilter.py to create Time-Lagged Attributes, which copy attribute values from temporally preceding instances into derived attributes in later instances.
Time lagging does not flatten the attributes into aggregate values. It copies them across time.
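To make the idea of time lagging concrete, here is a small sketch. It is not timeSeriesFilter.py and does not claim to match its interface; the function name, the `_lagK` naming of derived attributes, and the use of `None` for missing history are all assumptions made for illustration. Instances are assumed to be dicts already sorted by time.

```python
def add_lagged_attributes(instances, attribute, lags):
    """Copy `attribute` from earlier instances into derived lag attributes.

    instances: list of dicts, already sorted on the temporal attribute.
    lags: iterable of positive step counts, e.g. [1, 2].
    Instances without enough history get None for that lag.
    """
    result = []
    for i, instance in enumerate(instances):
        row = dict(instance)  # leave the original instance untouched
        for k in lags:
            name = f"{attribute}_lag{k}"
            row[name] = instances[i - k][attribute] if i >= k else None
        result.append(row)
    return result


rows = [{"t": 0, "v": 5}, {"t": 1, "v": 7}, {"t": 2, "v": 6}]
lagged = add_lagged_attributes(rows, "v", [1, 2])
print(lagged[2])  # {'t': 2, 'v': 6, 'v_lag1': 7, 'v_lag2': 5}
```

Each later instance now carries the earlier values unchanged, so a non-temporal learner can see them side by side with the current value.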

7 Model-Driven Approaches
Ordered time is just another non-target attribute.
Approaches include linear regression, polynomial regression, linear regression with seasonality, and autoregression (a “lag series”, see next slides).
ARIMA (Autoregressive Integrated Moving Average) is a methodology.
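As a sketch of the simplest model-driven case, the code below fits an ordinary least-squares line with time as the only non-target attribute and extrapolates one step. The function names and the sample data are hypothetical, and a real analysis would normally use a library or Weka rather than this hand-rolled fit.

```python
def fit_line(times, values):
    """Least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict(slope, intercept, time):
    """Evaluate the fitted line at a (possibly future) time."""
    return slope * time + intercept


times = [0.0, 1.0, 2.0, 3.0]
values = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(times, values)
print(slope, intercept)              # 2.0 1.0
print(predict(slope, intercept, 4))  # 9.0
```

Polynomial regression follows the same pattern with powers of time added as further non-target attributes.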

8 Linear Regression with Seasonality
Photosynthesis cycles from the fall 2017 analysis of Dissolved Oxygen in seasonal stream data (search for photosynthesis). The cycle is diurnal. Water temperature also correlates with season.

9 Time-Lagged Attributes
Instances must be sorted on a temporal attribute.
Weka has a timeseriesForecasting filter library. It requires each instance to be separated by the same temporal interval.
My timeSeriesFilter.py allows the user to sort on one temporal attribute and specify the units of the time lag:
steps (instances), units (numeric attributes), usecs, msecs, secs, mins, hours, days, weeks, years

10 Time-Lagged Attributes
Example from last semester:
The problem for time lagging is: how much to lag? You need to know the temporal interval before you can lag the data.
Weka’s timeseriesForecasting lets you do this in a trial-and-error manner, but for N instances, time complexity is O(N^2).

11 Weka 3.8.2

12 To-do for timeSeriesFilter.py
Major+minor sort fields to sort instances: movement+channel+tick in assignment 3.
Averaging and trend/slope support from preceding slides. Straightforward to implement in Python.
Figuring the amount to lag is still a black art that requires some domain expertise.

13 Other approaches
Harmonic analysis: Fourier series as in assignments 1 & 2, adapted to the application data.
Wavelets
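As a rough illustration of harmonic analysis on a time series, the sketch below computes a plain discrete Fourier transform and reports the period of the strongest non-constant component. It is not the code from assignments 1 & 2; the function names and the synthetic sine-wave input are assumptions, and the direct O(N^2) transform is used only to keep the example free of external libraries.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 for evenly spaced samples."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def dominant_period(samples):
    """Period, in samples, of the strongest non-constant frequency bin."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(1, len(magnitudes)), key=lambda idx: magnitudes[idx])
    return len(samples) / k


# Synthetic series: a sine wave repeating every 8 samples.
samples = [math.sin(2 * math.pi * i / 8) for i in range(32)]
print(dominant_period(samples))  # 8.0
```

A dominant period found this way is one possible hint for how far to lag, provided the instances really are evenly spaced in time.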

