CSC 558 – Data Analytics II, Spring 2018
Time Series Data Analysis
Overview notes from Ch. 10 of Kotu and Deshpande’s Predictive Analytics and Data Mining. Their slides are pathetic.
Data Patterns Spread Across Multiple Instances
A single instance gives cross-sectional data: it is one slice of a data timeline. Data-driven temporal modeling techniques aggregate data across temporal intervals. Model-driven temporal modeling techniques include time as an independent variable (i.e., as one or more non-target attributes).
Data-Driven Approaches
Naïve forecast uses the most recent instance as the predictor, without aggregating attribute(s) across instances. We will start assignment 3 this way. Simple average averages the attribute(s) over all preceding instances. Moving average averages the attribute(s) over the preceding instances that fall within a temporal window.
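The three forecasts above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for these notes, and the moving-average window is counted in instances, which assumes the instances are equally spaced in time.

```python
def naive_forecast(values):
    """Predict the next value as the most recent observation."""
    return values[-1]


def simple_average_forecast(values):
    """Predict the next value as the mean of all preceding observations."""
    return sum(values) / len(values)


def moving_average_forecast(values, window):
    """Predict the next value as the mean of the last `window` observations.

    Assumption: the window is a count of instances, not a time span.
    """
    recent = values[-window:]
    return sum(recent) / len(recent)


# Hypothetical attribute values, oldest first.
series = [10.0, 12.0, 11.0, 13.0, 14.0]
print(naive_forecast(series))
print(simple_average_forecast(series))
print(moving_average_forecast(series, 3))
```

The naive forecast reacts instantly to change but passes noise straight through. The two averages smooth noise at the cost of lagging behind a trend.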
Data-Driven Approaches continued
Weighted moving average applies a decay formula that deemphasizes earlier instances:
F_{n+1} = (a * y_n + b * y_{n-1} + c * y_{n-2}) / (a + b + c), where a > b > c
Exponential smoothing:
F_{n+1} = a * y_n + (1 - a) * F_n, with a in the range [0.0, 1.0]
This is similar to predictive CPU burst estimation; see slides 14-16 of http://faculty.kutztown.edu/parson/secure/osconcepts9th/ch6.ppt
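A minimal sketch of both formulas follows. The function names are hypothetical. The formula does not say how to seed the first forecast, so the code assumes it equals the first observation unless the caller supplies one.

```python
def weighted_moving_average_forecast(values, weights):
    """Weighted moving average; weights[0] applies to the most recent value.

    For the decay described above, pass weights in decreasing order,
    e.g. (3, 2, 1) for a > b > c.
    """
    if len(values) < len(weights):
        raise ValueError("need at least as many values as weights")
    recent_first = values[::-1][:len(weights)]
    weighted_sum = sum(w * y for w, y in zip(weights, recent_first))
    return weighted_sum / sum(weights)


def exponential_smoothing_forecast(values, a, initial=None):
    """Apply F_{n+1} = a * y_n + (1 - a) * F_n across the whole series.

    Assumption: the first forecast is seeded with the first observation
    unless `initial` is given.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be in the range [0.0, 1.0]")
    forecast = values[0] if initial is None else initial
    for y in values:
        forecast = a * y + (1 - a) * forecast
    return forecast


# Hypothetical attribute values, oldest first.
series = [10.0, 12.0, 11.0, 13.0, 14.0]
print(weighted_moving_average_forecast(series, (3, 2, 1)))
print(exponential_smoothing_forecast(series, 0.5))
```

With a = 1.0 exponential smoothing reduces to the naive forecast. With a = 0.0 it never moves off the seed value.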
Data-Driven Approaches continued
Holt’s Two-Parameter Exponential Smoothing: if the series shows changing trends, then we need to compute the slope of recent value changes.
F_{n+1} = L_n + T_n, where:
L_n = a * y_n + (1 - a) * (L_{n-1} + T_{n-1})
T_n = b * (L_n - L_{n-1}) + (1 - b) * T_{n-1}
I would maintain L_n and T_n as separate derived attributes.
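Here is one way the level and trend recurrences might look in Python, keeping the level L and the trend T as separate per-instance lists in the spirit of separate derived attributes. The function names are invented for this sketch. The starting values are an assumption the formulas leave open: the first level is the first observation and the first trend is the first difference.

```python
def holt_components(values, a, b):
    """Return per-instance (levels, trends) for Holt's two-parameter smoothing.

    Assumption: L_1 = y_1 and T_1 = y_2 - y_1 seed the recurrences.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations to seed the trend")
    levels = [values[0]]
    trends = [values[1] - values[0]]
    for y in values[1:]:
        # L_n = a * y_n + (1 - a) * (L_{n-1} + T_{n-1})
        level = a * y + (1 - a) * (levels[-1] + trends[-1])
        # T_n = b * (L_n - L_{n-1}) + (1 - b) * T_{n-1}
        trend = b * (level - levels[-1]) + (1 - b) * trends[-1]
        levels.append(level)
        trends.append(trend)
    return levels, trends


def holt_forecast(values, a, b):
    """F_{n+1} = L_n + T_n, using the last level and trend."""
    levels, trends = holt_components(values, a, b)
    return levels[-1] + trends[-1]


# A perfectly linear hypothetical series: the forecast continues the line.
print(holt_forecast([2.0, 4.0, 6.0, 8.0, 10.0], 0.5, 0.5))
```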
Data-Driven Approaches continued
Holt’s Three-Parameter Exponential Smoothing (also known as Holt-Winters) adds a third parameter for seasonal cycles. In assignment 3 we will use my Python script timeSeriesFilter.py to create time-lagged attributes, which copy attribute values from temporally preceding instances into derived attributes in later instances. Time lagging does not flatten the attributes into aggregate values; it copies them across time.
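To make the idea of time-lagged attributes concrete, here is a small sketch that lags by a number of instances. It is not the code of timeSeriesFilter.py; the function name, the dict-per-instance layout, and the `_lagN` naming of derived attributes are all assumptions made for illustration.

```python
def add_lagged_attributes(instances, attribute, lags):
    """Copy `attribute` from earlier instances into derived attributes.

    `instances` is a list of dicts already sorted by time. For each lag k,
    a new key `<attribute>_lag<k>` holds the value from k instances earlier,
    or None when there is no such earlier instance.
    """
    result = []
    for i, instance in enumerate(instances):
        row = dict(instance)  # copy so the input is left untouched
        for lag in lags:
            key = f"{attribute}_lag{lag}"
            row[key] = instances[i - lag][attribute] if i >= lag else None
        result.append(row)
    return result


# Hypothetical instances, sorted by time.
data = [
    {"t": 0, "temp": 10.0},
    {"t": 1, "temp": 11.0},
    {"t": 2, "temp": 13.0},
]
for row in add_lagged_attributes(data, "temp", [1, 2]):
    print(row)
```

Note that the lagged values are copied verbatim rather than averaged, so a learner sees the raw earlier values side by side with the current ones.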
Model-Driven Approaches
Ordered time is just another non-target attribute. Techniques include linear regression, polynomial regression, linear regression with seasonality, autoregression (a “lag series”; see next slides), and ARIMA (Autoregressive Integrated Moving Average), which is a methodology.
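As a minimal sketch of treating time as a non-target attribute, the following fits an ordinary least-squares line with time as the single independent variable. The function names and the sample data are hypothetical.

```python
def fit_line(times, values):
    """Least-squares fit of value = intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time values")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict(slope, intercept, t):
    """Evaluate the fitted line at time t."""
    return intercept + slope * t


# Hypothetical series that grows by 2 per time step from a base of 3.
times = [0, 1, 2, 3, 4]
values = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(times, values)
print(predict(slope, intercept, 5))
```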
Linear Regression with Seasonality
Example: photosynthesis cycles from the fall 2017 analysis of dissolved oxygen in seasonal stream data (search for “photosynthesis”):
http://faculty.kutztown.edu/parson/fall2017/csc458fall2017answers3.pdf
http://faculty.kutztown.edu/parson/fall2017/csc458fall2017answers4.pdf
The photosynthesis cycle is diurnal. Water temperature also correlates with season.
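One common way to add seasonality to a linear regression is to include sine and cosine terms at the known period as extra non-target attributes. The sketch below does that in plain Python with the normal equations. It is an assumption-laden illustration, not the method used in the fall 2017 analysis: the function names, the single-sinusoid model, and the synthetic 24-hour data are all made up here.

```python
import math


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_seasonal_regression(times, values, period):
    """Fit y = b0 + b1*t + b2*sin(2*pi*t/period) + b3*cos(2*pi*t/period).

    Returns [b0, b1, b2, b3]. Assumes the period is known in advance.
    """
    rows = [
        [1.0, t,
         math.sin(2 * math.pi * t / period),
         math.cos(2 * math.pi * t / period)]
        for t in times
    ]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic hourly data over two days with a 24-hour (diurnal) cycle.
hours = list(range(48))
readings = [
    5.0 + 0.1 * t
    + 2.0 * math.sin(2 * math.pi * t / 24)
    + 0.5 * math.cos(2 * math.pi * t / 24)
    for t in hours
]
print(fit_seasonal_regression(hours, readings, 24))
```

Using both sine and cosine lets the fit recover the phase of the cycle without knowing in advance at what hour the peak occurs.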
Time-Lagged Attributes
Instances must be sorted on a temporal attribute. Weka has a timeseriesForecasting filter library that works with Weka 3.8.2. It requires consecutive instances to be separated by the same temporal interval. My timeSeriesFilter.py allows the user to sort on one temporal attribute and specify the units of the time lag: steps (instances), units (numeric attributes), usecs, msecs, secs, mins, hours, days, weeks, years.
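When instances are not equally spaced, lagging by a time amount rather than by a step count needs a lookup. The sketch below shows one possible approach, and it is not how timeSeriesFilter.py is implemented. It assumes timestamps are numeric seconds, supports only a few of the unit names listed above, and picks the most recent instance at or before the lagged time. All names are hypothetical.

```python
from bisect import bisect_right

# Assumption: timestamps are in seconds; only some units are sketched here.
SECONDS_PER_UNIT = {
    "secs": 1,
    "mins": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 604800,
}


def lag_by_time(instances, time_key, attribute, amount, unit):
    """Sort on `time_key`, then copy `attribute` from `amount` units earlier.

    For each instance at time t, the lagged value comes from the most recent
    instance at or before t - lag (an assumed matching rule), or None if no
    instance is that old.
    """
    ordered = sorted(instances, key=lambda inst: inst[time_key])
    times = [inst[time_key] for inst in ordered]
    lag_seconds = amount * SECONDS_PER_UNIT[unit]
    key = f"{attribute}_lag{amount}{unit}"
    result = []
    for inst in ordered:
        index = bisect_right(times, inst[time_key] - lag_seconds) - 1
        row = dict(inst)
        row[key] = ordered[index][attribute] if index >= 0 else None
        result.append(row)
    return result


# Hypothetical, unsorted, irregularly spaced instances.
samples = [
    {"time": 7200, "do": 8.1},
    {"time": 0, "do": 7.0},
    {"time": 5400, "do": 7.9},
    {"time": 3600, "do": 7.5},
]
for row in lag_by_time(samples, "time", "do", 1, "hours"):
    print(row)
```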
Time-Lagged Attributes
Example from last semester: http://pubs.rsc.org/en/Content/ArticleLanding/2016/EW/c6ew00202a#!divAbstract
The problem for time lagging is: how much to lag? You need to know the temporal interval before you can lag the data. Weka’s timeseriesForecasting lets you do this in a trial-and-error manner, but for N instances, time complexity is O(N^2).
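One simple way to automate the trial and error is to score each candidate lag by the autocorrelation of the series with itself at that lag and keep the best. This is a swapped-in heuristic for illustration, not what Weka does internally, and the function names are made up. Trying every lag up to N costs on the order of N^2 operations.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((y - mean) ** 2 for y in values)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance = sum(
        (values[i] - mean) * (values[i - lag] - mean) for i in range(lag, n)
    )
    return covariance / variance


def best_lag(values, max_lag):
    """Try every lag from 1 to max_lag; return the highest-scoring one."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(values, lag))


# Hypothetical series that repeats every 4 instances.
cycle = [0.0, 1.0, 0.0, -1.0] * 6
print(best_lag(cycle, 10))
```

On smooth real data the shortest lags usually score highest simply because neighboring values are similar, so domain knowledge is still needed to choose a sensible range of lags to search.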
Weka 3.8.2 https://wiki.pentaho.com/display/DATAMINING/Time+Series+Analysis+and+Forecasting+with+Weka
To-do for timeSeriesFilter.py
Major+minor sort fields to sort instances (movement+channel+tick in assignment 3). Averaging and trend/slope support from the preceding slides, which is straightforward to implement in Python. Figuring out the amount to lag is still a black art that requires some domain expertise.
Other approaches
Harmonic analysis: Fourier series as in assignments 1 & 2, adapted to the application data. https://www.britannica.com/science/harmonic-analysis
Wavelets: http://agl.cs.unm.edu/~williams/cs530/arfgtw.pdf