PAST: Processing and Storage of Time Series (EPFL-Public presentation, 2014)

Slide 1: PAST (Processing and Storage of Time Series)
Eleni Tzirita Zacharatou, Jasmina Malicevic, Nikolaos Kokolakis, Eric Beguet, Puneet Sharma, Saurabh Jain, Mihaela Turcu, Nicolas Tran, Thomas Mühlematter

Slide 2 (Motivation): What are time series?
They are EVERYWHERE (financial data, meteorological data, ...). People measure things, and things change over time!
A time series is a collection of observations made sequentially in time, for example:
25.2250 25.2500 25.2750 25.3250 25.3500 25.4000 25.3250 25.2250 25.2000 25.1750 ... 24.6250 24.6750 24.6250 24.6750 24.7500
[Figure: plot of the example series, x axis 0 to 500, y axis 23 to 29]

Slide 3 (Motivation): Tasks
- Query by content, e.g. “Find past sales patterns that resemble last month” or “List all time series with temperature value 70-80”
- Clustering
- Retrieving data of interest
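At small scale, both example queries can be answered with a linear scan. The sketch below illustrates this under our own assumptions and is not PAST's query engine. Series are kept in an in-memory Map, and the names QueryByContent, mostSimilar and withValueIn are hypothetical. Similarity is plain Euclidean distance between equal-length series. "Temperature value 70-80" is read as "at least one observation between 70 and 80".

```scala
// Brute-force query by content over an in-memory collection (hypothetical helper).
object QueryByContent {
  type Db = Map[String, Array[Double]]

  // Euclidean distance between two series of the same length.
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "series must have the same length")
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)
  }

  // "Find past patterns that resemble the query": ids of the k closest series.
  def mostSimilar(db: Db, query: Array[Double], k: Int): Seq[String] =
    db.toSeq
      .sortBy { case (_, s) => euclidean(s, query) }(Ordering.Double.TotalOrdering)
      .take(k)
      .map(_._1)

  // "List all series with values in [lo, hi]": assumed here to mean at least one value in range.
  def withValueIn(db: Db, lo: Double, hi: Double): Seq[String] =
    db.toSeq.collect { case (id, s) if s.exists(v => v >= lo && v <= hi) => id }.sorted
}
```

A linear scan touches every series on every query. That cost is what motivates the compression and indexing discussed on the following slides.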

Slide 4 (System Needs): (Need for) Preprocessing & Transformation
[Figure: two time series, A and B, with the average value of each marked]
Challenges: subjectivity, different sampling rates, noise and missing data
Transformations: normalization, amplitude scaling, resampling, digital filters, DFT, different distance measures
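Three of the listed transformations are simple enough to sketch directly. The helper below is illustrative and hypothetical (Preprocessing is not a PAST class). It shows z-normalization, which is one way to handle both normalization and amplitude scaling. It shows a moving average as one example of a digital filter. It also shows linear interpolation for resampling. The DFT and alternative distance measures are not covered.

```scala
// Illustrative preprocessing helpers (hypothetical, not PAST's API).
object Preprocessing {
  // Z-normalization: subtract the mean and divide by the population standard deviation,
  // removing differences in offset (average value) and amplitude between series.
  def zNormalize(xs: Array[Double]): Array[Double] = {
    require(xs.nonEmpty, "empty series")
    val mean = xs.sum / xs.length
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.length)
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }

  // A simple digital filter: moving average over consecutive windows of w samples,
  // which smooths noise. The output has xs.length - w + 1 points.
  def movingAverage(xs: Array[Double], w: Int): Array[Double] = {
    require(w >= 1 && w <= xs.length, "window must fit inside the series")
    xs.sliding(w).map(win => win.sum / w).toArray
  }

  // Resampling by linear interpolation onto n evenly spaced points covering the
  // same time span, so series with different sampling rates can be aligned.
  def resample(xs: Array[Double], n: Int): Array[Double] = {
    require(xs.length >= 2 && n >= 2, "need at least two input and two output points")
    Array.tabulate(n) { i =>
      val pos = i.toDouble * (xs.length - 1) / (n - 1) // fractional index into xs
      val lo = pos.toInt
      val hi = math.min(lo + 1, xs.length - 1)
      val frac = pos - lo
      xs(lo) * (1 - frac) + xs(hi) * frac
    }
  }
}
```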

Slide 5 (System Needs): (Need for) Compression & Indexing
Challenges: very large datasets, high-dimensional data
System needs: transformations, compression, indexing

Slide 6 (System): System Overview
- Built on top of Spark
- Developed in Scala and Java
- Offline framework
- Support for custom backends, custom data types and pluggable indexes
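The slide does not show PAST's extension interfaces. What follows is therefore only a sketch of how custom backends, custom data types and pluggable indexes might fit together. Every name in it is hypothetical: Backend, Index, InMemoryBackend, FirstValueIndex and Table. The sketch runs in a single JVM rather than on Spark, and the type parameter T stands in for a custom data type.

```scala
import scala.collection.mutable

// Hypothetical extension points (not PAST's actual API), generic in the element type T.
trait Backend[T] {
  def append(id: String, values: Seq[T]): Unit
  def read(id: String): Seq[T]
}

trait Index[T] {
  def add(id: String, values: Seq[T]): Unit
  def candidates(query: Seq[T]): Set[String]
}

// Minimal in-memory backend storing each series as a Vector.
final class InMemoryBackend[T] extends Backend[T] {
  private val store = mutable.Map.empty[String, Vector[T]]
  def append(id: String, values: Seq[T]): Unit =
    store(id) = store.getOrElse(id, Vector.empty[T]) ++ values
  def read(id: String): Seq[T] = store.getOrElse(id, Vector.empty[T])
}

// Toy index keyed by the first element of a series (a stand-in for a real index).
final class FirstValueIndex[T] extends Index[T] {
  private val buckets = mutable.Map.empty[T, Set[String]]
  def add(id: String, values: Seq[T]): Unit =
    values.headOption.foreach(k => buckets(k) = buckets.getOrElse(k, Set.empty[String]) + id)
  def candidates(query: Seq[T]): Set[String] =
    query.headOption.flatMap(buckets.get).getOrElse(Set.empty[String])
}

// A table wires one backend to any number of pluggable indexes.
final class Table[T](backend: Backend[T], indexes: Seq[Index[T]]) {
  def insert(id: String, values: Seq[T]): Unit = {
    backend.append(id, values)
    indexes.foreach(_.add(id, backend.read(id))) // re-index the full stored series
  }
  def get(id: String): Seq[T] = backend.read(id)
  def lookup(query: Seq[T]): Set[String] =
    indexes.map(_.candidates(query)).foldLeft(Set.empty[String])(_ ++ _)
}
```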

Slide 7 (System): Piecewise Linear Representation (PLR)
- Divide the time series into a set of disjoint segments
- Model each segment using regression
- For each modeled segment, store: start time, end time, minimum value, maximum value and the model coefficients
  (stored record: T_start | T_end | V_min | V_max | model coefficients)
- Tunable parameters, such as the degree N of the polynomial curve and the maximum mean absolute error (MAE)
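Here is a minimal sketch of the segmentation idea. It assumes degree N = 1, meaning one straight line per segment, fitted by ordinary least squares. It also assumes a greedy sliding-window strategy; the slide does not say which segmentation algorithm PAST uses. Segment mirrors the stored fields (T_start, T_end, V_min, V_max, model coefficients). PLR.compress and maxMae are names chosen for illustration.

```scala
// One compressed segment: time bounds, value bounds and the line v = intercept + slope * t.
final case class Segment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double,
                         intercept: Double, slope: Double) {
  def valueAt(t: Int): Double = intercept + slope * t
}

object PLR {
  // Ordinary least-squares fit of v = a + b * t; returns (a, b).
  def fitLine(ts: IndexedSeq[Int], vs: IndexedSeq[Double]): (Double, Double) = {
    val n = ts.length.toDouble
    val meanT = ts.sum / n
    val meanV = vs.sum / n
    val sxx = ts.map(t => (t - meanT) * (t - meanT)).sum
    val sxy = ts.indices.map(i => (ts(i) - meanT) * (vs(i) - meanV)).sum
    val b = if (sxx == 0.0) 0.0 else sxy / sxx
    (meanV - b * meanT, b)
  }

  // True if the line fitted to values(start..end) has mean absolute error <= maxMae.
  private def fits(values: IndexedSeq[Double], start: Int, end: Int, maxMae: Double): Boolean = {
    val ts = start to end
    val vs = ts.map(values)
    val (a, b) = fitLine(ts, vs)
    ts.indices.map(i => math.abs(vs(i) - (a + b * ts(i)))).sum / ts.length <= maxMae
  }

  // Greedy sliding-window segmentation: extend the current segment one point at a
  // time while its linear fit stays within maxMae, then start a new segment.
  def compress(values: IndexedSeq[Double], maxMae: Double): Vector[Segment] = {
    val out = Vector.newBuilder[Segment]
    var start = 0
    while (start < values.length) {
      var end = start // inclusive
      while (end + 1 < values.length && fits(values, start, end + 1, maxMae)) end += 1
      val ts = start to end
      val vs = ts.map(values)
      val (a, b) = fitLine(ts, vs)
      out += Segment(start, end, vs.reduce(_ min _), vs.reduce(_ max _), a, b)
      start = end + 1
    }
    out.result()
  }
}
```

The sketch refits the whole segment at every step, so it is quadratic in segment length. It favors clarity over speed.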

Slide 8 (System: PL Representation, Querying): Querying Compressed Data
Supported queries:
- Time point or range query
- Value point or range query
- Composite query
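Each segment stores its time and value bounds, so many queries can skip segments without decompressing them. The sketch below assumes linear segments (v = intercept + slope * t) over integer timestamps. PlrSegment and CompressedQueries are hypothetical names. A point query is simply a range with equal endpoints. Reconstructed values match the originals only up to the compression error, so the answers are approximate too.

```scala
// A linear PLR segment with its stored bounds (hypothetical layout).
final case class PlrSegment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double,
                            intercept: Double, slope: Double) {
  def valueAt(t: Int): Double = intercept + slope * t
}

object CompressedQueries {
  // Time range query: only segments overlapping [from, to] are reconstructed.
  def timeRange(segs: Seq[PlrSegment], from: Int, to: Int): Seq[(Int, Double)] =
    for {
      s <- segs if s.tEnd >= from && s.tStart <= to
      t <- math.max(from, s.tStart) to math.min(to, s.tEnd)
    } yield (t, s.valueAt(t))

  // Value range query: segments whose [vMin, vMax] misses [lo, hi] are pruned
  // from their metadata alone; the rest are reconstructed and filtered.
  def valueRange(segs: Seq[PlrSegment], lo: Double, hi: Double): Seq[(Int, Double)] =
    for {
      s <- segs if s.vMax >= lo && s.vMin <= hi
      t <- s.tStart to s.tEnd
      v = s.valueAt(t)
      if v >= lo && v <= hi
    } yield (t, v)

  // Composite query: a time range restricted further to a value range.
  def composite(segs: Seq[PlrSegment], from: Int, to: Int,
                lo: Double, hi: Double): Seq[(Int, Double)] =
    timeRange(segs, from, to).filter { case (_, v) => v >= lo && v <= hi }
}
```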

Slide 9 (System): SAX (Symbolic Aggregate approXimation) Representation
Tunable parameters: word size and alphabet size (cardinality)
Cardinality promotion: {1, 1, 0, 0} => {11, 11, 01, 00}
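A short sketch can reproduce both the SAX word and the cardinality promotion shown above. It rests on three assumptions. The input is already z-normalized. Its length is a multiple of the word size. Only cardinalities 2, 4 and 8 are supported, using hard-coded standard-normal breakpoints rounded to four decimals. Symbols are stored as integers and printed as bit strings, so promoting from cardinality 2 to 4 appends one bit and demoting is a right shift. SAX, paa, sax, demote and show are hypothetical names.

```scala
object SAX {
  // Standard-normal breakpoints giving equiprobable regions (hard-coded for this sketch).
  private val breakpoints: Map[Int, Array[Double]] = Map(
    2 -> Array(0.0),
    4 -> Array(-0.6745, 0.0, 0.6745),
    8 -> Array(-1.1503, -0.6745, -0.3186, 0.0, 0.3186, 0.6745, 1.1503)
  )

  // Piecewise Aggregate Approximation: mean of each of wordSize equal-length frames.
  def paa(xs: Array[Double], wordSize: Int): Array[Double] = {
    require(xs.length % wordSize == 0, "length must be a multiple of wordSize")
    xs.grouped(xs.length / wordSize).map(f => f.sum / f.length).toArray
  }

  // Symbol = number of breakpoints below the value, i.e. 0 .. cardinality - 1.
  def sax(zNormalized: Array[Double], wordSize: Int, cardinality: Int): Array[Int] = {
    require(breakpoints.contains(cardinality), "this sketch supports cardinality 2, 4 or 8")
    val bps = breakpoints(cardinality)
    paa(zNormalized, wordSize).map(v => bps.count(_ < v))
  }

  // Demotion drops low-order bits; the result is the symbol at the coarser cardinality.
  def demote(symbol: Int, fromCard: Int, toCard: Int): Int =
    symbol >> (Integer.numberOfTrailingZeros(fromCard) - Integer.numberOfTrailingZeros(toCard))

  // Bit-string form of a symbol, padded to log2(cardinality) bits.
  def show(symbol: Int, cardinality: Int): String = {
    val bits = Integer.numberOfTrailingZeros(cardinality)
    val s = symbol.toBinaryString
    "0" * (bits - s.length) + s
  }
}
```

For example, take the series 1.0, 1.0, 0.8, 0.8, -0.3, -0.3, -1.0, -1.0 with word size 4. It yields {1, 1, 0, 0} at cardinality 2 and {11, 11, 01, 00} at cardinality 4, which is the promotion on the slide.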

Slide 10 (System): Indexing SAX
"Similar" time series have the same SAX word.
- Approximate search: the terminal node with the same SAX representation as the query
- Exact search: uses the approximate search result for pruning
- Tunable parameter: number of time series in a terminal node
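Here is a deliberately simplified sketch of the two search modes. It is a flat index with one bucket per SAX word at a fixed cardinality of 4, not a tree. Approximate search returns the nearest series inside the query's own bucket. Exact search uses that answer as the initial best-so-far. It then skips every bucket whose lower bound (a MINDIST-style bound, as used in iSAX) cannot beat it. Node splitting, and with it the "series per terminal node" parameter, is omitted. FlatSaxIndex and its methods are hypothetical names.

```scala
import scala.collection.mutable

// Flat simplification of a SAX index: one bucket per SAX word, cardinality 4, no node splits.
final class FlatSaxIndex(wordSize: Int) {
  private val bps = Array(-0.6745, 0.0, 0.6745)  // standard-normal breakpoints, cardinality 4
  private val lo = Double.NegativeInfinity +: bps // lower edge of symbol s's region
  private val hi = bps :+ Double.PositiveInfinity // upper edge of symbol s's region
  private val buckets =
    mutable.Map.empty[Vector[Int], mutable.ArrayBuffer[(String, Array[Double])]]

  private def paa(xs: Array[Double]): Array[Double] =
    xs.grouped(xs.length / wordSize).map(f => f.sum / f.length).toArray

  private def word(p: Array[Double]): Vector[Int] = p.map(v => bps.count(_ < v)).toVector

  private def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)

  // Lower bound on dist(query, c) for every series c stored under word w.
  private def minDist(qPaa: Array[Double], w: Vector[Int], n: Int): Double = {
    val sq = qPaa.indices.map { i =>
      val s = w(i)
      val q = qPaa(i)
      val d = if (q < lo(s)) lo(s) - q else if (q > hi(s)) q - hi(s) else 0.0
      d * d
    }.sum
    math.sqrt(n.toDouble / wordSize * sq)
  }

  def insert(id: String, xs: Array[Double]): Unit = {
    require(xs.length % wordSize == 0, "length must be a multiple of wordSize")
    buckets.getOrElseUpdate(word(paa(xs)), mutable.ArrayBuffer.empty) += ((id, xs))
  }

  // Approximate search: nearest series in the bucket with the query's own SAX word.
  def approximate(q: Array[Double]): Option[(String, Double)] =
    buckets.get(word(paa(q))).filter(_.nonEmpty).map { b =>
      b.map { case (id, xs) => (id, dist(q, xs)) }.minBy(_._2)(Ordering.Double.TotalOrdering)
    }

  // Exact search: seed best-so-far with the approximate answer, then skip every
  // bucket whose lower bound is not below the current best distance.
  def exact(q: Array[Double]): Option[(String, Double)] = {
    val qPaa = paa(q)
    var best = approximate(q)
    buckets.foreach { case (w, b) =>
      val bound = best.fold(Double.PositiveInfinity)(_._2)
      if (minDist(qPaa, w, q.length) < bound)
        b.foreach { case (id, xs) =>
          val d = dist(q, xs)
          if (best.forall(d < _._2)) best = Some((id, d))
        }
    }
    best
  }
}
```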

Slide 11 (System): Command Line Utility
Scala console tweaking:
1. Pseudo-SQL statements start with a single quote (')
2. They are converted to Scala
3. The resulting Scala code is executed

Data insertion from CSV:
    scala> 'INSERT csv("path/to/file") INTO timeseries;

Data insertion using Scala variables:
    scala> val dna = scala.io.Source.fromFile("path/to/dna").map({ case 'A' => 1; case 'C' => 2; ... })
    scala> 'CREATE humanDNA (encodedBase BYTE) BACKEND RowStore
    scala> 'INSERT @dna INTO humanDNA

Column selection:
    scala> 'SELECT column1, column3 FROM timeseriesY WHERE column1 > 2 AND column1 < 3
    scala> import past.Transformations
    scala> 'SELECT @Transformations.mean(columnX) FROM timeseriesY
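A toy translator for the simplest SELECT form can illustrate the first two console steps: detecting the leading quote and converting the statement to Scala. This is a hypothetical sketch, not PAST's converter. It handles only plain column names, AND-joined comparisons and an optional trailing semicolon, so @-expressions such as @Transformations.mean are out of scope. The Scala it emits assumes a table is a collection of rows r indexed by column name. Execution (step 3) is left out.

```scala
// Hypothetical pseudo-SQL to Scala-source translator (illustration only).
object PseudoSql {
  private val Select = """(?i)SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+?))?\s*;?""".r
  private val Cond = """(\w+)\s*(<=|>=|<|>|=)\s*(\S+)""".r

  // Scala expression reading column c from the current row r.
  private def col(c: String): String = "r(\"" + c + "\")"

  // Step 1: only lines starting with a single quote are pseudo-SQL.
  // Step 2: a recognized SELECT becomes a filter/map expression over the table.
  def toScala(line: String): Option[String] = {
    val trimmed = line.trim
    if (!trimmed.startsWith("'")) None
    else trimmed.drop(1).trim match {
      case Select(cols, table, where) =>
        val projection = cols.split(",").map(c => col(c.trim)).mkString("Seq(", ", ", ")")
        val filter = Option(where).map { w =>
          w.split("(?i)\\s+AND\\s+").map {
            case Cond(c, op, v) =>
              val scalaOp = if (op == "=") "==" else op
              s"${col(c)} $scalaOp $v"
            case other => throw new IllegalArgumentException(s"unsupported condition: $other")
          }.mkString(".filter(r => ", " && ", ")")
        }.getOrElse("")
        Some(s"${table}${filter}.map(r => ${projection})")
      case _ => None
    }
  }
}
```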

Slide 12: Thank You!

