Présentation EPFL-Public | PAST: Processing and Storage of Time Series
Eleni Tzirita Zacharatou, Jasmina Malicevic, Nikolaos Kokolakis, Eric Beguet, Puneet Sharma, Saurabh Jain, Mihaela Turcu, Nicolas Tran, Thomas Mühlematter

What are time series?
A time series is a collection of observations made sequentially in time.
People measure things, and things change over time!
They are EVERYWHERE (financial data, meteorological data…).

Tasks
  Query by content, e.g. “Find past sales patterns that resemble last month's”
  Retrieving data of interest, e.g. “List all time series with temperature value 70-80”
  Clustering

(Need for) Preprocessing & Transformation
  Subjectivity
  Different sampling rates
  Noise, missing data
TRANSFORMATIONS
  Normalization
  Amplitude scaling
  Resampling
  Digital filters
  DFT
  Different distance measures
[Figure: two series A and B, each shown with its average value]
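
Two of the transformations listed above, normalization and resampling, could look roughly like this. This is a minimal sketch and not the PAST API: the object name `Preprocessing`, the function names, and the choices of z-normalization and linear interpolation are all assumptions made for illustration.

```scala
object Preprocessing {
  /** Z-normalization (assumed variant): subtract the mean, divide by the standard deviation. */
  def zNormalize(xs: Vector[Double]): Vector[Double] = {
    require(xs.nonEmpty, "series must not be empty")
    val mean = xs.sum / xs.length
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.length)
    // A constant series has zero spread; map it to all zeros instead of dividing by 0.
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }

  /** Resample to `n` evenly spaced points using linear interpolation (assumed method). */
  def resample(xs: Vector[Double], n: Int): Vector[Double] = {
    require(xs.nonEmpty && n >= 1, "need a non-empty series and n >= 1")
    if (xs.length == 1 || n == 1) Vector.fill(n)(xs.head)
    else Vector.tabulate(n) { i =>
      val pos = i.toDouble * (xs.length - 1) / (n - 1) // fractional index in the source
      val lo = math.floor(pos).toInt
      val hi = math.min(lo + 1, xs.length - 1)
      val frac = pos - lo
      xs(lo) * (1 - frac) + xs(hi) * frac
    }
  }
}
```

Bringing two series to the same length with `resample` and then applying `zNormalize` to each is one way to make series with different sampling rates and offsets comparable before a distance measure is applied.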

(Need for) Compression & Indexing
  Very large datasets
  High-dimensional data
TRANSFORMATIONS, INDEXING, COMPRESSION

System Overview
  On top of Spark
  Development in Scala and Java
  Offline framework
  Support for:
    Custom backends
    Custom data types
    Pluggable indexes

Piece-wise Linear Representation (PLR)
  Divide the time series into a set of disjoint segments
  Model each segment using regression
  For each modeled segment store:
    Start time, end time
    Minimum value, maximum value
    Model coefficients
  Stored record: T_start | T_end | V_min | V_max | Model coefficients
  Tunable parameters, such as the degree N of the polynomial curve and the maximum Mean Absolute Error
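
The segmentation steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not the PAST implementation: samples sit at integer time indices, the polynomial degree is fixed at 1, and segments are grown greedily until the fit exceeds the error bound. The names `PlrSketch`, `Segment`, `compress`, and `maxMae` are hypothetical.

```scala
object PlrSketch {
  // One stored record per segment, mirroring the fields listed on the slide.
  final case class Segment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double,
                           slope: Double, intercept: Double)

  // Least-squares line v = slope * t + intercept over sample indices from until end.
  private def fitLine(vs: Vector[Double], from: Int, end: Int): (Double, Double) = {
    val n = end - from
    val ts = (from until end).map(_.toDouble)
    val ys = vs.slice(from, end)
    val tMean = ts.sum / n
    val yMean = ys.sum / n
    val den = ts.map(t => (t - tMean) * (t - tMean)).sum
    val num = ts.zip(ys).map { case (t, y) => (t - tMean) * (y - yMean) }.sum
    val slope = if (den == 0.0) 0.0 else num / den // single point: flat line
    (slope, yMean - slope * tMean)
  }

  // Mean absolute error of the fitted line over the same index range.
  private def meanAbsError(vs: Vector[Double], from: Int, end: Int,
                           slope: Double, intercept: Double): Double =
    (from until end).map(t => math.abs(vs(t) - (slope * t + intercept))).sum / (end - from)

  /** Greedily grow each segment until adding one more point would exceed `maxMae`. */
  def compress(vs: Vector[Double], maxMae: Double): Vector[Segment] = {
    val out = Vector.newBuilder[Segment]
    var start = 0
    while (start < vs.length) {
      var end = start + 1 // exclusive bound; a single point always fits exactly
      val first = fitLine(vs, start, end)
      var slope = first._1
      var intercept = first._2
      var growing = true
      while (growing && end < vs.length) {
        val (s, i) = fitLine(vs, start, end + 1)
        if (meanAbsError(vs, start, end + 1, s, i) <= maxMae) {
          slope = s; intercept = i; end += 1
        } else growing = false
      }
      val window = vs.slice(start, end)
      out += Segment(start, end - 1, window.min, window.max, slope, intercept)
      start = end
    }
    out.result()
  }
}
```

For example, `PlrSketch.compress(Vector(0.0, 1.0, 2.0, 3.0, 10.0, 8.0, 6.0, 4.0), 0.01)` splits the series where the trend changes, so eight samples are stored as two segment records. A larger `maxMae` trades accuracy for fewer segments.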

Querying compressed data (PL Representation)
  Supported queries:
    Time point or range query
    Value point or range query
    Composite query
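
One plausible way to answer these queries is to filter on the per-segment metadata (start time, end time, minimum, maximum) without decompressing anything. The sketch below assumes that approach; `PlrQuerySketch`, `Segment`, and the function names are hypothetical and not taken from PAST.

```scala
object PlrQuerySketch {
  // Only the metadata needed for pruning; model coefficients are omitted here.
  final case class Segment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double)

  /** Segments whose time span overlaps [from, to]; a point query uses from == to. */
  def timeRange(segs: Seq[Segment], from: Int, to: Int): Seq[Segment] =
    segs.filter(s => s.tStart <= to && s.tEnd >= from)

  /** Segments whose value span overlaps [lo, hi]; a point query uses lo == hi. */
  def valueRange(segs: Seq[Segment], lo: Double, hi: Double): Seq[Segment] =
    segs.filter(s => s.vMin <= hi && s.vMax >= lo)

  /** Composite query: both the time and the value condition must hold. */
  def composite(segs: Seq[Segment], from: Int, to: Int, lo: Double, hi: Double): Seq[Segment] =
    valueRange(timeRange(segs, from, to), lo, hi)
}
```

The value filter returns candidate segments only: an overlapping value span is necessary but not sufficient, so a full answer would still evaluate the segment's model inside each candidate.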

SAX Representation
  Tunable parameters: word size & alphabet size (cardinality)
  Cardinality promotion: {1, 1, 0, 0} => {11, 11, 01, 00}
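
The two parameters and the promotion example can be illustrated with a small sketch. It assumes the usual SAX construction (frame means followed by quantization against standard-normal breakpoints), skips the normalization step, and supports only cardinalities 2 and 4. `SaxSketch` and its function names are hypothetical, and the sample values in the usage note were picked by hand.

```scala
object SaxSketch {
  // Breakpoints that split the standard normal distribution into equiprobable regions.
  // Only cardinalities 2 and 4 are tabulated in this sketch.
  private val breakpoints: Map[Int, Vector[Double]] = Map(
    2 -> Vector(0.0),
    4 -> Vector(-0.6745, 0.0, 0.6745)
  )

  /** Piecewise aggregate approximation: the mean of each of `wordSize` equal frames. */
  def paa(xs: Vector[Double], wordSize: Int): Vector[Double] = {
    require(wordSize > 0 && xs.length % wordSize == 0, "length must be a multiple of wordSize")
    val frame = xs.length / wordSize
    xs.grouped(frame).map(f => f.sum / f.length).toVector
  }

  /** SAX word: each frame mean becomes the index of the region it falls into. */
  def sax(xs: Vector[Double], wordSize: Int, cardinality: Int): Vector[Int] = {
    val cuts = breakpoints(cardinality)
    paa(xs, wordSize).map(m => cuts.count(_ <= m))
  }

  /** Render a symbol as a fixed-width bit string (cardinality must be a power of two). */
  def toBits(symbol: Int, cardinality: Int): String = {
    val width = Integer.numberOfTrailingZeros(cardinality) // log2 for powers of two
    val bin = Integer.toBinaryString(symbol)
    "0" * (width - bin.length) + bin
  }
}
```

With `val xs = Vector(1.0, 1.4, 0.8, 1.0, -0.2, -0.4, -1.4, -1.6)`, `SaxSketch.sax(xs, 4, 2)` yields the symbols 1, 1, 0, 0, and `SaxSketch.sax(xs, 4, 4).map(SaxSketch.toBits(_, 4))` yields 11, 11, 01, 00, matching the promotion on the slide. Because the breakpoints for cardinality 2 are a subset of those for cardinality 4, each low-cardinality symbol is the leading bit of its promoted symbol.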

Indexing SAX
  “Similar” time series: same SAX word
  Approximate search: return the terminal node with the same SAX representation as the query
  Exact search: uses the approximate search result for pruning
  Tunable parameter: number of time series in a terminal node
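
The approximate search idea can be sketched with a flat map from SAX word to the series that share it. This is a deliberate simplification and an assumption, not the PAST index: it ignores the terminal-node capacity and any node splitting, and `SaxIndexSketch`, `build`, and `approximateSearch` are hypothetical names.

```scala
object SaxIndexSketch {
  type Word = Vector[Int]

  /** Group series ids by SAX word; each group plays the role of a terminal node. */
  def build(words: Map[String, Word]): Map[Word, Vector[String]] =
    words.toVector.groupBy(_._2).map { case (w, entries) => w -> entries.map(_._1).sorted }

  /** Approximate search: the ids stored under exactly the query's SAX word. */
  def approximateSearch(index: Map[Word, Vector[String]], query: Word): Vector[String] =
    index.getOrElse(query, Vector.empty)
}
```

An exact search would start from this bucket, take the best true distance found in it as a bound, and use that bound to skip buckets that cannot contain a closer series.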

Command Line Utility
Scala console tweaking
  1. Pseudo-SQL statements starting with a single quote (')
  2. Conversion to Scala
  3. Execution
Data insertion
  From CSV
    scala> 'INSERT csv("path/to/file") INTO timeseries;
  Using Scala variables
    scala> val dna = scala.io.Source.fromPath("path/to/dna").map({ case 'A' => 1; case 'C' => 2;.... })
    scala> 'CREATE humanDNA (encodedBase BYTE) BACKEND RowStore
    scala> INTO humanDNA
Column selection
    scala> 'SELECT column1, column3 FROM timeseriesY WHERE column1 > 2 AND column1 < 3
    scala> import past.Transformations
    scala> FROM timeseriesY

Thank You!