EPFL public presentation | 2014 | PAST: Processing and Storage of Time Series. Eleni Tzirita Zacharatou, Jasmina Malicevic, Nikolaos Kokolakis, Eric Beguet, et al.

Presentation transcript:

PAST: Processing and Storage of Time Series
Eleni Tzirita Zacharatou, Jasmina Malicevic, Nikolaos Kokolakis, Eric Beguet, Puneet Sharma, Saurabh Jain, Mihaela Turcu, Nicolas Tran, Thomas Mühlematter

Motivation: Time series
What are time series? A time series is a collection of observations made sequentially in time.
They are everywhere (financial data, meteorological data, …): people measure things, and things change over time.

Motivation: Tasks
Tasks: query by content, clustering, retrieving data of interest.
Example queries: “Find past sales patterns that resemble last month.” “List all time series with temperature value 70-80.”

System needs: (Need for) Preprocessing & Transformation
Problems: subjectivity, different sampling rates, noise and missing data.
Transformations: normalization, amplitude scaling, resampling, digital filters, DFT, different distance measures.
[Figure: two series, A and B, each shown with its average value.]
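The normalization item on this slide can be illustrated with z-normalization, which removes the difference in average value between two series such as A and B. The slide does not say which normalization PAST applies, so the following is only a minimal sketch of one common choice; the object name `Preprocess` and the function `zNormalize` are hypothetical and not part of PAST.

```scala
object Preprocess {
  /** Z-normalize a series: subtract its mean, then divide by its standard deviation.
    * Hypothetical helper for illustration; not taken from the PAST code base. */
  def zNormalize(xs: Seq[Double]): Seq[Double] = {
    require(xs.nonEmpty, "series must not be empty")
    val mean = xs.sum / xs.length
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / xs.length
    val std = math.sqrt(variance)
    // A constant series has zero spread; map it to zeros instead of dividing by zero.
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }
}
```

For example, `Preprocess.zNormalize(Seq(1.0, 2.0, 3.0))` and `Preprocess.zNormalize(Seq(11.0, 12.0, 13.0))` yield the same shape, because the constant offset between the two series is removed.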

System needs: (Need for) Compression & Indexing
Very large datasets and high-dimensional data.
System stages: transformations, compression, indexing.

System: Overview
Built on top of Spark.
Developed in Scala and Java.
Offline framework.
Support for custom backends, custom data types, and pluggable indexes.

System: Piecewise Linear Representation (PLR)
Divide the time series into a set of disjoint segments.
Model each segment using regression.
For each modeled segment, store the start time and end time (T_start, T_end), the minimum and maximum value (V_min, V_max), and the model coefficients.
Tunable parameters: the degree N of the polynomial curve and the maximum mean absolute error.
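The segmentation described on this slide can be sketched as follows. This is not the PAST implementation: it assumes a greedy, left-to-right segmentation, fixes the polynomial degree at 1 (a least-squares line), and uses array indices as timestamps. The names `PlrSketch`, `Segment`, and `compress` are hypothetical.

```scala
object PlrSketch {
  /** One stored segment: time bounds, value bounds, and the line's coefficients. */
  final case class Segment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double,
                           intercept: Double, slope: Double)

  // Least-squares line through the points (t, values(t)) for t in [lo, hi].
  private def fit(values: IndexedSeq[Double], lo: Int, hi: Int): (Double, Double) = {
    val ts = (lo to hi).map(_.toDouble)
    val vs = values.slice(lo, hi + 1)
    val n = ts.length.toDouble
    val tMean = ts.sum / n
    val vMean = vs.sum / n
    val sxx = ts.map(t => (t - tMean) * (t - tMean)).sum
    val sxy = ts.zip(vs).map { case (t, v) => (t - tMean) * (v - vMean) }.sum
    val slope = if (sxx == 0.0) 0.0 else sxy / sxx // a single point gets a flat line
    (vMean - slope * tMean, slope)
  }

  // Mean absolute error of the line a + b*t over [lo, hi].
  private def mae(values: IndexedSeq[Double], lo: Int, hi: Int, a: Double, b: Double): Double =
    (lo to hi).map(t => math.abs(values(t) - (a + b * t))).sum / (hi - lo + 1)

  /** Greedy segmentation: grow each segment while the refitted line stays within maxMae. */
  def compress(values: IndexedSeq[Double], maxMae: Double): Vector[Segment] = {
    val out = Vector.newBuilder[Segment]
    var start = 0
    while (start < values.length) {
      var end = start
      var model = fit(values, start, end)
      var growing = true
      while (growing && end + 1 < values.length) {
        val (a, b) = fit(values, start, end + 1)
        if (mae(values, start, end + 1, a, b) <= maxMae) { end += 1; model = (a, b) }
        else growing = false
      }
      val slice = values.slice(start, end + 1)
      out += Segment(start, end, slice.min, slice.max, model._1, model._2)
      start = end + 1
    }
    out.result()
  }
}
```

Calling `PlrSketch.compress(Vector(0.0, 1.0, 2.0, 3.0, 10.0, 8.0, 6.0, 4.0), 0.01)` splits the series into a rising segment over indices 0 to 3 and a falling segment over indices 4 to 7. A larger error bound gives fewer, coarser segments, which is the trade-off the tunable parameters control.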

System: PL Representation, querying compressed data
Supported queries:
Time point or range query.
Value point or range query.
Composite query.
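The three query types can be answered from the stored segment metadata alone, without decompressing the series. The sketch below is an assumption about how that could look, not the PAST query engine: time queries use the stored time bounds, value queries use the stored minimum and maximum to discard segments that cannot match, and a composite query combines both. All names (`PlrQuerySketch`, `Segment`, `atTime`, `timeRange`, `valueRange`, `composite`) are hypothetical, and each segment is assumed to hold a degree-1 model.

```scala
object PlrQuerySketch {
  /** A stored segment with a linear model; indices serve as timestamps (assumption). */
  final case class Segment(tStart: Int, tEnd: Int, vMin: Double, vMax: Double,
                           intercept: Double, slope: Double) {
    def valueAt(t: Int): Double = intercept + slope * t
  }

  /** Time point query: evaluate the model of the segment that covers t, if any. */
  def atTime(segs: Seq[Segment], t: Int): Option[Double] =
    segs.find(s => s.tStart <= t && t <= s.tEnd).map(_.valueAt(t))

  /** Time range query: segments overlapping [tFrom, tTo]. */
  def timeRange(segs: Seq[Segment], tFrom: Int, tTo: Int): Seq[Segment] =
    segs.filter(s => s.tEnd >= tFrom && s.tStart <= tTo)

  /** Value range query: candidate segments whose [vMin, vMax] overlaps [lo, hi]. */
  def valueRange(segs: Seq[Segment], lo: Double, hi: Double): Seq[Segment] =
    segs.filter(s => s.vMax >= lo && s.vMin <= hi)

  /** Composite query: a time range and a value range together. */
  def composite(segs: Seq[Segment], tFrom: Int, tTo: Int, lo: Double, hi: Double): Seq[Segment] =
    valueRange(timeRange(segs, tFrom, tTo), lo, hi)
}
```

The value queries return candidate segments; a caller that needs exact matching time points would still evaluate the model inside each candidate.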

System: SAX Representation
Tunable parameters: word size and alphabet size (cardinality).
Cardinality promotion: {1, 1, 0, 0} => {11, 11, 01, 00}
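To make the two tunable parameters concrete, the sketch below computes a SAX word in the usual way: the series is averaged over `wordSize` equal frames (piecewise aggregate approximation), and each average is mapped to a symbol using breakpoints that split the standard normal distribution into `cardinality` equiprobable regions. It assumes the input is already z-normalized and that its length is a multiple of the word size. The names `SaxSketch`, `paa`, and `sax` are hypothetical, and only cardinalities 2, 4, and 8 are tabulated here.

```scala
object SaxSketch {
  // Standard normal quantiles that cut the real line into equiprobable regions.
  private val breakpoints: Map[Int, Vector[Double]] = Map(
    2 -> Vector(0.0),
    4 -> Vector(-0.6745, 0.0, 0.6745),
    8 -> Vector(-1.1503, -0.6745, -0.3186, 0.0, 0.3186, 0.6745, 1.1503)
  )

  /** Piecewise aggregate approximation: the mean of each of wordSize equal frames. */
  def paa(series: IndexedSeq[Double], wordSize: Int): Vector[Double] = {
    require(wordSize > 0 && series.length % wordSize == 0,
      "series length must be a positive multiple of wordSize")
    val frame = series.length / wordSize
    series.grouped(frame).map(f => f.sum / f.length).toVector
  }

  /** SAX word: each frame mean becomes the number of breakpoints at or below it. */
  def sax(series: IndexedSeq[Double], wordSize: Int, cardinality: Int): Vector[Int] = {
    val cuts = breakpoints(cardinality) // throws for cardinalities not tabulated above
    paa(series, wordSize).map(v => cuts.count(_ <= v))
  }
}
```

With the made-up, roughly z-normalized series `Vector(0.9, 1.1, 0.7, 0.9, -0.2, -0.4, -1.4, -1.6)` and word size 4, cardinality 2 gives `Vector(1, 1, 0, 0)` and cardinality 4 gives `Vector(3, 3, 1, 0)`, that is 11, 11, 01, 00 in binary. Because the breakpoints are nested, each higher-cardinality symbol keeps the lower-cardinality symbol as its leading bit, which is what makes promotion possible.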

System: Indexing SAX
“Similar” time series: same SAX word.
Approximate search: find the terminal node with the same SAX representation as the query.
Exact search: uses approximate search for pruning.
Tunable parameter: number of time series in a terminal node.
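The difference between the two search modes can be sketched with a deliberately simplified, flat index: one bucket per SAX word instead of a tree whose terminal nodes split when they grow too large. This is an illustration under that assumption, not the PAST index. Approximate search looks only at the bucket that shares the query's word; exact search seeds its best-so-far distance from the approximate answer and uses it to abandon distance computations early. The names `SaxIndexSketch`, `Index`, `approximate`, and `exact` are hypothetical, the word function is passed in as a parameter, and all series are assumed to have the same length.

```scala
object SaxIndexSketch {
  type Series = IndexedSeq[Double]
  type Word = Vector[Int]

  /** Flat index: groups series by the word that toWord assigns to them. */
  final class Index(toWord: Series => Word, all: Seq[Series]) {
    private val buckets: Map[Word, Seq[Series]] = all.groupBy(toWord)

    // Squared Euclidean distance; stops accumulating once the sum exceeds bound.
    private def dist(a: Series, b: Series, bound: Double): Double = {
      var sum = 0.0
      var i = 0
      while (i < a.length && sum <= bound) {
        val d = a(i) - b(i)
        sum += d * d
        i += 1
      }
      sum
    }

    /** Approximate search: nearest series within the bucket sharing the query's word. */
    def approximate(q: Series): Option[Series] =
      buckets.get(toWord(q)).map(_.minBy(s => dist(q, s, Double.MaxValue)))

    /** Exact search: scan everything, using the approximate answer as the pruning bound. */
    def exact(q: Series): Option[Series] = {
      var best = approximate(q)
      var bestDist = best.map(dist(q, _, Double.MaxValue)).getOrElse(Double.MaxValue)
      for (s <- all) {
        val d = dist(q, s, bestDist)
        if (d < bestDist) { best = Some(s); bestDist = d }
      }
      best
    }
  }
}
```

The approximate answer can miss the true nearest neighbor when that neighbor falls just across a breakpoint and therefore lands in another bucket; the exact search catches that case at the cost of visiting more series.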

System: Command Line Utility (Scala console tweaking)
1. Pseudo-SQL statements starting with a single quote (')
2. Conversion to Scala
3. Execution
Data insertion from CSV:
scala> 'INSERT csv("path/to/file") INTO timeseries;
Using Scala variables:
scala> val dna = scala.io.Source.fromPath("path/to/dna").map({ case 'A' => 1; case 'C' => 2;.... })
scala> 'CREATE humanDNA (encodedBase BYTE) BACKEND RowStore
scala> INTO humanDNA
Column selection:
scala> 'SELECT column1, column3 FROM timeseriesY WHERE column1 > 2 AND column1 < 3
scala> import past.Transformations
scala> FROM timeseriesY

Thank you!