Optimization of Sequence Queries in Database Systems

Presentation transcript:

Optimization of Sequence Queries in Database Systems
Reza Sadri (reza@cs.ucla.edu, sadri@procom.com), Carlo Zaniolo (zaniolo@cs.ucla.edu), Amir Zarkesh (azarkesh@u4cast.com), Jafar Adibi (jabibi@u4cast.com)

Time Series Analysis
Many applications:
- Querying purchase patterns for marketing
- Stock market analysis
- Studying meteorological data
What's needed:
- An expressive query language for finding complex patterns in database sequences
- An efficient and scalable implementation: query optimization

SQL-TS
- A query language for finding complex patterns in sequences
- A minimal extension of SQL: only the FROM clause is affected
- A new query optimization technique based on extensions of the Knuth, Morris and Pratt (KMP) string-search algorithm

Text Search Optimization
- Boyer and Moore: precomputed shift functions for each character and sub-pattern; dependent on the alphabet size; works best for non-repeating patterns; O(mn) worst-case time
- Knuth, Morris and Pratt (KMP): independent of the alphabet size; most efficient in general, with O(m+n) time
- Karp and Rabin: prefix hashing
Database researchers have largely overlooked this classical work on searching text sequences. KMP not only appears to be more efficient, it also does not depend on the alphabet.

Optimized string search: KMP
Consider the text array text and the pattern array pattern:

i            1  2  3  4  5  6  7  8  9  10 11
text[i]      a  b  a  b  a  b  c  a  b  c  a

j            1  2  3  4  5  6
pattern[j]   a  b  a  b  c  a

After a failure, use the information acquired so far to:
- advance the pattern by shift(j), rather than restarting at the next input position, and
- only check pattern values starting at next(j).
But in SQL-TS we have general predicates and star patterns.

shift and next
- Success for the first j-1 elements of the pattern; failure for the jth element (when the input is at i).
- Any shift less than shift(j) is guaranteed to lead to failure.
- Match elements in the pattern starting at next(j).
[Figure: the input aligned with the original and the shifted pattern. Labeled input positions: i - j + 1, i - j + shift(j) + 1, i - j + shift(j) + next(j), and i. Labeled pattern positions: 1, shift(j) + 1, shift(j) + next(j), and j. Labeled positions in the shifted pattern: next(j) and j - shift(j).]
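For the plain-equality case, the shift/next idea can be sketched in Python as follows. This is a minimal sketch, not the authors' OPS implementation: it assumes the textbook KMP border (failure) table and derives shift(j) and next(j) from it. The function names and the extra table entry m+1, used here to continue after a full match, are illustrative assumptions.

```python
def border_table(pattern):
    """border[k] = length of the longest proper prefix of pattern[:k]
    that is also a suffix of pattern[:k] (textbook KMP failure table)."""
    border = [0] * (len(pattern) + 1)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[q] != pattern[k]:
            k = border[k]
        if pattern[q] == pattern[k]:
            k += 1
        border[q + 1] = k
    return border


def shift_next_tables(pattern):
    """1-based tables for a failure at element j after j-1 successes.
    Entry m+1 (an assumption of this sketch) covers a full match."""
    border = border_table(pattern)
    m = len(pattern)
    shift = [0] * (m + 2)
    nxt = [0] * (m + 2)
    for j in range(1, m + 2):
        b = border[j - 1]                 # part of the match that can be reused
        shift[j] = max(1, (j - 1) - b)    # smaller shifts cannot succeed
        nxt[j] = b + 1                    # first pattern element to re-check
    return shift, nxt


def kmp_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern."""
    m = len(pattern)
    if m == 0:
        return []
    shift, nxt = shift_next_tables(pattern)
    matches = []
    start, j = 0, 1   # start: input index aligned with pattern element 1
    while start + m <= len(text):
        if text[start + j - 1] == pattern[j - 1]:
            if j == m:
                matches.append(start)
                start, j = start + shift[m + 1], nxt[m + 1]
            else:
                j += 1
        else:
            start, j = start + shift[j], nxt[j]
    return matches
```

Because the comparison is plain `==` on indexable sequences, the same function can be called on a list of prices, for example `kmp_search(prices, [10, 11, 15])`.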

Equality Predicates: KMP suffices
Find companies whose closing stock price on three consecutive days was 10, 11, and 15.
  SELECT X.name
  FROM quote CLUSTER BY name SEQUENCE BY date AS (X, Y, Z)
  WHERE X.price=10 AND Y.price=11 AND Z.price=15
But in SQL-TS we have general predicates.
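To make the semantics of this query concrete, here is a hedged Python sketch of a naive evaluation. It assumes the quote table is available as (name, date, price) tuples with sortable dates; the function name and the choice to return each matching company once are assumptions of this sketch, not part of SQL-TS.

```python
from collections import defaultdict


def companies_with_price_run(rows, target=(10, 11, 15)):
    """rows: iterable of (name, date, price) tuples.
    Returns the sorted names whose date-ordered prices contain
    the target values on consecutive tuples."""
    by_name = defaultdict(list)
    for name, date, price in rows:            # CLUSTER BY name
        by_name[name].append((date, price))

    target = tuple(target)
    k = len(target)
    hits = []
    for name, quotes in by_name.items():
        prices = [price for _, price in sorted(quotes)]   # SEQUENCE BY date
        # Naive scan: try every window of k consecutive prices.
        if any(tuple(prices[s:s + k]) == target
               for s in range(len(prices) - k + 1)):
            hits.append(name)
    return sorted(hits)
```

Since every predicate here is an equality with a constant, the window scan could be replaced by a standard KMP search over the price sequence, which is the point of the slide.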

Optimal Pattern Search (OPS)
Search path for the naive algorithm vs. the optimized algorithm (figure).

Beyond KMP: General Predicates
For IBM stock prices, find all instances of the pattern of two successive drops followed by two successive increases, where the drops take the price to a value between 40 and 50 and the first increase does not move the price beyond 52.
  SELECT X.date AS start_date, X.price, U.date AS end_date, U.price
  FROM quote PARTITION BY name ORDER BY date AS (X, Y, Z, T, U)
  WHERE X.name='IBM' AND Y.price < X.price AND Z.price < Y.price
    AND 40 < Z.price AND Z.price < 50 AND Z.price < T.price
    AND T.price < 52 AND T.price < U.price
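The pattern in this query can be read as a sliding window of five consecutive tuples. The following Python sketch shows the naive evaluation under the assumption that the quotes for one stock are given as a date-ordered list of (date, price) pairs; the function name is hypothetical.

```python
def find_drop_drop_rise_rise(quotes):
    """quotes: list of (date, price) for a single stock, ordered by date.
    Returns (start_date, start_price, end_date, end_price) per match."""
    results = []
    for s in range(len(quotes) - 4):
        # Bind the five pattern variables X, Y, Z, T, U to one window.
        (x_date, x), (_, y), (_, z), (_, t), (u_date, u) = quotes[s:s + 5]
        if (y < x and z < y            # two successive drops
                and 40 < z < 50        # drops end between 40 and 50
                and z < t and t < 52   # first increase stays below 52
                and t < u):            # second increase
            results.append((x_date, x, u_date, u))
    return results
```

This naive scan may test the same input tuple against up to five pattern elements, once per window that contains it. Avoiding that repeated work is what the shift and next functions are for.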

Beyond KMP: Star Patterns
Relaxed double bottom: only increases and decreases of more than 2% are considered.
[Figure labels: *Z (less than 2% change), *U, *W, *Y, *R, *V, *T]

Relaxed Double Bottom: ninety-fold improvement

Relaxed Double Bottom in June 1990

Conclusion
- Significant speedups: from 6 to 900 times faster
- Queries, partially ordered domains, and aggregates are also treated in this approach
- Many other optimization opportunities, e.g., parallel search for multiple patterns

General Predicates (cont.)
  p1(t) = (t.price < t.previous.price)
  p2(t) = (t.price < t.previous.price) ∧ (40 < t.price < 50)
  p3(t) = (t.price > t.previous.price) ∧ (t.price < 52)
  p4(t) = (t.price > t.previous.price)
We need to find the implications between these pattern elements.

Matrices θ and φ
Input that was tested against pj is now tested against pk.
- θ: the case where pj succeeded
- φ: the case where pj failed
Combining the values of these lower triangular matrices (j ≥ k), we derive the values of next(j) and shift(j).
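The matrix definitions did not survive in this transcript, so the following Python sketch rests on an assumed reading: an entry is 1 when the premise implies pk, 0 when it implies the negation of pk, and U otherwise, with pj as the premise for θ and its negation as the premise for φ. The check is a brute-force test over a sampled integer price domain, which only approximates a logical implication test. The predicates are the four from the general-predicate example, and all names are illustrative.

```python
from itertools import product


def p1(prev, cur): return cur < prev
def p2(prev, cur): return cur < prev and 40 < cur < 50
def p3(prev, cur): return cur > prev and cur < 52
def p4(prev, cur): return cur > prev


PREDICATES = [p1, p2, p3, p4]
# Assumed sample domain: all (previous price, price) pairs in 30..60.
DOMAIN = list(product(range(30, 61), repeat=2))


def implication(premise, conclusion, domain=DOMAIN):
    """1 if every sample satisfying premise satisfies conclusion,
    0 if none does, 'U' otherwise (or if premise is never satisfied)."""
    outcomes = [conclusion(a, b) for a, b in domain if premise(a, b)]
    if not outcomes:
        return "U"
    if all(outcomes):
        return 1
    if not any(outcomes):
        return 0
    return "U"


def logic_matrices(predicates=PREDICATES):
    """Lower triangular entries (j >= k), keyed by 1-based (j, k)."""
    theta, phi = {}, {}
    for j, pj in enumerate(predicates, start=1):
        for k, pk in enumerate(predicates[:j], start=1):
            theta[(j, k)] = implication(pj, pk)
            # Bind pj as a default argument so each lambda keeps its own pj.
            phi[(j, k)] = implication(lambda a, b, pj=pj: not pj(a, b), pk)
    return theta, phi
```

Under this reading, an input tuple that satisfied p2 is already known to satisfy p1, so that test need not be repeated after a shift, while nothing can be concluded about p3 from the success of p4.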

Example

Star Patterns
  SELECT X.NEXT.date, X.NEXT.price, S.previous.date, S.previous.price
  FROM quote CLUSTER BY name, SEQUENCE BY date AS (*X, Y, *Z, *T, U, *V, S)
  WHERE X.name='IBM' AND X.price > X.previous.price
    AND 30 < Y.price AND Y.price < 40
    AND Z.price < Z.previous.price
    AND T.price > T.previous.price
    AND 35 < U.price AND U.price < 40
    AND V.price < V.previous.price
    AND S.price < 30

Handling Star Patterns
Same input: transitions on the original pattern vs. transitions on the pattern after the index is set back by j - k.
[Figure: triangular graph with nodes 21; 31, 32; 41, 42, 43.]
Example: elements j and k are star predicates and entry jk is U. [Figure: node U with the neighboring nodes (j,k+1), (j+1,k), and (j+1,k+1).]

Possible Transitions
- Elements j and k are star predicates and entry jk is U. [Figure: node U with the neighboring nodes (j,k+1), (j+1,k), and (j+1,k+1).]
- Elements j and k are star predicates and entry jk is 1. [Figure: node 1 with the neighboring nodes (j,k+1), (j+1,k), and (j+1,k+1).]
- Elements j and k are not star predicates. [Figure: nodes (j,k) and (j,k+1).]

Implication Graph