ISO/IEC JTC1/SC32 WG3:URC-nnn ANSI NCITS H nnn


Pattern Matching in Sequences of Rows
March 2, 2007
Change Proposal (for SQL standards) ISO/IEC JTC1/SC32 WG3:URC-nnn ANSI NCITS H2-2006-nnn
Authors: Fred Zemke (Oracle), Andrew Witkowski (Oracle), Mitch Cherniak (Streambase), Latha Colby (IBM)
CS240B Notes by: Carlo Zaniolo, Computer Science Department, UCLA

Match_Recognize
Inspired by SQL-TS, but more verbose and with more options. For instance:
  *          0 or more matches
  +          1 or more matches
  ?          0 or 1 match
  { n }      exactly n matches
  { n, m }   between n and m (inclusive) matches
  |          alternation (vertical bar)
More ...
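The quantifiers above behave much like their regular-expression counterparts. As a rough analogy only, the sketch below assumes each row has already been reduced to a one-letter label (D = price down, U = price up, S = price unchanged; these labels are hypothetical and not part of the proposal) and uses Python's `re` module on the resulting string. In MATCH_RECOGNIZE itself the pattern variables are separated by spaces and each is defined by a row predicate.

```python
import re

def row_pattern_matches(pattern: str, labels: str) -> bool:
    """Return True if the whole label string matches the regex pattern.

    Analogy only: each character of `labels` stands for one row that has
    already been classified (D = down, U = up, S = same).
    """
    return re.fullmatch(pattern, labels) is not None

# *  : 0 or more matches (no D rows are needed here)
print(row_pattern_matches(r"D*U+", "UU"))
# ?  : 0 or 1 match (the trailing U is optional)
print(row_pattern_matches(r"D+U?", "DD"))
# {n} and {n,m} : exactly n, and between n and m (inclusive)
print(row_pattern_matches(r"D{2}U{1,2}", "DDUU"))
print(row_pattern_matches(r"D{2}U{1,2}", "DDUUU"))
# |  : alternation
print(row_pattern_matches(r"(D|S)+U", "DSDU"))
```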

Example
Let Ticker (Symbol, Tstamp, Price) be a table with three columns representing historical stock prices. Symbol is a character column, Tstamp is a timestamp column (for simplicity shown as increasing integers) and Price is a numeric column. We want to partition the data by Symbol, sort it into increasing Tstamp order, and then detect the following pattern in Price: a falling price, followed by a rise in price that goes higher than the price was when the fall began. For each such pattern we want to report the starting time, starting price, inflection time (last time during the decline phase), low price, end time, and end price.

Example
SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  ORDER BY Tstamp
  MEASURES A.Symbol       AS a_symbol,
           A.Tstamp       AS a_tstamp,
           A.Price        AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER   AS matchno
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
         B AS (B.price < PREV(B.price)),
         C AS (C.price <= PREV(C.price)),
         D AS (D.Price > PREV(D.price)),
         E AS (E.Price >= PREV(E.Price)),
         F AS (F.Price >= PREV(F.price) AND F.price > A.price)
)
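To make the intent of the pattern (A B C* D E* F+) concrete, here is a minimal Python sketch of the same search over the rows of a single symbol, already sorted by Tstamp. It is not how a DBMS would evaluate the query. The names `Match`, `find_v_shapes` and `_match_at` are hypothetical, the end price is taken from the last F row, and the code relies on an observation specific to this pattern: the falling and rising runs can be consumed greedily, so no general backtracking is needed.

```python
from typing import NamedTuple, Optional

class Match(NamedTuple):
    a_tstamp: int
    a_price: float
    max_c_tstamp: Optional[int]    # inflection time; None if C* matched no rows
    last_c_price: Optional[float]  # low price; None if C* matched no rows
    max_f_tstamp: int              # end time
    last_f_price: float            # end price

def _match_at(rows, a):
    """Try to match A B C* D E* F+ with A at index a; return (Match, end) or None."""
    n = len(rows)
    def price(k):
        return rows[k][1]
    b = a + 1
    if b >= n or not price(b) < price(a):         # B: strictly below previous row
        return None
    c_last = b
    while c_last + 1 < n and price(c_last + 1) <= price(c_last):  # C*: keeps falling
        c_last += 1
    d = c_last + 1                                # D: first rising row, if any
    if d >= n:
        return None
    end = d
    while end + 1 < n and price(end + 1) >= price(end):           # E* F+: rising run
        end += 1
    # F+ needs at least one row after D, and that row must exceed A's price.
    # The run is non-decreasing, so checking its last row is enough.
    if end == d or not price(end) > price(a):
        return None
    has_c = c_last > b
    return Match(rows[a][0], price(a),
                 rows[c_last][0] if has_c else None,
                 price(c_last) if has_c else None,
                 rows[end][0], price(end)), end

def find_v_shapes(rows):
    """rows: list of (tstamp, price) for one symbol, sorted by tstamp."""
    matches, i = [], 0
    while i < len(rows):
        found = _match_at(rows, i)
        if found is None:
            i += 1
        else:
            match, end = found
            matches.append(match)
            i = end + 1                           # AFTER MATCH SKIP PAST LAST ROW
    return matches

ticker = [(1, 10), (2, 8), (3, 7), (4, 7), (5, 9), (6, 11), (7, 12), (8, 6)]
for m in find_v_shapes(ticker):
    print(m)
```

In the sample data the price falls from 10 to 7, then climbs to 12, which exceeds the starting price, so one match is reported.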

Measures: Naming and renaming
The MEASURES clause names the values computed for each match; the SELECT list refers to them by those names.
SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
  ...
  MEASURES A.Symbol       AS a_symbol,
           A.Tstamp       AS a_tstamp,
           A.Price        AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER   AS matchno
  ...
)

Pattern and conditions
PATTERN and DEFINE specify the pattern and the conditions that must be satisfied in each state of the pattern. There is no condition on A.
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
         B AS (B.price < PREV(B.price)),
         C AS (C.price <= PREV(C.price)),
         D AS (D.Price > PREV(D.price)),
         E AS (E.Price >= PREV(E.Price)),
         F AS (F.Price >= PREV(F.price) AND F.price > A.price)

Output and skip options
The example query uses:
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
General form:
  { ONE ROW | ALL ROWS } PER MATCH
  { MAXIMAL | INCREMENTAL } MATCH
  AFTER MATCH SKIP { TO NEXT ROW | PAST LAST ROW | TO LAST <variable> | TO FIRST <variable> }
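The difference between two of the AFTER MATCH SKIP options can be illustrated with a small Python sketch. It again assumes rows pre-labelled with single letters (D = down, U = up, a hypothetical encoding) and a simple fall-then-rise pattern. It also assumes that SKIP TO NEXT ROW resumes the search at the row following the first row of the current match, which is how the option is usually understood. The function names are illustrative.

```python
import re

# Hypothetical pattern: one or more falling rows followed by one or more rising rows.
FALL_THEN_RISE = re.compile(r"D+U+")

def skip_past_last_row(labels: str):
    """Non-overlapping matches: the search resumes after the last matched row."""
    return [m.span() for m in FALL_THEN_RISE.finditer(labels)]

def skip_to_next_row(labels: str):
    """Possibly overlapping matches: a match is attempted at every starting row."""
    spans = []
    for start in range(len(labels)):
        m = FALL_THEN_RISE.match(labels, start)
        if m:
            spans.append(m.span())
    return spans

labels = "DDUDU"
print(skip_past_last_row(labels))  # spans are (start, end) with end exclusive
print(skip_to_next_row(labels))
```

With SKIP PAST LAST ROW each row belongs to at most one match; with SKIP TO NEXT ROW the same rows can be reported in several overlapping matches.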

ALL ROWS PER MATCH: one output row for each row matched by the pattern.
SELECT T.Symbol,        /* row's symbol */
       T.Tstamp,        /* row's time */
       T.Price,         /* row's price */
       T.classy,        /* row's classifier */
       T.a_tstamp,      /* start time */
       T.a_price,       /* start price */
       T.max_c_tstamp,  /* inflection time */
       T.last_c_price,  /* low price */
       T.max_f_tstamp,  /* end time */
       T.last_f_price   /* end price */
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  ORDER BY Tstamp
  MEASURES A.Symbol       AS a_symbol,
           A.Tstamp       AS a_tstamp,
           A.Price        AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER   AS matchno,
           CLASSIFIER     AS classy
  ALL ROWS PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
         B AS (B.price < PREV(B.price)),
         C AS (C.price <= PREV(C.price)),
         D AS (D.Price > PREV(D.price)),
         E AS (E.Price >= PREV(E.Price)),
         F AS (F.Price >= PREV(F.price) AND F.price > A.price)
) T
In addition to the partitioning, ordering and measure columns, we can reference the other columns of the row (via T).
The CLASSIFIER component may be used to declare a character result column whose content on each row is the name of the variable that the row matched.
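As a rough illustration of the output shape only, the Python sketch below expands one match into one output record per matched row. Each record carries the row's own columns, a classifier column naming the pattern variable the row matched, and match-level values repeated on every row. The function name `expand_match` and the dictionary layout are assumptions made for this sketch, and only the measures taken from A are shown.

```python
def expand_match(rows, labels, matchno):
    """Emit one record per matched row (ALL ROWS PER MATCH style).

    rows:    (tstamp, price) tuples of a single match, in order
    labels:  name of the pattern variable each row matched, same length as rows
    matchno: sequence number of the match within its partition
    """
    a_tstamp, a_price = rows[0]  # A is always the first row of a match
    return [
        {
            "tstamp": tstamp,
            "price": price,
            "classy": label,       # classifier column
            "a_tstamp": a_tstamp,  # match-level values repeat on every row
            "a_price": a_price,
            "matchno": matchno,
        }
        for (tstamp, price), label in zip(rows, labels)
    ]

for record in expand_match([(1, 10), (2, 8), (3, 9), (4, 11)], ["A", "B", "D", "F"], 1):
    print(record)
```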

Syntactic Sugar
- Variables can be repeated in the pattern clause
- SUBSET: to rename a set of variables
- Portions of the pattern can be excluded (when returning all rows)
- Special construct to define alternations obtained as permutations of variables

Singletons and group variables
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(a.time) a_firsttime,
           LAST(d.time)  d_lasttime,
           AVG(b.price)  b_avgprice,
           AVG(d.price)  d_avgprice
  PATTERN ( A B+ C+ D )
  DEFINE A AS A.price > 100,
         B AS B.price > A.price,
         C AS C.price < AVG (B.price),
         D AS D.price > PREV(D.price)
)
If a variable is a singleton, then only individual columns may be referenced, not aggregates.
If the variable is used in an aggregate, then the aggregate is performed over all rows that have matched the variable so far.
If desired, we can construe this as providing running aggregates with no special syntax, when a variable is referenced in an aggregate in its own definition, or we can continue to require special syntax to highlight that a running aggregate is meant.
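The role of the group variable B in the condition on C can be sketched in Python. The function below tries to match A B+ C+ D at a given starting row of a price list, computing AVG(B.price) over exactly the rows assigned to B before testing each C row. It tries the longest runs first and backs off when a later variable cannot be satisfied. The function name `match_abcd` and this search order are assumptions of the sketch, not something the proposal prescribes.

```python
from statistics import mean

def match_abcd(prices, start=0):
    """Try to match A B+ C+ D beginning at prices[start].

    Returns the list of variable names assigned to the matched rows, or None.
    """
    n = len(prices)
    if start >= n or not prices[start] > 100:        # A AS A.price > 100
        return None
    a_price = prices[start]
    b_end = start + 1
    while b_end < n and prices[b_end] > a_price:     # B AS B.price > A.price
        b_end += 1
    # Try the longest run of B rows first, then shorter ones (at least one B row).
    for b_stop in range(b_end, start + 1, -1):
        b_avg = mean(prices[start + 1:b_stop])       # AVG over the rows matched to B
        c_end = b_stop
        while c_end < n and prices[c_end] < b_avg:   # C AS C.price < AVG(B.price)
            c_end += 1
        # Try the longest run of C rows first (at least one C row).
        for c_stop in range(c_end, b_stop, -1):
            d = c_stop
            if d < n and prices[d] > prices[d - 1]:  # D AS D.price > PREV(D.price)
                return (["A"] + ["B"] * (b_stop - start - 1)
                        + ["C"] * (c_stop - b_stop) + ["D"])
    return None

print(match_abcd([101, 120, 130, 110, 115]))
```

In the sample data a purely greedy B+ would swallow every remaining row, since all of them exceed A's price; backing off to B = 120, 130 gives an average of 125, which lets 110 match C and 115 match D.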

More
CLASSIFIER (ALL ROWS PER MATCH only) is used to specify the name of a character string column, called the classifier column. In each row of output, the classifier column is set to the name of the variable in the PATTERN that the row matched.
MATCH_NUMBER: matches within a partition are numbered sequentially, starting with 1, in the order in which they are chosen. The MATCH_NUMBER component is used to specify a column name for an extra column of output from the MATCH_RECOGNIZE construct. The extra column is an exact numeric with scale 0, and provides the match number within a partition: 1 for the first match, 2 for the second, etc.
FIRST and LAST: special aggregates for group variables.

Windows
SELECT sum_yprice OVER W, x_time OVER W, AVG(Y.Price)
FROM T
WINDOW W AS (PARTITION BY ..
             ORDER BY ..
             MEASURES SUM(Y.price) AS sum_yprice,
                      x.time AS x_time
             (PATTERN (X Y+ Z)...) )

Conclusions
- Specs proposed by 2 DBMS vendors (Oracle & IBM) and 2 DSMS startups (Coral8 and Streambase)
- Very powerful: capabilities of SQL-TS plus several new convenience constructs, particularly for controlling output.
- Optimization techniques developed for SQL-TS could also be critical here.