CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles.


Temporal Databases
• The problem is harder than you think
• A time ontology: temporal data types in SQL
• Temporal applications and SQL
• Plenty of temporally oriented DB applications
• Not supported well by SQL
• Many research approaches proposed to solve the problem
• TSQL2
• The physical level: efficient storage and indexing techniques

Applications Abound: Examples
• Academic: Transcripts record courses taken in previous and current semesters or terms, and grades for previous courses
• Accounting: What bills were sent out and when, what payments were received and when?
• Delinquent accounts, cash flow over time
• Money-management software such as Quicken can show, e.g., account balance over time
• Budgets: Previous and projected budgets, multi-quarter or multi-year budgets

Temporal DB Applications (cont.)
• Data Warehousing: Historical trend analysis for decision support
• Financial: Stock market data
• Audit: Why were financial decisions made, and with what information available?
• GIS: Geographic Information Systems
• Land use over time: boundaries of parcels change over time, as parcels get partitioned and merged
• Title searches
• Insurance: Which policy was in effect at each point in time, and what time periods did that policy cover?

Temporal DB Applications (cont.)
• Medical records: Patient records, drug regimes, lab tests. Tracking the course of a disease
• Payroll: Past employees, employee salary history, salaries for future months, records of withholding requested by employees
• Capacity planning for roads and utilities. Configuring new routes, ensuring high utilization
• Project scheduling: Milestones, task assignments
• Reservation systems: Airlines, hotels, trains
• Scientific: Timestamping satellite images. Dating archeological finds

Temporal DB Applications: Conclusion
• It is difficult to identify applications that do not involve the management of temporal data.
• These applications would benefit from built-in temporal support in the DBMS. Main benefits:
  • More efficient application development
  • Potential increase in performance

A Case Study on Using SQL for Temporal Apps
• The University of Arizona's Office of Appointed Personnel (OAP) has some information in a database:
    Employee(Name, Salary, Title)
• The OAP wishes to add the date of birth:
    Employee(Name, Salary, Title, DateofBirth DATE)

    SELECT Salary, DateofBirth
    FROM Employee
    WHERE Name = 'Bob'
• Finding an employee's date of birth is as easy as finding his or her salary.
• Managing dates and instants in time is not a problem; the hard problems come with periods (a.k.a. intervals).

Converting to a Temporal Database
• Now the OAP wishes to computerize the employment history.
• Adding validity periods to tuples:
    Employee(Name, Salary, Title, DateofBirth, Start DATE, Stop DATE)

A Temporal Database Using Periods
Employee(Name, Salary, Title, DateofBirth, Start DATE, Stop DATE)

    Name  Salary  Title             DateofBirth  Start       Stop
    Bob   60000   AssistantProvost  1945-04-…    1993-01-01  1993-06-01
    Bob   70000   AssistantProvost  1945-04-…    1993-06-01  1993-10-01
    Bob   70000   Provost           1945-04-…    1993-10-01  1994-02-01
    Bob   70000   Professor         1945-04-…    1994-02-01  1995-01-01

Here we use closed intervals. Intervals open to the right are also often used: if the new period begins at … the old one must end at …

Temporal Representations
• Temporal intervals (i.e., periods) are the most popular representation for time. For a short overview of the topic see:
• Operators and predicates associated with periods include:
  • Overlap, Contains, Meets, Precedes, and Follows
• Several other representations have been used, including:
  • Sets of periods
  • Point-based representations
• TSQL2: the model is a set of periods, where each period is a set of chronons. There is no direct access to the chronons: their manipulation follows implicitly from special constructs.

Allen’s 13 Temporal Predicates on Periods
(Figure: pairs of periods A and B illustrating each predicate and its inverse.)
• A is BEFORE B / B is AFTER A
• A MEETS B / B is MET by A
• A OVERLAPS B / B is OVERLAPPED by A
• A STARTS B / B is STARTED by A
• A DURING B / B CONTAINS A
• A FINISHES B / B is FINISHED by A
• A is EQUAL to B / B is EQUAL to A
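The thirteen predicates can be told apart by comparing endpoints alone. The following is a minimal sketch, assuming each period is a `(start, end)` pair with `start < end`; the function name `allen_relation` and the relation labels are illustrative choices, not part of the slides.

```python
def allen_relation(a, b):
    """Return the Allen relation of period a with respect to period b.

    Each period is a (start, end) pair with start < end. Endpoints may be
    any mutually comparable values (ints, ISO date strings, dates).
    """
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("each period must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met by"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2:
        return "starts" if e1 < e2 else "started by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only proper partial overlap is left at this point.
    return "overlaps" if s1 < s2 else "overlapped by"


print(allen_relation(("1993-01-01", "1993-06-01"), ("1993-06-01", "1995-01-01")))
```

Exactly one of the thirteen labels is returned for any pair of valid periods, which is what makes these predicates a convenient vocabulary for period conditions.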

Period-Based Temporal Representations Are Simple in SQL
• But temporal queries are quite complex.
• E.g., find the employee's salary at a given time, such as the current one:
    SELECT Salary
    FROM Employee
    WHERE Name = 'Bob'
      AND Start <= CURRENT_TIMESTAMP
      AND CURRENT_TIMESTAMP <= Stop
Instead of CURRENT_TIMESTAMP we could have given any timestamp or date.
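The timeslice query above can be tried out with an arbitrary date in place of CURRENT_TIMESTAMP. This is a small sketch using Python's built-in sqlite3 module; the helper name `salary_at` and the two sample rows (closed, non-touching periods stored as ISO date strings) are assumptions made for illustration.

```python
import sqlite3


def salary_at(conn, name, when):
    """Return the salaries valid for `name` at the instant `when` (ISO date)."""
    rows = conn.execute(
        "SELECT Salary FROM Employee "
        "WHERE Name = ? AND Start <= ? AND ? <= Stop",
        (name, when, when),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Salary INTEGER, Title TEXT, Start TEXT, Stop TEXT)"
)
# Hypothetical sample rows with closed periods [Start, Stop].
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", 60000, "AssistantProvost", "1993-01-01", "1993-05-31"),
        ("Bob", 70000, "AssistantProvost", "1993-06-01", "1993-09-30"),
    ],
)
print(salary_at(conn, "Bob", "1993-07-15"))
```

ISO-formatted date strings compare in chronological order, so plain string comparison is enough here.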

Distributing the Salary History
• The OAP wants to distribute to all employees their salary history. For Bob, three consecutive periods have the same salary.
• In general, an employee could have arbitrarily many title changes between salary changes.
• Canonical representation: maximal intervals at each salary

    Name  Salary  Start       Stop
    Bob   60000   1993-01-01  1993-06-01
    Bob   70000   1993-06-01  1995-01-01

• A complex operation called coalescing is needed to compute the maximal intervals; it must be performed after each projection.

Coalescing Using Embedded SQL
The coalescing operations that are needed after each projection are difficult to express in pure SQL.
• Coalescing using embedded SQL: use SQL only to open a cursor on the table, and perform the actual coalescing in a programming language.
• Coalescing using a 4GL: the solution given in the textbook (1995) is obsolete; use recursive queries in SQL:1999/2003 instead.
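The cursor-based approach can be sketched as follows: SQL only selects and orders the rows, and the host language merges periods that overlap or meet. This is an illustrative sketch with Python's sqlite3 module standing in for embedded SQL; the function name `coalesce_salary` is hypothetical, and the sample rows follow Bob's history as shown in the temporal-join slide.

```python
import sqlite3


def coalesce_salary(conn, name):
    """Return maximal (Salary, Start, Stop) periods for one employee."""
    cursor = conn.execute(
        "SELECT Salary, Start, Stop FROM Employee "
        "WHERE Name = ? ORDER BY Salary, Start",
        (name,),
    )
    merged = []
    for salary, start, stop in cursor:
        last = merged[-1] if merged else None
        # Same salary and the new period overlaps or meets the previous one.
        if last and last[0] == salary and start <= last[2]:
            merged[-1] = (salary, last[1], max(last[2], stop))
        else:
            merged.append((salary, start, stop))
    return sorted(merged, key=lambda row: row[1])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Salary INTEGER, Title TEXT, Start TEXT, Stop TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", 60000, "AssistantProvost", "1993-01-01", "1993-06-01"),
        ("Bob", 70000, "AssistantProvost", "1993-06-01", "1993-10-01"),
        ("Bob", 70000, "Provost", "1993-10-01", "1994-02-01"),
        ("Bob", 70000, "FullProfessor", "1994-02-01", "1995-01-01"),
    ],
)
for row in coalesce_salary(conn, "Bob"):
    print(row)
```

Sorting by salary and start date lets a single pass do the merging, at the cost of moving the logic out of the query language.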

Coalescing Using a 4GL
    CREATE TABLE Temp(Salary, Start, Stop) AS
      SELECT Salary, Start, Stop
      FROM Employee
      WHERE Name = 'Bob';

    repeat
      UPDATE Temp AS T1
      SET (T1.Stop) = (SELECT MAX(T2.Stop)
                       FROM Temp AS T2
                       WHERE T1.Salary = T2.Salary
                         AND T1.Start < T2.Start
                         AND T1.Stop >= T2.Start
                         AND T1.Stop < T2.Stop)
      WHERE EXISTS (SELECT *
                    FROM Temp AS T2
                    WHERE T1.Salary = T2.Salary
                      AND T1.Start < T2.Start
                      AND T1.Stop >= T2.Start
                      AND T1.Stop < T2.Stop)
    until no tuples updated;

Salary History (cont.)
• Intervals that are not maximal must be deleted:
    DELETE FROM Temp T1
    WHERE EXISTS (SELECT *
                  FROM Temp AS T2
                  WHERE T1.Salary = T2.Salary
                    AND ((T1.Start > T2.Start AND T1.Stop <= T2.Stop)
                      OR (T1.Start >= T2.Start AND T1.Stop < T2.Stop)))
The loop is executed lg N times in the worst case, where N is the number of tuples in a chain of overlapping or adjacent, value-equivalent tuples. Then the extraneous, non-maximal intervals are deleted.

A Better Alternative: Recursion
• Use recursion.
• If two periods overlap, merge them into a new period.
Overlap: NOT (E1 < S2 OR E2 < S1), i.e., E1 >= S2 AND E2 >= S1.
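One way to write the recursive merge is a recursive common table expression. The sketch below is an assumption about how the idea could be phrased, not the query used in the course: it runs on SQLite through Python's sqlite3 module, the table name `SalaryHist` is hypothetical, and a period is extended whenever another period with the same salary starts inside it (or exactly at its end) and stops later.

```python
import sqlite3

COALESCE_SQL = """
WITH RECURSIVE Merged(Salary, Start, Stop) AS (
    SELECT Salary, Start, Stop FROM SalaryHist
    UNION
    -- Extend a period to the right with any overlapping or adjacent period.
    SELECT M.Salary, M.Start, T.Stop
    FROM Merged AS M JOIN SalaryHist AS T
      ON T.Salary = M.Salary
     AND M.Start <= T.Start AND T.Start <= M.Stop
     AND M.Stop < T.Stop
)
SELECT Salary, Start, Stop
FROM Merged AS M
-- Keep only maximal periods: drop any period contained in a larger one.
WHERE NOT EXISTS (
    SELECT 1 FROM Merged AS N
    WHERE N.Salary = M.Salary
      AND N.Start <= M.Start AND M.Stop <= N.Stop
      AND (N.Start < M.Start OR M.Stop < N.Stop))
ORDER BY Start
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalaryHist (Salary INTEGER, Start TEXT, Stop TEXT)")
conn.executemany(
    "INSERT INTO SalaryHist VALUES (?, ?, ?)",
    [
        (60000, "1993-01-01", "1993-06-01"),
        (70000, "1993-06-01", "1993-10-01"),
        (70000, "1993-10-01", "1994-02-01"),
        (70000, "1994-02-01", "1995-01-01"),
    ],
)
coalesced = conn.execute(COALESCE_SQL).fetchall()
for row in coalesced:
    print(row)
```

The recursion terminates because `UNION` discards duplicates and only finitely many (Start, Stop) combinations can be formed from the stored endpoints.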

Surprise: Non-Recursive SQL
    CREATE TABLE Temp(Salary, Start, Stop) AS
      SELECT Salary, Start, Stop
      FROM Employee
      WHERE Name = 'Bob';

    SELECT DISTINCT F.Salary, F.Start, L.Stop
    FROM Temp AS F, Temp AS L
    WHERE F.Start < L.Stop
      AND F.Salary = L.Salary
      AND NOT EXISTS (SELECT *
                      FROM Temp AS M
                      WHERE M.Salary = F.Salary
                        AND F.Start < M.Start AND M.Start < L.Stop
                        AND NOT EXISTS (SELECT *
                                        FROM Temp AS T1
                                        WHERE T1.Salary = F.Salary
                                          AND T1.Start < M.Start
                                          AND M.Start <= T1.Stop))
      AND NOT EXISTS (SELECT *
                      FROM Temp AS T2
                      WHERE T2.Salary = F.Salary
                        AND ((T2.Start < F.Start AND F.Start <= T2.Stop)
                          OR (T2.Start <= L.Stop AND L.Stop < T2.Stop)))
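The non-recursive formulation can be checked on Bob's history. The sketch below runs it on SQLite through Python's sqlite3 module; the table name `SalaryHist` is a stand-in for `Temp`, and the final comparison is written as `T2.Start <= L.Stop` so that a period that merely meets its successor is still treated as non-maximal (an assumption consistent with the `M.Start <= T1.Stop` test in the gap check).

```python
import sqlite3

NONRECURSIVE_SQL = """
SELECT DISTINCT F.Salary, F.Start, L.Stop
FROM SalaryHist AS F, SalaryHist AS L
WHERE F.Start < L.Stop
  AND F.Salary = L.Salary
  -- No gap: every period starting inside (F.Start, L.Stop) is reached
  -- by an earlier period with the same salary.
  AND NOT EXISTS (SELECT * FROM SalaryHist AS M
                  WHERE M.Salary = F.Salary
                    AND F.Start < M.Start AND M.Start < L.Stop
                    AND NOT EXISTS (SELECT * FROM SalaryHist AS T1
                                    WHERE T1.Salary = F.Salary
                                      AND T1.Start < M.Start
                                      AND M.Start <= T1.Stop))
  -- Maximal: nothing extends the result to the left or to the right.
  AND NOT EXISTS (SELECT * FROM SalaryHist AS T2
                  WHERE T2.Salary = F.Salary
                    AND ((T2.Start < F.Start AND F.Start <= T2.Stop)
                      OR (T2.Start <= L.Stop AND L.Stop < T2.Stop)))
ORDER BY F.Start
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalaryHist (Salary INTEGER, Start TEXT, Stop TEXT)")
conn.executemany(
    "INSERT INTO SalaryHist VALUES (?, ?, ?)",
    [
        (60000, "1993-01-01", "1993-06-01"),
        (70000, "1993-06-01", "1993-10-01"),
        (70000, "1993-10-01", "1994-02-01"),
        (70000, "1994-02-01", "1995-01-01"),
    ],
)
maximal = conn.execute(NONRECURSIVE_SQL).fetchall()
for row in maximal:
    print(row)
```

The query pairs a first period F with a last period L and keeps the pair only if the span between them has no gap and cannot be extended on either side.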

The Curse of Coalescing
• Maximal periods are the standard state-based representation.
• Projection is very common in queries and was a no-op in SQL, but it now requires coalescing.
• Expressing coalescing in SQL is possible but complex.
• A better solution could be to introduce a special operator, which is basically an aggregate; aggregates in SQL require a particular structure.
• Question: Can we find representations that eliminate or minimize the need for coalescing?

Minimize the Need for Coalescing by Reorganizing the Schema
• Separate Salary, Title, and DateofBirth information:
    Employee1(Name, Salary, Start DATE, Stop DATE)
    Employee2(Name, Title, Start DATE, Stop DATE)
• Getting the salary information is now easy:
    SELECT Salary, Start, Stop
    FROM Employee1
    WHERE Name = 'Bob'
• But what if we want a table with both salary and title?

Temporal Joins

Employee1:
    Name  Salary  Start       Stop
    Bob   60000   1993-01-01  1993-06-01
    Bob   70000   1993-06-01  1995-01-01

Employee2:
    Name  Title             Start       Stop
    Bob   AssistantProvost  1993-01-01  1993-10-01
    Bob   Provost           1993-10-01  1994-02-01
    Bob   FullProfessor     1994-02-01  1995-01-01

Their temporal join:
    Name  Salary  Title             Start       Stop
    Bob   60000   AssistantProvost  1993-01-01  1993-06-01
    Bob   70000   AssistantProvost  1993-06-01  1993-10-01
    Bob   70000   Provost           1993-10-01  1994-02-01
    Bob   70000   FullProfessor     1994-02-01  1995-01-01

Temporal Join in SQL
    SELECT E1.Name, Salary, Title, E1.Start, E1.Stop
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name
      AND E2.Start <= E1.Start AND E1.Stop <= E2.Stop
    UNION ALL
    SELECT E1.Name, Salary, Title, E1.Start, E2.Stop
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name
      AND E1.Start > E2.Start AND E2.Stop < E1.Stop
      AND E1.Start < E2.Stop
    UNION ALL
    SELECT E1.Name, Salary, Title, E2.Start, E1.Stop
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name
      AND E2.Start > E1.Start AND E1.Stop < E2.Stop
      AND E2.Start < E1.Stop
    UNION ALL
    SELECT E1.Name, Salary, Title, E2.Start, E2.Stop
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name
      AND E2.Start >= E1.Start AND E2.Stop <= E1.Stop
      AND NOT (E1.Start = E2.Start AND E1.Stop = E2.Stop)

Simpler Temporal Join in SQL
Overlap: E1 >= S2 and E2 >= S1
What is the actual intersection?
    larger(X, Y, X) <- X >= Y.
    larger(X, Y, Y) <- X < Y.
Symmetrically for smaller. You can also use if-then-else in DeALS, or CASE in SQL.
Now:
    intersection(B, E) <- period(S1, E1), period(S2, E2), larger(S1, S2, B), smaller(E1, E2, E), B < E.
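The intersection rule translates directly into SQL with CASE: take the larger of the two starts and the smaller of the two stops, and keep the pair only if the result is a non-empty period. The sketch below runs on SQLite through Python's sqlite3 module; the column aliases `IStart` and `IStop` are illustrative names, and the rows are those of the Employee1 and Employee2 tables from the temporal-join slide.

```python
import sqlite3

TEMPORAL_JOIN_SQL = """
SELECT Name, Salary, Title, IStart, IStop
FROM (SELECT E1.Name AS Name, E1.Salary AS Salary, E2.Title AS Title,
             -- larger(E1.Start, E2.Start)
             CASE WHEN E1.Start >= E2.Start THEN E1.Start ELSE E2.Start END AS IStart,
             -- smaller(E1.Stop, E2.Stop)
             CASE WHEN E1.Stop <= E2.Stop THEN E1.Stop ELSE E2.Stop END AS IStop
      FROM Employee1 AS E1 JOIN Employee2 AS E2 ON E1.Name = E2.Name)
WHERE IStart < IStop
ORDER BY IStart
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee1 (Name TEXT, Salary INTEGER, Start TEXT, Stop TEXT)")
conn.execute("CREATE TABLE Employee2 (Name TEXT, Title TEXT, Start TEXT, Stop TEXT)")
conn.executemany(
    "INSERT INTO Employee1 VALUES (?, ?, ?, ?)",
    [
        ("Bob", 60000, "1993-01-01", "1993-06-01"),
        ("Bob", 70000, "1993-06-01", "1995-01-01"),
    ],
)
conn.executemany(
    "INSERT INTO Employee2 VALUES (?, ?, ?, ?)",
    [
        ("Bob", "AssistantProvost", "1993-01-01", "1993-10-01"),
        ("Bob", "Provost", "1993-10-01", "1994-02-01"),
        ("Bob", "FullProfessor", "1994-02-01", "1995-01-01"),
    ],
)
joined = conn.execute(TEMPORAL_JOIN_SQL).fetchall()
for row in joined:
    print(row)
```

A single query block replaces the four-way UNION ALL, because the case analysis is pushed into the two CASE expressions.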

Summary
• Coalescing and temporal joins are very difficult to express in SQL.
• Solutions proposed:
  • Timestamp attributes rather than tuples, but then many temporal joins must be used
  • Point-based representation
  • Others, including combinations of the above (more than 40 counted that were using SQL)
• We will discuss TSQL2 and XML later.

Reviewing the Situation
• The importance of temporal applications has motivated much research on temporal DBs, but no satisfactory solution has been found yet:
  • SQL does not support temporal queries well
  • Temporal DBs remain an open research problem
• The problem is much more difficult than it appears at first: we have become so familiar with the time domain that we tend to overlook its intrinsic complexity.
• Other issues that we have not discussed yet include:
  • Support for bitemporal models
  • Temporal clustering, indexing, and implementation-oriented issues

Point-Based Model
This was proposed by temporal DB people.
Employee1(Name, Sal, Day)

    Bob  …  …-01-01
    …
    Bob  …  …-05-31
    Bob  …  …-05-31
    …
    Bob  …  …-12-31

To project out Sal you only need to eliminate that column. Internally we need to use more concise representations, e.g., the period-based representation:

    Name  Salary  Start       Stop
    Bob   60000   1993-01-01  1993-06-01
    Bob   70000   1993-06-01  1995-01-01

Queries in Point-Based
• No coalescing is needed in the query. E.g., to project out salary:
    SELECT E1.Name, E1.Day
    FROM Employee1 AS E1
• Temporal joins are simple:
    SELECT E1.Name, Sal, Title
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name AND E1.Day = E2.Day
The point-based representation can only be used as a logical view. Since a very different internal representation must be used, support for queries can be a challenge. No serious takers so far.
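To see what the logical view costs, the sketch below expands period rows into one row per day and then runs the point-based join. It uses Python's sqlite3 module; the helper `days` and the reading of each period as half-open (Start included, Stop excluded) are assumptions made for illustration. The final GROUP BY only summarizes the joined points, and it works here because no (Sal, Title) combination recurs in separate periods.

```python
import sqlite3
from datetime import date, timedelta


def days(start, stop):
    """Yield ISO day strings d with start <= d < stop (half-open assumption)."""
    day, end = date.fromisoformat(start), date.fromisoformat(stop)
    while day < end:
        yield day.isoformat()
        day += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee1 (Name TEXT, Sal INTEGER, Day TEXT)")
conn.execute("CREATE TABLE Employee2 (Name TEXT, Title TEXT, Day TEXT)")

salary_periods = [(60000, "1993-01-01", "1993-06-01"),
                  (70000, "1993-06-01", "1995-01-01")]
title_periods = [("AssistantProvost", "1993-01-01", "1993-10-01"),
                 ("Provost", "1993-10-01", "1994-02-01"),
                 ("FullProfessor", "1994-02-01", "1995-01-01")]
for sal, start, stop in salary_periods:
    conn.executemany("INSERT INTO Employee1 VALUES ('Bob', ?, ?)",
                     [(sal, d) for d in days(start, stop)])
for title, start, stop in title_periods:
    conn.executemany("INSERT INTO Employee2 VALUES ('Bob', ?, ?)",
                     [(title, d) for d in days(start, stop)])

# Point-based temporal join: an ordinary equijoin on Name and Day.
summary = conn.execute("""
    SELECT Sal, Title, MIN(E1.Day), MAX(E1.Day), COUNT(*)
    FROM Employee1 AS E1, Employee2 AS E2
    WHERE E1.Name = E2.Name AND E1.Day = E2.Day
    GROUP BY Sal, Title
    ORDER BY MIN(E1.Day)
""").fetchall()
for row in summary:
    print(row)
```

Two period rows become 730 point rows for a single employee, which illustrates why a more concise internal representation is needed behind the point-based view.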