Efficient Temporal Coalescing Query Support in Relational Database Systems
Xin Zhou (UCLA), Carlo Zaniolo (UCLA), Fusheng Wang (Siemens Corporate Research)


1 Presented at DEXA 2006, Krakow, Poland, September 2006
http://www.cs.ucla.edu/~zaniolo/papers/dexa06.pdf

2 Historical Content Management & Temporal Databases: Overview
Support for storing and querying time-varying information
Many applications
– Accounting, banking, digital libraries, etc.
Many research approaches proposed in the past
– TSQL2 proposed complex extensions to the SQL-2 standard
A struggle for the relational model and SQL
– Temporal grouping
– Queries on time-varying data are hard to express in SQL
– Temporal coalescing
Continuing interest from database vendors
– Oracle Flashback: minimal query support
– Microsoft Research: Immortal DB, transaction-time management
– Teradata: temporal DB application use cases

3 Temporal Coalescing: The Crux and the Pain of Temporal Databases
Coalescing merges the timestamps of adjacent or overlapping tuples that have identical attribute values.
The difficulty of expressing it in SQL led to new approaches, including
– TSQL2: implicit coalescing
– Point-based representations

Before coalescing:
Salary  Start       End
6000    1995-01-01  1995-05-31
7000    1995-06-01  1995-09-30
7000    1995-10-01  1996-01-31
7000    1996-02-01  1996-12-31

After coalescing:
Salary  Start       End
6000    1995-01-01  1995-05-31
7000    1995-06-01  1996-12-31
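
A minimal Python sketch of what coalescing computes on the salary table above. It assumes closed, day-granularity periods, so two periods count as adjacent when one ends the day before the next starts; the function name `coalesce` is illustrative only, and this sort-and-merge loop is not one of the SQL formulations the talk compares.

```python
from datetime import date, timedelta
from itertools import groupby


def coalesce(rows):
    """Merge overlapping or adjacent closed periods that share the same value.

    rows: iterable of (value, start, end) with start <= end, both dates inclusive.
    Returns a list of (value, start, end) sorted by value, then start.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for value, group in groupby(ordered, key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            # Overlapping, or starting the day after the current period ends: extend it.
            if cur_end is not None and start <= cur_end + timedelta(days=1):
                cur_end = max(cur_end, end)
            else:
                if cur_end is not None:
                    result.append((value, cur_start, cur_end))
                cur_start, cur_end = start, end
        result.append((value, cur_start, cur_end))
    return result


history = [
    (6000, date(1995, 1, 1), date(1995, 5, 31)),
    (7000, date(1995, 6, 1), date(1995, 9, 30)),
    (7000, date(1995, 10, 1), date(1996, 1, 31)),
    (7000, date(1996, 2, 1), date(1996, 12, 31)),
]

for value, start, end in coalesce(history):
    print(value, start, end)
```

This prints the two rows of the coalesced table. The `timedelta(days=1)` test is what makes 1995-09-30 and 1995-10-01 count as adjacent; with half-open periods a plain `start <= cur_end` would be enough.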

4 Temporal Queries in Standard SQL
Little progress on temporal SQL, but SQL:2003 provides many significant extensions over SQL-2.
We now have four different ways to express coalescing (only the first was available in SQL-2):
1. Double negation, as in SQL for Smarties [Joe Celko]
2. Recursive queries
3. OLAP functions, i.e., continuous aggregates over windows
4. User-defined aggregate functions [most vendors]
Our contribution: introduce approach 3 and evaluate it against the others.

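5 SSC: Single Scan Coalescing
[Figure: input periods [t1, t3], [t2, t4], [t4, t5] and [t6, t7]; timestamp t4 occurs twice]
Step 1: Detect all individual timestamps: t1, t2, t3, t4 (twice), t5, t6, t7.
Step 2: For each timestamp, calculate the running total of start timestamps (s, count) and end timestamps (e, count) seen so far:
– before t1: (s, 0), (e, 0)
– t1: (s, 1), (e, 0)
– t2: (s, 2), (e, 0)
– t3: (s, 2), (e, 1)
– t4 (start): (s, 3), (e, 1)
– t4 (end): (s, 3), (e, 2)
– t5: (s, 3), (e, 3)
– t6: (s, 4), (e, 3)
– t7: (s, 4), (e, 4)
Step 3: Output the coalesced period $[t_i, t_j]$ where the counts are balanced just before $t_i$, i.e. $(s, m), (e, m)$ at $t_{i-1}$, and are balanced again at $t_j$, i.e. $(s, n), (e, n)$, with $t_j$ the first such timestamp after $t_i$. Here the output is [t1, t5] and [t6, t7].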
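
The three steps can be sketched in Python as below. This is a minimal sketch, assuming each period is a (start, end) pair of comparable timestamps for one attribute value (integers stand in for t1 ... t7) and that periods meeting at a shared timestamp, like t4, should merge; the name `ssc` is illustrative.

```python
def ssc(periods):
    """Single Scan Coalescing over (start, end) periods of one attribute value."""
    # Step 1: one event per timestamp. Kind 0 = start, 1 = end, so that at
    # equal timestamps the start sorts before the end.
    events = sorted([(s, 0) for s, _ in periods] + [(e, 1) for _, e in periods])

    # Steps 2 and 3 in a single pass: keep running counts of starts and ends,
    # open a period when the counts are balanced before a start, and output it
    # when they become balanced again after an end.
    starts = ends = 0
    period_start = None
    out = []
    for ts, kind in events:
        if kind == 0:
            if starts == ends:
                period_start = ts
            starts += 1
        else:
            ends += 1
            if starts == ends:
                out.append((period_start, ts))
    return out


# Periods from the figure: [t1, t3], [t2, t4], [t4, t5], [t6, t7]
print(ssc([(1, 3), (2, 4), (4, 5), (6, 7)]))  # [(1, 5), (6, 7)]
```

The sort is the only non-linear step; once the timestamps are ordered, each one is visited exactly once.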

6 Single Scan Property Proof
[Figure: the non-coalesced input [t1, t3], [t2, t4], [t4, t5], [t6, t7], annotated with the running (s, count), (e, count) totals at each timestamp, and the coalesced output [t1, t5], [t6, t7]]
There are two t4 timestamps: the start timestamp is processed first and the end timestamp second.
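
To see why the start timestamp must come first at a tie, the hypothetical helper below runs the same counting scan under either tie order on the slide's input (integers again stand in for t1 ... t7). Its `open_count` is the number of starts minus the number of ends seen so far. Processing the end at t4 first makes the counts balance there, which wrongly splits [t1, t5] in two.

```python
def coalesce_scan(periods, starts_first=True):
    """Counting scan; `starts_first` controls the order of events sharing a timestamp."""
    start_kind, end_kind = (0, 1) if starts_first else (1, 0)
    events = sorted([(s, start_kind) for s, _ in periods] +
                    [(e, end_kind) for _, e in periods])
    open_count = 0  # starts seen minus ends seen
    period_start = None
    out = []
    for ts, kind in events:
        if kind == start_kind:
            if open_count == 0:
                period_start = ts
            open_count += 1
        else:
            open_count -= 1
            if open_count == 0:
                out.append((period_start, ts))
    return out


periods = [(1, 3), (2, 4), (4, 5), (6, 7)]
print(coalesce_scan(periods, starts_first=True))   # [(1, 5), (6, 7)]
print(coalesce_scan(periods, starts_first=False))  # [(1, 4), (4, 5), (6, 7)]
```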

7 SQL:2003 Implementation of SSC
Detect all start/end timestamps with a simple projection query:
– Each timestamp is stored as a row (s, e, ts, attr_value) in a temporary table
– For a start timestamp: (1, 0, ts, attr_value)
– For an end timestamp: (0, 1, ts, attr_value)
We place these rows in a table T1 (here for employee 1001, with salary as the coalesced attribute):

    WITH T1 (Start_ts, End_ts, ts, salary) AS (
      SELECT 1, 0, TSTART, SALARY FROM EMPLOYEE_HISTORY WHERE EMPNO = 1001
      UNION ALL
      SELECT 0, 1, TEND, SALARY FROM EMPLOYEE_HISTORY WHERE EMPNO = 1001
    ),
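
The T1 projection can be mirrored in Python to show the rows it produces. This sketch assumes EMPLOYEE_HISTORY rows are (EMPNO, SALARY, TSTART, TEND) tuples; the sample data and the helper name `t1_rows` are hypothetical, with ISO date strings so that they sort chronologically.

```python
def t1_rows(history, empno):
    """Mimic the T1 CTE: one (Start_ts, End_ts, ts, salary) row per timestamp."""
    mine = [r for r in history if r[0] == empno]  # WHERE EMPNO = ..., in both branches
    starts = [(1, 0, tstart, salary) for _, salary, tstart, _ in mine]
    ends = [(0, 1, tend, salary) for _, salary, _, tend in mine]
    return starts + ends  # UNION ALL: concatenation, duplicates kept


history = [  # (EMPNO, SALARY, TSTART, TEND), hypothetical sample data
    (1001, 6000, "1995-01-01", "1995-05-31"),
    (1001, 7000, "1995-06-01", "1995-09-30"),
    (1002, 5000, "1995-01-01", "1995-12-31"),
]
for row in t1_rows(history, 1001):
    print(row)
```

Every input period yields exactly two rows, so T1 is twice the size of the filtered history; the running sums on the next slide work on these rows.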

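8 SSC (cont.)

    -- continues the WITH clause that defines T1
    T2 (Crt_Total_ts, Prv_Total_ts, ts, End_ts, Salary) AS (
      -- running count of open periods, including / excluding the current row
      SELECT SUM(Start_ts - End_ts) OVER (PARTITION BY Salary ORDER BY ts, End_ts
                                          ROWS UNBOUNDED PRECEDING),
             SUM(Start_ts - End_ts) OVER (PARTITION BY Salary ORDER BY ts, End_ts
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
             ts, End_ts, Salary
      FROM T1),
    T3 (Crt_Total_ts, ts, End_ts, Salary) AS (
      -- keep only rows where a coalesced period opens or closes;
      -- the first row of a partition has an empty frame, so its previous total is NULL
      SELECT Crt_Total_ts, ts, End_ts, Salary
      FROM T2
      WHERE Crt_Total_ts = 0 OR COALESCE(Prv_Total_ts, 0) = 0)
    SELECT Salary, Period_Start, ts AS Period_End
    FROM (SELECT Salary, ts, Crt_Total_ts,
                 -- the opening row immediately precedes each closing row
                 MIN(ts) OVER (PARTITION BY Salary ORDER BY ts, End_ts
                               ROWS 1 PRECEDING) AS Period_Start
          FROM T3) T4
    WHERE Crt_Total_ts = 0;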
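
The window-function logic can be traced in plain Python. This sketch assumes T1 rows are (Start_ts, End_ts, ts, salary) tuples and that `ssc_windows` is a hypothetical helper, not part of the query. For each salary partition it orders rows by (ts, End_ts), computes the running total including the current row (Crt_Total_ts) and excluding it (Prv_Total_ts, None on a partition's first row to model SQL NULL), keeps the rows where either total is 0, and pairs each closing row with the kept row just before it. It illustrates the query's semantics and is not a replacement for it.

```python
from itertools import groupby


def ssc_windows(t1):
    """Emulate T2, T3 and the final SELECT over T1 rows (Start_ts, End_ts, ts, salary)."""
    result = []
    # PARTITION BY salary ORDER BY ts, End_ts (starts before ends at equal ts)
    ordered = sorted(t1, key=lambda r: (r[3], r[2], r[1]))
    for salary, part in groupby(ordered, key=lambda r: r[3]):
        total = None  # sum over earlier rows; None models NULL for an empty frame
        kept = []     # rows surviving T3's filter, as (ts, Crt_Total_ts)
        for start_ts, end_ts, ts, _ in part:
            prv = total                            # Prv_Total_ts
            crt = (prv or 0) + start_ts - end_ts   # Crt_Total_ts
            total = crt
            if crt == 0 or (prv or 0) == 0:        # COALESCE(Prv_Total_ts, 0) = 0
                kept.append((ts, crt))
        # Final SELECT: a closing row takes its start from the preceding kept row.
        for i, (ts, crt) in enumerate(kept):
            if crt == 0:
                result.append((salary, kept[i - 1][0], ts))
    return result


periods = {7000: [(1, 3), (2, 4), (4, 5), (6, 7)], 6000: [(0, 2)]}
t1 = ([(1, 0, s, sal) for sal, ps in periods.items() for s, _ in ps] +
      [(0, 1, e, sal) for sal, ps in periods.items() for _, e in ps])
print(ssc_windows(t1))  # [(6000, 0, 2), (7000, 1, 5), (7000, 6, 7)]
```

Ordering by End_ts after ts is what places a start before an end at the same timestamp, since start rows carry End_ts = 0.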

9 User-Defined Aggregates (UDAs): Another Alternative
UDAs
– Coalescing stays a native SQL-based query
– No specialized implementation in an external programming language is needed
Current RDBMSs supporting UDAs
– Oracle: UDAs
– IBM, Teradata: user-defined aggregate functions implemented as table functions
– ATLaS/Stream Mill at UCLA: a DSMS (data stream management system) compatible with UDAs and SQL:2003; supports advanced streaming queries and data mining; UDAs can be defined natively in SQL (or in an external programming language)

10 Coalescing as an Oracle UDA

    SSC_UDA (T.START, T.END) : (TSTART, TEND)
      // assumes the tuples of each group arrive ordered by START
      Define table Temp (TSTART, TEND) to store the current coalesced period, initially empty;
      Insert the first tuple's START and END values into Temp;
      for every subsequent input tuple T do
        if T.START <= Temp.TEND then
          // T overlaps or meets the current coalesced period in Temp
          Update Temp.TEND with max(Temp.TEND, T.END);
        else
          // the current coalesced period ends; a new one begins
          Output the tuple in Temp, then replace it with (T.START, T.END);
        end if
      end for;
      Output the tuple in Temp;

    SELECT SSC_UDA (start, end) FROM T GROUP BY T.attr_value
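
The UDA above follows the usual initialize / iterate / terminate shape of user-defined aggregates. Below is a hypothetical Python class with that shape; the class, method and helper names are illustrative and not any vendor's actual UDA interface. Like the pseudocode, it assumes each group's tuples arrive ordered by START, which the helper `group_coalesce` guarantees by sorting.

```python
class SscUda:
    """Hypothetical coalescing aggregate with init / iterate / terminate steps."""

    def __init__(self):
        self.temp = None  # current coalesced period (TSTART, TEND); empty at first
        self.output = []

    def iterate(self, start, end):
        if self.temp is None:            # first tuple of the group
            self.temp = (start, end)
        elif start <= self.temp[1]:      # overlaps or meets the current period
            self.temp = (self.temp[0], max(self.temp[1], end))
        else:                            # current period ends; a new one begins
            self.output.append(self.temp)
            self.temp = (start, end)

    def terminate(self):
        if self.temp is not None:
            self.output.append(self.temp)
            self.temp = None
        return self.output


def group_coalesce(rows):
    """Emulate SELECT SSC_UDA(start, end) FROM T GROUP BY attr_value."""
    aggs = {}
    for value, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        aggs.setdefault(value, SscUda()).iterate(start, end)
    return {value: agg.terminate() for value, agg in aggs.items()}


rows = [("a", 1, 3), ("a", 2, 4), ("a", 4, 5), ("a", 6, 7),
        ("b", 2, 9), ("b", 3, 4)]  # ("b", 3, 4) lies inside ("b", 2, 9)
print(group_coalesce(rows))  # {'a': [(1, 5), (6, 7)], 'b': [(2, 9)]}
```

The `max` in `iterate` matters for group "b": without it, the contained period (3, 4) would shrink the coalesced end from 9 to 4.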

11 Supporting Temporal Coalescing in Current RDBMSs
All major DB vendors support
– A timestamp data type
– SQL:2003
– UDAs, or user-defined aggregate functions as table functions
A Period data type has also been defined by ANSI.
These features are enough to support SSC without any extension.

12 Experiments
Experiment environment
– PC: Athlon XP 1500+ processor at 1.3 GHz, 512 MB memory
– Database: Oracle 10g Release 1
– OS: Fedora Core 3 Linux
Data set
– A simulated employee database
– Emp (empno, salary, title, deptno, start, end)
– 17 years, 4 title values, 20 department values
– Total data size: 120 MB

13 Performance Comparison
Queries
– Coalescing query on Title
– Coalescing query on Deptno
– Coalescing query on Empno and Title
– Coalescing query on Empno and Deptno
Compared implementations
– Recursive SQL
– SQL with NOT EXISTS
– SQL:2003
– UDA

14 Performance
[Chart: query performance on a single attribute]

15 Performance (cont.)
[Chart: query performance on two attributes]

16 Scalability Test
– SQL:2003 implementation of SSC
– Two-attribute coalescing
– Tested on ¼, ½, ¾, and the whole data set

17 Conclusion
Temporal coalescing
– Critical query for temporal RDBMSs
– Efficient algorithms for coalescing are needed
– Ways to express them in SQL are also needed
We have proposed:
– A new algorithm, SSC, that efficiently supports coalescing with a single scan of the input tuples
– Approaches to implement SSC in current RDBMSs using UDAs and OLAP functions
– A performance comparison of the alternative solutions

18 Thank you!

