
QED: An Efficient Framework for Temporal Region Query Processing
Yi-Hong Chu 朱怡虹
Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University

2 Introduction
• Dense Region Query
  Data records are viewed as data points in the d-dimensional data space constructed by the d attributes.
  Locate the regions with higher density than their surroundings.

  time  Age  Salary
  1/1   20   30k
  1/5   68   32k
  2/1   43   50k
  2/8   35   70k
  3/1120
  3/2   55   20k
  …     …    …

[Figure: scatter plot of Age versus Salary (*1000) with a dense region marked]

3 Grid-based Approach
• The data space is divided into non-overlapping rectangular grids (cells).
• Density of a cell: the percentage of data points contained in this cell.
[Figure: Age versus Salary (*1000) grid with axis ticks from 0 to 100, marking a dense cell and a dense region (maximal connected dense cells)]
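The cell density defined on this slide can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names, the fixed `cell_width`, and the choice of `>=` when comparing against the threshold are assumptions, and the sample points are made up.

```python
from collections import Counter

def cell_densities(points, cell_width):
    """Map each non-empty grid cell to the fraction of points it contains."""
    points = list(points)
    if not points:
        return {}
    # A cell is identified by the integer index of the point along each axis.
    counts = Counter(tuple(int(x // cell_width) for x in p) for p in points)
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}

def dense_cells(points, cell_width, rho):
    """Cells whose density reaches the threshold rho (assumed: >= rho)."""
    return {c for c, d in cell_densities(points, cell_width).items() if d >= rho}

# Hypothetical (age, salary in thousands) points on a grid of width 10.
sample = [(21, 31), (25, 35), (29, 39), (71, 82)]
print(dense_cells(sample, 10, 0.5))  # {(2, 3)}
```

Only non-empty cells are stored, so the cost grows with the number of occupied cells rather than with the size of the whole grid.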

4 Motivation
• Previous research tends to ignore the time feature of the data and executes queries over the entire database.
• However, different dense regions may be discovered if different time periods are taken into consideration. (the density of a cell: )
• Discovering dense regions over different time intervals is crucial for users to find the interesting patterns hidden in the data.

5 Example
• Some dense regions may exist in certain time intervals but will not be discovered if all data records are taken into account.
[Figure: chart for middle-aged people. Legend: the number of customers in different time slots; the number of middle-aged people in different time slots]

6 Temporal Dense Region Query
• Dense region discovery in the constrained time intervals, e.g., each Sunday in May.
• Time slots:
  Derived by segmenting the data points with a time granularity, e.g., hour, week, month, etc.
  Allow users to specify a variety of time periods of interest.
• Problem Definition:
  Given a set of time slots and the density threshold ρ, find the dense regions in the queried time slots.
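To make the problem definition concrete, the sketch below answers a temporal query by brute force: it keeps per-slot cell counts and sums them over the queried slots. This is a baseline for illustration, not the RF-tree method of QED. The names `build_slot_counts`, `temporal_dense_cells`, and `slot_of`, the monthly granularity, and the records themselves are all assumptions.

```python
from collections import Counter, defaultdict

def build_slot_counts(records, slot_of, cell_width):
    """Group (timestamp, point) records into per-slot cell counts."""
    slots = defaultdict(Counter)
    for ts, point in records:
        cell = tuple(int(x // cell_width) for x in point)
        slots[slot_of(ts)][cell] += 1
    return slots

def temporal_dense_cells(slots, queried, rho):
    """Dense cells when only the queried time slots are considered."""
    combined = Counter()
    for s in queried:
        combined.update(slots.get(s, Counter()))  # adds the counts
    total = sum(combined.values())
    if total == 0:
        return set()
    return {c for c, n in combined.items() if n / total >= rho}

# Made-up records: timestamp is (month, day), point is (age, salary in thousands).
records = [
    ((1, 1), (20, 30)), ((1, 5), (22, 33)),
    ((2, 1), (43, 50)), ((2, 8), (45, 52)), ((2, 9), (47, 58)),
    ((3, 2), (55, 20)),
]
slots = build_slot_counts(records, slot_of=lambda ts: ts[0], cell_width=10)
print(temporal_dense_cells(slots, {1}, 0.5))        # {(2, 3)}
print(temporal_dense_cells(slots, {1, 2, 3}, 0.5))  # {(4, 5)}
```

In this toy data, cell (2, 3) is dense within month 1 but falls below the threshold once all three months are combined, which is the effect the motivation slides describe.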

7 QED Framework
• Challenge
  The queried time intervals are unknown in advance.
• QED (Querying tEmporal Dense region)
  Offline Maintaining Phase
  • Construct a summarized data structure, the RF-tree, for each time slot.
  Online Clustering Phase
  • Answer various user queries based on the RF-trees.

8 QED Framework
[Figure: framework overview. Offline maintaining phase: the time/Age/Salary records (1/1, 1/5, …; 2/1, 2/8, …; 3/1, 3/2, …) are grouped under the labels W1, W2, W3. Online query processing phase: Temporal Dense Region Query, Combine, Query Result]

9 Offline Maintaining Phase - Construct the RF-trees
• Basic Idea:
  A number of cells having nearly the same density value can be summarized by their average density value.
• Uniform Region
  A region in which the contained cells have nearly the same density value.
[Figure: grid of cell counts with a uniform region marked; values garbled in extraction ("1088 9 79")]

10 Uniform Region
• Entropy-based approach
• Entropy of a region
• Maximum entropy of a region
• Uniform region

11 Example (Uniform Region)
• Case 1:
• Case 2:
[Figure: two 3x3 grids of cell counts, labeled Region A; in source order the rows read "1002 230 082" and "333 333 333"]
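The formulas on the uniform-region slides did not survive extraction, so the following is a sketch under stated assumptions rather than the authors' definition. It assumes Shannon entropy over the share of points in each cell, a maximum entropy of log2(number of cells), and a region counted as uniform when its entropy is within a hypothetical `tolerance` of that maximum. The first grid is read as rows 10 0 2 / 2 3 0 / 0 8 2, which is also an assumption (both grids then total 27 points).

```python
import math

def region_entropy(counts):
    """Shannon entropy (bits) of the point distribution over a region's cells."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def max_entropy(n_cells):
    """Entropy reached when every cell holds the same number of points."""
    return math.log2(n_cells)

def is_uniform(counts, tolerance=0.05):
    """Assumed test: entropy within `tolerance` of the maximum entropy."""
    if sum(counts) == 0 or len(counts) == 1:
        return True  # an empty region or a single cell is trivially uniform
    return region_entropy(counts) >= (1 - tolerance) * max_entropy(len(counts))

skewed = [10, 0, 2, 2, 3, 0, 0, 8, 2]  # assumed reading of "1002 230 082"
even = [3, 3, 3, 3, 3, 3, 3, 3, 3]     # "333 333 333"
print(is_uniform(skewed), is_uniform(even))  # False True
```

With these assumptions the evenly filled grid reaches the maximum entropy for nine cells, while the skewed grid falls well below it.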

12 Construct the RF-tree
• Recursively partition the data space to find the uniform regions.
• The leaf nodes will be of two cases:
  A cell
  A uniform region
• RF (Region Feature):
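The recursive partitioning can be sketched as follows for a 2-D grid of cell counts. Several details are assumptions, since the slide does not give them: the region is halved along its longer side, the uniformity test is the entropy check with a hypothetical `tolerance`, and the stored feature is just the number of cells and points in the region (the actual RF definition is missing from the transcript).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    r0: int            # row range [r0, r1) and column range [c0, c1)
    r1: int
    c0: int
    c1: int
    n_cells: int       # assumed feature: number of cells in the region
    n_points: int      # assumed feature: number of points in the region
    children: list = field(default_factory=list)

def _is_uniform(counts, tolerance):
    total = sum(counts)
    if total == 0 or len(counts) == 1:
        return True
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h >= (1 - tolerance) * math.log2(len(counts))

def build(grid, r0, r1, c0, c1, tolerance=0.05):
    """Split a region until every leaf is a single cell or a uniform region."""
    counts = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    node = Node(r0, r1, c0, c1, len(counts), sum(counts))
    if _is_uniform(counts, tolerance):
        return node
    if (r1 - r0) >= (c1 - c0):          # halve the longer side
        mid = (r0 + r1) // 2
        parts = [(r0, mid, c0, c1), (mid, r1, c0, c1)]
    else:
        mid = (c0 + c1) // 2
        parts = [(r0, r1, c0, mid), (r0, r1, mid, c1)]
    node.children = [build(grid, *p, tolerance=tolerance) for p in parts]
    return node

def leaves(node):
    """Yield the leaf nodes (cells or uniform regions) of the tree."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

grid = [[3, 3, 10, 0], [3, 3, 0, 2], [3, 3, 2, 3], [3, 3, 0, 8]]
root = build(grid, 0, 4, 0, 4)
for leaf in leaves(root):
    print((leaf.r0, leaf.r1, leaf.c0, leaf.c1), leaf.n_cells, leaf.n_points)
```

Each leaf keeps enough to recover an average count per cell (`n_points / n_cells`), which is what a uniform region is summarized by.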

13 Online Query Processing Phase
• Step 1: Combine the RF-trees of the queried time slots.
• Step 2: Execute the query on the combined RF-tree.

14 Step 1: Combine the RF-trees
• Three cases for combining the corresponding regions in two RF-trees:
  Case 1: Both are uniform regions
  Case 2: Both are non-uniform regions
  Case 3: Only one is a uniform region

15 Step 2: Execute the query
• All leaf nodes in the combined RF-trees are examined to discover the dense cells in the data space.
• The leaf nodes will be of two cases:
  A cell
  A uniform region: compare the average density with the density threshold ρ
• The leaf nodes containing dense cells will be put into a queue for further dense region discovery.
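Once the dense cells are known, grouping them into dense regions (maximal connected dense cells) can be done with a queue-driven breadth-first search. The sketch below assumes 2-D cells given as (row, column) pairs and treats cells as connected when they share an edge; the connectivity rule and the function name are assumptions, not details from the slides.

```python
from collections import deque

def dense_regions(dense):
    """Group dense cells into maximal edge-connected regions."""
    remaining = set(dense)
    regions = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        region = {seed}
        while queue:
            r, c = queue.popleft()
            # Visit the four edge-adjacent neighbours that are still unassigned.
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    region.add(nb)
                    queue.append(nb)
        regions.append(region)
    return regions

cells = {(0, 0), (0, 1), (1, 1), (3, 3)}
for region in sorted(dense_regions(cells), key=len):
    print(sorted(region))
```

Every cell enters the queue at most once, so the search is linear in the number of dense cells.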

16 Conclusion
• The problem of temporal dense region query is explored to discover dense regions in the queried time slots.
• We propose the QED framework to execute temporal dense region queries.
• QED efficiently supports queries with different density thresholds and time slots by using the concept of time slots and the proposed RF-tree.

17 References
• Yi-Hong Chu, Kun-Ta Chuang, and Ming-Syan Chen. QED: An Efficient Framework for Temporal Dense Region Processing. In Proc. of PAKDD, 2005.
• W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997.
• D.-S. Cho, B.-H. Hong, and J. Max. Efficient Region Query Processing by Optimal Page Ordering. In Proc. of ADBIS-DASFAA, 2000.

Thank You~ Q & A
