Published by Jemimah Nichols. Modified over 8 years ago.
1 QED: An Efficient Framework for Temporal Region Query Processing
Yi-Hong Chu 朱怡虹
Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University
2 Introduction
Dense Region Query
  Data records are viewed as data points in the d-dimensional data space constructed by the d attributes.
  The goal is to locate the regions with higher density than their surroundings.
time  Age  Salary
1/1   20   30k
1/5   68   32k
2/1   43   50k
2/8   35   70k
3/1120
3/2   55   20k
...
[Figure: data points plotted over Age and Salary (*1000), with a dense region marked]
3 Grid-based Approach
The data space is divided into non-overlapping rectangular grids (cells).
Density of a cell: the percentage of data points contained in the cell.
Dense region: a maximal set of connected dense cells.
[Figure: grid over Age and Salary (*1000), axis ticks from 0 to 100, with a dense cell and a dense region marked]
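The grid-based definitions above can be sketched in a few lines of Python. This is an illustration only, not code from the presentation: the function names, the square cell of side `cell_size`, the use of `>=` against the threshold, and the rule that cells are connected when they share a face are all assumptions.

```python
from collections import Counter, deque

def cell_densities(points, cell_size):
    """Map each non-empty grid cell to the fraction of points it contains."""
    counts = Counter(tuple(int(v // cell_size) for v in p) for p in points)
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}

def dense_regions(densities, rho):
    """Group dense cells (density >= rho, an assumed rule) into maximal
    connected regions; two cells are connected if they share a face."""
    dense = {c for c, d in densities.items() if d >= rho}
    regions, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        region, queue = [], deque([start])
        while queue:  # breadth-first search over neighbouring dense cells
            cell = queue.popleft()
            region.append(cell)
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        regions.append(sorted(region))
    return regions
```

For example, with (Age, Salary) points and 10-unit cells, `dense_regions(cell_densities(points, 10), rho)` returns one sorted list of cell coordinates per dense region.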
4 Motivation
Previous research tends to ignore the time feature of the data and executes queries over the entire database.
However, different dense regions may be discovered when different time periods are taken into consideration. (The density of a cell: [formula missing])
Discovering dense regions over different time intervals is crucial for users to find the interesting patterns hidden in the data.
5 Example
Some dense regions exist only in certain time intervals and will not be discovered if all data records are taken into account.
[Figure: middle-aged people. The legend distinguishes the number of customers and the number of middle-aged people in different time slots]
6 Temporal Dense Region Query
Dense region discovery in constrained time intervals, e.g., each Sunday in May.
Time slots:
  Derived by segmenting the data points with a time granularity, e.g., hour, week, or month.
  They let users specify a variety of time periods of interest.
Problem definition: given a set of time slots and a density threshold ρ, find the dense regions in the queried time slots.
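A minimal sketch of the problem definition, assuming each record is a `(timestamp, point)` pair and that the caller supplies two hypothetical helpers, `slot_of` (timestamp to time slot) and `cell_of` (point to grid cell). It answers the query by brute force over per-slot cell counts; it does not use the RF-tree and only shows how the answer depends on the queried slots.

```python
from collections import Counter, defaultdict

def slot_counts(records, slot_of, cell_of):
    """Per time slot, count the records that fall in each grid cell."""
    counts = defaultdict(Counter)
    for timestamp, point in records:
        counts[slot_of(timestamp)][cell_of(point)] += 1
    return counts

def dense_cells(counts, queried_slots, rho):
    """Cells whose density over the queried slots is at least rho
    (density = cell count / number of records in the queried slots)."""
    merged = Counter()
    for slot in queried_slots:
        merged.update(counts.get(slot, Counter()))
    total = sum(merged.values())
    return sorted(c for c, n in merged.items() if total and n / total >= rho)
```

With monthly slots, a cell can be dense for the query `[2]` yet fall below the same threshold when all months are queried together, which is the situation described under Motivation.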
7 QED Framework
Challenge: the queried time intervals are unknown in advance.
QED (Querying tEmporal Dense region)
  Offline maintaining phase: construct a summarized data structure, the RF-tree, for each time slot.
  Online query processing phase: answer various user queries based on the RF-trees.
8 QED Framework
[Figure: overview of the QED framework. Offline maintaining phase: the time-stamped records (time, Age, Salary) are grouped into parts labelled W1, W2, W3. Online query processing phase: a temporal dense region query triggers a combine step, which yields the query result.]
9 Offline Maintaining Phase: Construct the RF-trees
Basic idea: a number of cells having nearly the same density value can be summarized by their average density value.
Uniform region: a region in which the contained cells have nearly the same density value.
[Figure: a region whose cells hold similar values; extracted digits: 1088 9 79]
10 Uniform Region
Entropy-based approach:
  Entropy of a region
  Maximum entropy of a region
  Uniform region
11 Example (Uniform Region)
Region A
Case 1:
10  0  2
 2  3  0
 0  8  2
Case 2:
 3  3  3
 3  3  3
 3  3  3
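The two cases can be compared numerically. The formulas are not preserved in this transcript, so the sketch below assumes the standard Shannon entropy over each cell's share of the region's points, a maximum of log2(number of cells), and a hypothetical ratio threshold `delta`. None of these is confirmed as the presenter's exact definition.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of the distribution of points over cells."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def is_uniform(counts, delta=0.95):
    """Assumed test: entropy is at least delta times the maximum entropy
    log2(len(counts)), which is reached when all cells hold equal counts."""
    if sum(counts) == 0:
        return True  # an empty region has the same density everywhere
    return entropy(counts) >= delta * math.log2(len(counts))

case1 = [10, 0, 2, 2, 3, 0, 0, 8, 2]  # counts vary widely across cells
case2 = [3, 3, 3, 3, 3, 3, 3, 3, 3]   # identical counts in every cell
```

Both cases hold 27 points in 9 cells, but only Case 2 reaches the maximum entropy, so only Case 2 passes `is_uniform` under these assumptions.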
12 Construct the RF-tree
Recursively partition the data space to find the uniform regions.
Each leaf node is one of two cases:
  A cell
  A uniform region
RF (Region Feature): [definition missing]
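The recursive partitioning can be sketched as follows. Several details are assumptions because the slides do not preserve them: the data space is a 2-D grid of cell counts, a region is split in half along its longer side, the uniformity test is an entropy ratio with a hypothetical threshold `delta`, and the region feature is stored as `(number of cells, total count)`, from which an average density can be derived.

```python
import math

def is_uniform(counts, delta):
    """Assumed entropy-ratio test for a uniform region."""
    total = sum(counts)
    if total == 0 or len(counts) == 1:
        return True
    h = -sum((n / total) * math.log2(n / total) for n in counts if n > 0)
    return h >= delta * math.log2(len(counts))

def build_rf_tree(grid, r0, r1, c0, c1, delta=0.95):
    """Build a node for rows r0..r1-1 and columns c0..c1-1 of a count grid."""
    counts = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    node = {"bounds": (r0, r1, c0, c1)}
    if is_uniform(counts, delta):
        # Leaf: a single cell or a uniform region, summarized by its RF.
        node["rf"] = (len(counts), sum(counts))
        return node
    if r1 - r0 >= c1 - c0:  # split the longer side at its midpoint
        mid = (r0 + r1) // 2
        node["children"] = [build_rf_tree(grid, r0, mid, c0, c1, delta),
                            build_rf_tree(grid, mid, r1, c0, c1, delta)]
    else:
        mid = (c0 + c1) // 2
        node["children"] = [build_rf_tree(grid, r0, r1, c0, mid, delta),
                            build_rf_tree(grid, r0, r1, mid, c1, delta)]
    return node

def leaves(node):
    """Return the leaf nodes of the tree from left to right."""
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

A grid with large flat areas collapses into a handful of leaves, which is the point of the summary: the tree for a 4 x 4 grid whose top half holds 3 points per cell needs one leaf for those 8 cells.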
13 Online Query Processing Phase
Step 1: Combine the RF-trees of the queried time slots.
Step 2: Execute the query on the combined RF-tree.
14 Step 1: Combine the RF-trees
There are three cases for combining the corresponding regions in two RF-trees:
  Case 1: Both are uniform regions.
  Case 2: Both are non-uniform regions.
  Case 3: Only one is a uniform region.
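One possible way to handle the three cases is sketched below. The slides name the cases but not their handling, so everything here is an assumption: nodes are dictionaries with `bounds` = (r0, r1, c0, c1), leaves carry `rf` = (number of cells, total count), inner nodes carry `children`, both trees split any given region the same way, and a uniform leaf is divided among sub-regions in proportion to their cell counts.

```python
def n_cells(bounds):
    """Number of grid cells covered by bounds = (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = bounds
    return (r1 - r0) * (c1 - c0)

def split_like(leaf, other):
    """Spread a uniform leaf's count over the other node's child regions,
    in proportion to the number of cells in each child."""
    cells, count = leaf["rf"]
    return [{"bounds": ch["bounds"],
             "rf": (n_cells(ch["bounds"]), count * n_cells(ch["bounds"]) / cells)}
            for ch in other["children"]]

def combine(a, b):
    """Combine two nodes that cover the same region of the data space."""
    a_leaf, b_leaf = "rf" in a, "rf" in b
    if a_leaf and b_leaf:   # Case 1: both uniform, so the counts add up
        return {"bounds": a["bounds"], "rf": (a["rf"][0], a["rf"][1] + b["rf"][1])}
    if a_leaf:              # Case 3: refine the uniform side, then recurse
        a_children, b_children = split_like(a, b), b["children"]
    elif b_leaf:
        a_children, b_children = a["children"], split_like(b, a)
    else:                   # Case 2: both non-uniform, recurse pairwise
        a_children, b_children = a["children"], b["children"]
    return {"bounds": a["bounds"],
            "children": [combine(x, y) for x, y in zip(a_children, b_children)]}
```

Trees for more than two queried time slots can be folded pairwise, for example with `functools.reduce(combine, trees)`.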
15 Step 2: Execute the query
All leaf nodes in the combined RF-tree are examined to discover the dense cells in the data space.
Each leaf node is one of two cases:
  A cell
  A uniform region: compare the average density with the density threshold ρ.
Leaf nodes containing dense cells are put into a queue for further dense region discovery.
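The leaf examination can be sketched as below, under an assumed node layout: leaves carry `bounds` and `rf` = (number of cells, total count), and inner nodes carry `children`. The average density of a leaf is taken as its count per cell divided by the total count in the tree, and `>=` is used against ρ; both choices are assumptions. A single cell is simply a leaf with one cell.

```python
from collections import deque

def dense_leaves(root, rho):
    """Queue the bounds of every leaf whose average cell density >= rho."""
    def walk(node):
        if "rf" in node:
            yield node
        else:
            for child in node["children"]:
                yield from walk(child)

    all_leaves = list(walk(root))
    total = sum(leaf["rf"][1] for leaf in all_leaves)
    queue = deque()
    for leaf in all_leaves:
        cells, count = leaf["rf"]
        # Average density per cell, as a fraction of all queried points.
        if total and count / cells / total >= rho:
            queue.append(leaf["bounds"])
    return queue
```

The returned queue holds the regions made of dense cells; grouping adjacent ones into maximal connected dense regions would be the follow-up step.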
16 Conclusion
The problem of temporal dense region query is explored to discover dense regions in the queried time slots.
The QED framework is proposed to execute temporal dense region queries.
QED efficiently supports queries with different density thresholds and time slots by using the concept of time slots and the proposed RF-tree.
17 References
Yi-Hong Chu, Kun-Ta Chuang, and Ming-Syan Chen. QED: An Efficient Framework for Temporal Region Query Processing. In Proc. of PAKDD, 2005.
W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997.
D.-S. Cho, B.-H. Hong, and J. Max. Efficient Region Query Processing by Optimal Page Ordering. In Proc. of ADBIS-DASFAA, 2000.
18 Thank You
Q & A