QED : An Efficient Framework for Temporal Region Query Processing Yi-Hong Chu 朱怡虹 Network Database Laboratory Dept. of Electrical Engineering National.


QED: An Efficient Framework for Temporal Region Query Processing
Yi-Hong Chu 朱怡虹
Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University

2 Introduction
• Dense Region Query
  - Data records are viewed as data points in the d-dimensional data space constructed by the d attributes.
  - Locate the regions with higher density than their surroundings.

time    Age    Salary
1/1     20     30k
1/5     68     32k
2/1     43     50k
2/8     35     70k
3/1120
3/2     55     20k
...     ...    ...

[Figure: Age vs. Salary (*1000) plot with a dense region marked]

3 Grid-based Approach
• The data space is divided into non-overlapping rectangular grids (cells).
• Density of a cell: the percentage of data points contained in this cell.
[Figure: Age vs. Salary (*1000) grid; labels: dense cell, dense region (maximal connected dense cells)]
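The grid-based density computation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the cell width of 10, the sample points, and the choice of `>=` when comparing against the threshold are all assumptions made for the example.

```python
from collections import Counter


def cell_densities(points, cell_size):
    """Map each grid cell (a tuple of integer indices) to the fraction of points it holds."""
    counts = Counter(
        tuple(int(x // w) for x, w in zip(p, cell_size)) for p in points
    )
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


def dense_cells(densities, rho):
    """Cells whose density reaches the threshold rho (assumed comparison: >=)."""
    return {cell for cell, d in densities.items() if d >= rho}


# Hypothetical (age, salary in thousands) points, with 10 x 10 cells.
points = [(22, 31), (24, 33), (27, 38), (43, 50), (68, 32)]
densities = cell_densities(points, cell_size=(10, 10))
dense = dense_cells(densities, rho=0.5)  # only cell (2, 3) holds 3 of the 5 points
```

A dictionary keyed by cell index keeps the sketch sparse: empty cells are never stored, which matters when the number of attributes grows.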

4 Motivation
• Previous research tends to ignore the time feature of the data and executes queries over the entire database.
• However, different dense regions may be discovered if different time periods are taken into consideration. (the density of a cell: )
• Discovering dense regions over different time intervals is crucial for users to find the interesting patterns hidden in the data.

5 Example
• Some dense regions exist only in certain time intervals and will not be discovered if all data records are taken into account.
• Middle-aged people:
[Figure legend: the number of customers in different time slots; the number of middle-aged people in different time slots]

6 Temporal Dense Region Query
• Dense region discovery in the constrained time intervals.
  - E.g., each Sunday in May,
• Time slots:
  - Derived by segmenting the data points with a time granularity, e.g., hour, week, month.
  - Let users specify a variety of time periods of interest.
• Problem definition: given a set of time slots and the density threshold ρ, find the dense regions in the queried time slots.
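A brute-force version of the problem definition can be sketched as below, without any RF-tree. It assumes that the density of a cell over a query is the cell's share of the points falling in the queried slots. The function names, the month granularity, and the last two records are hypothetical; the first four records reuse rows from the Introduction table.

```python
from collections import Counter, defaultdict


def build_slot_counts(records, slot_of, cell_of):
    """Group per-cell point counts by time slot."""
    slots = defaultdict(Counter)
    for t, point in records:
        slots[slot_of(t)][cell_of(point)] += 1
    return slots


def temporal_dense_cells(slots, queried, rho):
    """Dense cells when only the points of the queried time slots are counted."""
    combined = Counter()
    for s in queried:
        combined.update(slots.get(s, Counter()))
    total = sum(combined.values())
    if total == 0:
        return set()
    return {cell for cell, n in combined.items() if n / total >= rho}


# ((month, day), (age, salary in thousands)); the last two rows are made up.
records = [
    ((1, 1), (20, 30)), ((1, 5), (68, 32)),
    ((2, 1), (43, 50)), ((2, 8), (35, 70)),
    ((2, 9), (44, 51)), ((2, 20), (41, 58)),
]
slots = build_slot_counts(
    records,
    slot_of=lambda t: t[0],                       # one slot per month
    cell_of=lambda p: (p[0] // 10, p[1] // 10),   # 10 x 10 cells
)
february_only = temporal_dense_cells(slots, queried=[2], rho=0.6)
whole_period = temporal_dense_cells(slots, queried=[1, 2], rho=0.6)
```

With these records, cell (4, 5) is dense when only February is queried (3 of 4 points) but not over both months (3 of 6 points), which is the effect the Example slide describes.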

7 QED Framework
• Challenge: the queried time intervals are unknown in advance.
• QED (Querying tEmporal Dense region)
  - Offline Maintaining Phase: construct a summarized data structure, the RF-tree, for each time slot.
  - Online Clustering Phase: answer various user queries based on the RF-trees.

8 QED Framework
[Figure: framework overview. Offline maintaining phase: the time/Age/Salary records are split into W1, W2, W3. Online query processing phase: Combine, Temporal Dense Region Query, Query Result.]

9 Offline Maintaining Phase - Construct the RF-trees
• Basic idea: a number of cells having nearly the same density value can be summarized by their average density value.
• Uniform region: a region in which the contained cells have nearly the same density value.

10 Uniform Region
• Entropy-based approach
  - Entropy of a region
  - Maximum entropy of a region
  - Uniform region

11 Example (Uniform Region)
• Case 1:
• Case 2:
[Figure: Region A]
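The formulas on the Uniform Region slides did not survive in this transcript, so the following is only one plausible reading of the entropy-based test. It assumes Shannon entropy over each cell's share of the region's points, a maximum entropy of log2(n) for n cells, and a hypothetical tolerance parameter; none of these details is confirmed by the source.

```python
import math


def region_entropy(counts):
    """Shannon entropy (in bits) of the distribution of points over a region's cells."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def max_entropy(num_cells):
    """Entropy reached when every cell holds the same share of the points."""
    return math.log2(num_cells)


def is_uniform(counts, tolerance=0.05):
    """Assumed test: the region is uniform when its entropy is close to the maximum."""
    if sum(counts) == 0:
        return True  # an empty region: every cell has density 0
    return max_entropy(len(counts)) - region_entropy(counts) <= tolerance


even_region = is_uniform([10, 10, 10, 10])    # entropy equals log2(4)
skewed_region = is_uniform([37, 1, 1, 1])     # entropy far below log2(4)
```

The idea behind an entropy test is that entropy is maximal exactly when all cells hold equal counts, so the gap to the maximum measures how far a region is from uniform.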

12 Construct the RF-tree
• Recursively partition the data space to find the uniform regions.
• The leaf nodes will be of two cases:
  - A cell
  - A uniform region
• RF (Region Feature):
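The recursive partitioning can be sketched on a small 2-D grid of per-cell point counts. The slide gives neither the split rule nor the contents of the RF, so this sketch makes three assumptions: a region is halved along its longer side, uniformity is tested with an entropy gap, and each node stores the number of cells and the number of points. The names `Node`, `build`, and `leaves` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    bounds: tuple          # (row_lo, row_hi, col_lo, col_hi), half-open cell indices
    num_cells: int         # assumed part of the region feature
    num_points: int        # assumed part of the region feature
    children: tuple = ()   # empty for leaf nodes


def _is_uniform(counts, tol):
    total = sum(counts)
    if total == 0:
        return True
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return math.log2(len(counts)) - h <= tol


def build(grid, bounds=None, tol=0.05):
    """Split until every leaf is either a single cell or a uniform region."""
    if bounds is None:
        bounds = (0, len(grid), 0, len(grid[0]))
    r0, r1, c0, c1 = bounds
    counts = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    node = Node(bounds, len(counts), sum(counts))
    if len(counts) == 1 or _is_uniform(counts, tol):
        return node
    if r1 - r0 >= c1 - c0:                 # halve the longer side
        mid = (r0 + r1) // 2
        parts = [(r0, mid, c0, c1), (mid, r1, c0, c1)]
    else:
        mid = (c0 + c1) // 2
        parts = [(r0, r1, c0, mid), (r0, r1, mid, c1)]
    node.children = tuple(build(grid, b, tol) for b in parts)
    return node


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# Hypothetical point counts per cell for one time slot.
grid = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 0, 0],
]
tree = build(grid)
leaf_nodes = leaves(tree)
```

For this grid the 16 cells are summarized by 6 leaves: the 2 x 2 block of fives and the empty blocks each collapse into one uniform leaf, while the cells holding 9 and 1 remain individual leaves.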

13 Online Query Processing Phase
• Step 1: Combine the RF-trees of the queried time slots.
• Step 2: Execute the query on the combined RF-tree.

14 Step 1: Combine the RF-trees
• Three cases for combining the corresponding regions in two RF-trees:
  - Case 1: Both are uniform regions
  - Case 2: Both are non-uniform regions
  - Case 3: Only one is a uniform region
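The slide names the three cases but not how each is handled, so the sketch below is a guess at one workable scheme rather than the authors' procedure. It assumes both trees were built with the same deterministic binary split, so that corresponding nodes cover the same region. It uses a hypothetical tuple encoding: `("leaf", num_cells, num_points)` for a cell or uniform region, and `("node", left, right)` for a partitioned region. In case 3 the uniform region's points are spread evenly over the cells of the other tree's sub-regions.

```python
def spread(tree, points_per_cell):
    """Add a uniform number of points per cell onto every leaf of tree."""
    if tree[0] == "leaf":
        _, num_cells, num_points = tree
        return ("leaf", num_cells, num_points + points_per_cell * num_cells)
    _, left, right = tree
    return ("node", spread(left, points_per_cell), spread(right, points_per_cell))


def combine(a, b):
    """Combine two trees that cover the same region (assumed identical split rule)."""
    if a[0] == "leaf" and b[0] == "leaf":
        # Case 1: both uniform, so the counts simply add up.
        return ("leaf", a[1], a[2] + b[2])
    if a[0] == "node" and b[0] == "node":
        # Case 2: both non-uniform, so combine the corresponding sub-regions.
        return ("node", combine(a[1], b[1]), combine(a[2], b[2]))
    # Case 3: only one is uniform; push its average down into the other tree.
    uniform, split = (a, b) if a[0] == "leaf" else (b, a)
    return spread(split, uniform[2] / uniform[1])


# Hypothetical trees over the same 4-cell region for two time slots.
slot_w1 = ("leaf", 4, 8)                            # uniform: 2 points per cell
slot_w2 = ("node", ("leaf", 2, 6), ("leaf", 2, 0))  # non-uniform
combined = combine(slot_w1, slot_w2)
```

Because each slot's tree is built offline, a query over any set of slots only has to fold `combine` over the trees involved, without rescanning the raw records.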

15 Step 2: Execute the query
• All leaf nodes in the combined RF-tree are examined to discover the dense cells in the data space.
• The leaf nodes will be of two cases:
  - A cell
  - A uniform region: compare the average density with the density threshold ρ
• The leaf nodes containing dense cells are put into a queue for further dense region discovery.
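The leaf examination and the queue-based region discovery can be sketched as follows. The leaf encoding (half-open cell bounds plus a point count), the 4-neighbour notion of connectivity, and the sample leaves are assumptions made for this example; the slides do not specify them.

```python
from collections import deque


def dense_cells_from_leaves(leaves, total_points, rho):
    """leaves: iterable of ((r0, r1, c0, c1), num_points) with half-open cell bounds."""
    dense = set()
    for (r0, r1, c0, c1), num_points in leaves:
        num_cells = (r1 - r0) * (c1 - c0)
        # Average per-cell density; for a single cell this is its own density.
        if total_points and (num_points / num_cells) / total_points >= rho:
            dense.update((r, c) for r in range(r0, r1) for c in range(c0, c1))
    return dense


def dense_regions(dense):
    """Group dense cells into maximal 4-connected regions with a breadth-first queue."""
    regions, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, region = deque([start]), set()
        while queue:
            r, c = queue.popleft()
            region.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        regions.append(region)
    return regions


# Hypothetical leaves of a combined tree over a 4 x 4 grid holding 30 points.
leaves = [
    ((0, 2, 0, 2), 20), ((0, 2, 2, 4), 0), ((2, 4, 0, 2), 0),
    ((2, 3, 2, 3), 9), ((2, 3, 3, 4), 1), ((3, 4, 2, 4), 0),
]
dense = dense_cells_from_leaves(leaves, total_points=30, rho=0.1)
regions = dense_regions(dense)
```

Here the uniform 2 x 2 leaf contributes four dense cells at once, and the cell at (2, 2) forms a second region because it touches the first one only diagonally.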

16 Conclusion
• The problem of temporal dense region query is explored to discover dense regions in the queried time slots.
• We also propose the QED framework to execute temporal dense region queries.
• QED is advantageous in that various queries with different density thresholds and time slots can be efficiently supported by using the concept of time slots and the proposed RF-tree.

17 References
• Yi-Hong Chu, Kun-Ta Chuang, and Ming-Syan Chen. QED: An Efficient Framework for Temporal Dense Region Processing. In Proc. of PAKDD.
• W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997.
• D.-S. Cho, B.-H. Hong, and J. Max. Efficient Region Query Processing by Optimal Page Ordering. In Proc. of ADBIS-DASFAA, 2000.

Thank You~ Q & A