Published by Moris Cain. Modified over 9 years ago.
1
Query Processing of Massive Trajectory Data based on MapReduce
Qiang Ma, Bin Yang (Fudan University)
Weining Qian, Aoying Zhou (ECNU)
Presented By: Xin Cao (Aalborg University)
2
Outline
Introduction
Preliminary
Trajectory Processing
– Execution Overview
– Storage
– Indexing Methods
– Query Processing
Experimental Study
Future Works
3
Introduction
Location-based services are playing important roles.
Large volumes of trajectory data in diverse formats have been accumulated.
Traditional centralized technologies may not cope with the large number of trajectories.
Cloud computing infrastructure, such as GFS and MapReduce, provides a promising paradigm for handling the explosion of trajectory data.
4
Challenge
Huge volume, frequent updates, rapid growth.
Trajectory data is “continuous”, i.e. ordered sequentially.
The data is highly skewed.
MapReduce is good at offline data analysis, but not efficient for online queries.
5
Our Contributions
Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects.
Study which query processing methods are appropriate for large clusters.
Provide two scalable indexing methods to make query processing efficient.
6
Preliminary
Data Model: line segment model
– A trajectory is a polyline in three-dimensional space.
Query Types
– Spatio-temporal Range Query: $Q(E_s, E_t) \rightarrow \{S_k\}$
– Trajectory-based Query: $Q(O, E_t) \rightarrow \{S_k\}$
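The two query types can be illustrated with a minimal Python sketch. The slides do not define the symbols, so the reading here is an assumption: E_s is taken as a spatial extent (an x interval and a y interval), E_t as a time interval, O as an object identifier, and {S_k} as the set of matching segments. The `Segment` class and both functions are hypothetical names, and the range test only checks bounding-box overlap, which is a conservative filter rather than an exact segment/box intersection.

```python
from dataclasses import dataclass
from typing import List, Tuple

Extent = Tuple[float, float]  # closed interval (low, high); assumed representation


@dataclass(frozen=True)
class Segment:
    """One line segment of a trajectory: two (x, y, t) samples of one object."""
    obj_id: str
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float


def overlaps(a: float, b: float, ext: Extent) -> bool:
    """True if the interval spanned by a and b overlaps the closed extent."""
    lo, hi = min(a, b), max(a, b)
    return lo <= ext[1] and ext[0] <= hi


def range_query(segments: List[Segment], x_ext: Extent, y_ext: Extent,
                t_ext: Extent) -> List[Segment]:
    """Q(E_s, E_t): segments whose bounding box overlaps the query box."""
    return [s for s in segments
            if overlaps(s.x1, s.x2, x_ext)
            and overlaps(s.y1, s.y2, y_ext)
            and overlaps(s.t1, s.t2, t_ext)]


def trajectory_query(segments: List[Segment], obj_id: str,
                     t_ext: Extent) -> List[Segment]:
    """Q(O, E_t): segments of one object that overlap the time extent."""
    return [s for s in segments
            if s.obj_id == obj_id and overlaps(s.t1, s.t2, t_ext)]
```

Both functions scan every segment; the storage layout and indexes described on the following slides exist to avoid exactly that full scan.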
7
Trajectory Processing: Execution Overview
8
Storage
Data are grouped by key and organized in data chunks in GFS-style storage.
The whole data set is divided into several parts; each part is called a partition and is assigned to one data chunk for storage.
Each trajectory is assigned to at least one partition according to its spatio-temporal information.
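As a rough illustration of assigning data to "at least one partition", the sketch below uses a static uniform grid over (x, y, t). This grid is an assumption made for the example, not necessarily the partitioning the authors use; the function names and cell sizes are hypothetical. A segment is assigned to every grid cell that its bounding box touches, so a segment crossing a cell boundary lands in more than one partition.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]   # (x, y, t)
PartitionKey = Tuple[int, int, int]  # grid cell index along x, y and t


def cell_index(value: float, cell_size: float) -> int:
    """Index of the grid cell that contains value along one axis."""
    return math.floor(value / cell_size)


def partitions_for_segment(p: Point, q: Point,
                           cell: Tuple[float, float, float]) -> List[PartitionKey]:
    """All grid cells overlapped by the bounding box of segment p-q."""
    ranges = []
    for a, b, size in zip(p, q, cell):
        lo = cell_index(min(a, b), size)
        hi = cell_index(max(a, b), size)
        ranges.append(range(lo, hi + 1))
    return list(product(*ranges))
```

A segment from (9, 1, 1) to (11, 2, 2) with 10-unit cells crosses the x boundary at 10, so it is assigned to two partitions. Using the bounding box can over-assign a diagonal segment to cells it does not actually pass through, which costs storage but never loses a result.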
9
Storage
A good spatio-temporal partitioning keeps the amount of data per chunk fairly uniform.
Static partitioning strategies are easy to control and suitable for distributed scheduling, but may lead to load imbalance.
Dynamic strategies can resolve load imbalance, but re-splitting data can cause large volumes of data to migrate between distant nodes in the cluster.
Appropriate strategies should be trained.
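One simple way to quantify how uniform the data per chunk is would be the standard deviation of per-chunk record counts. The sketch below is an assumption about how such a measure could be computed (population standard deviation, one chunk per partition); the slides do not state the exact formula, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Hashable, Iterable, Sequence


def chunk_sizes(partition_keys: Iterable[Hashable]) -> Counter:
    """Count how many records fall into each partition (one chunk each)."""
    return Counter(partition_keys)


def load_spread(sizes: Sequence[float]) -> float:
    """Population standard deviation of per-chunk record counts."""
    return pstdev(sizes)


def relative_spread(sizes: Sequence[float]) -> float:
    """Standard deviation divided by the mean, comparable across data sizes."""
    return pstdev(sizes) / mean(sizes)
```

A perfectly uniform partitioning gives a spread of zero; skewed data under a static grid pushes the value up, which is the imbalance that dynamic strategies try to remove.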
10
PMI (Partition based Multilevel Index)
Aims to speed up spatio-temporal range queries.
Generate all candidate partitions by invoking the space partitioning strategy.
Store together as key/value.
– Each data chunk only contains trajectory segments that belong to the same partition.
A multilevel index for each node can be built locally (using traditional centralized methods).
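The partition-keyed layout can be sketched in plain Python that imitates a map phase and a shuffle. This is only an illustration under stated assumptions: records are simplified to single (obj_id, x, y, t) samples rather than full segments, the partitioning is a hypothetical uniform grid, and the local multilevel index inside each chunk is not shown (each candidate chunk is simply scanned).

```python
import math
from collections import defaultdict

CELL = (10.0, 10.0, 60.0)  # assumed grid cell size along x, y and t


def partition_key(x, y, t, cell=CELL):
    """Grid cell that contains the sample (x, y, t)."""
    return (math.floor(x / cell[0]),
            math.floor(y / cell[1]),
            math.floor(t / cell[2]))


def map_phase(records):
    """Emit (partition key, record) pairs, as a map function would."""
    for rec in records:
        _, x, y, t = rec
        yield partition_key(x, y, t), rec


def shuffle(pairs):
    """Group records by key so each chunk holds exactly one partition."""
    chunks = defaultdict(list)
    for key, rec in pairs:
        chunks[key].append(rec)
    return dict(chunks)


def candidate_partitions(x_ext, y_ext, t_ext, cell=CELL):
    """All grid cells that overlap the query box."""
    lo = partition_key(x_ext[0], y_ext[0], t_ext[0], cell)
    hi = partition_key(x_ext[1], y_ext[1], t_ext[1], cell)
    return [(i, j, k)
            for i in range(lo[0], hi[0] + 1)
            for j in range(lo[1], hi[1] + 1)
            for k in range(lo[2], hi[2] + 1)]


def range_query(chunks, x_ext, y_ext, t_ext):
    """Scan only the candidate chunks, then filter records exactly."""
    hits = []
    for key in candidate_partitions(x_ext, y_ext, t_ext):
        for rec in chunks.get(key, []):
            _, x, y, t = rec
            if (x_ext[0] <= x <= x_ext[1] and y_ext[0] <= y <= y_ext[1]
                    and t_ext[0] <= t <= t_ext[1]):
                hits.append(rec)
    return hits
```

The point of the layout is that a range query touches only the chunks whose partition overlaps the query box; in a real deployment the per-chunk scan would be replaced by the locally built index.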
11
OII (Object Inverted Index)
Aims to speed up trajectory-based queries.
Collect all historical trajectories of each object.
Store together as key/value.
– Access according to key (object identifier).
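An object-keyed index of this kind can be sketched as a dictionary from object identifier to that object's time-ordered samples. The record layout (obj_id, x, y, t), the function names, and the use of binary search on time are assumptions made for illustration; the slides only state that the history is stored under the object identifier as key.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_oii(records):
    """Map each object id to its samples as (t, x, y), sorted by time."""
    index = defaultdict(list)
    for obj_id, x, y, t in records:
        index[obj_id].append((t, x, y))
    for samples in index.values():
        samples.sort()
    return dict(index)


def trajectory_query(index, obj_id, t_lo, t_hi):
    """Samples of one object with t_lo <= t <= t_hi, found by binary search."""
    samples = index.get(obj_id, [])
    # (t_lo,) sorts before any (t_lo, x, y), so this finds the first t >= t_lo.
    lo = bisect_left(samples, (t_lo,))
    # Infinite x and y sort after any real sample taken at time t_hi.
    hi = bisect_right(samples, (t_hi, float("inf"), float("inf")))
    return samples[lo:hi]
```

A lookup costs one dictionary access plus two binary searches, independent of how many other objects are stored, which is why this layout suits queries that start from an object identifier.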
12
Data Insertion
13
Query Processing
Trajectory-based Queries
– Given any object ID, the system can locate the object's trajectory according to OII.
Range Queries
14
Experimental Study
Settings
– Hadoop version 0.19.0
– 8 PC nodes: Ubuntu Linux version 8.04, Pentium IV 1.7GHz CPU, 512M memory
– Java SDK 1.42
– Experiment data: Network-based Generator
15
Experiments – Load Balance
[Figures: Standard Deviation of Partitioning; Load Balance of PRADASE]
16
Experiments – Data Importing and Index Creating
[Figures: Data Importing with PMI; Data Importing with OII]
17
Experiments – Query Processing
[Figures: Spatio-temporal Range Query Processing with PMI; Trajectory-based Query Processing with OII]
18
Future Works
More heuristic partitioning methods.
Reducing data migration between nodes.
Efficient real-time query processing on Cloud infrastructure.
19
Thanks!