SpatialHadoop: A MapReduce Framework for Spatial Data


1 SpatialHadoop: A MapReduce Framework for Spatial Data
Ahmed Eldawy and Mohamed F. Mokbel ICDE 2015 Presented by: Tin K. Vu Oct 20, 2016

2 Introduction Related works SpatialHadoop Architecture Experiments Conclusion & comments Future work

3 Introduction

4 Big Data

5 Spatial Data Satellites Smartphone Medical devices

6 Big Spatial Data

7 Big Spatial Data

8 Big Spatial Data? Volume: data from hundreds of GB to TB. Velocity: fast data. Variety: data from different sources.

9 Big Spatial Data? Distributed systems. High scalability. Support spatial queries.

10 Related works

11 Related works Parallel-Secondo, MD-HBase, Hadoop-GIS.
Hadoop is employed as a black box.

12 SpatialHadoop Architecture

13 SpatialHadoop Full-fledged MapReduce framework. Native support for spatial data. Injects spatial data awareness in Hadoop.

14 Architecture

15 Storage layer

16 This slide is copied from a presentation of Prof. Ahmed in ICDE2015

17 Index structures: Grid, R-Tree, R+-Tree
The indexing process is done with MapReduce.

18 Index Building: Grid Applies to uniform data.
Number of partitions = file size / block capacity
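
A minimal sketch of the grid rule on this slide, in Python. The partition count follows the slide's formula (rounded up); the near-square grid layout and the helper names `grid_partition_count`, `grid_dimensions`, and `grid_cell` are assumptions made for illustration, not SpatialHadoop's API.

```python
import math

def grid_partition_count(file_size_bytes, block_capacity_bytes):
    """Number of partitions = file size / block capacity, rounded up."""
    return math.ceil(file_size_bytes / block_capacity_bytes)

def grid_dimensions(num_partitions):
    """Pick a near-square grid with at least num_partitions cells (assumed layout)."""
    cols = math.ceil(math.sqrt(num_partitions))
    rows = math.ceil(num_partitions / cols)
    return cols, rows

def grid_cell(x, y, mbr, cols, rows):
    """Return the (column, row) of the uniform grid cell holding point (x, y).

    mbr is the bounding rectangle of the whole file as (x_min, y_min, x_max, y_max).
    Points on the upper boundary are clamped into the last cell.
    """
    x_min, y_min, x_max, y_max = mbr
    col = min(int((x - x_min) / (x_max - x_min) * cols), cols - 1)
    row = min(int((y - y_min) / (y_max - y_min) * rows), rows - 1)
    return col, row

# Example: a 1 GB file with 64 MB blocks gives 16 partitions, laid out as 4 x 4.
n = grid_partition_count(1024 * 2**20, 64 * 2**20)
cols, rows = grid_dimensions(n)
```

Because the cells all have the same size, this only balances the load when the data is roughly uniform, which is why the slide restricts the grid index to uniform data.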

19 Index Building: R-Tree
Applies to skewed data. Partition process: Step 1: Sampling to find partition boundaries.

20 Index Building: R-Tree
Applies to skewed data. Partition process: Step 1: Sampling to find partition boundaries. Step 2: Scan the input file and insert each record into its partition. Step 3: Local indexing for each partition.
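
The three steps above can be sketched in Python for point data. This is a simplified stand-in, not the SpatialHadoop implementation: boundaries come from cutting the sorted sample into vertical strips and then tiles, and a sorted list plays the role of the local index. All function names here are hypothetical.

```python
import bisect
import math

def sample_boundaries(sample, num_partitions):
    """Step 1: derive partition boundaries from a sample of (x, y) points.

    Returns (x_cuts, y_cuts): x_cuts separates vertical strips, and
    y_cuts[s] separates the tiles inside strip s.
    """
    strips = math.ceil(math.sqrt(num_partitions))
    tiles = math.ceil(num_partitions / strips)
    by_x = sorted(sample)
    strip_size = math.ceil(len(by_x) / strips)
    x_cuts, y_cuts = [], []
    for s in range(strips):
        strip = by_x[s * strip_size:(s + 1) * strip_size]
        if not strip:
            break  # sample too small to fill every strip
        if s > 0:
            x_cuts.append(strip[0][0])
        ys = sorted(p[1] for p in strip)
        tile_size = math.ceil(len(ys) / tiles)
        y_cuts.append([ys[t * tile_size] for t in range(1, tiles)
                       if t * tile_size < len(ys)])
    return x_cuts, y_cuts

def assign_partition(point, x_cuts, y_cuts):
    """Find the (strip, tile) partition that a point falls into."""
    s = bisect.bisect_right(x_cuts, point[0])
    t = bisect.bisect_right(y_cuts[s], point[1])
    return s, t

def partition_records(records, x_cuts, y_cuts):
    """Step 2: scan the input and insert each record into its partition."""
    partitions = {}
    for p in records:
        partitions.setdefault(assign_partition(p, x_cuts, y_cuts), []).append(p)
    return partitions

def build_local_indexes(partitions):
    """Step 3: build a local index per partition (here: points sorted by x)."""
    return {pid: sorted(points) for pid, points in partitions.items()}
```

Since the cuts are taken at sample quantiles rather than at fixed intervals, dense regions receive more, smaller partitions, which is what makes this approach suitable for skewed data.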

21 MapReduce layer MapReduce in Hadoop

22 MapReduce layer MapReduce in SpatialHadoop

23 MapReduce layer Improvements compared to Hadoop?
Spatial File Splitter: exploits the global index by pruning non-relevant partitions. Spatial Record Reader: exploits local indexes by accessing records more efficiently.

24 Operations layer Basic operations: range query, kNN, spatial join.
Computational geometry operations: polygon union, skyline, convex hull, farthest/closest pair. Other operations could be added.

25 Range query

26 Range query SpatialFileSplitter prunes blocks outside the query range.

27 Range query SpatialFileSplitter prunes blocks outside the query range.
SpatialRecordReader passes local indexes to the map function.

28 Range query SpatialFileSplitter prunes blocks outside the query range.
SpatialRecordReader passes local indexes to the map function. Map function selects records in range.
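
The range query flow on these slides can be sketched as follows. This is an illustrative, single-process Python model: the global index is assumed to be a dictionary from partition id to bounding rectangle, and each local index is assumed to be a list of points sorted by x. The function names mirror the slide terms but are not the SpatialHadoop classes themselves.

```python
import bisect

INF = float("inf")

def overlaps(a, b):
    """True if rectangles a and b, each (x1, y1, x2, y2), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_file_splitter(global_index, query):
    """Prune partitions whose bounding rectangle lies outside the query range."""
    return [pid for pid, mbr in global_index.items() if overlaps(mbr, query)]

def range_map(local_index, query):
    """Map function: select the records in range.

    The x-sorted local index narrows the scan to the query's x interval
    before the y coordinate is checked.
    """
    x1, y1, x2, y2 = query
    lo = bisect.bisect_left(local_index, (x1, -INF))
    hi = bisect.bisect_right(local_index, (x2, INF))
    return [p for p in local_index[lo:hi] if y1 <= p[1] <= y2]

def range_query(global_index, partitions, query):
    """Run the map function only on the partitions that survive pruning."""
    results = []
    for pid in spatial_file_splitter(global_index, query):
        results.extend(range_map(partitions[pid], query))
    return results

global_index = {"a": (0, 0, 5, 5), "b": (5, 0, 10, 5)}
partitions = {"a": [(1, 1), (4, 4)], "b": [(6, 1), (9, 4)]}
hits = range_query(global_index, partitions, (0, 0, 2, 2))
```

In the usage example only partition "a" is read, because the rectangle of "b" does not touch the query range.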

29 kNN

30 kNN SpatialFileSplitter selects the block that contains the query point.

31 kNN SpatialFileSplitter selects the block that contains the query point. Map function performs kNN in the selected block.

32 kNN SpatialFileSplitter selects the block that contains the query point. Map function performs kNN in the selected block. Check result.

33 kNN SpatialFileSplitter selects the block that contains the query point. Map function performs kNN in the selected block. Check result.

34 kNN SpatialFileSplitter selects the block that contains the query point. Map function performs kNN in the selected block. Check result. Revise result.
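
A sketch of the kNN steps listed above, under stated assumptions. "Check result" is read here as testing whether the circle centered at the query point, with radius equal to the distance to the k-th neighbor found, overlaps any other block; "Revise result" then reconsiders the points in those blocks. The data layout (dictionaries of rectangles and point lists) and all function names are hypothetical, and the revision step scans whole overlapping blocks rather than running a second indexed query.

```python
import heapq
import math

def contains(mbr, p):
    """True if point p lies inside rectangle mbr = (x1, y1, x2, y2)."""
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]

def knn_in(points, q, k):
    """The k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

def circle_overlaps(mbr, q, radius):
    """True if the circle around q with the given radius reaches rectangle mbr."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return math.hypot(dx, dy) <= radius

def knn(global_index, partitions, q, k):
    # Select the block that contains the query point and answer kNN inside it.
    home = next(pid for pid, mbr in global_index.items() if contains(mbr, q))
    answer = knn_in(partitions[home], q, k)
    # Check result: does the circle through the k-th neighbor leave the block?
    radius = math.dist(answer[-1], q) if len(answer) == k else math.inf
    others = [pid for pid, mbr in global_index.items()
              if pid != home and circle_overlaps(mbr, q, radius)]
    if not others:
        return answer
    # Revise result: merge in candidates from the overlapping blocks.
    candidates = list(answer)
    for pid in others:
        candidates.extend(partitions[pid])
    return knn_in(candidates, q, k)

global_index = {"a": (0, 0, 5, 5), "b": (5, 0, 10, 5)}
partitions = {"a": [(1, 1), (4.5, 2.5)], "b": [(5.5, 2.5), (9, 4)]}
neighbors = knn(global_index, partitions, (4.8, 2.5), 2)
```

In the usage example the query point sits near the border of block "a", so the first answer is revised to include a closer point from block "b".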

35 Language layer: Pigeon
Hides the complexity of the system with a high-level language. Extends Pig Latin with OGC-compliant primitives: spatial data types (e.g., Polygon), basic operations (e.g., Area), spatial predicates (e.g., Touches), spatial analysis (e.g., Union), spatial aggregate functions (e.g., Convex Hull). This slide is copied from a presentation of Prof. Ahmed in ICDE2015

36 Spatial Data types Load a file with spatial attributes
Perform primitive operations This slide is copied from a presentation of Prof. Ahmed in ICDE2015

37 Spatial operations Range query kNN Spatial join
This slide is copied from a presentation of Prof. Ahmed in ICDE2015

38 Experiments

39 Experiments Goal: Evaluate the scalability and efficiency of SpatialHadoop compared to traditional Hadoop. Hardware: Amazon EC2 cluster of up to 100 nodes (default is 20 nodes). Datasets: TIGER files (US map) of up to 60 GB (70M polygons). OpenStreetMap data of up to 70 GB (164M polygons). Generated data of up to 128 GB (2 billion rectangles). Satellite data of up to 4.6 TB (120 billion points).

40 Performance with query size (TIGER data)

41 Scalability with input size (Generated Data)

42 Spatial Join performance with TIGER files

43 Indexing time with satellite data

44 Conclusion & comments

45 Conclusion Support big spatial data with a distributed, large-scale system. Overcome limitations of previous systems. Provide efficient tools to work with spatial data.

46 Comments SpatialHadoop only supports static data. The indexing process must be executed before other tasks. Data types may be more complex: check-in data (complex point), road network with traffic jams (complex polygon). Thus, the indexing process may depend on additional data instead of primitive data types.

47 Future work

48 Future work Dynamic indexing for spatial data. Support other demands: location-based search, routing with cost...

49 Thank you!

