1
Distributed SAR Image Change Detection with OpenCL-Enabled Spark
ASPLOS 2017
Linyan Qiu, Huming Zhu, Xidian University
2
Contents
PART 1 Introduction
PART 2 Design and Implementation
PART 3 Experiment Result and Analysis
3
PART 1 Introduction: SAR Image Change Detection
Application areas: damage assessment, natural disaster monitoring, urban planning.
Clustering algorithm: the picture shows the flow chart of SAR image change detection based on an unsupervised clustering algorithm. SAR image T1 and SAR image T2 are taken of the same scene at different times. We use median filtering to reduce the additive noise in the SAR images, then apply a simple logarithm-ratio method to generate a Difference Image (DI); the logarithm-ratio method effectively suppresses the influence of multiplicative noise. Finally, an unsupervised clustering algorithm partitions the DI into changed and unchanged areas. KFCM is used for clustering; it introduces kernel theory, which makes the algorithm more robust.
Data volume in remote sensing (PB, EB): the remotely sensed data gathered at a single satellite data center are increasing dramatically, by several terabytes per day, so the KFCM algorithm consumes large amounts of time and space when dealing with large-scale data. We therefore seek an acceleration technique for KFCM.
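For reference, the standard log-ratio and Gaussian-kernel KFCM formulas are sketched below; the exact variant used in the paper may differ (the kernel choice, the width σ, and the fuzzifier m are assumptions here):

```latex
% Log-ratio difference image from the two co-registered SAR images
DI(x) = \left| \log I_{T_2}(x) - \log I_{T_1}(x) \right|

% Gaussian kernel assumed for KFCM
K(x_k, v_i) = \exp\!\left(-\frac{\|x_k - v_i\|^2}{\sigma^2}\right)

% Membership degree of pixel x_k in cluster i (fuzzifier m > 1, c clusters)
u_{ik} = \frac{\left(1 - K(x_k, v_i)\right)^{-1/(m-1)}}
              {\sum_{j=1}^{c} \left(1 - K(x_k, v_j)\right)^{-1/(m-1)}}

% Cluster-center update over the n pixels
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, K(x_k, v_i)\, x_k}
           {\sum_{k=1}^{n} u_{ik}^{m}\, K(x_k, v_i)}
```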
4
PART 1 Introduction: Spark, a Distributed Computing Framework
Apache Spark is a distributed computing framework that is good at dealing with large-scale data processing, and it supports the MapReduce model. Its strengths: speed, ease of use, generality, and it runs everywhere. Its shortcoming: it is inefficient in computationally intensive applications. Coprocessors such as GPUs and MICs have higher computing power than CPUs and are suitable for processing compute-intensive tasks.
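As a minimal illustration of the MapReduce model in Spark (a toy sum-of-squares job in the Spark 1.x style API, not from the paper):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map-reduce-sketch"))
    // Map phase: the same operation applied to every element independently
    // (a non-shuffling process).
    val squared = sc.parallelize(1 to 1000000).map(x => x.toLong * x)
    // Reduce phase: aggregation across partitions (a shuffling process).
    val sum = squared.reduce(_ + _)
    println(s"sum of squares = $sum")
    sc.stop()
  }
}
```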
5
PART 1 Introduction: Coprocessors
Intel MIC: integrated VPU (Vector Processing Unit), high-bandwidth memory controller.
NVIDIA GPU: strong floating-point computing power, great energy-efficiency ratio.
OpenCL: cross-platform, great portability across both kinds of coprocessor.
6
PART 1 Introduction: Previous Research and Our Contributions
Previous research:
W. Huang and L. Meng: Spark with YARN ("In-Memory Parallel Processing of Massive Remotely Sensed Data Using an Apache Spark on Hadoop YARN Model").
D. Manzi and D. Tompkins: Spark with GPU via PyCUDA. Manzi et al. [6] proposed accelerating Spark by porting non-shuffling operations to GPUs, implemented through PyCUDA.
P. Li and N. Zhang: Spark with GPU.
These works share a common limitation: portability to other accelerators is lacking.
Our contributions: we combine Spark and OpenCL to design an algorithm with portability, so that the same algorithm runs on two different coprocessors (GPU and MIC), and we evaluate the experiments on both.
7
DESIGN AND IMPLEMENTATION
PART 2 The Process of Spark-KFCM
The formula that calculates the membership degrees is applied to every pixel. This means taking the same operation on every element of a large data set, and processing like this is well suited to the Spark map phase, which is a non-shuffling process. Updating the cluster centers mainly consists of two sum operations and a division; sum is a reduction operation, so we implement this part in the Spark reduce phase, which is a shuffling process.
The two main parts of Spark-KFCM (see the sketch below):
1. Calculate the membership matrix: Spark map phase (non-shuffling).
2. Calculate the cluster centers: Spark reduce phase (shuffling).
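A minimal Scala sketch of these two phases, assuming one-dimensional pixel intensities, the Gaussian-kernel formulas above, and illustrative names (kernel, membership, updateCenters are not from the paper):

```scala
import org.apache.spark.rdd.RDD

object SparkKfcmSketch {
  val m = 2.0         // assumed fuzzifier
  val sigma2 = 1.0    // assumed Gaussian kernel width (squared)

  def kernel(x: Double, v: Double): Double =
    math.exp(-(x - v) * (x - v) / sigma2)

  // Map phase (non-shuffling): membership degrees of one pixel in c clusters.
  def membership(x: Double, centers: Array[Double]): Array[Double] = {
    val w = centers.map(v => math.pow(1.0 - kernel(x, v), -1.0 / (m - 1)))
    val s = w.sum
    w.map(_ / s)
  }

  // Reduce phase (shuffling): sum numerators and denominators per cluster,
  // then divide to obtain the updated cluster centers.
  def updateCenters(pixels: RDD[Double], centers: Array[Double]): Array[Double] = {
    val (num, den) = pixels
      .map { x =>
        val u = membership(x, centers)
        val wk = centers.indices.map(i => math.pow(u(i), m) * kernel(x, centers(i)))
        (wk.map(_ * x).toArray, wk.toArray)
      }
      .reduce { case ((n1, d1), (n2, d2)) =>
        (n1.zip(n2).map(t => t._1 + t._2), d1.zip(d2).map(t => t._1 + t._2))
      }
    num.zip(den).map { case (n, d) => n / d }
  }
}
```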
8
DESIGN AND IMPLEMENTATION
PART 2 The Key of the System: CPU-Coprocessor Communication Support in Spark
JNI is a tool that lets Java communicate with other languages; it can be used to realize the CPU-accelerator communication layer on the Spark worker nodes. The picture shows the CPU-coprocessor communication procedure: once the native method is called from the CPU, the worker passes the data to the native function, which sends it on to the accelerator through PCIe. The native function is compiled into a dynamically linked library (*.so) and integrated with Spark through JNI. Because shuffle processes are expensive to port to accelerators, due to the cost of redistributing the data back and forth between the accelerator and the CPU, we offload only the non-shuffling processes to the accelerator.
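A sketch of the Scala side of this JNI boundary, assuming a hypothetical native library libkfcm.so that wraps the OpenCL host code (all names here are illustrative, not from the paper):

```scala
// Scala side of the JNI boundary. The native implementation would live in a
// C/C++ file, compiled together with the OpenCL host code into libkfcm.so.
object KfcmNative {
  System.loadLibrary("kfcm") // loads libkfcm.so from java.library.path

  // Computes membership degrees for a batch of pixels on the accelerator.
  // Returns a flattened (numPixels x numClusters) membership matrix.
  @native def membershipBatch(pixels: Array[Float],
                              centers: Array[Float]): Array[Float]
}
```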
9
DESIGN AND IMPLEMENTATION
PART 2 SparkCL-KFCM
Steps of the SparkCL-KFCM algorithm (a sketch follows this list):
1. Read data from HDFS.
2. Cache the data as an RDD.
3. Distribute the tasks to the workers.
4. Offload the workload to the accelerator.
An overview of the proposed OpenCL-enabled Spark framework is illustrated in the picture. First, the Spark cluster reads data blocks from the Hadoop Distributed File System (HDFS), converts them to an appropriate data format, and caches the data for subsequent distributed processing. The tasks are then distributed to the workers. Within each Spark worker, high-efficiency native programs are implemented with OpenCL programming. By offloading the workload to the accelerator, the original Spark is extended with an accelerator option on the worker nodes. With OpenCL we gain portability to different accelerators, and every accelerator processes a task.
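Putting the steps together, a hypothetical driver program might look like the following; the file path, initial centers, and per-partition batching are assumptions, and KfcmNative is the JNI wrapper sketched above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkClKfcm {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkCL-KFCM"))

    // Steps 1-2: read the difference image from HDFS, cache it as an RDD.
    val pixels = sc.textFile("hdfs:///sar/di.txt") // illustrative path
                   .map(_.toFloat)
                   .cache()

    val centers = Array(0.2f, 0.8f) // assumed initial changed/unchanged centers

    // Steps 3-4: Spark distributes partitions to the workers; each partition
    // is batched into one array and offloaded through a single JNI call, so
    // the PCIe transfer cost is amortized over the whole partition.
    val memberships = pixels.mapPartitions { it =>
      val batch = it.toArray
      KfcmNative.membershipBatch(batch, centers).grouped(centers.length)
    }
    memberships.count() // force evaluation for the sketch
    sc.stop()
  }
}
```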
10
DESIGN AND IMPLEMENTATION
PART 2 Flow Chart of SparkCL-KFCM
The part to offload: the membership matrix calculation.
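The offloaded membership computation could look like the following OpenCL kernel, held as host-side source in a Scala string; this is a sketch under the same Gaussian-kernel assumptions as above, not the paper's actual kernel:

```scala
// OpenCL C source for the per-pixel membership computation, kept as a Scala
// string for the host program. One work-item handles one pixel.
object MembershipKernel {
  val source: String =
    """
    |__kernel void membership(__global const float* pixels,
    |                         __global const float* centers,
    |                         __global float* u,
    |                         const int numClusters,
    |                         const float sigma2,
    |                         const float m) {
    |    int k = get_global_id(0);                // pixel index
    |    float x = pixels[k];
    |    float sum = 0.0f;
    |    for (int i = 0; i < numClusters; i++) {
    |        float d = x - centers[i];
    |        float ker = exp(-d * d / sigma2);    // Gaussian kernel K(x, v_i)
    |        float w = pow(1.0f - ker, -1.0f / (m - 1.0f));
    |        u[k * numClusters + i] = w;
    |        sum += w;
    |    }
    |    for (int i = 0; i < numClusters; i++)    // normalize memberships
    |        u[k * numClusters + i] /= sum;
    |}
    """.stripMargin
}
```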
11
PART 3 Environment Introduction
The experiments are performed at the High Performance Computing Center of Xidian University. The Spark cluster consists of 5 servers: one master node and 4 worker nodes.
Hardware:
CPU: 2 × Intel Xeon E5-2692 v2, 12 cores, clocked at 2.2 GHz
Memory: 8 × 8 GB ECC Registered DDR3-1600
Hard disk: 600 GB 2.5" 10K rpm SAS
Network: single-port FDR 56 Gbps HCA card
GPU: NVIDIA Kepler K20M GPU card
MIC: Intel Xeon Phi 7110 MIC card
Software: Red Hat Enterprise Linux Server release 6.4, JDK 1.7, Scala final, Hadoop 2.4.1, Spark 1.1.0
12
PART 3 Data Sets
We use the SAR images of the Yellow River as the example. The details of the data sets are shown in the following table; the execution-time comparison and its discussion appear on the Experimental Results slides below.

Name          Size                                            Source                                   Acquired
Yellow River  2048×2048, 4096×4096, 8192×8192, 16384×16384    3 m resolution, acquired by Radarsat-2   June 2008 and June 2009
13
Experimental Results: Speedup
Chosen data set: 8192×8192. Baseline setup: Spark on a 4-core cluster. The speedup ratio ranges from 1.75 to 3.64 when GPUs are enabled for fine-grained computing.
14
Experimental Results: Execution Time
Comparison of execution time of KFCM on OpenCL-enabled Spark with GPU and MIC (seconds):

Data set             2048   4096   8192   16384
Spark with GPU (s)     34     36     52     128
Spark with MIC (s)     77      -      -       -

As the size of the data sets increases, the execution time of the algorithm grows almost linearly. The algorithm with the GPU has higher performance because only one MIC card is in use. The algorithm on OpenCL-enabled Spark has a certain degree of portability and scalability.
15
ASPLOS 2017 THANK YOU FOR WATCHING