Distributed SAR Image Change Detection with OpenCL-Enabled Spark


Distributed SAR Image Change Detection with OpenCL-Enabled Spark
ASPLOS 2017
Linyan Qiu, Huming Zhu (Xidian University), zhuhum@mail.xidian.edu.cn

Contents
PART 1 Introduction
PART 2 Design and Implementation
PART 3 Experiment Results and Analysis

PART 1 Introduction: SAR Image Change Detection
Application areas: damage assessment, natural disaster monitoring, urban planning.
The flow chart shows SAR image change detection based on an unsupervised clustering algorithm. SAR images T1 and T2 are taken of the same scene at different times. First, median filtering reduces additive noise in the SAR images. Then a simple logarithm-ratio method generates a Difference Image (DI); the logarithm ratio effectively suppresses the influence of multiplicative noise. Finally, an unsupervised clustering algorithm partitions the DI into a changed area and an unchanged area. KFCM is used for clustering; it introduces kernel theory, which makes the algorithm more robust.
The remotely sensed data gathered at a single satellite data center are increasing dramatically, by several terabytes per day (data volumes in remote sensing reach the PB and EB scale), so KFCM consumes large amounts of time and space when dealing with large-scale data. We therefore need an acceleration technique for KFCM.
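The preprocessing pipeline above can be sketched in a few lines of pure Python. This is an illustration only, not the authors' implementation: a 1-D median filter stands in for 2-D median filtering, and `t1`/`t2` are hypothetical toy intensity rows.

```python
import math
import statistics

def median_filter(pixels, window=3):
    """Simple 1-D median filter (stand-in for the 2-D filter used on real SAR images)."""
    half = window // 2
    out = []
    for i in range(len(pixels)):
        lo, hi = max(0, i - half), min(len(pixels), i + half + 1)
        out.append(statistics.median(pixels[lo:hi]))
    return out

def log_ratio_di(t1, t2):
    """Difference Image via the logarithm-ratio method: |log(T2 / T1)| per pixel.
    Taking the log turns multiplicative (speckle-like) noise into an additive term."""
    return [abs(math.log(b / a)) for a, b in zip(t1, t2)]

# Hypothetical toy intensities for the same scene at two times.
t1 = median_filter([10.0, 12.0, 11.0, 50.0, 10.0])
t2 = median_filter([10.0, 12.0, 11.0, 10.0, 10.0])
di = log_ratio_di(t1, t2)  # large values mark likely changed pixels
```

The DI would then be handed to the clustering step, which labels each pixel as changed or unchanged.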

PART 1 Introduction: Spark
Apache Spark is a distributed computing framework that supports the MapReduce model. Strengths: speed, ease of use, generality, runs everywhere. Shortcoming: inefficient for computationally intensive applications.
Spark is good at large-scale data processing, while coprocessors such as GPUs and MICs have higher computing power than CPUs and are well suited to compute-intensive tasks.

PART 1 Introduction: Coprocessors and OpenCL
Intel MIC: integrated VPU (Vector Processing Unit), high-bandwidth memory controller.
NVIDIA GPU: strong floating-point computing power, good energy-efficiency ratio.
OpenCL: cross-platform, with great portability.

PART 1 Introduction: Our Contributions
Previous research:
- W. Huang and L. Meng, 2016: Spark with YARN — in-memory parallel processing of massive remotely sensed data using Apache Spark on a Hadoop YARN model.
- D. Manzi and D. Tompkins, 2016: Spark with GPU — Manzi et al. [6] proposed GPU-accelerated Spark by porting non-shuffling operations to GPUs, implemented with PyCUDA.
- P. Li and N. Zhang, 2016: Spark with GPU.
These approaches share a common limitation: portability to other accelerators is lacking. We therefore combine Spark and OpenCL to design a portable algorithm that can run on two different coprocessors (GPU and MIC), and we evaluate it on both.

PART 2 Design and Implementation: The Process of Spark-KFCM
The two main parts of Spark-KFCM:
1. Calculate the membership matrix — Spark map phase (non-shuffling).
2. Calculate the cluster centers — Spark reduce phase (shuffling).
The membership-degree formula is applied to every pixel. Applying the same operation to every element of large-scale data is exactly what the Spark map phase is for, and the map phase involves no shuffling. Updating the cluster centers mainly consists of two sum operations and a division; since a sum is a reduction, this part is implemented in the Spark reduce phase.
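The two phases can be sketched in pure Python using the standard KFCM update rules with a Gaussian kernel. This is an illustrative sketch, not the authors' code: the fuzzifier `M`, kernel width `SIGMA2`, and toy data are assumed values, and `functools.reduce` stands in for Spark's reduce.

```python
import math
from functools import reduce

M, SIGMA2 = 2.0, 1.0   # fuzzifier m and Gaussian kernel width (assumed values)
EPS = 1e-12            # guard against division by zero when a pixel equals a center

def kernel(x, v):
    """Gaussian kernel K(x, v)."""
    return math.exp(-((x - v) ** 2) / SIGMA2)

def memberships(x, centers):
    """Map step: KFCM membership degrees of one pixel w.r.t. each cluster center."""
    d = [max(1.0 - kernel(x, v), EPS) ** (-1.0 / (M - 1.0)) for v in centers]
    s = sum(d)
    return [di / s for di in d]

def update_centers(pixels, centers):
    """Reduce step: each new center is a ratio of two sums over all pixels."""
    def partials(x):
        u = memberships(x, centers)
        w = [(u[i] ** M) * kernel(x, centers[i]) for i in range(len(centers))]
        return ([wi * x for wi in w], w)   # (numerator terms, denominator terms)

    def add(a, b):
        return ([p + q for p, q in zip(a[0], b[0])],
                [p + q for p, q in zip(a[1], b[1])])

    num, den = reduce(add, (partials(x) for x in pixels))
    return [n / d for n, d in zip(num, den)]

pixels = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]   # toy DI values
centers = [0.0, 1.0]                        # "unchanged" / "changed"
for _ in range(5):
    centers = update_centers(pixels, centers)
```

After a few iterations the centers settle near the two pixel groups; each pixel is then labeled by its larger membership degree.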

PART 2 Design and Implementation: The Key of the System — CPU-Coprocessor Communication in Spark
JNI is a tool that lets Java communicate with other languages; it can be used to realize the CPU-accelerator communication layer on Spark worker nodes. The picture shows the CPU-coprocessor communication procedure: when the native method is called from the CPU, the worker passes the data to the native function, which then sends it to the accelerator through PCIe. The native function is compiled into a dynamically linked library (*.so) and integrated with the accelerator through JNI.
Because shuffle processes are expensive to port to accelerators (the data must be redistributed back and forth between the accelerator and the CPU), we offload only the non-shuffling processes to the accelerator.
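The JNI handoff itself is Java/C, but the pattern — a managed-language worker marshalling data into a precompiled native library — can be illustrated with Python's ctypes as a stand-in for JNI (an analogy only, not the authors' code). Here the C math library plays the role of the *.so that would hold the OpenCL host code.

```python
import ctypes
import ctypes.util

# Load a native shared library; in the paper's design this would be the
# *.so containing the OpenCL host code, loaded by the JVM via JNI.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.log.restype = ctypes.c_double
libm.log.argtypes = [ctypes.c_double]

def native_log_ratio(t1, t2):
    """Marshal each pixel pair across the managed/native boundary, as a
    Spark worker hands its partition to the accelerator-side function."""
    return [abs(libm.log(b / a)) for a, b in zip(t1, t2)]
```

Each call crosses the language boundary, which is exactly why shuffle-heavy steps (with their repeated data movement) are kept on the CPU side.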

PART 2 Design and Implementation: SparkCL-KFCM
Steps of the SparkCL-KFCM algorithm:
1. Read data from HDFS.
2. Cache the data as RDDs.
3. Distribute the tasks to workers.
4. Offload the workload to the accelerator.
The picture gives an overview of the proposed OpenCL-enabled Spark framework. First, the Spark cluster reads data blocks from the Hadoop Distributed File System (HDFS), converts them to an appropriate data format, and caches them for subsequent distributed processing. The tasks are then distributed to the workers. Within each Spark worker, high-efficiency native programs are implemented with OpenCL; by offloading the workload to the accelerator, the original Spark is extended with an accelerator option on the worker nodes. OpenCL gives us portability across different accelerators, and each accelerator processes one task.
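The four steps above can be mimicked in miniature without a cluster. In this sketch (an illustration under stated assumptions: `ThreadPoolExecutor` stands in for Spark's scheduler, an in-memory list for HDFS/RDDs, and a squaring function for the OpenCL kernel), the driver partitions the cached data, "distributes" partitions to workers, and each worker "offloads" its chunk to a per-pixel kernel.

```python
from concurrent.futures import ThreadPoolExecutor

def offload_to_accelerator(partition):
    """Stand-in for the OpenCL kernel: apply the per-pixel (map) work.
    In the real system this call crosses JNI into the *.so."""
    return [x * x for x in partition]   # placeholder per-pixel operation

def run_job(data, n_workers=4):
    # Steps 1-2: "read from HDFS" and "cache as RDD" — here, a list in memory.
    size = max(1, len(data) // n_workers)
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # Steps 3-4: distribute partitions to workers; each offloads its chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(offload_to_accelerator, partitions)
    return [y for part in results for y in part]

out = run_job([1.0, 2.0, 3.0, 4.0])
```

The point of the structure is that only the embarrassingly parallel per-pixel step crosses into native/accelerator code; partitioning and result collection stay on the managed side.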

PART 2 Design and Implementation: Flow Chart of SparkCL-KFCM
The part offloaded to the accelerator: the membership-matrix calculation.

PART 3 Environment
The experiments were performed at the High Performance Computing Center of Xidian University. The Spark cluster consists of 5 servers: one master node and 4 worker nodes.
Hardware:
- CPU: 2× Intel Xeon E5-2692 v2, 12 cores, clocked at 2.2 GHz
- Memory: 8× 8 GB ECC Registered DDR3-1600
- Hard disk: 600 GB 2.5" 10K rpm SAS
- Network: single-port FDR 56 Gbps HCA card
- GPU: NVIDIA Kepler K20M card
- MIC: Intel Xeon Phi 7110 card
Software: Red Hat Enterprise Linux Server release 6.4, JDK 1.7, Scala 2.9.1.final, Hadoop 2.4.1, Spark 1.1.0

PART 3 Datasets
We use SAR images of the Yellow River as an example; the dataset details are shown in the following table. The GPU version achieves higher performance because only one MIC card is in use. The algorithm on OpenCL-enabled Spark has a certain degree of portability and scalability.

Name         | Dataset Size                                 | Source                                 | Acquired Time
Yellow River | 2048×2048, 4096×4096, 8192×8192, 16384×16384 | 3 m resolution, acquired by Radarsat-2 | June 2008 and June 2009

Comparison of execution time of KFCM on OpenCL-enabled Spark with GPU and MIC (s):

data set           | 2048 | 4096 | 8192 | 16384
Spark with GPU (s) |  34  |  36  |  52  |  128
Spark with MIC (s) |  77  |      |      |

PART 3 Experimental Results: Speedup
Chosen dataset: the 8192×8192 data set. Baseline setup: Spark on a 4-core cluster. The speedup ratio ranges from 1.75 to 3.64 when GPUs are enabled for fine-grained computing.

PART 3 Experimental Results: Execution Time
Comparison of execution time of KFCM on OpenCL-enabled Spark with GPU and MIC (s):

data set           | 2048 | 4096 | 8192 | 16384
Spark with GPU (s) |  17  |  66  |  267 |  1125

data set           | 2048 | 4096 | 8192 | 16384
Spark with GPU (s) |  34  |  36  |  52  |  128
Spark with MIC (s) |  77  |      |      |

As the size of the data sets increases, the execution time of the algorithm grows almost linearly. The GPU version achieves higher performance because only one MIC card is in use. The algorithm on OpenCL-enabled Spark shows certain portability and scalability.

ASPLOS 2017 THANK YOU FOR WATCHING