CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud. Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga†. *UC Berkeley, †Microsoft Research.

Ankur Dave, Wei Lu, Jared Jackson, Roger Barga

Background. MapReduce and its successors reimplement communication and storage, but cloud services like EC2 and Azure natively provide these utilities. Can we build an efficient distributed application using only the primitives of the cloud?

Design goals: efficient, fault-tolerant, cloud-native.

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Windows Azure: VM instances, Blob storage, Queue storage.
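
These three services are the only building blocks the system relies on. As a rough illustration (a minimal sketch, not the project's own code), the snippet below shows how a worker might touch Blob and Queue storage using the azure-storage-blob and azure-storage-queue Python packages; the connection string, queue name, container name, and message contents are placeholders.

    # A minimal sketch: one way a worker could use the Blob and Queue
    # primitives above, via the azure-storage-blob and azure-storage-queue
    # Python packages. Names and the connection string are placeholders.
    from azure.storage.blob import BlobClient
    from azure.storage.queue import QueueClient

    CONN_STR = "<storage-account-connection-string>"

    # Queue storage: receive a work item. It stays invisible for 60 seconds,
    # so if this worker crashes before deleting it, another worker can pick
    # it up once the visibility timeout expires.
    tasks = QueueClient.from_connection_string(CONN_STR, "tasks")
    for msg in tasks.receive_messages(visibility_timeout=60):
        partition_name = msg.content                  # e.g. "points-partition-07"

        # Blob storage: download the data partition named in the message.
        blob = BlobClient.from_connection_string(
            CONN_STR, container_name="points", blob_name=partition_name)
        raw_bytes = blob.download_blob().readall()

        # ... assign points to centroids, compute partial sums ...

        tasks.delete_message(msg)                     # acknowledge only on success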

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

K-Means algorithm: Initialize centroids → Assign points to centroids → Recalculate centroids → Points moved? (loop until no point changes assignment).

K-Means algorithm, parallel form: Initialize → Partition work → Assign points to centroids → Compute partial sums → Recalculate centroids → Points moved? (loop).
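
To make the partial-sums decomposition concrete, here is a minimal single-process sketch in Python with NumPy. It omits all cloud plumbing, and the function and variable names are illustrative rather than taken from CloudClustering: each worker returns, per centroid, the sum of its assigned points and their count, and the master adds the partial sums to recompute centroids.

    import numpy as np

    def assign_and_partial_sums(points, centroids):
        """One worker's pass over its partition: per centroid, the vector sum
        of the points assigned to it and how many there were."""
        k, dim = centroids.shape
        sums = np.zeros((k, dim))
        counts = np.zeros(k, dtype=int)
        for p in points:
            nearest = np.argmin(np.linalg.norm(centroids - p, axis=1))
            sums[nearest] += p
            counts[nearest] += 1
        return sums, counts

    def recalculate_centroids(partials, old_centroids):
        """Master step: add the workers' partial sums and divide by the counts."""
        total_sums = sum(s for s, _ in partials)
        total_counts = sum(c for _, c in partials)
        new_centroids = old_centroids.copy()
        nonempty = total_counts > 0
        new_centroids[nonempty] = (total_sums[nonempty]
                                   / total_counts[nonempty][:, None])
        return new_centroids

    def kmeans(partitions, centroids, tol=1e-6, max_iters=100):
        """Driver loop: fan out over partitions, combine, repeat until stable."""
        for _ in range(max_iters):
            partials = [assign_and_partial_sums(p, centroids) for p in partitions]
            new_centroids = recalculate_centroids(partials, centroids)
            if np.allclose(new_centroids, centroids, atol=tol):   # "Points moved?"
                return new_centroids
            centroids = new_centroids
        return centroids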

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Architecture: Initialize → Partition work → Assign points to centroids → Compute partial sums → Recalculate centroids → Points moved?

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Data locality. The multiple-queue pattern unlocks data locality; the tradeoff is that it complicates fault tolerance. The single-queue pattern is naturally fault-tolerant; the problem is that it is not suitable for iterative computation.
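
The two patterns can be contrasted schematically. The sketch below uses in-memory Python queues as stand-ins for Azure queues; the structure and names are illustrative, not the project's implementation.

    from queue import Queue

    # Single-queue pattern: all workers pull from one shared queue. Any worker
    # can take any partition, so a crashed worker's message simply reappears
    # for someone else (natural fault tolerance), but a worker rarely sees the
    # same partition twice and must re-fetch its data every iteration.
    shared_queue = Queue()

    def single_queue_worker(download_partition):
        task = shared_queue.get()
        points = download_partition(task)     # cold read from blob storage each time
        # ... assign points to centroids, compute partial sums ...

    # Multiple-queue pattern: each worker owns a queue and always receives the
    # same partitions, so it can cache them locally across iterations (data
    # locality). The cost: if the worker dies, its queue silently stalls, so
    # failures must be detected explicitly -- the job of the buddy system below.
    per_worker_queues = {worker_id: Queue() for worker_id in range(4)}
    local_cache = {}

    def multi_queue_worker(worker_id, download_partition):
        task = per_worker_queues[worker_id].get()
        if task not in local_cache:           # warm after the first iteration
            local_cache[task] = download_partition(task)
        points = local_cache[task]
        # ... assign points to centroids, compute partial sums ...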

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Handling failure: Initialize → Partition work → Assign points to centroids → Compute partial sums → Recalculate centroids → Points moved? If a worker fails, its partial sums never arrive and the loop stalls ("Stall?").

Buddy system. The buddy system provides distributed fault detection and recovery.

Buddy system. Spreading buddies across Azure fault domains provides increased resilience to simultaneous failure.

Buddy system. Cascaded failure detection reduces communication and improves resilience.
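
A rough sketch of the buddy idea: every worker heartbeats, and each worker checks only a small set of buddies; in the cascaded variant a worker watches just its successor in a ring, so a failure produces one recovery action instead of N redundant ones. All names, thresholds, and the recovery callback below are illustrative, not CloudClustering's actual protocol.

    import time

    HEARTBEAT_TIMEOUT = 30.0          # seconds of silence before a buddy is suspected
    last_heartbeat = {}               # worker_id -> timestamp of last heartbeat

    def heartbeat(worker_id):
        """Called periodically by every live worker (e.g. recorded in shared storage)."""
        last_heartbeat[worker_id] = time.time()

    def buddy_of(worker_id, workers):
        """Cascaded scheme: each worker watches only the next worker in a ring."""
        i = workers.index(worker_id)
        return workers[(i + 1) % len(workers)]

    def check_buddy(worker_id, workers, recover):
        """If my buddy has gone quiet, recover its work (e.g. drain its queue and
        redistribute the messages), then close the ring around the failure."""
        buddy = buddy_of(worker_id, workers)
        if time.time() - last_heartbeat.get(buddy, 0.0) > HEARTBEAT_TIMEOUT:
            recover(buddy)            # reassign the stalled partitions
            workers.remove(buddy)     # my new buddy is the failed worker's buddy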

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Evaluation: linear speedup with instance count.

Evaluation: sublinear speedup with instance size. Reason: I/O bandwidth doesn't scale with instance size.

Outline Windows Azure K-Means clustering algorithm CloudClustering architecture – Data locality – Buddy system Evaluation Related work

Related work. Existing frameworks (MapReduce, Dryad, Spark, MPI, …) all support K-Means, but they reimplement reliable communication. AzureBlast is built directly on cloud services, but its algorithm is not iterative.

Conclusions. CloudClustering shows that it is possible to build efficient, resilient applications using only the common cloud services. The multiple-queue pattern unlocks data locality, and the buddy system provides fault tolerance.