Cloud MapReduce : a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G0222 2013.06.11 Authors: Huan Liu, Dan Orban Accenture.

Presentation transcript:

Cloud MapReduce : a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G Authors: Huan Liu, Dan Orban Accenture Technology Labs {huan.liu, th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Outline
1. INTRODUCTION
2. CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
3. PROS AND CONS OF CLOUD MAPREDUCE
4. EXPERIMENTAL EVALUATION
5. CONCLUSION

INTRODUCTION Like a server Operating System (OS), a cloud OS is responsible for managing resources. In a server (e.g., a PC), the OS is responsible for managing the various hardware resources, such as CPU, memory, disks, and network interfaces: everything inside a server’s chassis.

INTRODUCTION Instead of managing a single machine’s resources, a cloud OS is responsible for managing the cloud infrastructure, hiding the cloud infrastructure details from the application programmers and coordinating the sharing of the limited resources. But unlike a traditional OS, a cloud OS is much more complex, not only because it has to manage a much bigger infrastructure, but also because it has to serve many more customers.

INTRODUCTION We have implemented the MapReduce [1] programming model using services provided by the Amazon cloud OS.

INTRODUCTION
A. Cloud OS
B. Challenges posed by a cloud OS
C. Advantages of Cloud MapReduce
   o Incremental scalability
   o Symmetry and Decentralization
   o Heterogeneity
D. Contributions

INTRODUCTION A. Cloud OS First, a cloud OS provides compute services, such as Amazon EC2 and Windows Azure workers. Second, it provides storage services, such as Amazon S3 and Windows Azure blob storage. Third, it provides communication services, such as Amazon’s Simple Queue Service (SQS) and the Windows Azure queue service. These are similar to a pipe on a UNIX OS: a user pushes messages in at one end and pops them out at the other.

INTRODUCTION Last, a cloud OS also provides persistent storage services, such as Amazon’s SimpleDB and Windows Azure table services.

INTRODUCTION B. Challenges posed by a cloud OS A cloud OS’s scalability comes at a price: it has to be traded off against other desirable system properties.

INTRODUCTION C. Advantages of Cloud MapReduce By using queues, we easily parallelize the Map and the Shuffling stages. By using Amazon’s visibility timeout mechanism, we easily implement fault tolerance. By leveraging a cloud OS’s fully distributed implementation, we are able to implement a fully distributed architecture with no single point of failure or scalability bottleneck.

INTRODUCTION D. Contributions First, we propose, implement and evaluate a new architecture for the MapReduce programming model on top of a cloud OS. The architecture also uses queues to shuffle results from Map to Reduce.
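
The queue-based shuffle named above can be illustrated with a small in-memory sketch. It is not the authors' implementation: the number of reduce queues, the hash function, and the names `reduce_queue_index` and `shuffle` are assumptions made for illustration, and plain Python deques stand in for SQS queues.

```python
import zlib
from collections import deque

NUM_REDUCE_QUEUES = 4  # assumed value, chosen only for this illustration


def reduce_queue_index(key: str, num_queues: int = NUM_REDUCE_QUEUES) -> int:
    """Pick a reduce queue from a stable hash of the key (hypothetical helper)."""
    return zlib.crc32(key.encode("utf-8")) % num_queues


def shuffle(map_output, num_queues: int = NUM_REDUCE_QUEUES):
    """Route each (key, value) pair emitted by Map to one reduce queue."""
    queues = [deque() for _ in range(num_queues)]
    for key, value in map_output:
        queues[reduce_queue_index(key, num_queues)].append((key, value))
    return queues


# Word-count style map output: every pair for one key lands in one queue.
pairs = [("cloud", "1"), ("map", "1"), ("cloud", "1"), ("reduce", "1")]
queues = shuffle(pairs)
```

Because the queue is chosen from the key alone, a map worker can push each pair as soon as it is produced, without waiting for the whole map stage to finish.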

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION Queues play two roles. First, a queue is a synchronization point where workers (processes running on instances) can coordinate job assignments. Second, a queue serves as a decoupling mechanism to coordinate data flow between different stages. Lastly, we use SimpleDB, which serves as the central job coordination point in our fully distributed implementation.

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION Cloud challenges and our general solution approaches
Long latency: Since Amazon services are accessed through the network, the latency could be significant. In our measurements, SQS latency ranges from 20 ms to 100 ms even from within EC2.
Horizontal scaling: Although all Amazon cloud services are based on horizontal scaling, we are only able to observe one concrete manifestation: when using SimpleDB, each SimpleDB domain is only able to sustain a small write throughput.
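
One way to work around the per-domain write limit is to spread writes over several domains, which is the striping mentioned later in the pros-and-cons slide. The sketch below only illustrates that idea under stated assumptions: the class `StripedStatusStore`, the domain count, and the hash-based placement are hypothetical, and in-memory dictionaries stand in for SimpleDB domains.

```python
import zlib


class StripedStatusStore:
    """Hypothetical status store that stripes items over several domains."""

    def __init__(self, num_domains: int = 8):
        # Each dict stands in for one SimpleDB domain (an assumption).
        self.domains = [dict() for _ in range(num_domains)]

    def _domain_for(self, item_name: str) -> dict:
        # A stable hash keeps every update of one item in the same domain.
        index = zlib.crc32(item_name.encode("utf-8")) % len(self.domains)
        return self.domains[index]

    def put(self, item_name: str, attributes: dict) -> None:
        """Write one item; the load is shared by all domains."""
        self._domain_for(item_name)[item_name] = attributes

    def read_all(self) -> dict:
        """Read the full status by querying every domain and merging."""
        merged = {}
        for domain in self.domains:
            merged.update(domain)
        return merged


store = StripedStatusStore(num_domains=4)
for i in range(100):
    store.put(f"task-{i}", {"status": "done"})
status = store.read_all()
```

Writes scale with the number of domains, while a full read costs one query per domain, so the domain count is a trade-off rather than a free parameter.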

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION Failure detection/recovery and conflict resolution We use SQS’s visibility timeout mechanism for failure detection and recovery.
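
The visibility timeout idea can be shown with a toy queue: a received message is hidden for a fixed period, and it reappears if the worker never deletes it. This is a simplified simulation, not the SQS API; the class `VisibilityQueue`, its method names, and the explicit `now` clock argument are assumptions made so the example is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy queue that mimics a visibility timeout (not the real SQS API)."""

    def __init__(self, visibility_timeout: float = 30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, invisible_until]
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = [body, 0.0]

    def receive(self, now: float):
        """Return (id, body) of a visible message and hide it, else None."""
        for msg_id, entry in self._messages.items():
            if now >= entry[1]:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id: int) -> None:
        """Called by a worker only after it has finished the task."""
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30.0)
q.send("map-task-0")
first = q.receive(now=0.0)    # worker A takes the task, then crashes
hidden = q.receive(now=10.0)  # still hidden from other workers
retry = q.receive(now=31.0)   # timeout expired: worker B picks it up
q.delete(retry[0])            # worker B finishes and removes it
empty = q.receive(now=100.0)  # nothing left to process
```

No separate failure detector is needed in this scheme: a crashed worker simply never deletes its message, so the task becomes visible again on its own.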

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION The user-defined Map function must implement the following interface. Pull iterator with sorting: In a pull iterator implementation, the user-defined reduce function must implement the following interface.

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION The first is the start interface. In the word count example, the start function initializes a count variable in object T and sets its value to 0.

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION In the word count example, the reduce function converts the string to a numerical value, then adds the value to the count variable stored in T.
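
The word count reduce described on these slides can be sketched as follows. The slides only describe a start function and a reduce function that updates a state object T; the names `WordCountState`, `reduce_step`, `complete`, and `run_reduce` are hypothetical, and the actual interface signatures are not shown in this transcript.

```python
class WordCountState:
    """Plays the role of object T: holds the running count for one key."""

    def __init__(self):
        self.count = 0


def start(key: str) -> WordCountState:
    """Create the state for a key and set its count to 0."""
    return WordCountState()


def reduce_step(key: str, value: str, state: WordCountState) -> None:
    """Convert the string value to a number and add it to the count."""
    state.count += int(value)


def complete(key: str, state: WordCountState):
    """Hypothetical final step: emit the key with its total."""
    return key, state.count


def run_reduce(key: str, values):
    """Drive the calls in order, one value at a time (illustrative driver)."""
    state = start(key)
    for value in values:
        reduce_step(key, value, state)
    return complete(key, state)


result = run_reduce("cloud", ["1", "1", "1"])
```

Keeping the running total in a small state object means the values for a key can be consumed one at a time as they arrive, without first collecting them all in memory.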

PROS AND CONS OF CLOUD MAPREDUCE Cloud MapReduce (CMR) is simpler for several reasons, including the following. First, S3 presents a large and reliable file storage abstraction, which relieves us of having to design our own file system. Second, SimpleDB presents a high-bandwidth status vault, which can sustain high read and (through striping) write throughput.

PROS AND CONS OF CLOUD MAPREDUCE Third, both S3 and SQS present a single point of contact that is capable of sustaining a high throughput. We no longer need to worry about communicating with many nodes at the same time. Last, we simply use Amazon’s visibility timeout mechanism to handle failure. No extra logic is needed to detect and recover from failure.

EXPERIMENTAL EVALUATION

CONCLUSION It is far from obvious that we can simplify the design and implementation of large-scale systems by building them on top of a cloud OS. Using MapReduce as an example, we have demonstrated that it is possible to overcome the cloud limitations without performance degradation.

CONCLUSION The architecture also uses queues to shuffle results from Map to Reduce. Even though a full-scale performance evaluation is beyond the scope of this paper, our preliminary results indicate that CMR is a practical system and that its performance is on par with that of Hadoop. Our experimental results also indicate that using queues to overlap the map and shuffling stages is a promising approach to improving MapReduce performance.

END. Thank you.