1
Cloud MapReduce: a MapReduce Implementation on top of a Cloud Operating System
Speaker: 童耀民 MA1G0222, 2013.06.11
Authors: Huan Liu, Dan Orban, Accenture Technology Labs, {huan.liu, dan.orban}@accenture.com
2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
2
Outline
1. INTRODUCTION
2. CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
3. PROS AND CONS OF CLOUD MAPREDUCE
4. EXPERIMENTAL EVALUATION
5. CONCLUSION
3
INTRODUCTION
Like a server operating system (OS), a cloud OS is responsible for managing resources. In a server (e.g., a PC), the OS manages the various hardware resources, such as CPU, memory, disks, and network interfaces: everything inside a server’s chassis.
4
INTRODUCTION
Instead of managing a single machine’s resources, a cloud OS manages the cloud infrastructure, hiding its details from application programmers and coordinating the sharing of limited resources. Unlike a traditional OS, however, a cloud OS is much more complex, not only because it must manage a much bigger infrastructure, but also because it must serve many more customers.
5
INTRODUCTION
We have implemented the MapReduce [1] programming model using services provided by the Amazon cloud OS.
6
INTRODUCTION
A. Cloud OS
B. Challenges posed by a cloud OS
C. Advantages of Cloud MapReduce
 o Incremental scalability
 o Symmetry and decentralization
 o Heterogeneity
D. Contributions
7
INTRODUCTION
A. Cloud OS
A cloud OS provides several kinds of services. First, it provides compute services, such as Amazon EC2 and Windows Azure workers. Second, it provides storage services, such as Amazon S3 and Windows Azure blob storage. Third, it provides communication services, such as Amazon’s Simple Queue Service (SQS) and the Windows Azure queue service. These are similar to a pipe on a UNIX OS: a user pushes messages in at one end and pops them out at the other end.
8
INTRODUCTION
Last, a cloud OS also provides persistent storage services, such as Amazon’s SimpleDB and Windows Azure table services.
9
INTRODUCTION
B. Challenges posed by a cloud OS
A cloud OS’s scalability comes at a price: it must be traded off against other desirable system properties, such as low request latency.
10
INTRODUCTION
C. Advantages of Cloud MapReduce
By using queues, we easily parallelize the Map and Shuffling stages. By using Amazon’s visibility timeout mechanism, we easily implement fault tolerance. By leveraging the cloud OS’s fully distributed implementation, we are able to build a fully distributed architecture with no single point of failure and no scalability bottleneck.
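As a minimal sketch of how a shared queue parallelizes the Map stage, the following uses Python’s in-process `queue.Queue` and threads as stand-ins for an SQS input queue and EC2 worker instances. This is an assumption made for illustration, not CMR’s actual code. Each worker repeatedly pops an input split and maps it, so adding workers adds parallelism without a central scheduler. The function name `run_map_stage` is hypothetical.

```python
import queue
import threading
from collections import Counter


def run_map_stage(splits, num_workers=4):
    """Workers pull input splits from a shared queue until it is empty."""
    task_queue = queue.Queue()  # stand-in for an SQS input queue
    for split_id, text in enumerate(splits):
        task_queue.put((split_id, text))

    results = {}  # split_id -> word counts produced for that split
    lock = threading.Lock()

    def worker():
        while True:
            try:
                split_id, text = task_queue.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker exits
            counts = Counter(text.split())  # word-count "map" over one split
            with lock:
                results[split_id] = counts

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_map_stage(["a b a", "b c"], num_workers=2)` maps both splits, whichever worker happens to dequeue each one.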
11
INTRODUCTION
D. Contributions
First, we propose, implement, and evaluate a new architecture for the MapReduce programming model on top of a cloud OS. The architecture also uses queues to shuffle results from Map to Reduce.
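Shuffling through queues can be sketched as follows. The sketch assumes one reduce queue per partition and chooses the queue by a deterministic hash of the key, so every mapper sends a given key to the same queue. Python lists stand in for SQS queues, and the names `partition` and `shuffle` are hypothetical. Because a mapper can push each pair as soon as it is emitted, shuffling can overlap with the Map stage.

```python
import zlib
from collections import defaultdict


def partition(key, num_reduce_queues):
    # crc32 is stable across processes; Python's built-in str hash is
    # randomized per process, so different workers could disagree.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_queues


def shuffle(map_outputs, num_reduce_queues):
    """Push each (key, value) pair onto the reduce queue chosen by its key."""
    reduce_queues = defaultdict(list)  # stand-in for one SQS queue per partition
    for key, value in map_outputs:
        reduce_queues[partition(key, num_reduce_queues)].append((key, value))
    return reduce_queues
```

All pairs for one key end up in one queue, so a single reducer sees every value for that key.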
12
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
13
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
We use queues for two purposes. First, a queue is a synchronization point where workers (processes running on instances) coordinate job assignments. Second, a queue serves as a decoupling mechanism that coordinates data flow between different stages. In addition, we use SimpleDB as the central job coordination point in our fully distributed implementation.
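One way a shared status store can replace a master node is sketched below. The sketch assumes each worker records a committed map task under its task id and that any worker can decide the Map stage is finished by counting those records. `StatusTable` is a hypothetical in-memory stand-in for a SimpleDB domain, and the item-naming scheme is illustrative only, not CMR’s actual schema.

```python
class StatusTable:
    """Hypothetical in-memory stand-in for a SimpleDB domain."""

    def __init__(self):
        self._items = {}

    def put(self, item_name, attributes):
        self._items[item_name] = dict(attributes)  # overwrite, never append

    def count(self, prefix):
        return sum(1 for name in self._items if name.startswith(prefix))


def commit_map_task(table, job_id, task_id, worker_id):
    # Keying the record by task id means a re-executed task overwrites
    # its earlier record instead of being counted twice.
    table.put(f"{job_id}/map/{task_id}", {"worker": worker_id})


def map_stage_done(table, job_id, num_map_tasks):
    # Any worker can evaluate this check, so no master node is needed.
    return table.count(f"{job_id}/map/") >= num_map_tasks
```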
14
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
Cloud challenges and our general solution approaches
 o Long latency: Since Amazon services are accessed through the network, latency can be significant. In our measurements, SQS latency ranges from 20 ms to 100 ms, even from within EC2.
 o Horizontal scaling: Although all Amazon cloud services are based on horizontal scaling, we observed only one concrete manifestation: each SimpleDB domain can sustain only a small write throughput.
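Because a single SimpleDB domain sustains only a small write throughput, writes can be striped across several domains. This is the striping mentioned among CMR’s advantages. The minimal sketch below assumes a hypothetical list of domain names and routes each item by a stable hash of its name. If writes spread roughly evenly, aggregate write throughput grows with the number of domains.

```python
import zlib


def pick_domain(item_name, domains):
    """Route an item to one of several domains by a stable hash of its name."""
    return domains[zlib.crc32(item_name.encode("utf-8")) % len(domains)]


def stripe_writes(item_names, domains):
    """Group item names by the domain each one will be written to."""
    assignment = {d: [] for d in domains}
    for name in item_names:
        assignment[pick_domain(name, domains)].append(name)
    return assignment
```

Readers must use the same `pick_domain` rule to find an item again. A stable hash such as `crc32` keeps that rule consistent across workers.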
15
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
Failure detection/recovery and conflict resolution
We use SQS’s visibility timeout mechanism for failure detection and recovery.
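The visibility timeout idea can be illustrated with a small in-memory imitation. This is not the SQS API: a received message is hidden rather than deleted, and unless the worker deletes it after finishing, it reappears once the timeout expires, for example because the worker crashed. The class name is hypothetical, and time is passed in explicitly so the behavior is deterministic. Since a slow worker and its replacement may both process the same task, task results should be committed idempotently.

```python
import itertools


class VisibilityTimeoutQueue:
    """In-memory imitation of an SQS-style visibility timeout (illustrative)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}         # message id -> body
        self._invisible_until = {}  # message id -> time it becomes visible again
        self._ids = itertools.count()

    def send(self, body):
        self._messages[next(self._ids)] = body

    def receive(self, now):
        """Return (message id, body) for one visible message, or None."""
        for msg_id, body in self._messages.items():
            if self._invisible_until.get(msg_id, 0) <= now:
                # Hide the message instead of deleting it.
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """A worker calls this only after it has finished the task."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)
```

If a worker receives a task at time 0 and then crashes, the task stays hidden until time 30 (with a timeout of 30) and is then handed to another worker.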
16
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
The user-defined Map function must implement the following interface.
Pull iterator with sorting: In a pull iterator implementation, the user-defined Reduce function must implement the following interface.
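The interface listings referenced on this slide are missing from the transcript. The sketch below therefore uses hypothetical function names and signatures for word count and does not reproduce CMR’s actual interface. It shows the pull-iterator idea: after the intermediate pairs are sorted by key, all values for a key are contiguous, so the reduce function can pull them one by one from an iterator.

```python
from itertools import groupby
from operator import itemgetter


def word_count_map(key, value, emit):
    """Hypothetical map: key is a split id, value is a line of text."""
    for word in value.split():
        emit(word, 1)


def word_count_reduce(key, values, emit):
    """Hypothetical pull-style reduce: pulls every value from an iterator."""
    emit(key, sum(values))


def run_pull_reduce(pairs, reduce_fn):
    """Sort by key, then hand each key's values to reduce_fn as an iterator."""
    output = []

    def emit(k, v):
        output.append((k, v))

    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        reduce_fn(key, (value for _, value in group), emit)
    return output
```

For example, mapping the line "the cat the" and then calling `run_pull_reduce` on the emitted pairs yields `[("cat", 1), ("the", 2)]`.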
17
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
The first is the start interface. For example, in the word count example, the start function initializes a count variable in object T and sets its value to 0.
18
CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
For example, in the word count example, the reduce function converts the string to a numerical value, then adds the value to the count variable stored in T.
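The start/reduce style described on these two slides can be sketched for word count as follows. `start` creates the per-key state object T with a count of 0, and `reduce` converts each string value to a number and adds it to T. The `finish` step that emits the final result is an assumption, since it is not described on these slides. All names are hypothetical. Because each pair is folded into its key’s state as it arrives, the pairs do not need to be sorted first.

```python
class CountState:
    """The per-key state object T (here it only holds a counter)."""

    def __init__(self):
        self.count = 0


def start(key):
    # Called once per key: create T and set its count to 0.
    return CountState()


def reduce(key, value, state):
    # Called once per value: convert the string to a number and add it to T.
    state.count += int(value)


def finish(key, state):
    # Hypothetical final step (assumption): emit the result for this key.
    return key, state.count


def run_push_reduce(pairs):
    """Process pairs in arrival order, keeping one state object per key."""
    states = {}
    for key, value in pairs:
        if key not in states:
            states[key] = start(key)
        reduce(key, value, states[key])
    return dict(finish(k, s) for k, s in states.items())
```

Pairs can be consumed straight off a reduce queue in any order. Only one small state object per key is kept in memory.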
19
PROS AND CONS OF CLOUD MAPREDUCE
CMR is simpler for several reasons, including the following. First, S3 presents a large and reliable file storage abstraction, which relieves us from having to design our own file system. Second, SimpleDB presents a high-bandwidth status vault, which can sustain high read and write throughput (write throughput through striping).
20
PROS AND CONS OF CLOUD MAPREDUCE
Third, both S3 and SQS present a single point of contact that is capable of sustaining high throughput, so we no longer need to worry about communicating with many nodes at the same time. Last, we simply use Amazon’s visibility timeout mechanism to handle failures; no extra logic is needed to detect and recover from failure.
21
EXPERIMENTAL EVALUATION
22
EXPERIMENTAL EVALUATION
23
EXPERIMENTAL EVALUATION
24
EXPERIMENTAL EVALUATION
25
CONCLUSION
It is far from obvious that we can simplify the design and implementation of large-scale systems by building them on top of a cloud OS. Using MapReduce as an example, we have demonstrated that it is possible to overcome the cloud’s limitations without performance degradation.
26
CONCLUSION
The architecture also uses queues to shuffle results from Map to Reduce. Although a full-scale performance evaluation is beyond the scope of this paper, our preliminary results indicate that CMR is a practical system and that its performance is on par with that of Hadoop. Our experimental results also indicate that using queues to overlap the map and shuffling stages is a promising approach to improving MapReduce performance.
27
GG END TY