Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System
江嘉福 徐光成 章博遠
2011, 11th IEEE/ACM International Symposium on
Huan Liu, Dan Orban, Accenture Technology Labs
OUTLINE
I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References
INTRODUCTION
1. What is a Cloud OS?
2. Challenges posed by a cloud OS
3. Cloud MapReduce?
4. Advantages of Cloud MapReduce
What is a Cloud OS?
1. Manages the low-level cloud resources
2. Presents a high-level interface to application programmers
3. Key difference: scalability
Figure 1
Challenges posed by a cloud OS
1. Scalability comes at a price.
2. Data consistency, system availability, and tolerance to network partitions cannot all be fully guaranteed at once.
Figure 2
Cloud MapReduce?
1. MapReduce programming model
2. Horizontal scaling
3. Eventual consistency
4. Overcomes these limitations
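As a reminder of the programming model named in item 1, here is a minimal single-process word count sketch. It only illustrates the map, group-by-key, and reduce steps; the names map_fn, reduce_fn, and run_mapreduce are hypothetical and this is not how Cloud MapReduce itself is implemented.

```python
from collections import defaultdict


def map_fn(_key, text):
    """Map step: emit (word, 1) for every word in one input record."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce step: sum all counts collected for one word."""
    return word, sum(counts)


def run_mapreduce(records):
    """Run map, group the intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


result = run_mapreduce([("doc1", "the cloud the queue"), ("doc2", "The cloud")])
print(result)
```

In a distributed setting the grouping step is where intermediate pairs move between nodes; here it is just an in-memory dictionary.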
Advantages of Cloud MapReduce
1. Incremental scalability: can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: every node has the same set of responsibilities.
3. Heterogeneity: nodes can have varying computation capacity.
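One way to picture how identical, decentralized nodes can still cope with heterogeneous capacity is a shared task queue that every node pulls from when it is idle. The simulation below is a sketch under that assumption (one shared queue, equal-sized tasks); simulate_pull and speeds are hypothetical names, not part of Cloud MapReduce.

```python
import heapq


def simulate_pull(num_tasks, speeds):
    """Simulate workers pulling equal-sized tasks from one shared queue.

    speeds[i] is the number of tasks worker i finishes per unit of time.
    Returns how many tasks each worker ended up processing.
    """
    done = [0] * len(speeds)
    # Heap of (time the worker becomes idle, worker index); all idle at t = 0.
    idle_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle_at)
    for _ in range(num_tasks):
        t, i = heapq.heappop(idle_at)  # next worker to become idle
        done[i] += 1                   # it pulls one task from the queue
        heapq.heappush(idle_at, (t + 1.0 / speeds[i], i))
    return done


counts = simulate_pull(num_tasks=90, speeds=[1.0, 2.0])
print(counts)
```

No coordinator assigns work here: the faster worker simply returns to the queue more often, so it ends up with proportionally more tasks.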
Cloud MapReduce Architecture and Implementation
1. The architecture
2. Cloud challenges
3. General solution approaches
The Architecture
Cloud challenges & General solution approaches
1. Long latency
2. Horizontal scaling
3. Cannot tell when a queue has been created for the first time
Cloud challenges & General solution approaches (cont'd)
4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows
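To illustrate the duplicate-message challenge, the sketch below filters repeated deliveries on the consumer side. It assumes each message carries a tag made of a producer id and a sequence number; that tag layout and the name deduplicate are illustrative assumptions, not a description of the authors' exact mechanism.

```python
def deduplicate(messages):
    """Yield each (key, value) pair once, even if a message is delivered twice.

    Each message is a tuple (producer_id, seq, key, value). The pair
    (producer_id, seq) is an assumed unique tag attached by the producer.
    """
    seen = set()
    for producer_id, seq, key, value in messages:
        tag = (producer_id, seq)
        if tag in seen:
            continue  # already processed this delivery, so skip the duplicate
        seen.add(tag)
        yield key, value


delivered = [
    ("map-0", 0, "cloud", 1),
    ("map-0", 1, "queue", 1),
    ("map-0", 0, "cloud", 1),  # same tag delivered a second time
    ("map-1", 0, "cloud", 1),  # same key, but a different message
]
unique = list(deduplicate(delivered))
print(unique)
```

Filtering on the tag rather than on the key/value content matters: two distinct messages may legitimately carry identical payloads, as the last entry shows.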
Pros
● 3000 lines of Java code (L.O.C.) vs. Hadoop L.O.C.
● Large & reliable FS
● High bandwidth (fast read/write)
● Single point of contact (high throughput)
Cons
● Uses only the network (no local storage)
● Leads to a bottleneck
Evaluation
Almost twice as fast!
Evaluation
● Hadoop - 385 s total, network/CPU underutilized
● CMR - 210 s, more efficient network/CPU usage
Evaluation: Wiki Word Count
● Combiner: Hadoop - 747 s, CMR - 436 s
● No Combiner: Hadoop s CMR s
Evaluation: Amazon
● Word Count -> 400 GB using 100 nodes
● Approx. 1 hr
● 983,152 requests -> $0.98
● Using SimpleDB?
● 3.7 hrs -> $
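The request figures on this slide imply a per-request price. The snippet below is plain arithmetic on those two numbers only; it is not taken from a published price list.

```python
requests = 983_152      # number of requests reported on the slide
total_cost_usd = 0.98   # cost reported for those requests

cost_per_request = total_cost_usd / requests
cost_per_million = cost_per_request * 1_000_000

print(f"{cost_per_request:.2e} USD per request")
print(f"{cost_per_million:.3f} USD per million requests")
```

In other words, the reported cost works out to just under one dollar per million requests.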
Evaluation: Comparison
● Distributed Grep Word Count -> 13 GB of data
● CMR = 962 seconds
● Hadoop = 1047 seconds
● Results are almost the same, why?
● More CPU-intensive tasks
Evaluation: 12 GB HTML files
● Hadoop -> 6 hrs+
● CMR -> 297 seconds
● Hadoop - high overhead from task creation
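The timings quoted across the evaluation slides can be turned into speedup ratios (Hadoop time divided by CMR time). The labels in this sketch are informal descriptions of each slide, and only runs with both numbers stated are included.

```python
# (Hadoop seconds, CMR seconds) as quoted on the evaluation slides.
runs = {
    "total time (385 s vs. 210 s)": (385, 210),
    "Wiki word count with combiner": (747, 436),
    "13 GB comparison": (1047, 962),
}

speedups = {name: hadoop / cmr for name, (hadoop, cmr) in runs.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The first run gives about 1.83x, which matches the "almost twice as fast" claim, while the 13 GB comparison gives only about 1.09x. For the 12 GB HTML run, Hadoop is reported only as "6 hrs+", so 21,600 s / 297 s (roughly 72x) is a lower bound rather than a measured ratio.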
Conclusion
● Cloud cannot be implemented on any system
● Poor performance
● CMR techniques overcome cloud limitations
● No performance degradation
● Good to use for other systems
REFERENCES
Figure 1:
Figure 2: