Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System 9962161 江嘉福 100062228 徐光成 100062229 章博遠 2011, 11th IEEE/ACM International.

Presentation transcript:

Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System
江嘉福 徐光成 章博遠
2011, 11th IEEE/ACM International Symposium on
Huan Liu, Dan Orban, Accenture Technology Labs

OUTLINE
I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References

INTRODUCTION
1. What is a Cloud OS?
2. Challenges posed by a cloud OS
3. Cloud MapReduce?
4. Advantages of Cloud MapReduce

What is a Cloud OS?
1. Manages the low-level cloud resources
2. Presents a high-level interface to application programmers
3. Key difference from a traditional OS: scalability
(Figure 1)

Challenges posed by a cloud OS
1. Scalability comes at a price.
2. The price is a trade-off among data consistency, system availability, and tolerance to network partition.
(Figure 2)

Cloud MapReduce?
1. MapReduce programming model
2. Horizontal scaling
3. Eventual consistency
4. Overcomes limitations
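The MapReduce programming model named in item 1 can be illustrated with a minimal in-memory word count. This is only a single-process sketch of the model, not Cloud MapReduce's implementation; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["cloud map reduce", "cloud reduce"]))
```

In a distributed setting the map and reduce calls run on different nodes and the shuffle step moves data between them; here all three steps run in one process to keep the model visible.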

Advantages of Cloud MapReduce
1. Incremental scalability: the system can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: every node has the same set of responsibilities.
3. Heterogeneity: nodes can have varying computation capacity.

Cloud MapReduce Architecture and Implementation
1. The architecture
2. Cloud challenges
3. General solution approaches

The Architecture

Cloud challenges & general solution approaches
1. Long latency
2. Horizontal scaling
3. No way to know when a queue is created for the first time

Cloud challenges & general solution approaches (cont'd)
4. Duplicate messages
5. Potential node failures
6. Nondeterministic eventual consistency windows
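The duplicate-message challenge in item 4 can be sketched as follows. The slides do not describe how Cloud MapReduce itself handles duplicates, so this is a generic technique under stated assumptions: every message carries a unique ID, and the consumer can keep the set of IDs it has already processed. The names `deliver_at_least_once` and `consume_once` are hypothetical, and the first function only simulates a queue that may deliver a message more than once.

```python
import random


def deliver_at_least_once(messages, duplicate_rate=0.5, seed=0):
    """Simulate a queue that delivers every message, some of them twice."""
    rng = random.Random(seed)
    for msg_id, payload in messages:
        yield msg_id, payload
        if rng.random() < duplicate_rate:
            # Assumed behaviour: the same message shows up a second time.
            yield msg_id, payload


def consume_once(stream):
    """Yield each payload once, skipping message IDs already seen."""
    seen = set()
    for msg_id, payload in stream:
        if msg_id in seen:
            continue
        seen.add(msg_id)
        yield payload


if __name__ == "__main__":
    messages = [(i, f"value-{i}") for i in range(5)]
    delivered = list(deliver_at_least_once(messages))
    print(len(delivered), list(consume_once(delivered)))
```

Filtering on the consumer side keeps the producer simple, at the cost of memory proportional to the number of distinct message IDs.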

Pros
● 3,000 lines of Java code (LOC) vs. Hadoop LOC
● Large & reliable FS
● High bandwidth (fast read/write)
● Single point of contact (high throughput)

Cons
● Uses only the network (no local storage)
● Leads to a bottleneck

Evaluation
Almost twice as fast!

Evaluation
● Hadoop: 385 s total, network/CPU underutilized
● CMR: 210 s, more efficient network/CPU usage

Evaluation: Wiki Word Count
● Combiner: Hadoop 747 s, CMR 436 s
● No combiner: Hadoop s, CMR s

Evaluation: Amazon
● Word Count -> 400 GB using 100 nodes
● Approx. 1 hr
● 983,152 requests -> $0.98
● Using SimpleDB?
● 3.7 hrs -> $
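The request cost on this slide implies a per-request rate, which a short calculation makes explicit. The sketch below uses only the two figures from the slide (983,152 requests, $0.98); the function name `cost_per_million_requests` is hypothetical, and the slide does not say which service's pricing the figure comes from.

```python
def cost_per_million_requests(total_cost_usd, request_count):
    """Return the implied price in USD per one million requests."""
    return total_cost_usd / request_count * 1_000_000


if __name__ == "__main__":
    rate = cost_per_million_requests(0.98, 983_152)
    # The result is close to one dollar per million requests.
    print(f"Implied rate: ${rate:.3f} per million requests")
```

Read the other way, a flat rate of one dollar per million requests applied to 983,152 requests gives about $0.98, which matches the slide.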

Evaluation: Comparison
● Distributed Grep Word Count -> 13 GB of data
● CMR = 962 seconds
● Hadoop = 1047 seconds
● Results are almost the same. Why?
● More CPU-intensive tasks
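The timings reported on the evaluation slides can be turned into speedup factors to compare the runs on one scale. The sketch below uses only the Hadoop/CMR pairs stated in the slides; the dictionary labels and the `speedup` helper are hypothetical names for illustration.

```python
def speedup(hadoop_seconds, cmr_seconds):
    """Return how many times faster CMR was than Hadoop for one run."""
    return hadoop_seconds / cmr_seconds


# (Hadoop seconds, CMR seconds) as reported on the evaluation slides.
TIMINGS = {
    "385 s vs 210 s run": (385, 210),
    "Wiki word count with combiner": (747, 436),
    "13 GB comparison": (1047, 962),
}

SPEEDUPS = {name: round(speedup(h, c), 2) for name, (h, c) in TIMINGS.items()}

if __name__ == "__main__":
    for name, factor in SPEEDUPS.items():
        print(f"{name}: {factor}x")
```

The first two runs come out near 1.8x and 1.7x, consistent with "almost twice as fast", while the 13 GB comparison is about 1.09x, consistent with "results are almost the same".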

Evaluation: 12 GB HTML files
● Hadoop -> 6 hrs+
● CMR -> 297 seconds
● Hadoop: high overhead from task creation

Conclusion
● Cloud cannot be implemented on any system
● Poor performance
● CMR techniques overcome cloud limitations
● Zero performance degradation
● Good to use for other systems

REFERENCES
Figure 1:
Figure 2: