Hadoop YARN in the Cloud Junping Du Staff Engineer, VMware China Hadoop Summit, 2013.


1 Hadoop YARN in the Cloud Junping Du Staff Engineer, VMware China Hadoop Summit, 2013

2 Agenda
Hadoop YARN – Hub for Big Data Applications
YARN and Cloud Computing
HVE (Hadoop Virtualization Extension) work on YARN

3 Hadoop MapReduce v1 (Classic)
JobTracker
  – Manages cluster resources and job scheduling
TaskTracker
  – Per-node agent
  – Manages tasks

4 MapReduce v1 Limitations
Scalability
  – The JobTracker handles both cluster resource management and job scheduling
SPOF (Single Point Of Failure)
  – A JobTracker failure causes all queued and running jobs to fail
  – Restart is very tricky due to complex state
Hard partition of resources into map and reduce slots
  – Low resource utilization
Lacks support for alternate paradigms
Lacks wire-compatible protocols
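
The utilization cost of the hard map/reduce slot partition can be illustrated with a small calculation. This is a minimal sketch, not MapReduce code: the function names, slot counts, and task counts are assumptions chosen for illustration.

```python
# Hypothetical illustration: slot counts and workload are made-up numbers.

def slot_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Fraction of slots busy when map and reduce slots are not interchangeable."""
    busy = min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(total_slots, map_tasks, reduce_tasks):
    """Fraction of capacity busy when any task may use any free capacity."""
    busy = min(total_slots, map_tasks + reduce_tasks)
    return busy / total_slots


# A map-heavy phase: 20 pending map tasks, no reduce tasks yet.
static = slot_utilization(map_slots=8, reduce_slots=4, map_tasks=20, reduce_tasks=0)
pooled = pooled_utilization(total_slots=12, map_tasks=20, reduce_tasks=0)
print(f"static slots: {static:.2f}, pooled capacity: {pooled:.2f}")
```

With the fixed partition, the four reduce slots sit idle during the map-heavy phase even though map tasks are waiting.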

5 YARN Architecture
Splits up the two major functions of the JobTracker
  – Resource Manager (RM) - Cluster resource management
  – Application Master (AM) - Task scheduling and monitoring
NodeManager (NM) - A new per-node slave
  – Launches the applications’ containers
  – Monitors their resource usage (CPU, memory) and reports to the Resource Manager
YARN maintains compatibility with existing MapReduce applications and supports other applications
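
The division of labor between RM, AM, and NM can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, placement is first-fit on memory alone, and real YARN adds queues, heartbeats, and a pluggable scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent: launches containers and tracks their memory usage."""
    name: str
    total_mb: int
    used_mb: int = 0
    containers: list = field(default_factory=list)

    def free_mb(self):
        return self.total_mb - self.used_mb

    def launch(self, app_id, mb):
        self.used_mb += mb
        self.containers.append((app_id, mb))


class ResourceManager:
    """Cluster-wide resource management only; knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, mb):
        # First-fit placement (an assumption made for this sketch).
        for node in self.nodes:
            if node.free_mb() >= mb:
                node.launch(app_id, mb)
                return node.name
        return None  # no node can host the container right now


class ApplicationMaster:
    """Per-application scheduling: decides what to run and asks the RM for containers."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, task_sizes_mb):
        return [self.rm.allocate(self.app_id, mb) for mb in task_sizes_mb]


rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
am = ApplicationMaster("app-1", rm)
placements = am.run_tasks([2048, 2048, 2048, 1024])
print(placements)
```

The last request gets no placement because both nodes are full, which is the point at which a real AM would wait and retry.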

6 YARN – Hub for Big Data Applications
[Diagram: MapReduce, Tez, Storm, Spark, HBase, Impala, OpenMPI, Distributed Shell, and app-specific AMs on YARN, with HDFS]
HOYA (HBase On YArn)
  – Long-running services (YARN-896)
LLAMA (Low Latency Application MAster)
  – Gang scheduler (YARN-624)

7 YARN and Cloud
Two different perspectives:
  – YARN-centric perspective
      YARN is the key platform for apps
      YARN is independent of infrastructure; running on top of Cloud shows YARN’s generality
  – Cloud-centric perspective
      YARN is an umbrella for one kind of application
      Supporting YARN shows Cloud’s generality

8 YARN and Cloud: YARN-centric Perspective
[Diagram: big data apps (MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, Distributed Shell, …) run on YARN; YARN runs on infrastructure, either bare-metal machines or cloud infrastructure (VMware, OpenStack, …)]

9 YARN and Cloud: Cloud-centric Perspective
[Diagram: cloud infrastructure (VMware, OpenStack, etc.) hosts YARN and its apps (MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, D.S, …) alongside legacy apps and non-YARN big data apps]

10 YARN vs. Cloud
Similarity
  – Both aim to share resources across applications
  – Both provide global resource management
YARN vs. Cloud
  – YARN manages resources at the OS layer; Cloud manages resources at the hypervisor layer (not directly comparable, but the hypervisor is more powerful than the OS)
  – Apps managed by YARN need a specific AppMaster; apps managed by Cloud run exactly as they would on physical machines (advantage: Cloud)
  – YARN tracks application-specific metrics and progress; Cloud tracks only the underlying resources (advantage: YARN)

11 YARN + Cloud
Why YARN + Cloud?
  – Leverage virtualization for strong isolation, fine-grained resource sharing, and other benefits
  – Uniform infrastructure to simplify enterprise IT
What does it look like?
  – Run the YARN NM inside VMs managed by the cloud infrastructure
  – Build a communication channel between the YARN RM and the Cloud Resource Manager for coordination
How do we do it?
  – The first item is easy and works smoothly
  – The second can be achieved in two ways
      YARN is aware of, and can manipulate, Cloud resource changes
      YARN provides a generic resource notification mechanism that the Cloud Manager can use when resources change

12 Elastic YARN Node in the Cloud
A VM’s resource boundary can be elastic
  – CPU is easy: time slicing (with constraints)
  – Memory is harder: page sharing and memory ballooning
  – In case of contention, enforce limits and proportional sharing
  – “Stealing” resources behind the apps’ backs can cause bad performance (paging)
  – App-aware resource management could address these issues
Hadoop YARN resource model
  – Dynamic across the cluster: nodes can be added or removed
  – But static per node
In this case, should we enable resource elasticity on VMs?
  – If yes, performance is low when resource contention happens
  – If no, utilization is as low as on physical boxes, because free resources cannot be leveraged by other busy VMs
We need a better answer.
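
The trade-off between static and elastic VM memory can be put into numbers with a small model. This is a sketch under stated assumptions: the host size, VM caps, and demands are invented, and "overcommit" here simply stands for demand exceeding physical memory, the situation in which paging would occur.

```python
# Hypothetical numbers: a 16 GB host running two VMs.

def static_usage(host_gb, caps_gb, demands_gb):
    """Each VM is limited to its fixed cap; spare capacity in one VM is not lent out."""
    used = sum(min(cap, demand) for cap, demand in zip(caps_gb, demands_gb))
    return used / host_gb


def elastic_usage(host_gb, demands_gb):
    """VMs may grow into free host memory. Returns (utilization, overcommit_gb)."""
    total = sum(demands_gb)
    return min(total, host_gb) / host_gb, max(0, total - host_gb)


# One busy VM (wants 12 GB) and one idle VM (wants 2 GB).
print(static_usage(16, [8, 8], [12, 2]))   # busy VM is stuck at its 8 GB cap
print(elastic_usage(16, [12, 2]))          # busy VM borrows the idle VM's memory

# Later the idle VM becomes busy too: elastic VMs now contend for memory.
print(elastic_usage(16, [12, 8]))
```

The first two calls show the utilization gain from elasticity; the last shows the contention case that app-unaware "stealing" cannot resolve gracefully.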

13 HVE provides the answer!
Hadoop Virtualization Extensions
  – A project to enhance Hadoop running on virtualization
Goal: Make Hadoop Cloud-Ready
  – Provide virtualization awareness to Hadoop, e.g. virtual topology, virtual resources
  – Deliver generic utilities that can be leveraged by virtualized platforms
Independent of virtualization platform and cloud infrastructure
100% contribution to the Apache Hadoop community

14 HVE Philosophy
  – Make infrastructure-related components abstract
  – Deliver different implementations that can be configured properly
E.g. BlockPlacementPolicy (Abstract), with implementations BlockPlacementPolicy Default and BlockPlacementPolicy For Virtualization
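
The "abstract component, configurable implementations" pattern can be sketched as follows. This is an illustration in Python rather than Hadoop's Java code: the class names, the `placement.policy` configuration key, and the placement logic are all hypothetical, and the default policy here is far simpler than Hadoop's rack-aware one.

```python
from abc import ABC, abstractmethod


class PlacementPolicy(ABC):
    """Abstract, infrastructure-related component (hypothetical name)."""

    @abstractmethod
    def choose_targets(self, nodes, replicas):
        """nodes is a list of (node_name, physical_host) pairs."""


class DefaultPlacementPolicy(PlacementPolicy):
    def choose_targets(self, nodes, replicas):
        # Treats every node as an independent machine.
        return [name for name, _host in nodes[:replicas]]


class VirtualizationAwarePlacementPolicy(PlacementPolicy):
    def choose_targets(self, nodes, replicas):
        # Avoids putting two replicas on VMs that share one physical host.
        chosen, used_hosts = [], set()
        for name, host in nodes:
            if host not in used_hosts:
                chosen.append(name)
                used_hosts.add(host)
            if len(chosen) == replicas:
                break
        return chosen


POLICIES = {
    "default": DefaultPlacementPolicy,
    "virtualization": VirtualizationAwarePlacementPolicy,
}


def load_policy(conf):
    """Pick the implementation from configuration (the key name is an assumption)."""
    return POLICIES[conf.get("placement.policy", "default")]()


nodes = [("vm1", "hostA"), ("vm2", "hostA"), ("vm3", "hostB"), ("vm4", "hostC")]
default_targets = load_policy({}).choose_targets(nodes, 3)
virt_targets = load_policy({"placement.policy": "virtualization"}).choose_targets(nodes, 3)
print(default_targets, virt_targets)
```

The caller only depends on the abstract interface, so switching between bare-metal and virtualized behavior is a configuration change rather than a code change.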

15 Elastic YARN Node in the Cloud
[Diagram: a virtualization host runs a virtual YARN node (Datanode, NodeManager, containers, VMDK) next to other workloads. Questions posed: Add/remove resources? Grow/shrink the resources of a VM, by tens of GB of memory?]

16 Implementation – YARN-291 (umbrella)
YARN-311 – Core scheduler changes
YARN-313 – CLI
YARN-312 – AdminProtocol changes
REST API, JMX, etc.
[Diagram components: Cloud Resource Manager; Admin CLI (yarn rmadmin -updateNodeResource); Resource Manager containing AdminService, Resource Tracker Service, RMContext, RMNode, Scheduler, SchedulerNode, and Cluster Resource; Node Manager (Heartbeat); call shown: UpdateNodeResource()]
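
The effect of an UpdateNodeResource() call on scheduling can be sketched with a toy scheduler. This is not the YARN-291 implementation: the class, its methods, and the rule that a shrunken node simply receives no new containers until usage drops are assumptions made for illustration.

```python
class Scheduler:
    """Toy scheduler that tracks per-node total and used memory (hypothetical)."""

    def __init__(self):
        self.total_mb = {}
        self.used_mb = {}

    def add_node(self, node_id, total_mb):
        self.total_mb[node_id] = total_mb
        self.used_mb[node_id] = 0

    def cluster_total_mb(self):
        return sum(self.total_mb.values())

    def available_mb(self, node_id):
        # May be negative right after a node is shrunk below its current usage.
        return self.total_mb[node_id] - self.used_mb[node_id]

    def allocate(self, node_id, mb):
        if self.available_mb(node_id) >= mb:
            self.used_mb[node_id] += mb
            return True
        return False

    def update_node_resource(self, node_id, new_total_mb):
        """Admin-triggered change of a node's capacity while the node stays registered."""
        self.total_mb[node_id] = new_total_mb


sched = Scheduler()
sched.add_node("nm1", 8192)
first = sched.allocate("nm1", 6144)          # fits in 8192 MB

sched.update_node_resource("nm1", 4096)      # the cloud shrinks the VM
after_shrink = sched.available_mb("nm1")     # usage now exceeds capacity
second = sched.allocate("nm1", 1024)         # rejected until usage drops

sched.update_node_resource("nm1", 16384)     # the cloud grows the VM
third = sched.allocate("nm1", 8192)          # fits again
print(first, after_shrink, second, third, sched.cluster_total_mb())
```

Because the node's capacity is changed in place, running containers are untouched and only future allocation decisions see the new size.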

17 Reference
YARN MapReduce 2.0 – https://issues.apache.org/jira/browse/MAPREDUCE-279
HVE topology extension – https://issues.apache.org/jira/browse/HADOOP-8468
HVE topology extension for YARN – https://issues.apache.org/jira/browse/YARN-18
HVE elastic resource configuration – https://issues.apache.org/jira/browse/YARN-291
Gang Scheduling – https://issues.apache.org/jira/browse/YARN-624
Long-lived services in YARN – https://issues.apache.org/jira/browse/YARN-896

18 Thanks! Junping Du jdu@vmware.com

