
1 Apache Hadoop YARN: Yet Another Resource Negotiator
V. Vavilapalli, A. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, E. Baldeschwieler
Presented by Erik Krogen
YARN was conceived and started at Yahoo, with much of its later development done at Hortonworks.

2 Motivation: Issues in Hadoop 1
People were (ab)using MapReduce in unexpected ways, e.g. using it as a job scheduler by running long-lived map tasks as application containers; one application, Giraph, even made one mapper “special” so it could act as a coordinator.
A single JobTracker node for the entire cluster causes scalability issues: it has to manage fault tolerance, scheduling, locality, task management, etc. for every job in the cluster.
Statically allocated map and reduce task slots cause utilization issues.
The goal is to fix these issues while maintaining backwards compatibility, and of course the scalability, fault tolerance, security, and locality awareness expected of Hadoop.

3 Architecture: Hadoop 1

4 Architecture: Concepts
Application: a job submitted to the framework, e.g. a MapReduce job, Spark job, or Spark Streaming job. An application can be long-running (like Spark Streaming) or a batch job.
Container: the basic unit of allocation, with variable-size resources; roughly analogous to a task. A container constrains whatever task runs inside it to a specific set of resources.
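
To make the two concepts concrete, here is a minimal sketch (not in the original slides), assuming the Hadoop 2.x YARN client API: an application is created and submitted to the ResourceManager, and the container the ApplicationMaster will run in is just a variable-size bundle of resources. The application name, command, and sizes are made-up examples.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApplicationSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // An application is the unit of submission; the RM hands back a new application ID.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("example-app");  // hypothetical name

    // A container is the unit of allocation: a resource bundle, not a fixed map/reduce slot.
    // Here we size the container that will run the ApplicationMaster itself.
    ctx.setResource(Resource.newInstance(1024 /* MB */, 1 /* vCore */));

    // What to run inside that container (hypothetical command; no deps, env, or tokens here).
    ctx.setAMContainerSpec(ContainerLaunchContext.newInstance(
        Collections.emptyMap(), Collections.emptyMap(),
        Collections.singletonList("./run_am.sh"), null, null, null));

    yarnClient.submitApplication(ctx);
  }
}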

5 Architecture: Hadoop 2 (YARN)

6 Architecture: ResourceManager
Dedicated node, only one per cluster; at the time of the paper it was a single point of failure, with high availability via ZooKeeper-based fail-over added since. When the RM fails, AMs can continue with their current resources until it returns.
A centralized scheduler, but with fewer responsibilities than the JobTracker; scheduling policies are pluggable.
Tracks resource usage and node liveness via heartbeats from all nodes.
Allocates resources to applications, and can also request resources back from (or preempt) applications, e.g. because a new higher-priority application came along.
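
As a rough illustration of the pluggable scheduling policies (not in the original slides), the RM's scheduler implementation is chosen by configuration. The property names and scheduler class below are stock YARN, but in a real deployment they would live in yarn-site.xml rather than be set programmatically; the values are examples only.

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Swap the RM's scheduling policy, e.g. the CapacityScheduler (or the FairScheduler).
    conf.set("yarn.resourcemanager.scheduler.class",
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
    // Enable the scheduling monitor that drives preemption of containers.
    conf.set("yarn.resourcemanager.scheduler.monitor.enable", "true");
    System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
  }
}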

7 Architecture: ApplicationMaster
One per application, and application-type specific, e.g. a MapReduce ApplicationMaster or a Spark ApplicationMaster.
Requests resources from the RM; it can specify the number of containers, resources per container, locality preferences, and priority, and can subsequently update its requests with new requirements.
Manages all scheduling/execution, fault tolerance, etc. for its application: the AM is on its own in terms of fault-tolerance methodology.
Runs on a worker node (though it is possible to run it on a dev machine instead of in-cluster) and must handle its own failures.
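
A minimal sketch (not in the original slides) of such a request, using the AMRMClient API that an ApplicationMaster typically uses to talk to the RM: resources per container, locality preferences, and priority. The hostname and rack name are hypothetical.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRequestSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Register with the RM before requesting anything.
    rmClient.registerApplicationMaster("", 0, "");

    // Ask for 3 containers of 2 GB / 1 vCore each, preferring specific nodes and racks.
    Resource capability = Resource.newInstance(2048, 1);
    Priority priority = Priority.newInstance(1);
    String[] nodes = {"node1.example.com"};  // hypothetical hostname
    String[] racks = {"/rack1"};             // hypothetical rack
    for (int i = 0; i < 3; i++) {
      rmClient.addContainerRequest(new ContainerRequest(capability, nodes, racks, priority));
    }

    // Allocated containers come back on later allocate() heartbeats; requests can be
    // updated afterwards by adding or removing ContainerRequests.
    rmClient.allocate(0.0f);
  }
}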

8 Architecture: NodeManager
Daemon on each node; communicates status to the RM.
Monitors resources, reports faults, and manages containers.
Performs physical hardware checks on the node.
Provides services to containers, e.g. log aggregation.
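
A few NodeManager-related settings that back the points above, sketched with the Hadoop Configuration API (not in the original slides). The property names are stock YARN; the values are examples, and in practice they belong in yarn-site.xml.

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeManagerConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Resources this NM advertises to the RM and enforces on its containers.
    conf.set("yarn.nodemanager.resource.memory-mb", "8192");
    conf.set("yarn.nodemanager.resource.cpu-vcores", "8");
    // Log aggregation is one of the services the NM provides to containers.
    conf.set("yarn.log-aggregation-enable", "true");
    System.out.println(conf.get("yarn.nodemanager.resource.memory-mb"));
  }
}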

9 Architecture: NodeManager
To start a container, a Container Launch Context (CLC) is sent to the NM: dependencies, environment variables, security tokens, payloads, the command to run, etc.
Containers can share dependencies (e.g. executables, data files); dependencies are copied locally, kept, and periodically garbage-collected by the NM.
Auxiliary services can be configured into the NM: the shuffle between map and reduce is implemented as an auxiliary shuffle service, and auxiliary services also allow e.g. data files to persist between jobs.
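
A minimal sketch (not in the original slides) of building a CLC and handing it to a NodeManager with the NMClient API. The command, environment variable, and local resources are hypothetical; the Container and localized dependencies would come from an RM allocation and files the application staged in HDFS.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchContainerSketch {
  // 'container' is one the RM previously allocated to this application's AM.
  static void launch(NMClient nmClient, Container container,
                     Map<String, LocalResource> dependencies) throws Exception {
    Map<String, String> env = new HashMap<>();
    env.put("APP_MODE", "worker");  // hypothetical environment variable

    // The CLC bundles dependencies, env vars, the command, service data, tokens, and ACLs.
    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
        dependencies,                                // local resources, localized by the NM
        env,                                         // environment variables
        Collections.singletonList("./run_task.sh"),  // command to run (hypothetical)
        null,                                        // serviceData for auxiliary services
        null,                                        // security tokens
        null);                                       // ACLs

    nmClient.startContainer(container, clc);
  }

  public static void main(String[] args) throws Exception {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(new YarnConfiguration());
    nmClient.start();
    // A real AM would call launch(...) for each container the RM allocated to it.
  }
}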

10 Fault Tolerance
Left completely up to the ApplicationMaster; the RM doesn’t help. The AM must provide fault tolerance for its containers as well as for itself. Some higher-level frameworks on top of YARN exist to make this and other things easier, e.g. Apache REEF (Retainable Evaluator Execution Framework).

11 Performance
Performance is generally on par with Hadoop 1, sometimes slightly worse (overhead of containers, RM-to-AM communication, etc.). Utilization is much better, in large part due to the removal of static map and reduce slots. Yahoo: “upgrading to YARN was equivalent to adding machines [to this 2500 machine cluster]”.

12 Why YARN?
It’s in Hadoop 2, so adoption is automatically widespread.
The per-application scheduler (the AM) means big scalability gains, and it also means multiple versions of e.g. MapReduce can coexist.
Less general than Mesos, which can be both good and bad: YARN went for a more specific, less general approach, most notably its request model for resources versus Mesos’s offer model. Mesos has a per-framework scheduler and may still hit scalability issues.

