Presentation is loading. Please wait.

Presentation is loading. Please wait.

Large-scale cluster management at Google with Borg By: Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Presented.

Similar presentations


Presentation on theme: "Large-scale cluster management at Google with Borg By: Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Presented."— Presentation transcript:

1 Large-scale cluster management at Google with Borg By: Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Presented by: Xianghan Pei EECS 582 – W161

2 Outline Introduction & Motivation The User Perspective Borg Architecture Scalability & Availability Utilization Isolation Lessons Future and Related Work EECS 582 – W162

3 Introduction & Motivation Introduction Borg: An OS of Cluster(Datacenter) Motivation Hide the details(programmer focus on App) Provide resource sharing Provide high reliability and availability for Cluster EECS 582 – W163 The high-level architecture of Borg

4 The User Perspective Workload production(prod) VS non-production(non-prod) Cells Heterogeneous, around 10 k machines in median Jobs and Tasks EECS 582 – W164 > borgcfg.../hello_world_webserver.borg up

5 The User Perspective Allocs Reserved set of resources Priority, Quota, and Admission Control Job has a priority (preempting) Quota is used to decide which jobs to admit for scheduling Naming and Monitoring 50.jfoo.ubar.cc.borg.google.com Monitoring health of the task and thousands of performance metrics EECS 582 – W165

6 Borg Architecture Borgmaster Main Borgmaster process & Scheduler Five replicas Borglet Manage and monitor tasks and resource Borgmaster polls Borglet every few seconds EECS 582 – W166 The high-level architecture of Borg(A Cell)

7 Borg Architecture Scheduling feasibility checking: find machines Scoring: pick one machines User preferences & build-in criteria E-PVM VS best-fit Tradeoff EECS 582 – W167 The high-level architecture of Borg(A Cell)

8 Scalability Separate scheduler Separate threads to poll the Borglets Partition functions across the five replicas Score caching Equivalence classes Relaxed randomization EECS 582 – W168 The high-level architecture of Borg(A Cell)

9 Availability System Borgmaster: Five replicas Borglet: Borgmaster Take checkpoint Job Reschedules evicted tasks Spreading tasks across failure domian … EECS 582 – W169

10 Utilization Cell Sharing Segregating prod and non-prod work into different cells would need more machines EECS 582 – W1610

11 Utilization Cell Size Subdividing cells into smaller ones would require more machines EECS 582 – W1611

12 Utilization Fine-grained resource requests EECS 582 – W1612 No bucket sizes fit most of the tasks wellBucketing resource would need more machines

13 Utilization Resource reclamation estimate how many resources a task will use and reclaim the rest for work Kill non-prod if not available EECS 582 – W1613

14 Utilization Resource reclamation Choose medium EECS 582 – W1614

15 Isolation Security isolation chroot jail as the primary security isolation mechanism Performance isolation High-priority LV task(prod) get best treatment compressible resources: reclaimed non-compressible resources: kill or remove task EECS 582 – W1615

16 Lessons Bad Jobs are restrictive as the only grouping mechanism for tasks One IP address per machine complicates things Optimizing for power users at the expense of casual ones Good Allocs are useful Cluster management is more than task management Introspection is vital The master is the kernel of a distributed system EECS 582 – W1616

17 Future Work Kubernetes Evolved from Borg An open-source system for automating deployment, operations, and scaling of containerized applications Pods: groups of containers Labels Replica controller Services EECS 582 – W1617

18 Relative Work Apache Mesos YARN Tupperware EECS 582 – W1618

19 Questions? EECS 582 – W1619

20 References Borg Paper Borg Slides at InfoQBorg Slides EECS 582 – W1620


Download ppt "Large-scale cluster management at Google with Borg By: Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Presented."

Similar presentations


Ads by Google