Datacenter As a Computer
Mosharaf Chowdhury
EECS 582 – W16, 3/7/16
Announcements
- Midterm grades are out. There were many interesting approaches; thanks!
- Meeting on April 11 moved earlier, to April 8 (Friday); no reviews for the April 8 papers
- Meeting on April 13 moved later, to April 15 (Friday)
Mid-Semester Presentations
- Schedule strictly followed: 20 minutes per group (15 + 5 minutes Q/A)
- Four parts: motivation, approach overview, current status, and end goal

March 21:
1. Juncheng and Youngmoon
2. Andrew
3. Dong-hyeon and Ofir
4. Hanyun, Ning, and Xianghan

March 23:
1. Clyde, Nathan, and Seth
2. Chao-han, Chi-fan, and Yayun
3. Chao and Yikai
4. Kuangyuan, Qi, and Yang
Why is One Machine Not Enough?
- Too much data
- Too little storage capacity
- Not enough I/O bandwidth
- Not enough computing capability
Warehouse-Scale Computers
- Single organization
- Homogeneity (to some extent)
- Cost efficiency at scale
- Multiplexing across applications and services
- Rent it out!
Many concerns: infrastructure, networking, storage, software, power/energy, failure/recovery, …
Architectural Overview
[Figure: inside a server, components connect over the memory bus, SATA, and PCIe; servers connect over Ethernet through top-of-rack (ToR) and aggregation switches]
Datacenter Networks
Traditional hierarchical topology (core, aggregation, and edge layers):
- Expensive
- Difficult to scale
- High oversubscription
- Low path diversity
- …
Datacenter Networks
Clos topology (core, aggregation, and edge layers):
- Cheaper
- Easier to scale
- No/low oversubscription
- Higher path diversity
- …
Storage Hierarchy
From fastest/smallest to slowest/largest: L1 cache, L2 cache, L3 cache, RAM, 3D XPoint, SSD, HDD; then across machines, racks, and pods
(https://www.youtube.com/watch?v=IWsjbqbkqh8)
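To give the hierarchy a rough sense of scale, here are illustrative access latencies for the tiers named above. The numbers are my own order-of-magnitude assumptions (real values vary widely by hardware generation), not figures from the slides:

```python
# Illustrative access latencies, in nanoseconds (orders of magnitude only).
latency_ns = {
    "L1 cache":  1,
    "L2 cache":  4,
    "L3 cache":  30,
    "RAM":       100,
    "3D XPoint": 10_000,
    "SSD":       100_000,
    "HDD":       10_000_000,
}

# Each step down the hierarchy trades latency for capacity; going across
# machines, racks, and pods adds network round trips on top of these.
slowdown = latency_ns["HDD"] / latency_ns["RAM"]   # ~100,000x
```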
Power, Energy, Modeling, Building, …
- Many challenges
- We'll focus primarily on software infrastructure in this class
Datacenter Needs an Operating System
A datacenter, like a single computer, is a collection of:
- CPU cores
- Memory modules
- SSDs and HDDs
all connected by an interconnect.
Some Differences
1. High level of parallelism
2. Diversity of workloads
3. Resource heterogeneity
4. Failure is the norm
5. Communication dictates performance
Three Categories of Software
1. Platform-level: software and firmware present on every machine
2. Cluster-level: distributed systems that enable everything
3. Application-level: user-facing applications built on top
Common Techniques

Technique               Performance   Availability
---------               -----------   ------------
Replication                  X             X
Erasure coding                             X
Sharding/partitioning        X             X
Load balancing               X
Health checks                              X
Integrity checks                           X
Compression                  X
Eventual consistency         X             X
Centralized controller       X
Canaries                                   X
Redundant execution          X
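Sharding/partitioning from the table above is often implemented with consistent hashing, so that adding or removing a server remaps only a small fraction of keys. A minimal sketch (my own simplification; the server names and virtual-node count are illustrative):

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # A stable hash (hashlib, not built-in hash()) so placement is
    # deterministic across processes and runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps keys to servers; each server owns many points ("virtual
    nodes") on a hash ring, and a key belongs to the next point
    clockwise from its own hash."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key):
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
owner = ring.lookup("user:42")   # deterministic owner for this key
```

Because only the keys nearest a new server's points move, growing the cluster leaves most shard assignments untouched.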
Datacenter Programming Models
Goal: fault-tolerant, scalable, and easy access to all the distributed datacenter resources. Users submit jobs to these models without having to worry about low-level details.
- MapReduce: the grandfather of big data as we know it today; two-stage, disk-based, network-avoiding
- Spark: a common substrate for diverse programming requirements; many-stage, memory-first
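MapReduce's two stages can be sketched in a few lines: map emits key-value pairs, a shuffle groups them by key, and reduce aggregates each group. A single-process sketch of the model (the real systems distribute each phase across machines):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick fox", "the lazy dog"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
# counts["the"] == 2
```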
Datacenter "Operating Systems"
Goal: fair and efficient distribution of resources among many competing programming models and jobs; does the dirty work so that users won't have to.
- Mesos: started with a simple question (how to run different versions of Hadoop?); fairness-first allocator
- Borg: Google's cluster manager; utilization-first allocator
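Mesos's two-level design can be illustrated with its resource-offer protocol: the allocator offers available resources to frameworks, and each framework decides which offers to accept. A minimal sketch (my own simplification; hosts and resource figures are made up):

```python
# Hypothetical resource offers from the cluster-level allocator.
offers = [
    {"host": "m1", "cpus": 4, "mem_gb": 8},
    {"host": "m2", "cpus": 2, "mem_gb": 16},
]

def framework_accepts(offer, need_cpus, need_mem_gb):
    """The framework's half of the two-level protocol: it may accept or
    decline each offer based on its own placement preferences."""
    return offer["cpus"] >= need_cpus and offer["mem_gb"] >= need_mem_gb

# A framework needing 3 CPUs and 4 GB accepts m1 and declines m2;
# declined resources go back to the allocator for other frameworks.
accepted = [o["host"] for o in offers if framework_accepts(o, 3, 4)]
```

The key design choice is that the allocator decides *how much* each framework is offered, while the framework decides *what* to run where.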
Resource Allocation and Scheduling
How do we divide the resources anyway?
- DRF: multi-resource max-min fairness; two-level; implemented in Mesos and YARN
- HUG: DRF + high utilization
- Omega: shared-state resource allocator; many schedulers interact through transactions
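DRF equalizes each user's *dominant share*: the largest fraction of any one resource that the user holds. A progressive-filling sketch, using the example from the DRF paper (a cluster of 9 CPUs and 18 GB, user A's tasks needing <1 CPU, 4 GB> and user B's <3 CPUs, 1 GB>); the early-exit when the neediest user no longer fits is a simplification that suffices for this example:

```python
def drf(total, demands):
    """Progressive filling: repeatedly grant one task to the user with
    the smallest dominant share, until that user no longer fits."""
    alloc = [0] * len(demands)           # tasks granted per user
    used = [0.0] * len(total)            # resources consumed so far
    while True:
        def dominant_share(u):
            return max(alloc[u] * demands[u][r] / total[r]
                       for r in range(len(total)))
        u = min(range(len(demands)), key=dominant_share)
        if any(used[r] + demands[u][r] > total[r] for r in range(len(total))):
            return alloc                 # neediest user can't fit another task
        alloc[u] += 1
        for r in range(len(total)):
            used[r] += demands[u][r]

# A dominates in memory, B in CPU; DRF equalizes their dominant shares
# at 2/3 each: A gets 3 tasks (12/18 GB), B gets 2 tasks (6/9 CPUs).
tasks = drf(total=[9, 18], demands=[[1, 4], [3, 1]])
```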
File Systems
Goal: fault-tolerant, efficient access to data.
- GFS: data resides with compute resources; compute goes to data, hence data locality. The game changer: centralization isn't too bad!
- FDS: data resides separately from compute; data comes to compute, hence requires a very fast network
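The "compute goes to data" idea behind GFS-style systems shows up in schedulers as locality-aware placement: prefer a free machine that already stores the task's input block. A minimal sketch (my own illustration; machine names are made up):

```python
def pick_node(input_replica_hosts, free_nodes):
    """Locality-aware placement: prefer a free node that already holds a
    replica of the task's input; otherwise fall back to any free node
    and read the input over the network."""
    for node in free_nodes:
        if node in input_replica_hosts:
            return node, "local"
    return free_nodes[0], "remote"

# The input block is replicated on m3 and m7; m3 is free, so run there.
node, kind = pick_node({"m3", "m7"}, ["m1", "m3", "m9"])
```

FDS makes the opposite bet: with a full-bisection-bandwidth network, the "remote" branch costs about the same as the "local" one, so placement no longer needs to chase replicas.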
Memory Management
What to store in cache and what to evict?
- PACMan: disk locality is irrelevant given a fast-enough network; all-or-nothing property: caching is useless unless all of a job's task inputs are cached; the best eviction algorithm for a single machine isn't so good for parallel computing
- Parameter Server: shared-memory architecture (sort of); data and compute are still collocated, but communication is automatically batched to minimize overheads
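The all-or-nothing property follows from parallel jobs finishing only when their slowest task does. A tiny illustration (the 1 s memory and 10 s disk task times are hypothetical, chosen only to make the point):

```python
def job_time(cached_inputs, total_tasks, t_mem=1.0, t_disk=10.0):
    """A wave of parallel tasks runs at the speed of its slowest task,
    so a single uncached input drags the whole job to disk speed."""
    return t_mem if cached_inputs == total_tasks else t_disk

partial = job_time(cached_inputs=99, total_tasks=100)   # still disk-bound
full = job_time(cached_inputs=100, total_tasks=100)     # 10x faster
```

This is why single-machine eviction policies like LRU fare poorly here: evicting one block from a 100-block input wastes the cache space held by the other 99.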
Network Scheduling
Communication cannot be avoided; how do we minimize its impact?
- DCTCP: application-agnostic; point-to-point; outperforms TCP through ECN-enabled multi-level congestion notifications
- Varys: application-aware; multipoint-to-multipoint; all-or-nothing in communication; concurrent open-shop scheduling with coupled resources; centralized network bandwidth management
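DCTCP's core mechanism fits in a few lines: the sender keeps a running estimate alpha of the fraction of ECN-marked packets and cuts its window in proportion to alpha, instead of halving it as TCP does. A sketch of the per-window update from the DCTCP paper (the traffic numbers in the example are made up):

```python
def dctcp_update(alpha, cwnd, marked, total, g=1 / 16):
    """One window's worth of DCTCP sender state.

    alpha        -- EWMA estimate of the fraction of ECN-marked packets
    marked/total -- marked vs. total packets ACKed this window
    g            -- estimation gain (1/16 in the paper)
    """
    frac = marked / total
    alpha = (1 - g) * alpha + g * frac
    if marked:                       # congestion observed this window
        cwnd *= (1 - alpha / 2)      # gentle cut, proportional to congestion
    return alpha, cwnd

# Mild congestion (10% of packets marked) trims the window only slightly,
# where standard TCP would have halved it outright.
alpha, cwnd = dctcp_update(alpha=0.0, cwnd=100.0, marked=10, total=100)
```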
Unavailability and Failure
- In a 10,000-server datacenter with 10,000-day-MTBF machines, one machine will fail every day on average
- Build fault-tolerant software infrastructure and hide failure-handling complexity from application-level software as much as possible
- Configuration is one of the largest sources of service disruption
- Storage subsystems are the biggest sources of machine crashes
- Tolerating/surviving failures is different from hiding failures
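The arithmetic behind the first bullet, plus a second way to see why failure is the norm (the independence assumption in the second calculation is a simplification):

```python
servers = 10_000
mtbf_days = 10_000                      # per-machine mean time between failures

# Expected cluster-wide failures per day: one machine somewhere, every day.
failures_per_day = servers / mtbf_days  # -> 1.0

# Probability that at least one machine fails on any given day, assuming
# machines fail independently with probability 1/MTBF per day:
p_any = 1 - (1 - 1 / mtbf_days) ** servers   # ~0.63
```

So even though each individual machine is expected to run for about 27 years between failures, the cluster as a whole sees a failure almost two days out of three, which is why failure handling must live in the software infrastructure.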
What's the most critical resource in a datacenter? Why?
Will we come back to client-centric models, as opposed to the server-centric/datacenter-driven model of today? If yes, why and when? If not, why not?