The University of Adelaide, School of Computer Science

Slides:

Advertisements

Similar presentations

1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism: Computer.

Advertisements

Energy in Cloud Computing and Renewable Energy

Cloud Computing Data Centers Dr. Sanjay P. Ahuja, Ph.D FIS Distinguished Professor of Computer Science School of Computing, UNF.

The University of Adelaide, School of Computer Science

Datacenter Power State-of-the-Art Randy H. Katz University of California, Berkeley LoCal 0 th Retreat “Energy permits things to exist; information, to.

1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism: Computer.

1 Chapter 01 Authors: John Hennessy & David Patterson.

Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.

CoolAir Temperature- and Variation-Aware Management for Free-Cooled Datacenters Íñigo Goiri, Thu D. Nguyen, and Ricardo Bianchini 1.

Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.

By: Jeffrey Dean & Sanjay Ghemawat Presented by: Warunika Ranaweera Supervised by: Dr. Nalin Ranasinghe.

USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.

1 Copyright © 2011, Elsevier Inc. All rights Reserved. Chapter 6 Authors: John Hennessy & David Patterson.

1 Lecture 20: WSC, Datacenters Topics: warehouse-scale computing and datacenters (Sections ) – the basics followed by a look at the future.

Lecture 13 Warehouse Computing Prof. Xiaoyao Liang 2015/6/10

IT Infrastructure Chap 1: Definition

MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.

AUTHORS: STIJN POLFLIET ET. AL. BY: ALI NIKRAVESH Studying Hardware and Software Trade-Offs for a Real-Life Web 2.0 Workload.

MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.

Hadoop Hardware Infrastructure considerations ©2013 OpalSoft Big Data.

1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.

11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.

By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.

Data Communications and Networks Chapter 9 – Distributed Systems ICT-BVF8.1- Data Communications and Network Trainer: Dr. Abbes Sebihi.

3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-2.

1 Lecture: WSC, Datacenters Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections )

Background Computer System Architectures Computer System Software.

1 Lecture 24: WSC, Datacenters Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections )

Data Centers and Cloud Computing 1. 2 Data Centers 3.

1 现代计算机体系结构主讲教师：张钢教授天津大学计算机学院通信邮箱：提交作业邮箱： 2015 年.

CS203 – Advanced Computer Architecture Warehouse Scale Computing.

Successfully Implementing The Information System Systems Analysis and Design Kendall and Kendall Fifth Edition.

Warehouse Scaled Computers

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING

Chapter 2 Memory and process management

Seth Pugsley, Jeffrey Jestes,

Distributed Cache Technology in Cloud Computing and its Application in the GIS Software Wang Qi Zhu Yitong Peng Cheng

Lecture 20: WSC, Datacenters

IOT Critical Impact on DC Design

Green cloud computing 2 Cs 595 Lecture 15.

Large-scale file systems and Map-Reduce

Principles of Information Technology

Trends: Technology Doubling Periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu compute capacity: 18 mos Then and Now Bandwidth 1985:

Morgan Kaufmann Publishers

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism Topic 7 Physical Infrastructure of WSC Prof. Zhang Gang

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism Topic 13 Using Energy Efficiently Inside the Server Prof. Zhang.

Copyright © 2011, Elsevier Inc. All rights Reserved.

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism Topic 4 Storage Prof. Zhang Gang School of.

Software Architecture in Practice

Programmable Logic Controllers (PLCs) An Overview.

Instructors: Yuanqing Cheng

The University of Adelaide, School of Computer Science

The University of Adelaide, School of Computer Science

Cloud Computing Data Centers

System And Application Software

湖南大学-信息科学与工程学院-计算机与科学系

Lecture 18 Warehouse Scale Computing

The University of Adelaide, School of Computer Science

Chapter 2: System Structures

Where Does the Power go in DCs & How to get it Back

Distributed and Cloud Computing K. Hwang, G. Fox and J

Internet and Web Simple client-server model

Cloud Computing Data Centers

Lecture 18 Warehouse Scale Computing

Lecture 18 Warehouse Scale Computing

Chapter-1 Computer is an advanced electronic device that takes raw data as an input from the user and processes it under the control of a set of instructions.

The University of Adelaide, School of Computer Science

MapReduce: Simplified Data Processing on Large Clusters

Chapter 21 Successfully Implementing The Information System

Presentation transcript:

The University of Adelaide, School of Computer Science Computer Architecture A Quantitative Approach, Fifth Edition The University of Adelaide, School of Computer Science 14 May 2018 Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism: Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Introduction Introduction Warehouse-scale computer (WSC) Provides Internet services Search, social networking, online maps, video sharing, online shopping, email, cloud computing, etc. Differences with HPC “clusters”: Clusters have higher performance processors and network Clusters emphasize thread-level parallelism, WSCs emphasize request-level parallelism Differences with datacenters: Datacenters consolidate different machines and software into one location Datacenters emphasize virtual machines and hardware heterogeneity in order to serve varied customers Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Introduction Introduction Important design factors for WSC: Cost-performance Small savings add up, huge systems Energy efficiency Affects power distribution and cooling Work per joule Dependability via redundancy Network I/O Interactive and batch processing workloads Ample computational parallelism is not important Most jobs are totally independent “Request-level parallelism” Operational costs count Power consumption is a primary, not secondary, constraint when designing system Scale and its opportunities and problems Can afford to build customized systems since WSC require volume purchase Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Introduction Introduction Characteristics different from server architecture: Ample parallelism Server applications need parallelism to utilize the parallel architecture Billions of Web requests in data parallel fashion for WSC Interactive Internet services, also known as software as a service (SaaS) with millions of independent users, rarely has read-write dependence, also refer as request-level parallelism Operational costs count Server is designed for peak performance, only worry about power not to exceed the cooling capacity, often ignore the operation cost WSC has longer life term: Energy, Power, Cooling represent 30% cost Scale and its opportunities and problems Extreme server is very expensive, partly due to custom design and only a few built WSC includes thousand simple servers, hence can cut down the cost, however, with difficult availability, dependability issues Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Prgrm’g Models and Workloads The University of Adelaide, School of Computer Science 14 May 2018 Prgrm’g Models and Workloads Batch processing framework: MapReduce Map: applies a programmer-supplied function to each logical input record Runs on thousands of computers Provides new set of key-value pairs as intermediate values Reduce: collapses values using another programmer-supplied function See Fig 6.2 (next slide) for MapReduce usage at Google Programming Models and Workloads for WSCs Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Prgrm’g Models and Workloads The University of Adelaide, School of Computer Science 14 May 2018 Prgrm’g Models and Workloads Example: (calculate #occurrence of every word) map (String key, String value): // key: document name // value: document contents for each word w in value (occurance=1 in a document) EmitIntermediate(w,”1”); // Produce list of all words reduce (String key, Iterator values): // key: a word // value: a list of counts int result = 0; for each v in values: result += ParseInt(v); // get integer from key-value pair Emit(AsString(result)); Programming Models and Workloads for WSCs Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Prgrm’g Models and Workloads The University of Adelaide, School of Computer Science 14 May 2018 Prgrm’g Models and Workloads MapReduce runtime environment schedules map and reduce task to WSC nodes Can be thought as a generalization of SIMD Map can be partitioned and replicated Reduce is a reduction function from Map outputs Availability: Use replicas of data across different servers Use relaxed consistency: No need for all replicas to always agree Programming Models and Workloads for WSCs Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 WSC Utilization WSC hardware and software must cope with variability in load Use data replication to overcome failure Internal file systems to supply data Use relaxed consistency, allow inconsistency, different from database Workload demands Google search varies by a factor of 2 depending on the time of a day Unlike HPC which tries to maximum utilization Programming Models and Workloads for WSCs Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.3 Average CPU utilization of more than 5000 servers during a 6-month period at Google. Servers are rarely completely idle or fully utilized, instead operating most of the time at between 10% and 50% of their maximum utilization. The column the third from the right in Figure 6.4 calculates percentages plus or minus 5% to come up with the weightings; thus, 1.2% for the 90% row means that 1.2% of servers were between 85% and 95% utilized. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Each rack has 48 1U server Figure 6.5 Hierarchy of switches in a WSC with array of Racks. (Based on Figure 1.2 of Barroso and Hölzle [2009].) Copyright © 2011, Elsevier Inc. All rights Reserved.

Computer Architecture of WSC The University of Adelaide, School of Computer Science 14 May 2018 Computer Architecture of WSC WSC often use a hierarchy of networks for interconnection (cost reduction) Each 19” rack holds 48 1U servers connected to a rack switch Rack switches are uplinked to switch higher in hierarchy Uplink has 48 / n times lower bandwidth, where n = # of uplink ports (to the array switch) “Oversubscription” Goal is to maximize locality of communication relative to the rack (i.e. minimize inter-rack communication) Computer Ar4chitecture of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Storage Storage options: Use disks inside the servers, or Network attached storage through Infiniband WSCs generally rely on local disks Google File System (GFS) uses local disks and maintains at least three relicas Computer Ar4chitecture of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Array Switch Switch that connects an array of racks, designed for 50,000 servers Array switch should have 10 X the bisection bandwidth of rack switch Cost of n-port switch grows as n2 Often utilize content addressable memory chips and FPGAs Computer Ar4chitecture of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 WSC Memory Hierarchy Servers can access DRAM and disks on other servers using a NUMA-style interface Computer Ar4chitecture of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Copyright © 2011, Elsevier Inc. All rights Reserved. DRAM Bandwidth DRAM Latency Figure 6.7 Graph of latency, bandwidth, and capacity of the memory hierarchy of a WSC for data in Figure 6.6 [Barroso and Hölzle 2009]. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.8 The Layer 3 network used to link arrays together and to the Internet [Greenberg et al. 2009]. Some WSCs use a separate border router to connect the Internet to the datacenter Layer 3 switches. (To interconnect 50,000 servers) (See examples about memory latency and data transfer in Page 445-446) Copyright © 2011, Elsevier Inc. All rights Reserved.

Infrastructure and Costs of WSC The University of Adelaide, School of Computer Science 14 May 2018 Infrastructure and Costs of WSC Location of WSC Proximity to Internet backbones, electricity cost, property tax rates, low risk from earthquakes, floods, and hurricanes Power distribution (5 steps, 4 voltage changes) Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Infrastructure and Costs of WSC The University of Adelaide, School of Computer Science 14 May 2018 Infrastructure and Costs of WSC Cooling Air conditioning used to cool server room 64 F – 71 F Keep temperature higher (closer to 71 F) Cooling towers can also be used Minimum temperature is “wet bulb temperature” Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Infrastructure and Costs of WSC The University of Adelaide, School of Computer Science 14 May 2018 Infrastructure and Costs of WSC Cooling system also uses water (evaporation and spills) E.g. 70,000 to 200,000 gallons per day for an 8 MW facility Power cost breakdown: Chillers: 30-50% of the power used by the IT equipment Air conditioning: 10-20% of the IT power, mostly due to fans How many servers can a WSC support? Each server: “Nameplate power rating” gives maximum power consumption To get actual, measure power under actual workloads Oversubscribe cumulative server power by 40%, but monitor power closely Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Measuring Efficiency of a WSC The University of Adelaide, School of Computer Science 14 May 2018 Measuring Efficiency of a WSC Power Utilization Effectiveness (PEU) = Total facility power / IT equipment power Median PUE on 2006 study was 1.69 Performance Latency is important metric because it is seen by users Bing study: users will use search less as response time increases Service Level Objectives (SLOs)/Service Level Agreements (SLAs) E.g. 99% of requests be below 100 ms Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.11 Power utilization efficiency of 19 datacenters in 2006 [Greenberg et al. 2006]. The power for air conditioning (AC) and other uses (such as power distribution) is normalized to the power for the IT equipment in calculating the PUE. Thus, power for IT equipment must be 1.0 and AC varies from about 0.30 to 1.40 times the power of the IT equipment. Power for “other” varies from about 0.05 to 0.60 of the IT equipment. Copyright © 2011, Elsevier Inc. All rights Reserved.

Power in IT Equipment (2007) The University of Adelaide, School of Computer Science 14 May 2018 Power in IT Equipment (2007) 33% of power for processors 30% for DRAM 10% for disk 5% for networking 22% for others (inside the server) Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

The University of Adelaide, School of Computer Science 14 May 2018 Cost of a WSC (Fig. 6.13) Capital expenditures (CAPEX) Cost to build a WSC Operational expenditures (OPEX) Cost to operate a WSC Physcical Infrastrcuture and Costs of WSC Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.18 The best SPECpower results as of July 2010 versus the ideal energy proportional behavior. The system was the HP ProLiant SL2x170z G6, which uses a cluster of four dual-socket Intel Xeon L5640s with each socket having six cores running at 2.27 GHz. The system had 64 GB of DRAM and a tiny 60 GB SSD for secondary storage. (The fact that main memory is larger than disk capacity suggests that this system was tailored to this benchmark.) The software used was IBM Java Virtual Machine version 9 and Windows Server 2008, Enterprise Edition. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.19 Google customizes a standard 1AAA container: 40 x 8 x 9.5 feet (12.2 x 2.4 x 2.9 meters). The servers are stacked up to 20 high in racks that form two long rows of 29 racks each, with one row on each side of the container. The cool aisle goes down the middle of the container, with the hot air return being on the outside. The hanging rack structure makes it easier to repair the cooling system without removing the servers. To allow people inside the container to repair components, it contains safety systems for fire detection and mist-based suppression, emergency egress and lighting, and emergency power shut-off. Containers also have many sensors: temperature, airflow pressure, air leak detection, and motion-sensing lighting. A video tour of the datacenter can be found at http://www.google.com/corporate/green/datacenters/summit.html. Microsoft, Yahoo!, and many others are now building modular datacenters based upon these ideas but they have stopped using ISO standard containers since the size is inconvenient. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.20 Airflow within the container shown in Figure 6.19. This cross-section diagram shows two racks on each side of the container. Cold air blows into the aisle in the middle of the container and is then sucked into the servers. Warm air returns at the edges of the container. This design isolates cold and warm airflows. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.21 Server for Google WSC. The power supply is on the left and the two disks are on the top. The two fans below the left disk cover the two sockets of the AMD Barcelona microprocessor, each with two cores, running at 2.2 GHz. The eight DIMMs in the lower right each hold 1 GB, giving a total of 8 GB. There is no extra sheet metal, as the servers are plugged into the battery and a separate plenum is in the rack for each server to help control the airflow. In part because of the height of the batteries, 20 servers fit in a rack. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure 6.22 Power usage effectiveness (PUE) of 10 Google WSCs over time. Google A is the WSC described in this section. It is the highest line in Q3 ‘07 and Q2 ’10. (From www.google.com/corporate/green/datacenters/measuring.htm.) Facebook recently announced a new datacenter that should deliver an impressive PUE of 1.07 (see http://opencompute.org/). The Prineville Oregon Facility has no air conditioning and no chilled water. It relies strictly on outside air, which is brought in one side of the building, filtered, cooled via misters, pumped across the IT equipment, and then sent out the building by exhaust fans. In addition, the servers use a custom power supply that allows the power distribution system to skip one of the voltage conversion steps in Figure 6.9. Copyright © 2011, Elsevier Inc. All rights Reserved.

Copyright © 2011, Elsevier Inc. All rights Reserved. Slower server Figure 6.24 Query–response time curve. Copyright © 2011, Elsevier Inc. All rights Reserved.

The University of Adelaide, School of Computer Science 14 May 2018 Cloud Computing Cloud Computing WSCs offer economies of scale that cannot be achieved with a datacenter: 5.7 times reduction in storage costs 7.1 times reduction in administrative costs 7.3 times reduction in networking costs This has given rise to cloud services such as Amazon Web Services “Utility Computing” Based on using open source virtual machine and operating system software Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer