Advanced Computer Architecture Introduction
A.R. Hurson 128 EECH Building, Missouri S&T
CpE6110 Advanced Computer Architecture
Instructor: A. R. Hurson
Office: EECH
Office Hours: By appointment
Class notes and reading materials are available at:
Reference: Computer Architecture: A Quantitative Approach, Hennessy and Patterson
Outline:
1. Performance and Cost Analysis
2. Instruction Set Analysis
3. Scheduling/Load Balancing
4. Memory Hierarchies (revisit)
5. Advanced Caching
   a) Internet caching
   b) Cooperative caching
6. Concurrency: Classifications
   a) Vector processors
   b) SIMD array architectures
   c) MIMD architectures
7. Systolic design
8. VLIW/Super-scalar/Super-pipeline
Data-flow processing
Multithreaded architecture
Transactional Memory
Multicore architecture
Administrative:
Homework Assignments 10%
Quizzes 20%
Project (Term Paper) 20%
Midterm Exam 20%
Final Exam 30%
Race to the top 2011
Rank | Site | Computer / Year | Vendor | Country | Cores | Rmax (Pflops) | Rpeak (Pflops) | Power (MW)
1 | RIKEN Advanced Institute for Computational Science | K computer, SPARC64 VIIIfx 2.0 GHz / 2011 | Fujitsu | Japan | 548,352 | 8.162 | 8.774 | 9.899
2 | National Supercomputing Center in Tianjin | Tianhe-1A, NUDT / 2010 | NUDT | China | 186,368 | 2.566 | 4.701 | 4.040
3 | DOE/SC/Oak Ridge National Laboratory | Jaguar, Cray XT5 2.6 GHz / 2009 | Cray Inc. | USA | 224,162 | 1.759 | 2.331 | 6.951
4 | National Supercomputing Centre in Shenzhen | Nebulae, Dawning TC3600 Blade / 2010 | Dawning | China | 120,640 | 1.271 | 2.984 | 2.580
5 | GSIC Center, Tokyo Institute of Technology | TSUBAME 2.0 / 2010 | NEC/HP | Japan | 73,278 | 1.192 | 2.288 | 1.399
6 | DOE/NNSA/LANL/SNL | Cielo, Cray XE6 8-core 2.4 GHz / 2011 | Cray Inc. | USA | 142,272 | 1.110 | 1.365 | 3.980
7 | NASA/Ames Research Center/NAS | Pleiades, … GHz / 2011 | SGI | USA | 111,104 | 1.088 | 1.315 | 4.102
8 | DOE/SC/LBNL/NERSC | Hopper, Cray XE6 12-core 2.1 GHz / 2010 | Cray Inc. | USA | 153,408 | 1.054 | 1.289 | 2.910
9 | Commissariat a l'Energie Atomique (CEA) | Tera, Bull bullx super-node S6010/S6030 / 2010 | Bull SA | France | 138,368 | 1.050 | 1.255 | 4.590
10 | DOE/NNSA/LANL | Roadrunner, … GHz / 2009 | IBM | USA | 122,400 | 1.042 | 1.376 | 2.346
Race to the top 2011 (figure: Top500 performance of the #1 system, the #500 system, and the sum of all systems)
Race to the top 2012
Race to the top: If the projection holds, we would expect an Exaflops system around 2019. In 2012, the Top500 list was dominated by IBM Blue Gene/Q, with four systems in the top 10. The largest, at Lawrence Livermore National Laboratory, sustained more than 16 Petaflops. It replaced the K computer from Japan (the first 10-Petaflops machine) at the top of the list.
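The projection can be illustrated with simple compound growth. The sketch below assumes, purely for illustration, that the No. 1 system's performance grows by a constant factor of 1.9 per year starting from the roughly 16 Petaflops of 2012; that growth factor is a hypothetical parameter, not a figure from these slides, and `first_year_reaching` is a hypothetical helper.

```python
def first_year_reaching(start_pflops: float, start_year: int,
                        target_pflops: float, growth_per_year: float) -> int:
    """Return the first year in which compound growth reaches the target.

    Assumes performance is multiplied by `growth_per_year` every year
    (a hypothetical, constant growth rate).
    """
    if growth_per_year <= 1.0:
        raise ValueError("growth_per_year must exceed 1 for the target to be reached")
    perf, year = start_pflops, start_year
    while perf < target_pflops:
        perf *= growth_per_year
        year += 1
    return year


# ~16 Pflops in 2012 (from the text); 1 Exaflops = 1000 Pflops; 1.9x/year is assumed.
print(first_year_reaching(16.0, 2012, 1000.0, 1.9))  # 2019
```

The answer is sensitive to the assumed rate: at 1.5x per year the same loop returns 2023.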
Tianhe-2 (MilkyWay-2) has been the No. 1 system since 2013. It is used for simulation, analysis, and government security applications. It is a collection of 16,000 compute nodes with a total of 3,120,000 cores. Each of the 16,000 nodes has 88 gigabytes of memory. The total CPU plus coprocessor memory is 1,375 TiB (approximately 1.34 PiB).
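A quick consistency check of these figures. It assumes, as the 1,375 TiB total implies, that the 88 "gigabytes" per node are binary gibibytes (GiB, 2^30 bytes); with decimal gigabytes the total would instead be about 1,281 TiB.

```python
NODES = 16_000
TOTAL_CORES = 3_120_000
MEM_PER_NODE_GIB = 88  # assumption: the quoted "88 gigabytes" are GiB (2**30 bytes)

cores_per_node = TOTAL_CORES // NODES
total_gib = NODES * MEM_PER_NODE_GIB
total_tib = total_gib / 1024
total_pib = total_tib / 1024

print(cores_per_node)        # 195
print(total_tib)             # 1375.0
print(round(total_pib, 2))   # 1.34

# For comparison, the same total if the 88 GB were decimal (10**9 bytes):
decimal_bytes = NODES * 88 * 10**9
print(round(decimal_bytes / 2**40))  # 1281 (TiB)
```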
Four major challenges have been recognized for an Exaflops machine: Energy and power, Memory and storage, Concurrency and locality, Resiliency
It is estimated that an Exaflops machine would consume about 20 megawatts of power, which corresponds to an efficiency of 50 GFlops/W.
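The 50 GFlops/W figure is simply performance divided by power. The sketch below also applies the same ratio to the K computer's Rmax and power from the 2011 table, which shows the size of the gap; the helper name is illustrative.

```python
def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in GFlops/W."""
    return flops / watts / 1e9

exa_target = gflops_per_watt(1e18, 20e6)         # 1 Exaflops within 20 MW
k_computer = gflops_per_watt(8.162e15, 9.899e6)  # Rmax and power from the 2011 table

print(exa_target)                       # 50.0
print(round(k_computer, 3))             # 0.825
print(round(exa_target / k_computer))   # 61
```

By this measure the 2011 No. 1 system would need roughly a 61-fold efficiency improvement to meet the Exaflops power budget.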
Computer Architecture refers to the attributes of a system visible to a programmer — i.e., attributes that have a direct impact on the logical execution of a program. Architectural Attributes include the instruction set, the number of bits used to represent various data types, I/O mechanisms, and techniques for addressing memory.
Computer Organization refers to the operational units and their interconnections that realize the architectural specifications. Organizational Attributes include those hardware details transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used.
Computer Hardware refers to the detailed hardware design: logic design and implementation (packaging, power, cooling, ...).
For a good design, architecture (instruction set design), organization, and hardware as well as software (compiler and operating system) issues must be considered.
Common Performance Metrics: Execution time, Bandwidth, Throughput, User CPU time, MIPS, MFLOPS, Speedup, Efficiency
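Several of these metrics reduce to simple ratios. The helpers below are hypothetical names, and efficiency is taken here as speedup divided by the number of processors, a common definition that the slide itself does not spell out; the example workloads are invented for illustration.

```python
def mips(instruction_count: int, exec_time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (exec_time_s * 1e6)

def mflops(fp_operations: int, exec_time_s: float) -> float:
    """Millions of floating-point operations per second."""
    return fp_operations / (exec_time_s * 1e6)

def speedup(time_before_s: float, time_after_s: float) -> float:
    """Ratio of the old execution time to the new one."""
    return time_before_s / time_after_s

def efficiency(speedup_value: float, processors: int) -> float:
    """Speedup per processor (1.0 means perfectly linear speedup)."""
    return speedup_value / processors

print(mips(2_000_000_000, 4.0))   # 500.0
print(mflops(600_000_000, 2.0))   # 300.0
s = speedup(100.0, 20.0)          # hypothetical: 100 s -> 20 s on 8 processors
print(s, efficiency(s, 8))        # 5.0 0.625
```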
Scaleup: Handling larger tasks by increasing the degree of parallelism, i.e., the ability to process larger tasks in the same amount of time by providing more resources.
Power Consumption: Becomes an important performance metric when we use mobile wireless access devices.
Network Connectivity: Becomes of interest when connectivity is through a wireless medium.
Data reliability and integrity: Become factors of interest in the presence of mobility and wireless communication.
Summary
Computer architecture/Computer organization
Role of the computer architect
Some performance metrics
How to improve performance? Hardware technology, innovative architectural features, and efficient resource management.
Performance improvement:
Advances in technology
Architectural advances
Better resource management
Program behavior
Questions:
Network latency
Memory latency
How to manage resources efficiently?
Grosch’s Law: In the 1940s, Grosch studied the relationship between the power (computational speed) P and the cost C of a computer. He postulated that $P = k \cdot C^{s}$, where k and s are positive constants. He also argued that s is close to 2.
Grosch’s Law According to this law, in order to sell a computer for twice as much, it must be four times as fast. With the advances in technology, it is easy to see that Grosch’s law is no longer valid.
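Grosch's law can be read as a ratio, since the constant k cancels: scaling the cost by some factor scales the speed by that factor raised to the power s. A minimal sketch with hypothetical helper names, assuming s = 2 as Grosch argued:

```python
def grosch_speed_ratio(cost_ratio: float, s: float = 2.0) -> float:
    """Speed ratio implied by Grosch's law P = k * C**s (k cancels in the ratio)."""
    return cost_ratio ** s

def grosch_cost_ratio(speed_ratio: float, s: float = 2.0) -> float:
    """Cost ratio needed for a given speed ratio under P = k * C**s."""
    return speed_ratio ** (1.0 / s)

print(grosch_speed_ratio(2.0))  # 4.0: twice the price -> four times as fast
print(grosch_cost_ratio(9.0))   # 3.0: nine times as fast for three times the price
```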
Performance Measures Amdahl's law: The performance improvement gained by improving some portion of an architecture is limited by the fraction of time the improved portion is used. For example, a small number of sequential operations can effectively limit the speedup of a parallel algorithm.
Performance Measures Amdahl's law provides a quick way to calculate the speedup based on two factors: the fraction of the computation time in the original task that is affected by the enhancement, and the improvement gained by the enhanced execution mode (the speedup of the enhanced portion).
Performance Measures — Amdahl's law
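The two factors listed above combine in the usual formulation of Amdahl's law (as given, for example, in Hennessy and Patterson): overall speedup = 1 / ((1 - F) + F / S), where F is the fraction of time affected by the enhancement and S is the speedup of the enhanced portion. A minimal sketch; the function name is illustrative:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the original execution
    time benefits from an enhancement that is `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 40% of the time is sped up 10x: the overall speedup is only about 1.56.
print(round(amdahl_speedup(0.4, 10), 4))     # 1.5625
# 95% parallel code on 1000 processors: the 5% serial part caps speedup below 20.
print(round(amdahl_speedup(0.95, 1000), 2))  # 19.63
```

The second call illustrates the point made above: no matter how many processors are added, the speedup stays below 1 / (1 - F) = 20.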
Gustafson-Barsis’s Law: Gustafson and Barsis argued that parallel architectures composed of hundreds of processors can be built with substantial improvement in performance, since in practice the problem size scales up with the number of processors (n).
Gustafson-Barsis’s Law: If s and p are the serial and parallel times spent on a parallel system, then s + p * n represents the execution time of the same (scaled) problem on a single processor. They introduced a new factor, the scaled speedup factor SS(n): SS(n) = (s + p * n) / (s + p)
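A minimal sketch of the scaled speedup factor, using the slide's own symbols s, p, and n; the function name and the example values are illustrative. Here s + p * n is read as the time the scaled problem would take on a single processor.

```python
def scaled_speedup(s: float, p: float, n: int) -> float:
    """Gustafson-Barsis scaled speedup SS(n) = (s + p*n) / (s + p).

    s, p: serial and parallel time measured on the n-processor system.
    s + p*n: time the same scaled problem would need on one processor.
    """
    return (s + p * n) / (s + p)

# With s + p normalized to 1, SS(n) = n - (n - 1) * s, which grows linearly in n.
print(round(scaled_speedup(0.1, 0.9, 100), 6))  # 90.1
```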
Gustafson-Barsis’s Law Speed up should be measured by scaling the problem to the number of processors, not by fixing the problem size.
Performance Measures Maximum Concurrency: For any computer, there is a maximum number of bits or bit pairs, the maximum concurrency (Cm), that can be processed concurrently, whether under single-instruction or multiple-instruction control.
Performance Measures Average Concurrency: The maximum concurrency is an indication of the computer's processing capability. The actual utilization of this capability is indicated by the average concurrency, defined as $C_a = \frac{\sum_i C_i \, \Delta t_i}{\sum_i \Delta t_i}$, where $C_i$ is the concurrency during $\Delta t_i$.
Performance Measures Average Concurrency: If $\Delta t_i$ is set to one, then the average concurrency over a period of T time units is $C_a = \frac{1}{T} \sum_{i=1}^{T} C_i$.
Performance Measures Hardware Utilization: The average hardware utilization is defined as $\eta_a = \frac{1}{T} \sum_{i=1}^{T} \eta_i$, where $\eta_i$ is the hardware utilization at time i.
Performance Measures: Cm is determined by the hardware design, whereas Ca (or the average hardware utilization) is highly dependent on the software and applications. A general-purpose computer should achieve a high average hardware utilization for as many applications as possible. A special-purpose computer should yield a high average hardware utilization at least for the intended applications. In either case, maximizing the average hardware utilization is important in a computer design.
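The averages above can be computed from a concurrency profile. This sketch assumes that the utilization during each interval is C_i / C_m, so that the average hardware utilization equals C_a / C_m; that relation, the helper names, and the example profile are assumptions for illustration.

```python
from typing import Sequence, Tuple

def average_concurrency(profile: Sequence[Tuple[int, float]]) -> float:
    """C_a = sum(C_i * dt_i) / sum(dt_i) for (C_i, dt_i) pairs."""
    total_time = sum(dt for _, dt in profile)
    return sum(c * dt for c, dt in profile) / total_time

def average_utilization(profile: Sequence[Tuple[int, float]], c_max: int) -> float:
    """Average hardware utilization, assuming utilization at each step is C_i / C_m."""
    return average_concurrency(profile) / c_max

# Hypothetical machine with C_m = 8: 4 units busy for 2 time units,
# 8 busy for 1 time unit, and 2 busy for 1 time unit.
profile = [(4, 2.0), (8, 1.0), (2, 1.0)]
print(average_concurrency(profile))     # 4.5
print(average_utilization(profile, 8))  # 0.5625
```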
Performance Measures Parallel Systems: For a parallel processor, the average parallelism over T time units is defined as $P_a = \frac{1}{T} \sum_{i=1}^{T} P_i$, where $P_i$ is the parallelism at time i.
Performance Measures Parallel Systems: Similarly, the average hardware utilization is defined as $\rho_a = \frac{1}{T} \sum_{i=1}^{T} \rho_i$, where $\rho_i$ is the hardware utilization of the parallel processor at time i.
Performance Measures Parallel Systems: If $\tilde{P}_a$ is the effective parallelism over a period of T, and $\tilde{\rho}_a$, $\tilde{P}_i$, and $\tilde{\rho}_i$ are the corresponding effective values, then the effective hardware utilization is:
Performance Measures: A successful parallel processor design should yield a high effective hardware utilization, as well as the required throughput for at least the intended application(s). This involves not only proper hardware and software design, but also the development of efficient parallel algorithms for these applications.
Summary
Amdahl's law
More performance metrics:
CPU time
Concurrency
Hardware utilization
Performance Measures Pipeline Systems: Latency (L) is defined as the number of time units separating two successive initiations of events. Naturally, the lower the latency, the higher the performance. Latency can be any non-negative integer value, including zero.
Performance Measures Pipeline Systems: The average latency $\bar{L}$ is the average number of time units between two initiations. The initiation rate (I) is the average number of initiations per clock unit: $I = \frac{1}{\bar{L}}$
Performance Measures Pipeline Systems: For stage $S_i$, the stage utilization $U_{S_i}$ indicates how often, on average, $S_i$ is used: $U_{S_i} = I \cdot n_i$, where $n_i$ is the number of times $S_i$ is used in one initiation.
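A small sketch of these pipeline measures for a repeating latency cycle; the helper names and the latency values are hypothetical.

```python
from typing import Sequence

def average_latency(latencies: Sequence[int]) -> float:
    """Mean number of time units between successive initiations."""
    return sum(latencies) / len(latencies)

def initiation_rate(latencies: Sequence[int]) -> float:
    """Average initiations per clock unit: I = 1 / average latency."""
    return 1.0 / average_latency(latencies)

def stage_utilization(rate: float, uses_per_initiation: int) -> float:
    """U_Si = I * n_i."""
    return rate * uses_per_initiation

# Hypothetical repeating latency cycle (1, 3): initiations at t = 0, 1, 4, 5, 8, ...
rate = initiation_rate([1, 3])
print(average_latency([1, 3]), rate)  # 2.0 0.5
print(stage_utilization(rate, 2))     # 1.0 (stage used twice per initiation)
print(stage_utilization(rate, 1))     # 0.5
```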
Performance Measures Pipeline Systems: For a linear pipe, if $\delta_i$ denotes the execution time of stage $S_i$, then: $U_{S_i} = \frac{1}{\max_j(\delta_j)}$
Performance Measures: Means to evaluate a system
Application programs: Workload.
Real programs: A collection of programs that are run often by the user.
Kernels: Small, key pieces from real programs.
Benchmarks: A set of familiar, small, and well-behaved programs known to the user.
Synthetic benchmarks: An artificial set of small programs intended to match the average frequency of operations and operands of a large set of programs.
Is one number enough? So far in our discussion, performance has been the major design constraint. However, power is becoming a problem. Power consumption first became an issue with the growth of wireless technology and mobile devices. It is now a concern for supercomputers as well, since feeding several megawatts of power to a supercomputer is not a trivial task and requires a great deal of supporting infrastructure.
Is one number enough? It is estimated that each megawatt of power consumption adds about $1 million per year to the electricity bill. In addition to the cost, environmental impact has become an issue as well: data centers now account for a significant share of global CO2 emissions.
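A back-of-the-envelope check of the $1 million figure: a constant 1 MW load for a year is 8,760 MWh, so the estimate corresponds to an electricity price of roughly $0.11 per kWh. The price below is a hypothetical assumption, not a figure from the slides, and the helper name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

def annual_electricity_cost(megawatts: float, usd_per_kwh: float) -> float:
    """Yearly cost of drawing a constant load of `megawatts` around the clock."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return kwh_per_year * usd_per_kwh

PRICE = 0.114  # hypothetical electricity price in USD per kWh
print(round(annual_electricity_cost(1.0, PRICE)))    # 998640
print(round(annual_electricity_cost(9.899, PRICE)))  # K computer power from the 2011 table
```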
Is one number enough? Alternatively, the so-called Green500 list, which ranks systems by Flops/W efficiency, was developed.
Is one number enough? As of June 2012, the top 21 spots on the Green500 list were held by IBM Blue Gene/Q systems, with efficiencies of over 2.1 GFlops/W, a huge gap over the best non-Blue Gene/Q system.
As noted before, an Exaflops machine would consume about 20 megawatts of power, which corresponds to 50 GFlops/W. Relative to Blue Gene/Q (about 2 GFlops/W), power efficiency therefore needs to improve by a factor of about 25.
Power and Energy: Power is an important factor for a data center, since the facility must be able to supply it. Energy is of more concern from an application point of view.
Power and Energy: It makes no sense to run an energy-efficient application in a data center where most of the power is consumed by cooling. Likewise, an inefficient software application can waste a great amount of energy, even on a low-power platform. One needs a holistic approach to reducing power consumption (i.e., Green Computing/Green IT/…).
Power: The rate at which electricity (energy) is produced or consumed at a specific moment in time. Energy: Power consumption accumulated over time.
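Since energy is power accumulated over time, it can be estimated from sampled power readings. A minimal sketch using the trapezoidal rule; the sampling scheme, helper name, and trace values are hypothetical.

```python
from typing import Sequence, Tuple

def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate power over time with the trapezoidal rule.

    samples: (time in seconds, power in watts), sorted by time.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += (t1 - t0) * (p0 + p1) / 2.0
    return energy

# Hypothetical trace: 100 W ramping to 200 W over 10 s, then 200 W for 10 s.
trace = [(0.0, 100.0), (10.0, 200.0), (20.0, 200.0)]
print(energy_joules(trace))  # 3500.0 (joules)
```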
Power Usage Effectiveness: $\mathrm{PUE} = \frac{P_{TOT}}{P_{IT}}$
$P_{TOT}$ = Total power of the facility
$P_{IT}$ = Power consumed by IT equipment
$P_{TOT} = P_{IT} + P_{infrastructure}$
Data Center infrastructure Efficiency: $\mathrm{DCiE} = \frac{1}{\mathrm{PUE}} = \frac{P_{IT}}{P_{TOT}}$. Typically $1.7 \le \mathrm{PUE} \le 2$. The University of Illinois’ National Petascale Computing Facility is aiming at a PUE of 1.2.
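A minimal sketch of the two facility metrics, with PUE as total facility power over IT power and DCiE as its reciprocal; the power figures and helper names are hypothetical.

```python
def pue(p_total: float, p_it: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return p_total / p_it

def dcie(p_total: float, p_it: float) -> float:
    """Data Center infrastructure Efficiency: the reciprocal of PUE."""
    return p_it / p_total

p_it = 10.0             # MW drawn by IT equipment (hypothetical)
p_infra = 7.0           # MW for cooling, power delivery, etc. (hypothetical)
p_tot = p_it + p_infra  # P_TOT = P_IT + P_infrastructure
print(pue(p_tot, p_it))              # 1.7
print(round(dcie(p_tot, p_it), 3))   # 0.588
# Infrastructure overhead at a PUE of 1.2 for the same IT load:
print(round(p_it * 1.2 - p_it, 6))   # 2.0 (MW)
```

At the same 10 MW IT load, moving from a PUE of 1.7 to 1.2 would cut the infrastructure overhead from 7 MW to 2 MW.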
To summarize: at the application level, the common metric is time-to-solution, with no indication of power consumption. How about energy-to-solution as a potential metric? Neither metric is satisfactory on its own: the first is not green, and the second does not reward high performance.
How about:
Flops/W (basis for the Green500 list)?
Energy-delay product (EDP): energy consumed by an application * runtime
Energy-proportional computing: power consumed by a component scales with its utilization
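The energy-delay product penalizes both wasted energy and long runtimes. In the hypothetical comparison below (numbers invented for illustration), the slower configuration wins on energy-to-solution but loses on EDP, which is the tension described in the summary above.

```python
def energy_delay_product(avg_power_w: float, runtime_s: float) -> float:
    """EDP = energy * runtime = (avg_power * runtime) * runtime, in J*s."""
    energy_j = avg_power_w * runtime_s
    return energy_j * runtime_s

# Two hypothetical ways to run the same job.
fast = (200.0, 10.0)  # 200 W for 10 s -> 2000 J
slow = (120.0, 15.0)  # 120 W for 15 s -> 1800 J (less energy)
print(energy_delay_product(*fast))  # 20000.0
print(energy_delay_product(*slow))  # 27000.0 (worse EDP despite lower energy)
```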
Power distribution in a high performance machine
About 50% of the power is consumed by the processors, 20% by memory, 20% by the interconnect and storage subsystems, and 10% by other units (e.g., fans).
Power consumption at CPU level
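Power consumption at CPU level: Dynamic power grows with both supply voltage and clock frequency, and a lower clock frequency permits a lower supply voltage; so when the frequency is reduced, the voltage is reduced as well. Techniques: Dynamic Voltage and Frequency Scaling (DVFS) and Power Gating.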
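This sketch uses the standard CMOS dynamic-power model P = a * C * V^2 * f (an assumption here, since the slide's own figure is not reproduced) to show why lowering voltage together with frequency pays off; all parameter values are hypothetical.

```python
def dynamic_power(activity: float, capacitance_f: float, volts: float, hertz: float) -> float:
    """Classic CMOS dynamic power model: P = activity * C * V^2 * f."""
    return activity * capacitance_f * volts ** 2 * hertz

# Hypothetical core: activity 0.5, 1 nF switched capacitance, 1.0 V, 2 GHz.
base = dynamic_power(0.5, 1e-9, 1.0, 2e9)
# DVFS: lower frequency and voltage together by 20%.
scaled = dynamic_power(0.5, 1e-9, 0.8, 1.6e9)
print(round(base, 3), round(scaled, 3))  # 1.0 0.512
print(round(scaled / base, 3))           # 0.512 of the original power
# For a fixed number of cycles, runtime grows by 1/0.8, so energy drops to 0.64.
print(round(scaled / base / 0.8, 3))     # 0.64
```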
Memory: Different power states and a lower operating voltage, e.g., 1.35 V instead of the traditional 1.5 V.
Memory
Network and Storage
Network: Energy-performance trade-off
Congestion: A measure of performance
Dilation (# of hops): A measure of energy consumption
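One way to make the two measures concrete: map communicating tasks onto a 2D mesh, route each message, and count hops (dilation) and messages per link (congestion). This sketch assumes dimension-ordered XY routing over directed links and takes dilation as the longest routed path; the routing scheme, the helper names, and the task mappings are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]

def xy_route(src: Node, dst: Node) -> List[Link]:
    """Directed links used by dimension-ordered (X then Y) routing on a 2D mesh."""
    (x, y), (tx, ty) = src, dst
    links: List[Link] = []
    while x != tx:
        nx = x + (1 if tx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != ty:
        ny = y + (1 if ty > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def dilation_and_congestion(edges: Sequence[Tuple[str, str]],
                            placement: Dict[str, Node]) -> Tuple[int, int]:
    """Dilation: most hops any message travels. Congestion: most messages on one link."""
    load: Counter = Counter()
    dilation = 0
    for a, b in edges:
        path = xy_route(placement[a], placement[b])
        dilation = max(dilation, len(path))
        load.update(path)
    return dilation, max(load.values(), default=0)

# Hypothetical mapping: tasks A and B both send to C along a 1x3 row of routers.
edges = [("A", "C"), ("B", "C")]
placement = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
print(dilation_and_congestion(edges, placement))  # (2, 2)
```

Placing communicating tasks on neighboring routers lowers both numbers; for example, a four-task ring mapped around a 2x2 mesh has dilation 1 and congestion 1.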
Storage: The effect of Remote Direct Memory Access (RDMA) on power consumption in RDMA-enabled networks vs. traditional communication (TCP/IP). Replacing traditional mechanical disks with solid-state drives raises the issues of capacity and cost.
Source Chris Johnson, University of Utah, IPDPS2012
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
We can store 3/4 of 1 Exabyte of data using all the trees on the entire planet.
Sources: and
(Figure: storage scale from a Mac Air disk (GB) to company servers, supercomputers, and the world.)
(Figure, Exabytes (10^18 bytes) scale: all disk storage; all digital information; new digital information per year; all human documents in 40,000 years; all spoken words in all lives; amount human minds can store in 1 year; 295 (Feb. 2011).)
Every two days we create as much data as we did from the beginning of mankind until 2003!
Sources: Lesk, Berkeley SIMS, Landauer, EMC, TechCrunch, Smart Planet
How many trees does it take to print out an Exabyte?
1 Exabyte = 1000 Petabytes, which could hold approximately 500,000,000,000,000 pages of standard printed text.
It takes one tree to produce 94,200 pages of a book.
Thus it will take 530,785,562,327 trees to store an Exabyte of data.
In 2005, there were 400,246,300,201 trees on Earth.
We can store .75 Exabytes of data using all the trees on the entire planet.
Sources: and
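The arithmetic on this slide can be checked using only the figures it quotes. Dividing the quoted 500,000,000,000,000 pages by 94,200 pages per tree gives about 5.3 × 10^9 trees, whereas the quoted 530,785,562,327 trees corresponds to roughly 5 × 10^16 pages; the 0.75 Exabyte conclusion follows from the quoted tree count, so the quoted figures are not mutually consistent as written.

```python
# Figures quoted on the slide.
PAGES_PER_EXABYTE = 500_000_000_000_000
PAGES_PER_TREE = 94_200
TREES_PER_EXABYTE_QUOTED = 530_785_562_327
TREES_ON_EARTH_2005 = 400_246_300_201

# Trees implied by the quoted page count.
trees_from_pages = PAGES_PER_EXABYTE / PAGES_PER_TREE
# Page count implied by the quoted tree count.
pages_implied = TREES_PER_EXABYTE_QUOTED * PAGES_PER_TREE
# Exabytes printable with every tree, using the quoted tree count.
exabytes_all_trees = TREES_ON_EARTH_2005 / TREES_PER_EXABYTE_QUOTED

print(f"{trees_from_pages:.3e}")     # 5.308e+09
print(f"{pages_implied:.3e}")        # 5.000e+16
print(round(exabytes_all_trees, 2))  # 0.75
```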
Brain Information Bandwidth
Source Chris Johnson, University of Utah, IPDPS2012