Course Description: Parallel Computer Architecture

1 Course Description: Parallel Computer Architecture
12/8/2018 \course\eleg652-04F\Topic0a.ppt

2 Reading List
Slides: Topic1x
Henn&Patt: Chapter 1
CullerSingh98: Chapter 1
Other assigned readings from homework and classes

3 Why Study Parallel Architecture?
Role of a computer architect: to design and engineer the various levels of a computer system to maximize performance and programmability within the limits of technology and cost.
Parallelism:
provides an alternative to a faster clock for performance;
applies at all levels of system design;
is a fascinating perspective from which to view architecture;
is increasingly central in information processing.

4 Inevitability of Parallel Computing
Application demands
Technology trends
Architecture trends
Economics

5 Application Trends
Demand for cycles fuels advances in hardware, and vice versa.
Performance demands span a wide range.
Goal of applications in using parallel machines: speedup.
Productivity requirement.
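
Speedup on p processors is conventionally defined as the execution time on one processor divided by the execution time on p processors. Efficiency is that speedup divided by p. Below is a minimal Python sketch of these two ratios. The timings are hypothetical and the function names are illustrative.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup(p) = time on 1 processor / time on p processors."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency(p) = Speedup(p) / p; 1.0 would be perfect linear speedup."""
    return speedup(t_serial, t_parallel) / p


if __name__ == "__main__":
    # Hypothetical measurements: 120 s on 1 processor, 20 s on 8 processors.
    t1, t8 = 120.0, 20.0
    print(f"speedup    = {speedup(t1, t8):.2f}")        # 6.00
    print(f"efficiency = {efficiency(t1, t8, 8):.2f}")  # 0.75
```

Speedup below p (efficiency below 1.0) is the usual case. Communication, synchronization, and the serial parts of a program do not shrink as processors are added.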

6 Summary of Application Trends
The transition to parallel computing has already occurred in scientific and engineering computing.
It is progressing rapidly in commercial computing.
Desktop machines also run multithreaded programs, which are a lot like parallel programs.
Demand for improving throughput on sequential workloads.
Demand for productivity.

7 Technology: A Closer Look
The basic advance is decreasing feature size (λ).
Clock rate improves roughly in proportion to the improvement in λ.
The number of transistors improves like λ² (or faster).
Performance improves more than 100x per decade: clock rate contributes about 10x, and the rest comes from transistor count.
How to use more transistors?
Parallelism in processing
Locality in data access
Both need resources, so there is a tradeoff.
Figure: Proc, $ (cache), Interconnect
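
This sketch applies first-order scaling: clock rate grows in proportion to the shrink in feature size λ, and transistor count grows with the square of that shrink. Under that assumption, shrinking λ by a factor s speeds up the clock by about s and multiplies the transistor budget by about s². The sketch also splits a per-decade performance gain into a clock part and a remainder attributed to using more transistors. The function names are illustrative.

```python
def scaling_from_shrink(shrink: float) -> tuple[float, float]:
    """First-order scaling when feature size (lambda) shrinks by `shrink`x.

    Assumption: clock rate grows in proportion to the shrink, and the number
    of transistors that fit in the same area grows as shrink**2.
    """
    clock_gain = shrink
    transistor_gain = shrink ** 2
    return clock_gain, transistor_gain


def non_clock_gain(total_gain: float, clock_gain: float) -> float:
    """Part of a performance gain not explained by clock rate alone.

    Performance ~ clock rate x work per cycle, so the remaining factor is
    what the extra transistors deliver (parallelism, locality).
    """
    return total_gain / clock_gain


if __name__ == "__main__":
    # Halving the feature size: ~2x clock rate, ~4x transistors.
    print(scaling_from_shrink(2.0))
    # Slide figures: ~100x performance per decade, ~10x of it from the clock.
    print(non_clock_gain(100.0, 10.0))
```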

8 Clock Frequency Growth Rate
30% per year

9 Transistor Count Growth Rate
1 billion transistors on a chip in the early 2000s.
Transistor count grows much faster than clock rate: 40% per year, an order of magnitude more contribution in 2 decades.
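
Annual growth rates compound. A quantity growing at rate r per year grows by a factor of (1 + r)^n over n years. This small sketch applies that formula to the 30%/year clock-frequency and 40%/year transistor-count figures from these slides. The helper name is illustrative.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total growth factor after `years` at a fixed annual rate (0.30 = 30%)."""
    return (1.0 + rate_per_year) ** years


if __name__ == "__main__":
    for name, rate in [("clock frequency", 0.30), ("transistor count", 0.40)]:
        factor = compound_growth(rate, 10)
        print(f"{name}: {rate:.0%}/year -> about {factor:.1f}x per decade")
```

At 30% per year the clock alone compounds to roughly 14x per decade, the same order as the ~10x clock contribution quoted on slide 7. At 40% per year, transistor count compounds to roughly 29x per decade.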

10 Similar Story for Storage
The divergence between memory capacity and speed is even more pronounced.
Larger memories are slower, so deeper cache hierarchies are needed.
Parallelism and locality within memory systems
Disks too: parallel disks plus caching
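
Average memory access time (AMAT) from Hennessy & Patterson is one standard way to quantify why deeper cache hierarchies help. The formula is AMAT = hit time + miss rate × miss penalty. It is applied level by level, and the last level's misses go to main memory. The latencies and miss rates in this sketch are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    hit_time: float   # cycles to access this level on a hit
    miss_rate: float  # fraction of accesses reaching this level that miss


def amat(levels: list[CacheLevel], memory_latency: float) -> float:
    """Average memory access time for a cache hierarchy (L1 first).

    Works from memory upward: each level's miss penalty is the AMAT of
    the level below it, and the last level misses to main memory.
    """
    time = memory_latency
    for level in reversed(levels):
        time = level.hit_time + level.miss_rate * time
    return time


if __name__ == "__main__":
    # Hypothetical numbers: 200-cycle memory, 1-cycle L1 missing 5% of the time.
    l1 = CacheLevel(hit_time=1, miss_rate=0.05)
    # Hypothetical 10-cycle L2 that misses on 20% of the accesses it sees.
    l2 = CacheLevel(hit_time=10, miss_rate=0.20)
    print(amat([l1], 200))      # 1 + 0.05 * 200 = 11 cycles
    print(amat([l1, l2], 200))  # 1 + 0.05 * (10 + 0.2 * 200) = 3.5 cycles
```

Adding the intermediate level cuts the effective miss penalty seen by L1 from 200 cycles to 50. That benefit grows as memory gets relatively slower.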

11 Moore’s Law and Headcount
Along with the number of transistors, the effort and headcount required to design a microprocessor have grown exponentially.

12 Architectural Trends
Architecture: performance and capability
Tradeoff between parallelism and locality
Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
Understanding microprocessor architectural trends
Four generations of architectural history: tube, transistor, IC, VLSI

13 Technology Progress Overview
Processor speed improvement: 2x per year (since '85); 100x in the last decade.
DRAM memory capacity: 2x every 2 years (since '96); 64x in the last decade.
Disk capacity: 2x per year (since '97); …x in the last decade.
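
A fixed doubling period of T years gives a growth factor of 2^(10/T) per decade. Conversely, a decade factor F implies a doubling period of T = 10 / log2(F). This sketch makes it easy to cross-check the doubling rates against the decade totals on the slide. Doubling every year compounds to about 1000x per decade and doubling every 2 years to 32x, while 100x per decade corresponds to doubling roughly every 1.5 years. The function names are illustrative.

```python
import math


def decade_factor(doubling_years: float) -> float:
    """Growth over 10 years for a quantity that doubles every `doubling_years`."""
    return 2.0 ** (10.0 / doubling_years)


def doubling_period(decade_gain: float) -> float:
    """Doubling period (in years) implied by a given growth factor over 10 years."""
    return 10.0 / math.log2(decade_gain)


if __name__ == "__main__":
    print(decade_factor(1.0))                # doubling yearly: 1024x per decade
    print(decade_factor(2.0))                # doubling every 2 years: 32x per decade
    print(round(doubling_period(100.0), 2))  # 100x per decade: ~1.51 years
    print(round(doubling_period(64.0), 2))   # 64x per decade: ~1.67 years
```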

14 Motorola’s PowerPC 604; Pentium

17 Summary: Parallel Architecture?
Increasingly attractive: economics, technology, architecture, application
Parallelism is exploited at many levels
Same story from the memory system perspective
A wide range of parallel architectures makes sense

23 The Earth Simulator Machine in Japan
Peak performance: 40 TFLOPS
No. 1 on the TOP500 list
General-purpose parallel vector processors
Development cost: 400 M$

25 HPC Architecture
Vector processors ⇒ 1976~ (CRAY-1)
Parallel processors ⇒ 1985~ (CM-1)
MPU clusters, Grid ⇒ ~ (ASCI-RED)
Massively parallel processors ⇒ ~2010 (DARPA HPCS machines, GRAPE-DR, BlueGene/L, BG/C64)

26 Cluster Computers of Commodity MPUs ⇒ 1997~
ASCI Project
ASCI-Q: 20 TFLOPS (2003), 8,192 CPUs
ASCI-Purple: 100 TFLOPS (2005), 12,544 CPUs
ORNL project (2004)
Limitations of current clusters:
Low CPU utilization due to high interconnect latency
No automatic parallelization
Limits on size and power: ASCI-Purple (12,544 CPUs) requires 3 MW
Figure: ASCI-Q 20TFLOPS
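
A deliberately simplified model shows why high interconnect latency lowers CPU utilization on clusters. The model is an assumption for illustration only. Each processor computes for t_compute, then makes one remote access and stalls for the full latency, with no overlap of communication and computation. Utilization is then t_compute / (t_compute + latency). All numbers below are hypothetical.

```python
def utilization(compute_time: float, remote_latency: float) -> float:
    """Fraction of time a processor spends on useful work.

    Simplified model (assumption): each compute phase of `compute_time`
    is followed by one remote access that stalls the processor for
    `remote_latency`, with no overlap of communication and computation.
    """
    return compute_time / (compute_time + remote_latency)


if __name__ == "__main__":
    # Hypothetical: 1 microsecond of work between consecutive remote accesses.
    for latency_us in (0.1, 1.0, 10.0):
        u = utilization(1.0, latency_us)
        print(f"latency {latency_us:5.1f} us -> utilization {u:.0%}")
```

Real systems hide part of this latency with prefetching, multithreading, or overlapping communication with computation. The model only illustrates the trend: the longer the stall relative to the work between accesses, the lower the utilization.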

27 New Generation Parallel Systems ⇒ 2008~
IBM BlueGene/L Project (360 TFLOPS, 2005)
High-density parallel processor (65,536 CPU chips in 64 racks, 131,072 processors)
IBM BlueGene/C64 Project (1.1 PFLOPS, 2007 ?)
HPCS Project: IBM PERCS, Cray Cascade, SUN Hero project
Figure: IBM Blue Gene/L
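
Dividing the aggregate peak figures quoted on slides 26 and 27 by their processor counts gives a rough average peak per processor. Dividing ASCI-Purple's peak by its 3 MW power figure gives a rough peak efficiency. This sketch does only that arithmetic on the slide numbers, which are peak ratings rather than sustained performance. The helper names are illustrative.

```python
# (system, peak TFLOPS, processor count) as quoted on slides 26 and 27.
SYSTEMS = [
    ("ASCI-Q", 20.0, 8_192),
    ("ASCI-Purple", 100.0, 12_544),
    ("BlueGene/L", 360.0, 131_072),
]


def gflops_per_processor(peak_tflops: float, processors: int) -> float:
    """Average peak GFLOPS per processor (1 TFLOPS = 1000 GFLOPS)."""
    return peak_tflops * 1000.0 / processors


def mflops_per_watt(peak_tflops: float, power_megawatts: float) -> float:
    """Peak MFLOPS per watt; TFLOPS per MW reduces to MFLOPS per W."""
    return peak_tflops / power_megawatts


if __name__ == "__main__":
    for name, tflops, procs in SYSTEMS:
        print(f"{name:12s} ~{gflops_per_processor(tflops, procs):.2f} GFLOPS/processor")
    # ASCI-Purple: 100 TFLOPS peak at 3 MW (slide 26).
    print(f"ASCI-Purple  ~{mflops_per_watt(100.0, 3.0):.1f} MFLOPS/W")
```

BlueGene/L reaches its peak with many modest processors (about 2.7 GFLOPS each by this division), whereas ASCI-Purple uses fewer, faster ones (about 8 GFLOPS each). That contrast is the "high density" design choice on slide 27.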

