8/30/2006 eleg652-010-06F: Principles of Parallel Architecture, Fall 2006. Keys to a happy life: diversity and variety. Diversity in the people that you meet.

1 Principles of Parallel Architecture, Fall 2006 (eleg652-010-06F, 8/30/2006)
Keys to a happy life: diversity and variety. Diversity in the people that you meet. Variety in the things that you do.

2 Contact Information
Instructor: Joseph B. Manzano. Office: 137 Evans Hall. Phone: N/A. Email: jmanzano@capsl.udel.edu
Teaching Assistant: Juergen Ributzka. Office: 326 Dupont Hall. Phone: (302) 831 0327. Email: ributzka@capsl.udel.edu
Teaching Assistant: Eunjung Park. Office: 326 Dupont Hall. Phone: (302) 831 0327. Email: epark@capsl.udel.edu
Course Webpage: http://www.capsl.udel.edu/courses/eleg652/2006/

3 Important Course Information
Activities: Four homeworks, a comprehensive final examination, and a class project assigned by the instructor with a mentor
Final Quiz: Wednesday, December 6th, 2006
Final Project Due Date: Friday, December 8th, 2006
Grade Distribution

4 Reference Material
Reference Books
1. John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, Inc., 2003
2. D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture, Morgan Kaufmann Publishers, Inc., 1999

5 Supporting Materials
Selected publications from:
Journals: IEEE Transactions on Parallel and Distributed Systems; IEEE Computer; IEEE Transactions on Computers
Conference Proceedings:
PACT: Parallel Architectures and Compilation Techniques
MICRO: ACM/IEEE Symposium on Microarchitecture
ISCA: International Symposium on Computer Architecture
HPCA: ACM/IEEE Symposium on High Performance Computer Architecture
PLDI: ACM SIGPLAN Conference on Programming Language Design and Implementation

6 Course Contents
Provides an overview of technologies that are applicable to almost all aspects of computers and will soon be part of consumer electronics in general.
Shows the principles on which parallel machines are built and how these concepts have infiltrated other parts of the computer and entertainment industry.
Provides an understanding of how these concepts affect both the hardware and the software of a target machine, and of their different implementations.

7 Expectations about this Course
You should learn:
A basic idea of the lingo used in today's supercomputer and parallel machine market
Vector processing and its place in consumer electronics
Different forms of parallelism and their current implementations
Shared memory models
Parallel programming models and synchronization
Multithreaded architectures

8 Why Study Parallel Architectures?
The concepts should soon become ubiquitous
Productively write software that takes advantage of new features of upcoming or existing hardware
Understand how current technologies have evolved and how they can be improved

9 Course Overview
Terminology and General Knowledge
Vector Processing and its Legacy
Instruction Level Parallelism: a brief overview
Multicore and Cellular architectures
Parallel (shared) memory models and synchronization primitives
Advanced Topics such as Dataflow and Transactional Memory

10 Course Introduction
The Role of a Computer Architect: Maximize Productivity and Performance
Productivity = Programmability and a reduction in development time
Performance = “Reasonable” throughput given technology and cost limitations
Parallelism
Two or more tasks may execute at the same time
An alternative to higher frequency clocks
Applies to all levels of computer design
Its importance has been rising steadily since several “walls” were hit
In the near future, it will become the paradigm in all aspects of computing

11 The Transition
Most consumer electronics will have some form of parallel architecture inside of them by next year (2007)
Reasons for the Change: an evolutionary change in computing due to:
Technology: Decrease in feature size, allowing more components into a chip
Architecture: Effectively organizing components to maximize use of resources and minimize damaging size effects
Economics: Find cost-effective ways to get the desired performance out of the given hardware / software combo
Applications: More and more performance- and power-hungry applications

12 Application Requirements
Demand for more cycles = More sophisticated hardware
Wide range of performance demands
Audio Processing = Real-time response with an allowed threshold of error
Business Loads = A given quantum of time with no error allowed
Application and parallel computer: Obtain a speedup in application runtime
Productive Parallel Systems
Current systems work on parallel concepts and designs (e.g. desktop systems are multithreaded)
Parallel computing and parallel computers are becoming ubiquitous as we speak

13 Technology: An Overview
Decrease in Feature Size (Lambda)
Clock rate: improves roughly in proportion to the improvement in Lambda
Number of transistors: grows at least with the square of the improvement in Lambda
Performance: An increase of roughly 1000x in the last decade
The fastest supercomputer in June 1996 (Tokyo's SR2201) was 220 GFLOPS
The fastest supercomputer now is 280 TFLOPS (IBM's eServer Blue Gene Solution)
Clock frequency: An increase of roughly 20x in the same decade
Intel Pentium Pro at 150 ~ 200 MHz in 1996
Intel Pentium D at 3.2 GHz in 2006
Extra components: Parallelism vs. Data Locality: Fighting for Real Estate
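
The growth factors on this slide can be checked directly from the figures it quotes. The sketch below is only arithmetic on those numbers; the helper names `growth_factor` and `annual_rate` are illustrative, and the ten-year span is taken from the slide's "last decade" wording.

```python
# Figures quoted on the slide; nothing here is measured independently.
PEAK_1996_GFLOPS = 220.0            # SR2201, June 1996
PEAK_2006_GFLOPS = 280.0 * 1000.0   # Blue Gene, 280 TFLOPS
CLOCK_1996_MHZ = (150.0, 200.0)     # Pentium Pro range
CLOCK_2006_MHZ = 3200.0             # Pentium D, 3.2 GHz
YEARS = 10                          # "the last decade"


def growth_factor(old, new):
    """How many times larger `new` is than `old`."""
    return new / old


def annual_rate(factor, years=YEARS):
    """Compound yearly growth rate that yields `factor` after `years`."""
    return factor ** (1.0 / years) - 1.0


perf_factor = growth_factor(PEAK_1996_GFLOPS, PEAK_2006_GFLOPS)
clock_factor_low = growth_factor(CLOCK_1996_MHZ[1], CLOCK_2006_MHZ)
clock_factor_high = growth_factor(CLOCK_1996_MHZ[0], CLOCK_2006_MHZ)

print(f"peak performance: {perf_factor:.0f}x, "
      f"{annual_rate(perf_factor):.0%} per year")
print(f"clock frequency: {clock_factor_low:.0f}x to {clock_factor_high:.0f}x")
```

With these inputs the peak-performance ratio is a little over 1000x, while the clock ratio lies between 16x and about 21x, so most of the performance gain has to come from something other than frequency.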

14 Intel: An Example of Clock Frequency Growth
Growth has been steady until now!!!!

15 Pentium M
Thermal maps of the Pentium M obtained from simulated power density (left) and IREM measurement (right). Heat levels go from black (lowest) through red, orange, and yellow to white (highest).
Figures courtesy of Dani Genossar and Nachum Shamir, from their paper "Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation", published in the Intel Technology Journal, May 21, 2003

16 Storage and Transistor Count Growth
Transistor Count
Expected to reach one billion during this decade (the 2000s)
Grows faster than the clock rate: 40% per year
Storage
Gap between storage and speed is more pronounced
Larger memories = slower = larger memory hierarchies (e.g. caches, write / read buffers, etc.)
Parallelism and locality inside memory systems: multi-port memory, parallel caches, RAIDs, parallel disks with caching, etc.

17 Moore's Law
"The complexity for minimum component costs has increased at a rate of roughly a factor of two per year... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer."
Gordon Moore's original statement, from "Cramming more components onto integrated circuits", Electronics Magazine, 19 April 1965
In layman's terms: The number of components on integrated circuits will roughly double every 18 months. With that, the complexity (effort) and the headcount should increase proportionally.
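
The two doubling periods mentioned on this slide lead to very different ten-year projections. The sketch below compares them under a simple exponential model. The starting count of 64 components in 1965 is an assumption, chosen here only because doubling it yearly for ten years lands near the quoted 65,000; the function name `components` is illustrative.

```python
def components(start_count, years, doubling_period_years):
    """Exponential growth: the count doubles once per doubling period."""
    return start_count * 2 ** (years / doubling_period_years)


# Assumption: about 64 components per circuit in 1965 (not stated on the slide).
START_1965 = 64

# Doubling every year, as in the 1965 statement, over the 10 years to 1975.
by_1975_yearly = components(START_1965, 10, 1.0)

# Doubling every 18 months, as in the layman's version, over the same 10 years.
by_1975_18_months = components(START_1965, 10, 1.5)

# Growth multiplier per decade for each doubling period.
decade_factor_yearly = components(1, 10, 1.0)
decade_factor_18_months = components(1, 10, 1.5)

print(f"yearly doubling:   {by_1975_yearly:,.0f} components")
print(f"18-month doubling: {by_1975_18_months:,.0f} components")
```

Under the same model, yearly doubling multiplies the count by 1024 per decade, while 18-month doubling multiplies it by roughly 100.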

18 Architectural Trends
Designed for performance
Higher Frequency == Higher Performance?
Memory vs. Processor
Architectural Trends: Hide latencies at all costs!!!
Overlap computation with memory accesses [DMA]
Multithreaded execution and sharing of resources [SMT and HT technologies, MTA]
Give more chip real estate to speculative execution [branch prediction and prefetching]
Bring more frequently used data closer to the processor [memory hierarchies]
Power Problem? Go Multicore!!!!
One unit: takes N time to finish a problem of size M using T amount of power
Multicore: takes N/2 + 2X time to finish a problem of size M using T/2 amount of power per unit
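
The multicore trade-off at the end of this slide can be put into a toy model. This is a sketch under stated assumptions, not the instructor's model: it assumes exactly two units, treats X as an unspecified per-unit overhead (the slide does not define it), and takes energy as power multiplied by time. All function names are illustrative.

```python
def single_unit(n_time, t_power):
    """One unit: time N at power T. Returns (time, total power, energy)."""
    return n_time, t_power, n_time * t_power


def two_units(n_time, t_power, x_overhead):
    """Two units: time N/2 + 2X, each unit drawing T/2.

    Assumption: X is an overhead term such as coordination cost;
    the slide leaves it undefined.
    """
    time = n_time / 2 + 2 * x_overhead
    total_power = 2 * (t_power / 2)
    return time, total_power, time * total_power


def break_even_overhead(n_time):
    """Largest X for which two units are still no slower: N/2 + 2X <= N."""
    return n_time / 4


# Illustrative numbers only: N = 100 time units, T = 80 power units, X = 10.
print(single_unit(100, 80))
print(two_units(100, 80, 10))
print(break_even_overhead(100))
```

In this model the total power stays at T, so any gain shows up as shorter run time and lower energy, and it disappears once X reaches N/4.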

19 Technology Progress Overview
Processor speeds are much faster (around 1000x)
Memory (RAM) speeds are increasing too, but at a slower rate (around 10x)
But memory (RAM) capacities have grown even faster than processor speeds (around 1,000,000x)
Computation is almost free, but bandwidth is very expensive

20 The Pentium Chip

21 Intel Pentium 4
Nine Years and Millions of Dollars Later

22 Next Gen: The Cell Chip Layout
Many of them, simpler and cheaper!!!

23 The Dawn of Parallelism
Parallel architectures are becoming more attractive
Milestone: the introduction of the Pentium D (2005) and Centrino Duo (2006)
Future Projects: IBM PERCS project, Cray Eldorado, Sun Hero, IBM Cell project, etc.
All the factors listed contributed to this “epiphany” in computing technology.
Parallelism can be exploited at many levels in many ways

24 The World's Fastest: Japan Dominance
Numerical Wind Tunnel: 192 GFLOPS
CP PACS: 368 GFLOPS

25 The World's Fastest: USA Takes the Lead
ASCI Red: 1.3 TFLOPS
ASCI White (SP Power3, 375 MHz): 7.3 TFLOPS

26 The World's Fastest: Japan's Second Wind
Earth Simulator: 35 TFLOPS

27 The World's Fastest: and again...
BlueGene L Beta: 70 TFLOPS
BlueGene L eServer Solution: 280 TFLOPS

28 The World's Fastest: BlueGene L eServer Solution
65536 dual-processor nodes arranged in a 32 x 32 x 64 3D torus network
Global tree structure for fast reduction and broadcast operations over all nodes
An I/O node per 64 nodes
Inside a group of 64: tree-structured connections between the I/O node and the computation nodes, with an aggregate bandwidth of 2.1 GB/s
Across groups of 64: torus-like connections
Total Memory: 32 Terabytes
Total Power Consumption: 1.5 Megawatts
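
The torus dimensions on this slide can be explored with a few lines of code. The sketch below checks that 32 x 32 x 64 gives 65536 nodes, derives the number of I/O nodes from the one-per-64 ratio, and shows how wraparound links give every node six neighbors. The coordinate scheme and function names are illustrative assumptions, not the machine's actual addressing, and the memory figure assumes the 32 Terabytes are binary units.

```python
DIMS = (32, 32, 64)      # torus dimensions quoted on the slide
NODES_PER_IO_NODE = 64   # "an I/O node per 64 nodes"


def node_count(dims=DIMS):
    """Total number of nodes in the torus."""
    x, y, z = dims
    return x * y * z


def torus_neighbors(coord, dims=DIMS):
    """The six neighbors of a node, with wraparound on every axis."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum hop count between two nodes, going either way around each ring."""
    total = 0
    for p, q, size in zip(a, b, dims):
        d = abs(p - q)
        total += min(d, size - d)
    return total


io_nodes = node_count() // NODES_PER_IO_NODE
# Assumption: 32 Terabytes read as 32 * 2**20 MiB, split evenly across nodes.
memory_per_node_mib = 32 * 2**20 // node_count()

print(node_count(), io_nodes, memory_per_node_mib)
print(torus_neighbors((0, 0, 0)))
print(torus_hops((0, 0, 0), (16, 16, 32)))
```

Because of the wraparound, the farthest node from any given node is half a ring away on each axis, which is why the worst-case distance in this model is 16 + 16 + 32 hops rather than 31 + 31 + 63.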

29 The World's Fastest: BlueGene L eServer Solution

30 The Next Step
So what is next?
Multicore, System on a Chip, PIM, etc.: simpler, cooler, cheaper...
Intel Pentium D and Centrino Duo
AMD Opteron
The DARPA HPCS Project: IBM, Cray and Sun
Multicore chips: CELL, Cyclops, BlueGene
Alternatives: Clearspeed [programmable co-processors], etc.

