CS433 Spring 2001 Introduction Laxmikant Kale

2 Course objectives and outline
You will learn about:
–Parallel programming models
   Emphasis on 3: message passing, shared memory, and shared objects (a message-passing sketch follows below)
   Ongoing evaluation and comparison of models
–Parallel application classes
–Parallel architectures
   Message passing support, routing, interconnection networks
   Cache-coherent scalable shared memory, synchronization
   Relaxed consistency models
   Novel architectures: Tera, Blue Gene, processors-in-memory
–Commonly needed parallel algorithms/operations
–Performance analysis of parallel applications
–Parallel application case studies
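Of the three models above, message passing is the easiest to sketch concretely. Below is a minimal example in C; MPI is assumed here purely for illustration, since the slide names the model rather than a particular library.

    /* Minimal message-passing sketch: rank 0 sends one integer to rank 1.
       Illustrative only -- run with at least two processes, e.g. mpirun -np 2. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest 1, tag 0 */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }

In the shared-memory model the same exchange is a store and a load guarded by synchronization; in the shared-objects model (e.g., Charm++, from the instructor's group) it is a method invocation on a remote object.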

3 Project and homeworks
Significant (effort and grade percentage) course project
–groups of 5 students
Homeworks/machine problems:
–weekly (sometimes biweekly)
Parallel machines:
–NCSA Origin 2000, PC/SUN clusters

4 Resources
Much of the course will be run via the web
–Lecture slides, assignments will be available on the course web page
–Most of the reading material (papers, manuals) will be on the web
–Projects will coordinate and submit information on the web
   Web pages for individual projects will be linked to the course web page
–Newsgroup: uiuc.class.cs433
You are expected to read the newsgroup and web pages regularly

5 Advent of parallel computing
"Parallel computing is necessary to increase speeds"
–cry of the '70s
–processors kept pace with Moore's law: doubling speeds every 18 months
Now, finally, the time is ripe
–uniprocessors are commodities (and processor speeds show signs of slowing down)
–Highly economical to build parallel machines

6 Why parallel computing
It is the only way to increase speed beyond uniprocessors
–Except, of course, waiting for uniprocessors to become faster!
–Several applications require orders of magnitude higher performance than is feasible on uniprocessors
Cost effectiveness:
–older argument
–in 1985, a supercomputer cost 2000 times more than a desktop, yet performed only 400 times faster
–So: combine microcomputers to get speed at lower cost
–Incremental scalability: can get in-between performance points with 20, 50, 100, … processors
–But: you may get speedup lower than 400 on 2000 processors! (worked example below)
Microcomputers became faster, effectively killing supercomputers
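The last bullet hints at sublinear speedup. Amdahl's law is the standard way to quantify this; it is not named on the slide, and the fraction below is invented for illustration:

    S(p) = 1 / ((1 - f) + f/p),   where f = parallelizable fraction of the work

    f = 0.99, p = 2000:  S = 1 / (0.01 + 0.99/2000) = 1 / 0.010495 ≈ 95

So even a 99%-parallel program on 2000 processors falls well short of a 400x speedup, and overheads such as communication push the achieved figure lower still.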

7 Technology Trends
The natural building block for multiprocessors is now also about the fastest!

8 General Technology Trends
Microprocessor performance increases 50%-100% per year
Transistor count doubles every 3 years
DRAM size quadruples every 3 years
Huge investment per generation is carried by huge commodity market
Not that single-processor performance is plateauing, but that parallelism is a natural way to improve it
[Chart: integer and floating-point performance of contemporary processors, including Sun, MIPS M/120, MIPS M2000, IBM RS6000, HP, and DEC Alpha]

9 Technology: A Closer Look
Basic advance is decreasing feature size (λ)
–Circuits become either faster or lower in power
Die size is growing too
–Clock rate improves roughly proportional to improvement in λ
–Number of transistors improves like λ² (or faster)
Performance > 100x per decade; clock rate 10x, the rest from transistor count
How to use more transistors?
–Parallelism in processing: multiple operations per cycle reduces CPI
–Locality in data access: avoids latency and reduces CPI; also improves processor utilization
–Both need resources, so tradeoff
Fundamental issue is resource distribution, as in uniprocessors
[Diagram: processor, cache ($), and interconnect sharing the chip's resources]

10 Clock Frequency Growth Rate
30% per year

11 Transistor Count Growth Rate
100 million transistors on chip by the early 2000s
Transistor count grows much faster than clock rate
–40% per year, an order of magnitude more contribution in two decades
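Compounding the round-number growth rates from this slide and the previous one shows where slide 9's ">100x per decade" comes from (illustrative arithmetic, not measured data):

    Clock rate:       1.30^10 ≈ 13.8x per decade  (slide 9's "10x")
    Transistor count: 1.40^10 ≈ 28.9x per decade

Even if only part of the transistor growth converts into delivered performance, the combined effect comfortably exceeds 100x per decade.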

12 Similar Story for Storage
Divergence between memory capacity and speed
–Capacity increased by 1000x from 1980-95, speed only 2x
–Gigabit DRAM by c. 2000, but gap with processor speed greater
Larger memories are slower, while processors get faster
–Need to transfer more data in parallel
–Need deeper cache hierarchies (see the calculation below)
–How to organize caches?
Parallelism increases effective size of each level of hierarchy, without increasing access time
Parallelism and locality within memory systems too
–New designs fetch many bits within the memory chip, followed by fast pipelined transfer across a narrower interface
–Buffer caches most recently accessed data
Disks too: parallel disks plus caching
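The case for deeper hierarchies can be made concrete with the textbook average-memory-access-time formula; the latencies and miss rates below are invented for illustration:

    AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)

    With t_L1 = 1 cycle, m_L1 = 5%, t_L2 = 10 cycles, m_L2 = 20%, t_mem = 100 cycles:
      two levels: AMAT = 1 + 0.05 * (10 + 0.2 * 100) = 2.5 cycles
      L1 only:    AMAT = 1 + 0.05 * 100             = 6 cycles

Adding the second level more than halves the average access time even though main memory itself got no faster.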

13 Architectural Trends
Architecture translates technology's gifts into performance and capability
Resolves the tradeoff between parallelism and locality
–Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
–Tradeoffs may change with scale and technology advances
Understanding microprocessor architectural trends
–Helps build intuition about design issues of parallel machines
–Shows the fundamental role of parallelism even in "sequential" computers
Four generations of architectural history:
–Vacuum tube, transistor, IC, VLSI
–Here we focus only on the VLSI generation
Greatest delineation in VLSI has been in the type of parallelism exploited

14 Architectural Trends
Greatest trend in the VLSI generation is the increase in parallelism
–Up to 1985: bit-level parallelism: 4-bit -> 8-bit -> 16-bit
   slows after 32-bit; adoption of 64-bit now under way, 128-bit far off (not a performance issue)
   great inflection point when a 32-bit micro and cache fit on a chip
–Mid 80s to mid 90s: instruction-level parallelism (see the sketch after this slide)
   pipelining and simple instruction sets, plus compiler advances (RISC)
   on-chip caches and functional units => superscalar execution
   greater sophistication: out-of-order execution, speculation, prediction
   –to deal with control transfer and latency problems
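To see what instruction-level parallelism buys, consider a reduction loop; the multiple-accumulator rewrite below is a standard way to expose independent operations that a pipelined, superscalar core can overlap (an illustrative sketch, not course code):

    /* Exposing ILP in a summation. A single-accumulator loop serializes
       on 'sum'; four independent accumulators let the core keep several
       floating-point additions in flight at once. Note this reassociates
       the sum, so rounding may differ slightly from the sequential loop. */
    double sum_ilp(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {   /* four independent dependence chains */
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)                 /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

Four chains is arbitrary; the right number depends on the adder's pipeline depth and the machine's issue width.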

15 Economics
Commodity microprocessors are not only fast but CHEAP
Development cost is tens of millions of dollars ($5M-$100M typical)
BUT, many more are sold compared to supercomputers
–Crucial to take advantage of the investment, and use the commodity building block
–Exotic parallel architectures are no more than special-purpose
Multiprocessors are being pushed by software vendors (e.g. database) as well as hardware vendors
Standardization by Intel makes small, bus-based SMPs a commodity
Desktop: a few smaller processors versus one larger one?
–Multiprocessor on a chip

16 What to Expect?
Parallel machine classes:
–Cost and usage define a class! Architecture of a class may change.
–Desktops, engineering workstations, database/web servers, supercomputers
Commodity (home/office) desktop:
–less than $10,000
–possible to provide processors for that price!
–Driver applications: games, video/signal processing, possibly "peripheral" AI: speech recognition, natural language understanding (?), smart spaces and agents
New applications?

17 Engineering workstations
Price: less than $100,000 (used to be):
–new acceptable price level may be $50,000
–100+ processors, large memory
–Driver applications: CAD (computer-aided design) of various sorts
   VLSI
   structural and mechanical simulations
   etc. (many specialized applications)

18 Commercial Servers
Price range: variable ($10,000 to several hundred thousand)
–defining characteristic: usage
–Database servers, decision support (MIS), web servers, e-commerce
High availability and fault tolerance are the main criteria
Trends to watch out for:
–Likely emergence of specialized architectures/systems
   e.g. Oracle's "No Native OS" approach
Currently dominated by database servers and TPC benchmarks
–TPC: transactions per second
–But this may change to data mining and application servers, with corresponding impact on architecture

19 Supercomputers
"Definition": expensive system?!
–Used to be defined by architecture (vector processors, ...)
–More than a million US dollars?
–Thousands of processors
Driving applications
–Grand challenges in science and engineering:
–Global weather modeling and forecast
–Rational drug design / molecular simulations
–Processing of genetic (genome) information
–Rocket simulation
–Airplane design (wings and fluid flow...)
–Operations research?? Not recognized yet
–Other non-traditional applications?

20 Consider Scientific Supercomputing
Proving ground and driver for innovative architecture and techniques
–Market smaller relative to commercial as MPs become mainstream
–Dominated by vector machines starting in the 70s
–Microprocessors have made huge gains in floating-point performance
   high clock rates
   pipelined floating-point units (e.g., multiply-add every cycle)
   instruction-level parallelism
   effective use of caches (e.g., automatic blocking; sketch below)
–Plus economics
Large-scale multiprocessors replace vector supercomputers
–Well under way already
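"Automatic blocking" above refers to tiling a loop nest so its working set stays in cache; compilers can do it automatically, but the hand-tiled matrix multiply below shows the idea (a sketch: the tile size of 32 is an arbitrary example, not a tuned value):

    /* Cache blocking (tiling) for C = C + A*B on n x n row-major matrices.
       TILE would be tuned to the cache size in real code. */
    #define TILE 32
    void matmul_blocked(int n, const double *a, const double *b, double *c) {
        for (int ii = 0; ii < n; ii += TILE)
            for (int kk = 0; kk < n; kk += TILE)
                for (int jj = 0; jj < n; jj += TILE)
                    /* one tile of work: the three sub-blocks stay cache-resident */
                    for (int i = ii; i < ii + TILE && i < n; i++)
                        for (int k = kk; k < kk + TILE && k < n; k++) {
                            double aik = a[i * n + k];
                            for (int j = jj; j < jj + TILE && j < n; j++)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }

The i-k-j inner ordering also gives unit-stride access to b and c, so blocking and locality reinforce each other, which is exactly the parallelism-versus-locality tradeoff slide 9 raised.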

21 Scientific Computing Demand

22 Engineering Computing Demand
Large parallel machines are a mainstay in many industries
–Petroleum (reservoir analysis)
–Automotive (crash simulation, drag analysis, combustion efficiency)
–Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
–Computer-aided design
–Pharmaceuticals (molecular modeling)
–Visualization in all of the above
   entertainment (films like Toy Story)
   architecture (walk-throughs and rendering)
–Financial modeling (yield and derivative analysis)
–etc.

23 Applications: Speech and Image Processing
Also CAD, databases, ...
100 processors gets you 10 years; 1000 gets you 20!

24 Learning Curve for Parallel Applications
AMBER molecular dynamics simulation program
Starting point was vector code for the Cray-1
–145 MFLOPS on the Cray90; 406 for the final version on a 128-processor Paragon; 891 on a 128-processor Cray T3D

25 Raw Uniprocessor Performance: LINPACK

26 Fastest Computers
[Chart: number of systems of each machine class (PVP, MPP, SMP) among the fastest computers, in successive lists from 11/93 through 11/96]