ECE669 L1: Course Introduction January 29, 2004
ECE 669 Parallel Computer Architecture
Lecture 1: Course Introduction
Prof. Russell Tessier, Department of Electrical and Computer Engineering

Welcome to ECE 669/CA720-A
°Parallel computer architectures
°Interactive: your questions welcome!
°Grading
-6 homework assignments (30%)
-Mid-term exam (35%)
-Final exam (35%)
°Experiments with network and cache simulators
°Culler text and research papers
°Acknowledgments: Prof. Anant Agarwal (MIT), Prof. David Culler (UC Berkeley)

Parallel Architectures
Why build parallel machines?
°To help build even bigger parallel machines
°To help solve important problems
°Speed – more trials, less time
°Cost
°Larger problems
°Accuracy
Must understand typical problems

MIT Computer Architecture Group – early 1990s: Alewife Machine, J-Machine, NuMesh

Applications ---> Requirements
°Processing
°Communication
°Memory
°I/O
°Synchronization
°Architect must provide all of the above
Application classes: numeric, symbolic, combinatorial

Requirements ---> Examples
Relaxation - near-neighbor communication
Multigrid - 1, 2, 4, 8, ... communication
Numeric computation - floating point
Symbolic - tags, branches
Database - I/O
Data parallel - barrier synchronization
Dictionary - memory, I/O

Communication requirements ---> Example
°Relaxation
°So, let's build a special machine!
°But... pitfalls!
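The relaxation example above can be sketched as a minimal Jacobi iteration (a hypothetical toy grid and iteration count, not code from the course): each interior point is updated from its four nearest neighbors, which is exactly the near-neighbor communication pattern a parallel machine would have to support.

```python
# Minimal 2-D Jacobi relaxation sketch: each interior point is replaced
# by the average of its four nearest neighbors (near-neighbor communication).
def jacobi_step(grid):
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

# Toy 4x4 grid: boundary values fixed at 1.0, interior starts at 0.0.
grid = [[1.0] * 4] + [[1.0, 0.0, 0.0, 1.0] for _ in range(2)] + [[1.0] * 4]
for _ in range(50):
    grid = jacobi_step(grid)
# Interior values converge toward the boundary value 1.0.
```

On a parallel machine, the grid would be partitioned across processors, and each processor would only need to exchange its boundary rows and columns with its neighbors each iteration.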

Specialized machines: Pitfalls
°Faster algorithms appear... with different communication requirements
°Cost effectiveness
-Economies of scale
°Simpler hardware & software mechanisms
-More flexible
-May even be faster!
–e.g. specialized support for synchronization across multiple processors

Technology ---> Limitations & Opportunities
°Wires
-Area
-Propagation speed
°Clock
°Power
°VLSI
-I/O pin limitations
-Chip area
-Chip crossing delay
-Power
°Cannot make light go any faster
°Three dimensions max
°KISS rule

Major theme
°Look at typical applications
°Understand physical limitations
°Make tradeoffs
Application requirements + technological constraints ---> ARCHITECTURE

Unfortunately
°Requirements and constraints are often at odds with each other!
°Architecture ---> making tradeoffs
Full connectivity! Gasp!!!

Putting it all together
°The systems approach
-Lesson from RISCs
-Hardware/software tradeoffs
-Functionality implemented at the right level
–Hardware
–Runtime system
–Compiler
–Language, programmer
–Algorithm

What will you get out of this course?
°In-depth understanding of the design and engineering of modern parallel computers
-Technology forces
-Fundamental architectural issues: naming, replication, communication, synchronization
-Basic design techniques: cache coherence, protocols, networks, pipelining, ...
-Underlying engineering trade-offs
-Case studies
-Influence of applications
°From moderate to very large scale
°Across the hardware/software boundary

Speedup
°Speedup (p processors) = Performance (p processors) / Performance (1 processor)
°For a fixed problem size (input data set), performance = 1/time
°Speedup, fixed problem (p processors) = Time (1 processor) / Time (p processors)
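As a quick worked example, the fixed-problem-size definition can be computed directly. The timings below are made up for illustration, and the efficiency metric (speedup as a fraction of the ideal linear speedup) is a standard companion measure not shown on the slide:

```python
# Fixed-problem-size speedup: Time(1 processor) / Time(p processors).
def speedup(time_1, time_p):
    return time_1 / time_p

# Efficiency: fraction of ideal linear speedup actually achieved.
def efficiency(time_1, time_p, p):
    return speedup(time_1, time_p) / p

# Hypothetical timings: 100 s on one processor, 8 s on 16 processors.
s = speedup(100.0, 8.0)         # 12.5
e = efficiency(100.0, 8.0, 16)  # 0.78125
```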

Is Parallel Computing Inevitable?
°Application demands: our insatiable need for computing cycles
°Technology trends
°Architecture trends
°Economics
°Current trends:
-Today's microprocessors have multiprocessor support
-Servers and workstations becoming MP: Sun, SGI, DEC, HP!...
-Tomorrow's microprocessors are multiprocessors

Application Trends
°Application demand for performance fuels advances in hardware, which enables new applications, which... (cycle: new applications <--> more performance)
-Cycle drives exponential increase in microprocessor performance
-Drives parallel architecture harder - most demanding applications
°Range of performance demands
-Need range of system performance with progressively increasing cost

Commercial Computing
°Relies on parallelism for high end
-Computational power determines scale of business that can be handled
°Databases, online transaction processing, decision support, data mining, data warehousing...
°TPC benchmarks
-Explicit scaling criteria provided
-Size of enterprise scales with size of system
-Problem size not fixed as p increases
-Throughput is the performance measure (transactions per minute, or tpm)

TPC-C Results for March 1996
°Parallelism is pervasive
°Small to moderate scale parallelism very important
°Difficult to obtain snapshot to compare across vendor platforms

Scientific Computing Demand

ECE669 L1: Course Introduction January 29, 2004 Also CAD, Databases, processors gets you 10 years, 1000 gets you 20 ! Applications: Speech and Image Processing

Is better parallel arch enough?
°AMBER molecular dynamics simulation program
°Starting point was vector code for Cray-1
°145 MFLOPS on Cray90; 406 MFLOPS for final version on 128-processor Paragon; 891 MFLOPS on 128-processor Cray T3D

Summary of Application Trends
°Transition to parallel computing has occurred for scientific and engineering computing
°Rapid progress in commercial computing
-Database and transactions, as well as financial
-Usually smaller scale, but large-scale systems also used
°Desktop also uses multithreaded programs, which are a lot like parallel programs
°Demand for improving throughput on sequential workloads
-Greatest use of small-scale multiprocessors
°Solid application demand exists and will increase

Technology Trends
°Today the natural building block is also the fastest!

Technology: A Closer Look
°Basic advance is decreasing feature size
-Circuits become either faster or lower in power
°Die size is growing too
-Clock rate improves roughly in proportion to the feature-size improvement
-Number of transistors improves with its square (or faster)
°Performance > 100x per decade; clock rate < 10x, rest is transistor count
°How to use more transistors?
-Parallelism in processing: multiple operations per cycle reduces CPI
-Locality in data access: avoids latency and reduces CPI; also improves processor utilization
-Both need resources, so tradeoff
°Fundamental issue is resource distribution, as in uniprocessors
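The slide's scaling argument can be checked with a bit of arithmetic (the feature-size factor below is an illustrative assumption): if clock rate tracks the feature-size improvement while overall performance grows more than 100x per decade, most of the gain must come from spending the extra transistors on parallelism and locality.

```python
# Illustrative technology-scaling arithmetic (hypothetical numbers).
k = 10.0                     # assumed feature-size improvement over a decade
clock_gain = k               # clock rate improves roughly in proportion to k
transistor_gain = k ** 2     # transistor count improves like k^2 (or faster)
performance_gain = 100.0     # > 100x per decade, per the slide

# The portion of performance not explained by clock rate alone must come
# from using the extra transistors (parallelism in processing, locality).
from_transistors = performance_gain / clock_gain  # 10.0
```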

Growth Rates (chart: ...% per year vs. 40% per year)

Architectural Trends
°Architecture translates technology's gifts into performance and capability
°Resolves the tradeoff between parallelism and locality
-Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
-Tradeoffs may change with scale and technology advances
°Understanding microprocessor architectural trends
=> Helps build intuition about design issues of parallel machines
=> Shows fundamental role of parallelism even in "sequential" computers

Phases in "VLSI" Generation

Architectural Trends
°Greatest trend in VLSI generation is increase in parallelism
-Up to 1985: bit-level parallelism: 4-bit -> 8-bit -> 16-bit
–slows after 32-bit
–adoption of 64-bit now under way, 128-bit far off (not a performance issue)
–great inflection point when 32-bit micro and cache fit on a chip
-Mid 80s to mid 90s: instruction-level parallelism
–pipelining and simple instruction sets, plus compiler advances (RISC)
–on-chip caches and functional units => superscalar execution
–greater sophistication: out-of-order execution, speculation, prediction
–to deal with control transfer and latency problems
-Next step: thread-level parallelism

How far will ILP go?
°Study assumes infinite resources and fetch bandwidth, perfect branch prediction and renaming
–but real caches and non-zero miss latencies

Thread-Level Parallelism "on board"
°Micro on a chip makes it natural to connect many to shared memory
–dominates server and enterprise market, moving down to desktop
°Faster processors began to saturate bus, then bus technology advanced
–today, range of sizes for bus-based systems, desktop to large servers

What about Multiprocessor Trends?

What about Storage Trends?
°Divergence between memory capacity and speed even more pronounced
-Capacity increased by 1000x from ..., speed only 2x
-Gigabit DRAM by c. 2000, but gap with processor speed much greater
°Larger memories are slower, while processors get faster
-Need to transfer more data in parallel
-Need deeper cache hierarchies
-How to organize caches?
°Parallelism increases effective size of each level of hierarchy, without increasing access time
°Parallelism and locality within memory systems too
-New designs fetch many bits within memory chip, then follow with fast pipelined transfer across a narrower interface
-Buffer caches most recently accessed data
°Disks too: parallel disks plus caching

Economics
°Commodity microprocessors are not only fast but CHEAP
-Development costs tens of millions of dollars
-BUT, many more are sold compared to supercomputers
-Crucial to take advantage of the investment, and use the commodity building block
°Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors
°Standardization makes small, bus-based SMPs commodity
°Desktop: a few smaller processors versus one larger one?
°Multiprocessor on a chip?

Consider Scientific Supercomputing
°Proving ground and driver for innovative architecture and techniques
-Market smaller relative to commercial as MPs become mainstream
-Dominated by vector machines starting in the 70s
-Microprocessors have made huge gains in floating-point performance
–high clock rates
–pipelined floating-point units (e.g., multiply-add every cycle)
–instruction-level parallelism
–effective use of caches (e.g., automatic blocking)
-Plus economics
°Large-scale multiprocessors replace vector supercomputers
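The blocking technique mentioned above can be sketched in a few lines (a toy blocked matrix multiply with made-up inputs, not code from the course): the loops are restructured into tiles so that each sub-block of the operands is reused while it still fits in cache.

```python
# Blocked (tiled) matrix multiply sketch: the three loops are split into
# block loops (ii, kk, jj) and inner loops so each B sub-block is reused
# across many A rows while it is still cache-resident.
def matmul_blocked(A, B, block=2):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul_blocked(A, B)  # [[19.0, 22.0], [43.0, 50.0]]
```

In Python the restructuring does not pay off, but in a compiled language the same tiling keeps the working set in cache, which is the effect "automatic blocking" by a compiler aims for.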

Raw Parallel Performance: LINPACK
°Even vector Crays became parallel
-X-MP (2-4), Y-MP (8), C-90 (16), T94 (32)
°Since 1993, Cray produces MPPs too (T3D, T3E)

Where is Parallel Arch Going?
Old view: divergent architectures (SIMD, message passing, shared memory, dataflow, systolic arrays), no predictable pattern of growth for application and system software.
Uncertainty of direction paralyzed parallel software development!

Modern Layered Framework
°Parallel applications: CAD, database, scientific modeling
°Programming models: multiprogramming, shared address, message passing, data parallel
°Communication abstraction (user/system boundary)
-Compilation or library
-Operating systems support (hardware/software boundary)
°Communication hardware
°Physical communication medium

Summary: Why Parallel Architecture?
°Increasingly attractive
-Economics, technology, architecture, application demand
°Increasingly central and mainstream
°Parallelism exploited at many levels
-Instruction-level parallelism
-Multiprocessor servers
-Large-scale multiprocessors ("MPPs")
°Focus of this class: multiprocessor level of parallelism
°Same story from memory system perspective
-Increase bandwidth, reduce average latency with many local memories
°Spectrum of parallel architectures makes sense
-Different cost, performance, and scalability