INEL6067 Technology ---> Limitations & Opportunities Wires -Area -Propagation speed Clock Power VLSI -I/O pin limitations -Chip area -Chip crossing delay.

Slides:



Advertisements
Similar presentations
ECE669 L3: Design Issues February 5, 2004 ECE 669 Parallel Computer Architecture Lecture 3 Design Issues.
Advertisements

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
COE 502 / CSE 661 Parallel and Vector Architectures Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals.
Parallel Programming Languages Andrew Rau-Chaplin.
ECE669 L15: Mid-term Review March 25, 2004 ECE 669 Parallel Computer Architecture Lecture 15 Mid-term Review.
ECE669 L1: Course Introduction January 29, 2004 ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction Prof. Russell Tessier Department of.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Dec 5, 2005 Topic: Intro to Multiprocessors and Thread-Level Parallelism.
Fundamental Design Issues for Parallel Architecture Todd C. Mowry CS 495 January 22, 2002.
Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.
11/14/05ELEC Fall Multi-processor SoCs Yijing Chen.
ECE669 L2: Architectural Perspective February 3, 2004 ECE 669 Parallel Computer Architecture Lecture 2 Architectural Perspective.
1: Operating Systems Overview
EECC756 - Shaaban #1 lec # 2 Spring Parallel Architectures History Application Software System Software SIMD Message Passing Shared Memory.
1 Lecture 1: Parallel Architecture Intro Course organization:  ~5 lectures based on Culler-Singh textbook  ~5 lectures based on Larus-Rajwar textbook.
Introduction What is Parallel Algorithms? Why Parallel Algorithms? Evolution and Convergence of Parallel Algorithms Fundamental Design Issues.
Chapter 17 Parallel Processing.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
CIS 629 Parallel Arch. Intro Parallel Computer Architecture Slides blended from those of David Patterson, CS 252 and David Culler, CS 258 UC Berkeley.
Why Parallel Architecture? Todd C. Mowry CS 495 January 15, 2002.
Parallel Processing Architectures Laxmi Narayan Bhuyan
EECC756 - Shaaban #1 lec # 2 Spring Parallel Architectures History Application Software System Software SIMD Message Passing Shared Memory.
EECC756 - Shaaban #1 lec # 1 Spring Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate.
Computer architecture II
CMSC 611: Advanced Computer Architecture Parallel Computation Most slides adapted from David Patterson. Some from Mohomed Younis.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
0 Introduction to parallel processing Chapter 1 from Culler & Singh Winter 2007.
Computer performance.
Computer System Architectures Computer System Software
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Graduate Computer Architecture I Lecture 10: Shared Memory Multiprocessors Young Cho.
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Parallel Computing Laxmikant Kale
Multi-core architectures. Single-core computer Single-core CPU chip.
Multi-Core Architectures
ECE 568: Modern Comp. Architectures and Intro to Parallel Processing Fall 2006 Ahmed Louri ECE Department.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Problem is to compute: f(latitude, longitude, elevation, time)  temperature, pressure, humidity, wind velocity Approach: –Discretize the.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
1 Today About the class Introductions Any new people? Start of first module: Parallel Computing.
ECE 569: High-Performance Computing: Architectures, Algorithms and Technologies Spring 2006 Ahmed Louri ECE Department.
Spring 2003CSE P5481 Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing.
CS433 Spring 2001 Introduction Laxmikant Kale. 2 Course objectives and outline You will learn about: –Parallel programming models Emphasis on 3: message.
1 Lecture 1: Parallel Architecture Intro Course organization:  ~18 parallel architecture lectures (based on text)  ~10 (recent) paper presentations 
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
CS 258 Parallel Computer Architecture Lecture 1 Introduction to Parallel Architecture January 23, 2002 Prof John D. Kubiatowicz.
Evolution and Convergence of Parallel Architectures Todd C. Mowry CS 495 January 17, 2002.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 1: Overview of High Performance Processors * Jeremy R. Johnson Wed. Sept. 27,
CDA-5155 Computer Architecture Principles Fall 2000 Multiprocessor Architectures.
Background Computer System Architectures Computer System Software.
High Performance Computing1 High Performance Computing (CS 680) Lecture 2a: Overview of High Performance Processors * Jeremy R. Johnson *This lecture was.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
Lecture 1: Introduction CprE 585 Advanced Computer Architecture, Fall 2004 Zhao Zhang.
VU-Advanced Computer Architecture Lecture 1-Introduction 1 Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 1.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
Feeding Parallel Machines – Any Silver Bullets? Novica Nosović ETF Sarajevo 8th Workshop “Software Engineering Education and Reverse Engineering” Durres,
Memory COMPUTER ARCHITECTURE
Parallel Computers Definition: “A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.”
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Parallel Programming Models Dr. Xiao Qin Auburn.
Architecture & Organization 1
CS775: Computer Architecture
What is Parallel and Distributed computing?
Architecture & Organization 1
Shared Memory Multiprocessors
Lecture 1: Parallel Architecture Intro
Convergence of Parallel Architectures
Course Description: Parallel Computer Architecture
Parallel Processing Architectures
CS 258 Parallel Computer Architecture
Computer Evolution and Performance
Chapter 4 Multiprocessors
Presentation transcript:

INEL6067 Technology ---> Limitations & Opportunities Wires -Area -Propagation speed Clock Power VLSI -I/O pin limitations -Chip area -Chip crossing delay -Power Can not make light go any faster KISS rule (Keep It Simple, Stupid)

INEL6067 Major theme Look at typical applications Understand physical limitations Make tradeoffs ARCHITECTURE Application requirements Technological constraints

INEL6067 Unfortunately °Requirements and constraints are often at odds with each other! °Architecture ---> making tradeoffs Full connectivity ! Gasp!!!

INEL6067 Putting it all together °The systems approach Lesson from RISCs Hardware software tradeoffs Functionality implemented at the right level -Hardware -Runtime system -Compiler -Language, Programmer -Algorithm

INEL6067 Commercial Computing °Relies on parallelism for high end Computational power determines scale of business that can be handled °Databases, online-transaction processing, decision support, data mining, data warehousing...

INEL6067 Scientific Computing Demand

INEL6067 Also CAD, Databases, processors gets you 10 years, 1000 gets you 20 ! Applications: Speech and Image Processing

INEL6067 Is better parallel arch enough? °AMBER molecular dynamics simulation program °Starting point was vector code for Cray-1 °145 MFLOP on Cray90, 406 for final version on 128- processor Paragon, 891 on 128-processor Cray T3D

INEL6067 Summary of Application Trends °Transition to parallel computing has occurred for scientific and engineering computing °In rapid progress in commercial computing Database and transactions as well as financial Usually smaller-scale, but large-scale systems also used °Desktop also uses multithreaded programs, which are a lot like parallel programs °Demand for improving throughput on sequential workloads Greatest use of small-scale multiprocessors °Solid application demand exists and will increase

INEL6067 Technology Trends °Today the natural building-block is also fastest!

INEL6067 Proc$ Interconnect Technology: A Closer Look °Basic advance is decreasing feature size (  ) Circuits become either faster or lower in power °Die size is growing too Clock rate improves roughly proportional to improvement in Number of transistors improves like   (or faster) °Performance > 100x per decade clock rate < 10x, rest is transistor count °How to use more transistors? Parallelism in processing -multiple operations per cycle reduces CPI Locality in data access -avoids latency and reduces CPI -also improves processor utilization Both need resources, so tradeoff °Fundamental issue is resource distribution, as in uniprocessors

INEL % per year Growth Rates 40% per year

INEL6067 Architectural Trends °Architecture translates technology’s gifts into performance and capability °Resolves the tradeoff between parallelism and locality Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect Tradeoffs may change with scale and technology advances °Understanding microprocessor architectural trends => Helps build intuition about design issues or parallel machines => Shows fundamental role of parallelism even in “sequential” computers

INEL6067 Phases in “VLSI” Generation

INEL6067 Architectural Trends °Greatest trend in VLSI generation is increase in parallelism Up to 1985: bit level parallelism: 4-bit -> 8 bit -> 16-bit -slows after 32 bit -adoption of 64-bit now under way, 128-bit far (not performance issue) -great inflection point when 32-bit micro and cache fit on a chip Mid 80s to mid 90s: instruction level parallelism -pipelining and simple instruction sets, + compiler advances (RISC) -on-chip caches and functional units => superscalar execution -greater sophistication: out of order execution, speculation, prediction –to deal with control transfer and latency problems Next step: thread level parallelism

INEL6067 How far will ILP go? °Infinite resources and fetch bandwidth, perfect branch prediction and renaming –real caches and non-zero miss latencies

INEL6067 Threads Level Parallelism “on board” °Micro on a chip makes it natural to connect many to shared memory –dominates server and enterprise market, moving down to desktop °Faster processors began to saturate bus, then bus technology advanced –today, range of sizes for bus-based systems, desktop to large servers Proc MEM

INEL6067 What about Multiprocessor Trends?

INEL6067 What about Storage Trends? °Divergence between memory capacity and speed even more pronounced Capacity increased by 1000x from , speed only 2x Gigabit DRAM by c. 2000, but gap with processor speed much greater °Larger memories are slower, while processors get faster Need to transfer more data in parallel Need deeper cache hierarchies How to organize caches? °Parallelism increases effective size of each level of hierarchy, without increasing access time °Parallelism and locality within memory systems too New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface Buffer caches most recently accessed data °Disks too: Parallel disks plus caching

INEL6067 Economics °Commodity microprocessors not only fast but CHEAP Development costs tens of millions of dollars BUT, many more are sold compared to supercomputers Crucial to take advantage of the investment, and use the commodity building block °Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors °Standardization makes small, bus-based SMPs commodity °Desktop: few smaller processors versus one larger one? °Multiprocessor on a chip?

INEL6067 Consider Scientific Supercomputing °Proving ground and driver for innovative architecture and techniques Market smaller relative to commercial as MPs become mainstream Dominated by vector machines starting in 70s Microprocessors have made huge gains in floating-point performance -high clock rates -pipelined floating point units (e.g., multiply-add every cycle) -instruction-level parallelism -effective use of caches (e.g., automatic blocking) Plus economics °Large-scale multiprocessors replace vector supercomputers

INEL6067 Raw Parallel Performance: LINPACK °Even vector Crays became parallel X-MP (2-4) Y-MP (8), C-90 (16), T94 (32) °Since 1993, Cray produces MPPs too (T3D, T3E)

INEL6067 Where is Parallel Arch Going? Application Software System Software SIMD Message Passing Shared Memory Dataflow Systolic Arrays Architecture Uncertainty of direction paralyzed parallel software development! Old view: Divergent architectures, no predictable pattern of growth.

INEL6067 Modern Layered Framework CAD MultiprogrammingShared address Message passing Data parallel DatabaseScientific modeling Parallel applications Programming models Communication abstraction User/system boundary Compilation or library Operating systems support Communication hardware Physical communication medium Hardware/software boundary

INEL6067 Summary: Why Parallel Architecture? °Increasingly attractive Economics, technology, architecture, application demand °Increasingly central and mainstream °Parallelism exploited at many levels Instruction-level parallelism Multiprocessor servers Large-scale multiprocessors (“MPPs”) °Focus of this class: multiprocessor level of parallelism °Same story from memory system perspective Increase bandwidth, reduce average latency with many local memories °Spectrum of parallel architectures make sense Different cost, performance and scalability

INEL6067 Threads Level Parallelism “on board” °Micro on a chip makes it natural to connect many to shared memory –dominates server and enterprise market, moving down to desktop °Faster processors began to saturate bus, then bus technology advanced –today, range of sizes for bus-based systems, desktop to large servers Proc MEM

INEL6067 What about Multiprocessor Trends?

INEL6067 What about Storage Trends? °Divergence between memory capacity and speed even more pronounced Capacity increased by 1000x from , speed only 2x Gigabit DRAM by c. 2000, but gap with processor speed much greater °Larger memories are slower, while processors get faster Need to transfer more data in parallel Need deeper cache hierarchies How to organize caches? °Parallelism increases effective size of each level of hierarchy, without increasing access time °Parallelism and locality within memory systems too New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface Buffer caches most recently accessed data °Disks too: Parallel disks plus caching

INEL6067 Economics °Commodity microprocessors not only fast but CHEAP Development costs tens of millions of dollars BUT, many more are sold compared to supercomputers Crucial to take advantage of the investment, and use the commodity building block °Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors °Standardization makes small, bus-based SMPs commodity °Desktop: few smaller processors versus one larger one? °Multiprocessor on a chip?

INEL6067 Raw Parallel Performance: LINPACK °Even vector Crays became parallel X-MP (2-4) Y-MP (8), C-90 (16), T94 (32) °Since 1993, Cray produces MPPs too (T3D, T3E)

INEL6067 Where is Parallel Arch Going? Application Software System Software SIMD Message Passing Shared Memory Dataflow Systolic Arrays Architecture Uncertainty of direction paralyzed parallel software development! Old view: Divergent architectures, no predictable pattern of growth.

INEL6067 Modern Layered Framework CAD MultiprogrammingShared address Message passing Data parallel DatabaseScientific modeling Parallel applications Programming models Communication abstraction User/system boundary Compilation or library Operating systems support Communication hardware Physical communication medium Hardware/software boundary

INEL6067 Application Software System Software SIMD Message Passing Shared Memory Dataflow Systolic Arrays Architecture History °Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth. Mid 80s revival

INEL6067 Programming Model °Look at major programming models Where did they come from? What do they provide? How have they converged? °Extract general structure and fundamental issues °Reexamine traditional camps from new perspective SIMD Message Passing Shared Memory Dataflow Systolic Arrays Generic Architecture

INEL6067 Programming Model °Conceptualization of the machine that programmer uses in coding applications How parts cooperate and coordinate their activities Specifies communication and synchronization operations °Multiprogramming no communication or synch. at program level °Shared address space like bulletin board °Message passing like letters or phone calls, explicit point to point °Data parallel: more regimented, global actions on data Implemented with shared address space or message passing

INEL6067 Adding Processing Capacity °Memory capacity increased by adding modules °I/O by controllers and devices ° Add processors for processing! For higher-throughput multiprogramming, or parallel programs

INEL6067 Historical Development °“Mainframe” approach Motivated by multiprogramming Extends crossbar used for Mem and I/O Processor cost-limited => crossbar Bandwidth scales with p High incremental cost -use multistage instead °“Minicomputer” approach Almost all microprocessor systems have bus Motivated by multiprogramming, TP Used heavily for parallel computing Called symmetric multiprocessor (SMP) Latency larger than for uniprocessor Bus is bandwidth bottleneck -caching is key: coherence problem Low incremental cost

INEL6067 Shared Physical Memory °Any processor can directly reference any memory location °Any I/O controller - any memory °Operating system can run on any processor, or all. OS uses shared memory to coordinate °Communication occurs implicitly as result of loads and stores ° What about application processes?

INEL6067 Shared Virtual Address Space °Process = address space plus thread of control °Virtual-to-physical mapping can be established so that processes shared portions of address space. User-kernel or multiple processes °Multiple threads of control on one address space. Popular approach to structuring OS’s Now standard application capability ° Writes to shared address visible to other threads Natural extension of uniprocessors model conventional memory operations for communication special atomic operations for synchronization - also load/stores

INEL6067 Structured Shared Address Space °Add hoc parallelism used in system code °Most parallel applications have structured SAS °Same program on each processor shared variable X means the same thing to each thread