Outline Classification ILP Architectures Data Parallel Architectures

Presentation transcript:

Outline
- Classification
- ILP Architectures
- Data Parallel Architectures
- Process level Parallel Architectures
- Issues in parallel architectures
- Cache coherence problem
- Interconnection networks

Classification
- Flynn’s [66]
- Feng’s [72]
- Händler’s [77]
- Modern (Sima, Fountain & Kacsuk)

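Flynn’s Classification
Architecture categories:
- SISD: single instruction stream, single data stream
- SIMD: single instruction stream, multiple data streams
- MISD: multiple instruction streams, single data stream
- MIMD: multiple instruction streams, multiple data streams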

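SISD (Figure: one control unit C issues an instruction stream IS to one processor P, which exchanges a data stream DS with memory M.)

SIMD (Figure: one control unit C issues a single instruction stream IS to several processors P, each with its own data stream DS to memory M.)

MISD (Figure: several control units C, each issuing its own instruction stream IS to its own processor P, with the processors working on a single data stream DS from memory M.)

MIMD (Figure: several control units C, each issuing its own instruction stream IS to its own processor P, each processor with its own data stream DS to memory M.)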
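The four categories above follow from just two counts: the number of instruction streams and the number of data streams. A minimal sketch, assuming a hypothetical helper name `flynn_category` that is not part of the slides:

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts.

    Illustrative helper (not from the slides): one stream maps to "S"
    (single), more than one maps to "M" (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    # A 64-element processor array driven by one control unit.
    print(flynn_category(1, 64))
```

The classification deliberately ignores everything else about the machine (memory organisation, interconnect), which is why the later slides refine it.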

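Feng’s Classification
(Figure: machines plotted by word length on the horizontal axis, with ticks at 1, 16, 32 and 64, against bit slice length on the vertical axis, with ticks at 1, 16, 64, 256 and 16K. Machines shown: MPP, STARAN, PEPE, IlliacIV, C.mmP, CRAY-1, PDP11, IBM370.)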

Händler’s Classification
< K x K’ , D x D’ , W x W’ >
- K: control, D: data, W: word
- dash (’) => degree of pipelining
Examples:
- TI-ASC <1, 4, 64 x 8>
- CDC 6600 <1, 1 x 10, 60> x <10, 1, 12> (I/O)
- C.mmP <16, 1, 16> + <1 x 16, 1, 16> + <1, 16, 16>
- PEPE <1 x 3, 288, 32>
- Cray-1 <1, 12 x 8, 64 x (1 ~ 14)>

Modern Classification
Parallel architectures
- Function-parallel architectures
- Data-parallel architectures

Data Parallel Architectures
- Vector architectures
- Associative and neural architectures
- SIMDs
- Systolic architectures

Function Parallel Architectures
- Instruction level parallel architectures (ILPs)
  - Pipelined processors
  - VLIWs
  - Superscalar processors
- Thread level parallel architectures
- Process level parallel architectures (MIMDs)
  - Distributed Memory MIMD
  - Shared Memory MIMD

ILP Architectures
- Pipelining
- VLIW
- Superscalar

Pipelining
- Resource sharing across cycles
- Not all instructions take the same number of cycles
- Stages: IF, D, RF, EX/AG, M, WB
- Faster throughput with pipelining

Hazards in Pipelining
- Procedural dependencies => Control hazards
  - conditional and unconditional branches, calls/returns
- Data dependencies => Data hazards
  - RAW (read after write)
  - WAR (write after read)
  - WAW (write after write)
- Resource conflicts => Structural hazards
  - use of same resource in different stages
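The three data-dependency types can be told apart by comparing the registers two instructions read and write. A minimal sketch, assuming a made-up instruction representation `(dest, sources)` and a hypothetical helper `data_dependencies`; it only reports dependencies between an earlier and a later instruction, and whether one becomes an actual hazard depends on the pipeline:

```python
def data_dependencies(first, second):
    """List the data dependencies of `second` on the earlier `first`.

    Each instruction is a tuple (dest_register, [source_registers]);
    this representation is an assumption made for illustration.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = []
    if dest1 is not None and dest1 in srcs2:
        found.append("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        found.append("WAR")  # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        found.append("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    add = ("r1", ["r2", "r3"])  # r1 = r2 + r3
    sub = ("r4", ["r1", "r5"])  # r4 = r1 - r5
    print(data_dependencies(add, sub))
```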

Pipeline Performance
- T: time for one instruction to pass through all S stages (so the cycle time is T / S)
- S: number of stages
- b: frequency of interruptions
- CPI = 1 + (S - 1) * b
- Time = CPI * T / S
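The two formulas on this slide can be evaluated directly. A minimal sketch, assuming hypothetical function names and reading T as the time one instruction needs to pass through all S stages, so that T / S is the cycle time:

```python
def pipeline_cpi(stages: int, interruption_freq: float) -> float:
    """CPI = 1 + (S - 1) * b, as given on the slide."""
    return 1 + (stages - 1) * interruption_freq


def time_per_instruction(total_time: float, stages: int,
                         interruption_freq: float) -> float:
    """Time = CPI * T / S, where T / S is taken to be the cycle time."""
    return pipeline_cpi(stages, interruption_freq) * total_time / stages


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(pipeline_cpi(5, 0.2))
    print(time_per_instruction(10.0, 5, 0.2))
```

With b = 0 the CPI is 1 and the time per instruction is T / S, the ideal speed-up of S; with b = 1 every instruction interrupts the pipeline, the CPI equals S and the time falls back to T.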

ILP in VLIW processors
(Figure: cache/memory feeds a fetch unit, which delivers a single multi-operation instruction to several functional units (FU) that share one register file.)

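ILP in Superscalar processors
(Figure: cache/memory feeds a fetch unit, which passes a sequential stream of instructions to a decode and issue unit; that unit sends multiple instructions to several functional units (FU) that share one register file. Legend: instruction/control paths, data paths, FU = Functional Unit.)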

Why Superscalars are popular?
- Binary code compatibility among scalar & superscalar processors of same family
- Same compiler works for all processors (scalars and superscalars) of same family
- Assembly programming of VLIWs is tedious
- Code density in VLIWs is very poor: instruction encoding schemes

Issues in VLIW Architecture
(Figure: several functional units (FU) sharing one register file.)
- Instruction encoding
- Scalability: access time, area and power consumption increase sharply with the number of register ports

Tasks of superscalar processing
- Parallel decoding
- Superscalar instruction issue
- Parallel instruction execution
- Preserving the sequential consistency of execution
- Preserving the sequential consistency of exception processing

Data Parallel Architectures
- SIMD Processors
- Vector Processors
- Associative Processors
- Systolic Arrays

Data Parallel Architectures
- SIMD Processors: multiple processing elements driven by a single instruction stream
- Vector Processors: uni-processors with vector instructions
- Associative Processors: SIMD-like processors with associative memory
- Systolic Arrays: application-specific VLSI structures

Systolic Arrays [H.T. Kung 1978]
- Simplicity, regularity, concurrency, communication
- Example: band matrix multiplication

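(Figure: band matrix multiplication on a systolic array at T=0, showing elements A11, A12, A21, A22, A23, A31 of A and B11, B12, B21, B31 of B at their initial positions.)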
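The slide's example is band matrix multiplication; the sketch below swaps in a simpler and different design, a square output-stationary array for dense n x n matrices, only to show the systolic idea of skewed operands pulsing through a regular grid of multiply-accumulate cells. All names are assumptions made for illustration.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic array.

    Cell (i, j) keeps C[i][j]. Rows of A enter from the left and columns
    of B from the top, each delayed by its row/column index, so matching
    operands meet in the right cell at the right time step.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # operands moving left to right
    b_reg = [[0] * n for _ in range(n)]  # operands moving top to bottom
    for t in range(3 * n - 2):           # last operands meet at t = 3n - 3
        # Shift every operand one cell right / down (far side first).
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject the skewed inputs at the array boundary (zeros when idle).
        for i in range(n):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every cell does one multiply-accumulate per time step.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each cell talks only to its left and upper neighbours, which is the regularity and local communication the previous slide lists as the appeal of systolic designs.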

Process level Parallel Architectures
MIMD Processors
- Shared Memory
- Distributed Memory

Why Process level Parallel Architectures?
- Data-parallel architectures
- Function-parallel architectures
  - Instruction level PAs
  - Thread level PAs
  - Process level PAs (MIMDs): built using general purpose processors
    - Distributed Memory MIMD
    - Shared Memory MIMD

MIMD Architectures Design Space
- Extent of address space sharing
- Location of memory modules
- Uniformity of memory access

Issues in parallel architectures
- User’s perspective
- Architect’s perspective

Issues from user’s perspective
- Specification / program design: explicit parallelism, or implicit parallelism + parallelizing compiler
- Partitioning / mapping to processors
- Scheduling / mapping to time instants: static or dynamic
- Communication and synchronization

Parallel programming models
- Concurrent control flow
- Functional or logic program
- Vector/array operations
- Concurrent tasks/processes/threads/objects, with shared variables or message passing
Relationship between programming model and architecture?
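The last model, concurrent threads with shared variables or message passing, can be contrasted on one small task. A minimal sketch using Python's standard `threading` and `queue` modules; the function names and the summing task are assumptions chosen for illustration:

```python
import queue
import threading


def sum_shared_variable(chunks):
    """Workers add partial sums into one shared variable under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # synchronization protects the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Workers share no state; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Receive one message per worker and combine them.
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(sum_shared_variable(data), sum_message_passing(data))
```

The shared-variable version maps naturally onto shared memory MIMD machines and the message-passing version onto distributed memory MIMD machines, although either model can be implemented on either architecture.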

Issues from architect’s perspective
- Coherence problem in shared memory with caches
- Efficient interconnection networks

Cache coherence problem
Coherence Protocols
- Bus or directory based
- Invalidate or update
- Definition of states

Cache Coherence Problem
Multiple copies of data may exist => problem of cache coherence
Options for coherence protocols:
- What action is taken? Invalidate or update
- Which processors/caches communicate? Snoopy (broadcast) or directory based
- Status of each block?
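One point in this option space is a snoopy, write-invalidate protocol. The sketch below is a deliberately simplified model, assuming write-through caches with only two block states (valid or invalid) and hypothetical class names; it is not a description of any particular machine's protocol.

```python
class Bus:
    """Shared bus: holds main memory and the list of snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class Cache:
    """Write-through cache; a block is valid iff it is present in `lines`."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast on the bus: every other cache snoops and invalidates.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through keeps memory current


if __name__ == "__main__":
    bus = Bus()
    c0, c1 = Cache(bus), Cache(bus)
    c0.read(0x10)      # c0 now holds a copy of the block
    c1.write(0x10, 7)  # c0's copy is invalidated
    print(c0.read(0x10))
```

An update protocol would instead push the new value into the other caches' copies, and a directory-based scheme would replace the broadcast loop with a lookup of which caches actually hold the block.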

Interconnection networks
- Switching and control
- Topology

Interconnection Networks
Architectural variations:
- Topology
- Direct or indirect (through switches)
- Static (fixed connections) or dynamic (connections established as required)
- Routing type (store and forward or wormhole)
Efficiency:
- Delay
- Bandwidth
- Cost
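The effect of routing type on delay can be illustrated with a common first-order model that is not given on the slide and is assumed here: no contention, no router processing time, and every link with the same bandwidth. Under store and forward, the whole packet is received at each node before being sent on; under wormhole routing, only the header flit pays the per-link cost and the rest of the packet follows in a pipeline. Function and parameter names are hypothetical.

```python
def store_and_forward_latency(packet_bits, links, bandwidth):
    """Whole packet is retransmitted on every link along the path."""
    return links * packet_bits / bandwidth


def wormhole_latency(packet_bits, flit_bits, links, bandwidth):
    """Header flit crosses each link, then the packet streams behind it."""
    return links * flit_bits / bandwidth + packet_bits / bandwidth


if __name__ == "__main__":
    # Illustrative numbers only: 1024-bit packet, 32-bit flits, 4 links,
    # bandwidth of 1 bit per time unit.
    print(store_and_forward_latency(1024, 4, 1))
    print(wormhole_latency(1024, 32, 4, 1))
```

In this model store-and-forward delay grows with the product of distance and packet length, while wormhole delay grows with their sum, which is why wormhole routing is much less sensitive to path length.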

Books
- D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
- M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
- J.L. Hennessy, D.A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2002.
- K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.
- H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.
- D.E. Culler, J.P. Singh, A. Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.