Anshul Kumar, CSE IITD CS718 : Data Parallel Processors 27th April, 2006

Anshul Kumar, CSE IITD Data Parallel Architectures
SIMD Processors
– multiple processing elements driven by a single instruction stream
Associative Processors
– SIMD-like processors with associative memory
Vector Processors
– uniprocessors with vector instructions
Systolic Arrays
– application-specific VLSI structures

Anshul Kumar, CSE IITD SIMD
(Diagram: a control unit C issues a single instruction stream IS to processors P, each working on its own data stream DS from memory M.)
One of the earliest models of parallel computers

Anshul Kumar, CSE IITD ILLIAC IV SIMD Model
(Diagram: a control unit CU, connected over an I/O bus, drives PE1, PE2, …, PEn; each PE pairs a processor P with a local memory M, and the PEs communicate through an interconnection network.)
Planned for 64 x 4 PEs; only 64 were built

Anshul Kumar, CSE IITD Burroughs Scientific Processor (BSP) Model
(Diagram: a control unit CU, connected over an I/O bus, drives processors P1, P2, …, Pn, which reach a separate set of memory modules M1, M2, …, Mk through an interconnection network.)

Anshul Kumar, CSE IITD SIMD algorithms: sum of vector elements
Scheme 1 (combine neighbours):
step 1: Si = ai + ai+1, i = 0, 2, 4, 6
step 2: Si = Si + Si+2, i = 0, 4
step 3: Si = Si + Si+4, i = 0
OR Scheme 2 (combine halves):
step 1: Si = ai + ai+4, i = 0, 1, 2, 3
step 2: Si = Si + Si+2, i = 0, 1
step 3: Si = Si + Si+1, i = 0
(Figure: a0 … a7 are summed in 3 steps; after step 1 the partial sums a0+a1, a2+a3, a4+a5, a6+a7; after step 2 a0+a1+a2+a3 and a4+a5+a6+a7; after step 3 the full sum a0+a1+…+a7.)
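
A minimal C sketch of the second scheme, run sequentially for illustration: each iteration of the outer loop stands for one lockstep SIMD step in which PE i adds in the element one stride away; the array contents and the size N are made-up example values, not taken from the slides.

    #include <stdio.h>

    #define N 8   /* number of elements; assumed to be a power of two */

    int main(void) {
        double s[N] = {1, 2, 3, 4, 5, 6, 7, 8};   /* working copy of a0..a7 */

        /* log2(N) steps: in the step with a given stride, element i adds in
           element i+stride; on a SIMD machine each step is one (masked)
           operation performed by all enabled PEs at once */
        for (int stride = N / 2; stride >= 1; stride /= 2)
            for (int i = 0; i < stride; i++)
                s[i] += s[i + stride];

        printf("sum = %g\n", s[0]);   /* 36 for the data above */
        return 0;
    }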

Anshul Kumar, CSE IITD No. of processors vs time
Adding vector elements:
– n processors – log n steps
– n/log n processors – log n steps
Matrix multiplication:
– n processors – n^2 steps
– n^2 processors – n steps
– n^3 processors – log n steps
– n^3/log n processors – log n steps
Important factors: data distribution, network
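
The n^2-processor row can be pictured as one PE per result element, each performing n multiply-accumulate steps while the operands are routed to it. The C sketch below just replays that view sequentially; the matrix size and the test data are illustrative assumptions, not from the slides.

    #include <stdio.h>

    #define N 4   /* illustrative matrix size */

    int main(void) {
        double a[N][N], b[N][N], c[N][N];

        /* made-up input data: b is the identity, so c should equal a */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = i + j;
                b[i][j] = (i == j) ? 1.0 : 0.0;
            }

        /* conceptually PE(i,j) owns c[i][j]; all N*N PEs run the k-loop in
           lockstep, so the parallel time is N steps (ignoring data movement) */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                c[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];
            }

        printf("c[1][2] = %g\n", c[1][2]);   /* equals a[1][2] = 3 */
        return 0;
    }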

Anshul Kumar, CSE IITD Rise and fall of SIMDs
Introduced in the 1960s (e.g. Illiac, BSP)
Problems:
– not cost effective
– serial fraction and Amdahl's law
– I/O bottleneck
Overshadowed by Vector Processors
Resurrected in the 1980s (MPP from Goodyear, Connection Machine from Thinking Machines Inc., MP-1 from MasPar)
Did not survive because of high cost
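
To see why the serial fraction matters, Amdahl's law bounds the speedup on N processing elements; the 90% figure below is only an illustrative value, not from the slides.

    Speedup(N) = 1 / ((1 - f) + f/N),  where f is the parallelisable fraction
    e.g. f = 0.9, N = 64:  Speedup = 1 / (0.1 + 0.9/64) ≈ 8.8, far short of 64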

Anshul Kumar, CSE IITD Related ideas
Coarse-grain SIMD with off-the-shelf processors (synchronized MIMD), e.g. CM-5 of Thinking Machines
This gave rise to SPMD (single program, multiple data)
MMX and SIMD instructions in Pentium
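
As a present-day echo of the last point, the C sketch below uses the x86 SSE intrinsics (a successor of MMX that operates on floats rather than packed integers) to add four floats with a single instruction; the header and intrinsic names are the standard ones, everything else is an illustrative assumption.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics (x86; e.g. gcc/clang -msse) */

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);       /* load 4 floats into one register */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);    /* one instruction adds all 4 lanes */
        _mm_storeu_ps(c, vc);

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);   /* 11 22 33 44 */
        return 0;
    }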

Anshul Kumar, CSE IITD Vector Processors
(Block diagram with: I-cache, D-cache, memory control, I-unit and control, vector registers (V-reg), GPRs, address unit, vector functional units (VFU), scalar functional units (FU), buses, and memory.)
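
To show what such a machine executes, here is a C sketch of a DAXPY-style loop written with explicit strip-mining, roughly the way a compiler maps long loops onto vector registers; the register length VL and all names are illustrative assumptions, not those of any particular processor.

    #include <stdio.h>

    #define N  200
    #define VL  64   /* assumed vector register length */

    /* y = a*x + y in chunks of at most VL elements; on a vector processor
       each chunk becomes a few vector instructions (load, multiply-add,
       store) instead of VL iterations of a scalar loop */
    static void daxpy_stripmined(int n, double a, const double *x, double *y) {
        for (int i = 0; i < n; i += VL) {
            int len = (n - i < VL) ? (n - i) : VL;   /* last strip may be short */
            for (int j = 0; j < len; j++)            /* one "vector operation" */
                y[i + j] += a * x[i + j];
        }
    }

    int main(void) {
        double x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }
        daxpy_stripmined(N, 2.0, x, y);
        printf("y[10] = %g\n", y[10]);   /* 2*10 + 1 = 21 */
        return 0;
    }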

Anshul Kumar, CSE IITD Four Generations of CRAY systems (vector processors)
(Table comparing the CRAY generations, CRAY X-MP, Y-MP, C…, on: number of CPUs, clock in MHz, flops per clock per CPU, words moved per clock per CPU, Mflops, and gates per chip.)

Anshul Kumar, CSE IITD Cray History

Anshul Kumar, CSE IITD CRAY C90
8 GB central memory shared by 16 CPUs
128 CPU-to-memory paths
Word = 64 bits + 16 ECC bits
Dual vector pipes
128-element segments
Memory: 8 sections, 8x8 subsections, 8x8x2 bank groups, 8x8x2x8 = 1024 banks
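
A toy C sketch of why so many banks help: if consecutive words are spread round-robin across banks (simple low-order interleaving; this mapping is an illustration, not the actual C90 address scheme), a stride-1 vector access touches a different bank on every reference, so the bank busy times overlap.

    #include <stdio.h>

    #define NBANKS 1024   /* 8 x 8 x 2 x 8 banks, as on the slide */

    /* illustrative low-order interleaving: word address -> bank number */
    static unsigned bank_of(unsigned long word_addr) {
        return (unsigned)(word_addr % NBANKS);
    }

    int main(void) {
        /* 8 consecutive words of a stride-1 vector fetch hit 8 different
           banks, so their access latencies can be overlapped */
        for (unsigned long addr = 4096; addr < 4096 + 8; addr++)
            printf("word %lu -> bank %u\n", addr, bank_of(addr));
        return 0;
    }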

Anshul Kumar, CSE IITD Convex C4/XA system
CPU: 7.5 ns clock, 1620 MFLOPs
Memory: 32 MB x 32 banks, 64-bit word, 50 ns access time
3 FP pipes, 2 results each
Vector registers to FPU crossbar
1.1 GB/s per I/O port
5 x 5 crossbar connecting CPUs, memories, I/O and utilities

Anshul Kumar, CSE IITD Other examples
NEC SX-X: 4 CPUs, 4 x 2 pipes each
Fujitsu VP: … CPUs, 2 LS pipes, 3 functional pipes, 2 mask pipes
Fujitsu VP: … CPUs

Anshul Kumar, CSE IITD Systolic Arrays (H.T. Kung, 1978) Simplicity, Regularity, Concurrency, Communication. Example: band matrix multiplication

(Snapshots at T = 0 through T = 6: the elements of the band matrices A and B enter the array from opposite sides, one anti-diagonal per time step, meet in the cells where products such as A11 B11 and A12 B21 are accumulated, and the first results C11, C12, … emerge after a few steps of pipeline fill.)
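
The flavour of the example above can be reproduced with a small synchronous simulation in C of a dense N x N output-stationary systolic mesh: each cell multiplies the A operand arriving from its left by the B operand arriving from above, accumulates the product locally, and latches both operands for its right and lower neighbours. This is a sketch of the general principle, not of Kung's hexagonal band-matrix array, and all sizes and data in it are illustrative.

    #include <stdio.h>
    #include <string.h>

    #define N 3   /* illustrative matrix size */

    /* feeders: row i of A enters the mesh skewed by i time steps, column j
       of B skewed by j time steps; zeros pad the edges of the schedule */
    static double a_feed(double A[N][N], int i, int t) {
        int k = t - i;
        return (k >= 0 && k < N) ? A[i][k] : 0.0;
    }
    static double b_feed(double B[N][N], int j, int t) {
        int k = t - j;
        return (k >= 0 && k < N) ? B[k][j] : 0.0;
    }

    int main(void) {
        double A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double B[N][N] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};  /* identity: C = A */
        double a_reg[N][N] = {{0}}, b_reg[N][N] = {{0}}, C[N][N] = {{0}};

        /* synchronous steps: every cell reads the operands latched by its
           left and upper neighbours in the previous step (or the feeders on
           the array edges), multiplies, accumulates, then latches them */
        for (int t = 0; t <= 3 * (N - 1); t++) {
            double a_old[N][N], b_old[N][N];
            memcpy(a_old, a_reg, sizeof a_reg);
            memcpy(b_old, b_reg, sizeof b_reg);
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) {
                    double a_in = (j == 0) ? a_feed(A, i, t) : a_old[i][j - 1];
                    double b_in = (i == 0) ? b_feed(B, j, t) : b_old[i - 1][j];
                    C[i][j] += a_in * b_in;
                    a_reg[i][j] = a_in;
                    b_reg[i][j] = b_in;
                }
        }

        for (int i = 0; i < N; i++) {          /* prints A, since B is identity */
            for (int j = 0; j < N; j++) printf("%6g", C[i][j]);
            printf("\n");
        }
        return 0;
    }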

Anshul Kumar, CSE IITD WARP: Programmable Systolic Processor [Kung, CMU 1987]
Complete contrast to the original idea:
– not application specific
– not a single VLSI chip
– complex cell (pipelined FP adder, multiplier, FIFOs, RAM, crossbar)
– linear
– asynchronous

Anshul Kumar, CSE IITD References
D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison-Wesley, 1997.
K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw-Hill, 1993.