ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Novel Architectures Copyright 2004 Daniel J. Sorin Duke University.


2 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221
Non-MIMD Architectures
We’ve primarily looked at MIMD architectures
There are other organizations out there, though!
–Vector (a type of SIMD)
–Dataflow
–Grid
–Systolic array
–Neural network
–Etc.
We’re going to look at some of these now

3 Vector/SIMD
SIMD = single instruction, multiple data
One implementation of SIMD is a vector computer
System that can operate on vectors of data
–Most machines operate on single (scalar) values
Vector operations may include
–Add Vector A to Vector B
–Sum the elements of Vector C
–Load Vector D from memory

4 Vector Machines
The classic vector architecture is the Cray-1
Cray used vectors in their supercomputers
Now, many architectures include vector support
–Alpha Tarantula
–Intel MMX (scalars treated as vectors of small data pieces)
Old saying: “Vector machines work great on vectorizable code”
… with the implication that most code isn’t vectorizable
What kind of algorithms are vectorizable?

5 Cray-1 DISCUSSION

6 Alpha Tarantula PRESENTATION

7 Dataflow Architectures
Not a Von Neumann machine organization
Directly execute a dataflow graph
Requires new programming model
–Or the compiler must create the dataflow graph from the code
–Several dataflow languages created, including Id (MIT)
In theory, a great idea
In practice, tough to implement efficiently
–However, dataflow systems do exist … but within microprocessors!

8 Tagged Token Dataflow DISCUSSION

9 Grid Architectures
Not “grid computing” in the popular sense
–Instead, a grid of maybe dozens of processing elements (PEs)
Two examples
–MIT RAW machine
–Texas TRIPS machine
RAW uses the compiler to map computation onto the grid (and to schedule the communication across it)
TRIPS exploits physical locality among PEs to speed up communication of data

10 MIT Raw Machine PRESENTATION

11 Texas TRIPS PRESENTATION