Parallel computer architecture classification

Slides:

Advertisements

Similar presentations

Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,

Advertisements

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.

Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.

Lecture 6: Multicore Systems

Today’s topics Single processors and the Memory Hierarchy

C SINGH, JUNE 7-8, 2010IWW 2010, ISATANBUL, TURKEY Advanced Computers Architecture, UNIT 1 Advanced Computers Architecture Lecture 4 By Rohit Khokher Department.

1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.

Introduction CS 524 – High-Performance Computing.

Supercomputers Daniel Shin CS 147, Section 1 April 29, 2010.

Tuesday, September 12, 2006 Nothing is impossible for people who don't have to do it themselves. - Weiler.

CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware

 Parallel Computer Architecture Taylor Hearn, Fabrice Bokanya, Beenish Zafar, Mathew Simon, Tong Chen.

Lecture 1: Introduction to High Performance Computing.

1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.

Introduction to Parallel Processing Ch. 12, Pg

CMSC 611: Advanced Computer Architecture Parallel Computation Most slides adapted from David Patterson. Some from Mohomed Younis.

Flynn’s Taxonomy of Computer Architectures Source: Wikipedia Michael Flynn 1966 CMPS 5433 – Parallel Processing.

1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.

Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.

KUAS.EE Parallel Computing at a Glance. KUAS.EE History Parallel Computing.

18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013.

HPC Technology Track: Foundations of Computational Science Lecture 2 Dr. Greg Wettstein, Ph.D. Research Support Group Leader Division of Information Technology.

Introduction 9th January, 2006 CSL718 : Architecture of High Performance Systems.

Outline Classification ILP Architectures Data Parallel Architectures

Flynn’s Taxonomy SISD: Although instruction execution may be pipelined, computers in this category can decode only a single instruction in unit time SIMD:

Edgar Gabriel Short Course: Advanced programming with MPI Edgar Gabriel Spring 2007.

Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.

Multi-core.  What is parallel programming ?  Classification of parallel architectures  Dimension of instruction  Dimension of data  Memory models.

Multiprocessing. Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"

Chapter 9: Alternative Architectures In this course, we have concentrated on single processor systems But there are many other breeds of architectures:

Flynn’s Architecture. SISD (single instruction and single data stream) SIMD (single instruction and multiple data streams) MISD (Multiple instructions.

Parallel Computing.

CS591x -Cluster Computing and Parallel Programming

Outline Why this subject? What is High Performance Computing?

Computer Architecture And Organization UNIT-II Flynn’s Classification Of Computer Architectures.

EKT303/4 Superscalar vs Super-pipelined.

Lecture 3: Computer Architectures

3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-1.

CPS 258 Announcements –Lecture calendar with slides –Pointers to related material.

LECTURE #1 INTRODUCTON TO PARALLEL COMPUTING. 1.What is parallel computing? 2.Why we need parallel computing? 3.Why parallel computing is more difficult?

Processor Performance & Parallelism Yashwant Malaiya Colorado State University With some PH stuff.

CS203 – Advanced Computer Architecture Performance Evaluation.

Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.

Processor Level Parallelism 1

These slides are based on the book:

CS203 – Advanced Computer Architecture

Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Parallel Hardware Dr. Xiao Qin Auburn.

Flynn’s Taxonomy Many attempts have been made to come up with a way to categorize computer architectures. Flynn’s Taxonomy has been the most enduring of.

CHAPTER SEVEN PARALLEL PROCESSING © Prepared By: Razif Razali.

18-447: Computer Architecture Lecture 30B: Multiprocessors

Parallel computer architecture classification

buses, crossing switch, multistage network.

Parallel Processing - introduction

CS 147 – Parallel Processing

Super Computing By RIsaj t r S3 ece, roll 50.

Flynn’s Classification Of Computer Architectures

Modern Processor Design: Superscalar and Superpipelining

Multi-Processing in High Performance Computer Architecture:

MIMD Multiple instruction, multiple data

Chapter 17 Parallel Processing

Symmetric Multiprocessing (SMP)

buses, crossing switch, multistage network.

Overview Parallel Processing Pipelining

AN INTRODUCTION ON PARALLEL PROCESSING

COMPUTER ARCHITECTURES FOR PARALLEL ROCESSING

Multicore and GPU Programming

The University of Adelaide, School of Computer Science

Utsunomiya University

Multicore and GPU Programming

Presentation transcript:

Parallel computer architecture classification

Hardware Parallelism Computing: execute instructions that operate on data. Flynn’s taxonomy (Michael Flynn, 1967) classifies computer architectures based on the number of instructions that can be executed and how they operate on data. Computer Instructions Data

Flynn’s taxonomy Single Instruction Single Data (SISD) Traditional sequential computing systems Single Instruction Multiple Data (SIMD) Multiple Instructions Multiple Data (MIMD) Multiple Instructions Single Data (MISD) Computer Architectures SISD SIMD MIMD MISD

SISD At one time, one instruction operates on one data Traditional sequential architecture

SIMD At one time, one instruction operates on many data Data parallel architecture Vector architecture has similar characteristics, but achieve the parallelism with pipelining. Array processors

Array processor (SIMD) IP MAR MEMORY OP ADDR MDR A1 B1 C1 A2 B2 C2 AN BN CN DECODER ALU ALU ALU

MIMD Multiple instruction streams operating on multiple data streams Classical distributed memory or SMP architectures

MISD machine Not commonly seen. Systolic array is one example of an MISD architecture.

Flynn’s taxonomy summary SISD: traditional sequential architecture SIMD: processor arrays, vector processor Parallel computing on a budget – reduced control unit cost Many early supercomputers MIMD: most general purpose parallel computer today Clusters, MPP, data centers MISD: not a general purpose architecture.

Flynn’s classification on today’s architectures Multicore processors Superscalar: Pipelined + multiple issues. SSE (Intel and AMD’s support for performing operation on 2 doubles or 4 floats simultaneously). GPU: Cuda architecture IBM BlueGene

Modern classification (Sima, Fountain, Kacsuk) Classify based on how parallelism is achieved by operating on multiple data: data parallelism by performing many functions in parallel: function parallelism Control parallelism, task parallelism depending on the level of the functional parallelism. Parallel architectures Data-parallel architectures Function-parallel architectures

Data parallel architectures Vector processors, SIMD (array processors), systolic arrays. IP MAR Vector processor (pipelining) MEMORY A B C OP ADDR MDR DECODER ALU

Data parallel architecture: Array processor IP MAR MEMORY OP ADDR MDR A1 B1 C1 A2 B2 C2 AN BN CN DECODER ALU ALU ALU

Control parallel architectures Function-parallel architectures Process level Parallel Arch Thread level Parallel Arch Instruction level Parallel Arch (ILPs) (MIMDs) Distributed Memory MIMD Superscalar processors Shared Memory MIMD Pipelined processors VLIWs

Classifying today’s architectures Multicore processors? Superscalar? SSE? GPU: Cuda architecture? IBM BlueGene?

Performance of parallel architectures Common metrics MIPS: million instructions per second MIPS = instruction count/(execution time x 106) MFLOPS: million floating point operations per second. MFLOPS = FP ops in program/(execution time x 106) Which is a better metric? FLOP is more related to the time of a task in numerical code # of FLOP / program is determined by the matrix size

Performance of parallel architectures FlOPS units kiloFLOPS (KFLOPS) 10^3 megaFLOPS (MFLOPS) 10^6 gigaFLOPS (GFLOPS) 10^9  single CPU performance teraFLOPS (TFLOPS) 10^12 petaFLOPS (PFLOPS) 10^15  we are here right now 10 petaFLOPS supercomputers exaFLOPS (EFLOPS) 10^18  the next milestone

Peak and sustained performance Peak performance Measured in MFLOPS Highest possible MFLOPS when the system does nothing but numerical computation Rough hardware measure Little indication on how the system will perform in practice.

Peak and sustained performance The MFLOPS rate that a program achieves over the entire run. Measuring sustained performance Using benchmarks Peak MFLOPS is usually much larger than sustained MFLOPS Efficiency rate = sustained MFLOPS / peak MFLOPS

Measuring the performance of parallel computers Benchmarks: programs that are used to measure the performance. LINPACK benchmark: a measure of a system’s floating point computing power Solving a dense N by N system of linear equations Ax=b Use to rank supercomputers in the top500 list.

Other common benchmarks Micro benchmarks suit Numerical computing LAPACK ScaLAPACK Memory bandwidth STREAM Kernel benchmarks NPB (NAS parallel benchmark) PARKBENCH SPEC Splash

Summary Flynn’s classification Modern classification Performance SISD, SIMD, MIMD, MISD Modern classification Data parallelism function parallelism Instruction level, thread level, and process level Performance MIPS, MFLOPS Peak performance and sustained performance

References K. Hwang, "Advanced Computer Architecture : Parallelism, Scalability, Programmability", McGraw Hill, 1993. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures : A Design Space Approach", Addison Wesley, 1997.