Parallel Computing
CS 147: Computer Architecture
Instructor: Professor Sin-Min Lee
SJSU, Spring 2011
By: Alice Cotti


Background
- Amdahl's law and Gustafson's law
- Dependencies
- Race conditions, mutual exclusion, synchronization, and parallel slowdown
- Fine-grained, coarse-grained, and embarrassing parallelism

Amdahl's Law
The speed-up of a program from parallelization is limited by how much of the program can be parallelized.
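In standard form (the formula is not spelled out on the original slide), if a fraction p of a program's work can be parallelized and the program runs on s processors, the overall speedup is

    S(s) = \frac{1}{(1 - p) + p/s}

so even with unlimited processors the speedup can never exceed 1/(1 - p). For example, if 95% of the work is parallelizable (p = 0.95), the maximum speedup is 1/0.05 = 20x.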

Dependencies
Consider the following function, which demonstrates a flow dependency:
1: function Dep(a, b)
2:    c := a·b
3:    d := 2·c
4: end function
Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 uses a result from operation 2. It violates the first of Bernstein's conditions and thus introduces a flow dependency.
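For reference (Bernstein's conditions are not listed on the slide), two program fragments P_i and P_j with input sets I_i, I_j and output sets O_i, O_j can safely run in parallel only if

    I_j \cap O_i = \varnothing   (no flow dependency)
    I_i \cap O_j = \varnothing   (no anti-dependency)
    O_i \cap O_j = \varnothing   (no output dependency)

In Dep(a, b), operation 3 reads c, which operation 2 writes, so the first condition fails.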

Dependencies
Consider the following function:
1: function NoDep(a, b)
2:    c := a·b
3:    d := 2·b
4:    e := a+b
5: end function
In this example, there are no dependencies between the instructions, so they can all be run in parallel.
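As a minimal illustration (not from the original slides), the same independent operations can be expressed in Python with the standard-library concurrent.futures module; nothing in the code forces any ordering among them:

from concurrent.futures import ThreadPoolExecutor

def no_dep(a, b):
    # The three operations read only a and b and write distinct results,
    # so they can be submitted together and may finish in any order.
    with ThreadPoolExecutor(max_workers=3) as pool:
        fc = pool.submit(lambda: a * b)  # c := a * b
        fd = pool.submit(lambda: 2 * b)  # d := 2 * b
        fe = pool.submit(lambda: a + b)  # e := a + b
        return fc.result(), fd.result(), fe.result()

print(no_dep(3, 4))  # (12, 8, 7)

(In CPython the global interpreter lock keeps these tiny tasks from running truly simultaneously; the point is only that no dependency constrains their order.)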

Race condition
A flaw whereby the output or result of a process is unexpectedly and critically dependent on the sequence or timing of other events. Race conditions can occur in electronic systems, logic circuits, and multithreaded software.
Figure: race condition in a logic circuit. Here, ∆t1 and ∆t2 represent the propagation delays of the logic elements. When the input value (A) changes, the circuit outputs a short spike of duration ∆t1.
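A minimal sketch of the software case (illustrative, not from the slides): several Python threads increment a shared counter without synchronization, and updates can be lost because the read-modify-write is not atomic.

import threading

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        counter += 1  # read, add, store: another thread can interleave here

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # may print less than 400000 because of lost updates

Guarding the increment with a threading.Lock (mutual exclusion) makes the result deterministic again.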

Fine-grained, coarse-grained, and embarrassing parallelism
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
- Fine-grained parallelism: subtasks must communicate many times per second
- Coarse-grained parallelism: subtasks do not communicate many times per second
- Embarrassingly parallel: subtasks rarely or never have to communicate
Embarrassingly parallel applications are the easiest to parallelize, as the sketch below illustrates.
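A minimal sketch of an embarrassingly parallel workload in Python (simulate is a made-up stand-in for any independent unit of work): the only communication is handing out inputs and collecting results.

from multiprocessing import Pool

def simulate(seed):
    # Hypothetical independent task, e.g. one Monte Carlo run.
    # It never needs to talk to the other tasks while it runs.
    x = seed
    for _ in range(100_000):
        x = (1103515245 * x + 12345) % 2**31
    return x % 1000

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(simulate, range(16))  # 16 independent tasks
    print(results)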

Types of parallelism
- Data parallelism
- Task parallelism
- Bit-level parallelism
- Instruction-level parallelism
Figure: a five-stage pipelined superscalar processor, capable of issuing two instructions per cycle. It can have two instructions in each stage of the pipeline, for a total of up to 10 instructions being executed simultaneously.

Hardware
- Memory and communication
- Classes of parallel computers
  - Multicore computing
  - Symmetric multiprocessing
  - Distributed computing

Multicore Computing
Pros:
- Better than dual core: more cores can work in parallel
- Cores that do not share the same bus and bandwidth can be faster still
Cons:
- Heat dissipation problems
- More expensive

Software
- Parallel programming languages
- Automatic parallelization
- Application checkpointing

Parallel programming languages
Concurrent programming languages, libraries, APIs, and parallel programming models (such as algorithmic skeletons) have been created for programming parallel computers. They are generally classified by the memory architecture they assume:
- Shared memory
- Distributed memory
- Shared distributed memory

Automatic parallelization
Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, it has had only limited success. Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which the programmer gives the compiler directives for parallelization. A few fully implicit parallel programming languages exist: SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.

Application checkpointing
The larger and more complex a computer is, the more that can go wrong and the shorter the mean time between failures. Application checkpointing is a technique whereby the computer system periodically takes a "snapshot" of the application's state. This information can be used to restore the program if the computer fails.
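A minimal, illustrative sketch of checkpointing in Python (the file name and state layout are made up for the example): the program periodically pickles its state to disk and reloads the last snapshot when it restarts.

import os
import pickle

CHECKPOINT = "state.pkl"  # hypothetical checkpoint file

def load_state():
    # Resume from the last snapshot if one exists, otherwise start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}

def save_state(state):
    # Write to a temporary file first so a crash mid-write cannot
    # corrupt the previous snapshot, then atomically replace it.
    with open(CHECKPOINT + ".tmp", "wb") as f:
        pickle.dump(state, f)
    os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

state = load_state()
for step in range(state["step"], 1_000_000):
    state["total"] += step
    state["step"] = step + 1
    if step % 10_000 == 0:
        save_state(state)  # take a snapshot every 10,000 iterations
print(state["total"])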

Algorithmic methods
Parallel computing is used in a wide range of fields, from bioinformatics to economics. Common types of problems found in parallel computing applications are:
- Dense linear algebra
- Sparse linear algebra
- Dynamic programming
- Finite-state machine simulation

Programming
The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed. The base language of supercomputer code is, in general, Fortran or C, using special libraries (such as MPI) to share data between nodes. The new massively parallel GPGPUs have hundreds of processor cores and are programmed using programming models such as CUDA and OpenCL.
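As an illustration (assuming the mpi4py package, a Python binding for MPI, rather than the Fortran or C typically used in practice), a minimal program that shares data between processes by combining a value computed on each rank:

# Run with, for example: mpiexec -n 4 python sum_ranks.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id
size = comm.Get_size()   # total number of processes

local = (rank + 1) ** 2                          # each rank computes a partial result
total = comm.reduce(local, op=MPI.SUM, root=0)   # combine the results on rank 0

if rank == 0:
    print(f"sum of squares 1..{size} = {total}")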

Classes of parallel computers
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism:
- Multicore computing
- Symmetric multiprocessing
- Distributed computing
- Specialized parallel computers

Multicore computing
A multicore processor includes multiple execution units ("cores") on the same chip and can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar. By contrast, a processor using simultaneous multithreading has only one execution unit, but when that unit is idling (such as during a cache miss), it processes a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a prominent multicore processor.

Symmetric multiprocessing
A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus. Bus contention prevents bus architectures from scaling, so SMPs generally do not comprise more than 32 processors. Because of the small size of the processors and the significant reduction in bus bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective.

Distributed computing
A distributed computer is a distributed-memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.
Figure: (a)–(b) a distributed system; (c) a parallel system.

Specialized parallel computers
Within parallel computing, there are specialized parallel devices that tend to be applicable to only a few classes of parallel problems:
- Reconfigurable computing
- General-purpose computing on graphics processing units
- Application-specific integrated circuits
- Vector processors

Questions?

References:
- Wikipedia.org
- Google.com