Introduction to High Performance Computing Lecture 12


CSCI-455/552 Introduction to High Performance Computing Lecture 12

Chapter 5: Pipelined Computations 5.1

Pipelined Computations Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task executed by a separate process or processor. 5.2

Example
Add all the elements of array a to an accumulating sum:

    for (i = 0; i < n; i++)
        sum = sum + a[i];

The loop could be “unfolded” to yield

    sum = sum + a[0];
    sum = sum + a[1];
    sum = sum + a[2];
    sum = sum + a[3];
    sum = sum + a[4];
    ...
5.3
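The unfolded loop above maps directly onto a pipeline: stage i holds a[i], receives the running sum from stage i-1, adds its element, and passes the result on. A minimal sequential sketch of this (illustrative only, not from the slides) in Python:

```python
def make_stage(x):
    # Stage holding element x: takes the incoming sum, emits sum + x
    return lambda s: s + x

def pipeline_sum(a):
    # One stage per element; the partial sum flows left to right
    stages = [make_stage(x) for x in a]
    s = 0
    for stage in stages:
        s = stage(s)
    return s
```

For example, pipeline_sum([1, 2, 3, 4, 5]) returns 15, the same result as the original loop.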

Pipeline for an Unfolded Loop 5.4

Another Example Frequency filter - Objective: to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal, f(t). Signal enters pipeline from left: 5.5
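A toy sketch of this idea (an assumption for illustration, not the slides' implementation): model the signal as a mapping from frequency to amplitude, and let each pipeline stage strip out one frequency as the signal passes through.

```python
def remove_frequency(signal, f):
    # One pipeline stage: pass everything through except frequency f
    return {freq: amp for freq, amp in signal.items() if freq != f}

def frequency_filter_pipeline(signal, frequencies):
    # Stage i removes frequency f_i; the signal flows through every stage
    for f in frequencies:
        signal = remove_frequency(signal, f)
    return signal
```

In a real pipeline each stage would be a separate process applying a digital filter to a sample stream; the point here is only the stage-after-stage structure.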

Where Pipelining Can Be Used to Good Effect Assuming the problem can be divided into a series of sequential tasks, a pipelined approach can provide increased execution speed under the following three types of computations:
1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations
5.6

“Type 1” Pipeline Space-Time Diagram 5.7

Alternative Space-time Diagram 5.8

“Type 2” Pipeline Space-Time Diagram 5.9

“Type 3” Pipeline Space-Time Diagram Pipeline processing where information passes to the next stage before the previous stage has completed. 5.10

If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor: 5.11
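One simple way to form these groups (an assumed helper, not given in the text) is to assign each processor a contiguous run of stages, spreading any remainder across the first processors:

```python
def group_stages(num_stages, num_procs):
    # Map each processor to a contiguous group of stage indices,
    # balancing group sizes when num_stages % num_procs != 0.
    base, extra = divmod(num_stages, num_procs)
    groups, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)  # spread the remainder
        groups.append(list(range(start, start + size)))
        start += size
    return groups
```

For instance, 10 stages on 3 processors yields groups of 4, 3, and 3 stages; contiguous groups keep the inter-processor messages to one channel between neighbouring processors.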

Computing Platform for Pipelined Applications Multiprocessor system with a line configuration. Strictly speaking, a pipeline may not be the best structure for a cluster; however, a cluster with switched direct connections, as most have, can support simultaneous message passing. 5.12

Example Pipelined Solutions (Examples of each type of computation) 5.13

Pipeline Program Examples Adding Numbers Type 1 pipeline computation 5.14

Basic code for process Pi:

    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
    send(&accumulation, Pi+1);

except for the first process, P0, which is

    send(&number, P1);

and the last process, Pn-1, which is

    recv(&number, Pn-2);
    accumulation = accumulation + number;
5.15
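The recv/send pseudocode can be emulated with threads and queues, where the queue between stage i-1 and stage i plays the role of the message channel between Pi-1 and Pi. This is a hedged sketch of the idea, not the message-passing code used in the book:

```python
import threading
import queue

def pipelined_add(numbers):
    # One thread per process Pi; channels[i] carries the message Pi -> Pi+1
    n = len(numbers)
    channels = [queue.Queue() for _ in range(n - 1)]
    result = []

    def stage(i):
        number = numbers[i]
        if i == 0:
            channels[0].put(number)              # P0: send(&number, P1)
            return
        accumulation = channels[i - 1].get()     # recv from Pi-1 (blocks)
        accumulation = accumulation + number
        if i < n - 1:
            channels[i].put(accumulation)        # send to Pi+1
        else:
            result.append(accumulation)          # Pn-1 holds the final sum

    threads = [threading.Thread(target=stage, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

The blocking get() mirrors a blocking recv(): each stage cannot add its number until the partial sum arrives from its predecessor.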

SPMD Program

    if (process > 0) {
        recv(&accumulation, Pi-1);
        accumulation = accumulation + number;
    }
    if (process < n-1)
        send(&accumulation, Pi+1);

The final result is in the last process. Instead of addition, other arithmetic operations could be done.
5.16
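To see why the same code works for every rank, the SPMD branch logic can be replayed sequentially, one simulated process at a time; `channel` below stands in for the single message in flight between rank p-1 and rank p (a sketch for illustration only):

```python
def spmd_add(numbers):
    # Every simulated rank runs the same branch structure
    n = len(numbers)
    channel = None            # the message between neighbouring ranks
    accumulation = None
    for process in range(n):
        number = numbers[process]
        if process > 0:
            accumulation = channel               # recv(&accumulation, Pi-1)
            accumulation = accumulation + number
        else:
            accumulation = number                # P0 starts the pipeline
        if process < n - 1:
            channel = accumulation               # send(&accumulation, Pi+1)
    return accumulation       # final result is in the last process
```

Replacing the addition with any other associative update (multiplication, max, etc.) works the same way, as the slide notes.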

Pipelined Addition Numbers Master process and ring configuration 5.17