Data Parallel Algorithms


Data Parallel Algorithms
Article by W. Daniel Hillis and Guy L. Steele, Jr.
Presented by: Alan Moser
Tuesday, June 28, 2005

Overview
- Review: data parallel vs. control parallel
- Connection Machine programming model
- Differences of the Connection Machine model
- Algorithms:
  - Summation
  - Prefix summation by doubling
  - Finding the end of a linked list
  - All partial sums of a linked list
  - Matching up elements of two linked lists

Data vs. Control Parallel
- Data parallel (SIMD): the same instruction is executed synchronously by all processors on multiple data items.
- Control parallel (MIMD): each processor may execute a different instruction from the same code, asynchronously, on multiple data items.

Connection Machine Programming Model
Consists of two parts:
1. A front-end computer: a traditional SISD computer (a VAX or a Symbolics 3600) that serves as the controller.
2. An array of Connection Machine processors, each with its own local memory. To the front end, the processor array appears as memory.

Executing Instructions and Selecting Processors
- Execution is SIMD: a single instruction stream from the front end acts on multiple data items.
- Each processor has a state bit, or context flag. A context flag set to 1 means the CPU is selected.
- Instructions are one of two types:
  - Conditional: only the selected CPUs execute.
  - Unconditional: all CPUs execute, regardless of context flag.

Differences of the Connection Machine Model
- General pointer-based communication
- Virtual processors

General Communication
Typical fine-grained SIMD computers restrict communication to patterns, such as a grid or a tree, wired into the hardware. The Connection Machine model instead allows any CPU to communicate with any other CPU, while other CPUs communicate concurrently, via a SEND instruction.

SEND Instruction
The SEND instruction takes two operands:
1. The address of the data to be sent.
2. A processor pointer, i.e., a CPU number and a field within that CPU into which the data is to be placed.
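A toy Python simulation of one SEND step, under assumptions that are mine rather than the slide's: processors are array slots, and a "field" is a key in a per-CPU dictionary.

# Each CPU holds named fields of local memory.
memory = [{"data": 10 * k, "inbox": None} for k in range(4)]

# sends[k] = (destination CPU, destination field) for CPU k's "data" field.
# On the real machine all of these transfers happen concurrently; the
# sequential loop is safe here because each source reads only its own data
# and the destinations are distinct.
sends = {0: (2, "inbox"), 1: (3, "inbox"), 3: (0, "inbox")}

for k, (dest, field) in sends.items():
    memory[dest][field] = memory[k]["data"]

print([cpu["inbox"] for cpu in memory])  # [30, None, 0, 10]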

Virtual Processors
The Connection Machine model is abstracted from the hardware that supports it (i.e., from the number and size of its physical processors). Programs are described in terms of virtual processors.

Benefits of Virtual Processors
- The same program can run unchanged on different sizes of the Connection Machine.
- The number of CPUs may be regarded as expandable rather than fixed.
- CPUs may be allocated dynamically, "on the fly": the processor-cons instruction allocates memory, and the memory comes with its own CPU attached.

Data Parallel Algorithms
- Summation
- Prefix summation
- Finding the end of a linked list
- All partial sums of a linked list
- Matching the elements of two linked lists

Summation of an Array
for j := 1 to log n do
    for all k in parallel do
        if ((k + 1) mod 2^j) = 0 then
            x[k] := x[k - 2^(j-1)] + x[k]
        fi
    od
od

[Diagram: summation of an array]
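To make the loop concrete, here is a minimal sequential Python sketch of it; the inner loop over k stands in for "for all k in parallel", and the function name and the power-of-two length assumption are illustrative, not from the paper.

import math

def parallel_sum(x):
    # Sum n values in log2(n) "parallel" steps; assumes n is a power of 2.
    n = len(x)
    x = list(x)  # work on a copy
    for j in range(1, int(math.log2(n)) + 1):
        for k in range(n):  # in hardware, all selected k update at once
            if (k + 1) % 2 ** j == 0:
                x[k] = x[k - 2 ** (j - 1)] + x[k]
    return x[n - 1]  # the total accumulates in the last element

print(parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36

The sequential simulation is faithful here because each selected cell reads a cell that no other processor writes in the same step.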

Prefix Summation of an Array
for j := 1 to log n do
    for all k in parallel do
        if k >= 2^(j-1) then
            x[k] := x[k - 2^(j-1)] + x[k]
        fi
    od
od

[Diagram: prefix summation of an array by doubling]
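Again a sequential Python sketch, with one subtlety made explicit: in a synchronous parallel step, every processor reads before any processor writes, so the simulation snapshots x at each step. Names are illustrative, and n is assumed to be a power of 2.

import math

def prefix_sums(x):
    # Inclusive prefix sums in log2(n) steps (the doubling scan above).
    n = len(x)
    x = list(x)
    for j in range(1, int(math.log2(n)) + 1):
        d = 2 ** (j - 1)
        old = x[:]  # snapshot: all reads precede all writes in a step
        for k in range(n):
            if k >= d:
                x[k] = old[k - d] + old[k]
    return x

print(prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]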

Count Instruction
- Every CPU unconditionally examines its context flag and computes 1 if it is set, 0 if it is clear; the CPUs then perform an unconditional summation of these integer values.
- Used to count the number of selected CPUs (an implicit use of the summation algorithm).

Enumerate Instruction
- Every CPU unconditionally examines its context flag and computes 1 if it is set, 0 if it is clear; the CPUs then perform an unconditional prefix summation of these integer values.
- Used to count and number the selected CPUs (an implicit use of the prefix summation algorithm).
- Result: every CPU receives a count of the number of selected processors that precede it, including itself.
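A tiny self-contained illustration of both instructions, using hypothetical context flags for eight processors (1 = selected); the variable names are mine.

flags = [1, 0, 1, 1, 0, 1, 0, 1]

# count: an unconditional summation of the flag values.
count = sum(flags)  # 5 CPUs are selected

# enumerate: an unconditional prefix summation of the flag values; each
# selected CPU reads off how many selected CPUs precede it, itself included.
enum, running = [], 0
for f in flags:
    running += f
    enum.append(running)

print(count)  # 5
print(enum)   # [1, 1, 2, 3, 3, 4, 4, 5]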

Finding the End of a Linked List
for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null and chum[chum[k]] != null do
        chum[k] := chum[chum[k]]
    od
od

[Diagrams: the original linked list, and the linked list after each iteration of the loop]
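A sequential Python sketch of the pointer-jumping loop. The encoding, an assumption of this sketch rather than the paper's notation, represents a list as a next array of node indices, with None marking the end.

def find_end(next_):
    # After O(log n) rounds, chum[k] points to the last node reachable from k.
    n = len(next_)
    chum = list(next_)  # chum[k] := next[k], by all k "in parallel"
    changed = True
    while changed:
        changed = False
        old = chum[:]  # synchronous step: everyone reads before anyone writes
        for k in range(n):
            if old[k] is not None and old[old[k]] is not None:
                chum[k] = old[old[k]]
                changed = True
    return chum

# The list 0 -> 1 -> 2 -> 3 -> 4:
print(find_end([1, 2, 3, 4, None]))  # [4, 4, 4, 4, None]

Each round doubles the distance every chum pointer spans, so a list of n nodes terminates after about log2(n) rounds.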

All Partial Sums of a Linked List
for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null do
        value[chum[k]] := value[k] + value[chum[k]]
        chum[k] := chum[chum[k]]
    od
od

[Diagrams: the original linked list, the linked list after execution of chum[k] := next[k], and the linked list after the first iteration of the while loop]

[Diagrams: the linked list after the last iteration of the while loop, and the final result shown without chum pointers]
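The same array encoding gives a sketch of the partial-sums loop; the snapshot copies again simulate the synchronous read-then-write discipline of the parallel machine, and the names are illustrative.

def list_prefix_sums(next_, value):
    # Partial sums of the values along a linked list by pointer jumping.
    n = len(next_)
    value, chum = list(value), list(next_)
    while any(c is not None for c in chum):
        old_chum, old_value = chum[:], value[:]  # reads precede writes
        for k in range(n):
            if old_chum[k] is not None:
                value[old_chum[k]] = old_value[k] + old_value[old_chum[k]]
                chum[k] = old_chum[old_chum[k]]
    return value

# The list 0 -> 1 -> 2 -> 3 -> 4 holding the values 1..5:
print(list_prefix_sums([1, 2, 3, 4, None], [1, 2, 3, 4, 5]))
# [1, 3, 6, 10, 15]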

Matching the Elements of Two Linked Lists
for all k in parallel do
    friend[k] := null
od
friend[list1] := list2
friend[list2] := list1
for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null do
        if friend[k] != null then
            friend[chum[k]] := chum[friend[k]]
        fi
        chum[k] := chum[chum[k]]
    od
od

[Diagram: the two original linked lists]
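A sketch of the matching algorithm under the same array encoding; head1 and head2 stand in for the slide's list1 and list2, and the lists may have different lengths (the chum of a missing friend simply yields None).

def match_lists(next_, head1, head2):
    # Give each node of one list a pointer to the corresponding node of
    # the other, propagating friendship while the chum pointers double.
    n = len(next_)
    friend = [None] * n
    friend[head1], friend[head2] = head2, head1
    chum = list(next_)
    while any(c is not None for c in chum):
        old_chum, old_friend = chum[:], friend[:]  # synchronous step
        for k in range(n):
            if old_chum[k] is not None:
                if old_friend[k] is not None:
                    friend[old_chum[k]] = old_chum[old_friend[k]]
                chum[k] = old_chum[old_chum[k]]
    return friend

# Lists 0 -> 1 -> 2 and 3 -> 4 -> 5:
print(match_lists([1, 2, None, 4, 5, None], 0, 3))
# [3, 4, 5, 0, 1, 2]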

Properties of Matching Two Linked Lists
- It is possible to match two lists of different lengths.
- If list2 is made the friend of list1 but not vice versa, then the friend components in list2 are left null (unaffected).
- The algorithm can process many lists, or many pairs of linked lists, simultaneously.

Reference
W. Daniel Hillis and Guy L. Steele, Jr. "Data Parallel Algorithms." Communications of the ACM, Vol. 29, No. 12 (December 1986).