Data Parallel Algorithms
Article by W. DANIEL HILLIS and GUY L. STEELE, JR.
Presented by: ALAN MOSER
Tuesday, June 28, 2005
Overview
- Review: Data Parallel vs. Control Parallel
- Connection Machine Programming Model
- Differences of the Connection Machine Model
- Algorithms
  - Summation
  - Prefix summation by doubling
  - Finding the end of a Linked List
  - All partial sums of a Linked List
  - Matching up elements of two Linked Lists
Data vs. Control Parallel
- Data parallel (SIMD): the same instruction is executed synchronously by all processors on multiple data items.
- Control parallel (MIMD): each processor may execute a different instruction from the same code, asynchronously, on multiple data items.
Connection Machine Programming Model
Consists of two parts:
1. Front-end computer
   - a traditional SISD computer that serves as the controller (VAX or Symbolics 3600)
2. Array of Connection Machine processors
   - each with its own local memory
   - to the front end, the processor array appears as memory
Executing Instructions & Selecting Processors
Executes in SIMD fashion
- A single instruction stream from the front end acts on multiple data items.
Each processor has a state bit, or context flag
- A context flag set to 1 means the CPU is selected.
Instructions are one of two types
- Conditional: only the selected CPUs execute
- Unconditional: all CPUs execute, regardless of context flag
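The selection rule above can be sketched as a sequential simulation. This is only an illustration: the function name `run_instruction` and the list-based representation of processors are assumptions made for the example, not part of the Connection Machine instruction set.

```python
def run_instruction(op, data, context, conditional=True):
    """Apply `op` to each simulated processor's datum and return the new data.

    A conditional instruction takes effect only where the context flag is 1;
    an unconditional instruction takes effect on every processor.
    (Hypothetical helper, for illustration only.)
    """
    return [
        op(value) if (flag == 1 or not conditional) else value
        for value, flag in zip(data, context)
    ]


data = [10, 20, 30, 40]
context = [1, 0, 1, 0]  # processors 0 and 2 are selected

conditional_result = run_instruction(lambda v: v + 1, data, context)
unconditional_result = run_instruction(
    lambda v: v + 1, data, context, conditional=False
)

print(conditional_result)    # [11, 20, 31, 40]
print(unconditional_result)  # [11, 21, 31, 41]
```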
Differences of the Connection Machine Model
- General pointer-based communication
- Virtual Processors
General Communication?
- Typical fine-grained SIMD computers restrict communication to patterns, such as a grid or tree, that are wired into the hardware.
- The Connection Machine model lets any CPU communicate with any other CPU via a SEND instruction, while other CPUs communicate concurrently.
SEND Instruction
The SEND instruction takes two operands:
1. The address of the data to be sent
2. A processor pointer, i.e. the CPU number and the field within that CPU where the data is to be placed
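A rough way to picture SEND is as a scatter over per-processor memories. The sketch below is an assumption-laden simulation: the `send` function, the dictionary-per-processor layout, and the field names `msg`, `dest`, and `inbox` are all hypothetical. It uses a permutation of destinations, so it says nothing about what happens when two messages target the same field.

```python
def send(memory, source_field, pointer_field, context):
    """Simulate one SEND step over a list of per-processor dicts.

    Each selected processor sends memory[k][source_field] to the
    (processor, field) pair stored in memory[k][pointer_field].
    (Hypothetical helper, for illustration only.)
    """
    # Collect all messages first so every processor sends its pre-step value,
    # mimicking the fact that all processors act in the same instruction.
    messages = [
        (cell[pointer_field], cell[source_field])
        for cell, flag in zip(memory, context)
        if flag == 1 and cell[pointer_field] is not None
    ]
    for (dest_cpu, dest_field), value in messages:
        memory[dest_cpu][dest_field] = value
    return memory


# Four processors; destinations form an arbitrary permutation, not a grid or tree.
destinations = [2, 0, 3, 1]
memory = [
    {"msg": 100 + k, "inbox": None, "dest": (d, "inbox")}
    for k, d in enumerate(destinations)
]

send(memory, "msg", "dest", context=[1, 1, 1, 1])
inboxes = [cell["inbox"] for cell in memory]
print(inboxes)  # [101, 103, 100, 102]
```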
Virtual Processors
- The Connection Machine model is abstracted from the hardware that supports it (i.e. from the number and size of its processors).
- Programs are described in terms of virtual processors.
Benefits of Virtual Processors
- The same program can run unchanged on different sizes of the Connection Machine.
- The number of CPUs may be regarded as expandable rather than fixed.
- CPUs may be allocated dynamically, "on the fly": the processor-cons instruction allocates memory, and that memory comes with its own CPU attached.
Data Parallel Algorithms
- Summation
- Prefix Summation
- Finding the end of a Linked List
- All Partial sums of a Linked List
- Matching the elements of two Linked Lists
Summation of an Array
  for j := 1 to log2 n do
    for all k in parallel do
      if ((k + 1) mod 2^j) = 0 then
        x[k] := x[k - 2^(j-1)] + x[k]
      fi
    od
  od
Diagram of Summation of Array
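The summation loop can be tried out with a sequential simulation. This is a minimal sketch, assuming a zero-based array whose length is a power of two; the name `parallel_sum` is hypothetical, and the snapshot `old` stands in for the fact that all processors read their operands before any of them writes.

```python
def parallel_sum(values):
    """Simulate the log-step array summation; the total ends up in x[n-1]."""
    x = list(values)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes the length is a power of two")
    steps = n.bit_length() - 1  # log2(n) for a power of two
    for j in range(1, steps + 1):
        old = list(x)  # snapshot: every processor reads pre-step values
        for k in range(n):  # "for all k in parallel"
            if (k + 1) % (2 ** j) == 0:
                x[k] = old[k - 2 ** (j - 1)] + old[k]
    return x


result = parallel_sum(range(1, 9))
print(result)      # [1, 3, 3, 10, 5, 11, 7, 36]
print(result[-1])  # 36, the sum of 1..8
```

Only the last element holds the full sum; the other elements hold partial sums of sub-blocks, which is why the prefix version needs a different guard.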
Prefix Summation of an Array
  for j := 1 to log2 n do
    for all k in parallel do
      if (k >= 2^(j-1)) then
        x[k] := x[k - 2^(j-1)] + x[k]
      fi
    od
  od
Diagram of Prefix Summation of Array by Doubling
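A sequential simulation of prefix summation by doubling might look like the sketch below. The function name `prefix_sum_by_doubling` is hypothetical. The guard used here is k >= 2^(j-1), which is what keeps the index k - 2^(j-1) non-negative with zero-based indexing, and the variable `offset` plays the role of 2^(j-1).

```python
from itertools import accumulate


def prefix_sum_by_doubling(values):
    """Simulate the doubling prefix sum: x[k] becomes x[0] + ... + x[k]."""
    x = list(values)
    n = len(x)
    offset = 1  # equals 2^(j-1) on the j-th step
    while offset < n:
        old = list(x)  # snapshot: all reads happen before any write
        for k in range(n):  # "for all k in parallel"
            if k >= offset:
                x[k] = old[k - offset] + old[k]
        offset *= 2
    return x


prefix = prefix_sum_by_doubling(range(1, 9))
print(prefix)  # [1, 3, 6, 10, 15, 21, 28, 36]

# Cross-check against the standard library's sequential running sum.
print(prefix == list(accumulate(range(1, 9))))  # True
```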
Count Instruction
- Every CPU unconditionally examines its context flag and computes 1 if it is set, 0 if it is clear.
- The machine then performs an unconditional summation of these integer values.
- Used to count the number of selected CPUs (implicit use of the summation algorithm)
Enumerate Instruction
- Every CPU unconditionally examines its context flag and computes 1 if it is set, 0 if it is clear.
- The machine then performs an unconditional prefix summation of these integer values.
- Used to count and number the selected CPUs (implicit use of prefix summation)
- Result: every CPU receives a count of the number of active processors that precede it (including itself)
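Count and enumerate can be illustrated on a small vector of context flags. In this sketch the names `count_selected` and `enumerate_selected` are hypothetical, and the built-in `sum` and `itertools.accumulate` are sequential stand-ins for the parallel summation and prefix summation.

```python
from itertools import accumulate


def count_selected(context):
    """Number of selected processors: a summation over the 0/1 context flags."""
    return sum(context)


def enumerate_selected(context):
    """Inclusive prefix sum of the flags: each processor receives the number
    of selected processors up to and including itself."""
    return list(accumulate(context))


context = [1, 0, 1, 1, 0, 1]
total = count_selected(context)
numbering = enumerate_selected(context)

print(total)      # 4
print(numbering)  # [1, 1, 2, 3, 3, 4]
```

Read at the selected positions only (0, 2, 3, 5), the numbering is 1, 2, 3, 4, so each selected processor gets a distinct index.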
Finding the end of a Linked List
  for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null and chum[chum[k]] != null do
      chum[k] := chum[chum[k]]
    od
  od
Diagram: Original Linked List, and Linked List after each iteration of the loop
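Pointer doubling can be simulated sequentially as in the sketch below. It assumes the list is stored as an array of successor indices with `None` for null; the name `find_list_end` is hypothetical, and the snapshot `old` models the synchronous step in which every processor reads before any processor writes.

```python
def find_list_end(next_ptr):
    """Return chum, where chum[k] is the last cell of k's list
    (or None if k is itself the last cell)."""
    n = len(next_ptr)
    chum = list(next_ptr)
    while True:
        old = list(chum)  # snapshot of all chum pointers for this step
        changed = False
        for k in range(n):  # "for all k in parallel"
            if old[k] is not None and old[old[k]] is not None:
                chum[k] = old[old[k]]  # jump twice as far as before
                changed = True
        if not changed:
            return chum


# Cells are deliberately stored out of order: the list is 0 -> 3 -> 4 -> 2 -> 1.
next_ptr = [3, None, 1, 4, 2]
ends = find_list_end(next_ptr)
print(ends)  # [1, None, 1, 1, 1]
```

For this five-cell list the pointers settle after two doubling steps, rather than the four hops a sequential walk from cell 0 would need.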
All Partial Sums of a Linked List
  for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null do
      value[chum[k]] := value[k] + value[chum[k]]
      chum[k] := chum[chum[k]]
    od
  od
Diagram: Original Linked List; Linked List after execution of chum[k] := next[k]; Linked List after first iteration of while loop
Diagram: Linked List after last iteration of while loop (final product shown without chum pointers)
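The partial-sums loop can be checked with a sequential simulation. As before, this is a sketch under assumptions: the list is an array of successor indices with `None` for null, the name `list_partial_sums` is hypothetical, and the two snapshots stand in for the synchronous reads of a SIMD step.

```python
def list_partial_sums(next_ptr, values):
    """Return value, where value[k] is the sum of all cells from the head
    of k's list up to and including k."""
    n = len(next_ptr)
    value = list(values)
    chum = list(next_ptr)
    while any(c is not None for c in chum):
        old_value = list(value)  # snapshots: reads see pre-step state
        old_chum = list(chum)
        for k in range(n):  # "for all k in parallel"
            if old_chum[k] is not None:
                # Push my running sum to the cell my chum pointer reaches.
                value[old_chum[k]] = old_value[k] + old_value[old_chum[k]]
                chum[k] = old_chum[old_chum[k]]
    return value


# The list is 0 -> 3 -> 4 -> 2 -> 1, holding 1, 2, 3, 4, 5 in list order.
next_ptr = [3, None, 1, 4, 2]
values = [1, 5, 4, 2, 3]  # indexed by cell number, not list position
sums = list_partial_sums(next_ptr, values)
print(sums)  # [1, 15, 10, 3, 6]
```

In list order the result reads 1, 3, 6, 10, 15, the running sums of 1 through 5.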
Matching the elements in two Linked Lists
  for all k in parallel do
    friend[k] := null
  od
  friend[list1] := list2
  friend[list2] := list1
  for all k in parallel do
    chum[k] := next[k]
    while chum[k] != null do
      if friend[k] != null then
        friend[chum[k]] := chum[friend[k]]
      fi
      chum[k] := chum[chum[k]]
    od
  od
Two original Linked Lists
Properties of Matching two Linked Lists
- It is possible to match two lists of different lengths.
- If list2 is made the friend of list1 but not vice versa, the friend components of list2 remain null (unaffected).
- The algorithm can process many lists, or many pairs of linked lists, simultaneously.
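The matching algorithm and its behavior on lists of different lengths can be explored with the sketch below. It assumes both lists live in one array of successor indices with `None` for null, and that the chum pointer advances on every iteration whether or not a friend is known yet. The name `match_lists` is hypothetical.

```python
def match_lists(next_ptr, list1, list2):
    """Return friend, where friend[k] is the cell at the same position in the
    other list, or None if the other list has no such position."""
    n = len(next_ptr)
    friend = [None] * n
    friend[list1] = list2  # the two heads are each other's friends
    friend[list2] = list1
    chum = list(next_ptr)
    while any(c is not None for c in chum):
        old_friend = list(friend)  # snapshots: reads see pre-step state
        old_chum = list(chum)
        for k in range(n):  # "for all k in parallel"
            if old_chum[k] is not None:
                if old_friend[k] is not None:
                    # My chum's friend is my friend's chum.
                    friend[old_chum[k]] = old_chum[old_friend[k]]
                chum[k] = old_chum[old_chum[k]]
    return friend


# list1 is 0 -> 1 -> 2 (three cells); list2 is 3 -> 4 -> 5 -> 6 (four cells).
next_ptr = [1, 2, None, 4, 5, 6, None]
friends = match_lists(next_ptr, list1=0, list2=3)
print(friends)  # [3, 4, 5, 0, 1, 2, None]
```

Cell 6, the extra cell of the longer list, ends with a null friend, which is how the difference in length shows up in the result.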
Reference
W. Daniel Hillis and Guy L. Steele, Jr. "Data Parallel Algorithms." Communications of the ACM, Volume 29, Number 12, December 1986.