Slide 1: ECE 669 Parallel Computer Architecture, Lecture 23: Parallel Compilation (April 29, 2004)
Slide 2: Parallel Compilation
° Two approaches to compilation
  - Parallelize a program manually: sequential code converted to parallel code
° Develop a parallel compiler
  - Intermediate form
  - Partitioning: block based or loop based
  - Placement
  - Routing
Slide 3: Compilation technologies for parallel machines
° Assumptions:
  - Input: parallel program
  - Output: "coarse" parallel program and directives for:
    - which threads run in one task
    - where tasks are placed
    - where data is placed
    - which data elements go in each data chunk
° Limitation: no special optimizations for synchronization; synchronization memory references are treated like any other communication
Slide 4: Toy example
° Loop parallelization
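A minimal sketch of the idea, assuming independent iterations block-distributed across processors; the names daxpy, my_id, and nproc are illustrative, not from the lecture:

    /* Each of nproc processors takes a contiguous block of the
     * iteration space [0, n). Iterations are independent, so no
     * communication or synchronization is needed inside the loop. */
    void daxpy(double *x, double *y, double a, int n,
               int my_id, int nproc)
    {
        int chunk = (n + nproc - 1) / nproc;   /* ceil(n / nproc) */
        int lo = my_id * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;

        for (int i = lo; i < hi; i++)
            y[i] = a * x[i] + y[i];
    }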
Slide 5: Example
° Matrix multiply: typically, c_ij = Σ_k a_ik · b_kj
° Looking to find parallelism...
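The standard triple loop makes the available parallelism explicit; every (i, j) pair writes a distinct element of C, so both outer loops can run in parallel:

    /* C = A * B for n x n matrices. The i and j loops are
     * parallelizable; the k loop is a reduction and stays
     * sequential within each (i, j) thread. */
    void matmul(int n, double A[n][n], double B[n][n], double C[n][n])
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
    }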
Slide 6: Choosing a program representation...
° Dataflow graph
  - No notion of storage: a problem
  - Data values flow along arcs
  - Nodes represent operations
(Figure: "+" nodes combining inputs a_i and b_i to produce outputs c_i.)
Slide 7: Compiler representation
° For certain kinds of structured programs
° Unstructured programs
(Figure: tasks A and B as loop nests over data arrays, connected by index expressions; edges carry communication weights.)
Slide 8: Process reference graph
° Nodes represent threads (processes): computation
° Edges represent communication (memory references)
° Can attach weights to edges to represent volume of communication
° Extension: precedence-relation edges can be added too
° Can also try to represent multiple loop-produced threads as one node
(Figure: threads P0, P1, P2, ..., P5 all referencing one monolithic memory; not very useful.)
Slide 9: Process communication graph
° Allocate data items to nodes as well
° Nodes: threads, data objects
° Edges: communication
° Key: works for shared-memory, message-passing, object-oriented, and dataflow systems!
(Figure: threads P0, P1, P2, ..., P5 with their data objects A, B, C and d0...d5, joined by unit-weight communication edges.)
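To make the representation concrete, here is a minimal sketch of a PCG as a data structure; the names (pcg, cut_weight) and the fixed array sizes are illustrative assumptions, not the lecture's implementation:

    enum node_kind { THREAD, DATA };           /* node types in the PCG */

    struct pcg_edge { int from, to, weight; }; /* weight = comm. volume */

    struct pcg {
        enum node_kind kind[16];
        struct pcg_edge edges[64];
        int n_nodes, n_edges;
    };

    /* Communication between nodes a and b: the cost paid if a
     * partitioning step places them on different processors. */
    int cut_weight(const struct pcg *g, int a, int b)
    {
        int w = 0;
        for (int i = 0; i < g->n_edges; i++)
            if ((g->edges[i].from == a && g->edges[i].to == b) ||
                (g->edges[i].from == b && g->edges[i].to == a))
                w += g->edges[i].weight;
        return w;
    }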
Slide 10: PCG for Jacobi relaxation
(Figure: the fine PCG for Jacobi next to the coarse PCG obtained by partitioning; nodes are marked as computation or data, and the coarse edges carry communication weight 4.)
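For reference, a standard Jacobi relaxation sweep: each interior point becomes the average of its four neighbors, so every update touches only adjacent data, which is what gives the fine PCG its regular, local structure:

    /* One Jacobi sweep on an n x n grid: every interior point of
     * dst is the average of its four neighbors in src. Updates
     * are independent, so the sweep parallelizes cleanly. */
    void jacobi_step(int n, double src[n][n], double dst[n][n])
    {
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++)
                dst[i][j] = 0.25 * (src[i-1][j] + src[i+1][j] +
                                    src[i][j-1] + src[i][j+1]);
    }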
Slide 11: Compilation with PCGs
° Fine process communication graph -> partitioning -> coarse process communication graph
Slide 12: Compilation with PCGs (cont.)
° Fine process communication graph -> partitioning -> coarse process communication graph -> placement -> ... other phases, e.g. scheduling (dynamic?)
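The slides do not spell out the partitioning step, so the following is only a sketch of one common heuristic, greedy edge contraction: repeatedly merge the nodes joined by the heaviest edge, so the heaviest communication becomes internal to a coarse node. All names are illustrative, and n is assumed to be at most 256:

    #include <stdlib.h>

    struct edge { int u, v, w; };

    static int parent[256];                /* union-find forest */

    static int find(int x)
    {
        return parent[x] == x ? x : (parent[x] = find(parent[x]));
    }

    static int by_weight_desc(const void *a, const void *b)
    {
        return ((const struct edge *)b)->w - ((const struct edge *)a)->w;
    }

    /* Coarsen an n-node fine PCG down to k coarse nodes by
     * contracting the heaviest edges first. */
    void coarsen(struct edge *e, int n_edges, int n, int k)
    {
        for (int i = 0; i < n; i++) parent[i] = i;
        qsort(e, n_edges, sizeof *e, by_weight_desc);
        for (int i = 0, groups = n; i < n_edges && groups > k; i++) {
            int ru = find(e[i].u), rv = find(e[i].v);
            if (ru != rv) { parent[ru] = rv; groups--; }
        }
    }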
Slide 13: Parallel Compilation
° Consider loop partitioning
° Create small local computations
° Consider static routing between tiles
° Short neighbor-to-neighbor communication
° Compiler orchestrated
Slide 14: Flow Compilation
° Modulo unrolling
° Partitioning
° Scheduling
Slide 15: Modulo Unrolling – Smart Memory
° Loop unrolling must respect dependencies
° Allow maximum parallelism
° Minimize communication
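A minimal sketch of the effect of modulo unrolling, assuming arrays are low-order interleaved across four memory banks (element i lives in bank i mod 4): unrolling by the bank count makes each reference in the body address one statically known bank, so the compiler can resolve and route accesses at compile time. The function and the factor of 4 are illustrative:

    /* Before unrolling, a[i] could touch any bank on a given
     * iteration. After unrolling by 4 (the bank count), each
     * statement always addresses the same bank. Assumes n % 4 == 0. */
    void scale_by_two(const double *a, double *b, int n)
    {
        for (int i = 0; i < n; i += 4) {
            b[i]     = 2.0 * a[i];      /* always bank 0 */
            b[i + 1] = 2.0 * a[i + 1];  /* always bank 1 */
            b[i + 2] = 2.0 * a[i + 2];  /* always bank 2 */
            b[i + 3] = 2.0 * a[i + 3];  /* always bank 3 */
        }
    }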
Slide 16: Array Partitioning – Smart Memory
° Assign each line to a separate memory
° Consider exchange of data
° Approach is scalable
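One possible assignment, sketched under the assumption of a block distribution of array rows across M memories (the name home_of_row is hypothetical):

    /* Rows are split into M contiguous blocks, one per memory.
     * Only rows at block boundaries must be exchanged, so the
     * communication per memory stays constant as M grows. */
    int home_of_row(int i, int n_rows, int M)
    {
        int rows_per_mem = (n_rows + M - 1) / M;  /* ceil(n_rows / M) */
        return i / rows_per_mem;
    }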
Slide 17: Communication Scheduling – Smart Memory
° Determine where data should be sent
° Determine when data should be sent
Slide 18: Speedup for Jacobi – Smart Memory
° Virtual wires indicate scheduled paths
° Hard wires are dedicated paths
° Hard wires require more wiring resources
° RAW is a parallel processor from MIT
Slide 19: Partitioning
° Use a heuristic for unstructured programs
° For structured programs... start from:
  - A list of arrays (A, B, C, ...)
  - A list of loop nests (L0, L1, L2, ...)
Slide 20: Notion of Iteration space, data space
° E.g., a matrix computation indexed by i and j
° Data space: the elements of the array; iteration space: the (i, j) loop iterations
° A point in the iteration space represents a "thread" with a given value of i, j
Slide 21: Notion of Iteration space, data space (cont.)
° Partitioning: how to "tile" the iteration and data spaces for MIMD machines?
° Each thread, a point (i, j) in the iteration space, affects a corresponding region of the data space
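A minimal sketch of square tiling, assuming a processor grid where processor (pi, pj) owns a T x T tile of the iteration space; all names and the square-tile choice are illustrative:

    /* Processor (pi, pj) executes the T x T block of (i, j)
     * iterations below. With a matching tiling of the data space,
     * most references stay local and only tile edges communicate
     * with neighboring processors. */
    void my_tile(int pi, int pj, int T,
                 int *i_lo, int *i_hi, int *j_lo, int *j_hi)
    {
        *i_lo = pi * T;  *i_hi = (pi + 1) * T;  /* half-open [lo, hi) */
        *j_lo = pj * T;  *j_hi = (pj + 1) * T;
    }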
Slide 22: Loop partitioning for caches
° Machine model:
  - Assume all data starts in memory
  - Minimize first-time cache fetches
  - Ignore secondary effects such as invalidations due to writes
(Figure: processors, each with a private cache, connected through a network to a memory holding array A.)
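Under this model the cost of a tile is just its first-touch footprint. As a sketch for an L x L tile of a 4-point stencil (the formula and name are illustrative, not from the lecture):

    /* An L x L stencil tile must fetch the tile plus a one-element
     * halo on each side: (L+2)^2 first-time fetches. Fetches per
     * iteration, (L+2)^2 / L^2, approaches 1 as L grows, so larger
     * tiles amortize cold misses better. */
    int first_time_fetches(int L) { return (L + 2) * (L + 2); }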
Slide 23: Summary
° Parallel compilation often targets block-based and loop-based parallelism
° Compilation steps address the identification of parallelism and its representation; graphs are often useful for representing program dependencies
° For static scheduling, both computation and communication can be represented
° Data positioning is an important consideration for computation