Published by Owen Terry. Modified over 9 years ago.
1
Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes
Maria Athanasaki, Evangelos Koukis, Nectarios Koziris
National Technical University of Athens
School of Electrical and Computer Engineering
Computing Systems Laboratory
2
National Technical University of Athens Computing Systems Laboratory PDP 2004 Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes
Previous work
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, "Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces", SuperComputing Conference on High Performance Networking and Computing (SC2002), Baltimore, Maryland, November 16-22, 2002.
  G. Goumas, A. Sotiropoulos and N. Koziris, "Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping", Proceedings of the 2001 International Parallel and Distributed Processing Symposium (IPDPS2001), IEEE Press, San Francisco, California, April 2001.
3
Overview
  Tiling for parallelization
  Non-overlapping vs. Overlapping execution scheme
  Grouping
  Application on a cluster of SMPs with a fixed number of nodes
  Experimental-Simulation Results
4
Nested For-Loops
for (i1 = l1; i1 <= u1; i1++)
  for (i2 = l2; i2 <= u2; i2++)
    ...
      for (in = ln; in <= un; in++) {
        Loop Body
      }
5
Dependence Vectors
for (i1 = 0; i1 <= 7; i1++)
  for (i2 = 0; i2 <= 7; i2++)
    A[i1][i2] = A[i1-1][i2] + A[i1][i2-1];
[Figure: dependence vectors in the iteration space; axes i1, i2]
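The dependence structure of this loop can be spelled out in a short C sketch. It is an illustration only: the function names, the row-major storage and the boundary values (row 0 and column 0 set to 1, with the loops starting at 1 so that no element outside the array is read) are assumptions that do not come from the slides.

```c
#include <assert.h>

/* Iteration (i1, i2) reads the values written by (i1-1, i2) and (i1, i2-1),
   so the dependence vectors are (1, 0) and (0, 1). */
static const int DEPS[2][2] = { {1, 0}, {0, 1} };

/* Returns 1 if every dependence vector is lexicographically positive,
   which is the condition for the sequential loop order to be legal. */
int deps_lex_positive(void)
{
    for (int d = 0; d < 2; d++) {
        int first = DEPS[d][0], second = DEPS[d][1];
        if (first < 0 || (first == 0 && second <= 0))
            return 0;
    }
    return 1;
}

/* Runs the loop on an n x n array stored row-major in A.
   Assumption: row 0 and column 0 are boundary values, set to 1 here. */
void run_loop(int n, long *A)
{
    assert(n >= 1 && A != 0);
    for (int i = 0; i < n; i++) {
        A[i * n] = 1;   /* column 0 */
        A[i] = 1;       /* row 0 */
    }
    for (int i1 = 1; i1 < n; i1++)
        for (int i2 = 1; i2 < n; i2++)
            A[i1 * n + i2] = A[(i1 - 1) * n + i2] + A[i1 * n + (i2 - 1)];
}
```

Calling run_loop(8, A) on a 64-element array reproduces the 8 x 8 iteration space of the slide under the stated boundary assumption.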
6
Tiling
[Figure: tiled iteration space; axes i1, i2]
7
Tiling
[Figure: tiled iteration space; axes i1, i2; tiles assigned to Processor 0 and Processor 1]
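As a rough illustration of what tiling does to the loop nest, the sketch below executes the two-dimensional example tile by tile on a single processor. Rectangular tiles, the tile sizes t1 and t2, the function names and the boundary values are all assumptions made for the example; the slides do not specify a tile shape, and the distribution of tiles to processors is not shown here.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Executes A[i1][i2] = A[i1-1][i2] + A[i1][i2-1] tile by tile.
   The two outer loops enumerate tiles, the two inner loops enumerate the
   iterations inside one tile. Rectangular tiles are legal for this loop
   because both dependence vectors, (1, 0) and (0, 1), are non-negative.
   Assumption: row 0 and column 0 are boundary values, set to 1 here. */
void run_tiled(int n, int t1, int t2, long *A)
{
    assert(n >= 1 && t1 >= 1 && t2 >= 1 && A != 0);
    for (int i = 0; i < n; i++) {
        A[i * n] = 1;   /* column 0 */
        A[i] = 1;       /* row 0 */
    }
    for (int T1 = 1; T1 < n; T1 += t1)
        for (int T2 = 1; T2 < n; T2 += t2)
            for (int i1 = T1; i1 < min_int(T1 + t1, n); i1++)
                for (int i2 = T2; i2 < min_int(T2 + t2, n); i2++)
                    A[i1 * n + i2] =
                        A[(i1 - 1) * n + i2] + A[i1 * n + (i2 - 1)];
}
```

Any positive tile sizes give the same array contents as the untiled loop; only the order in which iterations are visited changes.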
9
Non-Overlapping Scheme
[Figure: tiled iteration space; axes i1, i2; tiles assigned to Processor 0, Processor 1 and Processor 2]
10
Non-Overlapping vs. Overlapping Scheme
[Figure: two diagrams, one per scheme, each showing processors P0, P1, P2, P3]
11
Overlapping Scheme
[Figure: tiled iteration space; axes i1, i2; tiles assigned to Processor 0, Processor 1 and Processor 2]
13
Generalization to SMPs – “Grouping”
[Figure: four SMP nodes (SMP0, SMP1, SMP2, SMP3), each with CPU0 and CPU1]
14
Example: Grouping + Non overlapping Communication Scheme
Scheduling vector Π=(1,0)
[Figure: Tile Space and Group Space; groups mapped to SMP node0 and SMP node1]
15
Example: Grouping + Overlapping Communication Scheme
Scheduling vector Π=(1,1)
[Figure: Tile Space and Group Space; groups mapped to SMP node0 and SMP node1]
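A scheduling vector is usually read as a linear schedule: a group with coordinates g = (g1, g2) executes at time step t = Π·g. The sketch below evaluates that formula for the two vectors given on the slides. The function names and the rectangular group-space bounds u1 and u2 are hypothetical, chosen only to make the arithmetic concrete.

```c
#include <assert.h>

/* Time step of group (g1, g2) under the linear schedule pi: t = pi . g */
int time_step(const int pi[2], int g1, int g2)
{
    return pi[0] * g1 + pi[1] * g2;
}

/* Number of time steps needed to sweep the groups 0..u1 x 0..u2,
   assuming that both components of the schedule are non-negative. */
int schedule_length(const int pi[2], int u1, int u2)
{
    assert(pi[0] >= 0 && pi[1] >= 0 && u1 >= 0 && u2 >= 0);
    return time_step(pi, u1, u2) - time_step(pi, 0, 0) + 1;
}
```

For a hypothetical 4 x 2 group space (u1 = 3, u2 = 1), Π=(1,0) gives 4 time steps and Π=(1,1) gives 5. The step count alone does not settle which scheme finishes first, since the duration of a time step presumably differs between the non-overlapping and the overlapping scheme.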
17
Scheduling onto a Fixed Number of SMPs
  Dynamic Scheduling by the Operating System
    Run-time overhead from generating a large number of processes
    Context switching slows down the execution
  Static Scheduling at Compile Time
18
Scheduling onto a Fixed Number of SMPs
  Cyclic Assignment Schedule
  Mirror Assignment Schedule
  Cluster Assignment Schedule
  Retiling
19
Cyclic Assignment
[Figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; the sequence SMP0, SMP1 (each with CPU0, CPU1) repeats along the tile space]
20
Cyclic Assignment
[Figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; the sequence SMP0, SMP1 (each with CPU0, CPU1) repeats along the tile space; one repetition is labeled "chunk"]
21
Cyclic Assignment – Non Overlapping Communication
[Figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
22
Cyclic Assignment - Overlapping Communication
[Figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
23
Cyclic Assignment - Communication
[Figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; the sequence SMP0, SMP1 (each with CPU0, CPU1) repeats along the tile space; one repetition is labeled "chunk"]
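One way to read the cyclic assignment is as a round-robin mapping from tile rows to processors. The sketch below follows that reading; the Place type, the function name and the interpretation of a chunk as one run of P consecutive tile rows (P being the total number of CPUs) are assumptions, since the slides only show the pattern graphically.

```c
#include <assert.h>

/* Where a tile row is executed: which SMP node and which CPU inside it. */
typedef struct { int smp; int cpu; } Place;

/* Cyclic assignment sketch: tile row r goes to global processor r mod P,
   with P = num_smps * cpus_per_smp. Each run of P consecutive rows
   forms one chunk, and every chunk uses the same processor order. */
Place cyclic_place(int r, int num_smps, int cpus_per_smp)
{
    assert(r >= 0 && num_smps > 0 && cpus_per_smp > 0);
    int p = r % (num_smps * cpus_per_smp);
    Place where = { p / cpus_per_smp, p % cpus_per_smp };
    return where;
}
```

With 2 SMP nodes of 2 CPUs each, rows 0 to 3 go to SMP0/CPU0, SMP0/CPU1, SMP1/CPU0, SMP1/CPU1, and row 4 starts again at SMP0/CPU0.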
25
Mirror Assignment
[Figure: mirror assignment on 2 SMP nodes with 2 CPUs each; node order SMP0, SMP1, SMP1, SMP0 along the tile space, each node with CPU0 and CPU1; one repetition is labeled "chunk"]
26
Mirror Assignment – Non Overlapping Communication
[Figure: mirror assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
27
Mirror Assignment - Overlapping Communication
[Figure: mirror assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
28
Mirror Assignment - Communication
[Figure: mirror assignment on 2 SMP nodes with 2 CPUs each; node order SMP0, SMP1, SMP1, SMP0 along the tile space, each node with CPU0 and CPU1]
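The mirror pattern can be sketched as a cyclic mapping whose processor order is reversed in every second chunk. This is one possible reading of the figure, not a confirmed definition: the slides show the node order SMP0, SMP1, SMP1, SMP0, but whether the CPU order inside a node is reversed as well cannot be recovered from the transcript. The sketch reverses the whole processor order; the Place type and the function name are hypothetical.

```c
#include <assert.h>

/* Where a tile row is executed: which SMP node and which CPU inside it. */
typedef struct { int smp; int cpu; } Place;

/* Mirror assignment sketch: even chunks use processors 0..P-1 in order,
   odd chunks use them in reverse order (P-1..0).
   Assumption: the reversal applies to the whole processor order,
   including the CPU order inside each node. */
Place mirror_place(int r, int num_smps, int cpus_per_smp)
{
    assert(r >= 0 && num_smps > 0 && cpus_per_smp > 0);
    int P = num_smps * cpus_per_smp;
    int chunk = r / P;          /* which chunk the row belongs to */
    int pos = r % P;            /* position of the row inside its chunk */
    int p = (chunk % 2 == 0) ? pos : P - 1 - pos;
    Place where = { p / cpus_per_smp, p % cpus_per_smp };
    return where;
}
```

With 2 SMP nodes of 2 CPUs each, rows 3 and 4 both land on SMP1/CPU1, so the boundary between two chunks stays inside one processor.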
30
Cluster Assignment
[Figure: cluster assignment on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1; labels: tiles, “TILE”]
31
Cluster Assignment
[Figure: cluster assignment on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1; labels: TILES, GROUPS]
32
Cluster Assignment – Non Overlapping Communication
[Figure: cluster assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
33
Cluster Assignment – Overlapping Communication
[Figure: cluster assignment on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
34
Cluster Assignment - Communication
[Figure: cluster assignment on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1; labels: TILES, GROUPS]
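A plausible reading of the cluster assignment is a block mapping, in which each processor receives one contiguous run of neighbouring tile rows. The sketch below implements that reading; the Place type, the function name and the ceiling-based block size are assumptions, as the slides give no formula.

```c
#include <assert.h>

/* Where a tile row is executed: which SMP node and which CPU inside it. */
typedef struct { int smp; int cpu; } Place;

/* Cluster assignment sketch: the num_rows tile rows are split into P
   contiguous blocks, one per processor, with P = num_smps * cpus_per_smp.
   The block size is rounded up, so the last processors may get fewer rows. */
Place cluster_place(int r, int num_rows, int num_smps, int cpus_per_smp)
{
    assert(num_smps > 0 && cpus_per_smp > 0);
    assert(r >= 0 && r < num_rows);
    int P = num_smps * cpus_per_smp;
    int rows_per_proc = (num_rows + P - 1) / P;   /* ceiling of num_rows / P */
    int p = r / rows_per_proc;
    Place where = { p / cpus_per_smp, p % cpus_per_smp };
    return where;
}
```

With 8 tile rows on 2 SMP nodes of 2 CPUs each, rows 0 and 1 go to SMP0/CPU0, rows 2 and 3 to SMP0/CPU1, rows 4 and 5 to SMP1/CPU0, and rows 6 and 7 to SMP1/CPU1. Only the rows at block boundaries have a neighbour on another processor, which fits the small communication volume listed as an advantage of this schedule.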
36
Retiling
[Figure: retiling on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1; old tiles and new tiles]
37
Retiling
Retaining computation volume of a tile
[Figure: retiling on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1; old tiles and new tiles]
38
Retiling – Non Overlapping Communication
[Figure: retiling on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
39
Retiling – Overlapping Communication
[Figure: retiling on 2 SMP nodes with 2 CPUs each; CPU0 and CPU1 of SMP0 and SMP1 plotted against the time axis t]
40
Retiling - Communication
[Figure: retiling on 2 SMP nodes with 2 CPUs each; SMP0 and SMP1, each with CPU0 and CPU1]
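The slides state only that retiling retains the computation volume of a tile. One way this could work for rectangular tiles is sketched below: the tile is stretched along the mapping dimension until exactly one tile row remains per processor, and shrunk by the same factor along the other dimension. The Tile type, the names height and width, and the divisibility requirements are all assumptions made for the illustration.

```c
#include <assert.h>

/* A rectangular tile: height runs along the dimension that is mapped to
   processors, width along the other one (hypothetical naming). */
typedef struct { int height; int width; } Tile;

/* Retiling sketch: merge num_rows / num_procs old tile rows into one new
   row per processor and divide the width by the same factor, so that
   height * width (the computation volume of a tile) is unchanged.
   Assumption: both divisions are exact. */
Tile retile(Tile old, int num_rows, int num_procs)
{
    assert(num_procs > 0 && num_rows > 0 && num_rows % num_procs == 0);
    int factor = num_rows / num_procs;
    assert(old.width % factor == 0);
    Tile t = { old.height * factor, old.width / factor };
    return t;
}
```

For example, a 4 x 8 tile with 8 tile rows on 4 processors becomes an 8 x 4 tile: the volume stays at 32 iterations, but the shape changes, which is the cache-related drawback noted for this schedule.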
42
Experimental Platform
  Linux SMP (Symmetric Multi-Processors) Cluster
    2 nodes
    1GB RAM
    2 Pentium III 1266MHz
  Myrinet high performance interconnect
  GM low level message passing system
43
The Myrinet interconnect
  User-level Networking
    Based on the GM message passing interface
    All message exchange using DMA
    Directly to/from pinned userspace buffers
    Communication is offloaded to the NIC
  Programmable NIC
    LANai RISC processor @ 133-333MHz
    2-8MB SRAM
    2+2Gbps full duplex fiber links
44
GM Architecture
  Consists of three main parts
    User library
    Kernel driver
    Firmware on NIC
  OS bypass design
    Regions of NIC memory mapped to the VM of a process
[Figure: layers: Application and GM Library (User), GM kernel module (Kernel), GM firmware (NIC)]
45
Sending and Receiving messages over Myrinet/GM
[Figure: sending side: Sending application, Host, NIC, Send q, Buffer, Event q, Send DMA, Recv DMA, Host DMA, LANai; receiving side: Receiving application, Host, NIC, Recv q, Buffer, Event q, Send DMA, Recv DMA, Host DMA, LANai]
46
Initial Code
for (i = 1; i <= X; i++)
  for (j = 1; j <= Y; j++)
    for (k = 1; k <= Z; k++) {
      A[i][j][k] = func(A[i-1][j][k], A[i][j-1][k], A[i][j][k-1]);
    }
47
Experimental results
[Figure: two plots, Non Overlapping Execution Scheme and Overlapping Execution Scheme; x-axis: Height of Iteration Space (500 to 3500); y-axis: Speedup / # processors (0.5 to 1); curves: cyclic, mirror, cluster, retile]
48
Simulation results
[Figure: two plots, Overlapping Execution Scheme and Non Overlapping Execution Scheme; x-axis: Height of Iteration Space (0 to 20000); y-axis: Speedup / # processors (0.3 to 1); curves: cyclic, mirror, cluster, retile]
49
Simulation results
[Figure: two plots, Non Overlapping Execution Scheme and Overlapping Execution Scheme; x-axis: Height of Iteration Space (0 to 20000); y-axis: Speedup / # processors (0.3 to 1); curves: cyclic, mirror, cluster, retile]
50
Advantages - Disadvantages
  cyclic
    Advantages: fast pipeline filling
    Disadvantages: communication
  mirror
    Advantages: better communication than cyclic
    Disadvantages: idle time steps; worse communication than cluster, retile
  cluster
    Advantages: communication: 1) small volume of data to be transferred, 2) data combined in fewer messages
    Disadvantages: slow pipeline filling
  retile
    Advantages: fast pipeline filling; communication: small volume of data to be transferred
    Disadvantages: reorganizes tiles, which annuls the optimal tile shape for cache hits
51
The End
52
Cyclic Assignment - Overlapping Communication
Equivalent schedulings
[Figure: two diagrams of processors (P) against time (t): scheduling on a fixed number of processors (SMP0 and SMP1, each with CPU0 and CPU1) and scheduling on an unlimited number of processors; annotation: empty pipeline, waiting for the necessary data to become available]