Published by Amelia Hubbard. Modified over 9 years ago.
1
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2012
Authors: Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin, Guang R. Gao
Chih-Sheng Lin
2
Outline
Introduction
Background
▫DFBrook Stream Language
▫Architecture – Godson-T
Software Pipelining Scheduling with Resource Constraints
Experiments and Evaluation
Related Works
Conclusion
4
Multi-core Architectures
Multi-core architectures have become the mainstream solution and industry standard, from servers to desktop platforms and handheld devices
▫IBM’s Cell, Nvidia’s GPU, ICT’s Godson, MIT’s Raw
Multi-core processor
▫increases computational capability
▫pushes the performance burden onto the compiler and programmer, who must effectively exploit the coarse-grained parallelism across the cores
5
Stream Programming Model
The stream programming model is one approach to exploiting this coarse-grained parallelism
Stream languages
▫StreamIt, Brook, CUDA, SPUR and Cg
▫are motivated by applications in media processing domains
▫are based on synchronous dataflow (SDF) or regular stream flow graphs (RSFG)
6
Regular Stream Flow Graph (RSFG)
Node
▫a computation task (actor)
▫has an independent instruction stream and address space
▫fires repeatedly in a periodic schedule
Arc (Edge)
▫the communication (flow of data) between nodes
▫carried through a communication channel
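The periodic schedule of such a graph is usually derived from its balance equations: over one period, every arc must receive exactly as many items as are consumed from it. The following is a minimal sketch of that standard SDF calculation, not code from the paper; the function name, the edge tuple layout `(src, dst, produce, consume)` and the example rates are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(nodes, edges):
    """Smallest positive firing counts that balance every arc.

    edges: list of (src, dst, produce, consume) tuples (hypothetical layout).
    Assumes the graph is connected.
    """
    # Fix the first node's relative rate to 1 and propagate along arcs.
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, produce, consume in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produce / consume
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consume / produce
                changed = True
    # Every arc must satisfy rate[src] * produce == rate[dst] * consume.
    for src, dst, produce, consume in edges:
        if rate[src] * produce != rate[dst] * consume:
            raise ValueError("inconsistent rates: no periodic schedule exists")
    # Scale the fractional rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {n: int(r * scale) for n, r in rate.items()}


# A produces 2 items per firing, B consumes 3; B produces 1, C consumes 2.
example = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
print(example)  # {'A': 3, 'B': 2, 'C': 1}
```

In one period A fires 3 times, B twice and C once, which leaves every channel with the same number of items it started with.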
7
Software Pipelining
Software pipelining
▫an efficient method to exploit the coarse-grained parallelism in stream programs
▫treats the whole program as a loop and each periodic schedule as one iteration of that loop
Stream programs can be easily and naturally mapped to communication-exposed multi-core architectures
▫but the gains from parallel execution can be overshadowed by the cost of communication and synchronization
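To make the "program as a loop" view concrete, the sketch below lists which iteration of each node runs in each period once every node has been given a pipeline stage. It uses the usual software-pipelining convention that a node in stage s executes iteration t - s during period t; the stage values and the function name are assumptions for illustration only.

```python
def pipelined_periods(stage, iterations):
    """Return, for each period t, the (node, iteration) pairs that execute.

    stage: dict mapping node name to its pipeline stage (hypothetical input).
    A node in stage s runs iteration t - s during period t.
    """
    depth = max(stage.values())
    periods = []
    for t in range(iterations + depth):
        running = [
            (node, t - s)
            for node, s in sorted(stage.items())
            if 0 <= t - s < iterations
        ]
        periods.append(running)
    return periods


schedule = pipelined_periods({"A": 0, "B": 1, "C": 2}, iterations=3)
for t, running in enumerate(schedule):
    print(t, running)
```

Periods 0 and 1 form the prologue, period 2 is the steady state in which all three nodes work on different iterations at once, and periods 3 and 4 drain the pipeline.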
8
Software Pipelining (Cont.)
The performance metric of software pipelining
▫the initiation rate of successive iterations
Rate-optimal schedule
▫the schedule with the maximum initiation rate (minimum initiation interval)
Resource limitations
▫processor capability, the size of the memory at each PE, interconnect bandwidth and direct memory access (DMA)
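A simple lower bound on the initiation interval (II) follows from processor capability alone: the total work per iteration has to be shared among the PEs, and no single node can be split. The sketch below computes that bound under the assumption that each node runs on exactly one PE; it is a common textbook bound rather than the paper's formulation, and the work figures are invented.

```python
def resource_min_ii(work, num_pes):
    """Lower bound on the initiation interval from processor capacity.

    work: dict mapping node name to its execution time per iteration
    (hypothetical values). Assumes a node is never split across PEs.
    """
    total = sum(work.values())
    balanced = -(-total // num_pes)  # integer ceiling of total / num_pes
    return max(balanced, max(work.values()))


work = {"A": 4, "B": 6, "C": 2, "D": 4}
print(resource_min_ii(work, 2))  # 8: limited by total work 16 on 2 PEs
print(resource_min_ii(work, 4))  # 6: limited by the largest node, B
```

The initiation rate is the reciprocal of II, so a rate-optimal schedule is one that reaches the smallest feasible II.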
9
Goal
To orchestrate an efficient software pipelining schedule that achieves the optimal computation rate while minimizing the communication cost and satisfying the system's resource constraints
10
CMRO and ROMC
CMRO (Communication Minimized Rate-Optimal)
▫minimizes the communication cost at the optimal computation rate
▫formulated as a unified Integer Linear Programming (ILP) problem
ROMC (Rate-Optimal with Memory Constraints)
▫formulated as a unified integer quadratic programming problem
▫transformed into an ILP problem by using stage adjustment optimization
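The CMRO objective can be illustrated on a toy graph without an ILP solver. The sketch below swaps the ILP for exhaustive enumeration: it tries every node-to-PE placement, keeps the smallest initiation interval (taken here to be the heaviest PE load, which assumes pipelining has already decoupled the dependences), and breaks ties by the total cost of arcs that cross PEs. The graph, the costs and the function name are hypothetical, and brute force only works for very small inputs.

```python
from itertools import product


def brute_force_cmro(work, edges, num_pes):
    """Pick the placement with minimum II, then minimum communication.

    work:  dict node -> execution time per iteration (hypothetical).
    edges: list of (src, dst, cost) tuples; cost is paid only when the
           two endpoints sit on different PEs (assumption).
    Returns (ii, comm, placement).
    """
    nodes = sorted(work)
    best_key, best_pe = None, None
    for placement in product(range(num_pes), repeat=len(nodes)):
        pe = dict(zip(nodes, placement))
        load = [0] * num_pes
        for n in nodes:
            load[pe[n]] += work[n]
        ii = max(load)  # the heaviest PE bounds the initiation interval
        comm = sum(cost for u, v, cost in edges if pe[u] != pe[v])
        key = (ii, comm)  # lexicographic: rate first, then communication
        if best_key is None or key < best_key:
            best_key, best_pe = key, pe
    return best_key[0], best_key[1], best_pe


work = {"A": 2, "B": 2, "C": 2, "D": 2}
edges = [("A", "B", 5), ("B", "C", 1), ("C", "D", 5)]
ii, comm, pe = brute_force_cmro(work, edges, num_pes=2)
print(ii, comm, pe)  # 4 1 {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

Three different two-by-two splits reach II = 4 here, with communication costs 1, 10 and 11; a scheduler that only balances work could return any of them, whereas the communication-aware tie-break selects the cheapest.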
12
DFBrook Stream Language
DFBrook: an extension of Brook for SDF
13
Target Architecture – Godson-T
Communication-exposed multi-core platform
15
CMRO Schedule – Problem Definition
16
CMRO Schedule – Problem Definition (Cont.)
17
Example of Stream Graph and DDG
▫Stream Graph
▫Data Dependency Graph
18
CMRO Problem
19
Continued with the previous example
SGMS (Stream Graph Modulo Schedule)
▫does not take communication into account
20
Continued with the previous example
CMRO
21
ILP Formulation – Space
22
ILP Formulation – Space (Cont.)
23
ILP Formulation – Space (Cont.)
24
ILP Formulation – Space (Cont.)
25
ILP Formulation – Time
26
ILP Formulation – Time (Cont.)
27
ILP Formulation for CMRO Problem
28
Rate-Optimal Schedule with Memory Constraints (ROMC)
29
ROMC (Cont.)
Considerations
▫All the buffers used by an instance are allocated statically in the memory of the processor to which the instance is assigned
▫In the software pipelining schedule, multiple buffers are introduced to cover the stage distance between two connected instances
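The link between stage distance and memory use can be sketched as follows. The model is an assumption made for illustration, not the paper's buffer formula: an arc whose producer is in stage s_u and whose consumer is in stage s_v is given s_v - s_u + 1 buffer copies, and all copies are charged to the consumer's PE. All names and sizes are hypothetical.

```python
def buffer_bytes_per_pe(edges, stage, pe, num_pes):
    """Static buffer memory needed on each PE under a simplified model.

    edges: list of (src, dst, size_bytes) tuples (hypothetical layout).
    stage: dict node -> pipeline stage.
    pe:    dict node -> PE index.
    Assumption: an arc needs (stage distance + 1) copies of its buffer,
    all allocated on the PE that hosts the consumer.
    """
    usage = [0] * num_pes
    for u, v, size in edges:
        copies = stage[v] - stage[u] + 1
        if copies < 1:
            raise ValueError(f"consumer {v} is scheduled before producer {u}")
        usage[pe[v]] += copies * size
    return usage


edges = [("A", "B", 1024), ("B", "C", 2048)]
stage = {"A": 0, "B": 1, "C": 3}
pe = {"A": 0, "B": 0, "C": 1}
usage = buffer_bytes_per_pe(edges, stage, pe, num_pes=2)
print(usage)  # [2048, 6144]
print(all(u <= 16 * 1024 for u in usage))  # True: fits a 16 KB limit per PE
```

A larger stage distance hides more latency but multiplies the buffer copies, which is why the memory limit of each PE has to enter the scheduling problem.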
30
Example of Buffer Allocation Schemes
31
ROMC (Cont.)
32
Solving ROMC Problem
33
Solving ROMC Problem
34
Stage Assignment and Adjustment Optimization Process
35
Stage Assignment and Adjustment Optimization Process (Cont.)
Key: the stage of a DMA node can be adjusted to reduce the buffer usage of victim processors
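The effect of moving a DMA node between stages can be sketched with a deliberately simple model, which is an assumption rather than the paper's formulation: the producer's PE holds one output buffer per stage between the producer and the DMA node (inclusive), and the consumer's PE holds one input buffer per stage between the DMA node and the consumer (inclusive). Sliding the DMA stage then shifts memory pressure from one PE to the other. Function and parameter names are hypothetical.

```python
def feasible_dma_stages(sp, sc, size, free_producer, free_consumer):
    """List DMA stages that fit both PEs' remaining memory.

    sp, sc: stages of the producer and consumer (sp <= sc).
    size:   bytes per buffer copy.
    free_producer, free_consumer: bytes still available on each PE.
    Assumed model: output copies = sd - sp + 1 on the producer's PE,
    input copies = sc - sd + 1 on the consumer's PE.
    Returns a list of (dma_stage, output_bytes, input_bytes).
    """
    options = []
    for sd in range(sp, sc + 1):
        out_bytes = (sd - sp + 1) * size
        in_bytes = (sc - sd + 1) * size
        if out_bytes <= free_producer and in_bytes <= free_consumer:
            options.append((sd, out_bytes, in_bytes))
    return options


# The consumer's PE is the "victim": it has only 2 KB left.
print(feasible_dma_stages(sp=0, sc=4, size=1024,
                          free_producer=4096, free_consumer=2048))
# [(3, 4096, 2048)]
```

In this example only a late DMA stage fits, because it keeps most of the copies on the producer's PE, which still has room.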
36
Buffer Usage Calculation
The number of input buffers in each PE’s memory
37
Buffer Usage Calculation (Cont.)
The number of output buffers in each PE’s memory
38
Stage Adjustment Optimization
39
Stage Adjustment Optimization (Cont.)
40
Stage Adjustment Optimization (Cont.)
42
Experiment Infrastructure and Methodology
Scheduler
▫implemented in DFBrook to generate code for the software pipelining schedules
Experimental platform
▫Godson-T architecture simulator
Solving ILPs
▫the commercial solver CPLEX
43
Comparison
44
Comparison (Cont.)
45
ROMC Schedule Performance
Number of processors = 9
MinMem = 16 KB for all benchmarks
MaxMem = 512 KB for imgsmth, Gauss and aveMotion; 32 KB for others
46
ROMC vs Conservative Estimate Method (CEM)
*: both schedulers find a feasible solution
+: only ROMC finds a solution; the solution produced by CEM does not meet the memory constraints
47
Scalability (over single processor)
48
ROMC ILP Solving Time (in CPU seconds)
In 70% of the cases, the ROMC scheduler obtains an optimal solution in less than 6 minutes
49
CMRO ILP Solving Time
50
CMRO Performance Improvement
52
Related Works
Scheduling of stream graphs
▫Ptolemy: model of computation and scheduling on SDF
▫A Regular Stream Flow Graph (RSFG) can be statically scheduled at compile time
Stream compilation
▫Coarse-grained task, data and pipeline parallelism have been exploited for StreamIt on the Raw architecture
53
Related Works (Cont.)
Software pipelining is a well-known technique for loop optimization and has recently been used to schedule stream programs
▫LP formulation for the minimum buffer requirements of rate-optimal software pipelining of RSFGs
SGMS for StreamIt applications on multi-core architectures
▫focuses on balancing the work partition but does not consider the cost of communication
55
Conclusion
A unified ILP formulation that combines the requirements of rate-optimal software pipelining and minimum inter-core communication overhead
Consideration of memory constraints
Implementation on the DFBrook language and the Godson-T architecture
Good performance improvement compared with other schedules
56
Thanks for listening