Download presentation
Presentation is loading. Please wait.
Published byAusten Andrews Modified over 9 years ago
1
Pipelined and Parallel Computing Partition for 1 Hongtao Du AICIP Research Nov 3, 2005
2
2 Motivation Emergence of pipelined and parallel computing in multiprocessor systems. VLSI technologies with extensive parallel and pipelining computation capabilities have triggered the algorithm implementations directly on FPGA and ASIC. Performance impacts: –Number and type of computing resources –Interconnection and communication mechanisms –Partition among resources
3
3 Partitioning Definition: divide processes/data among computing resources such that critical processes / larger data sets are implemented on faster resources, while the less critical processes / smaller data sets implemented on slower and cheaper resources. Goal: optimizing the overall performance corresponding to resource constraints such as speed, power, and area. Spatial partitioning and temporal partitioning
4
4 Partition Scheme
5
5 Dependency Chain structure –Computation are divided into several stages, each of which perform one operation of the whole procedure. –Partition task: keep the processing loads of all stages roughly equal. –Serial computing and pipelined computing Tree structure –Independent branches are sent to multiple resources in homogenous or heterogeneous computing environments. Each resource performs operations at the same time. –Parallel computing and distributed computing
6
6 Combinations Chain-structured pipelined programs over chain- connected systems Multiple chain-structured parallel and pipelined programs over single-host multiple-satellite systems Multiple arbitrarily structured serial programs over single- host multiple-satellite system Single-tree structured parallel programs over single-host multiple identical satellites systems
7
7 Chain structure Partition principles –Challenge: –Load balance –Shorten the execution time of the bottleneck stage –Minimize overall computing time –Computing –Communication
8
8 Modeling – Layered Graph Definition –Node : sub-process – connects to –Number of node: m: process n:resource –Weight –Node: computing time –Edge: communication time
9
9 Minimum Bottleneck Path Pre-definitions: –Label L(i) for each node i to record the minimum, over all paths ending at i, of the maximum edge (bottleneck) on each path at any stage. –Label W(e) as the weight for the edge e connecting node a (above) and b (below). For each node b, replace L(b) by Complexity
10
10 Dynamic Programming Pre-definition – : time required to run process i on resource j – : amount of data to be transmitted from i to (i+1) – : transmission speed between j and (j+1) –The total time on j : (k to i are assigned to resource j) – : time consumed by bottleneck resource given the first i are assigned to the first j resources. – : pointer to the first process on the resource, given the first i are assigned to the first j resources. – Objective: minimize
11
11 Algorithm 1. Initialization where 2. First resource Let and for 3. Recursion (Bellman equation) where k is the smallest index holding the above equality
12
12 Continue … 4. Assignment –Assign processes to resource n –Set and decrease n by 1 –Iterate until all processes are assigned. Complexity –Step1: –Step2: –Step3: –Step4:
13
13 Modeling - Doubly Weighted Graph Definition: –Each edge e: – : sum weight – : bottleneck weight –Path (P): the processing flow
14
14 Sum-Bottleneck (SB) Path Pre-definition: –S(P): sum weight of path P –B(P): the bottleneck weight Algorithm: search for the optimal path between two nodes such that the sum-bottleneck (SB) weight is minimum. : sum-bottleneck weight of path P Optimal SB path has weight 10.
15
15 Optimal SB Path Algorithm Algorithm: bottleneck weight 1.Remove all edges with 2.Delete all remaining weights. DWG has been transformed into an ordinary weighted graph 3.If there is a path between s and t, then return the shortest path P between s and t. Using Dijkstra’s algorithm to find the shortest path P. Complexity:
16
16 Tree structure Current process depends only on their parent process. Independence between children processes permits the burden distribution from single resource to multi- resources.
17
17 Modeling – Directed Dual Graph Direction: from lower ordered node to higher
18
18 : execution time on host : execution time on slave (all slaves are similar) : inter-processor communication (communication between parent to grandparent)
19
19 Objective: Finding the optimal assignment from A to H. Algorithm: Optimal SB path Complexity analysis –Multi-graph: multiple edge connecting the same pair –Given m nodes, there are m edges
20
20 Driving force Data-driven –How to divide data sets into different sizes for multiple computing resources –How to coordinate data flows along different directions such that brings appropriate data to the suitable processors at the right time. Function-driven –How to perform different functions of one task on different computing resources at the same time.
21
21 Resource constraints Multi-processor –Homogenous system –Heterogeneous system Hardware/software (HW/SW) co-processing –Process scheduling VLSI –the communication time is ignorable –Capacity –RAM size for image processing –Computation complexity affects both design size and delay. Reducing computation complexity is necessary. –Decrease image size –Reduce RAM size, therefore matching FPGA capacity constraint –Reduce clock cycle numbers of sub-processes, and increase overall processing frequency
22
22 Analysis Tools DWG DAG (Directed Acyclic Graph) CDFG (Control and Data Flow Graph) Petri Net PN := (P, T, I, O, M) P - places; T – transitions; I - input arc connections between places and transitions; O - output arc connections between transitions and places; M – the number of tokens.
23
23 To be continued …… Thank you!
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.