Published by Helena Dean. Modified over 8 years ago.
1
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems
Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea Richa
Arizona State University
2
Agenda
Network Processor (NP) System
Resource Mapping and Scheduling Problem
Heuristic Approach
–Linear Programming and Randomized Rounding
Resource Contention Issue
–Detection and Elimination
Experimental Results
Summary and Future Work
3
Network Processor Systems
Programmable devices designed to process packets at wire speed
Non-homogeneous real-time systems
Comprise a mix of ASICs, programmable processors and on-chip interconnects
Optimized to support multiple applications such as IPv4, DiffServ, etc.
4
Resource Mapping and Scheduling Problem in NP
Given a set APP = {APP_1, APP_2, …, APP_k} of applications, each specified by a DAG, where each application APP_j has a set of constraints (e.g. timing constraints, area constraints), find the mapping that minimizes the system cost in dollars while satisfying all the design constraints
Assumes only one application is active at any given time
5
System Specification
Possible Task-to-Resource Mappings
Several algorithms may be available for execution of a task
Associated with each resource are cost and area parameters
There may be multiple instances of a resource
6
Integer Linear Programming (ILP) Formulation
Objective:
–Find a task-to-resource mapping with minimum cost
Constraints:
–Board area constraint
–Timing constraint
–Unique task constraint
–Exclusive resource constraint
–Communication delay constraint
–Task-to-resource mapping constraint
–Task dependency constraint
Example design problem with 3 flows:
–800 variables
–2000 constraints
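To make the objective and two of the constraints concrete, here is a minimal sketch that replaces the ILP with brute-force enumeration on a toy instance. It is not the formulation from the slides: the task names, resource names, costs, areas and execution times are all hypothetical, the task graph is simplified to a chain, and only the area and timing constraints are modeled.

```python
from itertools import product

# Hypothetical toy data: task -> {resource: execution time in ns}
EXEC_TIME = {
    "classify": {"cpu": 40, "asic": 10},
    "lookup": {"cpu": 60, "asic": 15},
    "forward": {"cpu": 20},
}
# Hypothetical resource parameters: resource -> (dollar cost, board area)
RESOURCES = {"cpu": (5, 2), "asic": (20, 3)}


def cheapest_mapping(deadline_ns, max_area):
    """Try every task-to-resource mapping of the chain; return the cheapest feasible one.

    Returns (cost, mapping) or None when no mapping meets both constraints.
    """
    tasks = list(EXEC_TIME)
    best = None
    # Each task picks one of the resources able to execute it.
    for choice in product(*(EXEC_TIME[t] for t in tasks)):
        used = set(choice)  # a resource is paid for once, however many tasks use it
        cost = sum(RESOURCES[r][0] for r in used)
        area = sum(RESOURCES[r][1] for r in used)
        latency = sum(EXEC_TIME[t][r] for t, r in zip(tasks, choice))
        if latency <= deadline_ns and area <= max_area:
            if best is None or cost < best[0]:
                best = (cost, dict(zip(tasks, choice)))
    return best
```

Enumeration grows exponentially with the number of tasks, which is why a problem with hundreds of variables is handed to an ILP solver or a heuristic instead.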
7
Heuristic Approach: Randomized Rounding
Based on the Linear Programming solution
Traditional evolutionary algorithms, e.g. Genetic Algorithms and Simulated Annealing, require a set of feasible solutions as a starting point
–Hard to obtain an initial feasible set due to the conflicting constraints (area, time) in the problem
8
Randomized Rounding
Relax the integrality constraints of the ILP and solve the LP
Fractional values of the binary variables are used as probabilities for rounding them to either 0 or 1
Variable Randomized Rounding
–Randomly select variables from a set of randomly chosen constraints
–Round the selected variables
Iterative rounding in case of constraint violation
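The basic rounding step can be sketched as follows. This is a simplified illustration, not the variant on the slides: it rounds every variable at once and retries from scratch on a violation, whereas the slides select variables through randomly chosen constraints. The function names, the `is_feasible` callback and the retry limit are assumptions made for the example.

```python
import random


def randomized_round(fractional, rng):
    """Round each LP value x in [0, 1] to 1 with probability x, otherwise to 0."""
    return {name: 1 if rng.random() < x else 0 for name, x in fractional.items()}


def round_until_feasible(fractional, is_feasible, max_tries=1000, seed=0):
    """Repeat the rounding until the (hypothetical) constraint check passes.

    fractional: dict mapping variable name -> value from the relaxed LP.
    is_feasible: caller-supplied function that checks a 0/1 assignment.
    Returns a feasible 0/1 assignment, or None if max_tries is exhausted.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = randomized_round(fractional, rng)
        if is_feasible(candidate):
            return candidate
    return None
```

Variables that the LP already set to exactly 0 or 1 keep their value under this rule, because the rounding probability is then 0 or 1.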
9
Randomized Rounding (cont.)
Fixing Variables
–Reduces the number of variables to be rounded
–Fix the variables that take integer values after solving the LP
–Iteratively solve the LP until the number of integer variables stops increasing
Grouping Variables
–Assign priority based on the variable's group affiliation
10
Randomized Rounding (cont.)
Rollback Point Selection
–Roll back only to the last group where a constraint violation occurred
Rounding Step Size
–Round one variable or several at a time?
11
Randomized Rounding Results
Near-optimal solution in a fraction of the ILP solution time
12
Exploration of Solution Space
If the deadline constraint is too strict, the ILP may have no feasible solution for the existing set of resources.
If the deadline is too relaxed, a feasible solution is obtained, but with an increased chance of resource contention.
The solution space is explored using binary search to find a least-cost feasible solution without any resource contention.
13
Improvement of Solution
A relaxed deadline for packet processing helps reduce the system cost in dollars.
Packet latency increases while the line speed is still satisfied.
This approach allows multiple packets to be inside the system simultaneously (packet-level parallelism).
Resource contention may occur if more than one packet tries to access the same resource at the same instant for two different tasks.
14
Resource Contention
Example:
–Line rate = 10 Gbps, packet size = 64 bytes
–No packet gap
–A packet arrives about every 51 ns
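The arrival interval in this example follows from the line rate and packet size: 64 bytes is 512 bits, and 512 bits at 10 Gbps take 51.2 ns. The small helper below reproduces that arithmetic; the function name and the optional `gap_bytes` parameter are illustrative additions, not part of the slides.

```python
def inter_arrival_ns(line_rate_bps, packet_bytes, gap_bytes=0):
    """Time between back-to-back packet arrivals, in nanoseconds.

    gap_bytes is a hypothetical parameter for an inter-packet gap;
    the example on the slide assumes no gap, so it defaults to 0.
    """
    bits_on_wire = (packet_bytes + gap_bytes) * 8
    return bits_on_wire / line_rate_bps * 1e9


# 64-byte packets on a 10 Gbps link with no gap: roughly 51.2 ns apart
interval = inter_arrival_ns(10e9, 64)
```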
15
Resource Contention Detection
Packet Flow Graph (PFG)
–A visual depiction of the flow of packets through the various resources inside the NP system
–G(V, E): V is the set of resources allocated by the ILP, with additional entry and exit nodes, s and t, respectively.
–Edge e = (u, v) ∈ E if resources u and v are allocated sequentially.
–A weight w(e) = (x(e), y(e)) is associated with each edge e, where x(e) is the allocation sequence of the resource and y(e) is the execution time on that sequence.
16
Resource Contention Detection
Resource Cycle Time
–Calculated in the PFG
–Defined as the maximum time span for which a resource is busy executing the set of tasks for a packet.
–A resource is not available until it finishes all the tasks scheduled on it for a packet
Maximum Cycle Time
–Defined as the maximum of all resource cycle times.
Resource contention is detected if the maximum cycle time is greater than the interval between packet arrivals.
A Gantt chart is used to detect resource contention among multiple paths in a task graph
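For a single path, the check can be sketched as below. The sketch rests on one reading of the definition: a resource's cycle time is taken as the span from the start of its first task to the end of its last task for one packet, since the resource is held until all of its tasks for that packet finish. The path representation and the resource names in the usage lines are hypothetical, and the multi-path (Gantt chart) case is not covered.

```python
def resource_cycle_times(path):
    """Cycle time per resource for one packet on a single path.

    path: list of (resource, exec_time_ns) pairs in allocation order.
    Assumes tasks run back to back with no communication delay.
    """
    first_start, last_end = {}, {}
    clock = 0
    for resource, exec_time in path:
        first_start.setdefault(resource, clock)  # keep the earliest start only
        clock += exec_time
        last_end[resource] = clock  # overwritten until the final task on it
    return {r: last_end[r] - first_start[r] for r in first_start}


def has_contention(path, inter_arrival_ns):
    """True when the maximum cycle time exceeds the packet arrival interval."""
    max_cycle = max(resource_cycle_times(path).values(), default=0)
    return max_cycle > inter_arrival_ns


# Hypothetical path: resource "me0" is used twice, with "sram" in between.
example_path = [("me0", 20), ("sram", 15), ("me0", 10)]
cycles = resource_cycle_times(example_path)
```

In the hypothetical path, "me0" is held from time 0 to 45 ns, so its cycle time is 45 ns even though it computes for only 30 ns.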
17
Resource Contention (Single Path) Example:
18
Resource Contention (Multiple Paths)
19
Resource Contention Elimination
Binary search is used to speed up the iterative exploration of the solution space.
The solution found by the ILP is checked for resource contention.
–If there is no resource contention, no more work is needed.
–Otherwise, search iteratively for the least-cost feasible solution.
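One way to picture the search is as a binary search over the deadline. The sketch below is an assumption-laden illustration rather than the algorithm from the slides: `solve` is a hypothetical wrapper around the ILP or LP-rounding solver, deadlines are integer nanoseconds, the lower bound is assumed to give a contention-free solution, contention is assumed to persist once it appears, and cost is assumed not to increase as the deadline relaxes, so the largest contention-free deadline gives the least cost.

```python
def relax_deadline(solve, lo, hi):
    """Find the largest deadline in [lo, hi] whose solution has no contention.

    solve(deadline) -> (cost, has_contention) is a hypothetical solver wrapper.
    Assumes solve(lo) is contention-free and that contention is monotone
    in the deadline. Returns (deadline, cost).
    """
    best = (lo, solve(lo)[0])
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        cost, contention = solve(mid)
        if contention:
            hi = mid - 1  # too relaxed: packets collide on a resource
        else:
            best = (mid, cost)
            lo = mid  # safe: try relaxing further
    return best


# Toy stand-in for the solver: cost falls as the deadline relaxes,
# and contention appears beyond 70 ns (both numbers are made up).
def toy_solve(deadline):
    return (max(5, 100 - deadline), deadline > 70)


deadline, cost = relax_deadline(toy_solve, 51, 120)
```

Each probe costs one solver run, so the number of runs grows with the logarithm of the deadline range rather than with its length.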
20
Resource Contention Elimination
d is the arrival rate of the packets and l is the maximum diameter of the flow graphs
21
Experimental Settings
Codesign method applied to a Packet Processing System similar to the Intel IXP2400 network processor
–Resource set derived from the Intel IXP2400 architecture
–Application set derived from the standard benchmarking applications defined by the Network Processing Forum, for which a mapping is available from Intel
Compared the performance of the mapping generated by our approach with the standard mapping specified by Intel as part of the IXA Application Framework
22
Performance Metrics
End-to-end Packet Latency
–The time interval starting when the first bit of a packet enters the input port and ending when the first bit of the packet reaches the output port
Throughput
–The number of data bits transferred per unit time
–Measured at 0% packet loss while varying packet size
Resource Utilization
–The ratio of the time a resource was active to the total measurement time
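The three metrics reduce to simple ratios and differences. The helpers below are a sketch of those definitions only; the function and parameter names are illustrative, and how the timestamps and counters are collected in the actual experiments is not stated on the slides.

```python
def packet_latency(first_bit_in, first_bit_out):
    """End-to-end latency: first bit at the output port minus first bit at the input port."""
    return first_bit_out - first_bit_in


def throughput_bps(bits_transferred, seconds):
    """Data bits transferred per second over the measurement window."""
    return bits_transferred / seconds


def utilization(active_time, total_time):
    """Fraction of the measurement time during which a resource was active."""
    return active_time / total_time
```

For instance, a resource that is active for 30 ns out of every 120 ns has a utilization of 0.25.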
23
Input Task Graphs
24
Experimental Parameters Input:
25
Experimental Results Output:
26
Experimental Results
28
Conclusion and Future Work
Codesign framework for PPSs that considers multiple flows and real-time constraints
The iterative improvement scheme introduces packet-level parallelism into the system
For the task graphs of the benchmark applications, the method produces a solution in a short time and shows performance metrics comparable to existing PPSs
The framework can be extended with:
–An object-oriented or modeling language for specification
–Effects of caching and multithreading
–Dynamic analysis for workload characterization
29
Thank You
Questions?