A REVISIT TO THE PRIMAL-DUAL BASED CLOCK SKEW SCHEDULING ALGORITHM
Min Ni and Seda Ogrenci Memik
EECS Department, Northwestern University
AGENDA
- Introduction
- Related Work
- The Primal-Dual Algorithm
  - The existing primal-dual approach
  - Our enhanced implementation
- Experimental Results
- Conclusion
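INTRODUCTION
The problem of clock skew scheduling: the timing constraints between registers are modeled as a constraint graph, and the objective is to minimize the clock period P.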
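A minimal Python sketch of such a constraint graph, assuming the common model with zero setup and hold times: for a combinational path from register i to register j with maximum delay d_max and minimum delay d_min, the setup constraint L_i + d_max <= L_j + P and the hold constraint L_i + d_min >= L_j become difference constraints L_v - L_u <= c + t * P, each stored as an edge u -> v. Here L_i is the clock arrival time at register i, and the names `Edge`, `build_constraint_graph`, and `is_feasible` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Edge(NamedTuple):
    """Difference constraint L[v] - L[u] <= c + t * P, stored as edge u -> v."""
    u: int
    v: int
    c: float  # constant part of the edge weight
    t: int    # 1 for setup (period-dependent) edges, 0 for hold edges

def build_constraint_graph(paths: List[Tuple[int, int, float, float]]) -> List[Edge]:
    """paths holds (i, j, d_max, d_min) for each combinational path from register i to j."""
    edges: List[Edge] = []
    for i, j, d_max, d_min in paths:
        # Setup: L[i] + d_max <= L[j] + P  ->  L[i] - L[j] <= P - d_max  (edge j -> i)
        edges.append(Edge(j, i, -d_max, 1))
        # Hold:  L[i] + d_min >= L[j]      ->  L[j] - L[i] <= d_min      (edge i -> j)
        edges.append(Edge(i, j, d_min, 0))
    return edges

def is_feasible(L: List[float], P: float, edges: List[Edge]) -> bool:
    """True if the arrival times L satisfy every constraint at clock period P."""
    return all(L[e.v] - L[e.u] <= e.c + e.t * P for e in edges)
```

For two registers with paths 0 -> 1 (d_max = 5, d_min = 2) and 1 -> 0 (d_max = 3, d_min = 1), zero skew needs P = 5, while the skews L = [0, 1] already meet every constraint at P = 4.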
RELATED WORK
Existing approaches for solving clock skew scheduling:
- Linear programming
- Binary search with iterative shortest path computations: O(|V||E| log(C/n))
- Primal-dual based algorithm (Burns): O(|V|^2|E|)
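The binary-search approach can be sketched as follows, assuming the edge model L_v - L_u <= c + t * P (t = 1 for setup edges): a period P is feasible exactly when the constraint graph with weights c + t * P has no negative cycle, which Bellman-Ford detects. This is an illustrative sketch with hypothetical names; it assumes `hi` is a feasible period and stops at a floating-point tolerance `tol`.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float, int]  # (u, v, c, t): L[v] - L[u] <= c + t * P

def has_negative_cycle(n: int, edges: List[Edge], P: float, eps: float = 1e-12) -> bool:
    """Bellman-Ford from a virtual source tied to every vertex by a 0-weight edge."""
    dist = [0.0] * n
    for _ in range(n):        # without a negative cycle, some pass among the first n changes nothing
        changed = False
        for u, v, c, t in edges:
            w = c + t * P     # edge weight at this clock period
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False      # converged: valid arrival times exist, P is feasible
    return True               # still relaxing after n passes: negative cycle

def min_period_binary_search(n: int, edges: List[Edge], lo: float, hi: float,
                             tol: float = 1e-6) -> float:
    """Smallest feasible P within tol, assuming hi is feasible."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            lo = mid          # infeasible: the period must grow
        else:
            hi = mid          # feasible: try a shorter period
    return hi
```

With edges `[(1, 0, -5, 1), (0, 1, 2, 0), (0, 1, -3, 1), (1, 0, 1, 0)]`, `lo = 0`, and `hi = 10`, the search converges to a value within `tol` of 4.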
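THE PRIMAL-DUAL APPROACH
Theory of the primal-dual algorithm: by the complementary slackness theorem, a feasible solution of the PRIMAL and a feasible solution of the DUAL are both optimal if and only if certain conditions (the complementary slackness conditions) are met. The method therefore starts from a feasible solution of the PRIMAL and searches for a feasible solution of the DUAL that satisfies these conditions.
Primal variables: the clock arrival times L_i and the clock period P. Dual variables: one per timing constraint.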
PRIMAL-DUAL APPROACH
The complementary slackness conditions. General form: each dual variable times the slack of its primal constraint equals zero. Starting from a feasible solution {L_i, P}, if we can also find a feasible solution in the dual variables to the resulting system of linear equations, that solution is optimal. If a dual variable is > 0, the slack of its constraint must be zero; edges whose constraints have zero slack are called admissible edges.
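Assuming each constraint-graph edge e = (u, v) encodes L_v - L_u <= c_e + t_e P (t_e = 1 for setup edges, 0 for hold edges), a standard LP derivation gives the primal-dual pair and its complementary slackness (CS) conditions below; this is a sketch under that assumption, and δ⁺(x), δ⁻(x) denote the edges leaving and entering vertex x.

```latex
\[
\begin{aligned}
&\textbf{PRIMAL:} && \min_{L,\,P}\; P \quad \text{s.t.}\quad L_v - L_u - t_e P \le c_e \quad \forall e=(u,v)\in E\\
&\textbf{DUAL:} && \max_{f \ge 0}\; -\sum_{e\in E} c_e f_e \quad \text{s.t.}\quad
  \sum_{e\in\delta^{+}(x)} f_e - \sum_{e\in\delta^{-}(x)} f_e = 0 \;\;\forall x\in V,
  \qquad \sum_{e\in E} t_e f_e = 1\\
&\textbf{CS:} && f_e > 0 \;\Longrightarrow\; s_e := c_e + t_e P - (L_v - L_u) = 0
\end{aligned}
\]
```

The dual asks for a nonnegative circulation f, which can be positive only on admissible edges (s_e = 0). For a tight cycle with Σ c_e = -8 and Σ t_e = 2, setting f_e = 1/2 on each cycle edge satisfies both dual constraints with objective 4, so by weak duality P = 4 is optimal whenever it is primal feasible.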
RESTRICTED DUAL PROBLEM
Solve the system of linear equations on admissible edges only. This is equivalent to solving a restricted dual problem: if its minimum is 0, we are done. However, it is still not straightforward to solve, because it is posed over the dual variables.
RESTRICTED PRIMAL PROBLEM
Instead, check the restricted primal problem. It can be proved that this problem has optimal value 0 if there exists a cycle in the admissible graph G_a (consisting of admissible edges only).
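Checking the restricted primal therefore reduces to finding a cycle among the zero-slack edges. A minimal Python sketch, assuming edge tuples (u, v, c, t) for L_v - L_u <= c + t * P and a tolerance `tol` for deciding that a slack is zero; `admissible_edges` and `find_cycle` are hypothetical names, and the cycle search is an ordinary iterative depth-first search with vertex colors.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float, int]  # (u, v, c, t): L[v] - L[u] <= c + t * P

def admissible_edges(L: List[float], P: float, edges: List[Edge], tol: float = 1e-9) -> List[Edge]:
    """Edges whose constraint is tight (zero slack) at the current (L, P)."""
    return [e for e in edges if abs(e[2] + e[3] * P - (L[e[1]] - L[e[0]])) <= tol]

def find_cycle(n: int, adm: List[Edge]) -> Optional[List[int]]:
    """Vertices of one directed cycle in the admissible graph G_a, or None if G_a is acyclic."""
    succ: List[List[int]] = [[] for _ in range(n)]
    for u, v, _, _ in adm:
        succ[u].append(v)
    color = [0] * n                    # 0 = unvisited, 1 = on the DFS stack, 2 = finished
    parent = [-1] * n
    for root in range(n):
        if color[root]:
            continue
        color[root] = 1
        stack = [(root, iter(succ[root]))]
        while stack:
            x, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:            # all successors explored
                color[x] = 2
                stack.pop()
            elif color[nxt] == 0:      # tree edge: descend
                color[nxt] = 1
                parent[nxt] = x
                stack.append((nxt, iter(succ[nxt])))
            elif color[nxt] == 1:      # back edge x -> nxt closes a cycle
                cycle = [x]
                while cycle[-1] != nxt:
                    cycle.append(parent[cycle[-1]])
                return cycle[::-1]
    return None
```

At P = 4 with L = [-1, 0], the two-register edges `[(1, 0, -5, 1), (0, 1, 2, 0), (0, 1, -3, 1), (1, 0, 1, 0)]` have exactly two admissible edges, and they form the cycle [0, 1].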
PRIMAL-DUAL ALGORITHM
Starting from an empty admissible graph, incrementally reduce the clock period P until a cycle emerges in the admissible graph. Reducing P makes more edges admissible, and these are inserted into the admissible graph G_a. The while loop has two main tasks:
1. Find THETA (the amount by which P can be reduced before the next edge becomes admissible);
2. Maintain G_a.
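The loop can be sketched in Python under these assumptions: edges (u, v, c, t) encode L_v - L_u <= c + t * P, constants are integers or fractions, hold constants are nonnegative, and the starting period P0 makes L = 0 feasible (for example, P0 at least the largest path delay). With G_a kept as a forest, each vertex x gets a count k[x] of setup edges on its tree path from the root; when P drops by θ, L[x] drops by θ * k[x] so that tree edges stay tight, and a non-tree edge's slack shrinks at rate r = t + k[u] - k[v]. THETA is then the minimum of slack / r over edges with r > 0. This sketch (hypothetical names, exact `Fraction` arithmetic) scans every edge in each iteration; it illustrates the loop and is not the authors' code.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Edge = Tuple[int, int, int, int]  # (u, v, c, t): L[v] - L[u] <= c + t * P

def primal_dual_min_period(n: int, edges: List[Edge], P0) -> Optional[Fraction]:
    """Reduce P from a feasible P0 (with all L = 0) until the admissible graph has a cycle."""
    P = Fraction(P0)
    L = [Fraction(0)] * n
    parent_edge: List[Optional[int]] = [None] * n   # index of the tree edge entering each vertex

    def tree_counts() -> List[int]:
        # k[x] = number of setup edges (t = 1) on the forest path from x's root down to x.
        k = [-1] * n
        for start in range(n):
            path, x = [], start
            while x is not None and k[x] < 0:       # climb until a root or a known vertex
                path.append(x)
                e = parent_edge[x]
                x = None if e is None else edges[e][0]
            base = 0 if x is None else k[x]
            for y in reversed(path):                # fill counts top-down
                e = parent_edge[y]
                base = 0 if e is None else base + edges[e][3]
                k[y] = base
        return k

    while True:
        k = tree_counts()
        best: Optional[Tuple[Fraction, int]] = None  # (theta, edge index)
        for idx, (u, v, c, t) in enumerate(edges):
            if parent_edge[v] == idx:
                continue                             # tree edges stay tight (rate 0)
            r = t + k[u] - k[v]                      # slack lost per unit decrease of P
            if r > 0:
                theta = (c + t * P - (L[v] - L[u])) / r
                if best is None or theta < best[0]:
                    best = (theta, idx)
        if best is None:
            return None                              # no edge ever becomes tight
        theta, idx = best
        P -= theta                                   # task 1: step by THETA
        L = [L[x] - theta * k[x] for x in range(n)]
        u, v = edges[idx][0], edges[idx][1]
        x: Optional[int] = u                         # task 2: maintain G_a
        while x is not None and x != v:              # is v an ancestor of u (or u itself)?
            e = parent_edge[x]
            x = None if e is None else edges[e][0]
        if x == v:
            return P                                 # tight cycle: minimum clock period reached
        parent_edge[v] = idx                         # v's previous tree edge becomes non-admissible
```

For the two-register edges `[(1, 0, -5, 1), (0, 1, 2, 0), (0, 1, -3, 1), (1, 0, 1, 0)]` and P0 = 10, the loop takes two iterations (θ = 5, then θ = 1) and returns P = 4.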
PRIMAL-DUAL: BURNS' IMPLEMENTATION
Different strategies for maintaining the admissible graph G_a and updating the theta values lead to different efficiency.
AN EXAMPLE
Five iterations find the minimum clock period P by updating the admissible graph and the theta value in each iteration. (Figure: for each iteration, the edge that becomes admissible, the theta value, and the resulting skews.)
ENHANCED IMPLEMENTATION
Two major sources of overhead in the existing implementation:
- Scanning all |E| edges of the graph to rebuild the admissible graph G_a from scratch in each iteration;
- Calculating theta values for all |E| edges of the graph and finding the minimum one.
MAINTAINING THE ADMISSIBLE GRAPH
Theorem: if exactly one minimum-theta edge (i, j) is added to the admissible graph G_a in each iteration, then G_a remains a forest until a cycle is generated.
- Add the new admissible edge and remove the edges that become non-admissible;
- No need to call a negative cycle detection routine; a parent list is maintained instead;
- The complexity of this step is O(|V|), compared with O(|E|) for the same step in Burns' implementation.
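The parent-list idea can be illustrated with a small hypothetical helper class `AdmissibleForest`, assuming the forest property of the theorem: attaching admissible edge (u, v) sets v's parent to u, which implicitly drops v's previous (now non-admissible) tree edge, and a cycle emerges exactly when v is u itself or one of u's ancestors. A walk of at most |V| parent links detects that case.

```python
from typing import List, Optional

class AdmissibleForest:
    """Admissible graph G_a kept as a forest through a parent list (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.parent: List[Optional[int]] = [None] * n   # tree parent of each vertex, None for a root

    def closes_cycle(self, u: int, v: int) -> bool:
        """True if edge (u, v) would close a cycle, i.e. v is u or an ancestor of u."""
        x: Optional[int] = u
        while x is not None:        # at most |V| steps in a forest
            if x == v:
                return True
            x = self.parent[x]
        return False

    def add_edge(self, u: int, v: int) -> bool:
        """Insert admissible edge (u, v); return True if a cycle emerged (the loop then stops)."""
        if self.closes_cycle(u, v):
            return True
        self.parent[v] = u          # replaces v's old tree edge, which is no longer admissible
        return False
```

Adding (1, 0), (2, 1), and then (0, 2) on three vertices reports a cycle on the third call, because 2 is an ancestor of 0.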
EFFICIENT CALCULATION OF THETA
- Similar to Dijkstra's shortest path algorithm, which maintains a set of edges as candidates for shortest-path-tree edges, our problem needs the minimum-theta edge to add into G_a;
- In Burns' implementation, all edges are scanned in each iteration and theta values are recalculated for all of them;
- We maintain a much smaller set of candidates in a heap, and theta values are recalculated for only a subset of this candidate set;
- Maintaining the heap costs O(log |V|).
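One way to realize a heap of candidates is sketched below under stated assumptions; the slides do not spell out the authors' candidate set, so this is an illustration rather than their method. While a vertex's count k[x] is unchanged, its arrival time is affine in P (L[x] = a[x] + k[x] * P), so each edge's slack is affine in P and the period at which it turns tight stays fixed. The heap stores that period per edge, and after each attachment only edges incident to the re-parented subtree get new keys; outdated entries are skipped through version stamps. The sketch keeps one key per edge, starts from L = 0 with P effectively unbounded, assumes nonnegative hold constants, and uses hypothetical names.

```python
import heapq
from fractions import Fraction
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int, int, int]  # (u, v, c, t): L[v] - L[u] <= c + t * P

def primal_dual_heap(n: int, edges: List[Edge]) -> Optional[Fraction]:
    a = [Fraction(0)] * n                        # L[x] = a[x] + k[x] * P; L = 0 at the start
    k = [0] * n                                  # setup edges on the forest path from the root
    parent: List[Optional[int]] = [None] * n
    children: List[Set[int]] = [set() for _ in range(n)]
    incident: List[List[int]] = [[] for _ in range(n)]
    for idx, (u, v, _, _) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    version = [0] * len(edges)
    heap: List[Tuple[Fraction, int, int]] = []   # (-P at which edge turns tight, version, edge)

    def push(idx: int) -> None:
        """(Re)compute one edge's key; older heap entries for it become stale."""
        u, v, c, t = edges[idx]
        version[idx] += 1
        r = t + k[u] - k[v]                      # slack = (c + a[u] - a[v]) + r * P
        if r > 0:                                # slack shrinks as P decreases
            p_tight = -(c + a[u] - a[v]) / r
            heapq.heappush(heap, (-p_tight, version[idx], idx))

    for idx in range(len(edges)):
        push(idx)
    while heap:
        neg_p, ver, idx = heapq.heappop(heap)
        if ver != version[idx]:
            continue                             # lazily discard an outdated key
        P = -neg_p                               # largest period at which some edge turns tight
        u, v, _, t = edges[idx]
        subtree, stack = [], [v]
        while stack:                             # collect v and its descendants
            x = stack.pop()
            subtree.append(x)
            stack.extend(children[x])
        if u in set(subtree):
            return P                             # v is an ancestor of u: a tight cycle emerged
        delta = t + k[u] - k[v]
        for x in subtree:                        # keep L continuous at P while k grows by delta
            L_x = a[x] + k[x] * P
            k[x] += delta
            a[x] = L_x - k[x] * P
        if parent[v] is not None:
            children[parent[v]].discard(v)       # old tree edge into v becomes non-admissible
        parent[v] = u
        children[u].add(v)
        for e in {e for x in subtree for e in incident[x]}:
            push(e)                              # only edges touching the moved subtree change
    return None
```

On the three-register ring with edges (1, 0, -6, 1), (2, 1, -4, 1), (0, 2, -2, 1), (0, 1, 1, 0), (1, 2, 1, 0), (2, 0, 1, 0), it returns P = 5: the hold edge from register 0 to register 1 closes a cycle with the setup edge of delay 6 before the three-edge setup cycle (average 4) becomes tight.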
ASYMPTOTIC RUNTIME IMPROVEMENT
Our implementation improves on the O(|V|^2|E|) asymptotic runtime of Burns' implementation. The improvement is very similar to the one from the Bellman-Ford algorithm (O(|V||E|)) to Dijkstra's algorithm (O((|V| + |E|) log |V|) with a binary heap) for the shortest path problem.
EXPERIMENTAL SETUP
- Benchmark circuits: ISCAS89 (large circuits) and ITC99
- Delay data: circuits resynthesized in Synopsys Design Compiler (VHDL); delays exported from Standard Delay Format (SDF) files
- Comparison between Burns' implementation and ours uses the same graph data structure, the same graph-manipulation subroutines, and the same routine for calculating theta values
EXPERIMENTAL RESULTS
CONCLUSIONS
- A much more efficient implementation of the primal-dual based algorithm, improving the runtime efficiency of Burns' implementation
- Superior in both theoretical and practical runtime efficiency
- On average a 95X speedup on 20 test circuits