1
All-Pairs Shortest Paths – CSc 8530 (Dr. Prasad) – Jon A. Preston – March 17, 2004
2
Outline: Review of graph theory – Problem definition – Sequential algorithms – Properties of interest – Parallel algorithm – Analysis – Recent research – References
3
Graph Terminology G = (V, E), W = weight matrix –w_ij = weight/length of edge (v_i, v_j) –w_ij = ∞ if v_i and v_j are not connected by an edge –w_ii = 0 Assume W has positive, zero, and negative values For this problem, we cannot have a negative-sum cycle in G
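As a concrete illustration of these conventions, here is a minimal Python sketch (the helper name weight_matrix and the tiny example graph are ours, not from the slides) that builds W from an edge list, using infinity for missing edges and 0 on the diagonal:

```python
import math

def weight_matrix(n, edges):
    """Build the n x n weight matrix W from a list of (i, j, w) edges.
    Missing edges get infinity; the diagonal is 0 (w_ii = 0)."""
    W = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0
    for i, j, w in edges:
        W[i][j] = w          # for an undirected graph, also set W[j][i] = w
    return W

# Hypothetical 3-vertex example (not the graph from the slides)
W = weight_matrix(3, [(0, 1, 2), (1, 2, -1), (0, 2, 5)])
for row in W:
    print(row)
```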
4
Weighted Graph and Weight Matrix [figure: a weighted graph on vertices v_0–v_4 with edge weights 1, 2, 3, 5, 6, 7, 9, and −4, shown together with its 5×5 weight matrix]
5
Directed Weighted Graph and Weight Matrix [figure: a directed weighted graph on vertices v_0–v_5 with edge weights 5, −2, 9, 4, 3, 1, 2, 7, and 6, shown together with its 6×6 weight matrix]
6
All-Pairs Shortest Paths Problem Defined For every pair of vertices v_i and v_j in V, it is required to find the length of the shortest path from v_i to v_j along edges in E. Specifically, a matrix D is to be constructed such that d_ij is the length of the shortest path from v_i to v_j in G, for all i and j. The length of a path (or cycle) is the sum of the lengths (weights) of the edges forming it.
7
Sample Shortest Path [figure: the directed weighted graph from the previous slide] The shortest path from v_0 to v_4 is along edges (v_0, v_1), (v_1, v_2), (v_2, v_4) and has length 6
8
Disallowing Negative-length Cycles APSP does not allow the input to contain negative-length cycles This is necessary because: –If such a cycle existed within a path from v_i to v_j, one could traverse the cycle indefinitely, producing paths of ever shorter length from v_i to v_j –If a negative-length cycle exists, then all paths containing that cycle have length −∞
9
Recent Work on Sequential Algorithms Floyd-Warshall algorithm is Θ(V^3) –Appropriate for dense graphs: |E| = O(|V|^2) Johnson's algorithm –Appropriate for sparse graphs: |E| = O(|V|) –O(V^2 log V + V E) if using a Fibonacci heap –O(V E log V) if using a binary min-heap Shoshan and Zwick (1999) –Integer edge weights in {1, 2, …, W} –O(W V^ω p(V W)), where ω ≤ 2.376 and p is a polylog function Pettie (2002) –Allows real-weighted edges –O(V^2 log log V + V E) Strassen's algorithm (fast matrix multiplication)
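For reference, a minimal Python sketch of the sequential Floyd-Warshall algorithm mentioned above; the function name and the small example matrix are illustrative, but the triple loop is the standard Θ(V^3) formulation:

```python
import math

def floyd_warshall(W):
    """All-pairs shortest path lengths from weight matrix W (list of lists).
    Assumes no negative-weight cycles; runs in Theta(n^3)."""
    n = len(W)
    D = [row[:] for row in W]          # copy so W itself is not modified
    for k in range(n):                 # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

# Tiny hypothetical example (not the graph from the slides)
W = [[0, 3, math.inf],
     [math.inf, 0, -1],
     [2, math.inf, 0]]
print(floyd_warshall(W))
```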
10
Properties of Interest Let d_ij^(k) denote the length of the shortest path from v_i to v_j that goes through at most k − 1 intermediate vertices (i.e., at most k edges) d_ij^(1) = w_ij (edge length from v_i to v_j) If i ≠ j and there is no edge from v_i to v_j, then d_ij^(1) = ∞ Also, d_ii^(k) = 0 for all k Given that there are no negative-weight cycles in G, there is no advantage in visiting any vertex more than once in the shortest path from v_i to v_j Since there are only n vertices in G, d_ij = d_ij^(n−1)
11
Guaranteeing Shortest Paths If the shortest path from v_i to v_j contains v_r and v_s (where v_r precedes v_s), then the sub-path from v_r to v_s must itself be minimal (otherwise it would not appear in the shortest path) Thus, to obtain the shortest path from v_i to v_j, we can compute all combinations of optimal sub-paths (whose concatenation is a path from v_i to v_j), and then select the shortest one [figure: a path from v_i through v_r and v_s to v_j, with the minimum taken over the candidate sub-paths]
12
Iteratively Building Shortest Paths [figure: a shortest path to v_j is built by extending a path to some predecessor v_l (l = 1, …, n) by the final edge weight w_lj]
13
Recurrence Definition For k > 1, d_ij^(k) = min_l { d_il^(k/2) + d_lj^(k/2) } This guarantees O(log k) doubling steps to calculate d_ij^(k) [figure: a path of at most k vertices split at an intermediate vertex v_l into two halves of at most k/2 vertices each, with the minimum taken over l]
14
Similarity
15
Computing D Let D^k = the matrix with entries d_ij^(k) for 0 ≤ i, j ≤ n − 1. Given D^1, compute D^2, D^4, …, D^m –D = D^m (where m is the smallest power of 2 with m ≥ n − 1) To calculate D^k from D^(k/2), use a special form of matrix multiplication –'×' → '+' –'+' → 'min'
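A sequential Python sketch of this idea, using the min-plus ('×' → '+', '+' → 'min') product and repeated squaring D^1 → D^2 → D^4 → …; the function names and the toy matrix are ours:

```python
import math

def min_plus(A, B):
    """'Modified' matrix product: entry (i, j) is min over l of A[i][l] + B[l][j]."""
    n = len(A)
    return [[min(A[i][l] + B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def apsp_by_squaring(W):
    """Square D (in the min-plus sense) until paths of up to n-1 edges are covered."""
    n = len(W)
    D, k = [row[:] for row in W], 1
    while k < n - 1:
        D = min_plus(D, D)   # D^(2k) from D^(k)
        k *= 2
    return D

# Tiny hypothetical example
W = [[0, 1, math.inf],
     [math.inf, 0, 2],
     [4, math.inf, 0]]
print(apsp_by_squaring(W))
```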
16
“Modified” Matrix Multiplication (hypercube algorithm of Section 9.2, with N = n^3 = 2^(3q) processors) Step 2: for r = 0 to N − 1 dopar C_r = A_r + B_r end for Step 3: for m = 2q to 3q − 1 do for all r in N(r_m = 0) dopar C_r = min(C_r, C_(r(m))) end for end for (here r(m) denotes r with bit m complemented)
17
“Modified” Example [figure: register contents of the eight processors P_000–P_111 after step (1.3) of the matrix multiplication algorithm of Section 9.2]
18
“Modified” Example (step 2) [figure: register contents of P_000–P_111 after modified step 2]
19
“Modified” Example (step 3) [figure: register contents of P_000–P_111 after modified step 3, taking the minimum]
20
Hypercube Setup Begin with a hypercube of n^3 processors –Each has registers A, B, and C –Arrange them in an n × n × n array (cube) Set A(0, j, k) = w_jk for 0 ≤ j, k ≤ n − 1 –i.e., the processors in positions (0, j, k) contain D^1 = W When done, C(0, j, k) contains the APSP result D^m
21
Setup Example [figure: the six-vertex directed example graph and its 6×6 weight matrix D^1 = W_jk, stored as A(0, j, k)]
22
APSP Parallel Algorithm Algorithm HYPERCUBE SHORTEST PATH (A, C) Step 1: for j = 0 to n − 1 dopar for k = 0 to n − 1 dopar B(0, j, k) = A(0, j, k) end for Step 2: for i = 1 to ⌈log(n − 1)⌉ do (2.1) HYPERCUBE MATRIX MULTIPLICATION(A, B, C) (2.2) for j = 0 to n − 1 dopar for k = 0 to n − 1 dopar (i) A(0, j, k) = C(0, j, k) (ii) B(0, j, k) = C(0, j, k) end for end for
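The following Python sketch is only a sequential simulation of the data flow in this pseudocode: A, B, and C stand for the (0, j, k) face of registers, Step (2.1) is replaced by an ordinary min-plus product, and the loop count ⌈log(n − 1)⌉ is our reading of the truncated bound. It illustrates the register copies, not the parallel hypercube execution:

```python
import math

def min_plus(A, B):
    n = len(A)
    return [[min(A[j][l] + B[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]

def hypercube_shortest_path_sim(W):
    """Sequential stand-in for HYPERCUBE SHORTEST PATH: A = B = D^1 = W initially;
    each iteration doubles the path length covered via C = A (min,+) B."""
    n = len(W)
    A = [row[:] for row in W]                            # A(0, j, k) = w_jk
    B = [row[:] for row in A]                            # Step 1: B <- A
    iterations = max(1, math.ceil(math.log2(n - 1)))     # assumed iteration count
    for _ in range(iterations):
        C = min_plus(A, B)                               # Step 2.1
        A = [row[:] for row in C]                        # Step 2.2 (i):  A <- C
        B = [row[:] for row in C]                        # Step 2.2 (ii): B <- C
    return A

W = [[0, 7, math.inf],
     [math.inf, 0, -2],
     [3, math.inf, 0]]
print(hypercube_shortest_path_sim(W))
```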
23
An Example [figure: the distance matrices D^1, D^2, D^4, and D^8 for the six-vertex example graph]
24
Analysis Steps 1 and (2.2) require constant time There are ⌈log(n − 1)⌉ iterations of Step (2.1) –Each requires O(log n) time The overall running time is t(n) = O(log^2 n) p(n) = n^3 Cost is c(n) = p(n) t(n) = O(n^3 log^2 n) Efficiency is O(n^3) / O(n^3 log^2 n) = O(1 / log^2 n)
25
Recent Research Jenq and Sahni (1987) compared various parallel algorithms for solving APSP empirically Kumar and Singh (1991) used the isoefficiency metric (developed by Kumar and Rao) to analyze the scalability of parallel APSP algorithms –Hardware vs. scalability –Memory vs. scalability
26
Isoefficiency For "scalable" algorithms (those whose efficiency increases monotonically when p is held constant and the problem size increases), efficiency can be maintained for an increasing number of processors provided that the problem size also increases The isoefficiency function relates the problem size to the number of processors necessary for the speedup to increase in proportion to the number of processors used
27
Isoefficiency (cont) Given an architecture, defines the "degree of scalability" Tells us the required growth in problem size to be able to efficiently utilize an increasing number of processors Example: given an isoefficiency of k p^3 –With p_0 processors and problem size w_0, suppose speedup = 0.8 p_0 (efficiency = 0.8) –If p_1 = 2 p_0, then to maintain an efficiency of 0.8 we need w_1 = 2^3 w_0 = 8 w_0 Indicates the superiority of one algorithm over another only when problem sizes are increased in the range between the two isoefficiency functions
29
Memory Overhead Factor (MOF) Ratio: (total memory required across all processors) / (memory required for the same problem size on a single processor) We'd like this to be low!
30
Architectures Discussed Shared Memory (CREW), Hypercube (Cube), Mesh, Mesh with Cut-Through Routing, Mesh with Cut-Through and Multicast Routing Also examined fast and slow communication technologies
31
Parallel APSP Algorithms Floyd Checkerboard, Floyd Pipelined Checkerboard, Floyd Striped, Dijkstra Source-Partition, Dijkstra Source-Parallel
32
General Parallel Algorithm (Floyd) Repeat steps 1 through 4 for k := 1 to n Step 1: If this processor has a segment of P_(k−1)[*, k], then transmit it to all processors that need it Step 2: If this processor has a segment of P_(k−1)[k, *], then transmit it to all processors that need it Step 3: Wait until the needed segments of P_(k−1)[*, k] and P_(k−1)[k, *] have been received Step 4: For all i, j in this processor's partition, compute P_k[i, j] := min { P_(k−1)[i, j], P_(k−1)[i, k] + P_(k−1)[k, j] }
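A Python sketch of this scheme, simulated sequentially under the assumption of a √p × √p checkerboard partition; the "broadcasts" of Steps 1-3 are modeled simply by reading row k and column k of the previous iterate, and the helper name is ours:

```python
import math

def floyd_checkerboard_sim(W, p):
    """Simulate checkerboard Floyd on p = q*q 'processors', each owning an
    (n/q) x (n/q) block of the matrix. Communication (steps 1-3) is modeled
    by snapshotting row k and column k of the previous iterate."""
    n, q = len(W), math.isqrt(p)
    assert q * q == p and n % q == 0, "assumes p is a perfect square and sqrt(p) divides n"
    b = n // q                                    # block side length
    P = [row[:] for row in W]                     # P_0 = W
    for k in range(n):
        row_k = P[k][:]                           # segment source for P_{k-1}[k, *] (step 2)
        col_k = [P[i][k] for i in range(n)]       # segment source for P_{k-1}[*, k] (step 1)
        for bi in range(q):                       # each processor updates its block (step 4)
            for bj in range(q):
                for i in range(bi * b, (bi + 1) * b):
                    for j in range(bj * b, (bj + 1) * b):
                        P[i][j] = min(P[i][j], col_k[i] + row_k[j])
        # (in the real algorithm, the q*q blocks are updated concurrently)
    return P

W = [[0, 3, math.inf, 7],
     [8, 0, 2, math.inf],
     [5, math.inf, 0, 1],
     [2, math.inf, math.inf, 0]]
print(floyd_checkerboard_sim(W, 4))
```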
33
Floyd Checkerboard Each “cell” is assigned to a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm. Steps 1 and 2 of the GPF involve each of the processors sending their data to the “neighbor” columns and rows.
34
Floyd Pipelined Checkerboard Similar to the preceding. Steps 1 and 2 of the GPF involve each of the processors sending their data to the "neighbor" columns and rows. The difference is that the processors are not synchronized: each computes and sends data as soon as possible, forwarding data as soon as it is received.
35
Floyd Striped Each "column" is assigned to a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm. Step 1 of the GPF involves each of the processors sending their data to the "neighbor" columns. Step 2 is not needed (since each column is contained entirely within one processor).
36
Dijkstra Source-Partition The n single-source Dijkstra problems are distributed equally over p processors and executed in parallel Each processor finds the shortest paths from each vertex in its set to all other vertices in the graph Fortunately, this approach involves no inter-processor communication Unfortunately, only n processors can be kept busy Also, memory overhead is high since each processor has a copy of the weight matrix
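A Python sketch of the source-partitioned idea, simulated sequentially: the sources are split round-robin across p "processors" and each runs an independent Dijkstra, so no communication is needed. Dijkstra assumes nonnegative edge weights; the helper names and the small digraph are ours:

```python
import heapq, math

def dijkstra(adj, s):
    """Single-source shortest paths; adj[u] = list of (v, w) with w >= 0."""
    dist = [math.inf] * len(adj)
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def source_partitioned_apsp(adj, p):
    """Each of the p 'processors' handles an independent slice of the sources."""
    n = len(adj)
    D = [None] * n
    for proc in range(p):
        for s in range(proc, n, p):      # sources assigned to this processor
            D[s] = dijkstra(adj, s)      # no inter-processor communication needed
    return D

adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]   # hypothetical digraph
print(source_partitioned_apsp(adj, 2))
```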
37
Dijkstra's Source-Parallel Motivated by keeping more processors busy Run n copies of Dijkstra's SSSP algorithm –Each copy runs on roughly p/n processors (requires p > n)
38
Calculating Isoefficiency Example: Floyd Checkerboard At most n^2 processors can be kept busy n must grow as Θ(√p) due to the problem structure By Floyd (sequential), T_e = Θ(n^3) Thus the isoefficiency is Θ((√p)^3) = Θ(p^1.5) But what about communication…
39
Calculating Isoefficiency (cont) t_s = message startup time, t_w = per-word communication time, t_c = time to compute the next iteration value for one cell in the matrix, m = number of words sent, d = number of hops between nodes Hypercube: –(t_s + t_w m) log d = time to deliver m words –2 (t_s + t_w m) log p = barrier synchronization time (up and down the "tree") –d = √p –Step 1 = (t_s + t_w n/√p) log √p –Step 2 = (t_s + t_w n/√p) log √p –Step 3 (barrier synch) = 2 (t_s + t_w) log p –Step 4 = t_c n^2 / p Isoefficiency = Θ(p^1.5 (log p)^3)
40
Mathematical Details How are n and p related?
41
Mathematical Details
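Since the equations on this slide did not survive extraction, the following LaTeX fragment is only a sketch of how n and p might be related for checkerboard Floyd on a hypercube under the t_s/t_w model above; it is consistent with, but not taken from, the Θ(p^1.5 (log p)^3) result quoted earlier:

```latex
% Problem size (sequential work): W = \Theta(n^3).
% Per iteration, each of the p processors does t_c n^2/p computation and the
% row/column broadcasts cost \Theta\big((t_s + t_w n/\sqrt{p})\log p\big).
% Summed over the n iterations and the p processors, the overhead is roughly
\[
  T_o \;=\; p\,T_p - T_{\text{seq}}
      \;\approx\; \Theta\!\big(n\,p\log p\big) \;+\; \Theta\!\big(n^2\sqrt{p}\,\log p\big).
\]
% Isoefficiency requires W = \Theta(T_o); the dominant (t_w) term gives
\[
  n^3 = \Theta\!\big(n^2\sqrt{p}\,\log p\big)
  \;\Longrightarrow\;
  n = \Theta\!\big(\sqrt{p}\,\log p\big)
  \;\Longrightarrow\;
  W = n^3 = \Theta\!\big(p^{1.5}(\log p)^3\big).
\]
```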
42
Calculating Isoefficiency (cont) t_s = message startup time, t_w = per-word communication time, t_c = time to compute the next iteration value for one cell in the matrix, m = number of words sent, d = number of hops between nodes Mesh: –Step 1 = –Step 2 = –Step 3 (barrier synch) = –Step 4 = T_e Isoefficiency = Θ(p^3 + p^2.25) = Θ(p^3)
43
Isoefficiency and MOF for Algorithm & Architecture Combinations
–Dijkstra Source-Partitioned on SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC: isoefficiency p^3, MOF p
–Dijkstra Source-Parallel on SM, Cube: (p log p)^1.5, MOF n; on Mesh, Mesh-CT, Mesh-CT-MC: p^1.8, MOF n
–Floyd Stripe (MOF 1): SM p^3; Cube (p log p)^3; Mesh p^4.5; Mesh-CT (p log p)^3; Mesh-CT-MC p^3
–Floyd Checkerboard (MOF 1): SM p^1.5; Cube p^1.5 (log p)^3; Mesh p^3; Mesh-CT p^2.25; Mesh-CT-MC p^2.25
–Floyd Pipelined Checkerboard (MOF 1): SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC all p^1.5
44
Comparing Metrics We've used "cost" previously this semester (cost = p · T_p) But notice that the cost of all of the architecture-algorithm combinations discussed here is Θ(n^3) Clearly some are more scalable than others Thus isoefficiency is a useful metric when analyzing algorithms and architectures
45
References
Akl, S. G. Parallel Computation: Models and Methods. Prentice Hall, Upper Saddle River, NJ, pp. 381–384, 1997.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms (2nd Edition). The MIT Press, Cambridge, MA, pp. 620–642, 2001.
Jenq, J. and Sahni, S. All Pairs Shortest Path on a Hypercube Multiprocessor. In International Conference on Parallel Processing, pp. 713–716, 1987.
Kumar, V. and Singh, V. Scalability of Parallel Algorithms for the All Pairs Shortest Path Problem. Journal of Parallel and Distributed Computing, vol. 13, no. 2, Academic Press, San Diego, CA, pp. 124–138, 1991.
Pettie, S. A Faster All-Pairs Shortest Path Algorithm for Real-Weighted Sparse Graphs. In Proc. 29th Int'l Colloq. on Automata, Languages, and Programming (ICALP'02), LNCS vol. 2380, pp. 85–97, 2002.