CS717 Fault-Tolerant Routing: A Genetic Algorithm and CJC Arjun Rao CS 717 November 18, 2004
CS717 Next Paper [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies” [2] Loh, Shaw., “A Genetic-Based Fault- Tolerant Routing Strategy for Multiprocessor Networks”
CS717 Our Little Problem… AI search techniques topology- and fault-type independent… …but non-minimal routes utilized Follow-up work shows how genetic algorithms (combined with heuristics) can find minimal routes in presence of network faults
CS717 Genetic Algorithms: Overview Optimization strategy Population of potential solutions evolve over series of generations Each element of population is chromosome; each unit of chromosome is gene Chromosomes undergo crossover and mutation Most fit chromosomes selected for next generation, based upon fitness function
CS717 Abstract Model Same as before (including definitions of S and G) Pure abstraction suffers from same caveats as before Basic idea: Instead of AI search for adaptive route, optimize over population of routes to find best
CS717 Message Packets Simplified version:
CS717 Chromosome Route Chromosome Node on route Gene in chromosome Length of route Size of chromosome –Chromosome size directly reflects routing performance! Distance traversed basis of fitness
CS717 Population Creation
CS717 Mutation and Crossover Mutation: Swap and/or shift Normal crossover destroys routes, messes with source and destination; problem w/ different lengths –Use one-point random crossover
CS717 Fitness Function F = (D max – D route ) / D max + –D max : Maximum distance between source and destination –D route : Distance traveled by specific route – : Predefined value to ensure non-zero fitness Higher value More fit
CS717 Selection Scheme Roulette Wheel –Sum of fitness values * random value from [0,1] –Select chromosomes until fitness sum greater than product Tournament Selection –Most fit chromosomes selected Stochastic Remainder –Probabilities used to select route Which scheme has best performance selecting optimal route?
CS717 Reroute
CS717 Genetic Hybrid Algorithm
CS717 Performance Testing RW, TS, SR tested in concert with RR and HCR (previous algorithm was for RR) 25-node mesh network Varying fault percentages Ten tests, each over 20 generations Variations in mutation and crossover rates
CS717 Performance Testing (cont.) Randomness of RR bad for rerouting Unsuccessful routes increase with number of generations SR + HCR performed best (prevented premature convergence, maintained good diversity)
CS717 Performance Testing (cont.)
CS717 Comparison With AI-only Strategies
CS717 Next (and Final) Paper [3] Loh, Schröder, Hsu., “Fault-Tolerant Routing on Complete Josephus Cubes” (not AI/GA-related but interesting nevertheless)
CS717 What is a Josephus Cube? New network topology –Better embedding, communications performance than hypercube topologies Can be used for processors, node clusters w/ optical channel architectures –In clusters, fault tolerance sacrificed for scalability –Symmetry (which is good for routing) somewhat sacrificed as well
CS717 Complete Josephus Cube (CJC) Augmented (i.e. more links) Josephus Cube Better fault-tolerance Routing efficiency maintained! Properties –Uniform node degree of log 2 N + 2 –N = 2 r, r is order of network –Smaller diameter –Better symmetry –Guaranteed message delivery (w/ up to r + 1 faults) –No deadlocks/livelocks We examine CJC as node cluster topology
CS717 Defining the CJC CJC(N) is an undirected graph G = (V, E) Nodes labeled using function J: –J(1) = 1 –J(2i) = 2J(i) – 1, i 1 –J(2i + 1) = 2J(i) + 1, i 1
CS717 Defining the CJC (cont.) V = {u | J(2 r ) u J(2 r + 1 – 1)} Three types of edges: H, J, C –E = E H E J E C (x,y) E H iff H(x,y) = 1, where H is the Hamming distance (x,y) E J iff y = x XOR 6 (x,y) E C iff y = (x), where finds 2’s complement
CS717 Example: CJC(8), r = 3
CS717 Example: CJC(16), r = 4
CS717 Data Vectors Route vector R v (u) = (T r+1 T r …T 1 T 0 ) –Information on paths already traversed –T r+1 = 1 if |u v| > CEIL(r / 2) else 0 –T r = 1 if v = u (J), where u (J) is 2’s complement of lowest order bits –T k = (u k+1 v k+1 ), 0 k < r
CS717 Data Vectors (cont.) Input link vector I v (w) = (L r+1 L r …L 1 L 0 ) –L i = 0 if message arrives on link i else 1 Fault status vector F v (w) = (S r+1 S r …S 1 S 0 ) –S j = 0 if node on link j or link j itself faulty else 1 Navigation vector N v (u) = R v (u) I v (u) F v (w) –Designates optimal (or best-possible) paths from current node
CS717 CJC-FTROUTE() Algorithm 1)Extract route vector from packet header 2)Update current node’s fault status vector 3)If route vector all 0’s, destination reached 4)(Re)Initialize input link vector 5)Compute navigation vector N v (u) 6)If enabled C link not faulty, set R v r+1 (u) to 0, take 2’s complement of R v (u), and route packet 7)Else, do the same with the J link (link r), except complement R v 0 (u) and R v 1 (u) only 8)If both C and J links faulty and/or not enabled, find an H link for routing (or misrouting), set R v r+1 (u) and R v r (u) to 0, and either route or discard
CS717 Example 1: CJC(16)
CS717 Example 2: CJC(16)
CS717 Routing With Bounded Faults Theorem: Routing between a node pair (x, y) is optimal if F T < (r + 2) which includes faulty J and C incident links at x Proof: Let f i faults be encountered at node w i on path of length p. |R(w i )| - f i optimal links left to y; routing from w i to w i+1 optimal if link or |R(w i )| - f i > 0. Generalizing to whole path:
CS717 Maximum Path Distance Define FTR(u, v) = Set of edges of path from u to v Corollary: FTR guarantees path distance of route upper-bounded by H(x, y) if F T < (r + 2) Corollary: FTR guarantees max distance r in order r cluster Proof: Follows from above corollary with H(x, y) r
CS717 Final Set of Theorems For sub-optimal routing, FTR guarantees max path distance of 2r – H(x, y) + 2 only if F T < (r + 2) (including disabled C and J links at x) FTR guarantees deadlock-free routes FTR guarantees livelock-free routes –Proof of latter two utilizes notion of virtual networks in CJC