CS717 Application of AI- and ML-Techniques to Fault-Tolerant Routing Arjun Rao CS 717 November 16 and 18, 2004.

1 CS717 Application of AI- and ML-Techniques to Fault-Tolerant Routing Arjun Rao CS 717 November 16 and 18, 2004

2 CS717 Papers Covered [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies” [2] Loh, Shaw., “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor Networks”

3 CS717 Papers Covered (cont.) [3] Loh, Schröder, Hsu., “Fault-Tolerant Routing on Complete Josephus Cubes” (not AI-related but interesting nevertheless) If time permits, also: [4] Bradley, Tyrrell., “Immunotronics: Hardware Fault Tolerance Inspired by the Immune System”

4 CS717 The Problem of Routing Communication between nodes –Servers –Microprocessors Desire shortest, most efficient paths –Multiprocessor network topologies, e.g. hypercubes, Josephus cubes, etc. Desire availability of paths –What to do when links/nodes fail? –How to remain (close to) optimal?

5 CS717 Intro to Fault-Tolerant Routing Current algorithms adaptive but non-minimal (misrouting) Routing strategies tied to specific topologies –k-ary n-cubes, meshes, etc.: regular structures and symmetry –Constrained by number and types of faults More general strategies vulnerable to deadlock and livelock

6 CS717 “Turn Model” [Glass, Ni] Widest application scope –k-ary n-cubes, nD-meshes, torus geometries, etc. “West-First” algorithm (on 2D-mesh) –Turns into the west direction are prohibited, so any westward hops must come first –Prevents cycles → prevents deadlocks –Routing along virtual channels in strictly decreasing or increasing order
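The west-first restriction can be checked mechanically. A minimal sketch, assuming each hop on the 2D mesh is encoded as one of the letters "N", "S", "E", "W" (an encoding chosen for illustration); it tests only the turn restriction and ignores virtual-channel numbering:

```python
def obeys_west_first(hops: list[str]) -> bool:
    """True if every westward hop ("W") precedes all other hops.

    Once a message has moved N, S or E, turning (or reversing) into W
    would be a prohibited turn under the west-first rule.
    """
    seen_non_west = False
    for hop in hops:
        if hop == "W":
            if seen_non_west:
                return False  # prohibited turn into west
        else:
            seen_non_west = True
    return True


print(obeys_west_first(["W", "W", "N", "E"]))  # westward hops first
print(obeys_west_first(["N", "W"]))            # turns west after going north
```

Because no route can turn back into west, the cycle of turns needed for a deadlock can never close.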

7 CS717 Turn Model and Channel Numbering

8 CS717 Turn Model (cont.) Three examples of routing (“F” = FAILURE) Full adaptation w/o deadlock and livelock requires more global info → more overhead

9 CS717 AI Search Techniques Arbitrary topology → Search space Search space → Search tree(s) Adaptive but still non-minimal The recursion characteristic of AI searches is impractical on a loosely coupled, distributed network

10 CS717 AI Logical Abstraction Abstraction: –S: Problem space –O: Set of objectives –P: Search paths –$S = (O, P)$, where $o_i \in O$ and $p_j \in P$; each $p_j$ connects a tuple $(o_k, o_l)$, $k \neq l$ Abstraction used to model…

11 CS717 Multiprocessor Network w/ Generic Topology Network –N: Nodes –L: Links between nodes –$G = (N, L)$, where $n_i \in N$ and $l_j \in L$; each $l_j$ connects a tuple $(n_k, n_l)$, $k \neq l$ Objective ↔ Node Search path ↔ Link

12 CS717 Abstract Routing Model Search: –maps $(o_s, o_t)$ as $S \times S \to S^*$, where $S = (O, P)$ and $S^* = (O^*, P^*)$ –$o_x, o_y \in O$ and $o_x, o_y \in O^*$ → Successful search –$o_x, o_y \in O$ and $o_x \in O^*$, $o_y \notin O^*$ → Unsuccessful search Routing attempt R: –$R(n_s, n_d): G \times G \to G^*$, where $G = (N, L)$ and $G^* = (N^*, L^*)$ –$n_i, n_j \in N$ and $n_i, n_j \in N^*$ → Complete route –$n_i, n_j \in N$ and $n_i \in N^*$, $n_j \notin N^*$ → Incomplete route
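To make the routing half of the model concrete, here is a minimal sketch, assuming integer node IDs and undirected links stored as `frozenset` pairs (an encoding chosen for illustration). It builds $G^* = (N^*, L^*)$ from the sequence of nodes a message visited and classifies the attempt as complete or incomplete:

```python
def routing_attempt(links, n_s, n_d, hops):
    """Return (N*, L*, status) for a message that visited `hops` in order.

    links: set of frozenset({a, b}) pairs, i.e. the link set L of G.
    """
    # L*: every link the message crossed (backtracking hops included).
    l_star = {frozenset(pair) for pair in zip(hops, hops[1:])}
    # G* must be a subgraph of G, so every traversed link has to be in L.
    if not l_star <= links:
        raise ValueError("route uses a link that is not in L")
    n_star = set(hops)
    # Complete route iff both source and destination are in N*.
    status = "complete" if {n_s, n_d} <= n_star else "incomplete"
    return n_star, l_star, status


L = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
print(routing_attempt(L, 0, 3, [0, 1, 2, 3])[2])  # reaches n_d
print(routing_attempt(L, 0, 3, [0, 1, 2, 1])[2])  # stops short of n_d
```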

13 CS717 Routing Analogy AI search equivalent to routing attempt Successful search → Route between source and destination nodes Unsuccessful search → Incomplete route to destination

14 CS717 Caveats of Analogy No specific search algorithm → No routing strategy No optimality constraints Nothing about deadlocks/livelocks Nothing about fault tolerance!!

15 CS717 Fault-Tolerant Routing Model Model considers two aspects: –Routing system configuration Must be generic enough! –Message propagation protocols and policies Following slides introduce what is needed for AI searches (w/ physical message backtracking)

16 CS717 FT Routing Model (cont.)

17 CS717 FT Routing Model (cont.) Eager reading of input messages Single input buffer to avoid polling Multiple output buffers to accommodate different delivery rates Router process: –AI/FT routing strategy implemented here –Physical message backtracking → Increased message sizes –Increased message sizes/overhead → Requires communications router at each node

18 CS717 Communications Router

19 CS717 Communications Router (cont.) The communications router comprises the router process and its connections Main components: LCM and CP ROM: Stores link management and routing software RAM: Stores routing table, link status table, associated link lists

20 CS717 CR Data Structure: Routing Table

21 CS717 CR Routing Table For each node, up to n links For each link: –Connected: status OK and node ID of neighbor –Not connected: status NC and node ID -1 Link fault represented by timeout: –Status reset to NC Processor fault represented by timeouts in neighbors
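One way to picture the per-node routing table is the sketch below. The class and method names are hypothetical, and resetting the neighbor ID to -1 on a timeout is an assumption made to keep faulty entries consistent with the "not connected" convention:

```python
from dataclasses import dataclass, field

OK, NC = "OK", "NC"   # link status codes from the slide
NO_NODE = -1          # node ID stored for an unconnected link


@dataclass
class LinkEntry:
    status: str = NC
    neighbor: int = NO_NODE


@dataclass
class RoutingTable:
    n_links: int                      # up to n links per node
    entries: list = field(init=False)

    def __post_init__(self) -> None:
        # Every link starts out "not connected".
        self.entries = [LinkEntry() for _ in range(self.n_links)]

    def connect(self, link: int, neighbor: int) -> None:
        self.entries[link] = LinkEntry(OK, neighbor)

    def on_timeout(self, link: int) -> None:
        # A timeout is treated as a link fault: status goes back to NC.
        # (Assumption: the neighbor ID is also reset to -1.)
        self.entries[link] = LinkEntry(NC, NO_NODE)

    def usable_neighbors(self) -> list:
        return [e.neighbor for e in self.entries if e.status == OK]
```

A processor fault then needs no special entry: every neighbor of the failed node sees its own link time out and marks it NC.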

22 CS717 CR Data Structures: Link Status Table, Lists

23 CS717 Message Packets Six fields: –Router Control (4 bits): Type of message, including NORMAL and BACKTRACK –Destination Node ID (10 bits): Supports network of size up to 1024 nodes –Pending Nodes (20 bytes): Stack of node IDs that may receive the packet but have not yet received it –Traversed Nodes (20 bytes): Stack of nodes traversed, with most recent on top

24 CS717 Message Packets (cont.) –Traversed Nodes Index (10 bits): Index to previous traversed nodes field. Supports simulation of physical message backtracking –Data Field (n-bit pointer): Points to information content of packet
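The small header fields fit comfortably into one word. A sketch, assuming a bit layout and numeric Router Control codes that are purely hypothetical (the slides give only the field widths); it also shows that a 20-byte stack of 10-bit node IDs holds 16 entries:

```python
NORMAL, BACKTRACK = 0, 1                      # hypothetical 4-bit control codes
ID_BITS = 10                                  # 2**10 = 1024 addressable nodes
STACK_BYTES = 20                              # Pending / Traversed Nodes fields
STACK_CAPACITY = STACK_BYTES * 8 // ID_BITS   # 10-bit IDs per 20-byte stack


def pack_header(control: int, dest: int, traversed_index: int) -> int:
    """Pack Router Control (4 bits), Destination Node ID (10 bits) and
    Traversed Nodes Index (10 bits) into a single 24-bit word."""
    if not (0 <= control < 1 << 4 and 0 <= dest < 1 << ID_BITS
            and 0 <= traversed_index < 1 << ID_BITS):
        raise ValueError("field out of range")
    return (control << 20) | (dest << 10) | traversed_index


def unpack_header(word: int) -> tuple[int, int, int]:
    """Inverse of pack_header: returns (control, dest, traversed_index)."""
    return (word >> 20) & 0xF, (word >> 10) & 0x3FF, word & 0x3FF
```

The 16-entry limit bounds how far a packet can search before its stacks overflow, which is one source of the "mostly" fixed overhead.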

25 CS717 (Finally) AI Search Strategies Brute Force: –Depth-First Search –Random Climbing Heuristic: –Hill Climbing –Best-First Search –A*

26 CS717 AI Search Strategies (cont.) In presence of network faults: –Prevent cycles → No deadlocks –Prevent more than two traversals of nodes/links → No livelocks; also necessary for AI searches Adaptations of search algorithms Problems: –Recursion? Nope, replaced by physical message backtracking (PMB) –Overhead? Fixed (Well, mostly…)

27 CS717 Common Beginning
Extract header and disassemble it
IF Destination Node is reached, pass packet to host processor
ELSE IF Router Control is BACKTRACK
    IF top node of Pending Nodes is directly linked
        Route packet to that node
        Set Router Control to NORMAL
    ELSE backtrack packet to previous node in Traversed Nodes
Pop current node ID from Pending Nodes
Push current node ID onto Traversed Nodes

28 CS717 Depth-First Search Travel as far as possible –Do not consider alternative paths just yet If fault or dead-end, backtrack to most recent possible path

29 CS717 DFS (cont.) Following common beginning:
Look for directly linked successor nodes
    IF already traversed, ignore
    ELSE IF already in Pending Nodes, ignore
    ELSE push onto Pending Nodes
Read top node of Pending Nodes
IF directly linked (no fault), route packet to it
ELSE set BACKTRACK and route to last traversed node
END
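The common beginning and the DFS step together can be simulated hop by hop. This is a minimal sketch under stated assumptions: faults are static and already reflected in `adj` (failed links simply absent), and a separate `path` stack stands in for the Traversed Nodes Index when the packet physically backs up. The returned hop list includes the backtracking hops, which is why DFS routes are non-minimal:

```python
def dfs_route(adj, src, dst):
    """Hop-by-hop DFS with physical message backtracking.

    adj maps each node to the set of neighbors over non-faulty links.
    Returns the physical hops taken, or None if dst cannot be reached.
    """
    pending = [src]       # Pending Nodes stack; top is pending[-1]
    traversed = set()     # Traversed Nodes, used to ignore revisits
    path = []             # chain of nodes the packet can back up through
    hops = [src]
    cur, mode = src, "NORMAL"
    while cur != dst:
        if mode == "NORMAL":
            # Common beginning: pop current ID, record it as traversed.
            pending.pop()
            traversed.add(cur)
            path.append(cur)
            # DFS: push linked successors not traversed and not pending.
            for nxt in sorted(adj[cur]):
                if nxt not in traversed and nxt not in pending:
                    pending.append(nxt)
        if pending and pending[-1] in adj[cur]:
            cur, mode = pending[-1], "NORMAL"   # route forward to top node
        else:
            mode = "BACKTRACK"                  # back up one physical hop
            path.pop()
            if not path:
                return None                     # search space exhausted
            cur = path[-1]
        hops.append(cur)
    return hops


adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 4}, 3: {1}, 4: {2}}
print(dfs_route(adj, 0, 3))  # dead end at 4, backs up to 0, then 1 -> 3
```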

30 CS717 DFS Example

31 CS717 DFS Example (cont.)

32 CS717 Random Climbing Following the common beginning: … ELSE Select a successor node randomly Push unselected successor nodes onto Pending Nodes …
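Random climbing only changes how new successors are pushed. A sketch of that step (the function name is hypothetical; successors are assumed to be pre-filtered against Traversed Nodes): the randomly chosen successor ends up on top of Pending Nodes, with the unselected ones beneath it for later backtracking:

```python
import random


def random_climb_push(pending, successors, rng=random):
    """Push new successors so a randomly selected one is on top.

    Returns the selected node, or None if there is nothing new to push.
    """
    candidates = [n for n in successors if n not in pending]
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    # Unselected successors go underneath for later backtracking.
    pending.extend(n for n in candidates if n != chosen)
    pending.append(chosen)   # selected node is routed to next
    return chosen
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible for testing.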

33 CS717 Hill Climbing Heuristic: Estimated remaining distance Following common beginning: … ELSE Sort successor nodes according to est. remaining distance Push sorted nodes onto Pending Nodes …
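A sketch of the hill-climbing push, assuming nodes are (x, y) mesh coordinates and Manhattan distance as the estimated remaining distance (both illustrative choices; the slides do not fix a heuristic). Only the new successors are sorted, so older Pending Nodes entries stay where they are:

```python
def manhattan(a, b):
    """Estimated remaining distance on a 2D mesh (illustrative heuristic)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hill_climb_push(pending, successors, dst, h=manhattan):
    """Push new successors sorted so the one nearest dst ends up on top."""
    new = [n for n in successors if n not in pending]
    # Farthest first, nearest last (= top of the stack).
    pending.extend(sorted(new, key=lambda n: h(n, dst), reverse=True))
    return pending[-1] if pending else None


pending = [(3, 2)]   # an older candidate, already very close to dst
print(hill_climb_push(pending, [(0, 1), (2, 2), (1, 0)], (3, 3)))
```

The older entry (3, 2) is never reconsidered ahead of the immediate neighbors, which is exactly the "only immediate neighbors" limitation noted in the test results.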

34 CS717 Best-First Search Resumes partial routes not previously considered Looks at immediate neighbors, neighbors of predecessors –Sorts by est. remaining distance Leads to non-minimal routes!

35 CS717 BFS (cont.) … ELSE Push (directly linked successor nodes) onto Pending Nodes Sort Pending Nodes according to est. remaining distance …
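By contrast, best-first search re-sorts the whole Pending Nodes stack after pushing. A sketch with the same illustrative assumptions (mesh coordinates, Manhattan distance as the estimate):

```python
def manhattan(a, b):
    """Estimated remaining distance on a 2D mesh (illustrative heuristic)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_first_push(pending, successors, dst, h=manhattan):
    """Push new successors, then re-sort the WHOLE stack so the globally
    most promising node (smallest estimate) is on top."""
    new = [n for n in successors if n not in pending]
    pending.extend(new)
    pending.sort(key=lambda n: h(n, dst), reverse=True)  # best at the top
    return pending[-1] if pending else None


pending = [(3, 2)]   # left over from an earlier partial route
print(best_first_push(pending, [(0, 1), (1, 0)], (3, 3)))
```

The node on top may belong to an earlier partial route and need not be adjacent to the current node, so the packet must physically backtrack to reach it; this is why best-first search yields non-minimal routes.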

36 CS717 A* Two heuristics: –Estimated remaining distance: h –Path length traversed: g Partial paths sorted by f = g + h When no faults (and h never overestimates the remaining distance), always finds minimal route

37 CS717 A* (cont.) After current ID processing: Record path length traversed, g … ELSE Calculate and store f for new successor nodes Push them onto Pending Nodes sorted by f …
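For reference, the ordering by f = g + h behaves like the textbook A* below. This sketch is a simplification: it searches centrally with a priority queue rather than hop by hop with a physically backtracking packet, and it assumes a width × height 2D mesh with unit-cost links, Manhattan distance as h, and failed links given as `frozenset` pairs:

```python
import heapq


def a_star(width, height, faulty, src, dst):
    """Return a minimal route from src to dst as a list of (x, y) nodes,
    or None if the faults disconnect them."""
    def h(n):  # estimated remaining distance; never overestimates on a mesh
        return abs(n[0] - dst[0]) + abs(n[1] - dst[1])

    def neighbors(n):
        x, y = n
        for m in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= m[0] < width and 0 <= m[1] < height
                    and frozenset((n, m)) not in faulty):
                yield m

    g = {src: 0}                      # path length traversed so far
    parent = {src: None}
    frontier = [(h(src), 0, src)]     # entries are (f = g + h, g, node)
    while frontier:
        _, g_n, n = heapq.heappop(frontier)
        if n == dst:                  # rebuild the route via parents
            route = []
            while n is not None:
                route.append(n)
                n = parent[n]
            return route[::-1]
        if g_n > g[n]:
            continue                  # stale queue entry
        for m in neighbors(n):
            if g_n + 1 < g.get(m, float("inf")):
                g[m] = g_n + 1
                parent[m] = n
                heapq.heappush(frontier, (g[m] + h(m), g[m], m))
    return None
```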

38 CS717 Performance Testing Simulated 125-node multiprocessor network Max 8 links per node (maps to many topologies) Faulty links and processors –Pre-specified or dynamically generated Testing: –Messages between every pair of nodes –20 trials at 0%, 5%, 10%, 15%, 20% faulty links –125 x 125 x 20 x 6 = 1,875,000 tests (??)

39 CS717 Test Results As faults increase, heuristic strategies fare better (esp. > 15%) A* best search technique but slow Hill climbing and BFS do not consider nodes traversed –Hill climbing considers only immediate neighbors

40 CS717 Test Results (cont.)

41 CS717 Main Point Using AI search techniques, we abstract from routing in networks to searching in trees (topology-independent, quantity and type of faults irrelevant)

42 CS717 Next Paper [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies” [2] Loh, Shaw., “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor Networks”

43 CS717 Our Little Problem… AI search techniques topology- and fault-type independent… …but non-minimal routes utilized Follow-up work shows how genetic algorithms (combined with heuristics) can find minimal routes in presence of network faults

44 CS717 Genetic Algorithms: Overview Optimization strategy Population of potential solutions evolves over a series of generations Each element of the population is a chromosome; each unit of a chromosome is a gene Chromosomes undergo crossover and mutation Fittest chromosomes selected for next generation, based upon fitness function

45 CS717 Abstract Model Same as before (including definitions of S and G) Pure abstraction suffers from same caveats as before Basic idea: Instead of AI search for adaptive route, optimize over population of routes to find best

46 CS717 Message Packets Simplified version:

47 CS717 Chromosome Route ↔ Chromosome Node on route ↔ Gene in chromosome Length of route ↔ Size of chromosome –Chromosome size directly reflects routing performance! Distance traversed basis of fitness
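The route-to-chromosome mapping can be sketched directly: genes are node IDs and the distance traversed is the chromosome size minus one. The function names are hypothetical, links are assumed to be `frozenset` pairs, and rejecting routes that revisit a node is an added assumption:

```python
def is_valid_chromosome(route, links, src, dst):
    """A chromosome (list of node-ID genes) encodes a usable route if it
    starts at src, ends at dst, repeats no node (assumption), and every
    pair of consecutive genes is a working link."""
    return (len(route) >= 1 and route[0] == src and route[-1] == dst
            and len(set(route)) == len(route)
            and all(frozenset(p) in links for p in zip(route, route[1:])))


def route_distance(route):
    """Distance traversed in hops: one less than the chromosome size."""
    return len(route) - 1


links = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (0, 3)]}
print(is_valid_chromosome([0, 3], links, 0, 3), route_distance([0, 3]))
```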

48 CS717 Population Creation

49 CS717 Mutation and Crossover Mutation: Swap and/or shift Normal crossover destroys routes, messes with source and destination; problem w/ different lengths –Use one-point random crossover
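A sketch of these operators on variable-length routes. The exact operators in the paper are not given here, so the following are assumptions: one-point crossover picks a cut point independently in each parent (so lengths may differ) and joins the head of one parent to the tail of the other; swap and shift mutations touch only interior genes. Together these keep the source and destination genes fixed. Offspring are not guaranteed to follow working links and would still need a validity check:

```python
import random


def one_point_crossover(a, b, rng=random):
    """Child = head of a + tail of b, with independent cut points.

    The child keeps a's source gene (a[0]) and b's destination gene (b[-1]).
    Both parents need at least two genes.
    """
    i = rng.randint(1, len(a) - 1)
    j = rng.randint(1, len(b) - 1)
    return a[:i] + b[j:]


def swap_mutation(route, rng=random):
    """Swap two interior genes; endpoints stay put."""
    r = list(route)
    if len(r) >= 4:
        i, j = rng.sample(range(1, len(r) - 1), 2)
        r[i], r[j] = r[j], r[i]
    return r


def shift_mutation(route, rng=random):
    """Move one interior gene to another interior position."""
    r = list(route)
    if len(r) >= 4:
        i, j = rng.sample(range(1, len(r) - 1), 2)
        r.insert(j, r.pop(i))
    return r
```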

50 CS717 Fitness Function $F = (D_{max} - D_{route}) / D_{max} + \epsilon$ –$D_{max}$: Maximum distance between source and destination –$D_{route}$: Distance traveled by specific route –$\epsilon$: Predefined value to ensure non-zero fitness Higher value → More fit
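The fitness function translates directly into code. In this sketch `eps` stands for the predefined non-zero offset; its default of 0.01 is an arbitrary placeholder, not a value from the paper:

```python
def fitness(d_route, d_max, eps=0.01):
    """F = (D_max - D_route) / D_max + eps.

    Shorter routes score higher; eps keeps even a D_max-long route
    above zero so it can still be selected.
    """
    return (d_max - d_route) / d_max + eps


print(fitness(4, 10))   # (10 - 4) / 10 + 0.01
print(fitness(10, 10))  # worst case: just eps
```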

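51 CS717 Selection Scheme Roulette Wheel –Sum of fitness values * random value from [0,1] –Select the chromosome at which the running sum of fitness values first exceeds that product Tournament Selection –Fittest chromosome of a randomly drawn group selected Stochastic Remainder –Each route receives copies equal to the integer part of its expected count; remaining slots filled probabilistically from the fractional parts Which scheme has best performance selecting optimal route?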
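The three schemes can be sketched side by side as textbook versions. Assumptions: the tournament size `k` defaults to 2, and the stochastic-remainder variant fills leftover slots by roulette over the fractional parts (one of several common variants):

```python
import math
import random


def roulette_wheel(pop, fit, rng=random):
    """r = sum(fit) * U[0, 1); return the first chromosome whose
    cumulative fitness exceeds r."""
    r = sum(fit) * rng.random()
    acc = 0.0
    for chrom, f in zip(pop, fit):
        acc += f
        if acc > r:
            return chrom
    return pop[-1]  # guard against floating-point round-off


def tournament(pop, fit, k=2, rng=random):
    """Draw k distinct chromosomes at random; the fittest of them wins."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fit[i])]


def stochastic_remainder(pop, fit, n, rng=random):
    """Select n chromosomes: each gets floor(expected count) copies,
    remaining slots are filled by roulette over the fractional parts."""
    total = sum(fit)
    expected = [n * f / total for f in fit]
    chosen = []
    for chrom, e in zip(pop, expected):
        chosen.extend([chrom] * math.floor(e))
    fractions = [e - math.floor(e) for e in expected]
    while len(chosen) < n:
        chosen.append(roulette_wheel(pop, fractions, rng))
    return chosen
```

Roulette wheel and stochastic remainder favor fit routes in proportion to fitness, while tournament selection depends only on fitness ranking within each draw.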

52 CS717 Reroute

53 CS717 Genetic Hybrid Algorithm
