1
Reliability in Tree-based Overlay Networks
Dorian C. Arnold, University of Wisconsin
Paradyn/Condor Week, March 14-18, 2005, Madison, WI
© 2005 Dorian C. Arnold
2
Preview
Focus on tree-based overlay networks (T-BŌN)
Leverage characteristics of hierarchical topologies
MRNet overview
Reliability background
Our approach to T-BŌN reliability
Main-memory implicit checkpointing protocol
3
Research Domain
Target distributed system monitors, tools, profilers and debuggers (Paradyn, Tau, etc.)
Fault model: crash-stop failures
TCP-like reliability for multicast and stateful reduction operations
Tolerate all internal node failures
Graceful degradation to flat topology
4
This Year in HPC
Processor statistics from the Top500 list:
– 7974: top-ten average
– 18%: ≥ 1024 processors
– 59%: clusters
– 8192: largest cluster
– 32,768: largest system
In 2005: 65,536-processor system
Clusters and MPPs with 10^4 to 10^5 processors will soon be commonplace.
5
Large Scale Challenge #1: Performance
MRNet: Multicast/Reduction Overlay Network
T-BŌN for scalable, efficient group communications and data analyses
– Scalable multicast
– Scalable reduction
– In-network data aggregation
6
MRNet Example: Running Average Filter
[Figure: tree with the front-end at the root and back-ends (BE) at the leaves. Node labels appear to be (count, average) pairs. Leaf values 1,18; 1,8; 1,27; 1,5; 1,11; 1,22; 1,32 are reduced to 2,13; 1,27; 2,8; 2,27, then to 3,18 and 4,18, and finally to 7,18 at the front-end.]
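The running average example above can be sketched as a small reduction function. This is only an illustration, not MRNet's filter API: the `(count, average)` state representation is inferred from the figure labels, and the names `State` and `running_average` are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed state representation, inferred from the slide labels: (count, average).
State = Tuple[int, float]


def running_average(inputs: Iterable[State]) -> State:
    """Combine (count, average) pairs from child nodes into a single pair."""
    total_count = 0
    total_sum = 0.0
    for count, avg in inputs:
        total_count += count
        total_sum += count * avg  # recover each child's sum from its average
    if total_count == 0:
        raise ValueError("no samples to average")
    return total_count, total_sum / total_count


# Leaf values taken from the figure; the grouping mirrors the tree shown there.
leaves = [(1, 18), (1, 8), (1, 27), (1, 5), (1, 11), (1, 22), (1, 32)]
level1 = [
    running_average(leaves[0:2]),
    leaves[2],
    running_average(leaves[3:5]),
    running_average(leaves[5:7]),
]
level2 = [running_average(level1[0:2]), running_average(level1[2:4])]
front_end = running_average(level2)
print(level1, level2, front_end)
```

The slide shows whole numbers at each node; this sketch keeps exact floating-point averages and leaves any rounding for display.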
7
Large Scale Challenge #2: Reliability
A system with 10,000 nodes is 10^4 times more likely to fail than one with 100 nodes.
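As a rough sanity check on how failure likelihood grows with node count, consider a simple independence model. The model is an assumption introduced here and is not stated on the slide: each of the n nodes fails independently with probability p during a run, and the system fails if any node fails.

```latex
P_{\mathrm{fail}}(n) = 1 - (1 - p)^{n} \approx n\,p \quad \text{for } n\,p \ll 1,
\qquad
\frac{P_{\mathrm{fail}}(10^{4})}{P_{\mathrm{fail}}(10^{2})} \approx \frac{10^{4}\,p}{10^{2}\,p} = 10^{2}.
```

Under this particular model the ratio is about 10^2, so the slide's factor presumably rests on a different failure model that the slide does not spell out.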
8
Large Scale Challenge #2: Reliability
Leverage characteristics of T-BŌNs to provide highly scalable reliability protocols
Logarithmic properties
Regularity and predictability
– Structure
– Communication
Inherent data redundancy
9
Approaches to Distributed Reliability
Reliable group communications
Distributed transactions
Rollback-recovery protocols
10
Rollback-Recovery Protocols
Checkpoint/Restart Challenges:
Time overhead
– Checkpointing latency
    Commit latency (stable storage access)
    Coordination (coordinated checkpointing)
– Recovery latency
    Calculating recovery point (uncoordinated checkpointing)
Space overhead
– Checkpoint storage
    Multiple/useless checkpoints (uncoordinated checkpointing)
    Forced checkpoints (communication-induced checkpointing)
– Protocol messages
Complexity
– Heterogeneity
– Recovery semantics
11
Approach to T-BŌN Reliability
Framework for studying various recovery protocols in T-BŌNs
Specify different recovery protocols for experimentation or customization
Cost-benefit analyses of various recovery schemes
Three new rollback-recovery protocols:
1. Main-memory implicit checkpoints (MMIC) and state regeneration
2. Uncoordinated checkpoints with fast recovery
3. Pure communication-induced checkpoints
12
MMIC Idea
Leverage inherent redundancy of stateful reduction networks
Eliminate explicit checkpoints
Use volatile storage
Reduces checkpoint latency
– Checkpointed state used to regenerate the state of other failed processes
Establish recovery clique
Enable efficient recovery
13
Filter State Operations
Input: Set of states from a complete set of sibling nodes
Output: Regenerated state of parent node

Input: States from a parent node and an incomplete set of child nodes
Output: Regenerated state of failed node(s)
14
Filter State Operations (cont’d)
Input: State from a node
Output: Two states to be assumed by two new sibling nodes jointly responsible for the task of the original node

Input: Two states from nodes in the network
Output: State to be assumed by a new node responsible for the tasks of the two original nodes
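The four filter state operations described above can be sketched for the running average filter. This is a minimal illustration under stated assumptions, not the MRNet interface: state is taken to be a `(count, average)` pair with non-zero total count, `decompose` and `merge` borrow their names from the MMIC example slides, and `compose` and `split` are hypothetical names for the two operations the slides leave unnamed.

```python
from typing import Sequence, Tuple

# Assumed state for a running average filter: (count, average).
State = Tuple[int, float]


def compose(children: Sequence[State]) -> State:
    """Complete set of sibling states -> regenerated parent state."""
    count = sum(c for c, _ in children)
    total = sum(c * a for c, a in children)
    return count, total / count


def decompose(parent: State, survivors: Sequence[State]) -> State:
    """Parent state + incomplete set of children -> combined state of the missing child(ren)."""
    p_count, p_avg = parent
    count = p_count - sum(c for c, _ in survivors)
    total = p_count * p_avg - sum(c * a for c, a in survivors)
    return count, total / count


def split(state: State) -> Tuple[State, State]:
    """One state -> two states for two new siblings that share the original task."""
    count, avg = state
    first = count // 2
    return (first, avg), (count - first, avg)


def merge(a: State, b: State) -> State:
    """Two states -> one state for a node that takes over both tasks."""
    return compose([a, b])


# Values taken from the MMIC example slides.
children = [(2, 8), (2, 27), (2, 16), (2, 5)]
parent = compose(children)
lost = decompose(parent, [children[0], children[2], children[3]])
print(parent, lost, merge(lost, children[2]))
```

For an average, every operation reduces to arithmetic on counts and sums, which is what makes state regeneration cheap here; other filters would need their own definitions of these operations.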
15
MMIC: Recovery Semantics
1. Detect failure
2. Establish a recovery clique
   Set of processes whose persistent state can be used to regenerate that of the failed node
3. Identify take-over node
   Assumes role of failed node
4. Regenerate persistent state of failed node
5. Reintegrate regenerated state into take-over node
6. Resume
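Step 2 can be sketched as follows, assuming the recovery clique of a failed node consists of its parent and its surviving siblings, as in the running average example that follows. The `parent_of` map and the `recovery_clique` function are hypothetical; the slides do not specify how the clique is computed in general, nor how the take-over node is chosen.

```python
from typing import Dict, List, Optional


def recovery_clique(parent_of: Dict[str, Optional[str]], failed: str) -> List[str]:
    """Return the parent and surviving siblings of a failed non-root node."""
    parent = parent_of[failed]
    if parent is None:
        raise ValueError("root failure is outside the scope of this sketch")
    siblings = [
        node
        for node, node_parent in parent_of.items()
        if node_parent == parent and node != failed
    ]
    return [parent] + siblings


# Hypothetical topology: one parent "p" with four children.
parent_of = {"p": None, "c0": "p", "c1": "p", "c2": "p", "c3": "p"}
print(recovery_clique(parent_of, "c1"))
```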
16
MMIC Example: Running Average Filter
[Figure: tree with back-ends (BE) at the leaves. Filter states: S_p: 8,14; S_c0: 2,8; S_c1: 2,27; S_c2: 2,16; S_c3: 2,5]
17
MMIC Example: Running Average Filter
[Figure: same tree. Filter states: S_p: 8,14; S_c0: 2,8; S_c1: 2,27; S_c2: 2,16; S_c3: 2,5]
1. Detect failure
18
MMIC Example: Running Average Filter
[Figure: same tree. Filter states: S_p: 8,14; S_c0: 2,8; S_c1: 2,27; S_c2: 2,16; S_c3: 2,5]
1. Detect failure
2. Calculate recovery clique
19
MMIC Example: Running Average Filter
[Figure: same tree. Filter states: S_p: 8,14; S_c0: 2,8; S_c1: 2,27; S_c2: 2,16; S_c3: 2,5]
1. Detect failure
2. Calculate recovery clique
3. Assign a “take-over” node
20
MMIC Example: Running Average Filter
[Figure: same tree. Filter states: S_p: 8,14; S_c0: 2,8; S_c1: 2,27; S_c2: 2,16; S_c3: 2,5. Values read: 2,8; 8,14; 2,5. Regenerated S_c1’: 2,27; merged S_c2’: 4,21]
1. Detect failure
2. Calculate recovery clique
3. Assign a “take-over” node
4. Regenerate lost state into “take-over” node:
   4.1 read(S_p, S_c0, S_c3)
   4.2 decompose(S_p, S_c0, S_c2, S_c3) → S_c1’
   4.3 merge(S_c1’, S_c2) → S_c2’
   4.4 write(S_c2’) → S_c2
21
MMIC Example: Running Average Filter
[Figure: tree after recovery. Filter states: S_p: 8,14; S_c0: 2,8; S_c2: 4,21; S_c3: 2,5]
1. Detect failure
2. Calculate recovery clique
3. Assign a “take-over” node
4. Regenerate lost state into “take-over” node:
   4.1 read(S_p, S_c0, S_c3)
   4.2 decompose(S_p, S_c0, S_c2, S_c3) → S_c1’
   4.3 merge(S_c1’, S_c2) → S_c2’
   4.4 write(S_c2’) → S_c2
5. Update and resume
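The recovery steps in this example can be replayed on the slide's numbers. This is a sketch under assumptions, not the protocol implementation: states are `(count, average)` pairs held in an in-memory dictionary standing in for each node's volatile storage, node c1 is taken as the failed node (its state is the one regenerated in step 4.2), c2 is the take-over node as on the slide, and the `decompose` and `merge` bodies are hypothetical definitions for the running average filter.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[int, float]  # assumed (count, average) representation


def decompose(parent: State, survivors: Sequence[State]) -> State:
    """Regenerate the missing child's state from the parent and surviving children."""
    p_count, p_avg = parent
    count = p_count - sum(c for c, _ in survivors)
    total = p_count * p_avg - sum(c * a for c, a in survivors)
    return count, total / count


def merge(a: State, b: State) -> State:
    """Combine two states into one for a node that takes over both tasks."""
    count = a[0] + b[0]
    return count, (a[0] * a[1] + b[0] * b[1]) / count


# In-memory states before the failure (values from the slide).
states: Dict[str, State] = {
    "p": (8, 14.0), "c0": (2, 8.0), "c1": (2, 27.0), "c2": (2, 16.0), "c3": (2, 5.0),
}

failed, takeover = "c1", "c2"
del states[failed]                      # step 1: the failed node's state is lost
clique = ["p", "c0", "c2", "c3"]        # step 2: parent and surviving siblings
survivors = [states[n] for n in clique if n != "p"]   # step 4.1: read clique states
regenerated = decompose(states["p"], survivors)       # step 4.2
states[takeover] = merge(regenerated, states[takeover])  # steps 4.3 and 4.4
print(regenerated, states[takeover])
```

With exact arithmetic the merged state is (4, 21.5); the slide displays it as 4,21. After recovery, the three remaining children still account for the parent's count of 8 and average of 14.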
22
Outstanding Issues and Other Research
Evaluation of new rollback-recovery protocols
Preemptive vs. non-preemptive recovery
Failure zone identification
Non-trivial filters
Failure detection
Topology reconfiguration
Modeling
Transmission layer reliability
Efficient data loss repair
23
References
Roth, Arnold, and Miller, “MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools”, in SC2003.
Roth, Arnold, and Miller, “Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned”, in 2004 High-Performance Grid Computing Workshop.
More to come … see you next year!
http://www.paradyn.org/mrnet
darnold@cs.wisc.edu
24
Filter State Operations (cont’d)
Input: None
Output: Current state of filter object

Input: State of a filter object
Output: None
Side effect: Checkpoint to volatile/stable storage