Scalable Failure Recovery for Tree-based Overlay Networks


1 Scalable Failure Recovery for Tree-based Overlay Networks
Dorian C. Arnold, University of Wisconsin
Paradyn/Condor Week, April 30 – May 3, 2007, Madison, WI
© 2006, 2007 Dorian C. Arnold

2 Overview
Motivation: address the likely frequent failures in extreme-scale systems.
State compensation: a failure recovery model for tree-based overlay networks (TBŌNs) that uses surviving state to compensate for lost state, leveraging the TBŌN's inherent information redundancies.
Weak data consistency model (convergent recovery): the final output stream converges to the non-failure case; intermediate output packets may differ, but all output information is preserved.
Our key observation is that the inherent information redundancy can be leveraged for recovery, and the approach applies to a broader class of computations than you might think.

3 HPC System Trends
60% of systems are larger than 10^3 processors; 10 systems are larger than 10^4 processors.

System        Location   Size       Time Frame
RoadRunner    LANL       ~3.2x10^4  2008
Jaguar        ORNL       ~4.2x10^4  2007
SunFire x64   TACC       ~5.2x10^4
Cray XT4                 ~2x10^5
BlueGene/P    ANL        ~5x10^5
BlueGene/Q    ANL/LLNL   ~10^6

4 Large Scale System Reliability
[Figure: failure data for BlueGene/L and Purple.]
Note that these are only hardware failures.

5 Current Reliability Approaches
Fail-over (hot backup): replace the failed primary with a backup replica. Overhead is extremely high: 100% minimum!
Rollback recovery: roll back to a checkpoint after a failure. This may require dedicated resources and can overload network/storage resources; at sizes of 10^5 and above, such overheads are probably not acceptable.
(A fuller discussion is available in the tech report.)

6 Our Approach: State Compensation
Leverage inherent TBŌN redundancies and avoid explicit replication, so there is no overhead during normal operation.
Rapid recovery with limited process participation.
A general recovery model that applies to broad classes of computations.

7 Background: TBŌN Model
[Figure: an application front-end (FE) sits atop a tree of communication processes (CP0, CP1, CP2, ...), each running a packet filter with its own filter state; application back-ends (BEs) form the leaves. TBŌN input flows up from the back-ends, and TBŌN output emerges at the front-end.]
This model is the foundation for Paradyn on 1000 nodes, the scalable debugging work in the next talk, and more.
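As a concrete illustration, here is a minimal Python sketch of the TBŌN model, under our own simplifying assumptions (all names are illustrative, not the actual implementation's API): back-ends feed packets up a tree of communication processes, each of which aggregates its children's packets with a filter before forwarding one packet upward.

```python
# Minimal sketch of the TBON model: back-ends (leaves) send packets up
# a tree of communication processes (CPs); each CP aggregates its
# children's packets with a filter before forwarding one packet upward.

def make_filter():
    """An example aggregating filter whose state is a set of values."""
    state = set()
    def filt(packets):
        for p in packets:
            state.update(p)            # fold each input packet into the state
        return frozenset(state)        # output packet: the aggregate so far
    return filt

class CP:
    """A communication process: a tree node running a packet filter."""
    def __init__(self, children):
        self.children = children
        self.filter = make_filter()
    def pull(self):
        # Gather one packet from each child, aggregate, forward upward.
        return self.filter([c.pull() for c in self.children])

class BackEnd:
    """An application back-end: a leaf data source."""
    def __init__(self, values):
        self.values = frozenset(values)
    def pull(self):
        return self.values

# A depth-2 tree: a root CP over two CPs over four back-ends.
backends = [BackEnd({i}) for i in range(4)]
root = CP([CP(backends[:2]), CP(backends[2:])])
front_end_output = root.pull()         # the join of all back-end data
```

The front-end thus sees a single aggregated stream regardless of tree fan-out, which is what makes the tree topology scale.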

8 A Trip Down Theory Lane
A formal treatment provides confidence in the recovery model: we can reason about semantics before and after recovery (the recovery model does not change the computation), prove algorithmic soundness, and understand the recovery model's characteristics.
System reliability cannot depend upon intuition and ad hoc reasoning.

9 Theory Overview
TBŌN end-to-end argument: the output depends only on state at the end points, so we can recover from the loss of any internal filter and channel states.
[Figure: CP_i, with filter state and channel state, above CP_j; CP_k and CP_l are CP_j's children.]
We define a formal model to provide confidence that the algorithms are sound: that we get behaviors and results we understand. This is too complex to do ad hoc or based on intuition; the goal is to show precisely which surviving state can compensate for which lost state. Note that the end-to-end argument applies to our model, not necessarily to all TBŌN models.

10 Theory Overview (cont’d)
TBŌN Output Theorem: the output depends only on the channel states and the root's filter state.
All-encompassing Leaf State Theorem: the state at the leaves subsumes the channel state (all state throughout the TBŌN).
[Figure: CP_i with filter state and channel state; CP_j with channel state; CP_k and CP_l below.]
Emphasis: we don't need the internal state and can ignore it; it can improve performance but is not necessary for correctness. If we did need the channel states, we would have to resort to logging or something similar. Result: we only need leaf state to recover from root or internal failures.

11 Theory Overview (cont’d)
The TBŌN Output Theorem and the All-encompassing Leaf State Theorem both build on the Inherent Redundancy Theorem.

12 Background: Notation
[Figure: notation illustrated on a sub-tree: CP_i with filter state fs_n(CP_i) and pending channel state cs_{n,p}(CP_i); its child CP_j with filter state fs_p(CP_j) and channel state cs(CP_j); leaves CP_k and CP_l. The internal filter and channel states are the "nuisance" states.]

13 Background: Data Aggregation
Filter function:
    f( {in(CP)}, fs_n ) → { out, fs_{n+1} }
where {in(CP)} is the set of packets from the input channels, fs_n is the current filter state, out is the output packet, and fs_{n+1} is the updated filter state.
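The signature above can be sketched in Python; the integer-max aggregation below is our own illustrative choice of filter, not the talk's:

```python
# Sketch of the filter-function signature f({in(CP)}, fs_n) -> (out, fs_{n+1}),
# instantiated with an illustrative integer-max aggregation filter.
def f(in_packets, fs_n):
    fs_next = max([fs_n] + list(in_packets))   # merge the inputs into the state
    out = fs_next                              # emit the updated aggregate
    return out, fs_next

out, fs1 = f([3, 7, 2], 0)    # one filtering step
out2, fs2 = f([5], fs1)       # the next step starts from the updated state
```

Each invocation consumes one wave of input packets and threads the filter state forward, exactly as the slide's signature describes.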

14 Background: Filter Function
The filter function is built on state join and difference operators.
The state join operator, ⊔, updates the current state by merging inputs:
    fs_n ⊔ in(CP) → fs_{n+1}
Commutative:  a ⊔ b = b ⊔ a
Associative:  (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c)
Idempotent:   a ⊔ a = a
Real computations, even complicated ones, fit these characteristics. The join's properties are important for flexibility (especially in tree reconstruction): they allow us to relax recovery constraints while still admitting many useful computations. Once you buy into the inherent redundancy, this is the next most relevant point.
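One concrete computation that satisfies all three properties is set union, which we can check directly (the values are our own example):

```python
# Checking the three join-operator properties for one concrete join
# (set union); any operator with these properties fits the model.
def join(a, b):
    return a | b

a, b, c = frozenset({1}), frozenset({2, 3}), frozenset({3, 4})
commutative = join(a, b) == join(b, a)
associative = join(join(a, b), c) == join(a, join(b, c))
idempotent = join(a, a) == a
```

Max, min, logical-or, and similar aggregations satisfy the same laws, which is why the model covers a broad class of computations.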

15 Background: Descendant Notation
desc^0(CP_i) is CP_i itself; desc^1(CP_i) is the set of CP_i's children; in general, desc^k(CP_i) is the set of CPs k levels below CP_i.
fs( desc^k(CP_i) ): the join of the filter states of the specified processes.
cs( desc^k(CP_i) ): the join of the channel states of the specified processes.
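The desc^k notation can be sketched as a small Python helper over an explicit child map (the tree below is our own example):

```python
# desc^k(cp): the set of processes exactly k levels below cp in the tree.
def desc(cp, k, children_of):
    level = {cp}                       # desc^0(cp) = {cp}
    for _ in range(k):
        level = {child for p in level for child in children_of.get(p, ())}
    return level

# A three-level example tree rooted at CP0.
tree = {"CP0": ("CP1", "CP2"), "CP1": ("CP3", "CP4"), "CP2": ("CP5", "CP6")}
```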

16 TBŌN Properties: Inherent Redundancy Theorem
The join of a CP's filter state with its pending channel state equals the join of the CP's children's filter states:
    fs_n(CP_0) ⊔ cs_{n,p}(CP_0) ⊔ cs_{n,q}(CP_0) = fs_p(CP_1) ⊔ fs_q(CP_2)

17 TBŌN Properties: Inherent Redundancy Theorem
The join of a CP's filter state with its pending channel state equals the join of the CP's children's filter states. In descendant notation:
    fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) ) = fs( desc^1(CP_0) )
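For the set-union join, the theorem can be checked on concrete values: every packet a child has folded into its filter state is either already merged into the parent's filter state or still pending on the channel (the states below are our own example):

```python
# Inherent Redundancy Theorem, checked for the set-union join:
# parent filter state joined with its pending channel states equals
# the join of the children's filter states.
join = frozenset.union

fs_cp1 = frozenset({1, 2})        # child CP1's filter state
fs_cp2 = frozenset({3, 4})        # child CP2's filter state
fs_cp0 = frozenset({1, 3})        # parent CP0 has merged these packets
cs_from_cp1 = frozenset({2})      # still pending on the channel from CP1
cs_from_cp2 = frozenset({4})      # still pending on the channel from CP2

lhs = join(fs_cp0, cs_from_cp1, cs_from_cp2)  # parent state + channel states
rhs = join(fs_cp1, fs_cp2)                    # join of the children's states
```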

18 TBŌN Properties: All-encompassing Leaf State Theorem
The join of the states from a sub-tree's leaves equals the join of the state at the sub-tree's root and all in-flight data.
[Figure: a tree with root CP0, internal CPs CP1 and CP2, and leaves CP3 through CP6.]

19 TBŌN Properties: All-encompassing Leaf State Theorem
The join of the states from a sub-tree's leaves equals the join of the state at the sub-tree's root and all in-flight data.
From the Inherent Redundancy Theorem, applied one level at a time:
    fs( desc^1(CP_0) ) = fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) )
    fs( desc^2(CP_0) ) = fs( desc^1(CP_0) ) ⊔ cs( desc^1(CP_0) )
    ...
    fs( desc^k(CP_0) ) = fs( desc^{k-1}(CP_0) ) ⊔ cs( desc^{k-1}(CP_0) )
Unrolling the recurrence:
    fs( desc^k(CP_0) ) = fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) ) ⊔ ... ⊔ cs( desc^{k-1}(CP_0) )
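The theorem can likewise be checked with the set-union join on a small example of our own:

```python
# All-encompassing Leaf State Theorem, checked for the set-union join:
# the leaves' filter states subsume the root's state plus all in-flight data.
def join_all(states):
    result = frozenset()
    for s in states:
        result |= s
    return result

leaf_states = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
root_state = frozenset({1, 3})                        # already aggregated at the root
channel_states = [frozenset({2}), frozenset({4, 5})]  # still in flight

leaves = join_all(leaf_states)
root_plus_in_flight = join_all([root_state] + channel_states)
```

This is the property recovery relies on: after a failure, the leaves alone can reconstruct everything the lost interior held.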

20 State Composition
Motivated by the preceding theory: the state at the leaves of a sub-tree subsumes the state throughout the higher levels.
We compose state below failure zones to compensate for lost state; this addresses root and internal failures.
State decomposition handles leaf failures: a child's state is generated from its parent's and siblings' states.

21 State Composition
If CP_j fails, all state associated with CP_j is lost: its filter state fs(CP_j) and its channel state cs(CP_j).
TBŌN Output Theorem: the output depends only on the channel states and the root's filter state.
All-encompassing Leaf State Theorem: the state at the leaves subsumes the channel state (all state throughout the TBŌN).
Therefore, the leaf states can replace the lost channel state without changing the computation's semantics.

22 State Composition Algorithm
if detect child failure:
    remove failed child from input list
    resume filtering from non-failed children

if detect parent failure:
    do:
        determine and connect to new parent
    while failure to connect
    propagate filter state to new parent
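The two recovery paths above can be sketched as runnable Python; the function names and callbacks are our own, not the actual implementation's:

```python
def handle_child_failure(input_list, failed_child):
    """Remove the failed child from the input list; filtering then
    resumes over the surviving children."""
    return [c for c in input_list if c != failed_child]

def handle_parent_failure(candidate_parents, try_connect, filter_state, send_state):
    """Retry candidate new parents until one accepts the connection, then
    propagate our filter state so the lost channel state is compensated."""
    for parent in candidate_parents:
        if try_connect(parent):
            send_state(parent, filter_state)
            return parent
    raise RuntimeError("could not connect to any candidate parent")

# Example: the first candidate refuses; our state goes to the second.
sent = []
survivors = handle_child_failure(["CP1", "CP2", "CP3"], "CP2")
new_parent = handle_parent_failure(
    ["CP_a", "CP_b"],
    try_connect=lambda p: p == "CP_b",
    filter_state=frozenset({1, 2}),
    send_state=lambda p, s: sent.append((p, s)),
)
```

Because the join is commutative and associative, it does not matter in what order the orphans' states arrive at the new parent.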

23 Summary: Theory can be Good!
The theory allows us to make recovery guarantees and yields sensible, understandable results. It also better informs the implementation: what needs to be implemented, and what does not.
Current status: an empirical evaluation is coming.

24 References
Arnold and Miller, "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks," UW-CS Technical Report, February 2007.
Other papers and software:

25 Bonus Slides!

26 State Composition Performance
recovery latency = (connection establishment × maximum depth) + state propagation
LAN connection establishment: ~1 millisecond.

27 Failure Model
Fail-stop failures, including multiple simultaneous failures; failure zones are regions of contiguous failure.
Application process failures: the application processes may be viewed as sequential data sources/sinks and are amenable to basic reliability mechanisms such as simple restart and sequential checkpointing. We do not tolerate application failures ourselves, but we include them in the model; we have a strategy for dealing with them, but no time to describe it here (see the tech report).

