Consistent Cuts and Un-coordinated Check-pointing


Cuts
A cut is a subset C of the events in a computation
–some definitions require at least one event from each process
For each process P, the events in C that executed on P form an initial prefix of all events that executed on P
Cut: {e0,e1,e2,e4,e7}
Not a cut: {e0,e2,e4,e7}
Frontier of a cut: the subset of the cut containing the last event on each process
–for our example, {e2,e4,e7}
[Figure: space-time diagram of events e0–e13 across three processes]
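Because a cut is prefix-closed on each process, it can be encoded as one prefix length per process. A minimal C++ sketch of this encoding and of computing a cut's frontier (the encoding and names are illustrative, not from the slides):

#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical encoding: cut[p] = k means the first k events executed
// by process p are in the cut (the per-process prefix property above).
using Cut = std::vector<int>;

// Frontier: the last event of each nonempty per-process prefix.
std::vector<std::pair<int, int>> frontier(const Cut& cut) {
    std::vector<std::pair<int, int>> f;   // (process, event index) pairs
    for (int p = 0; p < (int)cut.size(); ++p)
        if (cut[p] > 0) f.push_back({p, cut[p] - 1});
    return f;
}

int main() {
    Cut c = {3, 2, 1};   // e.g., 3 events of P0, 2 of P1, 1 of P2
    for (auto [p, i] : frontier(c))
        std::printf("frontier: event %d of process %d\n", i, p);
}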

Equivalent definition of cut
A cut is a subset C of the events in a computation such that: if e' ∈ C, e → e', and e and e' executed on the same process, then e ∈ C.
What happens if we remove the condition that e and e' were executed on the same process?
[Figure: space-time diagram of events e0–e13]

Consistent cut
A subset C of the events in a computation such that: if e' ∈ C and e → e', then e ∈ C
–Consistent cut: {e0, e1, e2, e4, e5, e7} (note e5 → e2, but the cut is still consistent by our definition)
–Inconsistent cut: {e0,e1,e2,e4,e7}
–Not a cut: {e0,e2,e4,e7}
[Figure: space-time diagram of events e0–e13]

Properties of consistent cuts (0)
If a cut is inconsistent, there must be a message whose receive event is in C but whose send event is not.
Proof: there must be events e and e' such that e → e', e' ∈ C, but e ∉ C. Consider the chain e → e0 → e1 → … → e'. There must be consecutive events ei → ej in this chain such that e, e0, …, ei are not in C, but ej is in C. Since C is a cut, ei and ej must be executed by different processes. Therefore ei is a send and ej is the matching receive.
[Figure: space-time diagram of events e0–e13]
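Property (0) gives a cheap consistency test: under the prefix encoding sketched earlier, a cut is consistent iff no message is received inside the cut but sent outside it. A self-contained sketch (the encoding is again hypothetical):

#include <vector>

// An event is identified by (process, index within that process).
struct EventId { int proc, idx; };
struct Message { EventId send, recv; };

// cut[p] = number of events of process p in the cut (prefix encoding)
static bool inCut(const std::vector<int>& cut, EventId e) {
    return e.idx < cut[e.proc];
}

// By Property (0), a per-process-prefix cut is consistent iff no
// message's receive is in the cut while its send is not.
bool isConsistent(const std::vector<int>& cut,
                  const std::vector<Message>& msgs) {
    for (const Message& m : msgs)
        if (inCut(cut, m.recv) && !inCut(cut, m.send))
            return false;
    return true;
}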

Properties of consistent cuts (I)
Let e_P be a computational event on the frontier of a consistent cut C. If e_P → e'_Q, then e'_Q cannot be in C.
Proof: Consider the causal chain e_P → e1 → … → e'_Q. Event e1 must execute on process P, because e_P is a computational event (not a send). Since e_P is on the frontier, e1 is not in C. If e'_Q were in C, the definition of a consistent cut would force e1 ∈ C, a contradiction; so e'_Q cannot be in C.
[Figure: space-time diagram of events e0–e13]

Properties (II)
Let F = {e_0, e_1, …} be a set of computational events, one from each process. F is the frontier of a consistent cut iff the events in F are pairwise concurrent.
Proof: follows from Property (0) and Property (I).
[Figure: space-time diagram of events e0–e13]

Properties of consistent cuts (III): Lattice of consistent cuts
The consistent cuts of a computation are closed under union and intersection, so they form a lattice.
[Figure: space-time diagram of events e0–e13 showing two consistent cuts C1 and C2]

Un-coordinated check-pointing
Each process saves its local state at the start, and then whenever it wants.
Events: compute, send, receive, take check-point
Recovery line: the frontier of any consistent cut whose frontier events are all check-points
Is there an optimum recovery line? How do we find it?
[Figure: execution of processes p, q, r with check-points; * marks a failure]

Check-point Dependency Graph
Nodes:
–one for each local check-point
–one for the current state of each surviving process
Edges: one for each message (e,e') from some P to Q
–source: the node for the last check-point on P that happened before e
–destination: the node n on Q for the first check-point/current state such that e' happened before n
[Figure: execution of p, q, r and the corresponding dependency graph]
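A sketch of this construction, assuming each process records the event indices at which it took check-points and every message carries its send and receive event indices (all encodings here are mine, not the slide's):

#include <algorithm>
#include <array>
#include <vector>

// ckpt[p]: event indices at which process p took check-points, increasing;
// index ckpt[p].size() stands for the current-state node of a survivor.
struct Msg { int fromP, sendIdx, toQ, recvIdx; };

// One edge {p, k, q, n} per message: from the k-th check-point of p
// (last check-point before the send) to the n-th node of q (first
// check-point/current state after the receive).
std::vector<std::array<int, 4>>
buildEdges(const std::vector<std::vector<int>>& ckpt,
           const std::vector<Msg>& msgs) {
    std::vector<std::array<int, 4>> edges;
    for (const Msg& m : msgs) {
        const auto& cp = ckpt[m.fromP];
        int k = int(std::upper_bound(cp.begin(), cp.end(), m.sendIdx)
                    - cp.begin()) - 1;  // k >= 0: every process check-points at start
        const auto& cq = ckpt[m.toQ];
        int n = int(std::upper_bound(cq.begin(), cq.end(), m.recvIdx)
                    - cq.begin());      // == cq.size() for the current-state node
        edges.push_back({m.fromP, k, m.toQ, n});
    }
    return edges;
}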

Properties of check-point dependency graph
Node c2 is reachable from node c1 in the graph iff the check-point corresponding to c1 happens before the check-point corresponding to c2.
[Figure: execution of p, q, r and the corresponding dependency graph]

Finding the optimum recovery line
RL_0 = { last node on each process }
While (there exist u, v in RL_i such that v is reachable from u)
–RL_{i+1} = RL_i – {v} + {node before v on the same process as v}
The final RL when the loop terminates is the optimum recovery line. A direct sketch of the loop follows; see later for an efficient algorithm.
[Figure: dependency graph with successive recovery lines RL0, RL1, RL2, RL3]
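A direct (quadratic) C++ sketch of this fixpoint loop, under assumptions of my own: nodes on each process are numbered 0..last[p], and reachability follows message edges plus the implicit same-process order (node k precedes node k+1):

#include <utility>
#include <vector>

struct Edge { int sp, sk, dp, dk; };   // (proc, node) -> (proc, node)

// Is node v reachable from node u? DFS over message edges and the
// implicit edge from each node to its right neighbor on the same process.
bool reachable(std::pair<int,int> u, std::pair<int,int> v,
               const std::vector<Edge>& es, const std::vector<int>& last) {
    std::vector<std::vector<bool>> seen(last.size());
    for (size_t p = 0; p < last.size(); ++p) seen[p].assign(last[p] + 1, false);
    std::vector<std::pair<int,int>> stk;
    auto pushSuccs = [&](int p, int k) {
        if (k < last[p]) stk.push_back({p, k + 1});
        for (const Edge& e : es)
            if (e.sp == p && e.sk == k) stk.push_back({e.dp, e.dk});
    };
    pushSuccs(u.first, u.second);
    while (!stk.empty()) {
        auto [p, k] = stk.back(); stk.pop_back();
        if (p == v.first && k == v.second) return true;
        if (seen[p][k]) continue;
        seen[p][k] = true;
        pushSuccs(p, k);
    }
    return false;
}

// RL[p] = index of the recovery-line node on process p.
std::vector<int> optimumRL(const std::vector<int>& last,
                           const std::vector<Edge>& es) {
    std::vector<int> RL = last;        // RL0 = last node on each process
    for (bool changed = true; changed; ) {
        changed = false;
        for (size_t up = 0; up < RL.size() && !changed; ++up)
            for (size_t vp = 0; vp < RL.size() && !changed; ++vp)
                if (up != vp &&
                    reachable({(int)up, RL[up]}, {(int)vp, RL[vp]}, es, last)) {
                    --RL[vp];          // take the check-point before v; never
                                       // underflows: the initial check-point
                                       // has no incoming paths
                    changed = true;
                }
    }
    return RL;
}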

Correctness
When the loop terminates, no node of RL is reachable from another, so the algorithm computes a set of pairwise concurrent check-points, one from each process. From Property (II), it follows that these check-points form the frontier of a consistent cut.
[Figure: dependency graph for p, q, r]

Optimality
Suppose O is a better recovery line than the one our algorithm produces. O cannot be RL_0; otherwise, the algorithm succeeds immediately. So RL_0, the set of last check-points, is better than O. Consider the iteration at which RL_i is still better than O but RL_{i+1} is not. There exist u, v in RL_i such that v is reachable from u, and RL_{i+1} is obtained from RL_i by dropping v and taking the check-point prior to v. Since only v's process changed, O's check-point on that process must be v itself, so v is in O. Let x in O be the check-point on the same process as u; since RL_i is better than O, x is at or before u. We see that x → u → v, so x and v are not concurrent, which contradicts Property (II).
[Figure: dependency graph for p, q, r]

Finding the recovery line efficiently
Node colors:
–Yellow: on the current recovery line
–Red: beyond the current recovery line
–Green: behind the current recovery line
Bad edge:
–source is red/yellow
–destination is yellow/green
Algorithm: propagate redness forward from the destinations of bad edges
[Figure: dependency graph for p, q, r]

Algorithm
Mark all nodes green
For each node l that is the last node of a process:
–mark l yellow
–add each edge (l,d) to the worklist
While the worklist is nonempty do:
–get edge (s,d) from the worklist; if color(d) is red, continue
–L = node to the left of d; mark L yellow; add all bad edges with source L to the worklist
–R = first red node to the right of d
–for each node t in the interval [d,R): mark t red; add all bad edges with source t to the worklist
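A C++ sketch of this worklist algorithm, following the slide's outline; the node encoding (nodes 0..last[p] per process) and the per-node adjacency index are my own details:

#include <vector>

enum Color { GREEN, YELLOW, RED };
struct Edge { int sp, sk, dp, dk; };   // (proc, node) -> (proc, node)

// last[p] = index of the rightmost node of process p.
// Returns RL[p] = index of the (yellow) recovery-line node on process p.
std::vector<int> recoveryLine(const std::vector<int>& last,
                              const std::vector<Edge>& edges) {
    int n = (int)last.size();
    std::vector<std::vector<Color>> color(n);
    std::vector<std::vector<std::vector<int>>> out(n);  // edge ids by source
    for (int p = 0; p < n; ++p) {
        color[p].assign(last[p] + 1, GREEN);
        out[p].assign(last[p] + 1, {});
    }
    for (int i = 0; i < (int)edges.size(); ++i)
        out[edges[i].sp][edges[i].sk].push_back(i);

    std::vector<int> work;                   // worklist of edge ids
    auto pushBadFrom = [&](int p, int k) {   // source just turned yellow/red
        for (int i : out[p][k])
            if (color[edges[i].dp][edges[i].dk] != RED) work.push_back(i);
    };
    for (int p = 0; p < n; ++p) {            // last nodes start yellow
        color[p][last[p]] = YELLOW;
        pushBadFrom(p, last[p]);
    }
    while (!work.empty()) {
        Edge e = edges[work.back()]; work.pop_back();
        int p = e.dp, d = e.dk;
        if (color[p][d] == RED) continue;    // already rolled back
        color[p][d - 1] = YELLOW;            // node left of d; d >= 1 since the
                                             // initial check-point precedes all receives
        pushBadFrom(p, d - 1);
        for (int t = d; t <= last[p] && color[p][t] != RED; ++t) {
            color[p][t] = RED;               // propagate redness to first red node
            pushBadFrom(p, t);
        }
    }
    std::vector<int> RL(n);
    for (int p = 0; p <= n - 1; ++p)
        for (int k = 0; k <= last[p]; ++k)
            if (color[p][k] == YELLOW) RL[p] = k;
    return RL;
}

Each node changes color at most twice (green to yellow, yellow to red), and each edge is enqueued only when its source changes color, which matches the O(|E|+|V|) bound claimed on the next slide.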

Remarks
Complexity of the algorithm: O(|E|+|V|)
–each node is touched at most 3 times: to mark it green, yellow, red
–each edge is examined at most twice: once when its source goes green → yellow, once when its source goes yellow → red
Another approach: use the rollback dependency graph (see Alvisi et al.)

Practical details
Each process numbers its check-points starting at 0.
When a message is sent from S to R, the number of the sender's last check-point is piggybacked on the message.
The receiver saves the message plus its piggyback in a log.
When a check-point is taken, the message log is also saved on disk.
In-flight messages can be recovered from this log after the recovery line has been established.
[Figure: execution of p, q, r with check-points; * marks a failure]
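A small sketch of the piggybacking and logging just described; the types and method names are hypothetical:

#include <cstdint>
#include <utility>
#include <vector>

// The sender piggybacks the number of its most recent check-point
// on every application message.
struct TaggedMsg {
    int senderCkpt;                          // sender's last check-point number
    std::vector<std::uint8_t> payload;
};

struct LogEntry { int from; TaggedMsg msg; };

struct Process {
    int ckptNo = 0;                          // check-points numbered from 0
    std::vector<LogEntry> log;               // volatile message log

    TaggedMsg send(std::vector<std::uint8_t> data) {
        return TaggedMsg{ckptNo, std::move(data)};
    }
    void receive(int from, TaggedMsg m) {
        log.push_back({from, std::move(m)}); // save message + piggyback
        // ... deliver the payload to the application ...
    }
    void checkpoint() {
        ++ckptNo;
        // write local state and 'log' to stable storage here; in-flight
        // messages are replayed from the log once a recovery line is set
    }
};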

Garbage collection of saved states
Garbage collection of old states is a key problem.
One solution: run the recovery line algorithm periodically even if there is no failure, and GC all states behind the recovery line.

Application-level Check-pointing

Recall
We have seen system-level check-pointing. Trouble with system-level check-pointing:
–a lot of data is saved at each check-point: PC, registers, stack, heap, some O/S state, network state, … (the thin-pipe-to-disk problem)
–lack of portability: processor/OS state is very implementation-specific; a check-point cannot be restarted on a different platform, or on a different number of processors
One alternative: application-level check-pointing

Application-level check-pointing
Key idea: permit the user to specify
–which variables should be saved at a check-point
–the program points where check-points should be taken
Example: protein folding (a sketch follows this slide)
–save only the positions and velocities of the bases
–check-point at the end of a time-step
Advantages:
–less data saved: only live data needs to be saved; check-point at program points where live data is small and there are no in-flight messages
–data can be saved in an implementation-independent manner
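For the protein-folding example, a check-point might be nothing more than the positions and velocities written in a portable text format at the end of a time-step. A minimal sketch (file name and layout are illustrative):

#include <cstdio>
#include <vector>

struct Vec3 { double x, y, z; };

// Save only the live simulation state, in an implementation-independent
// (plain text) format, so restart can happen on a different platform.
void takeCheckpoint(const std::vector<Vec3>& pos,
                    const std::vector<Vec3>& vel, int step) {
    std::FILE* f = std::fopen("ckpt.txt", "w");
    if (!f) return;                          // real code would report the error
    std::fprintf(f, "%d %zu\n", step, pos.size());
    for (size_t i = 0; i < pos.size(); ++i)
        std::fprintf(f, "%g %g %g  %g %g %g\n",
                     pos[i].x, pos[i].y, pos[i].z,
                     vel[i].x, vel[i].y, vel[i].z);
    std::fclose(f);
}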

Warning
This is more complex than it appears! We must restore:
–PC: need to save where the check-point was taken
–registers
–stack: in general, many procedure invocations are active when the check-point is taken; how do we restore the stack so procedure returns etc. happen correctly?
–heap: restored heap data will be in different locations than at the check-point

Right intuition
In application-level check-pointing, we must use the saved variables to recompute the system state we would have saved in system-level check-pointing, modulo relocation of heap variables.
Recovery script:
–code that is executed to accomplish this
–distinct from user code, but obviously derived from it
–however, it needs to be woven into the user code to simplify problems such as register restoration

Example: DOME (Beguelin et al., CMU)
Distributed Object Migration Environment (DOME): a C++ library of data-parallel objects automatically distributed over networks of heterogeneous workstations
Application-level check-pointing and restart supported:
–user-level
–pre-processor based

Simple case
Most computation occurs in a loop in main. Solution:
–put one check-point at the bottom of the loop
–live variables at the bottom of the loop are globals
–write a script to save and restore the globals
–weave the script into main

Dome example

main(int argc, char *argv[]) {
  dome_init(argc, argv);
  // Lines marked with * are introduced for failure recovery.
  // The prefix d on a variable type says "save me at checkpoint".
* dScalar integer_variable;
* dScalar float_variable;
* dVector int_vector;
* if (!is_dome_restarting())
*   execute_user_initialization_code(…);
  while (!loop_done(…)) {   // loop_done uses only saved variables
    do_computation(…);
*   dome_check_point();
  }
}

Analysis
Let us understand how this code restores processor state:
–PC: we drop into the loop after restoring the globals
–registers: by making the recovery script part of main, we ensure that the register contents at the top of the loop are the same for normal execution and for restart
–stack: we re-execute main, so its frame is restored
–heap: restored from the saved check-point, but may be relocated
Think: this works even if we restart on a different machine!

Remarks
The loop body is allowed to make function calls
–the real restriction is that there is one check-point and it must be in main
A command-line parameter is used to determine whether an execution is normal or a restart.
The user must write some code to restore variables from the check-point
–perhaps library code can help

More complex example

f() {
  dScalar i;
  do_f_stuff;
  g(i);
  next_statement;
  …;
}

g(dScalar &i) {
  do_g_stuff_1;
  dome_checkpoint();
  do_g_stuff_2;
}

General scenario
The check-point could happen deep inside a chain of procedure calls. On restart, we need to restore the stack so procedure returns etc. can happen normally.
Solution: save information about which procedure invocations are live at the check-point.

Example with Dome constructs

f() {
  dScalar i;
  if (is_dome_restarting()) {
    next_call = dome_get_next_call();
    …..                  // dispatch to the label named by next_call (e.g., g1)
  }
  do_f_stuff;
  dome_push("g1");
g1:
  g(i);
  dome_pop();
  next_statement;
  …;
}

g(dScalar &i) {
  if (is_dome_restarting())
    goto restart_done;
  do_g_stuff_1;
  dome_checkpoint();
restart_done:
  do_g_stuff_2;
}

Challenge
Do this for MPI code. Can a compiler determine:
–where to check-point?
–what data to check-point?
We need not save all data live at a check-point:
–if some variables can easily be recomputed from saved data and program constants, we can re-compute those values in the recovery script
–we can modify the program to make this easier
Measure of success: beat hand-written recovery code