FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson University of California,

1 FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson University of California, Berkeley

2 Presentation Outline 1. Introduction – Objective/Motivation – Background 2. Methods – Implementation – Test setup 3. Evaluation – Test results – Conclusions

3 The Berkeley/Stanford ROC Project Purpose: investigating novel techniques for building highly dependable Internet services. Example techniques: – Advanced support for operator undo – Stability through targeted restarts – Integrated root cause analysis – Online verification of recovery mechanisms

4 FIG Project Objective/Motivation Objective: Develop a lightweight, extensible tool for injecting errors to test recovery code/mechanisms. Motivation: Testing and production environments are always different. Large systems will require recovery code, which should be tested as part of normal operation.

5 “Software’s Invisible Users” [Diagram labels: Application; Other libraries; Other apps; System libraries (libc); OS; User interface; User input] Concept: Jim Whittaker, Florida Institute of Technology

6 Related Testing Methods 1. Ballista (DeVale, Koopman, Siewiorek): “top-down” testing of POSIX-compliant OS and library interfaces. 2. Fuzz (Miller, Fredriksen, So): tested UNIX applications by feeding them random input streams. 3. Holodeck (Whittaker et al.): similar approach to ours, but only for Windows 2000/XP.

7 FIG Implementation Thin stub library between app & libraries. Traps API calls: – Logs them – Inserts faults. Can be inserted into any app without modification: – Uses LD_PRELOAD. [Diagram: Application → libfig.so → libc.so, other libs → OS; legend: normal call path, injected fault]
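The trap, log, and inject pattern on this slide can be illustrated with a small sketch. FIG itself is a shared library loaded through LD_PRELOAD; the code below is only a Python analogy of that idea, and the names `make_stub`, `call_log`, `fault`, and `fake_write` are hypothetical, not part of FIG.

```python
import errno
import functools
import os

call_log = []  # hypothetical in-memory log, standing in for FIG's logging


def make_stub(real_func, name, fault=None):
    """Wrap real_func so each call is logged and can optionally fail.

    fault is a hypothetical callable that takes the 1-based call number
    and returns an errno value to inject, or None to pass the call through.
    """
    count = 0

    @functools.wraps(real_func)
    def stub(*args, **kwargs):
        nonlocal count
        count += 1
        call_log.append((name, count))
        code = fault(count) if fault else None
        if code is not None:
            # Injected fault: the real function is never reached.
            raise OSError(code, os.strerror(code))
        return real_func(*args, **kwargs)

    return stub


def fake_write(data):
    """Stand-in for a library call; returns the number of bytes 'written'."""
    return len(data)


# Fail the third call with ENOSPC; every other call reaches fake_write.
stub_write = make_stub(
    fake_write, "write", lambda n: errno.ENOSPC if n == 3 else None
)
```

The caller keeps using `stub_write` exactly as it would use `fake_write`, which mirrors the slide's point that the application needs no modification.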

8 Extensibility API stubs are automatically generated. Very easy to add new APIs to log. Fault injection is under script control. Can simulate multiple fault models (e.g., memory pressure).
Sample control file:
MALLOC_INDEX
  interval 82 to infinity return 0 errno ENOMEM probability 0.03
OPEN_INDEX
  // device out of space.
  interval 100 to infinity return -1 errno ENOSPC probability 0.001
  // kernel out of memory.
  interval 100 to 120 return -1 errno ENOMEM probability 0.1
  // too many files open.
  callnumber 108 return -1 errno EMFILE probability 1.0
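One way to read the rules in the sample control file is as (call range, return value, errno, probability) records that are checked on every trapped call. The sketch below works under that reading. The `Rule` class and `pick_fault` function are hypothetical, and two points are assumptions rather than documented FIG behavior: `callnumber 108` is modeled as the one-call interval 108 to 108, and the first matching rule that fires wins.

```python
import errno
import math
import random
from dataclasses import dataclass


@dataclass
class Rule:
    first: int          # first call number the rule applies to
    last: float         # last call number (math.inf for "to infinity")
    retval: int         # value the stubbed call should return
    err: int            # errno value to set
    probability: float  # chance of injecting on each matching call


def pick_fault(rules, call_number, rng):
    """Return the first rule that fires for this call, or None."""
    for rule in rules:
        in_range = rule.first <= call_number <= rule.last
        if in_range and rng.random() < rule.probability:
            return rule
    return None


# The OPEN_INDEX rules from the sample control file, transcribed by hand.
open_rules = [
    Rule(100, math.inf, -1, errno.ENOSPC, 0.001),  # device out of space
    Rule(100, 120, -1, errno.ENOMEM, 0.1),         # kernel out of memory
    Rule(108, 108, -1, errno.EMFILE, 1.0),         # too many files open
]

# Example: decide what happens on the 108th open() call.
fired = pick_fault(open_rules, 108, random.Random())
```

Because the last rule has probability 1.0, some rule always fires on call 108; which one depends on the random draws for the two earlier rules.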

9 Test Setup: Applications GNU file utilities (ls, mv, etc.); Emacs 20.7.1 – with and without X; Apache 1.3.22; Berkeley DB 4.0.14; Netscape Navigator 4.76; MySQL server 3.23.36

10 Test Setup: Instrumented Calls & Their Errors malloc() – memory exhaustion; read() – I/O error, system call was interrupted; write() – I/O error, no space left on device, call interrupted; open() – memory exhaustion, no space on device, too many files open; select() – memory exhaustion

11 Test Results: Client Apps
Columns: read() EINTR | read() EIO | write() ENOSPC | write() EIO | select() ENOMEM | malloc() ENOMEM
Emacs – no X: o.k. | exit | warn | o.k. | crash
Emacs – w/X: o.k. | crash | o.k. | crash | crash/exit | crash
Netscape: warn | exit | n/a | exit

12 Test Results: Server Apps
Columns: read() EINTR | read() EIO | write() ENOSPC | write() EIO | select() ENOMEM | malloc() ENOMEM
Berkeley DB – Xact: retry | detect | Xact abort | n/a | Xact abort
Berkeley DB – no Xact: retry | detect | data loss | n/a | detect, or data loss
MySQL Server: Xact abort | retry, warn | Xact abort | retry | restart process
Apache: o.k. | req. drop | o.k. | n/a

13 Netscape Reacts

14 Test Results: Overhead
Configuration | Time (s) | Overhead
No FIG | 33.46 | N/A
FIG, no logging | 34.28 | 2.5%
Logging w/o timestamps | 47.83 | 42.9%
Logging w/ timestamps | 61.74 | 84.5%
strace (all syscalls) | 112.85 | 237.3%
Timing using Berkeley DB (non-transactional) to read, sort and write one million words. Note: FIG communicates with a separate logging daemon through shared memory to reduce logging overhead.
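The overhead column appears to be the slowdown relative to the "No FIG" run. A short check under that assumption, using only the times reported on the slide (the function name `overhead_percent` is just for illustration):

```python
baseline = 33.46  # seconds, the "No FIG" run

timings = {
    "FIG, no logging": 34.28,
    "Logging w/o timestamps": 47.83,
    "Logging w/ timestamps": 61.74,
    "strace (all syscalls)": 112.85,
}


def overhead_percent(seconds, base=baseline):
    """Relative slowdown versus the baseline, as a percentage."""
    return (seconds - base) / base * 100.0


for name, seconds in timings.items():
    print(f"{name}: {overhead_percent(seconds):.1f}%")
```

Rounded to one decimal place, the computed values agree with the slide: 2.5%, 42.9%, 84.5%, and 237.3%.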

15 Strategies for Reliable Services: Intelligent retry – ls: “bounded retry” of malloc(); Resource preallocation – Apache: allocates buffer pool at startup; Degraded service – Apache: deactivates logging if disk full; Process pools – Apache and MySQL
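The "bounded retry" strategy can be sketched as a loop that retries a failing operation a fixed number of times and then gives up. This is a generic illustration, not the code in ls; `bounded_retry`, `max_attempts`, and the choice of retryable errno values are assumptions.

```python
import errno


def bounded_retry(operation, max_attempts=3,
                  retry_on=(errno.ENOMEM, errno.EINTR)):
    """Call operation(); retry on selected errno values, up to max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Give up on non-retryable errors or once the budget is spent.
            if exc.errno not in retry_on or attempt == max_attempts:
                raise
```

The bound matters: an unbounded retry loop would spin indefinitely under a persistent fault such as sustained memory pressure, whereas this version surfaces the error to the caller after the last attempt.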

16 FIG as a Prototype for Online Error Injection Low run-time overhead; easy to enable/disable; easy to configure; extensible; can simulate multiple fault models

17 A Case for Online Error Injection Recovery code is not usually exercised during normal operation. Deployed environments tend to differ from testing environments. Can run error injection tests on a subset of deployed systems. FIG can simulate common environmental errors.

18 Conclusions FIG exposed a variety of deficiencies in how our test applications handled environmental errors. Server apps are generally more robust than client applications. FIG exhibits low overhead. FIG is suitable for online error injection.

20 Future Directions Limitations of FIG: – Only for UNIX-like OSes – Limited to app/library interface (proxy for app/OS interaction). Make FIG part of a larger test suite. Include clock-time and event-based error triggers. Greater flexibility in configuration file.

21 Other Related Work 1. Xept (Vo et al.): instruments object code to ensure that error handling code exists. 2. Processor & memory errors: DOCTOR, HYBRID, DEFINE. 3. Process memory corruption: FERRARI, DEFINE.

