FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson University of California,

FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms Naveen Sastry, Pete Broadwell, Jonathan Traupman, David Patterson University of California, Berkeley

Presentation Outline
1. Introduction
  – Objective/Motivation
  – Background
2. Methods
  – Implementation
  – Test setup
3. Evaluation
  – Test results
  – Conclusions

The Berkeley/Stanford ROC Project
Purpose: investigating novel techniques for building highly dependable Internet services
Example techniques:
  – Advanced support for operator undo
  – Stability through targeted restarts
  – Integrated root cause analysis
  – Online verification of recovery mechanisms

FIG Project Objective/Motivation
Objective: Develop a lightweight, extensible tool for injecting errors to test recovery code/mechanisms
Motivation:
  – Testing and production environments are always different
  – Large systems will require recovery code, which should be tested as part of normal operation

“Software’s Invisible Users”
[Figure: an application and the parties it interacts with: the user interface (user input), other libraries, other apps, system libraries (libc), and the OS]
Concept: Jim Whittaker, Florida Institute of Technology

Related Testing Methods
1. Ballista (DeVale, Koopman, Siewiorek): “Top-down” testing of POSIX-compliant OS and library interfaces
2. Fuzz (Miller, Fredriksen, So): Tested UNIX applications by feeding them random input streams
3. Holodeck (Whittaker et al.): Similar approach to ours, but only for Windows 2000/XP

FIG Implementation
Thin stub library between app & libraries
Traps API calls
  – Logs them
  – Inserts faults
Can be inserted into any app without modification
  – Uses LD_PRELOAD
[Figure: layers Application, libfig.so, libc.so and other libs, OS, marking the normal call path and an injected fault]
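The trap, log, and inject-or-forward pattern described on this slide can be sketched in C. This is a minimal illustration and not FIG's actual source: the names `stub_write`, `stub_calls`, `stub_fail_at`, the build macro `FIG_SKETCH_PRELOAD`, and the environment variable `FIG_SKETCH_FAIL_AT` are all hypothetical, "logging" is reduced to a call counter, and thread safety is ignored. The only real interfaces assumed are `dlsym` with `RTLD_NEXT` (a glibc feature) and the dynamic linker's `LD_PRELOAD`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: a pointer to "the real write()". */
typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Hypothetical injection state; a real tool would load this from a script. */
static unsigned long stub_calls = 0;    /* write() calls trapped so far   */
static unsigned long stub_fail_at = 0;  /* call number to fail; 0 = never */
static int stub_fail_errno = ENOSPC;    /* errno reported by the fault    */

/* Core of the stub: count the call, then either inject or forward. */
static ssize_t stub_write(write_fn real, int fd, const void *buf, size_t len)
{
    stub_calls++;                       /* stand-in for logging the call */
    if (stub_fail_at != 0 && stub_calls == stub_fail_at) {
        errno = stub_fail_errno;        /* injected fault */
        return -1;
    }
    return real(fd, buf, len);          /* normal call path */
}

#ifdef FIG_SKETCH_PRELOAD
/* When built as a shared object and preloaded, this definition is found
 * before libc's write(), so the application calls it unmodified. */
ssize_t write(int fd, const void *buf, size_t len)
{
    static write_fn real_write = NULL;
    if (real_write == NULL) {
        const char *at = getenv("FIG_SKETCH_FAIL_AT");
        if (at != NULL)
            stub_fail_at = strtoul(at, NULL, 10);
        /* Look up the next definition of write() in load order (libc's). */
        real_write = (write_fn)dlsym(RTLD_NEXT, "write");
        if (real_write == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }
    return stub_write(real_write, fd, buf, len);
}
#endif
```

Under these assumptions, the file could be built with `cc -shared -fPIC -DFIG_SKETCH_PRELOAD stub.c -o libstub.so -ldl` and enabled for one run with `FIG_SKETCH_FAIL_AT=3 LD_PRELOAD=./libstub.so <program>`. Passing the real function in as a parameter keeps the inject-or-forward decision testable without preloading anything.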

Extensibility
API stubs are automatically generated
Very easy to add new APIs to log
Fault injection is under script control
Can simulate multiple fault models (e.g., memory pressure)
Sample control file:
    MALLOC_INDEX
    interval 82 to infinity return 0 errno ENOMEM probability 0.03
    OPEN_INDEX
    // device out of space.
    interval 100 to infinity return -1 errno ENOSPC probability
    // kernel out of memory.
    interval 100 to 120 return -1 errno ENOMEM probability 0.1
    // too many files open.
    callnumber 108 return -1 errno EMFILE probability 1.0
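One way to read the sample control file is as a table of rules, each with a call-number range, a return value, an errno, and a probability. The C sketch below models that reading. The `struct fault_rule` layout, the function names, and the evaluation policy (rules tried in order, one independent random draw per rule, first match wins) are assumptions made for illustration; the slides do not say how FIG resolves overlapping rules. A `callnumber N` rule is treated as the interval N to N.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one control-file rule. */
struct fault_rule {
    unsigned long first;   /* first call number covered                 */
    unsigned long last;    /* last call number; ULONG_MAX = "infinity"  */
    long retval;           /* value the stub returns when the rule fires */
    int err;               /* errno the stub sets when the rule fires    */
    double probability;    /* chance of firing on a covered call         */
};

/* Does this rule fire for call number callno, given a draw in [0, 1)? */
static int rule_fires(const struct fault_rule *r, unsigned long callno,
                      double draw)
{
    return callno >= r->first && callno <= r->last && draw < r->probability;
}

/* Uniform pseudo-random draw in [0, 1). */
static double uniform_draw(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Try the rules in order with a fresh draw each; return the first rule
 * that fires, or NULL if the call should proceed normally. */
static const struct fault_rule *pick_fault(const struct fault_rule *rules,
                                           size_t n, unsigned long callno)
{
    for (size_t i = 0; i < n; i++)
        if (rule_fires(&rules[i], callno, uniform_draw()))
            return &rules[i];
    return NULL;
}

/* Two of the sample open() rules; the order here is an assumption. */
static const struct fault_rule open_rules[] = {
    { 108, 108, -1, EMFILE, 1.0 },   /* callnumber 108: too many files open */
    { 100, 120, -1, ENOMEM, 0.1 },   /* interval 100 to 120: out of memory  */
};
```

A stub would call `pick_fault(open_rules, 2, callno)` on each trapped `open()`; on a non-NULL result it would set `errno` to `err` and return `retval` instead of forwarding the call.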

Test Setup: Applications
GNU file utilities (ls, mv, etc.)
Emacs – with and without X
Apache
Berkeley DB
Netscape Navigator 4.76
MySQL server

Test Setup: Instrumented Calls & Their Errors
malloc() – memory exhaustion
read() – I/O error, system call was interrupted
write() – I/O error, no space left on device, call interrupted
open() – memory exhaustion, no space on device, too many files open
select() – memory exhaustion

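Test Results: Client Apps
Injected errors, by call: read() (EINTR, EIO); write() (ENOSPC, EIO); select() (ENOMEM); malloc()
Results per application, in column order:
Emacs – no X: o.k.; exit; warn; o.k.; crash
Emacs – w/X: o.k.; crash; o.k.; crash; crash/exit; crash
Netscape: warn; exit; n/a; exit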

Test Results: Server Apps
Injected errors, by call: read() (EINTR, EIO); write() (ENOSPC, EIO); select() (ENOMEM); malloc()
Results per application, in column order:
Berkeley DB – Xact: retry; detect; Xact abort; n/a; Xact abort
Berkeley DB – no Xact: retry; detect; data loss; n/a; detect, or data loss
MySQL Server: Xact abort; retry, warn; Xact abort; retry; restart process
Apache: o.k.; req. drop; o.k.; n/a

Netscape Reacts

Test Results: Overhead
Columns: Time (s), Overhead
No FIG: 33.46, N/A
FIG, no logging: %
Logging w/o timestamps: %
Logging w/timestamps: %
strace (all syscalls): %
Timing using Berkeley DB (non-transactional) to read, sort and write one million words.
Note: FIG communicates with a separate logging daemon through shared memory to reduce logging overhead.

Strategies for Reliable Services:
Intelligent retry
  – ls: “bounded retry” of malloc()
Resource preallocation
  – Apache: allocates buffer pool at startup
Degraded service
  – Apache: deactivates logging if disk full
Process pools
  – Apache and MySQL
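The “bounded retry” strategy can be illustrated with a short C sketch. It shows the general idea only and makes no claim about how ls implements it: `alloc_bounded_retry`, `flaky_malloc`, and `flaky_failures_left` are hypothetical names, and `flaky_malloc` is a stand-in for an allocator whose failures are injected by a tool such as FIG.

```c
#include <stddef.h>
#include <stdlib.h>

/* Any malloc-compatible allocator. */
typedef void *(*alloc_fn)(size_t);

/* Call alloc up to max_attempts times and return the first non-NULL result.
 * Returns NULL if every attempt fails, so the caller can still fall back to
 * its normal error path. If attempts_used is non-NULL, the number of
 * attempts made is stored there. */
static void *alloc_bounded_retry(alloc_fn alloc, size_t size,
                                 int max_attempts, int *attempts_used)
{
    void *p = NULL;
    int tries = 0;
    while (tries < max_attempts) {
        tries++;
        p = alloc(size);
        if (p != NULL)
            break;
        /* A real program might sleep or release cached memory here. */
    }
    if (attempts_used != NULL)
        *attempts_used = tries;
    return p;
}

/* Hypothetical stand-in for injected ENOMEM faults: fails the next
 * flaky_failures_left calls, then behaves like malloc(). */
static int flaky_failures_left = 0;

static void *flaky_malloc(size_t size)
{
    if (flaky_failures_left > 0) {
        flaky_failures_left--;
        return NULL;
    }
    return malloc(size);
}
```

The bound matters because an unbounded loop would hang under sustained memory pressure; with a limit, transient failures are absorbed and persistent ones still surface as an error.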

FIG as a Prototype for Online Error Injection
Low run-time overhead
Easy to enable/disable
Easy to configure
Extensible
Can simulate multiple fault models

A Case for Online Error Injection
Recovery code is not usually exercised during normal operation
Deployed environments tend to differ from testing environments
Can run error injection tests on a subset of deployed systems
FIG can simulate common environmental errors

Conclusions
FIG exposed a variety of deficiencies in how our test applications handled environmental errors
Server apps are generally more robust than client applications
FIG exhibits low overhead
FIG is suitable for online error injection

Future Directions
Limitations of FIG:
  – Only for UNIX-like OSes
  – Limited to app/library interface (proxy for app/OS interaction)
Make FIG part of a larger test suite
Include clock-time and event-based error triggers
Greater flexibility in configuration file

Other Related Work
1. Xept (Vo et al.): Instruments object code to ensure that error handling code exists
2. Processor & memory errors: DOCTOR, HYBRID, DEFINE
3. Process memory corruption: FERRARI, DEFINE