Bristlecone: A Language for Robust Software Systems
Brian Demsky, Alokika Dash
University of California, Irvine

Current Software is All or Nothing
Most current software either executes perfectly or fails completely. Small errors cause catastrophic failures: they violate fundamental developer assumptions, and violated assumptions prevent continued execution. There is no clean way to recover from errors, because it is unclear which parts of the program are affected and a failure may leave key data structures partially updated.

Degraded Service Can Be Desirable
Consider a bug that affects an embedded application in a single web page. Current browsers often close all browser windows and exit, and users find this behavior frustrating. A better option is to isolate the failure and halt only the embedded application component.

Motivating Example: Web Server
Request r = receiveRequest();
logRequest(r);
processRequest(r);
CRASH: a failure in the log operation prevents serving this request. If the logging failure does not depend on the request, every subsequent request hits it as well, so the system may fail to serve any requests.
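The following Java sketch mirrors the straight-line structure on this slide, assuming a hypothetical `Request` record and a `logDiskFull` switch that injects a logging failure. It is not Bristlecone code; it only shows that once `logRequest` throws, `processRequest` is never reached, so no request is served even though serving a page does not depend on the log.

```java
import java.util.ArrayList;
import java.util.List;

class CoupledServer {
    // Hypothetical request type used only for illustration.
    record Request(String path) {}

    static final List<String> served = new ArrayList<>();
    static boolean logDiskFull = false; // simulates an injected, request-independent logging failure

    static void logRequest(Request r) {
        if (logDiskFull) {
            throw new IllegalStateException("log write failed");
        }
    }

    static void processRequest(Request r) {
        served.add(r.path());
    }

    // Mirrors the slide: log, then process, in one straight-line sequence.
    static void handle(Request r) {
        logRequest(r);
        processRequest(r); // never reached if logRequest throws
    }

    public static void main(String[] args) {
        logDiskFull = true;
        for (String p : List.of("/a", "/b", "/c")) {
            try {
                handle(new Request(p));
            } catch (IllegalStateException e) {
                System.out.println("request " + p + " dropped: " + e.getMessage());
            }
        }
        System.out.println("served: " + served); // served: []
    }
}
```

Because the logging fault does not depend on the request, every request is dropped, which is exactly the "fail to serve any requests" outcome described above.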

Real World Example: First Flight of the Ariane 5 Rocket
An uncaught integer overflow occurred in the computation of the horizontal bias. The overflow shut down the inertial reference system, which then sent debug information to the guidance system. The guidance system used these invalid values to set incorrect nozzle deflections, and the $120 million rocket crashed. The horizontal bias value was not even used! Lesson: critical system operations were coupled to non-critical operations.

Observations about Recovery
It is challenging to recover from failure with traditional program structures: it is unclear what the code was doing, whether data structures are consistent, what depends on the failed code, and what is still safe to do. Code structure introduces artificial dependences. In the absence of precise dependence information, we must assume the worst case, so failures can propagate through artificial dependences and small errors can cause catastrophic failures.

Where do we lose information?
Specifications describe functionality requirements. The architecture and implementation phases map requirements into a sequence of operations. This mapping loses information about the boundaries of operations (What is A and what is B?), temporal dependences (Does B require A?), and data dependences (Does B use data produced by A?). The lost information introduces artificial dependences.

Designing for Robustness
Underlying assumption: all code contains bugs. Goal: mitigate the consequences. Approach: decompose the application into many small tasks; specify the data and control dependences between these tasks; use transactions to prevent failures from exposing partially updated data structures; and use the dependence information to continue past failures.

Bristlecone Language
A program is specified as a set of tasks. Task specifications describe task dependences. Tasks have transactional semantics. The runtime system reasons about dependences to execute past failures (automated recovery).

Web Server Example
[Figure: task graph with Accept Connection, Read Request, Log Request, and Send Page tasks]
Decoupled operations: the Log Request and Send Page tasks are independent, so a failure of one does not affect the other.
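One way to approximate this decoupling in plain Java is to run each independent operation as its own isolated step, so that an exception in one does not cancel the other. The helper `runIndependently` and the `logDiskFull` switch are hypothetical. This is only an analogy for the task graph: Bristlecone derives the independence from task specifications rather than from hand-written try/catch blocks.

```java
import java.util.ArrayList;
import java.util.List;

class DecoupledServer {
    static final List<String> log = new ArrayList<>();
    static final List<String> sent = new ArrayList<>();
    static boolean logDiskFull = false; // injected failure in the logging path

    static void logRequest(String path) {
        if (logDiskFull) throw new IllegalStateException("log write failed");
        log.add(path);
    }

    static void sendPage(String path) {
        sent.add(path);
    }

    // Runs each independent operation separately; a failure is recorded
    // and confined to the operation that raised it.
    static List<String> runIndependently(String path) {
        List<String> failures = new ArrayList<>();
        Runnable[] ops = { () -> logRequest(path), () -> sendPage(path) };
        String[] names = { "logRequest", "sendPage" };
        for (int i = 0; i < ops.length; i++) {
            try {
                ops[i].run();
            } catch (RuntimeException e) {
                failures.add(names[i]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        logDiskFull = true;
        System.out.println(runIndependently("/index.html")); // [logRequest]
        System.out.println(sent); // [/index.html]
    }
}
```

With the logging fault injected, the page is still sent; only the log entry is lost.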

Specifying Object States
Different object states support different functionality (type state). The flag construct labels conceptual object states, and these flags determine when to perform operations. This makes it possible to differentiate between operations that have true data dependences and operations that merely operate on the same objects.
class WebRequest {
  flag initialized;
  flag send_page;
  flag write_log;
  …
}
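A rough Java analogue of flags as type state, assuming each flag is an enum constant held in an `EnumSet` and each task guard is a predicate over those flags. The names `Flag`, `WebRequest`, and `requires` are illustrative, not Bristlecone's API. Because a log-writing task would test `WRITE_LOG` and a page-sending task would test `SEND_PAGE`, both can be enabled on the same object at once without either depending on the other.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Predicate;

class FlagStates {
    // Hypothetical Java stand-ins for the flags declared on WebRequest.
    enum Flag { INITIALIZED, SEND_PAGE, WRITE_LOG }

    static final class WebRequest {
        final Set<Flag> flags = EnumSet.noneOf(Flag.class);

        WebRequest(Flag... initial) {
            for (Flag f : initial) flags.add(f);
        }
    }

    // A parameter guard modeled as a predicate over an object's current flags.
    static Predicate<WebRequest> requires(Flag f) {
        return w -> w.flags.contains(f);
    }

    public static void main(String[] args) {
        WebRequest w = new WebRequest(Flag.SEND_PAGE, Flag.WRITE_LOG);
        System.out.println(requires(Flag.SEND_PAGE).test(w));   // true
        System.out.println(requires(Flag.WRITE_LOG).test(w));   // true
        System.out.println(requires(Flag.INITIALIZED).test(w)); // false
    }
}
```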

Tagging Objects
Motivation: consider the web server example. Each connection has a Socket object that provides communication and a WebRequest object that stores application-specific state. We need to pair the correct Socket and WebRequest objects together. Solution: tag the group of objects.
[Figure: a Connection Tag grouping a Socket object and a WebRequest object]

Tagging Objects
Tags provide a mechanism to group object instances. Tags have types, and a program can create many instances of a tag type; each instance defines a group. Tag instances can be bound to objects, and a task can specify that its parameters must be in the same group.
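A minimal Java sketch of tag instances as group identities, assuming tags are plain objects compared by identity and that each object keeps the set of tags bound to it. `ConnectionTag`, `Tagged`, and `sameConnection` are hypothetical names. The guard accepts a Socket/WebRequest pair only when both are bound to the same tag instance, roughly the constraint that `with connection t` expresses in the readRequest task.

```java
import java.util.HashSet;
import java.util.Set;

class Tags {
    // Hypothetical tag type; each instance defines one group (identity equality).
    static final class ConnectionTag {}

    // Any object that can carry tag bindings.
    static class Tagged {
        final Set<Object> tags = new HashSet<>();

        void bind(Object tag) {
            tags.add(tag);
        }
    }

    static final class Socket extends Tagged {}
    static final class WebRequest extends Tagged {}

    // Guard: both parameters must be bound to the same ConnectionTag instance.
    static boolean sameConnection(Socket s, WebRequest w) {
        for (Object t : s.tags) {
            if (t instanceof ConnectionTag && w.tags.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ConnectionTag c1 = new ConnectionTag(), c2 = new ConnectionTag();
        Socket s1 = new Socket();
        WebRequest w1 = new WebRequest(), w2 = new WebRequest();
        s1.bind(c1);
        w1.bind(c1);
        w2.bind(c2);
        System.out.println(sameConnection(s1, w1)); // true
        System.out.println(sameConnection(s1, w2)); // false
    }
}
```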

Task Specifications
Task specifications describe the data dependences of tasks and the effect of tasks on objects.
/* This task reads a request from a client. */
task readRequest(WebRequest w in initialized with connection t,
                 Socket s in IO_Pending with connection t) {
  ...
  taskexit(w: initialized:=false, send_page:=true, write_log:=true);
}
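To make the pairing of an entry guard with `taskexit` concrete, here is a small Java model of the `readRequest` specification. It is a sketch under stated assumptions: only the `WebRequest` parameter is modeled (the `Socket` parameter and the `connection` tag are omitted), and the `Task` record with a `guard` predicate and an `onExit` map of flag assignments is a hypothetical encoding, not how the Bristlecone compiler represents tasks.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class TaskSpec {
    enum Flag { INITIALIZED, SEND_PAGE, WRITE_LOG }

    static final class WebRequest {
        final Set<Flag> flags = EnumSet.noneOf(Flag.class);
    }

    // Hypothetical encoding: a guard on entry and flag assignments on exit,
    // loosely mirroring "w in initialized" and "taskexit(w: ...)".
    record Task(String name, Predicate<WebRequest> guard, Map<Flag, Boolean> onExit) {
        boolean enabled(WebRequest w) {
            return guard.test(w);
        }

        void exit(WebRequest w) {
            onExit.forEach((flag, value) -> {
                if (value) w.flags.add(flag); else w.flags.remove(flag);
            });
        }
    }

    static final Task READ_REQUEST = new Task(
        "readRequest",
        w -> w.flags.contains(Flag.INITIALIZED),
        Map.of(Flag.INITIALIZED, false, Flag.SEND_PAGE, true, Flag.WRITE_LOG, true));

    public static void main(String[] args) {
        WebRequest w = new WebRequest();
        w.flags.add(Flag.INITIALIZED);
        if (READ_REQUEST.enabled(w)) READ_REQUEST.exit(w);
        System.out.println(w.flags); // [SEND_PAGE, WRITE_LOG]
    }
}
```

After the exit assignments run, the object no longer satisfies `readRequest`'s guard but now enables any task guarded on `SEND_PAGE` or `WRITE_LOG`.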

Bristlecone Task Semantics
The runtime invokes tasks. A task can be invoked when objects that satisfy the task's parameter guards are available in the heap. Tasks have transactional memory semantics: either all operations are executed or none are, and task execution appears to occur in a single instant. Failures cause transactions to abort, which restores consistency.
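The all-or-nothing part of these semantics can be illustrated in Java by running a task on a private copy of the state and publishing the copy only if the task finishes. This sketch assumes a single thread and a toy heap of named integers (`heap` and `runAtomically` are hypothetical); a real transactional memory must also isolate concurrently running tasks, which this does not attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class TxTask {
    // Hypothetical heap: named integer fields standing in for object state.
    static final Map<String, Integer> heap = new HashMap<>();

    // Runs the task against a private copy and publishes it only on success,
    // so callers observe either all of the task's writes or none of them.
    static boolean runAtomically(Consumer<Map<String, Integer>> task) {
        Map<String, Integer> working = new HashMap<>(heap);
        try {
            task.accept(working);
        } catch (RuntimeException e) {
            return false; // abort: discard the working copy
        }
        heap.clear();
        heap.putAll(working); // commit
        return true;
    }

    public static void main(String[] args) {
        heap.put("stock", 5);
        heap.put("sold", 0);
        boolean ok = runAtomically(h -> {
            h.put("stock", h.get("stock") - 1);
            int zero = h.get("sold");
            h.put("sold", 10 / zero); // ArithmeticException after the first write
        });
        System.out.println(ok + " stock=" + heap.get("stock")); // false stock=5
    }
}
```

The division by zero aborts the task after it has already decremented `stock` in its working copy, yet the shared state still shows `stock=5`.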

Failure-free Execution
[Figure: execution stepping through the Accept Connection, Read Request, Log Request, and Send Page tasks]

Error Detection
Errors are detected by catching operating system signals (arithmetic exceptions, null pointer exceptions), library signals (socket errors, …), and runtime language checks (array out-of-bounds exceptions, assertions, imperative consistency checks, and declarative data structure specifications).
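In a Java-like setting, several of these detectors surface as exceptions or errors. The sketch below, with hypothetical names `attempt` and `Outcome`, treats arithmetic, null pointer, and array bounds exceptions, `AssertionError`, and a failed imperative consistency check uniformly as reasons to abort the task. It does not model OS signal handling or declarative data structure specifications.

```java
import java.util.function.BooleanSupplier;

class ErrorDetection {
    enum Outcome { COMMITTED, ABORTED }

    // Any detected error aborts the task: runtime exceptions (arithmetic,
    // null pointer, array bounds), AssertionError, or a failed consistency check.
    static Outcome attempt(Runnable body, BooleanSupplier consistencyCheck) {
        try {
            body.run();
            if (!consistencyCheck.getAsBoolean()) {
                throw new IllegalStateException("consistency check failed");
            }
            return Outcome.COMMITTED;
        } catch (RuntimeException | AssertionError e) {
            return Outcome.ABORTED;
        }
    }

    public static void main(String[] args) {
        int[] buf = new int[2];
        String missing = null;
        Runnable[] faults = {
            () -> System.out.println(1 / buf[0]),       // ArithmeticException
            () -> System.out.println(missing.length()), // NullPointerException
            () -> buf[2] = 7,                           // ArrayIndexOutOfBoundsException
        };
        for (Runnable fault : faults) {
            System.out.println(attempt(fault, () -> true)); // ABORTED
        }
        // The body succeeds, but the imperative check rejects the resulting state.
        System.out.println(attempt(() -> buf[0] = -1, () -> buf[0] >= 0)); // ABORTED
    }
}
```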

Failure Recovery
Transactions restore data structures to their previous consistent state. Problem: re-executing the same task will likely result in the same failure. Solution: use task specifications to determine which other tasks can be safely executed.
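A simplified Java model of this recovery policy, using the web server's tasks. The assumptions are strong ones: a single `WebRequest` object, each task guarded by a single flag, task bodies that edit a copy of the flags and clear their own guard flag, and a `failed` set keyed by task name (all hypothetical). After an abort the flags are restored, so `logRequest`'s guard still holds; remembering the failure keeps the loop from retrying it and lets it move on to `sendPage`, which does not depend on logging.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class Recovery {
    enum Flag { INITIALIZED, SEND_PAGE, WRITE_LOG }

    // Hypothetical task: one guard flag and a body that edits a copy of the flags.
    record Task(String name, Flag guard, Consumer<Set<Flag>> body) {}

    static List<String> run(Set<Flag> flags, List<Task> tasks) {
        Set<String> failed = new HashSet<>(); // tasks that aborted on this object
        List<String> trace = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Task t : tasks) {
                // After a rollback the guard still holds, so a known failure
                // must be skipped explicitly or it would be retried forever.
                if (!flags.contains(t.guard()) || failed.contains(t.name())) continue;
                Set<Flag> working = EnumSet.noneOf(Flag.class);
                working.addAll(flags);
                try {
                    t.body().accept(working);
                    flags.clear();
                    flags.addAll(working); // commit
                    trace.add(t.name());
                    progress = true;
                } catch (RuntimeException e) {
                    failed.add(t.name()); // abort: flags left untouched
                    trace.add(t.name() + " aborted");
                }
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("readRequest", Flag.INITIALIZED, f -> {
                f.remove(Flag.INITIALIZED);
                f.add(Flag.SEND_PAGE);
                f.add(Flag.WRITE_LOG);
            }),
            new Task("logRequest", Flag.WRITE_LOG, f -> {
                throw new IllegalStateException("injected log failure");
            }),
            new Task("sendPage", Flag.SEND_PAGE, f -> f.remove(Flag.SEND_PAGE)));
        Set<Flag> flags = EnumSet.of(Flag.INITIALIZED);
        System.out.println(run(flags, tasks)); // [readRequest, logRequest aborted, sendPage]
    }
}
```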

Automatic Recovery
[Figure: Accept Connection, Read Request, Log Request, and Send Page tasks; Log Request crashes, and the runtime recovers and continues past the failure]

Language Benefits
Specifications make it possible to understand failures in a meaningful way. Task specifications let the runtime reason about how to recover from failures, and they eliminate artificial dependences.

Task Dispatch
Goal: determine which parameter objects satisfy task guards. Problem: brute-force search can be expensive. Our approach maintains a parameter set of the objects that satisfy each individual parameter's guard, and an active task queue of sets of parameter objects that collectively satisfy all of a task's guards.

Task Dispatch
Parameter sets are maintained precisely: if an object is in a parameter set, it satisfies the flag component of the guard and is bound to the correct types of tags, and every object that satisfies the parameter's guard is in the parameter set. The active task queue is conservative: if a set of objects could potentially satisfy all of a task's guards, it is in the task queue. The runtime must therefore check that the set of objects in a task queue invocation satisfies the guards before invoking the task.
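The following Java sketch illustrates the two data structures for a task with two parameters. It assumes string flags, guards given as predicates (`guard0`, `guard1`), and an `update` hook called whenever an object's flags change; all of these names are hypothetical. Parameter sets are kept exact, while the active queue is only appended to, so stale entries are possible and every candidate is re-checked in `nextInvocation`. Tag guards and de-duplication of queue entries are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class Dispatch {
    // Hypothetical heap object with a mutable set of flag names.
    static final class Obj {
        final String name;
        final Set<String> flags = new LinkedHashSet<>();

        Obj(String name, String... fs) {
            this.name = name;
            flags.addAll(List.of(fs));
        }
    }

    final Predicate<Obj> guard0, guard1;
    final Set<Obj> paramSet0 = new LinkedHashSet<>(), paramSet1 = new LinkedHashSet<>();
    final Deque<Obj[]> activeQueue = new ArrayDeque<>();

    Dispatch(Predicate<Obj> g0, Predicate<Obj> g1) {
        guard0 = g0;
        guard1 = g1;
    }

    // Keeps parameter sets exact and enqueues every new combination
    // that might satisfy all guards.
    void update(Obj o) {
        if (guard0.test(o)) {
            if (paramSet0.add(o)) {
                for (Obj p : paramSet1) activeQueue.add(new Obj[]{o, p});
            }
        } else {
            paramSet0.remove(o);
        }
        if (guard1.test(o)) {
            if (paramSet1.add(o)) {
                for (Obj p : paramSet0) activeQueue.add(new Obj[]{p, o});
            }
        } else {
            paramSet1.remove(o);
        }
    }

    // The queue is conservative, so every guard is re-checked before invoking.
    Obj[] nextInvocation() {
        while (!activeQueue.isEmpty()) {
            Obj[] args = activeQueue.poll();
            if (guard0.test(args[0]) && guard1.test(args[1])) return args;
        }
        return null;
    }

    public static void main(String[] args) {
        Dispatch d = new Dispatch(o -> o.flags.contains("send_page"),
                                  o -> o.flags.contains("IO_Pending"));
        Obj req = new Obj("req", "send_page");
        Obj sock = new Obj("sock", "IO_Pending");
        d.update(req);
        d.update(sock);   // enqueues the candidate (req, sock)
        sock.flags.clear();
        d.update(sock);   // sock leaves its parameter set; the queued entry is now stale
        System.out.println(d.nextInvocation() == null); // true: the re-check filters it out
    }
}
```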

Task Dispatch
When a new object is added to a parameter set, the runtime creates the corresponding task queue invocations, which requires searching for objects that satisfy the tag guards. Idea: use tags to prune the search. When we add an object with a tag guard to the set, we use its tags to prune the search for the other parameter objects that must be bound to the same tag.

Task Binding Iteration
The computation is structured as a list of iterators over tags and objects. There are multiple types of iterators: over the tags bound to an object, over the objects bound to a tag, and over the objects in a parameter set. We want to prune the search early, so ordering is important. Iterator orderings are generated statically for each parameter set of each task.
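The effect of tag pruning and iterator ordering can be seen in a small Java experiment, assuming a hypothetical index `objectsByTag` from tag instances to bound objects and a counter `examined` of visited candidates; objects are represented by name strings for brevity. Both methods find the Socket that shares the request's connection tag. Scanning the whole Socket parameter set first visits every socket, while following the request's tag first visits only the objects bound to that tag. This illustrates why ordering matters; it is not the iterator code the Bristlecone compiler generates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BindingOrder {
    // Hypothetical index from tag instances to the objects bound to them.
    final Map<Object, List<String>> objectsByTag = new HashMap<>();
    int examined; // how many candidate objects the search visited

    void bind(Object tag, String obj) {
        objectsByTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(obj);
    }

    // Ordering 1: iterate the whole Socket parameter set, then test the tag.
    List<String> scanParameterSet(Object tag, Set<String> sockets) {
        List<String> out = new ArrayList<>();
        List<String> sameTag = objectsByTag.getOrDefault(tag, List.of());
        for (String s : sockets) {
            examined++;
            if (sameTag.contains(s)) out.add(s);
        }
        return out;
    }

    // Ordering 2: iterate the objects bound to the request's tag, then test
    // membership in the Socket parameter set; unrelated sockets are never visited.
    List<String> followTag(Object tag, Set<String> sockets) {
        List<String> out = new ArrayList<>();
        for (String o : objectsByTag.getOrDefault(tag, List.of())) {
            examined++;
            if (sockets.contains(o)) out.add(o);
        }
        return out;
    }

    public static void main(String[] args) {
        BindingOrder b = new BindingOrder();
        List<Object> tags = new ArrayList<>();
        Set<String> sockets = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            Object t = new Object(); // one connection tag instance per connection
            tags.add(t);
            b.bind(t, "req" + i);
            b.bind(t, "sock" + i);
            sockets.add("sock" + i);
        }
        b.examined = 0;
        System.out.println(b.scanParameterSet(tags.get(7), sockets) + " examined " + b.examined); // [sock7] examined 1000
        b.examined = 0;
        System.out.println(b.followTag(tags.get(7), sockets) + " examined " + b.examined); // [sock7] examined 2
    }
}
```

With 1,000 connections, the first ordering visits 1,000 candidates and the second visits 2 (the request and the socket bound to that tag).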

Initial Experiences
We implemented the Bristlecone compiler and runtime and evaluated the system on several benchmarks, including a web server, a web spider, and a chat server. We developed Bristlecone and Java versions of each; the Java versions were designed to use threads to provide resilience to failures. We randomly injected failures into the executions.

Web Spider
The workload is a set of 100 web pages. The Java version is implemented using a thread pool architecture. We ran 100 trials on each version and randomly injected 3 halting failures into each execution. With injected failures, the Java version fetched an average of 6 pages, while the Bristlecone version fetched an average of 91 pages.

Web Server
A web server with support for e-commerce transactions. The Java version spawns a thread for each connection. We ran 200 trials on each version and randomly injected 50 halting failures into each execution. With injected failures, Java failed to serve inventory requests in 4.5% of trials and Bristlecone in 1.5%. Java produced correct inventory responses in 68.6% of cases, Bristlecone in 100%.

Chat Server
The chat server allows multiple users to chat. The Java version spawns a thread for each connection. We ran 100 trials on each version; the workload sent 800 messages, and we randomly injected 10 failures into each execution. With injected failures, the Java version failed to serve 39.9% of messages, while the Bristlecone version failed to serve 19.3%.

Related Work
Traditional fault tolerance: N-version programming, recovery blocks, exception handlers. Languages: Linda / tuple spaces, Orc, Actors, Argus, Oz, Erlang. Software and hardware transactional memory.

Conclusions
Bristlecone is an exciting approach to improving application reliability. Initial experiences are promising.