Transparent Fault-Tolerant Java Virtual Machine
Roy Friedman & Alon Kama
Computer Science, Technion

FT-JVM Goals
- Fault-tolerant environment for executing Java applications
- Applications should execute without interruption, overcoming failures of individual machines
- Applications should not have to be modified in order to run on the system
- High reliability: fault tolerance can be extended by utilizing more machines
- Low maintenance: recovery upon failure of individual machines should be swift
- Transparency: failures should be masked, and the transition to another machine should be transparent

Fault Tolerance by Replication
- Replication: coordinating a set of replicas of the computation on processors that fail independently
- Potential for a dramatic decrease in Mean Time To Repair (MTTR)
- Achieves t-fault-tolerance: with t+1 replicas, up to t crash failures can be tolerated
- Drawbacks:
  - Increased hardware cost, since the work is duplicated
  - Overhead and complexity of keeping the replicas consistent
- Replication + transparency (masking failures, maintaining the illusion of a single copy) = high availability
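To make the MTTR and t-fault-tolerance points concrete, here is a small sketch using the standard steady-state availability formula A = MTTF / (MTTF + MTTR) and the usual crash-failure bound of t+1 replicas for t failures. The function names are hypothetical, and the MTTF/MTTR figures are made-up illustrations, not measurements from FT-JVM.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up.

    A = MTTF / (MTTF + MTTR). Shrinking MTTR (e.g., by failing over to a
    replica instead of repairing the machine) pushes A toward 1.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def replicas_for_crash_tolerance(t: int) -> int:
    """Replicas needed to survive t crash (fail-stop) failures: t + 1."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return t + 1


# Hypothetical numbers: a machine fails every 1000 h on average.
manual_repair = availability(1000, 10)   # about 10 h to repair by hand
failover = availability(1000, 0.01)      # about 36 s to fail over to a backup
print(f"manual repair: {manual_repair:.5f}")
print(f"failover:      {failover:.5f}")
print("replicas to tolerate 2 crashes:", replicas_for_crash_tolerance(2))
```

MTTF is unchanged between the two cases; only the repair time differs, which is exactly the lever that replication with fast failover pulls.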

Replication for Java
- Replication at the Java Virtual Machine level
- Replication at this level is cost-effective, portable, and transparent to both the application developer and the user
- The approach extends Bressoud & Schneider (1995), who implemented active replication below the operating system

T. Bressoud and F. Schneider. Hypervisor-based Fault-Tolerance. SOSP-15.

Design of the FT-JVM
- Replication requires deterministic execution, which is difficult to achieve because of:
  - Preemptive context switches
  - Lock contention on SMP machines
  - Differences in I/O availability
  - Environment-specific attributes
- Changes made to the VM:
  - Deterministic thread scheduling
  - Deterministic thread switching
  - Non-deterministic operations relay their results to the replication module
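The idea behind deterministic thread switching can be sketched with a toy scheduler: instead of preempting on timer interrupts, it switches threads after a fixed number of executed steps, so every replica running the same program produces the same interleaving. This is a minimal Python model, not the FT-JVM code; the generator-based threads, the `quantum` parameter, and the function names are hypothetical, and it does not model SMP lock contention or I/O.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

Step = Tuple[str, int]  # (thread name, index of the step it executed)


def toy_thread(name: str, steps: int) -> Iterator[Step]:
    """Hypothetical application thread: each yield stands for one bytecode."""
    for i in range(steps):
        yield (name, i)


def run_deterministic(threads: List[Tuple[str, int]], quantum: int) -> List[Step]:
    """Round-robin scheduling that switches after exactly `quantum` steps.

    Switch points depend only on step counts, never on wall-clock time,
    so the resulting trace is identical on every replica.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    ready: Deque[Tuple[str, Iterator[Step]]] = deque(
        (name, toy_thread(name, n)) for name, n in threads
    )
    trace: List[Step] = []
    while ready:
        name, thread = ready.popleft()
        finished = False
        for _ in range(quantum):
            step = next(thread, None)
            if step is None:  # thread ran to completion
                finished = True
                break
            trace.append(step)
        if not finished:
            ready.append((name, thread))  # quantum used up: back of the queue
    return trace


primary_trace = run_deterministic([("A", 3), ("B", 2)], quantum=2)
backup_trace = run_deterministic([("A", 3), ("B", 2)], quantum=2)
print(primary_trace == backup_trace)
print(primary_trace)
```

With timer-based preemption the switch points would depend on timing and could differ between machines; counting steps removes that source of divergence.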

Design of the FT-JVM
- Replication module:
  - One replication engine per processor, on both the primary and the backups
  - Data packages are passed to the engine on the primary and retrieved from it on the backups
- Threads waiting for I/O now yield instead of blocking, and are rescheduled at specific intervals
- I/O is checked at the beginning of a frame; a frame ends after X context switches or when no application thread is schedulable

[Figure: message exchange between primary and backup, labeled "Frame n data", "End frame", "ACK", and "Frame n+1 data"]
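The frame mechanism can be illustrated with a toy primary/backup pair: the primary records the result of every non-deterministic operation in the current frame's data, ships the frame when the frame ends, and checks the backup's ACK; the backup re-executes the same code but replays the recorded results instead of performing the operations itself. This is a minimal sketch under stated assumptions: the class names, the in-process `receive` call standing in for the network, and the `frame_should_end` rule and its parameters are hypothetical, not FT-JVM's actual interfaces.

```python
import random
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Frame:
    number: int
    values: List[object] = field(default_factory=list)  # non-deterministic results, in order


def frame_should_end(switches_in_frame: int, x: int, runnable_threads: int) -> bool:
    """Hypothetical frame-boundary rule: X context switches, or nothing runnable."""
    return switches_in_frame >= x or runnable_threads == 0


class BackupEngine:
    """Stores frames received from the primary and replays their values."""

    def __init__(self) -> None:
        self.pending: Deque[object] = deque()
        self.acked: List[int] = []

    def receive(self, frame: Frame) -> int:
        self.pending.extend(frame.values)
        self.acked.append(frame.number)
        return frame.number  # the ACK carries the frame number

    def nondeterministic(self, op: Callable[[], object]) -> object:
        # op() is NOT called: reuse the primary's result so both replicas agree.
        return self.pending.popleft()


class PrimaryEngine:
    """Executes non-deterministic operations and logs their results per frame."""

    def __init__(self) -> None:
        self.current = Frame(0)

    def nondeterministic(self, op: Callable[[], object]) -> object:
        value = op()                       # really perform the operation
        self.current.values.append(value)  # relay the result to the engine
        return value

    def end_frame(self, backup: BackupEngine) -> None:
        ack = backup.receive(self.current)  # ship "Frame n data" and end the frame
        if ack != self.current.number:
            raise RuntimeError("backup acknowledged the wrong frame")
        self.current = Frame(self.current.number + 1)


def application(engine) -> List[object]:
    """Toy application whose output depends on non-deterministic inputs."""
    return [engine.nondeterministic(time.time_ns),
            engine.nondeterministic(random.random)]


primary, backup = PrimaryEngine(), BackupEngine()
primary_out = application(primary)
primary.end_frame(backup)
backup_out = application(backup)
print(primary_out == backup_out, backup.acked)
```

Replaying values in arrival order only works because deterministic scheduling makes both replicas issue the non-deterministic operations in the same order.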

Performance Results

[Figure: SMP Raytrace (chart not recoverable)]

Conclusion
- Ideal for long-running, low-I/O Java applications
- Only a small performance degradation, even with frequent synchronization between replicas (e.g., every second)
- Quick detection of and recovery from failures