By, Gumpalli NagaLaxmi Prasanna. Outline: Abstract Introduction Capabilities in L4RE Capability Fault Handling Related Work Conclusion References.



Abstract: Current research in operating systems focuses either on security or on reliability. In this paper, we present L4ReAnimator, a framework that allows restarting crashed applications and reestablishing lost communication channels on top of the Fiasco.OC microkernel. It therefore effectively combines the already existing capability-based security architecture of Fiasco.OC with reliability features at a reasonable cost.

Introduction: Research in embedded systems and hardware indicates that future systems will be much more susceptible to errors. Reasons: smaller hardware structure sizes leading to a higher impact of radiation on transistor state, temperature-induced problems due to overheating of some areas of the chip, greater variation in transistor aging, and production-induced component faults. In this paper we present L4ReAnimator, an extension to the L4 Runtime Environment (L4Re) running on top of Fiasco.OC. L4ReAnimator provides a framework to semi-transparently reintegrate crashed applications into a running system.

Capabilities in L4Re: L4Re Overview: The operating system platform comprises the Fiasco.OC microkernel and the L4Re user-level runtime environment. The system is organized as a set of interacting objects. The kernel provides spatial isolation between objects in the form of tasks. The basic unit of execution is a thread. Objects interact by calling functions of other objects, similar to object-oriented programming. This invocation is the only system call present in Fiasco.OC. In order to maintain absolute control over object relationships, there are no globally accessible objects in L4Re. Instead, the microkernel manages a per-task table of capabilities referencing objects.

Each task can denote the objects it has access to by their capability slot number in this table. Keeping the capability space local to the task prevents tasks from obtaining knowledge about the rest of the system. Session capabilities are an advanced feature of L4Re name spaces. Each one represents a dynamically created client-server communication channel. Sessions are not created directly by the client, but by its name space manager.
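The per-task capability table can be pictured with a small model. This is only an illustrative sketch, not L4Re or Fiasco.OC code: the class names `KernelObject` and `Task`, the method names, and the slot number 3 are all assumptions made for the example.

```python
class KernelObject:
    """Hypothetical stand-in for an object managed by the kernel."""

    def __init__(self, name):
        self.name = name

    def invoke(self, message):
        return f"{self.name} handled {message}"


class Task:
    """A task with a private capability table (slot number -> object)."""

    def __init__(self, name):
        self.name = name
        self._caps = {}  # local capability space, not visible to other tasks

    def map_capability(self, slot, obj):
        self._caps[slot] = obj

    def invoke(self, slot, message):
        # Invocation is the single "system call": name a slot, pass a message.
        if slot not in self._caps:
            raise LookupError(f"task {self.name}: no capability in slot {slot}")
        return self._caps[slot].invoke(message)


server = KernelObject("logger")
client = Task("client")
other = Task("other")

client.map_capability(3, server)      # only the client gets access
result = client.invoke(3, "hello")
```

Because `other` never had the object mapped, it has no way to name it: slot 3 in its own table is simply empty.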

Example: Figure 1: Session start-up. The server creates a service management capability (S) and registers it in its name space.

Figure 2: Session initialization. The loader initiates a session using the S capability (1). The server creates and returns a new session capability C (2).

Figure 3: Session use. The client queries its name space for a service capability (1) and gets C mapped into its capability table. Thereafter, client and server use C for communication (2).

Figure 4: Crash. After a crash, the session and service capabilities get destroyed, and client and loader possess dangling references to these capabilities.
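The four figures can be replayed as a toy simulation. This is a sketch under stated assumptions: `Kernel`, `Server`, the integer capability references, and the dictionary-based name spaces are hypothetical and stand in for the real kernel objects and L4Re name space manager.

```python
class Kernel:
    """Toy object registry: destroying an object leaves references dangling."""

    def __init__(self):
        self._live = {}
        self._next_id = 0

    def create(self, label):
        self._next_id += 1
        self._live[self._next_id] = label
        return self._next_id          # the capability is an opaque reference

    def destroy(self, cap):
        self._live.pop(cap, None)

    def is_valid(self, cap):
        return cap in self._live


class Server:
    def __init__(self, kernel):
        self.kernel = kernel
        self.service_cap = kernel.create("service S")   # Figure 1
        self.sessions = []

    def open_session(self):
        cap = self.kernel.create("session C")           # Figure 2, step 2
        self.sessions.append(cap)
        return cap

    def crash(self):
        # Figure 4: all capabilities owned by the server disappear.
        for cap in [self.service_cap, *self.sessions]:
            self.kernel.destroy(cap)


kernel = Kernel()
server = Server(kernel)
loader_namespace = {"service": server.service_cap}      # Figure 1: S registered

session_cap = server.open_session()                     # Figure 2: loader uses S
client_namespace = {"service": session_cap}             # loader hands C to client

client_cap = client_namespace["service"]                # Figure 3: client lookup
valid_before = kernel.is_valid(client_cap)

server.crash()                                          # Figure 4
valid_after = kernel.is_valid(client_cap)
loader_valid_after = kernel.is_valid(loader_namespace["service"])
```

After `server.crash()`, both the client's reference to C and the loader's reference to S still exist as entries in their name spaces, but they no longer resolve to a live object.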

Capability Fault Handling: Restartability Requirements:
1. Fault containment aims at limiting propagation of errors throughout the system.
2. Once a crashed component is restarted, it needs to be reintegrated into the running system.
3. Server applications usually keep a certain amount of client-related state. When restarting the server, this state needs to be rescued in order to transparently continue serving the client. This requirement is called persistence.
4. Another commonly mentioned requirement for a restartability mechanism is transparency.

Capability Fault Handling in L4Re: Figure 5: L4ReAnimator Architecture

Detecting Capability Faults: When a capability disappears, an application will be in one of two situations:
1. The application is currently not in the process of invoking the capability. In this case re-establishment of the capability mapping is postponed until the application invokes the capability again. This invocation will result in an error notifying the application that a non-existing capability has been invoked.
2. The application is currently blocked on a capability invocation. In this case the kernel will report an error indicating that the invocation was cancelled.
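The two detection cases can be modelled as two distinct error results. The sketch below is an assumption-laden illustration: the `IpcError` values, `CapabilitySlot`, and `is_capability_fault` are hypothetical names and do not correspond to actual Fiasco.OC error codes.

```python
from enum import Enum


class IpcError(Enum):
    NO_SUCH_CAPABILITY = 1   # case 1: invoked after the capability vanished
    CANCELLED = 2            # case 2: blocked in an invocation when it vanished


class CapabilitySlot:
    """Toy model of one capability and the callers blocked on it."""

    def __init__(self):
        self.present = True
        self.blocked_callers = []

    def begin_invoke(self, caller):
        if not self.present:
            return IpcError.NO_SUCH_CAPABILITY
        self.blocked_callers.append(caller)
        return None          # caller is now blocked, waiting for a reply

    def revoke(self):
        # The capability disappears; every blocked caller is woken with an error.
        self.present = False
        cancelled = {c: IpcError.CANCELLED for c in self.blocked_callers}
        self.blocked_callers.clear()
        return cancelled


def is_capability_fault(err):
    """Both error kinds mean the mapping has to be re-established."""
    return err in (IpcError.NO_SUCH_CAPABILITY, IpcError.CANCELLED)


slot = CapabilitySlot()
pending = slot.begin_invoke("app_b")   # app_b blocks on the invocation
results = slot.revoke()                # the server crashes
late = slot.begin_invoke("app_a")      # app_a invokes only afterwards
```

Here `app_b` is in situation 2 and learns about the fault immediately, while `app_a` is in situation 1 and learns about it only on its next invocation.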

Handling Capability Faults: Once a capability fault is raised using the previously described mechanism, the capability registry is used to look up a capability fault handler for the capability that caused the fault. The fault handler is a function that is executed to re-establish a lost capability mapping. In order to do so, the fault handler needs to know about the type of the underlying capability and about the protocol that is used for re-establishment.
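The registry lookup and retry can be sketched as follows. This is a minimal model, not the L4ReAnimator API: `CapabilityRegistry`, `invoke_with_recovery`, `reopen_session`, and the slot number 7 are hypothetical, and the fault handler simply installs a new callable where a real handler would run the service-specific re-establishment protocol.

```python
class CapabilityFault(Exception):
    """Raised when an invocation hits a missing capability."""


class CapabilityRegistry:
    """Maps a capability slot to the handler that can re-establish it."""

    def __init__(self):
        self._handlers = {}

    def register(self, slot, handler):
        self._handlers[slot] = handler

    def handle_fault(self, slot):
        handler = self._handlers.get(slot)
        if handler is None:
            raise CapabilityFault(f"no fault handler for slot {slot}")
        handler(slot)


cap_table = {}        # slot -> callable standing in for the server object
registry = CapabilityRegistry()
restarts = []


def reopen_session(slot):
    # Service-provided handler: knows the capability type and the protocol.
    restarts.append(slot)
    cap_table[slot] = lambda msg: f"restarted server got {msg}"


def invoke(slot, msg):
    target = cap_table.get(slot)
    if target is None:
        raise CapabilityFault(slot)
    return target(msg)


def invoke_with_recovery(slot, msg):
    try:
        return invoke(slot, msg)
    except CapabilityFault:
        registry.handle_fault(slot)   # re-establish the lost mapping
        return invoke(slot, msg)      # retry the original invocation


registry.register(7, reopen_session)
cap_table[7] = lambda msg: f"server got {msg}"

first = invoke_with_recovery(7, "ping")
del cap_table[7]                      # simulate the server crash
second = invoke_with_recovery(7, "ping")
```

The client code calls `invoke_with_recovery` in both cases and is unchanged by the crash, which mirrors the semi-transparent reintegration the slides describe.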

Reintegrating Shared Resources: In addition to communicating via capabilities, L4Re allows applications to share resources. This allows implementation of shared-memory communication channels.

Figure 6: L4Re Memory management

Related work:
1. The BirliX operating system architecture is a distributed system comprising objects. Objects interact through RPC via communication channels identified by globally unique IDs. This enables re-connecting objects after a crash. Our work combines object-level restartability with an existing capability-based access control mechanism in order to achieve security and fault tolerance.
2. Minix is a microkernel-based operating system explicitly designed for supporting restartability of its components. A reincarnation server keeps track of the system state and detects crashed components at termination or using a heartbeat mechanism. A data storage server enables components to store their state across instantiations. Recovery of a crashed application is performed by the reincarnation server, which also notifies interested clients of this situation.
3. EROS is similar to the operating system used in this work in that it uses capabilities to enforce access control at the object level. EROS also takes fault tolerance into account by incorporating a mechanism to create checkpoints at runtime. These checkpoints always include the whole running system. This eases reinstantiation, because one does not need to care about re-establishing capability mappings for single components. Our approach provides a more fine-grained level of restartability by allowing single objects to be restarted and reintegrated.

Conclusion: In this paper we presented L4ReAnimator, a generic framework for providing restartable applications within the L4Re runtime environment. For clients, L4ReAnimator provides a generic framework that allows them to use service-provided fault handlers without further modifications to the client. Using L4ReAnimator we enhanced the capability-based L4Re operating system with the ability to reintegrate restarted components into a running system at a reasonable cost.


Thank You!