Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel.


The Linux Kernel The kernel is the central component of most computer operating systems. Its responsibilities include managing the system's resources (the communication between hardware and software components). The main task of the kernel is to allow the execution of applications and support them with features such as hardware abstractions.

The Kernel (Cont.) The kernel is also responsible for the high-level scheduler, which decides which processes reside in memory and which are kept on disk. The process is the main kernel abstraction; it defines which portions of memory the application can access.

The Scheduler The low-level scheduler decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. In this project we deal with the Completely Fair Scheduler (CFS) implementation of the Linux scheduler.

RB-Tree as Scheduler data structure The run queue is kept sorted by the runnable threads' virtual runtimes by storing it in a red-black tree, which is a self-balancing variant of a binary search tree. When the scheduler decides to switch threads, it switches to the leftmost thread in the red-black tree, that is, the one with the smallest virtual runtime.
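The "leftmost node" rule can be illustrated with a small user-space model. This is only a sketch: the names `Node`, `insert`, `pick_next`, and `build_demo` are hypothetical, the task names and runtimes are made up, and the red-black rebalancing that the real kernel tree performs is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One runnable thread, keyed by its virtual runtime (hypothetical model)."""
    vruntime: int
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], node: Node) -> Node:
    """Plain binary-search-tree insert keyed by vruntime (no rebalancing)."""
    if root is None:
        return node
    if node.vruntime < root.vruntime:
        root.left = insert(root.left, node)
    else:
        root.right = insert(root.right, node)
    return root


def pick_next(root: Optional[Node]) -> Optional[Node]:
    """Return the leftmost node, i.e. the thread with the smallest vruntime."""
    if root is None:
        return None
    while root.left is not None:
        root = root.left
    return root


def build_demo() -> Optional[Node]:
    """Build a small run queue from made-up (vruntime, name) pairs."""
    root = None
    for vruntime, name in [(30, "editor"), (10, "shell"), (50, "compiler"), (20, "daemon")]:
        root = insert(root, Node(vruntime, name))
    return root
```

Calling `pick_next(build_demo())` walks left from the root until no left child remains, so the cost is proportional to the tree height; keeping the tree balanced is what bounds that height.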

The RB-Tree

The RB-Tree (Cont.) The RB-Tree data structure must maintain the following properties:
- Every node is either black or red.
- The root is black.
- All leaves are black.
- Both children of a red node are black.
- Every path from a given node to any of its descendant leaves contains the same number of black nodes.
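A legitimacy test for these properties might look roughly like the following. It is a user-space sketch, not the project's kernel code: `RBNode`, `black_height`, and `is_legitimate` are hypothetical names, `None` stands in for the black leaves, and the binary-search ordering of the keys is not checked here.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    """Hypothetical tree node; None children play the role of black leaves."""
    key: int
    color: str
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def black_height(node: Optional[RBNode]) -> int:
    """Return the black height of a subtree, or -1 if a property is violated."""
    if node is None:
        return 1  # a leaf counts as one black node
    if node.color not in (RED, BLACK):
        return -1  # every node must be red or black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1  # a red node must have black children
    left_height = black_height(node.left)
    right_height = black_height(node.right)
    if left_height == -1 or right_height == -1 or left_height != right_height:
        return -1  # black counts differ between paths
    return left_height + (1 if node.color == BLACK else 0)


def is_legitimate(root: Optional[RBNode]) -> bool:
    """Check the root color and then all remaining properties recursively."""
    if root is not None and root.color != BLACK:
        return False
    return black_height(root) != -1
```

A single recursive pass visits every node once, so a check like this is linear in the number of runnable threads.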

The RB-Tree (Cont.) A violation of any of these properties can cause a crash in the next tree operation, which means possible data corruption. To keep track of these properties, our mechanism monitors the data structure by running frequent tests on it. When corruption is detected, the program self-stabilizes the system, that is, it recovers automatically.

Self Stabilizing Self-stabilization is a concept of fault tolerance in distributed computing. A self-stabilizing system will end up in a correct state no matter what state it is initialized with, and no matter what execution steps it takes. The ability to recover without external intervention is very desirable in modern computers, since it enables them to repair errors and return to normal operation on their own. Computers can thus be made fault-tolerant.

The project
Goal: Detect and recover from corruption of the scheduler's RB-Tree data structure.
Method: Perform a series of tests on the scheduler in the following scenarios:
- Periodically (i.e., every few time units).
- On a page fault (which may indicate memory corruption).
The tests include legitimacy tests and memory tests. If a test fails, we stabilize the system by automatic recovery: the data structure is rebuilt from the currently running processes, which in turn lets the processes run normally at the next scheduling.
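The detect-and-recover step described above can be modeled in user space as follows. This is a sketch under stated assumptions, not the project's kernel implementation: all names (`RBNode`, `black_height`, `is_legitimate`, `rebuild`, `check_and_recover`) are hypothetical, a non-node object stands in for a pointer to garbage memory, and the rebuild strategy (a balanced tree with only the deepest, incomplete level colored red) is one simple way to get a valid coloring, not necessarily the one the authors used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    vruntime: int
    color: str = BLACK
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def black_height(node) -> int:
    """Black height of a subtree, or -1 if it is corrupted."""
    if node is None:
        return 1
    if not isinstance(node, RBNode) or node.color not in (RED, BLACK):
        return -1  # models a pointer to garbage or an invalid color
    if node.color == RED:
        for child in (node.left, node.right):
            if isinstance(child, RBNode) and child.color == RED:
                return -1
    left_height = black_height(node.left)
    right_height = black_height(node.right)
    if left_height == -1 or left_height != right_height:
        return -1
    return left_height + (1 if node.color == BLACK else 0)


def is_legitimate(root) -> bool:
    """Legitimacy test: black root plus the recursive coloring checks."""
    if root is None:
        return True
    return isinstance(root, RBNode) and root.color == BLACK and black_height(root) != -1


def rebuild(vruntimes: Iterable[int]) -> Optional[RBNode]:
    """Rebuild a valid tree from the runnable tasks' virtual runtimes."""
    keys = sorted(vruntimes)
    levels = len(keys).bit_length()             # node levels in a balanced tree
    perfect = ((len(keys) + 1) & len(keys)) == 0  # is the last level full?

    def build(lo: int, hi: int, depth: int) -> Optional[RBNode]:
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        # Only nodes on an incomplete last level are red; the rest are black.
        color = RED if (not perfect and depth == levels - 1) else BLACK
        return RBNode(keys[mid], color,
                      build(lo, mid - 1, depth + 1),
                      build(mid + 1, hi, depth + 1))

    return build(0, len(keys) - 1, 0)


def check_and_recover(root, runnable_vruntimes) -> Tuple[Optional[RBNode], bool]:
    """One stabilization step: keep a legitimate tree, otherwise rebuild it."""
    if is_legitimate(root):
        return root, False
    return rebuild(runnable_vruntimes), True
```

In this model, `check_and_recover` would be called from both triggers (the periodic timer and the page-fault path). Note that the check covers only the coloring properties and invalid references; it does not verify that the tree's contents match the task list.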

The project – Some examples In this example, one of the nodes in the tree points to garbage in memory, something that can certainly happen in an operating system.

The project – Some examples The next figure demonstrates a case in which the root changed its color from black (mandatory in an RB-Tree) to red. This will probably cause corruption on the next insertion of a process into the tree, which can end in a wrongly structured tree in the best case or a system crash in the worst case.

Running the recovery After detecting the inconsistency, we run the recovery procedure, which yields the following results for the above examples: [Figure: result for example 1] [Figure: result for example 2]

Project Process The project was divided into 3 stages:
1. Understanding in depth how the CFS kernel scheduler works and which data structures it uses.
2. Understanding the self-stabilizing mechanism.
3. "Hacking" the Linux kernel: changing and adding data structures, and interacting with existing kernel code.

Difficulties:
Coding inside the kernel
- Data structure encapsulation (modules vs. embedded code).
- Synchronization (spin locks, scheduler preemption).
- Enormous amount of interconnectivity.
Debugging the kernel (our advice: AVOID IT IF YOU CAN)
- Embedded debugger KDB (new versions support KGDB).
- UML (User Mode Linux) vs. VMware.
- Asynchronous system (interrupts and exceptions).

Demonstration – corruption without recovery

Demonstration – corruption with recovery

References and Utilities:
- S. Dolev, Self-Stabilization, MIT Press.
- Daniel P. Bovet & Marco Cesati, Understanding the Linux Kernel.
- Joseph Kong, Kernel Hacking.
- KDB: embedded kernel debugger.
- VMware: virtual machine for the simulations.
- UML: User Mode Linux (embedded VM).

Q & A