Swaminathan Sundararaman, Yupu Zhang, Sriram Subramanian, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.

Slides:



Advertisements
Similar presentations
1/1/ / faculty of Electrical Engineering eindhoven university of technology Memory Management and Protection Part 3:Virtual memory, mode switching,
Advertisements

The Linux Kernel: Memory Management
Memory management.
Snapshots in a Flash with ioSnap TM Sriram Subramanian, Swami Sundararaman, Nisha Talagala, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau Copyright © 2014.
Fast and Safe Performance Recovery on OS Reboot Kenichi Kourai Kyushu Institute of Technology.
File Systems.
Tolerating File-System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau.
Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift.
CSE506: Operating Systems Block Cache. CSE506: Operating Systems Address Space Abstraction Given a file, which physical pages store its data? Each file.
File System Implementation
OS Fall ’ 02 Introduction Operating Systems Fall 2002.
Exceptions in Java Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Nooks: an architecture for safe device drivers Mike Swift, The Wild and Crazy Guy, Hank Levy and Susan Eggers.
I/O Hardware n Incredible variety of I/O devices n Common concepts: – Port – connection point to the computer – Bus (daisy chain or shared direct access)
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
OS Spring’03 Introduction Operating Systems Spring 2003.
Memory Management 1 CS502 Spring 2006 Memory Management CS-502 Spring 2006.
CS-3013 & CS-502, Summer 2006 Memory Management1 CS-3013 & CS-502 Summer 2006.
CacheMind: Fast Performance Recovery Using a Virtual Machine Monitor Kenichi Kourai Kyushu Institute of Technology, Japan.
CSE 451 Section 4 Project 2 Design Considerations.
Fast and Correct Performance Recovery of Operating Systems Using a Virtual Machine Monitor Kenichi Kourai Kyushu Institute of Technology, Japan.
AFID: An Automated Fault Identification Tool Alex Edwards Sean Tucker Sébastien Worms Rahul Vaidya Brian Demsky.
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Storage System: RAID Questions answered in this lecture: What is RAID? How does one trade-off between: performance, capacity, and reliability? What is.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Operating System Program 5 I/O System DMA Device Driver.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Windows 2000 Memory Management Computing Department, Lancaster University, UK.
1 CS503: Operating Systems Part 1: OS Interface Dongyan Xu Department of Computer Science Purdue University.
Rensselaer Polytechnic Institute CSCI-4210 – Operating Systems David Goldschmidt, Ph.D.
Networked File System CS Introduction to Operating Systems.
Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
5.1 Advanced Operating Systems Operating Systems Bugs Linux's code has been significantly growing during last years. Is this code bugs free? Obviously.
Lecture Topics: 11/17 Page tables TLBs Virtual memory flat page tables
Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.
A Software Layer for Disk Fault Injection Jake Adriaens Dan Gibson CS 736 Spring 2005 Instructor: Remzi Arpaci-Dusseau.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Using Model Checking to Find Serious File System Errors StanFord Computer Systems Laboratory and Microsft Research. Published in 2004 Presented by Chervet.
G53SEC 1 Reference Monitors Enforcement of Access Control.
File System Implementation
Computer Systems Week 14: Memory Management Amanda Oddie.
1 SQCK: A Declarative File System Checker Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin.
Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift File System Kernel.
IRON for JFFS2 Presented by: Abhinav Kumar Raja Ram Yadhav Ramakrishnan.
Chapter 11 – File-System Implementation (Pgs )
E X PLODE: a Lightweight, General System for Finding Serious Storage System Errors Junfeng Yang, Can Sar, Dawson Engler Stanford University.
CS333 Intro to Operating Systems Jonathan Walpole.
Operating Systems: Wrap-Up Questions answered in this lecture: What is an Operating System? Why are operating systems so interesting? What techniques can.
Hardware process When the computer is powered up, it begins to execute fetch-execute cycle for the program that is stored in memory at the boot strap entry.
10. Epilogue ENGI 3655 Lab Sessions.  We took control of the computer as early as possible, right after the end of the BIOS  Our multi-stage bootloader.
Kernel Structure and Infrastructure David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO
MINIX Presented by: Clinton Morse, Joseph Paetz, Theresa Sullivan, and Angela Volk.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
WSRR 111 Coerced Cache Eviction and Discreet Mode Journaling: Dealing with Misbehaving Disks Abhishek Rajimwale, Vijay Chidambaram, Deepak Ramamurthi Andrea.
REAL-TIME OPERATING SYSTEMS
Module 12: I/O Systems I/O hardware Application I/O Interface
Jonathan Walpole Computer Science Portland State University
Presented by: Daniel Taylor
Journaling File Systems
Virtual Memory © 2004, D. J. Foreman.
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Page Replacement.
Light-weight Contexts: An OS Abstraction for Safety and Performance
CS703 - Advanced Operating Systems
CSE451 Virtual Memory Paging Autumn 2002
Operating System Introduction.
System calls….. C-program->POSIX call
Outline Chapter 2 (cont) Chapter 3: Processes Virtual machines
Virtual Memory © 2004, D. J. Foreman.
Presentation transcript:

Swaminathan Sundararaman, Yupu Zhang, Sriram Subramanian, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

 Why do file systems not crash all the time?  Bad things rarely happen  Common case code: frequently run code  Well tested – run all the time by users  “Hardened” code – lower failure probability  Ideal: if everything was common case code  We can significantly reduce the occurrence of bugs 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation2

 Code to handle exceptions/errors/failures  Worst property: rarely run but when executed must run absolutely correctly  Prior work uncovered bugs in recovery code  Memory allocation [Engler OSDI ’00, Yang OSDI ‘04, Yang OSDI ‘06]  Error propagation [Gunawi FAST ‘08, Rubio-Gonzalez PLDI ’09]  Missing recovery code [Engler OSDI ’00, Swift SOSP ’03]  Focus on memory allocation failures 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation3

 Memory is a limited resource  Virtualization, cloud computing (data centers)  Buggy components slowly leak memory  Memory is allocated throughout the OS  Core kernel code, file systems, device drivers, etc.  Allocation requests may not succeed  Memory can be allocated deep inside the stack  Deep recovery is difficult [Gunawi FAST ‘08, Rubio-Gonzalez PLDI ’09] FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/20154

Fault injection during memory allocation calls  15 runs of μbenchmark .1,.5 failure prob.  Error - good  Abort, unusable, or inconsistent - bad 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation FS probabilty Process StateFile-system State ErrorAbortUnusableInconsistent ext ext Btrfs Btrfs jfs jfs xfs xfs

 Deadlocks  Requests need not make progress  Not always possible  Critical sections, interrupt handlers  What about GFP_NOFAIL flag?  “GFP_NOFAIL should only be used when we have no way of recovering from failure.... GFP_NOFAIL is there as a marker which says ’we really shouldn’t be doing this but we don’t know how to fix it’” - Andrew Morton FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/20156

 Attempt to make common case the ONLY case  Pre-allocate memory inside OS (context of file systems) 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation7 Memory Allocation Application Kernel Syscall Block Driver Pre-allocated Memory Pre-allocated Memory Application Kernel Syscall Block Driver Vanilla KernelAMA Kernel Cleanup Recovery code not scattered Shallow recovery Code naturally written Advantages Mantra: Most robust recovery code is recovery code that never runs at all Mantra: Most robust recovery code is recovery code that never runs at all

 We have evaluated AMA with ext2 file system  ext2-mfr (memory failure robust ext2)  Robustness  Recovers from all memory allocation failures  Performance  Low overheads for most user workloads  Memory overheads  Most cases: we do really well  Few cases: we perform badly 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8

 Introduction  Challenges  Anticipatory Memory Allocation (AMA)  Reducing memory overheads  Evaluation  Conclusions 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation9

8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation10  Different types of memory allocation calls  kmalloc(size, flag)  vmalloc(size, flag)  kmem_cache_alloc(cachep, flag)  alloc_pages(order, flag) Need: to handle all memory allocation calls

 Hard to determine the number of objects allocated inside each function  Simple calls  Parameterized & conditional calls  Loops  Function calls  Recursions FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/2015 struct dentry *d alloc(..., struct qstr *name) {... if (name → len > DNAME INLINE LEN-1) { dname = kmalloc(name → len + 1, …); if (!dname) return NULL;... }} 11

 Introduction  Challenges  Anticipatory Memory Allocation (AMA)  Reducing memory overheads  Evaluation  Conclusions 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation12

8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation Application System Call File System Disk Kernel VFS User Vanilla Kernel Mem Mgmt Input Arguments FS VFS MM Memory Allocation Calls Application System Call Disk Kernel User AMA Kernel Input Arguments Legend Pre-allocate Memory 13 How to use the pre-allocated objects? Static analysis Runtime support How much to allocate?

FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/2015 If (c==0) {. print (“Driver init”); }. Kernel code CIL [Necula CC ‘02] 0: Call graph Nodes: 2k Edges: 7k LOC: 180k Syscall Memory allocation functions 1: Pruning Seen Before? Is KMA? 2: Loops & recursions Generate Allocation Relevant Graph 1. Identify loops and recursions. 2. Detect exit condition Nodes: 400 LOC: 9k Loops Recursion G 14 FunctionStatements Dependency List d_alloc size = name->len+1 arg N: name 3: Slicing & backtracking 3: Slicing & backtracking If (name->len > DNAME_INLINE_LEN-1 ) struct dentry *d alloc(..., struct qstr *name) { if (name → len > DNAME INLINE LEN -1) { 101 dname = kmalloc(name → len + 1, …); 102 if (!dname) return NULL; …}} Output of slicing: Function: d_alloc() dname = kmalloc(name->len +1, …); kmalloc size = name->len+1; A B C F G cache_alloc(inode_cache) 3 FunctionAllocationsDependency Akmalloc(name->len+1)name Bkmalloc(name->len+1)name Ccache_alloc(Inode_cache) Fkmalloc(…) + cache_alloc(inode_cache)name Gkmalloc(…)+cache_alloc(inode_cache)name kmalloc(name->len+1) Allocation equation for each system call 1

8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation System state Input parameters Allocation descriptor Pre-allocate objects Application Syscall Function c obj = kmalloc(..) Function g pg = alloc_page(..) Application Syscall Pre-allocated Objects Phase 1: Pre-allocationPhase 2: Using pre-allocated memoryPhase 3: Cleanup loff t pos = file pos read(file); err = AMA CHECK AND ALLOCATE(file, AMA SYS READ, pos, count); If (err) return err; ret = vfs read(file, buf, count, &pos); file pos write(file, pos); AMA CLEANUP(); Attached to the process VFS read example (count+ReadAhead+Worst(Indirect))×(PageSize+BufferHead) 15

 What if pre-allocation fails?  Shallow recovery: beginning of a system call ▪ No actual work gets done inside the file system ▪ Less than 20 lines of code [~Mantra]  Flexible recovery policies  Fail-immediate  Retry-forever (w/ and w/o back-off) FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/201516

 Introduction  Challenges  Anticipatory Memory Allocation (AMA)  Reducing memory overheads  Evaluation  Conclusions 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation17

 Hard to accurately predict memory requirements  Depends on current fs state (e.g., bitmaps)  Conservative estimate  Results in over allocation  Infeasible under memory pressure Need: ways to transform worst case to near exact 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation18

 Static analysis ignores cached objects Read: file 1 pages 1 to 4 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation 234 Normal Mode Page Cache File System Application AMA Read: file1 pages 1 to Page Cache File System Application AMA Cache Peeking Pin 19

 Data need not always be cached in memory  Upper bound for searching entries are high 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation Dir A Entry could be in any of the N pages We always need to allocate max. pages 123 N 1 Allocate a page and recycle it inside loop Page Recycling Normal 2 3N Other examples: searching for a free block, truncating a file 20

 Introduction  Challenges  Anticipatory Memory Allocation (AMA)  Reducing memory overheads  Evaluation  Conclusions 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation21

 Case study: ext2  AMA version: ext2-mfr (memory failure robust)  Questions that we want to answer:  How robust is AMA to memory allocation failures?  Space and performance overheads during user workloads?  Setup:  2.2 GHz Opteron processor & 2 GB RAM  Linux  Two 80 GB western digital disk 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation22

FS probabilty Process StateFile-system State ErrorAbortUnusable Inconsisten t ext ext Ext2-mfr Ext2-mfr Ext2-mfr Ext2-mfr Ext2-mfr Ext2-mfr FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/2015 Retry Error 23

8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation24 Less than 7% overhead for all workloads Elapsed Time (s)

Workload ext2 (GB) ext2-mfrext2-mfr + peek (GB)Overhead(GB)Overhead Sequential Read x x Sequential Write x x Random Read x x Random Write x x PostMark x x Sort x x OpenSSH x x 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation25 Less than 4% overhead for most workloads

 Introduction  Challenges  Anticipatory Memory Allocation (AMA)  Reducing memory overheads  Evaluation  Conclusions 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation26

 AMA: pre-allocation to avoid recovery code  All recovery is done inside a function  Unified and flexible recovery policies  Reduce memory overheads ▪ Cache peeking & page recycling  Evaluation  Handles all memory allocation failures  < 10% (memory & performance) overheads 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation27

“Act as if it were impossible to fail” – Dorothea Brande Mantra: Most robust recovery code is recovery code that never runs at all 8/28/2015FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation28

Advanced Systems Lab (ADSL) University of Wisconsin-Madison FAST '11: Making the Common Case the Only Case with Anticipatory Memory Allocation8/28/201529