CS444/544 Operating Systems II System Calls and Page Fault


CS444/544 Operating Systems II System Calls and Page Fault Yeongjin Jang 04/30/19

About Lab 1 & Tokens
There will be no token deduction for Lab 1.
You all have 5 tokens to use for Lab 2 and onward.
Missing lab1-final tag: -5 pts (max 50), i.e., missing the tag will get 9% if you got 50/50.

About Lab 1
Missing student.info: wuyo, Yuchia, wiltons, gaomin, jmanosu, xiongz, para, jaocalex, Mythcell, kulkarnr, REK, silverrhealm.
COMMIT your info for lab2, and send me an e-mail about the information ASAP.
You GET 0 pts if the info is not present.

Recap: System Call (User)
The user program calls a function: cprintf() calls sys_cputs().
sys_cputs() in user code calls syscall(); this syscall() is in lib/syscall.c.
syscall() sets the arguments in registers and then executes int $0x30.
Now kernel execution starts…

Recap: System Call (Kernel)
The CPU gets the software interrupt.
Control flows through TRAPHANDLER_NOEC(T_SYSCALL…), _alltraps(), trap(), and trap_dispatch().
trap_dispatch() gets the registers that store the arguments from struct Trapframe *tf and calls syscall() using those registers.
This syscall() is in kern/syscall.c.
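The kernel-side dispatch can be sketched as a small user-space simulation. This is not the JOS source: the structs are trimmed, kern_syscall() is a hypothetical stand-in for the syscall() in kern/syscall.c, chars_printed stands in for console output, and the constant values are assumptions for illustration. The register assignment (number in eax, arguments in edx, ecx, ebx, edi, esi) follows the convention used by JOS's lib/syscall.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the JOS definitions (values are assumptions). */
enum { SYS_cputs = 0, SYS_getenvid = 2 };
enum { E_INVAL = 3, T_SYSCALL = 0x30 };

struct PushRegs {
    uint32_t reg_edi, reg_esi, reg_ebx, reg_edx, reg_ecx, reg_eax;
};
struct Trapframe {
    struct PushRegs tf_regs;
    uint32_t tf_trapno;
};

static size_t chars_printed;   /* stands in for the console */

/* Hypothetical kernel-side syscall(): dispatch on the call number. */
int32_t kern_syscall(uint32_t num, uint32_t a1, uint32_t a2,
                     uint32_t a3, uint32_t a4, uint32_t a5)
{
    (void)a1; (void)a3; (void)a4; (void)a5;
    switch (num) {
    case SYS_cputs:            /* a1 = string address, a2 = length */
        chars_printed += a2;
        return 0;
    case SYS_getenvid:
        return 0x1000;         /* made-up environment id */
    default:
        return -E_INVAL;       /* unknown system call number */
    }
}

/* Read the arguments from the saved registers; return the result in eax. */
void trap_dispatch(struct Trapframe *tf)
{
    assert(tf != NULL);
    if (tf->tf_trapno == T_SYSCALL) {
        struct PushRegs *r = &tf->tf_regs;
        r->reg_eax = (uint32_t)kern_syscall(r->reg_eax, r->reg_edx,
                                            r->reg_ecx, r->reg_ebx,
                                            r->reg_edi, r->reg_esi);
    }
}
```

Because the return value is written back into the saved eax, the user side sees it as the result of int $0x30 once the trap frame is restored.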

Recap: System Call (return to User)
Handling of the syscall finishes when syscall() returns.
trap() calls env_run() to get back to the user environment!
env_pop_tf() restores the trap frame from the stack and runs iret.
Back to Ring 3!

Recap: A High-level Overview of User/Kernel Execution
User Level (Ring 3): printf(), a library call in Ring 3.
Libraries: sys_write(), a system call, from Ring 3 to Ring 0.
OS Kernel (Ring 0): do_sys_write(), a kernel function.
Return path: iret (Ring 0 to Ring 3) out of the kernel, then ret (Ring 3) out of the library.

Today’s Topic
More about system calls.
Page faults.
How does an OS handle a fault and resume execution?

Library Calls vs. System Calls: Library Calls
Library calls are APIs available in Ring 3.
They DO NOT include operations in Ring 0.
A library call can be a wrapper for some computation, or a wrapper for system calls.
E.g., printf() internally uses write(), which is a system call.
Some system calls are available as library calls, as wrappers in Ring 3.
Layers: Ring 3 App, Library Calls, OS Syscalls, Hardware.
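A minimal sketch of a library call that wraps a system call, assuming a POSIX system. my_fputs() is a hypothetical name, not a real libc function: the strlen() part is pure Ring 3 computation, and only write() enters the kernel.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical library call: everything here runs in Ring 3 except
 * the write() system call that it wraps. */
ssize_t my_fputs(int fd, const char *s)
{
    assert(s != NULL);            /* library-level sanity check */
    size_t len = strlen(s);       /* plain user-space computation */
    return write(fd, s, len);     /* the only step that enters Ring 0 */
}
```

Calling my_fputs(STDOUT_FILENO, "hello\n") prints the string. Under strace only the write() shows up, because the wrapper itself never leaves Ring 3.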

Library Calls vs. System Calls: System Calls
System calls are APIs implemented in Ring 0: the OS’s abstraction of the hardware interface for userspace.
They are called when a Ring 3 application needs to perform Ring 0 operations.
Ring 3 cannot perform arbitrary Ring 0 operations.
The only entries are system calls (user code cannot call arbitrary kernel functions).
The kernel checks all the arguments at the entrance.
A secure method to control access to Ring 0!

How Does an OS Protect System Calls?
Call gate: a system call can be invoked only through the trap handler.
int $0x30 in JOS; int $0x80 in Linux (32-bit); int $0x2e in Windows (32-bit).
sysenter/sysexit (32-bit); syscall/sysret (64-bit).
The OS checks whether userspace is doing the right thing before performing important Ring 0 operations, e.g., accessing hardware.
Path: Ring 3 App, Library Calls, int $0x30, CHECK!!, OS Syscalls, Hardware.

An Example of Check at Syscall Entry
How can we protect the ‘read()’ system call?
read(int fd, void *buf, size_t count)
Reads count bytes from the file referred to by fd and stores them in buf.
Usage

An Example of Check at Syscall Entry
Problem: what will happen if we call…
Without a check, this would overwrite kernel code with your keystrokes.
Changing kernel code from Ring 3 would be possible!

Call Gate
We can hook all syscalls from Ring 3 at our syscall trap handler.
Algorithm:
Get the trap for the syscall.
Check the syscall number (3 for read in 32-bit Linux).
Check its arguments (fd, buf, size): buf must point to a Ring 3 writable address, not a kernel address.
If the checks pass, perform the ‘read’ operation.
Otherwise, return -EFAULT.
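The argument check in this algorithm can be sketched as follows. This is a simplified model rather than any kernel's actual code: USER_LIMIT is an assumed boundary between user and kernel addresses, sys_read_checked() is a hypothetical name, and addresses are modelled as 32-bit integers. A real kernel would also walk the page table to confirm the range is mapped user-writable.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed boundary: user addresses lie below it, kernel addresses above. */
#define USER_LIMIT 0xef800000u

/* Returns 1 if [buf, buf + count) lies entirely in user space. */
int user_range_ok(uint32_t buf, uint32_t count)
{
    uint32_t end = buf + count;
    if (end < buf)                /* wrapped around the 32-bit space */
        return 0;
    return end <= USER_LIMIT;
}

/* Hypothetical entry check for a read-like call: validate first. */
long sys_read_checked(int fd, uint32_t buf, uint32_t count)
{
    if (fd < 0)
        return -EBADF;
    if (!user_range_ok(buf, count))
        return -EFAULT;           /* buf points outside user memory */
    assert(buf + count <= USER_LIMIT);   /* holds after the check */
    /* A real kernel would now copy the data into buf. */
    return (long)count;
}
```

The wrap-around test matters: without it, a huge count could make buf + count overflow to a small value and slip past the limit comparison.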

Test

Check How System Calls are Invoked Use strace in Linux

Check How Library Calls are Invoked Use ltrace in Linux

Page Fault: A Case of Handling Faults
A page fault occurs when paging fails:
!(pde&PTE_P) or !(pte&PTE_P): invalid translation.
Write access but !(pte&PTE_W): access violation.
Access from user but !(pte&PTE_U): protection violation.
Faults: the faulting instruction has not executed (e.g., page fault), so execution resumes at that instruction after the fault is handled.

Page Fault: an Example Accessing a Kernel address from User

Page Fault: What Does CPU Do?
The CPU would like to let the OS know why and where such a page fault happened.
CR2: stores the address of the fault.
Error code: stores the reason for the fault.
Example values from the figure: 1, 0xf0100064.

CPU/OS Execution Example
User program accesses 0xf0100064.
CPU generates a page fault ((pte & PTE_U) == 0) and calls the page fault handler registered in the IDT.
OS: page_fault_handler reads CR2 (address of the fault, 0xf0100064) and reads the error code (contains the reason for the fault).
It resolves the error (if it cannot, it generates SIGSEGV, a segmentation fault!) and continues user execution.
User: resume on that instruction (or stop by SIGSEGV).
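A small sketch of how a handler might decode the error code. The three low bits follow the x86 page-fault error code layout (bit 0: present, bit 1: write, bit 2: user mode); the FEC_* names mirror those in JOS's inc/mmu.h, while struct fault_info and the two functions are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FEC_PR 0x1u   /* 1: protection violation, 0: page not present */
#define FEC_WR 0x2u   /* 1: write access, 0: read access */
#define FEC_U  0x4u   /* 1: fault happened in user mode */

struct fault_info { int present; int write; int user; };

/* Split the error code into its three low flag bits. */
struct fault_info decode_fault(uint32_t err)
{
    struct fault_info f;
    f.present = (err & FEC_PR) != 0;
    f.write   = (err & FEC_WR) != 0;
    f.user    = (err & FEC_U) != 0;
    return f;
}

/* Format a one-line description from CR2 and the error code. */
void describe_fault(uint32_t cr2, uint32_t err, char *out, size_t n)
{
    assert(out != NULL && n > 0);
    struct fault_info f = decode_fault(err);
    snprintf(out, n, "%s-mode %s at 0x%08x: %s",
             f.user ? "user" : "kernel",
             f.write ? "write" : "read",
             (unsigned)cr2,
             f.present ? "protection violation" : "page not present");
}
```

For the access above (a user-mode read of a present kernel page at 0xf0100064), the hardware would report error code 0x5, which decodes to present = 1, write = 0, user = 1.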

Fault Resume Example: Stack Overflow
inc/memlayout.h: we allocate one (1) page for the user stack.
If you use a large local variable on the stack, the stack overflows (the stack grows down…) into memory that is NOT MAPPED!

Some Idea: Allocating New Stack Automatically
Can we detect such an access and allocate a new page for the stack automatically?
Yes. We will utilize the ‘Page Fault’.
Observations:
A stack overflow would be sequential (it accesses pages adjacent to the stack).
We should catch both read and write accesses (both should fault).

Example: New Stack Allocation by Fault (User)
The mapped stack starts at 0xeebfe000 (its lowest mapped address).
Suppose the current value of esp (stack pointer) is 0xeebfe010.
The user program creates a new variable: char buf[32], so buf = 0xeebfdff0.
Buffer range: 0xeebfdff0 ~ 0xeebfe010.
On accessing buf[0] = ‘1’; the CPU executes movb $0x31, (%eax) with eax = 0xeebfdff0.
There is no translation for the page at 0xeebfd000.

Example: New Stack Allocation by Fault (CPU)
Look up the page table: no translation (NOT MAPPED!).
Store 0xeebfdff0 in CR2.
Set the error code: “The fault was caused by a non-present page!”
Call the page fault handler (interrupt #14).

Example: New Stack Allocation by Fault (OS)
The CPU invoked page_fault_handler().
Read CR2: 0xeebfdff0. It seems to be in the page right next to the end of the current stack page; the current end is 0xeebfe000.
Read the error code: “The fault was caused by a non-present page!”
Let’s allocate a new page for the stack!

Example: New Stack Allocation by Fault (OS)
Allocate a new page for the stack.
struct PageInfo *pp = page_alloc(ALLOC_ZERO);
Get a new page, and wipe it so that its contents are all zero.
page_insert(env_pgdir, pp, 0xeebfd000, PTE_U|PTE_W);
Map the new page at that address!
iret!

Example: New Stack Allocation by Fault (User-Return)
On accessing buf[0] = ‘1’; movb $0x31, (%eax) with eax = 0xeebfdff0 faulted: no translation for 0xeebfd000.
Execute the faulting instruction again: buf[0] = ‘1’; with eax = 0xeebfdff0.
Now the translation is valid!
Continue to execute the loop..
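The decision the handler makes in this example can be sketched in user space. This only models the address arithmetic: struct ustack and try_grow_stack() are hypothetical, and the place where a real kernel would call page_alloc() and page_insert() is marked with a comment.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
#define FEC_PR 0x1u   /* error-code bit: set if the page was present */

/* Round an address down to the start of its page. */
uint32_t page_base(uint32_t va)
{
    return va & ~(PGSIZE - 1);
}

/* Hypothetical bookkeeping: lowest mapped address of the user stack. */
struct ustack { uint32_t lowest_mapped; };

/* Grow the stack by one page if the fault hit a non-present page
 * directly below it. Returns 1 if grown, 0 if the fault is not ours. */
int try_grow_stack(struct ustack *s, uint32_t cr2, uint32_t err)
{
    assert(s->lowest_mapped % PGSIZE == 0);
    if (err & FEC_PR)
        return 0;                 /* page was present: a different fault */
    if (page_base(cr2) != s->lowest_mapped - PGSIZE)
        return 0;                 /* not adjacent to the stack */
    /* A real kernel would page_alloc() and page_insert() here. */
    s->lowest_mapped -= PGSIZE;
    return 1;
}
```

With lowest_mapped = 0xeebfe000 and CR2 = 0xeebfdff0, page_base() yields 0xeebfd000, which is exactly one page below the stack, so the stack grows and the faulting movb can be retried.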

Other Useful Examples of Using Page Fault
Copy-on-Write (CoW): share pages read-only, and create a private copy when the first write access happens.
Memory Swapping: the amount of physical memory is limited, but we have bigger storage (SSD, hard disk, online storage, etc.).
Can we store the ‘currently unused but will be used later’ part on the disk?

Copy-on-Write
Think about the vm-jos server.
It will run many /bin/bash, /usr/bin/gdb, /usr/bin/tmux, etc.
Each of you will run those programs!!
How can we build an OS to efficiently load them and minimize memory usage?

A Program
.text (R-X): code area, read-only and executable.
.rodata (R--): data area, read-only and not executable.
.data (RW-): data area, read/writable (not executable), initialized with some values.
.bss (RW-): data area, read/writable, initialized to 0.

Page Cache
Do we need to copy the same data for each process creation?
Store one copy of the file in memory.
[Figure: the file in memory alongside Process 1 and Process 2, each with .text (R-X), .rodata (R--), .data (RW-), and .bss (RW-).]

Sharing by Read-only
Set the page tables to map the same physical address to share contents.
[Figure: the program's sections are .text (R-X), .rodata (R--), .data (RW-), .bss (RW-); Process 1 and Process 2 both map .data and .bss as (R--).]

OK for Read-only Sections
How can Process 1 write to .bss??
[Figure: Process 1 maps .data and .bss as (R--), so a write to .bss causes a page fault!]

Page Fault Handler
Read CR2: an address that is in the page cache. Hmm… a fault from one of the shared locations!
Read the error code: a write to read-only memory. Hmm… the process requires a private copy!
ToDo: map a new physical page (page_alloc, page_insert), copy the contents, mark it read/write, and resume…

Copy-on-Write
How can Process 1 write to .bss??
[Figure: on the write page fault, a new private .bss (RW-) page is mapped (MAP!) for Process 1; the shared .bss stays (R--).]
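The copy-on-write steps can be sketched with a toy model in which a "mapping" is just a pointer plus a writable flag. None of this is kernel code: struct mapping and cow_fault() are hypothetical, malloc() stands in for page_alloc(), and reassigning the pointer stands in for page_insert().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

/* Toy model of one page-table entry: target page plus a writable flag. */
struct mapping { unsigned char *page; int writable; };

/* Handle a write fault on a shared read-only page by giving the
 * faulting process a private writable copy.
 * Returns 0 on success, -1 if no memory is available. */
int cow_fault(struct mapping *m)
{
    assert(m != NULL && m->page != NULL);
    if (m->writable)
        return 0;                          /* already private */
    unsigned char *copy = malloc(PGSIZE);  /* stands in for page_alloc */
    if (copy == NULL)
        return -1;
    memcpy(copy, m->page, PGSIZE);         /* copy the contents */
    m->page = copy;                        /* stands in for page_insert */
    m->writable = 1;                       /* mark it read/write */
    return 0;
}
```

After cow_fault() returns, the faulting process writes to its own copy, while every other process still maps the original shared page unchanged.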

Benefits?
Can reduce the time for copying contents that are already in some physical memory (page cache).
Can reduce the actual use of physical memory by sharing code/read-only data among multiple processes: 1,000,000 processes require only 1 copy of .text/.rodata.
At the same time:
Can support sharing of writable pages (as long as they have not been written to at all).
Can create private pages seamlessly on write.

Memory Swapping
Memory hierarchy, from FAST and expensive to large SIZE: Reg (KB), Cache (MB), Main Memory (GB), DISK (TB? PB?).

Challenge
Suppose you have 8GB of main memory.
Can you run a program whose size is 16GB?
Yes, you can load it part by part, because we do not use all of the data at the same time.
Can your OS do this seamlessly for your application?

Swapping
[Figure: pgdir and page tables (PT) map virtual memory to physical memory.]

Swapping – Remove a page…
[Figure: pgdir and page tables (PT) map virtual memory to physical memory, with DISK as backing storage; an access to the removed page causes a page fault!]

Swapping - OS
Page fault handler:
Read CR2 (get the address).
Read the error code.
If the error code says that the fault was caused by a non-present page, and the faulting page of the current process is stored on the disk:
Load that page into physical memory.
Map it!
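A toy simulation of this handler, with arrays standing in for the disk and for physical memory. The names, the tiny sizes, and the round-robin eviction policy are all assumptions for illustration; the slides do not say how a victim page is chosen.

```c
#include <assert.h>
#include <string.h>

#define PGSIZE  4096
#define NFRAMES 2     /* physical memory holds only two pages */
#define NPAGES  4     /* the program uses four virtual pages */

static unsigned char disk[NPAGES][PGSIZE];        /* backing store */
static unsigned char frames[NFRAMES][PGSIZE];     /* physical memory */
static int frame_of[NPAGES] = { -1, -1, -1, -1 }; /* page -> frame */
static int page_in[NFRAMES] = { -1, -1 };         /* frame -> page */
static int next_victim;                           /* round-robin choice */

/* Page-fault path: load virtual page vpn, evicting a page if needed. */
static void swap_in(int vpn)
{
    int f = next_victim;
    next_victim = (next_victim + 1) % NFRAMES;
    if (page_in[f] >= 0) {                  /* write the occupant out */
        memcpy(disk[page_in[f]], frames[f], PGSIZE);
        frame_of[page_in[f]] = -1;          /* it is now non-present */
    }
    memcpy(frames[f], disk[vpn], PGSIZE);   /* READ from disk */
    page_in[f] = vpn;
    frame_of[vpn] = f;                      /* map it */
}

/* Access one byte; a missing mapping plays the role of the page fault. */
unsigned char *access_byte(int vpn, int off)
{
    assert(vpn >= 0 && vpn < NPAGES && off >= 0 && off < PGSIZE);
    if (frame_of[vpn] < 0)
        swap_in(vpn);
    return &frames[frame_of[vpn]][off];
}
```

The caller of access_byte() never sees whether the page was resident, which is the seamlessness the handler is meant to provide: four pages of data are usable with only two frames of memory.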

Swapping – Remove a page…
[Figure: on the page fault, the OS allocates a new page, READs the contents from DISK, and creates a new map in the PT.]

Page Fault
A fault is meant to be handled by the OS; after that, execution is meant to resume from the faulting point.
The page fault handler is useful for implementing:
Automatic stack allocation
Copy-on-write (will do in Lab4)
Swapping