Automatic Compaction of OS Kernel Code via On-Demand Code Loading Haifeng He, Saumya Debray, Gregory Andrews The University of Arizona
Background. General-purpose operating systems are increasingly used not only on desktops but also on embedded devices. Embedded devices have resource constraints, in particular a limited amount of memory. Goal: reduce the memory footprint of OS kernel code as much as possible.
General-Purpose OS with Embedded Applications. In a Linux kernel with a minimal configuration, profiled with the MiBench suite, about 68% of the kernel code is not executed (32% is executed). Of the unexecuted code, some is still needed (e.g., for exception handling), some (18%-24%) can be statically proved unnecessary by prior work, and the rest is not needed but is missed by existing analyses; much of the unexecuted code therefore still cannot be discarded.
Our Approach. Exploit the memory hierarchy: a limited amount of main memory, but a greater amount of secondary storage. Hot kernel code lives in memory; cold kernel code lives in secondary storage and is brought in via on-demand code loading.
A Big Picture. Core code (scheduler, memory management, interrupt handling) and hot code stay in main memory as memory-resident kernel code. The remaining (cold) kernel code is grouped into clusters by code clustering and kept in secondary storage. A code buffer in main memory accommodates one cluster at a time and is reused as clusters are loaded; size(cluster) ≤ size(code buffer).
Memory Requirement for Kernel Code. Core code must be in memory and its size is predetermined. The code buffer's size is specified by the user (we chose 2KB). How much hot code should stay in memory? Select the most frequently executed code, keeping the total size of memory-resident code ≤ size(core code) × (1 + α), where α is specified by the user (e.g., 0%, 10%). Together these give an upper bound on the memory usage for kernel code.
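As a rough illustration of this budget, the following minimal C sketch (hypothetical names and sizes, not the authors' implementation; the growth factor from the slide is written alpha here) selects the most frequently executed non-core functions as hot code until the size(core code) × (1 + alpha) bound is reached.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical function record: code size and profiled execution count. */
struct func {
    const char   *name;
    size_t        size;
    unsigned long exec_count;
    int           is_core;   /* scheduler, memory management, interrupt handling, ... */
};

/* Sort most frequently executed first. */
static int by_hotness(const void *a, const void *b)
{
    const struct func *fa = a, *fb = b;
    return (fb->exec_count > fa->exec_count) - (fb->exec_count < fa->exec_count);
}

/* Select hot code so that size(core) + size(hot) <= size(core) * (1 + alpha). */
static size_t select_hot(struct func *fs, size_t n, double alpha)
{
    size_t core = 0, resident;
    for (size_t i = 0; i < n; i++)
        if (fs[i].is_core) core += fs[i].size;

    size_t budget = (size_t)(core * (1.0 + alpha));  /* upper bound on resident code */
    resident = core;

    qsort(fs, n, sizeof fs[0], by_hotness);
    for (size_t i = 0; i < n; i++) {
        if (fs[i].is_core) continue;
        if (resident + fs[i].size <= budget)
            resident += fs[i].size;                  /* this function becomes hot code */
    }
    return resident;
}

int main(void)
{
    struct func kernel[] = {                         /* made-up sizes and counts */
        { "schedule",      4096,   0, 1 },
        { "do_page_fault", 2048,   0, 1 },
        { "sys_read",       512, 900, 0 },
        { "sys_open",      1536, 300, 0 },
        { "sys_mount",     2048,   1, 0 },
    };
    printf("memory-resident code = %zu bytes\n",
           select_hot(kernel, sizeof kernel / sizeof kernel[0], 0.10));
    return 0;
}
```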
Our Approach. Reminiscent of the old idea of overlays: a purely software-based approach that requires neither an MMU nor OS support for virtual memory. Main steps: (1) apply clustering to the whole-program control flow graph, grouping "related" code together to reduce the cost of code loading; (2) transform the kernel code to support overlays by modifying control flow edges.
Code Clustering. Objective: minimize the number of code loads. Given an edge-weighted whole-program control flow graph, a list of functions marked as core code, a growth bound α for memory-resident code, and the code buffer size BufSz, apply a greedy node-coalescing algorithm until no coalescing can be carried out without violating either constraint: size of memory-resident code ≤ size(core code) × (1 + α), and size of each cluster ≤ BufSz. (A sketch of such a greedy pass follows.)
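A minimal sketch of one way such a greedy coalescing pass could look, assuming a union-find representation of clusters and hypothetical function sizes and edge weights (this illustrates the idea only, not the PLTO implementation; the core-code / α constraint is elided for brevity):

```c
#include <stdio.h>

#define BUF_SZ  2048           /* code buffer size (2KB, as in the evaluation) */
#define N_FUNCS 5

/* cluster[i] = representative of function i's cluster (union-find, no ranks). */
static int    cluster[N_FUNCS];
static size_t csize[N_FUNCS];  /* total code size of each cluster              */

static int find(int i) { return cluster[i] == i ? i : (cluster[i] = find(cluster[i])); }

struct edge { int src, dst; unsigned long weight; };   /* weighted control-flow edge */

/* Greedily coalesce the endpoints of the heaviest remaining edge, as long as
 * the merged cluster still fits in the code buffer. */
static void coalesce(const struct edge *edges, int n_edges, const size_t *func_size)
{
    for (int i = 0; i < N_FUNCS; i++) { cluster[i] = i; csize[i] = func_size[i]; }

    for (int pass = 0; pass < n_edges; pass++) {
        int best = -1;
        for (int e = 0; e < n_edges; e++) {
            int a = find(edges[e].src), b = find(edges[e].dst);
            if (a == b || csize[a] + csize[b] > BUF_SZ) continue;
            if (best < 0 || edges[e].weight > edges[best].weight) best = e;
        }
        if (best < 0) break;                           /* no legal coalescing left */
        int a = find(edges[best].src), b = find(edges[best].dst);
        cluster[b] = a;                                /* merge b into a           */
        csize[a] += csize[b];
    }
}

int main(void)
{
    size_t fsize[N_FUNCS] = { 700, 600, 900, 1200, 400 };          /* made-up sizes  */
    struct edge cfg[] = { {0,1,500}, {1,2,200}, {2,3,50}, {3,4,900} }; /* edge weights */
    coalesce(cfg, 4, fsize);
    for (int i = 0; i < N_FUNCS; i++)
        printf("func %d -> cluster %d (size %zu)\n", i, find(i), csize[find(i)]);
    return 0;
}
```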
Code Transformation. Code transformation is applied to inter-cluster control flow edges, to control flow edges from memory-resident code into clusters (but not in the other direction), and to all indirect control flow edges (whose targets are known only at runtime).
Code Transformation. After clustering, a call in cluster A to function F (at address 0x220 in cluster B) such as "call F" is rewritten as "push &F; call dyn_loader". At runtime the dyn_loader runtime library routine (1) looks up the address &F, (2) loads cluster B into the code buffer, and (3) translates the target address &F into the corresponding relative address in the code buffer (e.g., 0x520 when the buffer starts at 0x500) and transfers control there.
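The C sketch below mimics the three loader steps with hypothetical structures (the real system operates on the binary's call sites and machine addresses; this only illustrates the lookup / load / translate sequence):

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BUF_SZ 2048

/* One entry per cluster kept in secondary storage (all names hypothetical). */
struct cluster {
    uintptr_t   load_addr;     /* address range the cluster's code was linked at */
    size_t      size;
    const void *storage;       /* where the cluster's image lives                */
};

static unsigned char code_buffer[BUF_SZ];   /* single buffer shared by all clusters */
static struct cluster *current;             /* cluster currently in the buffer      */

/* Step 1: find the cluster that contains the target address. */
static struct cluster *lookup(struct cluster *cs, int n, uintptr_t target)
{
    for (int i = 0; i < n; i++)
        if (target >= cs[i].load_addr && target < cs[i].load_addr + cs[i].size)
            return &cs[i];
    return NULL;
}

/* dyn_loader: reached from a rewritten call site ("push &F; call dyn_loader"). */
static void *dyn_loader(struct cluster *cs, int n, uintptr_t target)
{
    struct cluster *c = lookup(cs, n, target);          /* 1. address lookup         */
    if (!c) return NULL;
    if (c != current) {
        memcpy(code_buffer, c->storage, c->size);       /* 2. load cluster into buffer */
        current = c;
    }
    return code_buffer + (target - c->load_addr);       /* 3. translate to buffer addr */
}

int main(void)
{
    static const unsigned char img[0x100] = { 0 };      /* fake image of cluster B   */
    struct cluster clusters[] = { { 0x200, sizeof img, img } };
    void *f_in_buffer = dyn_loader(clusters, 1, 0x220); /* call target &F = 0x220    */
    printf("F loaded at buffer offset 0x%tx\n",
           (unsigned char *)f_in_buffer - code_buffer);
    return 0;
}
```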
Issue: Call/Return in the Code Buffer. Suppose cluster A contains "push &F; call dyn_loader" at 0x130 with return address 0x140, and cluster B contains F at 0x220 with its "ret" at 0x250. At runtime, with the code buffer starting at 0x500, cluster A executes in the buffer: the "call dyn_loader" at 0x530 pushes the return address 0x540. When dyn_loader loads B into the code buffer (F now at 0x520, "ret" at 0x550), A has been overwritten by B, so returning to 0x540 would land in B's code rather than A's.
Fix: Call/Return in the Code Buffer. Before B is loaded, the pushed return address 0x540 is replaced with the address of a restore stub, dyn_restore_A, and the actual return address 0x140 (in cluster A) is recorded. B is then loaded into the code buffer and F executes; when F's "ret" is reached, control goes to dyn_restore_A, which reloads (restores) A into the code buffer and resumes execution at the actual return address 0x140 (i.e., 0x540 in the buffer).
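One way to picture this fix in C, using a hypothetical shadow stack for the deferred returns (the actual system patches the on-stack return address at the machine level rather than calling such functions explicitly):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PENDING 32

/* A return that was suspended because its caller's cluster left the buffer. */
struct pending_return {
    int       cluster_id;      /* cluster to restore (e.g., A)               */
    uintptr_t ret_addr;        /* actual return address inside that cluster  */
};

static struct pending_return shadow[MAX_PENDING];
static int n_pending;

/* Called before the buffer is overwritten: remember where the suspended
 * caller must resume (e.g., 0x140 inside cluster A).  In the real system the
 * on-stack return address 0x540 is patched to point at the restore stub. */
static void defer_return(int cluster_id, uintptr_t ret_addr)
{
    shadow[n_pending].cluster_id = cluster_id;
    shadow[n_pending].ret_addr   = ret_addr;
    n_pending++;
}

/* dyn_restore: the callee's "ret" lands here instead of in the overwritten
 * buffer.  Reload the caller's cluster and resume at the saved address. */
static uintptr_t dyn_restore(void)
{
    struct pending_return p = shadow[--n_pending];
    printf("reloading cluster %d, resuming at 0x%lx\n",
           p.cluster_id, (unsigned long)p.ret_addr);
    /* load_cluster(p.cluster_id);  -- reload A into the code buffer,
       then translate ret_addr to its buffer offset and jump there.  */
    return p.ret_addr;
}

int main(void)
{
    defer_return(/* cluster A */ 0, 0x140);   /* call in A at 0x130 returns to 0x140 */
    dyn_restore();                            /* F's "ret" triggers the restore      */
    return 0;
}
```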
Context Switches and Interrupts. Context switches: suppose thread 1 is executing cluster A in the code buffer when a context switch occurs; the currently buffered cluster A is remembered in thread 1's task_struct. Thread 2 then executes and may change the contents of the code buffer. When control switches back to thread 1, A is reloaded into the code buffer and thread 1 continues executing. Interrupts: interrupt handlers are currently kept in main memory.
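A simplified sketch of this bookkeeping, with a hypothetical field standing in for the extra per-thread state (not the actual kernel patch):

```c
#include <stdio.h>

/* Hypothetical per-thread state recorded in (or alongside) task_struct. */
struct task_code_state {
    int buffered_cluster;              /* cluster this thread had in the buffer, -1 if none */
};

static int code_buffer_cluster = -1;   /* cluster currently occupying the code buffer */

static void load_cluster(int id)
{
    /* Copy cluster `id` from secondary storage into the code buffer. */
    code_buffer_cluster = id;
    printf("loaded cluster %d\n", id);
}

/* Called on a context switch from `prev` to `next`. */
static void switch_code_buffer(struct task_code_state *prev,
                               struct task_code_state *next)
{
    prev->buffered_cluster = code_buffer_cluster;       /* remember A for the old thread */
    if (next->buffered_cluster != -1 &&
        next->buffered_cluster != code_buffer_cluster)
        load_cluster(next->buffered_cluster);           /* reload on switch-in           */
}

int main(void)
{
    struct task_code_state t1 = { -1 }, t2 = { -1 };
    load_cluster(3);                   /* thread 1 executes cluster A (id 3)             */
    switch_code_buffer(&t1, &t2);      /* context switch to thread 2                     */
    load_cluster(7);                   /* thread 2 changes the code buffer               */
    switch_code_buffer(&t2, &t1);      /* switch back: A is reloaded for thread 1        */
    return 0;
}
```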
Experimental Setup. Start with a minimally configured Linux kernel, compiled with optimization for code size (gcc -Os); original code size: 590KB. The approach is implemented using the binary rewriting tool PLTO. Benchmarks: MiBench, MediaBench, httpd.
Memory Usage Reduction for Kernel Code (code buffer size = 2KB). The reduction decreases as the amount of memory-resident code increases.
Estimated Cost of Code Loading. All experiments were run in a desktop environment, so the cost of code loading was estimated as follows: taking Micron NAND flash memory as an example (2KB page), Est. Cost = (number of code loads) × (time to read one page).
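For concreteness, a tiny sketch of this cost model; the load count and the NAND page-read latency below are placeholder values, not the measurements or datasheet figure from the talk:

```c
#include <stdio.h>

/* Cost model from the slide: each code load reads one cluster from NAND flash,
 * and a cluster fits in one 2KB page, so
 *     estimated cost = (number of code loads) x (time to read one page). */
int main(void)
{
    unsigned long code_loads  = 100000;    /* assumed load count (from instrumentation) */
    double page_read_us       = 25.0;      /* assumed page read time in microseconds    */
    printf("estimated loading cost: %.2f s\n", code_loads * page_read_us / 1e6);
    return 0;
}
```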
Overhead of Code Loading. [Chart: execution-time overhead of code loading relative to the unmodified kernel, for configurations with 57%, 56%, and 55% memory reduction.]
Related Work. Code compaction of OS kernels: D. Chanet et al., LCTES '05; H. He et al., CGO '07. Reducing memory requirements in embedded systems: C. Park et al., EMSOFT '04; H. Park et al., DATE '06; B. Egger et al., CASES '06 and EMSOFT '06. Binary rewriting of OS kernels: Flower et al., FDDO-4.
Conclusions. Embedded devices typically have a limited amount of memory, and general-purpose OS kernels contain a lot of code that is never executed in an embedded context. We reduce the memory requirements of OS kernel code with an on-demand code overlay mechanism, achieving significant memory reductions with little degradation in performance.