1 How to Shadow Every Byte of Memory Used by a Program Nicholas Nethercote — National ICT Australia Julian Seward — OpenWorks LLP.

Slides:



Advertisements
Similar presentations
4/14/2017 Discussed Earlier segmentation - the process address space is divided into logical pieces called segments. The following are the example of types.
Advertisements

OS Memory Addressing.
Mobile Handset Memory Management
Virtual Memory Adapted from lecture notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley.
Memory Management Norman White Stern School of Business.
OS Fall’02 Memory Management Operating Systems Fall 2002.
Virtual Memory Adapted from lecture notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley and Rabi Mahapatra & Hank Walker.
Memory Management 1 CS502 Spring 2006 Memory Management CS-502 Spring 2006.
CS-3013 & CS-502, Summer 2006 Memory Management1 CS-3013 & CS-502 Summer 2006.
Memory ManagementCS-502 Fall Memory Management CS-502 Operating Systems Fall 2006 (Slides include materials from Operating System Concepts, 7 th.
Memory Management Five Requirements for Memory Management to satisfy: –Relocation Users generally don’t know where they will be placed in main memory May.
Memory ManagementCS-3013 C-term Memory Management CS-3013 Operating Systems C-term 2008 (Slides include materials from Operating System Concepts,
Chapter 91 Translation Lookaside Buffer (described later with virtual memory) Frame.
Qin Zhao (MIT) Derek Bruening (VMware) Saman Amarasinghe (MIT) Efficient Memory Shadowing for 64-bit Architectures ISMM 2010, Toronto, Canada June 6, 2010.
Memory Allocation CS Introduction to Operating Systems.
Memory Management CSE451 Andrew Whitaker. Big Picture Up till now, we’ve focused on how multiple programs share the CPU Now: how do multiple programs.
8.4 paging Paging is a memory-management scheme that permits the physical address space of a process to be non-contiguous. The basic method for implementation.
1 Valgrind A Framework for Heavyweight Dynamic Binary Instrumentation Nicholas Nethercote — National ICT Australia Julian Seward — OpenWorks LLP.
Chapter 3.5 Memory and I/O Systems. 2 Memory Management Memory problems are one of the leading causes of bugs in programs (60-80%) MUCH worse in languages.
Lecture 11 Page 1 CS 111 Online Memory Management: Paging and Virtual Memory CS 111 On-Line MS Program Operating Systems Peter Reiher.
1 Address Translation Memory Allocation –Linked lists –Bit maps Options for managing memory –Base and Bound –Segmentation –Paging Paged page tables Inverted.
Chapter 4 Memory Management Virtual Memory.
Address Translation. Recall from Last Time… Virtual addresses Physical addresses Translation table Data reads or writes (untranslated) Translation tables.
CPS110: Address translation Landon Cox February 19, 2008.
Operating Systems Lecture 14 Segments Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of Software Engineering.
12/5/20151 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Virtual Memory Hardware.
Review °Apply Principle of Locality Recursively °Manage memory to disk? Treat as cache Included protection as bonus, now critical Use Page Table of mappings.
Operating Systems Unit 7: – Virtual Memory organization Operating Systems.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
CPS110: Intro to memory Landon Cox February 14, 2008.
NETW3005 Memory Management. Reading For this lecture, you should have read Chapter 8 (Sections 1-6). NETW3005 (Operating Systems) Lecture 07 – Memory.
Address Translation Andy Wang Operating Systems COP 4610 / CGS 5765.
Address Translation Mark Stanovich Operating Systems COP 4610.
CHAPTER 3-3: PAGE MAPPING MEMORY MANAGEMENT. VIRTUAL MEMORY Key Idea Disassociate addresses referenced in a running process from addresses available in.
Cho, Ho-Gi OS Lab. Cho, Ho-Gi OS Lab. How to Shadow Every Byte of Memory Used by a Program
COMP091 – Operating Systems 1 Memory Management. Memory Management Terms Physical address –Actual address as seen by memory unit Logical address –Address.
OS Memory Addressing. Architecture CPU – Processing units – Caches – Interrupt controllers – MMU Memory Interconnect North bridge South bridge PCI, etc.
CS203 – Advanced Computer Architecture Virtual Memory.
CS161 – Design and Architecture of Computer
CS 140 Lecture Notes: Virtual Memory
Lecture 11 Virtual Memory
Memory Management Virtual Memory.
ECE232: Hardware Organization and Design
CS161 – Design and Architecture of Computer
Lecture 12 Virtual Memory.
Virtual Memory User memory model so far:
Outline Paging Swapping and demand paging Virtual memory.
File System Structure How do I organize a disk into a file system?
Chapter 8: Main Memory.
CSE 153 Design of Operating Systems Winter 2018
CS 140 Lecture Notes: Virtual Memory
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
CS Introduction to Operating Systems
CS 140 Lecture Notes: Virtual Memory
Virtual Memory Hardware
CSE 451: Operating Systems Autumn 2005 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Memory management Explain how memory is managed in a typical modern computer system (virtual memory, paging and segmentation should be described.
CSE451 Virtual Memory Paging Autumn 2002
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
Lecture 7: Flexible Address Translation
Memory Management CSE451 Andrew Whitaker.
Lecture 8: Efficient Address Translation
CSE 153 Design of Operating Systems Winter 2019
Paging and Segmentation
CS703 - Advanced Operating Systems
CS 140 Lecture Notes: Virtual Memory
Presentation transcript:

1 How to Shadow Every Byte of Memory Used by a Program Nicholas Nethercote — National ICT Australia Julian Seward — OpenWorks LLP

2 Shadow memory tools Shadow every byte of memory with another value that describes it This talk: –Why shadow memory is useful –How to implement it well shadow value tools shadow memory tools

3 Examples Tool(s)Shadow memory helps find... Memcheck, PurifyMemory errors Eraser, DRD, Helgrind, etc. Data races HobbesRun-time type errors AnnelidArray bounds violations TaintCheck, LIFT, TaintTrace Uses of untrusted values “Secret tracker”Leaked secrets ReduxDynamic dataflow graphs DynCompBInvariants pinSELSystem call side-effects properties bugs security

4 Shadow memory is difficult Performance –Lots of extra state, many operations instrumented Robustness Trade-offs must be made shadow values address space original values squeez e!

5 An example tool: Memcheck

6 Memcheck Three kinds of information: –A (“addressability”) bits: 1 bit / memory byte –V (“validity”) bits: 1 bit / register bit, 1 bit / memory bit –Heap blocks: location, size, allocation function Memory information: VVVVVVV V A original memory byte shadow memory V bits only used if A bit is “addressable”

7 A simple implementation

8 Basics (I) AVVVVVVV V... AVVVVVVV V A AVVVVVVVV... AVVVVVVVV A...PM SM 1 NoAccess DSM SM 2 0 KB 4032 KB 64 KB 128 KB 3904 KB 3968 KB x000 1 FFF F

9 Basics (II) Multi-byte shadow accesses: –Combine multiple single-byte accesses –Complain if any unaddressable bytes accessed –Values loaded from unaddressable bytes marked as defined Range-setting (set_range) –Loop over many bytes, one at a time Range-checking –E.g.: write(fd, buf, n) -- check n bytes in buf Slow-down: 209.6x

10 Complications Corruption of shadow memory –Possible with a buggy program –Originally used x86 segmentation, but not portable –Keep original and shadow memory far apart, and pray 64-bit machines –Three- or four-level structure would be slow –Two level structure extended to handle 32GB –Slow auxiliary table for memory beyond 32GB –Better solution is an open research question

11 Four optimisations

12 #1: Faster loads and stores Multi-byte loads/stores are very common –N separate lookups accesses is silly (where N = 2, 4, or 8) If access is aligned, fully addressable –Extract/write V bits for N shadow bytes at once –Else fall back to slow case: 1 in a 1000 or less Slow-down: 56.2x –3.73x faster

13 #2: Faster range-setting Range-setting large areas is common –Vectorise set_range –8-byte stride works well Replacing whole SMs –If marking a 64KB chunk as NoAccess, replace the SM with the NoAccess DSM –Add Defined and Undefined DSMs –Large read-only code sections covered by Defined DSM Slow-down: 34.7x –1.62x faster, 1.97x smaller

14 #3: Faster SP updates Stack pointer (SP) updates are very common Inc/dec size often small, statically known –E.g. 4, 8, 12, 16, 32 bytes More specialised range-setting functions –Unrolled versions of set_range() Slow-down: 27.2x –1.28x faster

15 #4: Compressed V bits Partially-defined bytes (PDBs) are rare –Memory: 1 A bit + 8 V bits  2 VA bits –Four states: NoAccess, Undefined, Defined, PartDefined –Full V bits for PDBs in secondary V bits table –Registers unchanged -- still 8 V bits per byte Slow-down: 23.4x –4.29x smaller, 1.16x faster Obvious in hindsight, but took 3 years to identify

16 Discussion Optimising principles: –Start with a simple implementation –Make the common cases fast –Exploit redundancy to reduce data sizes Novelty? –First detailed description of Memcheck’s shadow memory –First detailed description of a two-level table version –First detailed evaluation of shadow memory –Compressed V bits

17 Evaluation

18 Robustness Two-level table is very flexible –Small shadow memory chunks, each can go anywhere Earlier versions required large contiguous regions –Some programs require access to upper address space –Some Linux kernels have trouble mmap’ing large regions –Big problems with Mac OS X, AIX, other OSes Memcheck is robust –Standard Linux C and C++ development tool –Official: Linux, AIX; experimental: Mac OS X, FreeBSD

19 SPEC 2000 Performance ToolSlow- down Relative improvement No instrumentation4.3x Simple Memcheck209.6x + faster loads/stores 56.2x 3.73x faster + faster range- setting 34.7x1.62x faster, 1.97x smaller + faster SP updates 27.2x1.28x faster + compressed V bits 23.4x1.16x faster, 4.29x smaller Overall improvement 8.9x faster, 8.5x smaller Shadow memory causes about half of Memcheck’s overhead

20 Performance observations Performance is a traditional research obsession “The subjective issues are important — ease of use and robustness, but performance is the item which would be most interesting for the audience.” (my emphasis) Users: slowness is #1 survey complaint –But most user s are about bugs or interpreting results –Zero preparation is a big win Cost/benefit –People will use slow tools if they are sufficiently useful

21 Alternative implementation “Half-and-half” –Used by Hobbes, TaintTrace, (with variation) LIFT abc...za'b'c'...z' original memoryshadow memory Compared to two-level table –Faster –Not robust enough for our purposes constant offset

22 If you remember nothing else...

23 Take-home messages Shadow memory is powerful Shadow memory can be implemented well Implementations require trade-offs g