Caching in Operating Systems Design & Systems Programming
David E. Culler
CS162 – Operating Systems and Systems Programming
Lecture 19, October 13, 2014
Reading: A&D
HW 4 going out; Project 2 out today

Objectives
Recall and solidify understanding of the concept and mechanics of caching.
Understand how caching and its effects pervade OS design.
Put together all the mechanics around TLBs, paging, and memory caches.
Solidify understanding of virtual memory.

Review: Memory Hierarchy
Take advantage of the principle of locality to:
– Present as much memory as the cheapest technology provides
– Provide access at the speed offered by the fastest technology

Level                      Speed                      Size (bytes)
Processor core             ~1 ns                      Bs (registers) to MBs (on-chip caches: kBs, 100 kBs, MBs)
Main Memory (DRAM)         ~100 ns                    GBs
Secondary Storage (SSD)    100,000 ns (0.1 ms)        100 GBs
Secondary Storage (Disk)   10,000,000 ns (10 ms)      TBs

Examples: vmstat -s, top, the Mac OS Activity Monitor utility

Where does caching arise in Operating Systems?

Where does caching arise in Operating Systems?
Direct use of caching techniques:
– paged virtual memory (memory as cache for disk)
– TLB (cache of PTEs)
– file systems (cache disk blocks in memory)
– DNS (cache hostname => IP address translations)
– Web proxies (cache recently accessed pages)
Which pages to keep in memory?
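Several of these are the same software pattern: a small table that short-circuits an expensive lookup. A minimal C sketch of that pattern, assuming a direct-mapped table and hypothetical names (cache_lookup, cache_insert), in the spirit of a DNS-style hostname cache:

    #include <stdint.h>
    #include <string.h>

    #define CACHE_SLOTS 256              /* must be a power of two */

    struct entry {
        char     key[64];                /* e.g., a hostname */
        uint32_t value;                  /* e.g., an IPv4 address */
        int      valid;
    };

    static struct entry cache[CACHE_SLOTS];

    static uint32_t hash_str(const char *s) {
        uint32_t h = 5381;               /* djb2 string hash */
        while (*s) h = h * 33 + (uint8_t)*s++;
        return h;
    }

    /* Return 1 and fill *value on a hit; 0 on a miss. */
    int cache_lookup(const char *key, uint32_t *value) {
        struct entry *e = &cache[hash_str(key) & (CACHE_SLOTS - 1)];
        if (e->valid && strcmp(e->key, key) == 0) {
            *value = e->value;
            return 1;
        }
        return 0;
    }

    /* Install the result of a slow lookup, evicting the slot's old occupant. */
    void cache_insert(const char *key, uint32_t value) {
        struct entry *e = &cache[hash_str(key) & (CACHE_SLOTS - 1)];
        strncpy(e->key, key, sizeof e->key - 1);
        e->key[sizeof e->key - 1] = '\0';
        e->value = value;
        e->valid = 1;
    }

On a miss, the caller performs the slow lookup (e.g., a network query) and calls cache_insert; a direct-mapped table simply evicts whatever previously occupied the slot.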

Where does caching arise in Operating Systems?
Indirect – dealing with cache effects:
Process scheduling
– which and how many processes are active?
– large memory footprints versus small ones?
– priorities?
Impact of thread scheduling on cache performance
– rapid interleaving of threads (small quantum) may degrade cache performance and increase average memory access time!
Designing operating system data structures for cache performance
All of these are much more pronounced with multiprocessors / multicores.

Multiprocessor Caches (MP $)
[Figure: several processors (P), each with its own cache ($), sharing Memory over a bus]

Working Set Model (Denning, ~1970)
As a program executes, it transitions through a sequence of “working sets” consisting of varying-sized subsets of the address space.
[Figure: address vs. time, showing phases where references cluster in one working set, then jump to the next]
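The working set W(t, τ) can be made concrete as the set of distinct pages referenced among the last τ references. A hedged C sketch that measures its size over a toy reference trace (the trace and the bounded page-number space are assumptions for illustration):

    #include <stdio.h>

    #define NPAGES 1024              /* assumed size of the page-number space */

    /* Size of W(t, tau): distinct pages among refs[t-tau+1 .. t]. */
    int working_set_size(const int *refs, int t, int tau) {
        char seen[NPAGES] = {0};
        int count = 0;
        for (int i = t; i > t - tau && i >= 0; i--) {
            if (!seen[refs[i]]) {
                seen[refs[i]] = 1;
                count++;
            }
        }
        return count;
    }

    int main(void) {
        /* A toy trace: one working set {0,1,2}, then a transition to {7,8}. */
        int refs[] = {0, 1, 2, 0, 1, 2, 0, 7, 8, 7, 8, 7, 8};
        int n = sizeof refs / sizeof refs[0];
        for (int t = 0; t < n; t++)
            printf("t=%2d |W(t,5)| = %d\n", t, working_set_size(refs, t, 5));
        return 0;
    }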

Cache Behavior under the WS Model
Amortized by the fraction of time the WS is active
Transitions from one WS to the next
Capacity, Conflict, Compulsory misses
Applicable to memory caches and pages. Others?
[Figure: hit rate vs. cache size, rising steeply once the new working set fits]

Another Model of Locality: Zipf
Likelihood of accessing the item of rank r is proportional to 1/r^a.
Although it is rare to access any one item below the top few, there are so many of them that the distribution is “heavy tailed.”
Substantial value from even a tiny cache
Substantial misses from even a very large one
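To see both claims quantitatively: with p(r) = r^(-a) / Σ r'^(-a), a cache holding the k most popular of N items has hit rate equal to the head of that sum. A small C sketch of the computation (the parameter choices are illustrative):

    #include <stdio.h>
    #include <math.h>

    /* Hit rate of a cache holding the top k of N items whose access
       probability follows p(r) = r^(-a) / H, with H the normalizer. */
    double zipf_hit_rate(int k, int N, double a) {
        double head = 0.0, total = 0.0;
        for (int r = 1; r <= N; r++) {
            double p = pow(r, -a);
            total += p;
            if (r <= k) head += p;
        }
        return head / total;
    }

    int main(void) {
        printf("k=100:    %.3f\n", zipf_hit_rate(100,    1000000, 1.0));
        printf("k=10000:  %.3f\n", zipf_hit_rate(10000,  1000000, 1.0));
        printf("k=100000: %.3f\n", zipf_hit_rate(100000, 1000000, 1.0));
        return 0;
    }

With a = 1 and N = 10^6 this prints roughly 0.36 for k = 100 but still only about 0.84 for k = 100,000: a tiny cache already helps a lot, and even a huge one keeps missing on the tail.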

Where does caching arise in Operating Systems?
Maintaining the correctness of various caches:
– TLB consistent with the page table across context switches? Across updates to the page table?
– Shared pages mapped into the VAS of multiple processes?

Going into Detail on the TLB

What Actually Happens on a TLB Miss?
Hardware-traversed page tables:
– On TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)
  If PTE valid, hardware fills the TLB and the processor never knows
  If PTE marked invalid, causes a Page Fault, after which the kernel decides what to do
Software-traversed page tables (à la MIPS):
– On TLB miss, the processor receives a TLB fault
– Kernel traverses the page table to find the PTE
  If PTE valid, fills the TLB and returns from the fault
  If PTE marked invalid, internally calls the Page Fault handler
Most chipsets provide hardware traversal
– Modern operating systems tend to have more TLB faults since they use translation for many things
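A hedged C sketch of the software-traversed path described above, MIPS-style; the single-level page table, the tlb_write() primitive, and all names are assumptions for illustration, not a real kernel's API:

    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PTE_VALID  0x1u

    typedef uint32_t pte_t;

    extern pte_t *current_page_table;                 /* assumed per-process table */
    extern void   tlb_write(uint32_t vpn, pte_t pte); /* assumed HW fill primitive */
    extern void   page_fault(uint32_t vaddr);         /* full page-fault path */

    /* Invoked by the TLB-miss trap: walk the page table in software. */
    void tlb_refill_handler(uint32_t faulting_vaddr) {
        uint32_t vpn = faulting_vaddr >> PAGE_SHIFT;
        pte_t    pte = current_page_table[vpn];

        if (pte & PTE_VALID)
            tlb_write(vpn, pte);         /* fill the TLB; return from the trap */
        else
            page_fault(faulting_vaddr);  /* invalid PTE: run the page-fault handler */
    }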

What Happens on a Context Switch?
Need to do something, since TLBs map virtual addresses to physical addresses
– The address space just changed, so TLB entries are no longer valid!
Options?
– Invalidate the TLB: simple but might be expensive
  What if switching frequently between processes?
– Include a ProcessID in the TLB
  This is an architectural solution: needs hardware
What if translation tables change?
– For example, to move a page from memory to disk or vice versa…
– Must invalidate the TLB entry! Otherwise, we might think the page is still in memory!
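The ProcessID option amounts to tagging each TLB entry with an address-space ID (ASID), so a context switch only changes the current ASID register instead of flushing. A sketch of the match logic such hardware would implement (the entry layout and names are assumptions):

    #include <stdint.h>

    struct tlb_entry {
        uint32_t vpn;      /* virtual page number */
        uint32_t pfn;      /* physical frame number */
        uint8_t  asid;     /* address-space ID of the owning process */
        uint8_t  valid;
    };

    #define TLB_ENTRIES 64
    static struct tlb_entry tlb[TLB_ENTRIES];
    static uint8_t current_asid;   /* set on context switch; no flush needed */

    /* Fully associative match: hit only if VPN and ASID both match. */
    int tlb_lookup(uint32_t vpn, uint32_t *pfn) {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == current_asid) {
                *pfn = tlb[i].pfn;
                return 1;
            }
        }
        return 0;   /* miss: hardware or software walks the page table */
    }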

What TLB Organization Makes Sense?
[Figure: CPU → TLB → Cache → Memory]
Needs to be really fast
– On the critical path of memory access
– Seems to argue for direct-mapped or low associativity
However, needs to have very few conflicts!
– With a TLB, the miss time is extremely high!
– This argues that the cost of a conflict (miss time) is much higher than the slightly increased cost of access (hit time)
Thrashing: continuous conflicts between accesses
– What if we use the low-order bits of the page number as the index into the TLB?
  The first pages of code, data, and stack may map to the same entry
  Need 3-way associativity at least?
– What if we use the high-order bits as the index?
  TLB mostly unused for small programs

TLB Organization: Include Protection
How big does the TLB actually have to be?
– Usually small: typically 128–512 entries
– Not very big, so it can support higher associativity
The TLB is usually organized as a fully associative cache
– Lookup is by virtual address
– Returns physical address + other info
What happens when fully associative is too slow?
– Put a small (4–16 entry) direct-mapped cache in front
– Called a “TLB slice”
When does TLB lookup occur relative to memory cache access?
– Before memory cache lookup?
– In parallel with memory cache lookup?

Reducing Translation Time Further
As described, TLB lookup is in series with cache lookup.
Machines with TLBs go one step further: they overlap TLB lookup with cache access.
– Works because the offset is available early
[Figure: virtual page no. + offset enters the TLB lookup, which checks the valid bit and access rights and produces the physical page no.; the physical page no. + offset form the physical address]

Overlapping TLB & Cache Access (1/2)
Main idea:
– The offset in the virtual address exactly covers the cache index and byte select
– Thus we can select the cached byte(s) in parallel with performing the address translation
[Figure: virtual address split into virtual page # and offset; the offset supplies the cache index and byte select while the page # is translated into the physical tag]

Overlapping TLB & Cache Access (2/2)
Here is how this might work with a 4 KB cache:
[Figure: the 20-bit page # feeds an associative TLB lookup while the 12-bit displacement indexes the 4 KB cache (1 K entries of 4 bytes); the resulting PA is compared against the cache tag to produce Hit/Miss]
What if the cache size is increased to 8 KB?
– Overlap is not complete
– Need to do something else. See CS152/252.
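The condition behind both pictures is simple arithmetic: the overlap is complete only while the cache index plus byte select, i.e. log2(cache size / associativity), fits within the page offset. A small C check of that condition under these assumptions:

    #include <stdio.h>

    /* Full overlap is possible only if cache_size / associativity
       does not exceed the page size (index + byte-select bits fit
       inside the untranslated page offset). */
    int overlap_ok(unsigned cache_bytes, unsigned assoc, unsigned page_bytes) {
        return cache_bytes / assoc <= page_bytes;
    }

    int main(void) {
        /* 4 KB pages: 12 offset bits. */
        printf("4 KB direct-mapped: %s\n",
               overlap_ok(4096, 1, 4096) ? "full overlap" : "needs a translated bit");
        printf("8 KB direct-mapped: %s\n",
               overlap_ok(8192, 1, 4096) ? "full overlap" : "needs a translated bit");
        printf("8 KB 2-way:         %s\n",
               overlap_ok(8192, 2, 4096) ? "full overlap" : "needs a translated bit");
        return 0;
    }

This is also why raising associativity (here, 8 KB 2-way) is one way to keep the overlap without shrinking the cache.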

Putting Everything Together: Address Translation
[Figure: virtual address = virtual P1 index | virtual P2 index | offset. The PageTablePtr selects the 1st-level page table; the P1 index selects an entry pointing to a 2nd-level table; the P2 index selects the PTE holding the physical page #, which combines with the offset to form the physical address into physical memory]
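A hedged C sketch of the two-level walk the figure depicts, for a 32-bit address split into a 10-bit P1 index, a 10-bit P2 index, and a 12-bit offset (names are illustrative, and the sketch assumes page-table pages are directly addressable):

    #include <stdint.h>

    #define PTE_VALID 0x1u

    typedef uint32_t pte_t;

    /* 32-bit VA: | 10-bit P1 index | 10-bit P2 index | 12-bit offset |
       Returns 0 and fills *paddr on success, -1 on a fault. */
    int translate(pte_t *page_table_ptr, uint32_t vaddr, uint32_t *paddr) {
        uint32_t p1     = (vaddr >> 22) & 0x3FFu;
        uint32_t p2     = (vaddr >> 12) & 0x3FFu;
        uint32_t offset =  vaddr        & 0xFFFu;

        pte_t pde = page_table_ptr[p1];              /* 1st-level entry */
        if (!(pde & PTE_VALID)) return -1;

        /* Simplification: assume the 2nd-level table is directly addressable. */
        pte_t *second = (pte_t *)(uintptr_t)(pde & ~0xFFFu);
        pte_t  pte = second[p2];                     /* 2nd-level entry (PTE) */
        if (!(pte & PTE_VALID)) return -1;

        *paddr = (pte & ~0xFFFu) | offset;           /* physical page # + offset */
        return 0;
    }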

Putting Everything Together: TLB
[Figure: the same two-level translation, with a TLB consulted first; the virtual page number is looked up in the TLB, and on a hit the physical page # comes straight from the TLB, skipping the page-table walk]

Putting Everything Together: Cache
[Figure: the physical address produced by the TLB or page-table walk is then split into tag, index, and byte select to access the memory cache; the tag is compared against the cache block's tag]

Admin: Projects
Project 1
– deep understanding of OS structure, threads, thread implementation, synchronization, scheduling, and the interactions of scheduling and synchronization
– work effectively in a team: effective teams work together with a plan => schedule three 1-hour joint work times per week
Project 2
– exe load and VAS creation provided for you
– syscall processing, FORK+EXEC, file descriptors backing user file handles, ARGV registers & stack frames
– two development threads for the team, but you still need to work together

Virtual Memory – the Disk Level

Recall: the most basic OS function

Loading an Executable into Memory
.exe
– lives on disk in the file system
– contains the contents of the code & data segments, relocation entries, and symbols
– the OS loads it into memory and initializes registers (and the initial stack pointer)
– the program sets up its stack and heap upon initialization: CRT0
[Figure: the exe's code, data, and info on the (huge) disk being loaded into memory]

Create the Virtual Address Space of the Process
Utilized pages in the VAS are backed by a page block on disk
– called the backing store
– typically in an optimized block store, but you can think of it like a file
[Figure: process VAS (code, data, heap via sbrk, stack, kernel) mapped by the user page table onto user page frames in memory, alongside kernel code & data, with the (huge) disk behind it]

Create the Virtual Address Space of the Process
The user page table maps the entire VAS
All the utilized regions are backed on disk
– swapped into and out of memory as needed
For every process
[Figure: per-process VAS (GBs) of code, data, heap, and stack backed on the huge (TB) disk, with resident portions in user page frames]

Create the Virtual Address Space of the Process
The user page table maps the entire VAS
– resident pages map to the frames in memory they occupy
– the portion of the page table that the HW needs to access must itself be resident in memory
[Figure: per-process VAS and page table (PT); resident pages point into user page frames, the rest point to disk]

Provide Backing Store for the VAS
The user page table maps the entire VAS
Resident pages are mapped to memory frames
For all other pages, the OS must record where to find them on disk
[Figure: per-process VAS with code, data, heap, and stack regions backed by disk blocks; resident pages in user page frames]

What data structure is required to map non-resident pages to disk?
FindBlock(PID, page#) => disk_block
Like the page table, but purely software
Where to store it?
Usually want a backing store for resident pages too
Could use a hash table (like the inverted page table)
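A hedged sketch of such a structure in C: a chained hash table keyed by (PID, page#), in the spirit of an inverted page table. All names (find_block, record_block) are illustrative:

    #include <stdint.h>
    #include <stdlib.h>

    #define NBUCKETS 4096

    struct mapping {
        uint32_t pid;
        uint32_t vpn;            /* virtual page number */
        uint64_t disk_block;     /* where the page lives in the backing store */
        struct mapping *next;
    };

    static struct mapping *buckets[NBUCKETS];

    static unsigned bucket_of(uint32_t pid, uint32_t vpn) {
        return (pid * 2654435761u ^ vpn) % NBUCKETS;   /* simple hash mix */
    }

    /* FindBlock(PID, page#) => disk_block; returns 0 if no mapping exists. */
    int find_block(uint32_t pid, uint32_t vpn, uint64_t *disk_block) {
        for (struct mapping *m = buckets[bucket_of(pid, vpn)]; m; m = m->next) {
            if (m->pid == pid && m->vpn == vpn) {
                *disk_block = m->disk_block;
                return 1;
            }
        }
        return 0;
    }

    void record_block(uint32_t pid, uint32_t vpn, uint64_t disk_block) {
        struct mapping *m = malloc(sizeof *m);
        if (!m) abort();
        m->pid = pid;
        m->vpn = vpn;
        m->disk_block = disk_block;
        unsigned b = bucket_of(pid, vpn);
        m->next = buckets[b];
        buckets[b] = m;
    }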

Provide Backing Store for the VAS
[Figure: two processes' VASes (VAS 1 and VAS 2), each with its own page table (PT 1, PT 2); both map their code, data, heap, and stack onto a shared pool of user page frames in memory and onto blocks on the huge (TB) disk]

On Page Fault …
[Figure: two processes' VASes and page tables over shared user page frames and disk; the active process & PT reference a non-resident page]

On Page Fault … Find & Start Load
[Figure: same picture; the OS finds the page's disk block and starts loading it into a free frame]

On Page Fault … Schedule Other Process or Thread
[Figure: same picture; while the disk read is in flight, another process or thread runs]

On Page Fault … Update PTE
[Figure: same picture; when the load completes, the PTE is updated to point at the newly filled frame]

Eventually Reschedule the Faulting Thread
[Figure: same picture; the faulting thread becomes runnable again and retries the access, which now succeeds]
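Putting the five steps together, a hedged pseudocode-level C sketch of the whole fault path; every helper here is an assumed placeholder for real kernel machinery, not an actual API:

    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PTE_VALID  0x1u

    /* Assumed placeholder types and helpers. */
    typedef uint32_t frame_t;
    struct process { uint32_t pid; uint32_t *page_table; };
    struct thread;

    extern int     find_block(uint32_t pid, uint32_t vpn, uint64_t *blk);
    extern frame_t allocate_frame(void);       /* free list / reaper (next slide) */
    extern void    start_disk_read(uint64_t blk, frame_t f);
    extern struct thread *current_thread(void);
    extern void    block_thread_on_io(struct thread *t);
    extern void    set_pte(uint32_t *pt, uint32_t vpn, frame_t f, uint32_t flags);
    extern void    kill_process(struct process *p);

    void handle_page_fault(struct process *p, uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_SHIFT;
        uint64_t block;

        if (!find_block(p->pid, vpn, &block)) {   /* not in any mapped region */
            kill_process(p);
            return;
        }

        frame_t f = allocate_frame();             /* may require an eviction */
        start_disk_read(block, f);                /* find & start load */

        /* Sleep until the disk interrupt wakes us; meanwhile the
           scheduler runs some other process or thread. */
        block_thread_on_io(current_thread());

        /* Resumed here once the read has completed. */
        set_pte(p->page_table, vpn, f, PTE_VALID);   /* update the PTE */
        /* Returning from the fault retries the faulting instruction. */
    }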

Where does the OS get the frame?
Keeps a free list
Unix runs a “reaper” if memory gets too full
As a last resort, evict a page (a dirty page must be written back to disk first)

How many frames per process?
Like thread scheduling, we need to “schedule” memory resources:
– allocation of frames per process
  utilization? fairness? priority?
– allocation of disk paging bandwidth

Historical Perspective
Mainframes and minicomputers (servers) were “always paging”
– memory was limited
– processor rates and disk transfer rates were much closer
When overloaded they would THRASH
– with good OS design, they still made progress
Modern systems hardly ever page
– paging is primarily a safety net, plus there is lots of untouched “stuff”
– plus all the other advantages of managing a VAS

Summary
Virtual address space for protection, efficient use of memory, AND multiprogramming
– hardware checks & translates when the page is present
– the OS handles EVERYTHING ELSE
Conceptually, memory is just a cache for blocks of the VAS that live on disk
– but the program can never access the disk directly
Address translation provides the basis for sharing
– shared blocks of disk AND shared pages in memory
How else can we use this mechanism?
– sharing?
– disk transfers on demand?
– accessing objects in blocks using load/store instructions?