Published by Linda Smith. Modified over 9 years ago.
1
Virtual Memory and I/O Mingsheng Hong
2
I/O Systems. Major I/O hardware: hard disks, network adapters, … Problems related to I/O systems: various types of hardware (device drivers provide the OS with a unified I/O interface); typically much slower than CPU and memory, which makes I/O the system bottleneck; too much CPU involvement in I/O operations.
3
Techniques to Improve I/O Performance. Buffering (e.g. downloading a file from the network); DMA; Caching (CPU cache, TLB, file cache, …).
4
Other Techniques to Improve I/O Performance. Virtual memory page remapping (IO-Lite): allows (cached) files and memory to be shared by different processes without extra data copies. Prefetching data (Software Prefetching and Caching for TLBs): prefetches and caches page table entries.
5
Summary of First Paper. IO-Lite: A Unified I/O Buffering and Caching System (Pai et al., Best Paper of 3rd OSDI, 1999). A unified I/O system. Uses immutable data buffers to store all I/O data (only one physical copy). Uses VM page remapping. Subsystems covered: IPC, file system (disk files, file cache), network subsystem.
6
Summary of Second Paper. Software Prefetching and Caching for Translation Lookaside Buffers (Bala et al., 1994). A software approach to help reduce TLB misses. Works well for IPC-intensive systems. Bigger performance gain expected for future systems.
7
Features of IO-Lite Eliminates redundant data copying Saves CPU work & avoids cache pollution Eliminates Multiple buffering Saves main memory => improves hit rate of file cache Enables cross-subsystem optimizations Cache Internet checksum Supports application-specific cache replacement policies
8
Related work before IO-Lite I/O APIs should preserve copy semantics Memory-mapped files Copy On Write Fbufs
9
Key Data Structures Immutable Buffers and Buffer Aggregates
10
Discussion I. When we pass a buffer aggregate from process A to process B, how do we efficiently do VM page remapping (modify B’s page table entries)? Possible approach 1: find any empty entry, and modify the VM address contained in the buffer aggregate. Very inefficient. Possible approach 2: reserve the range of virtual addresses of buffers in the address space of each process. This basically limits the total size of buffers. What about dynamically allocated buffers?
11
Impact of Immutable I/O Buffers Copy-On-Write Optimization Modified values are stored in a new buffer, as opposed to “in-place modification” Three situations when the data object is … Completely modified Allocates a new buffer Partially modified (modification localized) Chains unmodified and modified portions of data Partially modified (modification not localized) Compares the cost of writing an entire object with that of chaining; chooses the cheaper method
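The "partially modified (modification localized)" case above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not IO-Lite's implementation: the names `Buffer`, `Aggregate`, and `replace` are hypothetical, and an aggregate is modeled as an ordered list of (buffer, offset, length) slices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Buffer:
    """Immutable byte buffer (hypothetical stand-in for an IO-Lite buffer)."""
    data: bytes


# A slice is (buffer, offset, length); an aggregate is an ordered list of slices.
Slice = Tuple[Buffer, int, int]


class Aggregate:
    def __init__(self, slices: List[Slice]):
        self.slices = list(slices)

    def length(self) -> int:
        return sum(n for _, _, n in self.slices)

    def read(self) -> bytes:
        return b"".join(b.data[o:o + n] for b, o, n in self.slices)

    def replace(self, start: int, new: bytes) -> "Aggregate":
        """Return a new aggregate with bytes [start, start+len(new)) replaced.

        The new bytes go into a fresh buffer; unmodified ranges are chained
        in by reference to the original buffers, which are never written.
        """
        end = start + len(new)
        if start < 0 or end > self.length():
            raise ValueError("replacement must lie inside the aggregate")
        out: List[Slice] = []
        pos = 0
        inserted = False
        for buf, off, n in self.slices:
            s_start, s_end = pos, pos + n
            if s_start < start:                  # unmodified prefix of this slice
                out.append((buf, off, min(s_end, start) - s_start))
            if not inserted and s_end > start:   # modified range begins here
                out.append((Buffer(new), 0, len(new)))
                inserted = True
            if s_end > end:                      # unmodified suffix of this slice
                skip = max(end, s_start) - s_start
                out.append((buf, off + skip, n - skip))
            pos = s_end
        return Aggregate(out)


original = Buffer(b"hello world")
agg = Aggregate([(original, 0, 11)])
patched = agg.replace(6, b"WORLD")
print(patched.read())  # b'hello WORLD'
print(agg.read())      # b'hello world' (the old view is untouched)
```

The chained result reads through two buffers instead of one, which is the extra read cost the third case on the slide weighs against rewriting the whole object.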
12
Discussion II. How to measure the two costs? Heuristics are needed. Fragmented data vs. clustered data: chained data increases the reading cost. Similar to the shadow page technique used in System R. Should the cost of retrieving data from the buffer also be considered?
13
What does IO-Lite do? Reduces extra data copies in IPC, the file system (disk files, file cache), and the network subsystem. Makes cross-subsystem optimization possible.
14
IO-Lite and IPC Operations on Buffers & Aggregates When I/O data is transferred Pass related aggregates by value Associated buffers are passed by reference When buffer is deallocated Buffer returned to a memory pool Buffer’s VM page mappings persist When buffer is reused (by the same process) No further VM map changes required (Temporarily) grant write permission to associated producer process
15
IO-Lite and Filesystem. IO-Lite I/O APIs provided: IOL_read(int fd, IOL_Agg **aggr, size_t size); IOL_write(int fd, IOL_Agg **aggr). IOL_write operations are atomic (concurrency support). I/O functions in the stdio library are reimplemented. Filesystem cache reorganized: buffer aggregates (pointers to data), instead of file data, are stored in the cache. Copy semantics ensured: suppose a portion of a cached file is read, and then is overwritten.
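The copy-semantics scenario (a portion of a cached file is read, then overwritten) can be illustrated with a toy cache. This is a sketch under assumptions, not the IO-Lite file cache: `FileCache`, `read`, and `write` are hypothetical names, and immutable Python `bytes` objects stand in for immutable buffers.

```python
class FileCache:
    """Toy file cache that stores references to immutable chunks, not file data."""

    def __init__(self):
        self._entries = {}  # file name -> tuple of immutable bytes chunks

    def write(self, name, chunks):
        # A write installs a new aggregate; existing buffers are never mutated.
        self._entries[name] = tuple(chunks)

    def read(self, name):
        # A read hands out references to the cached chunks, with no copy.
        return self._entries[name]


head, tail = b"abc", b"def"
cache = FileCache()
cache.write("f", [head, tail])

snapshot = cache.read("f")         # reader obtains its view of the file
cache.write("f", [head, b"XYZ"])   # the second chunk is then overwritten

print(b"".join(snapshot))          # b'abcdef' (reader still sees the old data)
print(b"".join(cache.read("f")))   # b'abcXYZ'
```

The reader's view behaves as if it had been copied at read time, yet the unmodified chunk is shared between the old and new versions.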
16
Copy Semantics Illustration 1
17
Copy Semantics Illustration 2
18
Copy Semantics Illustration 3
19
More on File Cache Management & VM Paging. Cache replacement policy (can be customized): the eviction order is by current reference status and time of last file access. One entry is evicted when the file cache “appears” to be too large; one entry is added on every file cache miss. When a buffer page is paged out, its data is written back to swap space, and possibly to several other disk locations (for different files).
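One way to read "eviction order is by current reference status and time of last file access" is as a two-part sort key. The sketch below assumes unreferenced entries are evicted before referenced ones and older entries before newer ones; `CacheEntry` and `pick_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str
    referenced: bool    # is the data currently referenced by some process?
    last_access: float  # time of last file access


def pick_victim(entries):
    """Pick the entry to evict.

    Assumed ordering: unreferenced entries go first (False sorts before True),
    and ties are broken by the oldest last access.
    """
    return min(entries, key=lambda e: (e.referenced, e.last_access))


entries = [
    CacheEntry("a", referenced=True, last_access=1.0),
    CacheEntry("b", referenced=False, last_access=5.0),
    CacheEntry("c", referenced=False, last_access=3.0),
]
print(pick_victim(entries).name)  # c
```

An application-specific policy would amount to swapping in a different key function.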
20
IO-Lite and Network Subsystem. Access control and protection for processes: ACLs are associated with buffer pools. The ACL of a data object must be determined prior to allocating memory for it. An early demultiplexing technique determines the ACL for each incoming packet.
21
A Cross-Subsystem Optimization. Internet checksum caching: cache the computed checksum for each slice of a buffer aggregate. The version number is incremented when a buffer is reallocated, so it can be used to check whether the data changed. Works well for static files. Also greatly benefits CGI programs that chain dynamic data with static data.
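A minimal sketch of checksum caching, assuming the cache is keyed by (buffer id, version number, offset, length); the key layout and the `ChecksumCache` class are illustrative assumptions, not IO-Lite's data structures. The checksum itself is the standard 16-bit one's-complement Internet checksum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


class ChecksumCache:
    """Caches checksums per buffer slice; a new version number forces a recompute."""

    def __init__(self):
        self._cache = {}
        self.computations = 0             # counts real checksum computations

    def checksum(self, buffer_id, version, data, offset, length):
        key = (buffer_id, version, offset, length)   # hypothetical key layout
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = internet_checksum(data[offset:offset + length])
        return self._cache[key]


payload = bytes.fromhex("0001f203f4f5f6f7")
cache = ChecksumCache()
first = cache.checksum(1, 0, payload, 0, 8)
second = cache.checksum(1, 0, payload, 0, 8)   # served from the cache
third = cache.checksum(1, 1, payload, 0, 8)    # buffer reallocated: recomputed
print(hex(first), cache.computations)          # 0x220d 2
```

Because buffers are immutable, a matching buffer id and version number is enough to know the bytes are unchanged, which is why a static file served repeatedly pays for its checksum only once.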
22
Performance – Competitors Flash Web server – a high performance HTTP server Flash-Lite – A modified version of Flash using IO-Lite API Apache 1.3.1 – representing the widely used Web server today
23
Performance – Static Content Requests
24
Performance – CGI Programs
25
Performance – Real Workload Average request size: 17KBytes
26
Performance – WAN Effects. Memory for buffers = # clients * T_ss
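A worked instance of the formula, assuming T_ss is the per-connection socket send buffer size (the slide does not define it). The client count and buffer size below are hypothetical numbers chosen for illustration, not figures from the presentation.

```python
def send_buffer_memory(num_clients: int, t_ss_bytes: int) -> int:
    """Memory for buffers = # clients * T_ss, in bytes."""
    return num_clients * t_ss_bytes


# Hypothetical numbers: 1,000 concurrent clients, 64 KB of send buffer each.
total_bytes = send_buffer_memory(1000, 64 * 1024)
print(total_bytes / (1024 * 1024))  # 62.5 (MB)
```

The product grows linearly with the number of slow WAN clients, so memory spent on per-connection copies is memory taken away from the file cache.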
27
Performance – Other Applications
28
Conclusion on IO-Lite. A unified framework of I/O subsystems. Impressive performance in Web applications due to copy avoidance and checksum caching.
29
Software Prefetching & Caching for TLBs. Prefetching and caching had never before been applied to TLB misses in software. Improves overall performance by up to 3%, but has great potential on newer architectures (clock speed: 40 MHz => 200 MHz).
30
Issues in Virtual Memory User Address Space is typically huge … TLB to cache page tables Software support to help reduce TLB misses
31
Motivations TLB misses occur more frequently in Microkernel-based OS RISC computers handle TLB misses in software (trap) IPCs have a bigger impact on system performance
32
Approach. Use a software approach to prefetch and cache TLB entries. Experiments done on a MIPS R3000-based (RISC) architecture with Mach 3.0. Applications chosen from standard benchmarks, as well as a synthetic IPC-intensive benchmark.
33
Discussion The way the authors motivate their paper A right approach for a particular type of system A valid Argument for future computer systems regarding performance gain Figures of experimental results mostly showing the reduced number of TLB misses, instead of overall performance improvement A synthetic IPC-intensive application to support their approach
34
Prefetching: What entries to prefetch? L1U: user address spaces L1K: kernel data structures L2: user (L1U) page tables Stack segments Code segments Data segments L3: L1K and L2 page tables
35
Prefetching: Details. On the first IPC call, probe the hardware TLB on the IPC path and enter the related TLB entries into the PTLB. On subsequent IPC calls, entries are prefetched into the PTLB by a hashed lookup. Entries are stored in unmapped, cached physical memory.
36
Prefetching: Performance
37
Rate of TLB misses?
38
Caching: Software Victim Cache. Use a region of unmapped, cached memory to cache entries evicted from the hardware TLB. PTE lookup sequence: hardware TLB -> STLB -> generic trap handler.
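The lookup sequence (hardware TLB, then STLB, then generic trap handler) can be modeled with a small simulation. This is a sketch under assumptions, not the paper's implementation: the hardware TLB is modeled with FIFO replacement for determinism, the STLB is modeled as a direct-mapped table indexed by virtual page number, and `TLBModel` is a hypothetical name.

```python
from collections import OrderedDict


class TLBModel:
    def __init__(self, hw_entries, stlb_slots, page_table):
        self.hw = OrderedDict()            # hardware TLB: vpn -> pfn (FIFO, an assumption)
        self.hw_entries = hw_entries
        self.stlb = [None] * stlb_slots    # software victim cache, direct-mapped (an assumption)
        self.page_table = page_table       # stands in for the generic trap handler's page walk
        self.stats = {"hw_hit": 0, "stlb_hit": 0, "generic": 0}

    def _install(self, vpn, pfn):
        """Insert into the hardware TLB; the evicted entry becomes an STLB victim."""
        if len(self.hw) >= self.hw_entries:
            old_vpn, old_pfn = self.hw.popitem(last=False)
            self.stlb[old_vpn % len(self.stlb)] = (old_vpn, old_pfn)
        self.hw[vpn] = pfn

    def translate(self, vpn):
        if vpn in self.hw:                           # 1. hardware TLB
            self.stats["hw_hit"] += 1
            return self.hw[vpn]
        slot = self.stlb[vpn % len(self.stlb)]
        if slot is not None and slot[0] == vpn:      # 2. STLB (fast trap path)
            self.stats["stlb_hit"] += 1
            pfn = slot[1]
        else:                                        # 3. generic trap handler
            self.stats["generic"] += 1
            pfn = self.page_table[vpn]
        self._install(vpn, pfn)
        return pfn


page_table = {vpn: vpn + 100 for vpn in range(8)}
tlb = TLBModel(hw_entries=2, stlb_slots=4, page_table=page_table)
for vpn in (0, 1, 2, 0, 0):
    tlb.translate(vpn)
print(tlb.stats)  # {'hw_hit': 1, 'stlb_hit': 1, 'generic': 3}
```

In the run above, page 0 is evicted by the third access, recovered from the STLB on the fourth, and hits the hardware TLB on the fifth.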
39
Caching: Benefits A faster trap path for TLB misses Avoids overhead of context switch Eliminates (reduces?) cascaded TLB misses
40
Caching: Performance Average STLB penalties Kernel TLB hit rates
41
Caching: Performance
42
Prefetching + Caching: Performance. Worse than using PTLB alone! (Don’t understand the authors’ comment to justify it…)
43
Discussion. STLB (caching) is better than PTLB (prefetching), so using it alone suffices. Is it possible to improve IPC performance using both VM page remapping and software prefetching & caching?