1
Operating Systems & Memory Systems: Address Translation Computer Science 220 ECE 252 Professor Alvin R. Lebeck Fall 2006
2
2 © Alvin R. Lebeck 2001 CPS 220 Outline Finish Main Memory Address Translation –basics –64-bit Address Space Managing memory OS Performance Throughout: review Computer Architecture interaction with architectural decisions
3
3 © Alvin R. Lebeck 2001 CPS 220 Fast Memory Systems: DRAM specific Multiple RAS accesses: several names (page mode) –64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns New DRAMs to address gap; what will they cost, will they survive? –Synchronous DRAM: Provide a clock signal to DRAM, transfer synchronous to system clock –RAMBUS: reinvent DRAM interface (Intel will use it) »Each Chip a module vs. slice of memory »Short bus between CPU and chips »Does own refresh »Variable amount of data returned »1 byte / 2 ns (500 MB/s per chip) –Cached DRAM (CDRAM): Keep entire row in SRAM
4
4 © Alvin R. Lebeck 2001 CPS 220 Main Memory Summary Big DRAM + Small SRAM = Cost Effective –Cray C-90 uses all SRAM (how many sold?) Wider Memory Interleaved Memory: for sequential or independent accesses Avoiding bank conflicts: SW & HW DRAM specific optimizations: page mode & Specialty DRAM, CDRAM –Niche memory or main memory? »e.g., Video RAM for frame buffers, DRAM + fast serial output IRAM: Do you know what it is?
5
5 © Alvin R. Lebeck 2001 CPS 220 Review: Reducing Miss Penalty Summary Five techniques –Read priority over write on miss –Subblock placement –Early Restart and Critical Word First on miss –Non-blocking Caches (Hit Under Miss) –Second Level Cache Can be applied recursively to Multilevel Caches –Danger is that time to DRAM will grow with multiple levels in between
6
6 © Alvin R. Lebeck 2001 CPS 220 Review: Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the cache
7
7 © Alvin R. Lebeck 2001 CPS 220 Review: Cache Optimization Summary (MR = miss rate, MP = miss penalty, HT = hit time; + helps, – hurts)
Technique                           MR   MP   HT   Complexity
Larger Block Size                   +    –         0
Higher Associativity                +         –    1
Victim Caches                       +              2
Pseudo-Associative Caches           +              2
HW Prefetching of Instr/Data        +              2
Compiler Controlled Prefetching     +              3
Compiler Reduce Misses              +              0
Priority to Read Misses                  +         1
Subblock Placement                       +    +    1
Early Restart & Critical Word 1st        +         2
Non-Blocking Caches                      +         3
Second Level Caches                      +         2
Small & Simple Caches               –         +    0
Avoiding Address Translation                  +    2
Pipelining Writes                             +    1
8
8 © Alvin R. Lebeck 2001 CPS 220 System Organization [Block diagram: the processor and its cache connect through the core chip set to main memory and to an I/O bus holding a disk controller (disks), a graphics controller (graphics), and a network interface (network); devices signal the processor via interrupts]
9
9 © Alvin R. Lebeck 2001 CPS 220 Computer Architecture: the interface between hardware and software [Layer diagram: applications, compiler, and operating system sit above the hardware interface; below it are the CPU, memory, I/O, multiprocessors, and networks] This is IT
10
10 © Alvin R. Lebeck 2001 CPS 220 Memory Hierarchy 101 Processor (P): very fast, <1 ns clock, multiple instructions per cycle Cache ($): SRAM, fast, small, expensive Memory: DRAM, slow, big, cheap (called physical or main memory) Disk: magnetic, really slow, really big, really cheap => Cost-effective memory system (price/performance)
11
11 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory: Motivation Process = address space + thread(s) of control If address space = PA –programmer controls movement from disk –protection? –relocation? Linear address space –larger than physical address space »32 or 64 bits vs. 28-bit physical (256MB) Automatic management [Diagram: virtual pages mapped onto physical memory]
12
12 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory Process = virtual address space + thread(s) of control Translation –VA -> PA –What physical address does virtual address A map to –Is VA in physical memory? Protection (access control) –Do you have permission to access it?
13
13 © Alvin R. Lebeck 2001 CPS 220 Virtual Memory: Questions How is data found if it is in physical memory? Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped What data should be replaced on a miss? (Take Compsci 210 …)
14
14 © Alvin R. Lebeck 2001 CPS 220 Segmented Virtual Memory Virtual address (2^32 or 2^64 bytes) to physical address (2^30 bytes) mapping Variable size, base + offset, contiguous in both VA and PA [Diagram: example virtual segments at 0x1000, 0x6000, 0x9000 mapped to physical regions at 0x0000, 0x1000, 0x2000, 0x11000]
15
15 © Alvin R. Lebeck 2001 CPS 220 Intel Pentium Segmentation [Diagram: a logical address = segment selector + offset; the selector indexes a segment descriptor in the Global Descriptor Table (GDT); the descriptor's segment base address plus the offset gives the location in the physical address space]
16
16 © Alvin R. Lebeck 2001 CPS 220 Pentium Segmentation (Continued) Segment Descriptors –Local and Global –base, limit, access rights –Can define many Segment Registers –contain segment descriptors (faster than loading them from memory) –Only 6 Must load a segment register with a valid entry before the segment can be accessed –generally managed by compiler and linker, not the programmer
17
17 © Alvin R. Lebeck 2001 CPS 220 Paged Virtual Memory Virtual address (2^32 or 2^64 bytes) to physical address (2^28 bytes) mapping –virtual page to physical page frame Fixed-size units for access control & translation [Diagram: virtual pages mapped to physical page frames; a virtual address splits into a virtual page number and an offset]
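The page number / offset split above is just a shift and a mask. A minimal, runnable sketch in C, assuming 8 KB pages (a 13-bit offset, as on the Alpha later in the deck) and a made-up frame number standing in for the result of translation:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 13                      /* 8 KB pages => 13-bit offset */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

int main(void)
{
    uint64_t va  = 0x0000000000406a30ULL;  /* arbitrary example address */
    uint64_t vpn = va >> PAGE_SHIFT;       /* virtual page number       */
    uint64_t off = va & PAGE_MASK;         /* offset within the page    */

    /* Translation replaces vpn with a physical page frame number (pfn);
       the page offset is carried through unchanged.                    */
    uint64_t pfn = 0x1234;                 /* pretend result of a lookup */
    uint64_t pa  = (pfn << PAGE_SHIFT) | off;

    printf("va=%#llx vpn=%#llx off=%#llx pa=%#llx\n",
           (unsigned long long)va, (unsigned long long)vpn,
           (unsigned long long)off, (unsigned long long)pa);
    return 0;
}
```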
18
18 © Alvin R. Lebeck 2001 CPS 220 Page Table Kernel data structure (per process) Page Table Entry (PTE) –VA -> PA translation (if none, page fault) –access rights (Read, Write, Execute, User/Kernel, cached/uncached) –reference, dirty bits Many designs –Linear, Forward Mapped, Inverted, Hashed, Clustered Design Issues –support for aliasing (multiple VAs to a single PA) –large virtual address space –time to obtain translation
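As a concrete illustration of the PTE fields and of the simplest (linear) design named above, here is a sketch in C; the field names, widths, and bit-field layout are invented for illustration, not any real architecture's format:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* One page table entry: translation plus access-control and status bits.
   Bit-field layout is compiler-specific; this is for illustration only. */
typedef struct {
    uint64_t pfn        : 40;  /* physical page frame number      */
    uint64_t valid      : 1;   /* translation present?            */
    uint64_t read       : 1;
    uint64_t write      : 1;
    uint64_t exec       : 1;
    uint64_t user       : 1;   /* user-mode access allowed?       */
    uint64_t uncached   : 1;
    uint64_t referenced : 1;
    uint64_t dirty      : 1;
} pte_t;

/* Linear page table: one PTE per virtual page, indexed directly by VPN.
   Returns true and fills *pfn on success; false means page fault or
   protection violation.                                                 */
bool linear_translate(const pte_t *page_table, size_t num_pages,
                      uint64_t vpn, bool is_write, uint64_t *pfn)
{
    if (vpn >= num_pages || !page_table[vpn].valid)
        return false;                      /* page fault                 */
    if (is_write && !page_table[vpn].write)
        return false;                      /* protection violation       */
    *pfn = page_table[vpn].pfn;
    return true;
}
```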
19
19 © Alvin R. Lebeck 2001 CPS 220 Alpha VM Mapping (Forward Mapped) "64-bit" address divided into 3 segments –seg0 (bit 63 = 0) user code/heap –seg1 (bit 63 = 1, bit 62 = 1) user stack –kseg (bit 63 = 1, bit 62 = 0) kernel segment for OS Three-level page table, each level one page –Alpha 21064: only 43 unique bits of VA –(future min page size up to 64KB => 55 bits of VA) PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit) –What do you do for replacement? [Diagram: the VA supplies three 10-bit indices (L1, L2, L3) and a 13-bit page offset; each level's lookup starts from a base (the page table base register, then the frame from the previous level's PTE) and the final PTE's physical page frame number is concatenated with the offset]
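A sketch of the forward-mapped walk in C, under the geometry on the slide (three 10-bit indices plus a 13-bit page offset, which gives the 43 usable VA bits mentioned); physical memory is modeled as a flat array so the example is self-contained, and the PTE encoding is invented:

```c
#include <stdint.h>

#define PAGE_SHIFT 13                      /* 8 KB pages                      */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define LEVEL_BITS 10                      /* 1024 PTEs per page-sized table  */
#define LEVEL_MASK ((1ULL << LEVEL_BITS) - 1)
#define PTE_VALID  0x1ULL
#define PTE_PFN(e) ((e) >> PAGE_SHIFT)     /* illustrative PTE encoding       */

/* Physical memory is modeled as a flat array so the sketch can actually run;
   a real walker would issue physical memory reads instead.                   */
static const uint64_t *table_at(const uint8_t *physmem, uint64_t pfn)
{
    return (const uint64_t *)(physmem + (pfn << PAGE_SHIFT));
}

/* Three-level forward-mapped walk: three 10-bit indices plus a 13-bit offset
   cover 43 bits of VA.  Returns 0 on a missing translation (page fault).    */
uint64_t walk3(const uint8_t *physmem, uint64_t ptbr_pfn, uint64_t va)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    unsigned idx[3] = {
        (unsigned)((vpn >> (2 * LEVEL_BITS)) & LEVEL_MASK),   /* L1 index */
        (unsigned)((vpn >> LEVEL_BITS) & LEVEL_MASK),         /* L2 index */
        (unsigned)(vpn & LEVEL_MASK),                         /* L3 index */
    };

    uint64_t pfn = ptbr_pfn;               /* page table base register        */
    for (int level = 0; level < 3; level++) {
        uint64_t pte = table_at(physmem, pfn)[idx[level]];
        if (!(pte & PTE_VALID))
            return 0;                      /* page fault                      */
        pfn = PTE_PFN(pte);                /* next-level table, or final frame */
    }
    return (pfn << PAGE_SHIFT) | (va & (PAGE_SIZE - 1));
}
```

Each level costs one memory reference, which is why the later slide counts the forward-mapped TLB miss time as one reference per level.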
20
20 © Alvin R. Lebeck 2001 CPS 220 Inverted Page Table (HP, IBM) One PTE per page frame –only one VA per physical frame Must search for virtual address More difficult to support aliasing Force all sharing to use the same VA [Diagram: the virtual page number is hashed into the Hash Anchor Table (HAT), which points to a chain of Inverted Page Table (IPT) entries searched for the matching VA; the matching entry's index gives the PA and status]
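A sketch of the HAT/IPT lookup path in C, with invented structure layouts: the hash anchor table maps a hash of (ASID, VPN) to the head of a chain of IPT entries, one per physical frame, and the index of the matching entry is the frame number:

```c
#include <stdint.h>
#include <stdbool.h>

#define NO_FRAME ((uint32_t)-1)

/* One inverted-page-table entry per physical page frame.  The entry's
   index IS the physical frame number, so it stores the virtual tag.   */
typedef struct {
    uint64_t vpn;        /* virtual page mapped into this frame             */
    uint32_t asid;       /* address-space id owning the mapping             */
    uint32_t next;       /* next frame on the same hash chain, or NO_FRAME  */
} ipt_entry_t;

typedef struct {
    uint32_t          *hat;       /* hash anchor table: hash -> first frame */
    uint32_t           hat_size;  /* power of two                           */
    const ipt_entry_t *ipt;       /* one entry per physical frame           */
} inverted_pt_t;

static uint32_t hash_va(uint32_t asid, uint64_t vpn, uint32_t hat_size)
{
    return (uint32_t)((vpn ^ asid) & (hat_size - 1));   /* toy hash */
}

/* Search the IPT chain; on success the matching index is the frame number. */
bool ipt_lookup(const inverted_pt_t *pt, uint32_t asid, uint64_t vpn,
                uint32_t *frame_out)
{
    uint32_t frame = pt->hat[hash_va(asid, vpn, pt->hat_size)];
    while (frame != NO_FRAME) {
        if (pt->ipt[frame].vpn == vpn && pt->ipt[frame].asid == asid) {
            *frame_out = frame;
            return true;
        }
        frame = pt->ipt[frame].next;
    }
    return false;        /* no mapping: page fault */
}
```

Because one (VPN, ASID) tag lives in each frame's entry, two virtual addresses cannot name the same frame at once, which is the aliasing limitation the slide points out.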
21
21 © Alvin R. Lebeck 2001 CPS 220 Intel Pentium Segmentation + Paging [Diagram: the logical address (segment selector + offset) is translated via the segment descriptor in the GDT to a linear address; the linear address splits into directory, table, and offset fields, which walk the page directory and page table to reach the physical address space]
22
22 © Alvin R. Lebeck 2001 CPS 220 The Memory Management Unit (MMU) Input –virtual address Output –physical address –access violation (exception, interrupts the processor) Access Violations –not present –user vs. kernel –write –read –execute
23
23 © Alvin R. Lebeck 2001 CPS 220 Translation Lookaside Buffers (TLB) Need to perform address translation on every memory reference –30% of instructions are memory references –4-way superscalar processor –at least one memory reference per cycle Make Common Case Fast, others correct Throw HW at the problem Cache PTEs
24
24 © Alvin R. Lebeck 2001 CPS 220 Fast Translation: Translation Buffer Cache of translated addresses Alpha 21164 TLB: 48-entry, fully associative [Diagram: the virtual page number is compared against all 48 tags in parallel (each entry holds valid, read, write bits, a tag, and a physical frame); a 48:1 mux selects the matching frame, which is concatenated with the page offset]
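A software model of the fully associative lookup, sized per the slide (48 entries); the ASID field anticipates the next slide, and the exact flag set is invented:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 48               /* Alpha 21164 TLB size from the slide */
#define PAGE_SHIFT  13

typedef struct {
    bool     valid;
    uint8_t  asid;                   /* address-space id (see next slide)   */
    bool     writable;
    uint64_t vpn;                    /* tag: full virtual page number       */
    uint64_t pfn;                    /* data: physical page frame number    */
} tlb_entry_t;

/* Fully associative: every entry's tag is compared against the VPN
   (in hardware all 48 comparisons happen in parallel before the mux).
   Returns true only on a permitted hit; a protection fault and a miss
   are collapsed into 'false' to keep the sketch short.                  */
bool tlb_lookup(const tlb_entry_t tlb[TLB_ENTRIES],
                uint8_t asid, uint64_t va, bool is_write, uint64_t *pa)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            if (is_write && !tlb[i].writable)
                return false;        /* protection fault                   */
            *pa = (tlb[i].pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
            return true;             /* TLB hit                            */
        }
    }
    return false;                    /* TLB miss: walk the page table      */
}
```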
25
25 © Alvin R. Lebeck 2001 CPS 220 TLB Design Must be fast, not increase critical path Must achieve high hit ratio Generally small, highly associative Mapping change –page removed from physical memory –processor must invalidate the TLB entry PTEs are a per-process entity –Multiple processes with same virtual addresses –Context switches? Flush the TLB, or add an ASID (PID) –part of processor state, must be set on context switch
26
26 © Alvin R. Lebeck 2001 CPS 220 Hardware Managed TLBs Hardware handles TLB miss Dictates page table organization Complicated state machine to "walk page table" –Multiple levels for forward mapped –Linked list for inverted Exception only if access violation [Diagram: the TLB sits between the CPU and memory; hardware control refills it on a miss]
27
27 © Alvin R. Lebeck 2001 CPS 220 Software Managed TLBs Software handles TLB miss Flexible page table organization Simple hardware to detect hit or miss Exception if TLB miss or access violation Should you check for access violation on TLB miss? [Diagram: the TLB sits between the CPU and memory; on a miss the CPU traps and the OS refills it]
28
28 © Alvin R. Lebeck 2001 CPS 220 Mapping the Kernel Digital Unix kseg –kseg (bit 63 = 1, bit 62 = 0) Kernel has direct access to physical memory One VA->PA mapping for the entire kernel Lock (pin) the TLB entry –or special HW detection [Diagram: the virtual address space from 0 to 2^64 - 1 holds user code/data, user stack, and the kernel; the kernel region maps directly onto physical memory]
29
29 © Alvin R. Lebeck 2001 CPS 220 Considerations for Address Translation Large virtual address space Can map more things –files –frame buffers –network interfaces –memory from another workstation Sparse use of address space Page Table Design –space –less locality => TLB misses OS structure –microkernel => more TLB misses
30
30 © Alvin R. Lebeck 2001 CPS 220 Address Translation for Large Address Spaces Forward Mapped Page Table –grows with virtual address space »worst case 100% overhead not likely –TLB miss time: memory reference for each level Inverted Page Table –grows with physical address space »independent of virtual address space usage –TLB miss time: memory reference to HAT, IPT, list search
31
31 © Alvin R. Lebeck 2001 CPS 220 Hashed Page Table (HP) Combine Hash Table and IPT [Huck96] –can have more entries than physical page frames Must search for virtual address Easier to support aliasing than IPT Space –grows with physical space TLB miss –one less memory ref than IPT [Diagram: the virtual page number is hashed directly into the Hashed Page Table (HPT), whose entries hold the PA and status]
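A sketch of a hashed page table lookup in C with assumed structure layouts: unlike the IPT, the table is not tied to the number of physical frames, so each entry carries both the VPN tag and the translation, and collisions are chained:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hashed page table entry: both the virtual tag and the translation live
   in the entry, so the table size is not tied to physical memory size.  */
typedef struct hpt_entry {
    uint64_t vpn;
    uint64_t pfn;
    uint32_t attrib;                  /* protection/status bits           */
    struct hpt_entry *next;           /* overflow chain for collisions    */
} hpt_entry_t;

typedef struct {
    hpt_entry_t **buckets;            /* hash(vpn) -> chain head          */
    size_t        nbuckets;           /* power of two                     */
} hashed_pt_t;

bool hpt_lookup(const hashed_pt_t *pt, uint64_t vpn,
                uint64_t *pfn, uint32_t *attrib)
{
    /* The hash reaches a candidate entry directly, with no separate
       anchor table: the "one less memory ref than IPT" on the slide.   */
    for (hpt_entry_t *e = pt->buckets[vpn & (pt->nbuckets - 1)];
         e != NULL; e = e->next) {
        if (e->vpn == vpn) {
            *pfn    = e->pfn;
            *attrib = e->attrib;
            return true;
        }
    }
    return false;                     /* miss: page fault                 */
}
```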
32
32 © Alvin R. Lebeck 2001 CPS 220 Clustered Page Table (SUN) Combine benefits of HPT and Linear [Talluri95] Store one base VPN (TAG) and several PPN values –virtual page block number (VPBN) –block offset [Diagram: the VA's VPBN is hashed into a table of entries, each holding a VPBN tag, a next pointer, and several (PA, attrib) pairs (PA0..PA3); the block offset selects which pair]
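A sketch of the clustered lookup in C, assuming four pages per cluster as in the slide's figure (PA0..PA3); the structure names are invented:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define CLUSTER 4                      /* pages per clustered entry (per the figure) */

/* One entry covers CLUSTER consecutive virtual pages: one tag (the virtual
   page block number) plus CLUSTER translations.                            */
typedef struct cpt_entry {
    uint64_t vpbn;                     /* virtual page block number (tag)   */
    struct cpt_entry *next;            /* hash overflow chain               */
    struct { uint64_t pfn; uint32_t attrib; bool valid; } page[CLUSTER];
} cpt_entry_t;

typedef struct {
    cpt_entry_t **buckets;
    size_t        nbuckets;            /* power of two                      */
} clustered_pt_t;

bool cpt_lookup(const clustered_pt_t *pt, uint64_t vpn, uint64_t *pfn)
{
    uint64_t vpbn = vpn / CLUSTER;     /* which block of pages              */
    unsigned boff = vpn % CLUSTER;     /* which page within the block       */
    for (cpt_entry_t *e = pt->buckets[vpbn & (pt->nbuckets - 1)];
         e != NULL; e = e->next) {
        if (e->vpbn == vpbn && e->page[boff].valid) {
            *pfn = e->page[boff].pfn;
            return true;
        }
    }
    return false;                      /* miss: page fault                  */
}
```

One tag now amortizes over several neighboring pages, which is where the space advantage over a plain hashed table comes from.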
33
33 © Alvin R. Lebeck 2001 CPS 220 Reducing TLB Miss Handling Time Problem –must walk Page Table on TLB miss –usually incur cache misses –big problem for IPC in microkernels Solution –build a small second-level cache in SW –on TLB miss, first check SW cache »use simple shift and mask index to hash table
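A sketch of that software second-level cache in C: a small direct-mapped array of recent translations indexed by a shift-and-mask hash, consulted by the TLB miss handler before the full page table walk. All names and sizes are invented:

```c
#include <stdint.h>
#include <stdbool.h>

#define STLB_SIZE 1024                   /* small, power of two               */

typedef struct {
    uint64_t vpn;                        /* tag                               */
    uint64_t pte;                        /* cached PTE, ready to load into TLB */
    bool     valid;
} stlb_entry_t;

static stlb_entry_t stlb[STLB_SIZE];

/* Called from the TLB miss handler before the (slow) page table walk.
   The index is just a mask of the faulting VPN, so the common case adds
   only a few instructions plus one likely cache hit.                      */
bool stlb_lookup(uint64_t vpn, uint64_t *pte)
{
    stlb_entry_t *e = &stlb[vpn & (STLB_SIZE - 1)];
    if (e->valid && e->vpn == vpn) {
        *pte = e->pte;
        return true;                     /* refill the TLB without walking   */
    }
    return false;                        /* fall back to the page table walk */
}

void stlb_insert(uint64_t vpn, uint64_t pte)
{
    stlb_entry_t *e = &stlb[vpn & (STLB_SIZE - 1)];
    e->vpn   = vpn;
    e->pte   = pte;
    e->valid = true;
}
```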
34
34 © Alvin R. Lebeck 2001 CPS 220 Cache Indexing Tag on each block –No need to check index or block offset Increasing associativity shrinks index, expands tag Fully Associative: No index Direct-Mapped: Large index [Diagram: the block address splits into tag and index fields, followed by the block offset]
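The decomposition is pure bit slicing. A small runnable sketch in C with made-up cache geometry (64-byte blocks, 128 sets):

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 6                       /* 64-byte blocks                   */
#define SET_BITS   7                       /* 128 sets (example geometry)      */

int main(void)
{
    uint64_t addr   = 0x7ffe1234abcdULL;   /* arbitrary example                */
    uint64_t offset = addr & ((1u << BLOCK_BITS) - 1);
    uint64_t index  = (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
    uint64_t tag    = addr >> (BLOCK_BITS + SET_BITS);

    /* Doubling associativity at the same capacity halves the number of sets:
       SET_BITS shrinks by one and the tag grows by one bit.
       Fully associative: SET_BITS = 0, i.e. no index at all.                 */
    printf("tag=%#llx index=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}
```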
35
35 © Alvin R. Lebeck 2001 CPS 220 Address Translation and Caches Where is the TLB wrt the cache? What are the consequences? Most of today’s systems have more than 1 cache –Digital 21164 has 3 levels –2 levels on chip (8KB-data,8KB-inst,96KB-unified) –one level off chip (2-4MB) Does the OS need to worry about this? Definition: page coloring = careful selection of va->pa mapping
36
36 © Alvin R. Lebeck 2001 CPS 220 TLBs and Caches [Three organizations: (1) Conventional: CPU -> TLB -> physically addressed cache -> memory. (2) Virtually addressed cache: CPU -> cache (VA tags) -> TLB -> memory; translate only on a miss; alias (synonym) problem. (3) Overlapped: the cache is indexed with the VA in parallel with TLB translation and physical tags are compared afterward; requires the cache index to remain invariant across translation; the L2 cache is physically addressed]
37
37 © Alvin R. Lebeck 2001 CPS 220 Virtual Caches Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache Avoid address translation before accessing cache –faster hit time to cache Context Switches? –Just like the TLB (flush or pid) –Cost is time to flush + “compulsory” misses from empty cache –Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process I/O must interact with cache
38
38 © Alvin R. Lebeck 2001 CPS 220 I/O and Virtual Caches [Block diagram: the processor has a virtual cache; a memory bus connects it to main memory and, through an I/O bridge, to an I/O bus holding a disk controller (disks), a graphics controller (graphics), and a network interface (network); devices raise interrupts and use physical addresses] I/O is accomplished with physical addresses –DMA: flush pages from the cache –need pa->va reverse translation –or coherent DMA
39
39 © Alvin R. Lebeck 2001 CPS 220 Aliases and Virtual Caches Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address But the virtual address is used to index the cache Could have the same data in two different locations in the cache [Diagram: user and kernel virtual address spaces (0 to 2^64 - 1) both map the same physical memory]
40
40 © Alvin R. Lebeck 2001 CPS 220 Index with Physical Portion of Address If the index lies within the physical part of the address (the page offset), tag access can start in parallel with translation and the physical tag is compared when translation completes Limits the cache (per way) to the page size: what if we want bigger caches and the same trick? –Higher associativity –Page coloring [Diagram: the address splits into page address and page offset; the index and block offset fall within the page offset, the tag within the page address]
41
41 © Alvin R. Lebeck 2001 CPS 220 Page Coloring for Aliases HW that guarantees that every cache frame holds a unique physical address OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame –one form of page coloring [Diagram: the address splits into page address and page offset; the index overlaps the low page-number bits that must match]
42
42 © Alvin R. Lebeck 2001 CPS 220 Page Coloring to Reduce Misses Notion of bin –region of cache that may contain cache blocks from a page Random vs careful mapping Selection of physical page frame dictates cache index Overall goal is to minimize cache misses [Diagram: page frames map to bins (regions) of the cache]
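A page's bin is just a few bits of its physical frame number. A small runnable sketch in C with invented cache parameters (2 MB direct-mapped cache, 8 KB pages, so 256 bins):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  8192u                      /* 8 KB pages                      */
#define CACHE_SIZE (2u * 1024 * 1024)         /* 2 MB direct-mapped (example)    */
#define NUM_BINS   (CACHE_SIZE / PAGE_SIZE)   /* pages that fit side by side     */

/* Two pages whose frame numbers fall in the same bin compete for the same
   region of a physically indexed, direct-mapped cache.                      */
static unsigned bin_of_frame(uint64_t pfn)
{
    return (unsigned)(pfn % NUM_BINS);
}

int main(void)
{
    printf("bins = %u\n", NUM_BINS);
    printf("frame 0x100 -> bin %u\n", bin_of_frame(0x100));
    printf("frame 0x200 -> bin %u\n", bin_of_frame(0x200));  /* same bin: conflict */
    return 0;
}
```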
43
43 © Alvin R. Lebeck 2001 CPS 220 Careful Page Mapping [Kessler92, Bershad94] Select a page frame such that cache conflict misses are reduced –only choose from available pages (no VM replacement induced) Static –"smart" selection of page frame at page fault time Dynamic –move pages around
44
44 © Alvin R. Lebeck 2001 CPS 220 A Case for Large Pages Page table size is inversely proportional to the page size –memory saved Fast cache hit time is easy when cache <= page size (VA caches) –a bigger page makes it feasible as cache size grows Transferring larger pages to or from secondary storage, possibly over a network, is more efficient The number of TLB entries is restricted by clock cycle time –a larger page size maps more memory per entry –reduces TLB misses
45
45 © Alvin R. Lebeck 2001 CPS 220 A Case for Small Pages Fragmentation –large pages can waste storage –data must be contiguous within page Quicker process start for small processes(??)
46
46 © Alvin R. Lebeck 2001 CPS 220 Superpages Hybrid solution: multiple page sizes –8KB, 16KB, 32KB, 64KB pages –4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages Need to identify candidate superpages –Kernel –Frame buffers –Database buffer pools Application/compiler hints Detecting superpages –static, at page fault time –dynamically create superpages Page Table & TLB modifications
47
47 © Alvin R. Lebeck 2001 CPS 220 Page Coloring Make physical index match virtual index Behaves like a virtually indexed cache –no conflicts for sequential pages Possibly many conflicts between processes –address spaces all have same structure (stack, code, heap) –modify to xor PID with address (MIPS used a variant of this) Simple implementation Pick an arbitrary page if necessary
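A sketch of this static coloring policy at page-fault time, in C, with a toy per-color free pool standing in for the real allocator; the PID xor follows the MIPS-style variant mentioned above:

```c
#include <stdint.h>

#define NUM_BINS   256                       /* cache size / page size          */
#define FRAME_NONE ((uint64_t)-1)

/* Toy free-frame pool: one stack of free frame numbers per bin (color).
   A real kernel would keep its free lists segregated by color like this.    */
typedef struct {
    uint64_t frames[NUM_BINS][64];
    int      count[NUM_BINS];
} free_pool_t;

static uint64_t take_from_bin(free_pool_t *pool, unsigned bin)
{
    if (pool->count[bin] == 0)
        return FRAME_NONE;
    return pool->frames[bin][--pool->count[bin]];
}

/* Static page coloring: choose a frame whose cache bin matches the page's
   virtual bin, xor'd with the PID so identically laid-out address spaces
   (stack, code, heap at the same VAs) don't all collide in the cache.       */
uint64_t alloc_colored_frame(free_pool_t *pool, uint64_t vpn, unsigned pid)
{
    unsigned want = (unsigned)((vpn ^ pid) % NUM_BINS);

    for (unsigned i = 0; i < NUM_BINS; i++) {
        unsigned bin = (want + i) % NUM_BINS;     /* i == 0: the preferred bin */
        uint64_t pfn = take_from_bin(pool, bin);
        if (pfn != FRAME_NONE)
            return pfn;                           /* arbitrary bin if needed   */
    }
    return FRAME_NONE;                            /* out of memory             */
}
```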
48
48 © Alvin R. Lebeck 2001 CPS 220 Bin Hopping Allocate sequentially mapped pages (time) to sequential bins (space) Can exploit temporal locality –pages mapped close in time will be accessed close in time Search from last allocated bin until bin with available page frame Separate search list per process Simple implementation
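A sketch of bin hopping in C: a per-process cursor advances to the next bin that still has a free frame, so pages allocated close together in time spread across the cache. The counters and names are invented:

```c
#include <stdint.h>

#define NUM_BINS 256

/* System-wide: how many free frames each bin (cache color) currently has. */
static int free_count[NUM_BINS];

/* Per-process state: the bin this process used for its last allocation.   */
typedef struct { unsigned last_bin; } proc_bins_t;

/* Bin hopping: allocate sequentially mapped pages to sequential bins, so
   pages mapped (and likely accessed) close together in time land in
   different regions of the cache.  Returns the chosen bin, or -1.         */
int bin_hop_choose(proc_bins_t *p)
{
    for (unsigned i = 1; i <= NUM_BINS; i++) {
        unsigned bin = (p->last_bin + i) % NUM_BINS;
        if (free_count[bin] > 0) {          /* bin still has a free frame   */
            free_count[bin]--;
            p->last_bin = bin;
            return (int)bin;                /* caller takes a frame from it */
        }
    }
    return -1;                              /* no free frames anywhere      */
}
```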
49
49 © Alvin R. Lebeck 2001 CPS 220 Best Bin Keep track of two counters per bin –used: # of pages allocated to this bin for this address space –free: # of available pages in the system for this bin Bin selection is based on low values of used and high values of free Low used value –reduce conflicts within the address space High free value –reduce conflicts between address spaces
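A sketch of the best-bin selection in C with the two counters kept explicitly; combining "low used" and "high free" as a single score is one plausible reading and an assumption of the sketch:

```c
#include <limits.h>

#define NUM_BINS 256

/* System-wide free frames per bin, and per-address-space pages already
   allocated to each bin: the two counters the slide describes.            */
static int free_frames[NUM_BINS];

typedef struct { int used[NUM_BINS]; } addr_space_bins_t;

/* Best bin: prefer a bin this address space uses lightly (fewer conflicts
   with itself) that still has many free frames (fewer conflicts with other
   address spaces).                                                         */
unsigned best_bin(const addr_space_bins_t *as)
{
    unsigned best = 0;
    int best_score = INT_MIN;
    for (unsigned b = 0; b < NUM_BINS; b++) {
        int score = free_frames[b] - as->used[b];
        if (score > best_score) {
            best_score = score;
            best = b;
        }
    }
    return best;
}
```

A linear scan like this is what the next slide means by "could be linear in # of bins", motivating the hierarchical (tree of sums) variant.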
50
50 © Alvin R. Lebeck 2001 CPS 220 Hierarchical Best bin could be linear in # of bins Build a tree –internal nodes contain sum of child values Independent of cache size –simply stop at a particular level in the tree
51
51 © Alvin R. Lebeck 2001 CPS 220 Benefit of Static Page Coloring Reduces cache misses by 10% to 20% Multiprogramming –want to distribute mapping to avoid inter-address space conflicts
52
52 © Alvin R. Lebeck 2001 CPS 220 Dynamic Page Coloring Cache Miss Lookaside (CML) buffer [Bershad94] –proposed hardware device Monitor # of misses per page If # of misses >> # of cache blocks in page –must be conflict misses –interrupt processor –move a page (recolor) Cost of moving page << benefit