
1 Memory hierarchy and paging - Electronic Computers M

2 How do we dream of a memory? Infinite capacity and zero access time… BUT the faster a memory is, the more expensive and power-consuming it is (and very often the bigger its physical size). The desired characteristics are unattainable. Alternative solution: a multiple-level memory hierarchy. Big-capacity memory: slow access time. Small-capacity memory: very fast access time. Each level is therefore characterized by:
- Access time
- Cost per byte
- Total capacity
- Transfer speed (bandwidth)
- Size of the single transferred item

3 Memory hierarchy levels
CPU registers → I level cache → II level cache → III level cache → Central memory → Disk → Tape: going down the hierarchy, capacity grows and speed decreases. N.B. Some cache levels can be missing in the CPUs.
How is the memory hierarchy handled? Caches are hardware managed (totally transparent to users); central memory and disks are managed by hardware, OS and user (files).
Typical capacity / access time / cost per level:
- CPU registers: hundreds of bytes, <1 ns
- Cache: KBytes-MBytes, 1-10 ns, $10/MByte
- Central memory: GBytes, 100-300 ns, $1/MByte
- Disk: thousands of GBytes / TBytes, 10 ms, $0.0016/MByte
- Tape: "infinite" capacity, seconds-minutes
CACHE: small and very fast memory. Discussed later.

4 Characteristics
Inclusion: all information of the upper levels (those increasingly nearer to the CPU) is also present in the lower levels. Very often used (but not always).
Coherency: data stored in different levels must be consistent, and therefore update policies must be implemented:
- Write-through: the lower level is updated immediately on every write
- Write-back: the update is delayed until it becomes mandatory (i.e. a data replacement or a request by other processors)
A replacement policy must therefore also be defined (see the sketch of the two update policies below).
NB: information blocks in caches are called «lines» and in the central memory «pages».
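A minimal sketch (not from the slides) of the two update policies, assuming a single cache level in front of a toy backing store; all names are illustrative:

```c
/* Minimal sketch of write-through vs. write-back, assuming one cache
 * level in front of a toy backing store. Names are illustrative only. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t data[64];   /* one cache line */
    bool    dirty;      /* used only by the write-back policy */
} cache_line;

static uint8_t backing_store[1u << 16];          /* toy lower hierarchy level */

static void backing_write(uint32_t addr, uint8_t value) {
    backing_store[addr % sizeof backing_store] = value;
}

/* Write-through: the lower level is updated on every write. */
void write_through(cache_line *line, uint32_t addr, unsigned off, uint8_t value) {
    line->data[off] = value;
    backing_write(addr, value);                  /* immediate update */
}

/* Write-back: only the line is updated; the lower level is updated
 * when the line is eventually replaced (or requested by another CPU). */
void write_back(cache_line *line, unsigned off, uint8_t value) {
    line->data[off] = value;
    line->dirty = true;                          /* update deferred */
}

void on_replacement(cache_line *line, uint32_t base_addr) {
    if (line->dirty) {                           /* write back only if modified */
        for (unsigned i = 0; i < sizeof line->data; i++)
            backing_write(base_addr + i, line->data[i]);
        line->dirty = false;
    }
}
```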

5 Locality principle
Each program, in any phase of its execution, uses only a small portion of the memory data/instructions (its working set). Two locality types:
- Temporal locality: when a data item has been accessed, it is very likely that the same item will be accessed again in the near future (e.g. loops)
- Spatial locality: when a data item has been accessed, it is very likely that other items at nearby addresses will be accessed (e.g. vectors, matrices, linear code …)
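A tiny illustration (not from the slides) of both locality types in code:

```c
/* Illustrative only: a simple loop exhibiting both locality types. */
#include <stddef.h>

double sum(const double *v, size_t n) {
    double s = 0.0;              /* s and the loop instructions are reused at every
                                    iteration: temporal locality */
    for (size_t i = 0; i < n; i++)
        s += v[i];               /* consecutive addresses v[0], v[1], ...: spatial locality */
    return s;
}
```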

6 Memory hierarchy - general issues
It solves the following problems:
- The speed difference between processors and memories
- The need for big central memories
The key factor in the balance between cache and central memory is speed; the transferred elements (indivisible - no portion of them) are the lines (32-256 bytes or more; the size depends on the number of cache levels).
The key factor in the balance between the central memory and the disks is capacity; the transferred elements are the pages (4 KB-128 KB), that is, blocks of fixed size either of programs or of data - see later for their use.
A computer can have either, neither or both of them (caching - paging).
Exploiting the memory hierarchy and the locality principle we achieve two goals:
- A (virtual) memory space is made available to the programmer whose size is equal to the addressable central memory space (which depends on the address parallelism of the computer). The physical central memory is always smaller than the addressable space, and the central memory is much slower than the cache memory.
- The maximum access speed is granted to the processor, which in most cases accesses only the cache, which is much faster than the central memory.
This implies that faults must be handled, that is, cases when a memory level (either cache or central) DOES NOT contain the requested data and must get it from a lower-level memory, for instance cache lines (from central memory) or central memory pages (from disk). Double faults are obviously possible but unlikely if the system is well managed.

7 Terminology
There is a HIT when the requested data are present in the hierarchy level to which the request was made (i.e. the first-level cache for the processor, or the central memory for the last-level cache).
There is a MISS when the data ARE NOT present in the hierarchy level to which the request was made and must be retrieved recursively from lower levels.
Example: with block A in the cache and blocks A, B, …, N, R in the central memory, there is a HIT when the processor requests data belonging to block A and a MISS if it requests data belonging to block B. In case of a MISS, the time for accessing lines of block B (miss penalty) depends on the request time, the time for extracting block B and the transfer time between levels. This time increases with the distance of the data (number of levels) from the CPU (it varies from a few to thousands of clock cycles). The bigger the block, the bigger the transfer time, but the miss rate (the probability of a miss for a data block) decreases. There must therefore be a reasonable balance in order to keep miss rate x miss penalty to a minimum.
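A hedged numeric illustration of this balance (the numbers are invented for the example, not from the slides). The average access time seen by the processor can be written as

\[ t_{avg} = t_{hit} + m \cdot t_{penalty} \]

With a 1-cycle hit time, a miss rate m = 2% and a 100-cycle miss penalty, t_avg = 1 + 0.02 x 100 = 3 cycles. Doubling the line size might lower the miss rate to 1% but raise the penalty to 180 cycles, giving 1 + 0.01 x 180 = 2.8 cycles: it is the product m · t_penalty that must be minimized, not either factor alone.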

8 Problems
(Same figure as before: processor, cache holding block A, main memory holding blocks A, B, …, N, R.)
- Where can data (lines) of block B of the main memory be placed in the cache (block placement)? For instance, could they replace data of block A?
- How can data be found in the cache (block identification)?
- How do we choose the block (line) to be replaced in the cache when the cache is already full? Many policies (see later: cache coherency and BTB)
- What happens when we write a line (write strategy), for instance a line of block A? Normally a write-back policy is used (see later: caches)

9 Virtual Memory
The concept of logical (virtual) address space vs. physical address space is the basis of memory management.
- Logical addresses: generated by the CPU, also known as virtual or linear addresses of the data
- Physical addresses: the real, physical addresses where the requested data are stored
Memory Management Unit (MMU): a hw/sw device which maps the logical (virtual) addresses to physical addresses. Used only in medium- to high-performance processors.
The programmer deals only with the logical addresses and is always totally unaware of the physical addresses where the requested data are located.

10 Paging
The physical memory is subdivided by the hw into fixed-size blocks (a power of 2) called frames.
The logical (virtual) memory is for the programmer a sequence of consecutive addresses, which the hw interprets as subdivided into blocks of equal size (pages). Pages and frames have the same size.
The OS manages the frames (free or occupied).
To execute a program, at any time only n of its pages are needed (working set). For its execution a program therefore needs only n frames, not necessarily contiguous (normally they are never contiguous), where the working set can be stored. A mapping system is therefore needed: the page table, which contains the initial physical addresses of all frames where the program pages are stored. Memory fragmentation (except for the last frame of a program) is therefore avoided.
The CPU's virtual address is normally interpreted by the hw as made of two components:
- The m MSBits are the page number, that is the index into a table (the page table) which allows the retrieval of the initial physical address of the corresponding frame
- The n LSBits are the offset in page, that is the position of the datum within the page. Since pages (and frames) are always of the same size and a power of 2, they are aligned (the initial address of each one has its n LSBits equal to zero), so the offset only needs to be concatenated with the MSBits (see the sketch below).
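A minimal sketch of this split and concatenation, assuming 32-bit addresses, 4 KB pages and a single-level page table modelled as a plain array (illustrative names, not a real API):

```c
/* Sketch under assumed parameters: 32-bit virtual addresses, 4 KB pages
 * (12-bit offset) and a single-level page table modelled as a plain
 * array holding one frame number per virtual page. Illustrative only. */
#include <stdint.h>

#define OFFSET_BITS 12u
#define PAGE_SIZE   (1u << OFFSET_BITS)

static uint32_t page_table[1u << (32 - OFFSET_BITS)];   /* 2^20 entries */

uint32_t translate(uint32_t vaddr) {
    uint32_t page_number = vaddr >> OFFSET_BITS;      /* m MSBits: index into the page table */
    uint32_t offset      = vaddr & (PAGE_SIZE - 1);   /* n LSBits: position inside the page  */
    uint32_t frame       = page_table[page_number];   /* lookup gives the frame number       */
    return (frame << OFFSET_BITS) | offset;           /* concatenation, no addition needed   */
}
```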

11 Paging - mapping
The page table maps the page number (the k MSBits of the virtual address) to the initial address of the corresponding frame in physical memory; the offset (the n LSBits) is left unchanged. A virtual address is therefore page number + offset (k + n bits), while a physical address is frame number + offset (h + n bits), with k >> h: the page table has 2^k elements and the logical memory is much larger than the physical memory.

12 Paging - address translation
The virtual page number (MSBits of the processor-generated address) indexes the page table; the selected page descriptor contains the physical page number (the page initial address, always aligned: its LSBits are always zero!) plus status bits. The offset in page is then joined (concatenated) to the physical page number to obtain the physical address of the datum.

13 Page table implementation
The page tables (one for each task!) are stored in the central memory. A page-table base address register must point to the initial address of the page table. The size of each page table corresponds to the size of the virtual memory divided by the page size and multiplied by the number of bytes per table entry (see the formula below). The OS must manage another table indicating which physical frames are free and/or occupied. In order to avoid a double memory access for each data access, a special cache called the Translation Lookaside Buffer (TLB) must exist, which provides the physical page address without accessing the page table in main memory.
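Restating the sizing rule above as a formula (v = virtual address bits, p = offset bits, so the page size is 2^p):

\[ \text{page table size} = \frac{2^{v}}{2^{p}} \times \text{bytes per entry} = 2^{\,v-p} \times \text{bytes per entry} \]

The worked example on slide 20 applies exactly this formula.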

14 Translation Lookaside Buffer (TLB)
The TLB (within the processor) stores the translations (virtual to physical) of the last n addresses: it is a cache of page-table entries. The virtual page number of the processor-generated address is looked up in the TLB: on a hit the physical page number (plus status) is returned directly and joined with the offset to form the physical address of the datum; on a miss the page table in memory (located through its initial address) must be accessed.
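A toy, fully associative lookup loop, just to fix ideas (real TLBs are small set-associative hardware structures; the entry count and field names below are invented):

```c
/* Toy software model of a TLB lookup. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

typedef struct {
    uint32_t vpn;     /* virtual page number   */
    uint32_t frame;   /* physical frame number */
    bool     valid;
} tlb_entry;

static tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit; on a miss the page table in memory
 * must be walked and the translation inserted into the TLB. */
bool tlb_lookup(uint32_t vpn, uint32_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *frame = tlb[i].frame;     /* hit: no page-table access needed */
            return true;
        }
    }
    return false;                      /* miss: fall back to the page table */
}
```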

15 Paging (x86) - status bits
Each page descriptor carries status bits:
- Access protection: each page can be defined as read only, read/write, user, system, etc.
- Dirty bit: indicates whether the page content has been modified. When modified, the page must be written back to the bulk memory when replaced.
- Reference bit: indicates whether the page was accessed (used by the replacement algorithm).
- Present/Missing: a virtual page may or may not be in the physical memory. In the latter case the page descriptor stores the location of the page in the bulk memory.
- Valid/Invalid: indicates whether the entry corresponds to a page actually belonging to the task's virtual memory.
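A conceptual C view of such a descriptor; field names and bit positions are illustrative and NOT the exact x86 PTE layout:

```c
/* Conceptual page-descriptor layout, for illustration only. */
#include <stdint.h>

typedef struct {
    uint32_t present   : 1;   /* page is in physical memory             */
    uint32_t writable  : 1;   /* access protection: read/write if set   */
    uint32_t user      : 1;   /* user vs. system page                   */
    uint32_t referenced: 1;   /* set on access, used by replacement     */
    uint32_t dirty     : 1;   /* set on write: must be written back     */
    uint32_t reserved  : 7;
    uint32_t frame     : 20;  /* physical frame number (4 KB aligned)   */
} page_descriptor;
```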

16 Paging - page size
Big pages (bigger than 32 KB):
- Reduced overall access time (latency)
- Reduced transfer time (reduced page-miss frequency)
- Smaller page table size
- Bigger internal fragmentation
Small pages (typically 4-8 KB):
- Increased access time (increased seek time)
- Increased transfer time (increased page-miss frequency)
- Bigger page table size
- Risk of thrashing
- Smaller internal fragmentation
Normally the page size lies between 4 KB and 256 KB.

17 Page fault
Pages are loaded «on demand», that is when one of their data items is requested and the page is not already in memory: an OS trap is generated in this case.
The OS checks whether an invalid access took place (the task is aborted) or the page is simply not yet in memory (page fault).
In the latter case the OS checks whether a free frame is available and stores the requested page there. When no free frame is available, an occupied frame is freed; if modified (dirty bit), its page is first written back to the bulk memory. The page table is then updated.
Finally, the OS restarts the interrupted instruction of the interrupted task (restartable instruction). A sketch of this sequence follows.
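A high-level sketch of the fault-handling sequence just described; all called functions are hypothetical placeholders for OS internals, not a real API:

```c
/* Sketch of the page-fault handling steps. Placeholders only. */
#include <stdbool.h>
#include <stdint.h>

struct task;
bool is_valid_access(struct task *t, uint32_t vaddr);
void abort_task(struct task *t);
int  find_free_frame(void);
int  choose_victim_frame(void);
bool frame_is_dirty(int frame);
void write_frame_to_disk(int frame);
void invalidate_mapping(int frame);
void load_page_from_disk(struct task *t, uint32_t vaddr, int frame);
void update_page_table(struct task *t, uint32_t vaddr, int frame);
void restart_instruction(struct task *t);

void handle_page_fault(struct task *t, uint32_t vaddr) {
    if (!is_valid_access(t, vaddr)) {
        abort_task(t);                        /* invalid access: task aborted        */
        return;
    }
    int frame = find_free_frame();
    if (frame < 0) {                          /* no free frame: pick a victim        */
        frame = choose_victim_frame();
        if (frame_is_dirty(frame))
            write_frame_to_disk(frame);       /* write back only if modified         */
        invalidate_mapping(frame);            /* victim page marked as not present   */
    }
    load_page_from_disk(t, vaddr, frame);     /* bring the requested page in         */
    update_page_table(t, vaddr, frame);       /* present bit set, frame recorded     */
    restart_instruction(t);                   /* re-execute the faulting instruction */
}
```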

18 What if the page is not in memory?
The processor issues a virtual address (page + offset) to the translation mechanism, which must «always» be available in memory (at least the portion needed). If the page is already in main memory there is a hit and the datum is returned; on a miss (fault) the OS fault handler brings the page in from the bulk memory (disk).

19 Page table organisation
N.B. a different page table exists for each task (in the figure, TP-P1 for process 1 and TP-P2 for process 2, managed by the OS).
Each entry of the page table carries a Memory/Disk indicator and either the physical frame number where the page resides (e.g. physical pages 27, 44 and 16 in the figure) or the disk address where the page is stored in the file system (e.g. disk sector 8714).

20 Page table size
Consider a virtual address space with 36-bit address parallelism and frames/pages of 16 KBytes (16 KBytes corresponds to a 14-bit offset; the frame number therefore consists of 22 bits). The page table contains 2^22 descriptors (2^2 x 2^10 x 2^10 = 4 x 1024 x 1024 = 4M), each of 22 bits (pages are aligned, which means that their initial addresses have the 14 LSBits equal to zero; these bits therefore need not be stored in the descriptor). With 10 status bits, 4 bytes per descriptor are needed. The page table of each process is therefore 4M x 4 bytes = 16 MBytes! The total memory space for the page tables is therefore 16 MBytes x number of active processes (very often hundreds): memory occupancy unacceptable. Solution: multiple-level page tables.
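The same computation, spelled out as a tiny (illustrative) program:

```c
/* Restates the sizing example above: 36-bit virtual addresses,
 * 16 KB pages, 4-byte descriptors. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t vaddr_bits  = 36;
    uint64_t offset_bits = 14;                                   /* 16 KB pages           */
    uint64_t entries     = 1ull << (vaddr_bits - offset_bits);   /* 2^22 = 4M descriptors */
    uint64_t entry_bytes = 4;                                    /* 22-bit frame + 10 status bits */
    printf("%llu entries, %llu MBytes per page table\n",
           (unsigned long long)entries,
           (unsigned long long)((entries * entry_bytes) >> 20)); /* prints 4194304, 16 */
    return 0;
}
```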

21 Hierarchical organisation (case of 4 KB pages and 32-bit addresses)
The 32-bit virtual address is split into: level I index, 10 bits; level II index, 10 bits; offset in page, 12 bits.
- Level I table: 4 KB; it points to 1024 level II tables (4 bytes per entry: address + status). It is loaded when the task is started and is always present in memory. A system register (one per process) holds the initial physical address of the level I table.
- Level II tables: 4 KB each; each points to 1024 data/code pages of the task (4 bytes per entry: address + status). They are loaded only when necessary.
Each entry is 32 bits: since the addresses are aligned (12 LSBits = 0), 20 bits of physical address plus 12 status bits fit in 4 bytes. Each table (level I or II) is 1024 x 4 = 4 KB, that is, exactly the size of a page!
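A sketch of the resulting two-level walk. The PRESENT bit position is an assumption, and the level II tables are modelled as a preallocated array for self-containment (in hardware, the level I entry locates the level II table):

```c
/* Sketch of the two-level walk: 32-bit address, 10 + 10 + 12 split,
 * 4 KB pages. Entries are 32 bits: 20-bit aligned address + 12 status bits. */
#include <stdbool.h>
#include <stdint.h>

#define PRESENT 0x1u                         /* assumed status-bit position */

static uint32_t level1[1024];                /* pointed to by the per-process system register */
static uint32_t level2[1024][1024];          /* toy model: all level-II tables preallocated   */

bool translate2(uint32_t vaddr, uint32_t *paddr) {
    uint32_t i1     = (vaddr >> 22) & 0x3FF;   /* bits 31..22: level-I slot   */
    uint32_t i2     = (vaddr >> 12) & 0x3FF;   /* bits 21..12: level-II slot  */
    uint32_t offset =  vaddr        & 0xFFF;   /* bits 11..0 : offset in page */

    uint32_t e1 = level1[i1];
    if (!(e1 & PRESENT)) return false;         /* level-II table not in memory         */

    uint32_t e2 = level2[i1][i2];              /* in hardware, e1 locates this table   */
    if (!(e2 & PRESENT)) return false;         /* data page not in memory: page fault  */

    *paddr = (e2 & ~0xFFFu) | offset;          /* page initial address joined w/offset */
    return true;
}
```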

22 Hierarchical organisation (4 KB pages)
Level I table: indexed by the first 10 bits of the virtual page number (1024 entries); each 4-byte element stores the physical initial address + status of a level II table.
Level II table: indexed by the next 10 bits (1024 elements); each 4-byte element (with 32-bit parallelism) stores the physical initial address + status of a 4 KB physical memory page (contiguous addresses). Each level II table therefore covers 1024 x 4 KB = 4 MB.
The 4 GB virtual memory is conceptually divided into 1024 blocks of 4 MB each (4 GB / 1024); the 12-bit offset finally selects the datum (in this example a 16-bit word) inside the 4 KB physical page.

23 Hierarchical organisation (4 KB pages) - example
Address 00803019H = 0000000010 0000000011 000000011001 binary.
Level I table: slot 2; level II table: slot 3; offset: 25d (19H) within the 4 KB physical page.
The total virtual space is 4 GB, addressed here at byte granularity (the size of the addressed data actually depends on the operation code).
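The same decode, checked with the split used in the previous sketch:

```c
/* Decodes the example address 00803019H with the 10 + 10 + 12 split. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t vaddr = 0x00803019u;
    printf("level-I slot %u, level-II slot %u, offset %u\n",
           (vaddr >> 22) & 0x3FF,    /* -> 2  */
           (vaddr >> 12) & 0x3FF,    /* -> 3  */
            vaddr        & 0xFFF);   /* -> 25 */
    return 0;
}
```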

24 Hierarchical organisation
Each level II table is itself a page which does not contain data BUT the physical addresses of the pages holding the requested data. Upon a context switch only the level I table (4 KB) must be present in memory, while the level II tables are recalled only when needed, using a Least Recently Used mechanism similar to that of the data pages.
In modern processors, where the address parallelism exceeds 38 bits, 3-level hierarchical page systems are implemented.
As already pointed out, each data access would otherwise require multiple memory accesses: unacceptable. The Translation Lookaside Buffer (TLB) mechanism is therefore used, which stores the last translations between logical and physical addresses, drastically reducing the address-translation delay (the memory data access itself is in turn reduced with code/data caches - see later).
N.B. page table changes (for instance the initial address of a data page) are NOT automatically reflected in the TLB, which must be cleared upon a context switch. The OS is responsible for this consistency.

