CS 6560 Operating System Design Lecture 8: Memory Management

References
–Our textbook: Robert Love, Linux Kernel Development, 2nd edition, Novell Press
–Understanding the Linux Kernel, 3rd edition, O'Reilly (covers 2.6)
–The kernel code and its own documentation
–Also see Ulrich Drepper's articles in LWN.net:
Part I:
Part II:
Part III:

Plan
–Today: Preliminaries
–Next time: Chap 11: Memory Management

Models of Memory
–Physical: RAM array accessed through a memory controller
–Real: Linear array of bytes for the entire machine
–Virtual: Linear array of bytes for each process

Physical Model (diagram): Memory Array – Memory Controller – Bus

Physical Model of Memory
(See Drepper's article for details)
–Bits in SRAM and DRAM:
SRAM = static RAM: faster, but several transistors per bit
DRAM = dynamic RAM: slower, each bit requires only one transistor, but needs refreshing
–Many types of DRAM with differing speeds of memory array and bus

Physical Memory Issues
–Accessing main memory may be slow:
Large addressing requires multiplexing, which slows access.
DRAM refresh requires time.
DRAM speed is limited by power consumption considerations.
The speed of light limits access time: c = 299,792,458 meters/second ≈ 0.3 meters/nanosecond. That's about 10 cm for one cycle of a 3 GHz clock.
–Other considerations:
A serial bus can run much faster than a parallel bus.
Parallel busses cost more and use more space.
The memory itself may run slower than the bus (e.g., DDR2).
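As a quick check of the 10 cm figure, a small back-of-the-envelope calculation in C (assuming a 3 GHz clock, as in the example above):

/* light_per_cycle.c: distance light travels in one clock cycle */
#include <stdio.h>

int main(void)
{
    const double c = 299792458.0;     /* speed of light, meters/second */
    const double clock_hz = 3.0e9;    /* assumed 3 GHz clock */
    double cycle_s = 1.0 / clock_hz;  /* one cycle is about 0.333 ns */
    double meters = c * cycle_s;      /* distance covered in one cycle */
    printf("%.3f m per cycle (~%.0f cm)\n", meters, meters * 100.0);
    return 0;
}

This prints roughly 0.100 m per cycle, i.e. about 10 cm, matching the slide.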

CPU/Memory Various arrangements: –CPUs, memory, and peripherals share the same bus (classical IBM PC). –CPUs and memory share the same bus (FSB), but peripherals share another bus (PCI bus). –Each CPU has its own memory (AMD Opteron and Intel CSI). Then CPUs share a bus or form a network that connects to the bus.

DMA DMA = Direct Memory Access Here, the CPU sets up a data transfer between memory and a device, but does not participate in the actual transfer. A DMA controller does the transfer on behalf of the processor.

Processor Hierarchy
Processors, Cores, and Threads:
–Cores: run programs simultaneously with their own instruction processing units, but share chip infrastructure.
–Threads (hyperthreading): run programs in an interleaved fashion with separate register sets.
Modern CPUs are designed for multiplicity at all levels (say 2 threads/core, 4 cores/processor, 2 processors/motherboard).

Challenges CPUs are getting significantly faster than main memory. CPUs are evolving to multiple parallel processing units. Main memory is being distributed among processing units (NUMA)

Memory Caching Caching schemes –Cache memory sits in between CPU and main memory. –Different levels and arrangements of cache –Main memory tends to consist of DRAM and caches tend to consist of SRAM.

Various Levels of Memory
–Registers: typical size 512 bytes = 2^9, typical access time 1 cycle
–L1: typical size 64KB = 2^16, typical access time 3 cycles
–L2: typical size 4MB = 2^22, typical access time 14 cycles
–Main memory: typical size 2GB = 2^31, typical access time 240 cycles
Note: sizes are for Intel IA-32 Core technology (see sandpile.org)

How Do Caches Work?
The cache is divided into what are called cache lines. Each cache line holds several bytes of data (64 bytes of data per line for Intel Core) plus addressing information (say 32 bits). (The cache size is computed just from the data size.)
The cache lines are divided into sets (say 8192 sets with 8 lines per set, giving 2^13 * 2^3 * 2^6 = 2^22 = 4MB).
Cache addressing information is split into fields (say T, S, and O):
–The rightmost field O is used for addressing within a cache line (say 6 bits for 64-byte cache lines).
–The middle field S is used to determine which set (say 13 bits).
–The leftmost field T (say 13 bits) is used for matching the address (the tag).
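To make the T/S/O split concrete, here is a minimal C sketch using the example geometry above (64-byte lines, 8192 sets); the constants are the slide's example values, not fixed architectural rules:

#include <stdint.h>

#define LINE_BITS 6   /* 64-byte cache lines -> O field */
#define SET_BITS  13  /* 8192 sets           -> S field */

/* Offset within the cache line (low 6 bits). */
static inline uint32_t cache_offset(uint32_t addr)
{
    return addr & ((1u << LINE_BITS) - 1);
}

/* Set index (next 13 bits). */
static inline uint32_t cache_set(uint32_t addr)
{
    return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
}

/* Tag (remaining high bits), compared against the lines in the selected set. */
static inline uint32_t cache_tag(uint32_t addr)
{
    return addr >> (LINE_BITS + SET_BITS);
}

For a 32-bit address this leaves 32 - 6 - 13 = 13 tag bits, matching the slide's example.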

Associativity The address matching is done with a comparator. The number of items that can be compared is called the associativity. Modern processors typically achieve 2-way, 8-way or 16-way associativity.

Cache Hits and Misses
Caching works in levels. With three levels:
–If the data cannot be matched in the L1 cache, the L2 cache is checked.
–If the data cannot be found in L2, L3 is checked.
–If not in L3, get it from main memory.

Cache Sharing and Snooping Often cores will have their own L1 caches, one for data and one for instructions, and a shared L2 cache that contains both data and instructions. When two or more processing units share memory or the next level cache, they may snoop on each other's individual cache and exchange data if needed. Popular cache snooping protocols include MESI and MSI. In these, the address bus is visible from each processing unit to the other, and a finite state machine determines the fate of a cache line according to whether it was locally or remotely read or written.

MESI MESI has four states for each cache line –M = Modified = modified by local processing unit and only copy –E = Exclusive = only copy, but not modified –S = Shared = not modified, and might be shared –I = Invalid = unused or invalid

Transition Table
Each local or remote read or write changes the state according to something like:

               M     E     S     I
Local read     M     E     S     S
Local write    M     M     M+    M
Remote read    S*    S     S     I
Remote write   I*+   I     I     I

*Also transmits data to the other processor and to memory
+Also notifies other processors with an RFO (Request For Ownership)
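The table can be read as a small finite state machine. Below is a minimal C sketch of just the state changes (names are illustrative; bus side effects such as the write-back marked * and the RFO marked + are omitted):

/* mesi.c: next-state function for the transition table above. */
enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };
enum mesi_event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE };

static enum mesi_state next_state(enum mesi_state s, enum mesi_event e)
{
    switch (e) {
    case LOCAL_READ:   return (s == INVALID) ? SHARED : s;       /* M E S S   */
    case LOCAL_WRITE:  return MODIFIED;                          /* M M M+ M  */
    case REMOTE_READ:  return (s == INVALID) ? INVALID : SHARED; /* S* S S I  */
    case REMOTE_WRITE: return INVALID;                           /* I*+ I I I */
    }
    return s; /* unreachable */
}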

80x86 Processor Memory
Three kinds of addresses:
–Logical address: segment, offset pair
–Linear address: virtual memory address
–Physical address: hardware address, a 32-bit or 36-bit number

Memory Addressing Modes
–Real: logical-to-physical mapping uses the formula 16*seg + offset
–Protected: uses the MMU to map from logical to physical by way of linear addresses
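A minimal C sketch of the real-mode calculation, assuming 16-bit segment and offset values (the function name is illustrative):

#include <stdint.h>

/* Real mode: physical address = 16 * segment + offset (a 20-bit result). */
static inline uint32_t real_mode_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;  /* << 4 is multiplication by 16 */
}

/* Example: seg 0xB800, off 0x0010 -> 0xB8010 (in the text-mode video area). */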

MMU MMU (Protected Mode) maps in two stages –Segmentation Unit: uses Segment Descriptors to map logical to linear addresses –Paging Unit: Uses Page Table to map linear addresses to physical addresses

Segment Descriptor Tables
Two types:
–GDT = Global Descriptor Table
–LDT = Local Descriptor Table
Linux mainly uses the GDT (at least in kernel memory). The LDT may be used to help emulate other OSs in user mode.
Types of descriptors (8 bytes each):
–Code
–Data
–Task State
–Local Descriptor Table descriptor

Fields of Segment Descriptors
Fields (distributed within the descriptor):
–Base (32 bits) = linear address of the first byte of the segment
–G (1 bit) = granularity of the segment size (bytes or 4K units)
–Limit (20 bits) = size of the segment (interpreted according to G)
–S (1 bit) = system flag (0 = system segment, 1 = ordinary code or data segment)
–Type (4 bits)
–DPL (2 bits) = descriptor privilege level
–P (1 bit) = segment present (Linux always sets this to 1)
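To make the "distributed within the descriptor" point concrete, here is a minimal C sketch (illustrative, not kernel code) of the 8-byte descriptor as two 32-bit words, with helpers that reassemble the scattered Base and Limit fields; the bit positions follow the standard IA-32 layout:

#include <stdint.h>

/* Sketch of an 8-byte IA-32 segment descriptor, stored as two 32-bit words. */
struct segment_descriptor {
    uint32_t low;   /* limit bits 0-15, base bits 0-15 */
    uint32_t high;  /* base 16-23, Type, S, DPL, P, limit 16-19, flags, base 24-31 */
};

static inline uint32_t desc_base(const struct segment_descriptor *d)
{
    return (d->low >> 16) | ((d->high & 0xFF) << 16) | (d->high & 0xFF000000);
}

static inline uint32_t desc_limit(const struct segment_descriptor *d)
{
    return (d->low & 0xFFFF) | (d->high & 0x000F0000);
}

static inline unsigned desc_dpl(const struct segment_descriptor *d)
{
    return (d->high >> 13) & 0x3;  /* DPL occupies bits 13-14 of the high word */
}

static inline unsigned desc_present(const struct segment_descriptor *d)
{
    return (d->high >> 15) & 0x1;  /* P flag is bit 15 of the high word */
}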

Segment Selectors
Segment info is stored in a 16-bit segment selector, which contains:
–Index: index into the GDT or LDT
–TI: table indicator, 0 = GDT, 1 = LDT
–RPL: privilege level
Six segment registers in the CPU: cs, ss, ds, es, fs, gs
–The first three are special purpose: code, stack, data
–cs also contains the current privilege level (0 = kernel, 3 = user)
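The 16-bit selector packs these fields as the privilege level in bits 0-1, TI in bit 2, and the index in bits 3-15. A minimal C sketch (helper names are illustrative):

#include <stdint.h>

static inline uint16_t selector_rpl(uint16_t sel)   { return sel & 0x3; }         /* bits 0-1 */
static inline uint16_t selector_ti(uint16_t sel)    { return (sel >> 2) & 0x1; }  /* bit 2: 0=GDT, 1=LDT */
static inline uint16_t selector_index(uint16_t sel) { return sel >> 3; }          /* bits 3-15 */

/* Example: in 2.6-era 32-bit kernels the kernel code selector 0x60
 * decodes to index 12, TI = 0, RPL = 0. */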

Segmentation in Linux
Used very little, and only where required by the processor.
All user processes use the same code and data segment descriptors.
The kernel has its own separate code and data segment descriptors.

Linux GDT
Each CPU has its own GDT. These are stored in the cpu_gdt_descr array.
Each GDT has 18 segment descriptors:
–Four user/kernel code/data segments
–A Task State Segment (TSS) to help switch between user and kernel mode and when user processes access I/O ports
–A segment containing the LDT
–Three Thread-Local Storage segments
–Three Power Management segments
–Five Plug and Play segments
–One special TSS for exception handling

Virtual Memory
Virtual memory in Linux divides memory into pages, which are either 4K or 4M. (The i386 uses segmentation as well.) (On some 64-bit architectures, pages may be 8K.)
The Memory Management Unit (MMU) maps virtual (linear) addresses to physical addresses.
The kernel fills in the page mapping table (rooted at the page directory), but the MMU hardware does the translation work.
The page table contains other information about each page, including permissions.
In a 32-bit virtual address space, the kernel sees both the process's user address space and the kernel address space: the kernel occupies the upper 1G and user space the lower 3G.
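A minimal C sketch of the 3G/1G split; the 0xC0000000 split point is the common default value of PAGE_OFFSET on 32-bit x86, though it is configurable:

#include <stdint.h>

#define PAGE_OFFSET 0xC0000000UL  /* default user/kernel split on 32-bit x86 */

/* In the common 3G/1G layout, linear addresses at or above PAGE_OFFSET
 * belong to the kernel; everything below is the process's user space. */
static inline int is_kernel_address(uint32_t linear)
{
    return linear >= PAGE_OFFSET;
}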

Page Faults
If a page is not in memory, a page fault exception is generated. The code for handling the fault (for i386) is in ./arch/i386/mm.
Page faults should not happen in the kernel. How this is arranged is explained in Chap 11 (coming up soon).

Two Types of 32-bit Paging
–Regular: 4KB pages
–Extended (Pentium): 4MB pages

Linear Address Fields
For 32-bit regular paging (4KB pages), the linear address consists of three bit fields:
–Directory (bits 22-31) (10 bits)
–Table (bits 12-21) (10 bits)
–Offset (bits 0-11) (12 bits)
For 32-bit extended paging (4MB pages):
–Directory (bits 22-31) (10 bits)
–Offset (bits 0-21) (22 bits)
For 64-bit, there are more fields (later).
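A small C sketch extracting the three fields of a regular (4KB) paging linear address; the macro names are illustrative, not the kernel's own:

#include <stdint.h>

/* Regular 32-bit paging: 10-bit directory, 10-bit table, 12-bit offset. */
#define PGDIR_INDEX(addr)    (((addr) >> 22) & 0x3FF)  /* bits 22-31 */
#define PTABLE_INDEX(addr)   (((addr) >> 12) & 0x3FF)  /* bits 12-21 */
#define PAGE_OFFSET_OF(addr) ((addr) & 0xFFF)          /* bits 0-11  */

/* Example: linear address 0x08049F24
 *   directory index = 0x20, table index = 0x49, offset = 0xF24 */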