© 2006 Pearson Education, Upper Saddle River, NJ. All Rights Reserved. Brey: The Intel Microprocessors, 7e

Chapter 2: The Microprocessor and its Architecture
The Intel 8086, 80X86, and Pentium Family

Contents
- Internal architecture of the microprocessor:
  - The programmer's model, i.e. the register model
  - The processor model (organization)
- Real mode memory addressing using memory segmentation
- Protected mode memory addressing using memory segmentation
- Memory paging mechanism

Objectives for this Chapter
- Describe the function and purpose of the program-visible registers
- Describe the Flags register and the purpose of each flag bit
- Describe how memory is accessed using segmentation in both the real mode and the protected mode
- Describe the program-invisible registers
- Describe the structures and operation of the memory paging mechanism
- Describe the processor model

The Intel Family

Programming Model
[Figure: the programming model, showing the general-purpose registers, the special-purpose registers, and the segment registers; the 32-bit extended registers exist on the 80386 and above.]

General-Purpose Registers
- The top portion of the programming model contains the general-purpose registers: EAX, EBX, ECX, EDX, EBP, ESI, and EDI
- Although general in nature, each has a special purpose and name
- They can carry both data and address offsets
- EAX: accumulator; also used as AX (16-bit), AH (8-bit), and AL (8-bit)
- EBX: base index, often used to address memory (BX, BH, and BL)

General-Purpose Registers (continued)
- ECX: count, for shifts, rotates, and loops (CX, CH, and CL)
- EDX: data, used with multiply and divide (DX, DH, and DL)
- EBP: base pointer, used to address stack data (BP)
- ESI: source index (SI) for memory locations, e.g. with string instructions
- EDI: destination index (DI) for memory locations

Special-Purpose Registers
- ESP, EIP, and EFLAGS; each has a specific task
- ESP: stack pointer; addresses the stack segment used with functions (procedures) (SP)
- EIP: instruction pointer; addresses the next instruction of the program in the code segment (IP)
- EFLAGS: indicates the latest conditions of the microprocessor (FLAGS)

EFLAGS
[Figure: bit layout of the EFLAGS register (80386DX).]

The Flags: Basic Flag Bits (8086 etc.)
- C: carry/borrow of the result
- P: parity flag (little used today)
- A: auxiliary carry flag, used with BCD arithmetic, e.g. by DAA and DAS (decimal adjust after add/subtract)
- Z: zero
- S: sign
- O: overflow
- D: direction; determines the auto-increment/decrement direction for string instructions
- I: interrupt; enables (using STI) or disables (using CLI) the processing of hardware interrupts arriving at the INTR input pin
- T: trap; turns trapping on or off for program debugging

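As an illustration of how the status flags follow from a result, here is a minimal Python sketch that models an 8-bit addition. The function name `add8_flags` and the dictionary layout are assumptions made for this example; it models only the six status flags (C, P, A, Z, S, O), not the control flags D, I, and T.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model an 8-bit ADD and return the result with the six status flags."""
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "C": int(total > 0xFF),                        # carry out of bit 7
        "P": int(bin(result).count("1") % 2 == 0),     # 1 when the result has even parity
        "A": int((a & 0x0F) + (b & 0x0F) > 0x0F),      # carry from bit 3 into bit 4
        "Z": int(result == 0),                         # result is zero
        "S": result >> 7,                              # copy of the sign bit (bit 7)
        # Signed overflow: both operands differ in sign from the result.
        "O": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))  # largest positive signed byte plus one
    print(add8_flags(0xFF, 0x01))  # wraps around to zero
```

Adding 7FH and 01H sets S and O (the signed result overflowed), while adding FFH and 01H sets C and Z.
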
Newer Flag Bits
- IOPL: 2-bit I/O privilege level in protected mode
- NT: nested task
- RF: resume flag (used with debugging)
- VM: virtual mode; multiple DOS programs, each with a 1 MB memory partition
- AC: alignment check; detects addressing on the wrong boundary for words/doublewords
- VIF: virtual interrupt flag
- VIP: virtual interrupt pending
- ID: CPUID instruction available; the instruction gives information on the CPU version and manufacturer

Segment Registers
- The segment registers are: CS (code), DS (data), ES (extra data, used with string instructions), SS (stack), FS, and GS
- Segment registers define a section of memory (a segment) for a program
- A segment is either 64K (2^16) bytes of fixed length (in the real mode) or up to 4G (2^32) bytes of variable length (in the protected mode)
- All code (programs) resides in the code segment

Real Mode Memory Addressing
- The only mode available on the 8086 and 8088
- Real mode memory is the first 1M (2^20) bytes of the memory system (real, conventional, or DOS memory)
- Every real mode 20-bit address is a combination of a segment address (in a segment register) plus an offset address (in another register)
- The 16-bit segment register value has 0H appended (i.e. it is multiplied by 10H) to form the 20-bit start address of the segment
- Effective memory address = this 20-bit segment start address + a 16-bit offset address held in a register
- Segment length is fixed at 2^16 = 64K bytes

[Figure: real mode address generation. The 16-bit segment register value with 0H appended (one hex digit, not a byte) gives a 20-bit (5-hex-digit) segment start address; adding the 16-bit offset gives the effective address (EA) of a byte inside a 64 KB segment of the 1 MB physical memory.]

Effective Address Calculations
EA = segment register (SR) x 10H + offset
(a) SR = 1000H: segment starts at 10000H, so EA = 10000H + offset
(b) SR = AAF0H: AAF00H + 0134H = AB034H
(c) SR = 1200H: 12000H + FFF0H = 21FF0H

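The calculation above can be checked with a short Python sketch. The function names `real_mode_address` and `segment_range` are hypothetical, and masking the sum to 20 bits (wrap-around at 1 MB, as on the 8086) is an assumption the slides do not spell out.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit address for a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Appending 0H to the segment value is a shift left by 4 bits (x 10H).
    return ((segment << 4) + offset) & 0xFFFFF  # assumed wrap at 1 MB


def segment_range(segment: int) -> tuple:
    """Return the first and last address of the fixed 64 KB segment (no wrap modelled)."""
    start = segment << 4
    return start, start + 0xFFFF


if __name__ == "__main__":
    print(f"{real_mode_address(0xAAF0, 0x0134):05X}")  # AB034
    print(f"{real_mode_address(0x1200, 0xFFF0):05X}")  # 21FF0
    start, end = segment_range(0x090F)
    print(f"{start:05X} {end:05X}")                    # 090F0 190EF
```
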
Overlapping segments
- Top of CS: 090F0H + FFFFH = 190EFH

Defaults
- Default segment numbers are held in:
  - CS for the program (code)
  - SS for the stack
  - DS for data
  - ES for the string destination
- Default offset addresses that go with them:

Segment   Offset (16-bit; 8086, 8088, 80286)    Offset (32-bit; 80386 and above)                        Purpose
CS        IP                                    EIP                                                     Program
SS        SP, BP                                ESP, EBP                                                Stack
DS        BX, DI, SI, 8-bit or 16-bit number    EAX, EBX, ECX, EDX, ESI, EDI, 8-bit or 32-bit number    Data
ES        DI                                    EDI                                                     String destination

Segmentation: Pros and Cons
- Advantages:
  - Allows easy and efficient relocation of code and data
  - A program can be located anywhere in memory without any change to the program
  - The programmer need not worry about the actual memory structure of the computer that executes the program
  - To relocate code or data, only the segment number needs to be changed
- Disadvantages:
  - Hardware: complex hardware for address generation
  - Software: programs are limited by the segment size (only 64 KB with the 8086)

Limitations of the above real mode segmentation scheme
- Segment size is fixed at 64 KB
- A segment cannot begin at an arbitrary memory address: with 20-bit memory addressing, it can begin only at addresses ending in 0H, i.e. at 16-byte intervals
- Difficult to use with 24- or 32-bit memory addressing while the segment registers remain 16 bits wide: the 80286 and above use 24-, 32-, or 36-bit addresses but still have 16-bit segment registers
- Solution: use memory segmentation in the protected mode

Protected Mode: 80286 and above
- The domain of the Windows operating system
- 32-bit addressing: 4G of memory, with 2G for the system and 2G for the application
- Protected mode still uses segment and offset addresses, but:
  - The segment is defined through a more complex selector/descriptor mechanism (greater flexibility)
  - The offset address is 16 bits (286) or 32 bits (386 and above, e.g. the EIP register)
- Descriptors are placed in descriptor tables in main memory
- Protection is provided by restricting access to memory segments through privilege levels and access rights

Descriptors specify memory segments
- The segment number (still held in a 16-bit segment register, e.g. DS) defines the segment through the selector/descriptor mechanism (not directly as in real mode, but with more flexibility)
- The 16 bits = 13-bit descriptor selector + 1-bit descriptor table selector + 2-bit requested privilege level
- How many segments can be defined? 2^13 = 8192 descriptors in each table

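The three fields of the 16-bit selector can be pulled apart with shifts and masks. The following Python sketch assumes the usual x86 bit positions (requested privilege level in bits 1-0, table selector in bit 2 with 0 = global and 1 = local, descriptor index in bits 15-3); `decode_selector` is a name invented for this example.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected mode selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must be a 16-bit value")
    return {
        "index": selector >> 3,                      # 13-bit descriptor number (0..8191)
        "table": "LDT" if selector & 0x4 else "GDT",  # 1-bit table selector
        "rpl": selector & 0x3,                       # 2-bit requested privilege level
    }


if __name__ == "__main__":
    print(decode_selector(0x0008))  # descriptor 1 in the global table, RPL 0
    print(decode_selector(0x000F))  # descriptor 1 in the local table, RPL 3
    print(2 ** 13)                  # 8192 descriptors per table
```
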
8-byte Segment Descriptors
- The segment descriptor contains:
  - Base address: the start address of the segment (same width as the address bus)
  - Limit: the offset of the end address of the segment, i.e. the largest offset
  - Privilege level (2 bits) and access rights to this segment
- So a segment can start at any location and have a specified length
- 80286 descriptor: 3-byte (24-bit) base; 2-byte (16-bit) limit, giving segments of 1 byte to 64 KB; note the provision for upward compatibility
- 80386 and above descriptor: 4-byte (32-bit) base; 2 1/2-byte (20-bit) limit, giving segments of 1 byte to 1 MB, or 4 KB to 4 GB with the G bit = 1 (4K multiplier); it also holds the segment availability bit and the instruction mode (16/32-bit) bit

[Figure: worked example in which a 16-bit segment register selects 8-byte segment descriptor #1 in main memory. The descriptor holds a 24-bit base address, a limit (segment size = 1 + limit = 100H bytes), and the access rights byte; its most significant bytes are always 0 for upward compatibility. Questions posed on the slide: What is the RPL value? What is the selector value? Are we using the global or the local descriptor table?]

Protected Mode: 80386 and above (Pentium class)
- The base is a 32-bit address at which the memory segment starts
- The limit is a 20-bit number; when added to the base, it addresses the last location in the segment
- The limit has a modifier bit called granularity (G):
  - If G = 0: the limit is used unchanged
  - If G = 1: FFFH is appended to the limit, i.e. the segment size is multiplied by 4K
- With the limit giving 1 MB segments and G = 1 (i.e. the 4K multiplier): segment size = 4K x 1 MB = 4 GB
- With 16K segments like this, the system can address 16K x 4 GB = 64 TB (not necessarily all in physical memory)

Example: base = 23000000H and a limit of 012FFH
- G = 0:
  - Segment start = 23000000H
  - Segment end = 23000000H + 012FFH = 230012FFH
  - Segment size = 12FFH + 1H = 1300H (= 19 x 256 bytes)
- G = 1 (FFFH is appended to the limit in the descriptor, giving 012FFFFFH):
  - Segment start = 23000000H
  - Segment end = 23000000H + 012FFFFFH = 242FFFFFH
  - Segment size = 12FFFFFH + 1H = 1300000H = 2^12 x 1300H = 4K x 1300H

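The effect of the granularity bit can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `segment_bounds` is a hypothetical helper, and the base 23000000H is the value implied by the segment end addresses in the example (230012FFH and 242FFFFFH).

```python
def segment_bounds(base: int, limit: int, g: int) -> tuple:
    """Return (start, end, size) of a segment from its descriptor fields."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must be a 20-bit value")
    # G = 1 appends FFFH to the limit, scaling the segment in 4 KB units.
    effective_limit = ((limit << 12) | 0xFFF) if g else limit
    return base, base + effective_limit, effective_limit + 1


if __name__ == "__main__":
    for g in (0, 1):
        start, end, size = segment_bounds(0x23000000, 0x012FF, g)
        print(f"G={g}: start={start:08X} end={end:08X} size={size:X}H")
    # Largest possible segment: 20-bit limit of all ones with G = 1.
    print(segment_bounds(0, 0xFFFFF, 1)[2] == 4 * 1024 ** 3)
```
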
Access Rights
[Figure: the access rights byte in the descriptor.]
- The descriptor privilege level (DPL) in the access rights byte is compared with the requested privilege level (RPL) in the segment register that specifies this segment
- Access to the segment is allowed if the RPL has higher or equal privilege than the DPL (level 0 is the most privileged, so numerically RPL <= DPL)

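The comparison described on this slide reduces to a single numeric test. The sketch below models only the RPL-versus-DPL rule stated here; `access_allowed` is a hypothetical name, and real processors also take the current privilege level of the running code into account, which this example deliberately leaves out.

```python
def access_allowed(rpl: int, dpl: int) -> bool:
    """Apply the slide's rule: allow access when RPL is at least as privileged as DPL."""
    for level in (rpl, dpl):
        if not 0 <= level <= 3:
            raise ValueError("privilege levels are 2-bit values (0..3)")
    # Level 0 is the most privileged, so "higher or equal privilege" means numerically <=.
    return rpl <= dpl


if __name__ == "__main__":
    print(access_allowed(0, 3))  # True: most privileged request, least protected segment
    print(access_allowed(3, 0))  # False: least privileged request, most protected segment
```
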
Program Invisible Registers (caches)
Types of descriptor tables in memory:
- One Global Descriptor Table (GDT), containing:
  1. Entries that describe global segments (available to all tasks)
  2. Entries that point to several Local Descriptor Tables (LDTs), one for each task (defined in the LDTR), each with a maximum size of 64 KB; entries in an LDT describe the segments associated with a given task
  3. Entries that describe tasks (defined in a TR)
- One Interrupt Descriptor Table (IDT) (size up to 64 KB): its entries point to the starting addresses of up to 256 interrupt service routines

Program Invisible Registers (caches)
[Figure: program-visible and program-invisible registers. Each visible 16-bit segment selector has an invisible segment descriptor register loaded from the tables in main memory. Further registers locate the tables themselves (64 KB each): the descriptor table for interrupts and the Global Descriptor Table (GDT), which holds descriptors for segments, tasks, and LDTs. The task register holds a task selector and the LDT register holds an LDT selector; the LDT is accessed through a selector in the GDT.]

Memory Paging: 80386 and above
- The paging mechanism translates a logical (virtual, linear) address generated by the program into a physical address that accesses a storage location either in memory or in mass storage (e.g. a hard disk)
- Applies to programs written for both the real and the protected mode (real mode programs are paged when they run in virtual 8086 mode)
- View the linear address space as pages of 4K bytes which may or may not reside in memory
- If a page is not in memory, it is brought in for use
- The 32-bit linear address (as generated by software) is divided into three parts:
  - Directory: 10 bits, determines which page table in the directory
  - Page table: 10 bits, determines which page in that page table
  - Memory offset: 12 bits, determines which byte in that page

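The three-way split of the linear address is a matter of shifting and masking. A minimal Python sketch follows; `split_linear` is a name chosen for this example, and the sample address 00C03FFFH is an arbitrary illustration.

```python
def split_linear(address: int) -> tuple:
    """Split a 32-bit linear address into (directory, page table, offset) fields."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("linear address must be a 32-bit value")
    directory = (address >> 22) & 0x3FF  # top 10 bits: which page table
    table = (address >> 12) & 0x3FF      # next 10 bits: which page in that table
    offset = address & 0xFFF             # low 12 bits: which byte in that page
    return directory, table, offset


if __name__ == "__main__":
    d, t, o = split_linear(0x00C03FFF)
    print(d, t, f"{o:03X}")  # 3 3 FFF
```
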
Memory Paging: 80386 and above
[Figure: translation of a 32-bit linear address (4 G of virtual bytes). A base address locates the page directory (1K entries x 4 bytes = 4 KB, in memory); the directory field selects one of 1024 page table entries (which page table?); the page table (also 1K x 4 bytes = 4 KB, in memory) supplies one of 1024 page entries (which page in that page table?); the offset selects the addressed physical byte (which byte in that page?). 1024 x 1024 = 1 M pages of 4 K bytes, i.e. 1024 x 1024 x 4 K = 4 G physical bytes.]

Paging: 80386 and above
- The paging unit is controlled mainly by the four control registers CR0-CR3 in the microprocessor
- CR4 is an additional control register available only on Pentium-class processors

Control Registers: 80386 and above
[Figure: layout of the control registers.]
- Paging enable bit (PG, in CR0): 1 = paging; 0 = no paging (the linear address is the physical address)
- PCD and PWT bits (in CR3): determine the state of the PCD and PWT pins on the microprocessor, which control the operation of caches connected to the microprocessor

Linear Address and Table Entry Formats
- Format of the linear address: 10 bits (directory), 10 bits (page table), 12 bits (offset)
- Format of an entry in the page directory or a page table: the upper 20 bits hold an address; appending 000H gives the full 32-bit address

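Memory Paging: 80386 and above
[Figure: two-level translation of a 32-bit linear address (from segmentation; 4 G of virtual bytes) into a physical address.]
- CR3 holds the upper 20 bits of the base address of the page directory (in memory); append 000H to form the 32-bit base
- The 10-bit directory field, with 00b appended (each entry is 4 bytes), is added to that base to select one of 1024 page table entries: which page table in the directory?
- The 20-bit address in that entry, with 000H appended, is the base of the page table; table base addresses are therefore spread out by 4K (1024 entries x 4 bytes)
- The 10-bit page table field, with 00b appended, is added to the table base to select one of 1024 page entries: which page in that page table?
- The 20-bit address in that entry, with 000H appended, is the base of a 4 K byte page; the 12-bit offset is added to it to give the addressed physical byte: which byte in that page?
- 1024 x 1024 = 1 M pages; 1024 x 1024 x 4 K = 4 G physical bytes
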
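The two-level lookup can be sketched in Python with a dictionary standing in for physical memory. Everything concrete here is an assumption for illustration: the function name `translate`, the dictionary `memory` (mapping the physical address of each 4-byte entry to its value), the table locations 00010000H and 00020000H, and the low flag bits 007H in the entries. Present bits, page faults, and access checks are not modelled.

```python
PAGE_MASK = 0xFFFFF000  # upper 20 bits of an entry; the low 12 bits are treated as 000H


def translate(linear: int, cr3: int, memory: dict) -> int:
    """Walk the page directory and one page table to translate a linear address."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    # Index x 4 is the same as appending 00b, because each entry is 4 bytes.
    directory_entry = memory[(cr3 & PAGE_MASK) + directory * 4]
    table_entry = memory[(directory_entry & PAGE_MASK) + table * 4]
    return (table_entry & PAGE_MASK) | offset


if __name__ == "__main__":
    cr3 = 0x00010000                   # hypothetical page directory base
    memory = {
        0x00010000 + 3 * 4: 0x00020007,  # directory entry 3 -> page table at 00020000H
        0x00020000 + 3 * 4: 0x00110007,  # table entry 3 -> physical page 00110H
    }
    print(f"{translate(0x00C03FFF, cr3, memory):08X}")  # 00110FFF
```
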
Memory Paging: Example
[Figure: worked translation of a linear byte address. The base address of the page directory (20 bits with 000H appended) locates the directory; the selected directory entry gives the start of page table 0; the selected page table entry gives physical page 00110H; that page base (with 000H appended) plus the 12-bit offset is the physical address of the corresponding physical byte in the physical memory pages.]

Memory required to accommodate the page directory and the page tables
- To page the full linear address space of 4 GB:
  - Each page is 4 KB, so we need to map (translate addresses for) 1M pages
  - 1M pages = 1K tables x (1K page addresses per table)
  - Page tables: 1K tables x (1K x 4 bytes) = 4 MB = 4096 KB
  - Page directory: 1K x 4 bytes = 4 KB
  - Total: 4100 KB
- This is a considerable amount of memory
- So some operating systems do not support paging for the total memory space, e.g. Windows 3.1 pages only 16 MB (only 4K pages)
  - This requires only 4 page tables, occupying 16 KB of memory
  - The page directory entries in use occupy 4 x 4 B = 16 B

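The arithmetic on this slide generalizes to any mapped size. Here is a small Python sketch; `paging_overhead` and its parameter names are invented for this example, and the fourth value counts only the directory entries in use, as the slide does for the 16 MB case.

```python
def paging_overhead(mapped_bytes: int, page_size: int = 4096,
                    entry_size: int = 4, entries_per_table: int = 1024) -> tuple:
    """Return (pages, page tables, page-table bytes, directory bytes in use)."""
    pages = mapped_bytes // page_size
    tables = -(-pages // entries_per_table)          # ceiling division
    table_bytes = tables * entries_per_table * entry_size
    directory_bytes_used = tables * entry_size       # one directory entry per table
    return pages, tables, table_bytes, directory_bytes_used


if __name__ == "__main__":
    KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3
    pages, tables, table_bytes, dir_bytes = paging_overhead(4 * GB)
    print(pages, tables, (table_bytes + dir_bytes) // KB)  # 1048576 1024 4100
    print(paging_overhead(16 * MB))                        # (4096, 4, 16384, 16)
```
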
Speeding Up the Paging Mechanism
- Paging requires accessing the page directory and a page table (in main memory) to generate the physical address of the required memory location
- This slows down memory access
- To speed this up, a cache memory can be used to store the most recent page address translations, which are likely to be needed again in the near future
- The 80486, for example, uses a 32-entry TLB (translation look-aside buffer)
- Pentium processors use separate TLBs for instructions and data

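The idea of caching recent translations can be sketched as a small software model. This is an illustration only: the class name `TLB`, the `walk` callback (standing in for the slow directory and page table lookup), and the least-recently-used replacement policy are all assumptions; the slides do not say how a real TLB chooses which entry to replace.

```python
from collections import OrderedDict


class TLB:
    """Toy translation look-aside buffer holding the most recent page translations."""

    def __init__(self, walk, entries: int = 32):
        self.walk = walk            # slow path: maps a page number to a frame number
        self.entries = entries
        self.cache = OrderedDict()  # page number -> frame number, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, linear: int) -> int:
        page, offset = linear >> 12, linear & 0xFFF
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)        # mark as most recently used
        else:
            self.misses += 1
            self.cache[page] = self.walk(page)  # costly table walk in main memory
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return (self.cache[page] << 12) | offset


if __name__ == "__main__":
    tlb = TLB(walk=lambda page: page + 0x100)  # hypothetical page-to-frame mapping
    tlb.translate(0x1234)
    tlb.translate(0x1238)                       # same page: served from the TLB
    print(tlb.hits, tlb.misses)                 # 1 1
```
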
The Processor Model
- Functional aspects: how the processor actually functions
- The internal organization is determined by the functionality required
- Two main tasks for the microprocessor in a system:
  1. Interface with external peripherals
  2. Execute instructions
[Figure: a microprocessor-based system, e.g. a computer, in which the microprocessor connects through the external buses (including the control bus) to memory and I/O devices.]

The 8086 processor model (Organization)
- Two main functional units:
  - The Bus Interface Unit (BIU)
  - The Execution Unit (EU)
- The BIU generates memory and I/O addresses for reading code and for transferring data to/from the processor
- The EU receives code and data from the BIU, executes the instructions, and stores the results in the general-purpose registers
- Pipelined architecture: two (ideally independent) operations that can overlap in time:
  - Fetch by the BIU
  - Execute by the EU

The 8086 processor model
[Figure: block diagram of the BIU and the EU (with its ALU), joined by a FIFO instruction/operand queue; only the BIU connects to the external microprocessor buses.]
- FIFO instruction/operand queue: the BIU fills it by fetches from memory; the EU empties it by executing instructions
- Interfacing (BIU):
  - Generate all system timing signals
  - Synchronize data transfers with all system modules
- Execution (EU):
  - Recognize, decode, and execute fetched program instructions
  - No direct interaction with the external microprocessor buses

The 8086 processor model
- Scenarios for pipeline inefficiency:
  - Operand is not in the queue
  - Jump or branch instructions
  - Long-executing instructions, e.g. 83 clock cycles for execution vs. 4 cycles for a fetch: the BIU fills the buffer and then waits idly
[Figure: timing diagrams comparing non-pipelined operation with the pipelined (8086) fetch-execute overlap. One case: (1) operand not in queue, (2) fetch it, (3) execute. Another case: (a) the instruction turned out to be a jump, (b) start fetching at the target location, (c) execute it. Asterisks mark wasted fetches and executes (inefficiency), including wasted fetches after a jump instruction.]
- RISC and modern architectures can help:
  - Reduce fetches from memory (operate mostly on registers)
  - Speed up memory fetches (cache)
  - Use small instructions (both in length and in execution time)
  - Try to predict how the jump will go

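The benefit of overlapping fetch and execute, and the way a long instruction erodes it, can be estimated with a simplified timing model. This Python sketch makes strong assumptions that the slides do not: every fetch takes a fixed 4 cycles, the queue never fills, and there are no jumps or operand fetches. The name `run_times` is hypothetical.

```python
def run_times(exec_cycles: list, fetch_cycles: int = 4) -> tuple:
    """Return (non-pipelined cycles, pipelined cycles) for a straight-line program."""
    # Non-pipelined: each instruction is fetched, then executed, one after another.
    sequential = sum(fetch_cycles + e for e in exec_cycles)
    fetch_done = 0  # time at which the BIU finishes fetching the current instruction
    exec_done = 0   # time at which the EU finishes executing the previous instruction
    for e in exec_cycles:
        fetch_done += fetch_cycles
        # The EU starts once the instruction is fetched and the EU is free.
        exec_done = max(fetch_done, exec_done) + e
    return sequential, exec_done


if __name__ == "__main__":
    print(run_times([4, 4, 4]))  # (24, 16): fetches hide behind execution
    print(run_times([83, 4]))    # (95, 91): the long instruction dominates
```

With balanced instructions the overlap saves a third of the time in this model, while the 83-cycle instruction leaves the fetch unit idle for most of the run.
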