COS2014 IA-32 Processor Architecture
Overview
Goal: Understand IA-32 architecture
– Basic Concepts of Computer Organization
  ‒ Instruction execution cycle
  ‒ Basic computer organization
  ‒ Data storage in memory
  ‒ How programs run
– IA-32 Processor Architecture
– IA-32 Memory Management
– Components of an IA-32 Microcomputer
– Input-Output System
Recall: Computer Model for ASM
[Figure: a CPU (registers AX, BX, PC; ALU) connected to a memory that holds the variables a, b, x and the program below, which computes x = a + b]
MOV AX, a
ADD AX, b
MOV x, AX
Meanings of the Code (assumed)
Assembly code and its meaning:
MOV AX, a (Take the data stored in memory address ‘a’, and move it to register AX)
ADD AX, b (Take the data stored in memory address ‘b’, and add it to register AX)
MOV x, AX (Take the data stored in register AX, and move it to memory address ‘x’)
[Figure: the machine code of each instruction pairs an opcode (MOV, ADD) with a register address (AX) and a memory address (a)]
Another Computer Model for ASM
Stored program architecture: the processor contains the registers (AX, BX), the ALU, the PC (program counter) and the IR (instruction register). Memory holds both the program (MOV AX, a; ADD AX, b; MOV x, AX) and the data (a, b, x), each reached by its address.
[Figure sequence: the processor steps through the program, driven by the controller and the clock]
Step 1: Fetch (MOV AX, a)
Step 2: Decode (MOV AX, a)
Step 3: Execute (MOV AX, a)
Step 1: Fetch (ADD AX, b)
Step 2: Decode (ADD AX, b)
Step 3a: Execute (ADD AX, b)
Step 3b: Write Back (ADD AX, b)
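The fetch, decode, and execute steps can be mimicked with a toy interpreter. This is only a sketch in Python under loud assumptions: the tuple instruction format, the dictionary used as memory, and the names run, regs, pc, and ir are all hypothetical and do not correspond to real IA-32 machine code.

```python
# Toy model of the instruction execution cycle (illustrative only).
# Instructions are (opcode, destination, source) tuples, not real encodings.

def run(program, memory):
    regs = {"AX": 0, "BX": 0}      # register file
    pc = 0                         # program counter
    while pc < len(program):
        ir = program[pc]           # Step 1: fetch into the instruction register
        pc += 1                    # PC now points at the next instruction
        op, dst, src = ir          # Step 2: decode
        # Step 3: execute (and write back the result)
        if op == "MOV" and dst in regs:
            regs[dst] = memory[src]                          # memory -> register
        elif op == "MOV":
            memory[dst] = regs[src]                          # register -> memory
        elif op == "ADD":
            regs[dst] = (regs[dst] + memory[src]) & 0xFFFF   # 16-bit result
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

memory = {"a": 5, "b": 7, "x": 0}
program = [("MOV", "AX", "a"), ("ADD", "AX", "b"), ("MOV", "x", "AX")]
regs = run(program, memory)
print(memory["x"])  # 12
```

The loop body follows the slide sequence: one fetch, one decode, then an execute step whose result is written back to a register or to memory.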
12 Basic Computer Organization Clock synchronizes CPU operations Control unit (CU) coordinates execution sequence ALU performs arithmetic and bitwise processing
Clock
– Operations in a computer are triggered, and thus synchronized, by a clock
– The clock tells each component "when" (no need to ask each other!):
  ‒ when to put data on output lines
  ‒ when to read data from input lines
– A clock cycle measures the time of a single operation
– The cycle must be long enough to allow signal propagation
Instruction/Data for Operations
– Where do the instructions needed for computer operations come from?
  ‒ Stored-program architecture: the whole program is stored in main memory, including program instructions (code) and data
  ‒ CPU loads the instructions and data from memory for execution
  ‒ Don't worry about the disk for now
– Where are the data needed for execution?
  ‒ Registers (inside the CPU, discussed later)
  ‒ Memory
  ‒ Constants encoded in the instructions
Memory
– Organized like mailboxes, numbered 0, 1, 2, 3, …, 2^n − 1
– Each box can hold 8 bits (1 byte), so it is called byte-addressing
– Address of mailboxes:
  ‒ 16-bit address is enough for up to 64K
  ‒ 20-bit for 1M
  ‒ 32-bit for 4G
– Most servers need more than 4G! That's why we need 64-bit CPUs like Alpha (DEC/Compaq/HP) or Merced (Intel) …
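The address-width figures follow from the fact that an n-bit address can name 2^n byte-sized mailboxes. A small Python check of that arithmetic (the function name addressable_bytes is made up for this sketch):

```python
# Number of byte-sized "mailboxes" reachable with an n-bit address.
def addressable_bytes(address_bits):
    return 2 ** address_bits

for bits in (16, 20, 32):
    print(bits, addressable_bytes(bits))
# 16 65536        (64K)
# 20 1048576      (1M)
# 32 4294967296   (4G)
```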
Storing Data in Memory
– Character string: how are strings like "Hello, World!" stored in memory?
  ‒ ASCII code (or Unicode, etc.)
  ‒ Each character is stored as a byte
  ‒ Review: how is "1234" stored in memory?
– Integer: a byte can hold an integer number
  ‒ between 0 and 255 (unsigned), or
  ‒ between –128 and 127 (2's complement)
  ‒ How to store a bigger number?
  ‒ Review: how is 1234 stored in memory?
Big or Little Endian?
– Example: 1234 is stored in 2 bytes: 1234 = 0000 0100 1101 0010 in binary = 04 D2 in hexadecimal
– Do you store 04 or D2 first?
  ‒ Big Endian: 04 first
  ‒ Little Endian: D2 first (Intel's choice)
– Reason: more consistent for variable length (e.g., 2 bytes, 4 bytes, 8 bytes, etc.)
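The two byte orders, and the difference between the number 1234 and the string "1234", can be checked in Python with the standard int.to_bytes and str.encode methods. This is only an illustration of the byte layouts, not of how a particular assembler emits data.

```python
value = 1234

big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first (Intel)
text = "1234".encode("ascii")         # the string "1234": one ASCII byte per character

print(big.hex(" "))     # 04 d2
print(little.hex(" "))  # d2 04
print(text.hex(" "))    # 31 32 33 34
```

Reading the little-endian bytes back with int.from_bytes(little, "little") gives 1234 again, which is what an IA-32 processor does when it loads the word.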
18 Cache Memory High-speed expensive static RAM both inside and outside the CPU. Level-1 cache: inside the CPU chip Level-2 cache: often outside the CPU chip Cache hit: when data to be read is already in cache memory Cache miss: when data to be read is not in cache memory
19 How a Program Runs?
20 Load and Execute Process OS searches for program’s filename in current directory and then in directory path If found, OS reads information from directory OS loads file into memory from disk OS allocates memory for program information OS executes a branch to cause CPU to execute the program. A running program is called a process Process runs by itself. OS tracks execution and responds to requests for resources When the process ends, its handle is removed and memory is released How? OS is only a program!
Multitasking
– OS can run multiple programs at the same time
– Multiple threads of execution within the same program
– Scheduler utility assigns a given amount of CPU time to each running program
– Rapid switching of tasks
  ‒ Gives the illusion that all programs are running at the same time
  ‒ Processor must support task switching
– What support is needed from the hardware?
22 What's Next General Concepts IA-32 Processor Architecture Modes of operation Basic execution environment Floating-point unit Intel microprocessor history IA-32 Memory Management Components of an IA-32 Microcomputer Input-Output System
Modes of Operation
– Protected mode: native mode (Windows, Linux); programs are given separate memory areas named segments
– Real-address mode: native MS-DOS
– System management mode: power management, system security, diagnostics
– Virtual-8086 mode: a hybrid of protected mode in which each program has its own 8086 computer
24 Basic Execution Environment Address space: Protected mode 4 GB 32-bit address Real-address and Virtual-8086 modes 1 MB space 20-bit address
Basic Execution Environment
Program execution registers: named storage locations inside the CPU, optimized for speed
[Figure: processor block diagram showing the registers, flag bits Z and N, PC, IR, ALU, controller, clock, and memory]
General Purpose Registers
– Used for arithmetic and data movement
– Addressing:
  ‒ AX, BX, CX, DX: 16 bits
  ‒ Split into H and L parts, 8 bits each
  ‒ Extended into E?X to become 32-bit registers (i.e., EAX, EBX, etc.)
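How AX, AH, and AL overlay EAX can be pictured with bit masks. The following Python sketch models the register as a plain integer; the helper write_al is a hypothetical name used only for illustration.

```python
eax = 0x12345678

ax = eax & 0xFFFF          # low 16 bits of EAX
ah = (eax >> 8) & 0xFF     # high byte of AX
al = eax & 0xFF            # low byte of AX

def write_al(eax_value, byte):
    """Model of writing AL: only bits 0..7 of EAX change."""
    return (eax_value & 0xFFFFFF00) | (byte & 0xFF)

print(hex(ax), hex(ah), hex(al))   # 0x5678 0x56 0x78
print(hex(write_al(eax, 0xFF)))    # 0x123456ff
```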
Index and Base Registers
Some registers have only a 16-bit name for their lower half:
  32-bit: ESI  EDI  EBP  ESP
  16-bit: SI   DI   BP   SP
28 Some Specialized Register Uses General purpose registers EAX: accumulator, automatically used by multiplication and division instructions ECX: loop counter ESP: stack pointer ESI, EDI: index registers (source, destination) for memory transfer, e.g. a[i,j] EBP: frame pointer to reference function parameters and local variables on stack EIP: instruction pointer (i.e. program counter)
29 Some Specialized Register Uses Segment registers In real-address mode: indicate base addresses of preassigned memory areas named segments In protected mode: hold pointers to segment descriptor tables CS: code segment DS: data segment SS: stack segment ES, FS, GS: additional segments EFLAGS Status and control flags (single binary bits) Control the operation of the CPU or reflect the outcome of some CPU operation
Status Flags (EFLAGS)
– Reflect the outcomes of arithmetic and logical operations performed by the CPU
  ‒ Carry: unsigned arithmetic out of range
  ‒ Overflow: signed arithmetic out of range
  ‒ Sign: result is negative
  ‒ Zero: result is zero
  ‒ Auxiliary Carry: carry from bit 3 to bit 4
  ‒ Parity: the least significant byte of the result contains an even number of 1 bits
[Figure: processor block diagram showing the flag bits Z and N alongside the registers, PC, IR, ALU, controller, clock, and memory]
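The flag definitions can be made concrete by computing them for an 8-bit addition. This Python sketch is a model written for illustration (the function add8_flags is not a real API); it uses the usual flag abbreviations CF, OF, SF, ZF, AF, and PF.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) as the ALU would report them."""
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": raw > 0xFF,                                    # unsigned result out of range
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,     # signed result out of range
        "SF": (result & 0x80) != 0,                          # result is negative
        "ZF": result == 0,                                   # result is zero
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,              # carry from bit 3 to bit 4
        "PF": bin(result).count("1") % 2 == 0,               # even number of 1 bits
    }
    return result, flags

result, flags = add8_flags(0x7F, 0x01)   # 127 + 1 overflows the signed range
print(hex(result), flags["OF"], flags["CF"])   # 0x80 True False

result2, flags2 = add8_flags(0xFF, 0x01)  # 255 + 1 overflows the unsigned range
print(hex(result2), flags2["CF"], flags2["ZF"])  # 0x0 True True
```

The two calls show why Carry and Overflow are separate flags: the first sum is fine as an unsigned number but wrong as a signed one, and the second is the reverse.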
31 System Registers Application programs cannot access system registers IDTR (Interrupt Descriptor Table Register) GDTR (Global Descriptor Table Register) LDTR (Local Descriptor Table Register) Task Register Debug Registers Control registers CR0, CR2, CR3, CR4 Model-Specific Registers
32 Floating-Point, MMX, XMM Reg. Eight 80-bit floating-point data registers ST(0), ST(1),..., ST(7) arranged in a stack used for all floating-point arithmetic Eight 64-bit MMX registers Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations
33 Intel Microprocessors Early microprocessors: Intel 8080: ‒ 64K addressable RAM, 8-bit registers ‒ CP/M operating system ‒ S-100 BUS architecture ‒ 8-inch floppy disks! Intel 8086/8088 ‒ IBM-PC used 8088 ‒ 1 MB addressable RAM, 16-bit registers ‒ 16-bit data bus (8-bit for 8088) ‒ separate floating-point unit (8087) This is where “real-address mode” comes from!
Intel Microprocessors
– The IBM-AT: Intel 80286
  ‒ 16 MB addressable RAM
  ‒ Protected memory
  ‒ Introduced IDE bus architecture
  ‒ 80287 floating point unit
– Intel IA-32 Family
  ‒ Intel386: 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
  ‒ Intel486: instruction pipelining
  ‒ Pentium: superscalar, 32-bit address bus, 64-bit internal data path
Intel Microprocessors
– Intel P6 Family
  ‒ Pentium Pro: advanced optimization techniques in microcode
  ‒ Pentium II: MMX (multimedia) instruction set
  ‒ Pentium III: SIMD (streaming extensions) instructions
  ‒ Pentium 4 and Xeon: Intel NetBurst microarchitecture, tuned for multimedia
What's Next
– General Concepts of Computer Architecture
– IA-32 Processor Architecture
– IA-32 Memory Management (understand it from the viewpoint of the processor)
  ‒ Real-address mode
  ‒ Calculating linear addresses
  ‒ Protected mode
  ‒ Multi-segment model
  ‒ Paging
– Components of an IA-32 Microcomputer
– Input-Output System
Real-address Mode
– Programs assigned 1 MB (2^20 bytes) of memory
– Programs can access any area of memory
– Can run only one program at a time
– Segmented memory scheme
  ‒ 16-bit segment * 10h + 16-bit offset = 20-bit linear (or absolute) address
  ‒ Segment value in CS, SS, DS, ES
  ‒ Offset value in IP, SP, BX & SI, DI
Example: Accessing a variable in the data segment
  DS (data segment) = 0A43  (16 bits)
  BX (offset)       = 0030  (16 bits)
  0A43 * 10h        = 0A430 (20 bits)
  + offset          =  0030 (16 bits)
  linear address    = 0A460
Segmented Memory
Segmented memory addressing: the absolute (linear) address is a combination of a 16-bit segment value (in CS, DS, SS, or ES) added to a 16-bit offset
[Figure: the range of linear addresses with one segment marked; an address is represented as segment value : offset]
Calculating Linear Addresses
– Given a segment address, multiply it by 16 (append a hexadecimal zero) and add the offset; all of this is done by the processor
– Example: convert 08F1:0100 to a linear address
  Adjusted segment value: 0 8 F 1 0
  Add the offset:           0 1 0 0
  Linear address:         0 9 0 1 0
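The segment:offset calculation is a shift and an add. Here is a minimal Python sketch of it; the function name linear_address is invented for this example, and the final mask assumes the 20-bit address bus of the original 8086, where addresses wrap around at 1 MB.

```python
def linear_address(segment, offset):
    """Real-address mode: segment * 10h + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

print(f"{linear_address(0x08F1, 0x0100):05X}")  # 09010
print(f"{linear_address(0x0A43, 0x0030):05X}")  # 0A460
```

Shifting left by 4 bits is the same as multiplying by 16, which is why the adjusted segment value looks like the segment with a hexadecimal zero appended.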
Protected Mode
– Designed for multitasking
– Each process (running program) is assigned a total of 4 GB of addressable RAM
– Two parts:
  ‒ Segmentation: provides a mechanism for isolating individual code, data, and stack so that multiple programs can run without interfering with one another
  ‒ Paging: provides demand-paged virtual memory, where sections of a program's execution environment are moved into physical memory as needed; this gives segmentation the illusion that it has 4 GB of physical memory
42 Segmentation in Protected Mode Segment: a logical unit of storage (not the same as the “segment” in real-address mode) e.g., code/data/stack of a program, system data structures Variable size Processor hardware provides protection All segments in the system are in the processor’s linear address space (physical space if without paging) Need to specify: base address, size, type, … segment descriptor & descriptor table linear address = base address + offset
43 Flat Segment Model Use a single global descriptor table (GDT) All segments (at least 1 code and 1 data) mapped to entire 32-bit address space
44 Multi-Segment Model Local descriptor table (LDT) for each program One descriptor for each segment located in a system segment of LDT type
45 Segmentation Addressing Program references a memory location with a logical address: segment selector + offset Segment selector: provides an offset into the descriptor table CS/DS/SS points to descriptor table for code/data/stack segment
46 Convert Logical to Linear Address Segment selector points to a segment descriptor, which contains base address of the segment. The 32-bit offset from the logical address is added to the segment’s base address, generating a 32-bit linear address
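The selector-to-descriptor-to-linear-address path can be sketched in Python. Everything here is a simplified model: the SegmentDescriptor class, the list used as a descriptor table, and the base and limit values are hypothetical. The one architectural detail relied on is that bits 3 to 15 of a selector hold the index into the descriptor table, so selector 08h refers to entry 1.

```python
from dataclasses import dataclass

@dataclass
class SegmentDescriptor:
    base: int    # linear address where the segment starts
    limit: int   # highest valid offset inside the segment

def logical_to_linear(table, selector, offset):
    index = selector >> 3                 # drop the 3 low bits (table indicator and privilege level)
    descriptor = table[index]
    if descriptor is None or offset > descriptor.limit:
        raise ValueError("protection violation: invalid segment or offset")
    return (descriptor.base + offset) & 0xFFFFFFFF

# Entry 0 is the unused null descriptor; entry 1 is an example data segment.
descriptor_table = [None, SegmentDescriptor(base=0x00400000, limit=0xFFFF)]

print(hex(logical_to_linear(descriptor_table, 0x08, 0x1234)))  # 0x401234
```

The limit check stands in for the hardware protection mentioned earlier: an offset outside the segment is rejected instead of reaching another program's memory.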
47 Paging Supported directly by the processor Divides each segment into 4096-byte blocks called pages Part of running program is in memory, part is on disk Sum of all programs can be larger than physical memory Virtual memory manager (VMM): An OS utility that manages loading and unloading of pages Page fault: issued by processor when a page must be loaded from disk
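With 4096-byte pages, the low 12 bits of a linear address are the offset inside the page and the remaining bits select the page. The Python sketch below uses a flat dictionary as a stand-in for the page table and a made-up PageFault exception; real IA-32 paging walks a multi-level table structure in hardware, so treat this only as a model of the idea.

```python
PAGE_SIZE = 4096       # bytes per page, as on the slide
OFFSET_BITS = 12       # 2**12 == 4096

class PageFault(Exception):
    """Hypothetical stand-in for the fault raised when a page is not in memory."""

def split_linear(linear):
    """Return (page number, offset within page)."""
    return linear >> OFFSET_BITS, linear & (PAGE_SIZE - 1)

def translate(linear, page_table):
    page, offset = split_linear(linear)
    if page not in page_table:            # page is on disk, not in physical memory
        raise PageFault(page)
    return page_table[page] * PAGE_SIZE + offset

page_table = {0x00401: 0x0007F}           # page 0x401 lives in physical frame 0x7F
print(hex(translate(0x00401234, page_table)))  # 0x7f234
```

In this model the virtual memory manager would catch PageFault, load the page from disk, add it to page_table, and retry the access.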
48 What's Next General Concepts IA-32 Processor Architecture IA-32 Memory Management Components of an IA-32 Microcomputer Skipped … Input-Output System
49 What's Next General Concepts IA-32 Processor Architecture IA-32 Memory Management Components of an IA-32 Microcomputer Input-Output System How to access I/O systems?
50 Different Access Levels of I/O Call a HLL library function (C++, Java) easy to do; abstracted from hardware slowest performance Call an operating system function specific to one OS; device-independent medium performance Call a BIOS (basic input-output system) function may produce different results on different systems knowledge of hardware required usually good performance Communicate directly with the hardware May not be allowed by some operating systems
51 Displaying a String of Characters When a HLL program displays a string of characters, the following steps take place: Calls an HLL library function to write the string to standard output Library function (Level 3) calls an OS function, passing a string pointer OS function (Level 2) calls a BIOS subroutine, passing ASCII code and color of each character BIOS subroutine (Level 1) maps the character to a system font, and sends it to a hardware port attached to the video controller card Video controller card (Level 0) generates timed hardware signals to video display to display pixels
IA-32: Addressing Modes IA-32 machine instructions act on zero or more operands. Some operands are specified explicitly in an instruction whereas other operands are implicit in an instruction. The data for a source operand can be located in any of the following places - –The instruction itself (immediate operand) –A register –A memory location –An I/O Port
Operands
Data can be stored in:
–A register
–A memory location
–An I/O Port
–We will look at the differences among the addressing modes in IA-32, using example MOV instructions that illustrate the various forms of address reference.
–The MOV instruction moves data to or from memory, registers, or immediate values. Its general form is:
–MOV destination, source
–The source operand can be in memory, a register, or an immediate value; the destination operand can be in memory or a register. However, they can't both be memory at the same time.
Examples
–MOV EAX, 5 ; Moves into the EAX register the value 5
–MOV EBX, EAX ; Moves into register EBX the value of register EAX
–MOV [EBX], EAX ; Moves into the memory address pointed to by EBX the value of EAX
The IA-32 provides a total of seven distinct addressing modes: –register –immediate –direct –register indirect –based relative –indexed relative –based indexed relative
1.Register Addressing Mode Examples of register addressing mode follow: –MOV BX, DX ; copy contents of register DX into register BX –MOV ES, AX ; copy contents of register AX into segment register ES –MOV EDX, ESI ; copy contents of register ESI into register EDX
Basic Operand Types Register Operands Very efficient Examples inc ECX mov BX, AX mov SI, 10
–2. Immediate Addressing Mode –Examples: –MOV AX, 2550h ; move value 2550h into the AX register –MOV ESI, h ; move value h into the ESI register –MOV BL, 40h ; move value 40h into the BL register
Basic Operand Types Immediate Operands Immediate operands are constants Examples add AX, 10 mov BL, 'a' mov BL, 61h Numbers do not have a length attribute. For example, we can add 10 to AH, AX, and EAX. Of course, the immediate value must fit in the destination: mov AL, h ; Illegal
To move information to the segment registers, the data must first be moved to a general-purpose register and then into the segment registers. Example: –real-address mode: –MOV AX, 2550h ; AX contains the segment address –MOV DS, AX ; load the DS segment register with AX –protected mode: –MOV AX, 08h ; AX contains the segment selector –MOV DS, AX ; load the DS segment register with AX
3. Direct Addressing Mode –real-address mode: –MOV DL, [2400h] ; moves content of DS:2400h into DL register –MOV AX, [5500h] ; moves content of DS:5500h into the AX register –protected mode: –MOV EAX, [ h] ; moves content of DS: h into the EAX register
Basic Operand Types Direct Operands Allows accessing memory variables by name Slower because memory is "slow" Example mov DX, wVal1 add DX, wVal2 mov wData, DX
4. Register Indirect Addressing Mode –real-address mode: –MOV AL, [BX] ; moves into AL the contents of the memory location pointed to by DS:BX –MOV [DI], AL ; moves into DS:DI the contents of the register AL –protected mode: –MOV AL, [EAX] ; moves into AL the contents of the memory location pointed to by DS:EAX –MOV [EBX], ESI ; moves into DS:EBX the contents of the register ESI –MOV BL, [ESI] ; moves into BL the contents of the memory location pointed to by DS:ESI
Indirect Addressing Modes
    mov AX, [ESI]
[Figure: ESI holds the address of a word in memory; that word is copied into AX]
Indirect Addressing Modes: Example
.data
List    DWORD 1, 3, 10, 6, 2, 9, 2, 8, 9
Number = ($ - List)/4            ; or: LENGTHOF List
.code
…
    ; sum values in list
    mov EAX, 0                   ; sum = 0
    mov ECX, Number              ; number of values
    mov ESI, OFFSET List         ; ptr to List
L3: add EAX, [ESI]               ; add value
    add ESI, 4                   ; point to next value
    loop L3                      ; repeat as needed
Indirect Addressing Modes: Example
String  BYTE "the words"
Long = $ - String                ; or: LENGTHOF String
…
    ; Print string 1 character at a time
    ; using WriteChar
    mov ECX, Long                ; ECX <- number of chars
    mov ESI, OFFSET String       ; ptr to String
L9: mov AL, [ESI]                ; get char
    call WriteChar               ; print char
    inc ESI                      ; point to next char
    loop L9                      ; repeat
5. Based Relative Addressing Mode –In real-address mode, base registers BX and BP, as well as a displacement value, are used to calculate what is called an effective address. The default segment registers used for the calculation of the physical address are DS for BX and SS for BP. Example: –MOV CX, [BX+10h] ; moves 16-bit word at DS:BX+10h into CX register –MOV [BP+22h], CX ; moves contents of CX register into memory location SS:BP+22h
–In case of protected mode, registers EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP can be used as base registers. The default segment registers used for calculation of physical address are DS for EAX, EBX, ECX, EDX, ESI and EDI and SS for EBP and ESP. –MOV EAX, [EBX+10h] ; moves 32-bit quantity at DS:EBX+10h into the EAX register –MOV [EBP+22h], EDX ; moves contents of EDX register into memory location SS: EBP+22h
6. Indexed Relative Addressing Mode –In real-address mode, index registers SI and DI, as well as a displacement value, are used to calculate what is called an effective address. The default segment register used for the calculation of the physical address is the DS register. Example: –MOV CX, [SI+10h] ; moves 16-bit word at DS:SI+10h into CX register –MOV [DI+22h], CX ; moves contents of CX register into memory location DS:DI+22h
–In case of protected mode, registers EAX, EBX, ECX, EDX, ESI, EDI and EBP can be used as index registers (ESP can serve as a base register but not as an index register). The default segment register used for calculation of the physical address in the examples below is DS. –MOV EAX, [ESI+10h] ; moves 32-bit quantity at DS:ESI+10h into the EAX register –MOV [EDI+22h], EDX ; moves contents of EDX register into memory location DS:EDI+22h
7. Based Indexed Addressing Mode –Examples: –real-address mode: –MOV CL, [BX+SI+8h] ; moves into CL register, byte at memory location DS:BX+SI+8h –MOV CL, [BX+DI+8h] ; moves into CL register, byte at memory location DS:BX+DI+8h –MOV CL, [BP+SI+8h] ; moves into CL register, byte at memory location SS:BP+SI+8h –MOV CL, [BP+DI+8h] ; moves into CL register, byte at memory location SS:BP+DI+8h
–protected mode: –MOV CL, [EBX+EAX+10h] ;moves into CL register, byte at memory location DS:EBX+EAX+10h –MOV CL, [EBP+EAX+10h] ;moves into CL register, byte at memory location SS:EBP+EAX+10h –MOV CL, [ESI+EDI+10h] ;moves into CL register, byte at memory location DS:ESI+EDI+10h
–Segment overrides and 32-bit scaling factors –MOV CX, ES:[SI+10h] ; moves 16-bit word at ES:SI+10h into CX register –In case of 32-bit instructions, we can also add a scaling factor in any of the based, indexed or based indexed addressing modes. The scaling factor multiplies the index register and must be 1, 2, 4, or 8. Example: –MOV EAX, [ESI + EAX*4] ; moves into EAX the 32-bit quantity at DS:ESI+EAX*4
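All of the memory addressing modes above are special cases of one formula: effective address = base + index * scale + displacement. The Python sketch below models that calculation; the function effective_address and the sample register values are assumptions made for illustration, not part of the original slides.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """32-bit effective address: base + index*scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Hypothetical register contents
esi, eax, ebx = 0x1000, 3, 0x2000

print(hex(effective_address(base=ebx, disp=0x10)))            # [EBX+10h]     -> 0x2010
print(hex(effective_address(base=esi, index=eax, scale=4)))   # [ESI+EAX*4]   -> 0x100c
print(hex(effective_address(base=ebx, index=eax, disp=0x10))) # [EBX+EAX+10h] -> 0x2013
```

A scale of 4 matches the DWORD size, so [ESI + EAX*4] reads element number EAX of an array of 32-bit values that starts at ESI.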
Based and Indexed Addressing
    mov AX, List[ESI]
[Figure: ESI = 4; memory words at List, List+2, …, List+12; the word at List+4 is loaded into AX]
Based and Indexed Addressing
    mov ESI, OFFSET List
    mov AX, [ESI+4]
[Figure: ESI = OFFSET List; memory words at List, List+2, …, List+12; the word at List+4 is loaded into AX]
Based-Indexed Addressing
    mov EBX, OFFSET List
    mov ESI, 4
    mov AX, [EBX+ESI]
[Figure: EBX = OFFSET List is added to ESI = 4; memory words at List, List+2, …, List+12; the word at List+4 is loaded into AX]
77 Summary Central Processing Unit (CPU) Arithmetic Logic Unit (ALU) Instruction execution cycle Multitasking Floating Point Unit (FPU) Complex Instruction Set Real mode and Protected mode Motherboard components Memory types Input/Output and access levels