Computer architecture Lecture 6: Processor’s structure Piotr Bilski.



Processor’s tasks:
- Instruction fetching
- Instruction interpretation
- Data fetching
- Data processing
- Data saving
These tasks justify the existence of registers (temporary storage inside the processor).

Processor’s internal structure:
- Registers
- Control unit
- ALU: status flags, shifter, complementer, arithmetic and Boolean logic

Block diagram of the Pentium 3 processor

Block diagram of the P6 core (Pentium Pro, 1995): front end of the processor, core, completion unit

Register types:
- Accessible to the user (addressing, data, etc.)
- Inaccessible to the user (control, status)
This categorization is not formal!

Registers accessible to the user:
- General purpose registers (GPR)
- Data registers
- Address registers (segment pointers, stack pointer, index registers)
- Condition codes (status indicators, flags): read-only!

Control and status registers:
- Basic:
  - Program Counter (PC)
  - Instruction Register (IR)
  - Memory Address Register (MAR)
  - Memory Buffer Register (MBR)
- Program Status Word (PSW)
- Interrupt vector register
- Page table pointer

Program Status Word (PSW):
- S: sign bit
- Z: set if the operation result is zero
- P: carry bit
- R: result of a logical comparison
- O: overflow bit
- I: interrupt enable/disable
- N: supervisor mode
Register layout: S | Z | P | R | O | I | N | other

Registers in the Motorola MC68000 processor:
- 32-bit data and address registers
- Specialization: 8 data registers (D0-D7) and 9 address registers (two of them used alternately in user and supervisor modes)
- 24-bit address bus, 16-bit data bus
- The A7 register is used as the stack pointer (SP)
- 16-bit status register (SR); its low byte is the condition code register (CCR)
- 32-bit program counter (PC)
- Instructions are stored at even addresses

Registers in the Intel 8086 processor:
- 16-bit address and data registers
- Data/general purpose registers (AX, BX, CX, DX)
- Pointer and index registers (SP, BP, SI, DI)
- Segment registers (CS, DS, SS, ES)
- Instruction pointer
- Status (flags) register

Intel 8086 registers (cont.):
- AX: accumulator
- BX: base
- CX: count
- DX: data
- SP: stack pointer
- BP: base pointer
- SI: source index
- DI: destination index

Register organization of the Intel Pentium processors:
- 32-bit data and address registers
- Eight general purpose registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
- For backward compatibility, the lower 16 bits of each of these registers can be used as the corresponding 16-bit 8086 register
- 32-bit status register (EFLAGS)
- 32-bit instruction pointer (EIP)
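The backward-compatible register layout can be illustrated with bit masks. This is a minimal sketch, not processor code: the function name sub_registers is hypothetical, and the AH/AL byte aliases are added here for completeness although the slide mentions only the 16-bit part.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its legacy aliases (AX, AH, AL)."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    ax = eax & 0xFFFF            # lower 16 bits: the 8086 register AX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }

regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
```

Writing to AX on a real processor changes only the low half of EAX, which is why 16-bit code keeps working on the 32-bit register file.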

Floating-point registers of the Pentium processor:
- Eight 80-bit numeric registers
- 16-bit control register
- 16-bit status register
- 16-bit tag word (describes the type of content held in each numeric register)
- 48-bit instruction pointer
- 48-bit data pointer

EFLAGS register:
- TF: trap flag
- IF: interrupt enable flag
- DF: direction flag
- IOPL: I/O privilege level
- RF: resume flag
- AC: alignment check
- ID: identification flag
Layout, bits 0-15: CF (0), PF (2), AF (4), ZF (6), SF (7), TF (8), IF (9), DF (10), OF (11), IOPL (12-13), NT (14)
Layout, bits 16-31: RF (16), VM (17), AC (18), VIF (19), VIP (20), ID (21)
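Reading individual flags out of an EFLAGS value is a matter of shifting and masking. The sketch below uses the documented IA-32 bit positions; the names EFLAGS_BITS and decode_eflags and the sample value 0x246 are illustrative assumptions.

```python
# Bit positions of the single-bit EFLAGS flags (IA-32).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}

def decode_eflags(value: int) -> dict:
    """Return each flag as a bool, plus IOPL as a number from 0 to 3."""
    flags = {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11   # two-bit field, bits 12-13
    return flags

state = decode_eflags(0x246)               # sample value chosen for illustration
print([name for name, v in state.items() if v is True])
```

For the sample value the set flags are PF, ZF and IF; bit 1 is also set, but it is a reserved bit and carries no flag.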

Registers in the Athlon 64 processor:
- Compatibility with the x86-64 architecture (40-bit physical address space, 48-bit virtual address space)
- 64-bit data and address registers
- 8 general purpose registers (RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP), also usable in 32-bit compatibility mode
- 8 additional general purpose registers (R8-R15), as in the Opteron, available in 64-bit mode
- 16 SSE registers (XMM0-XMM15)
- 8 x87 floating-point registers, 80-bit

Registers in the PowerPC processor:
- 32 general purpose registers (64-bit) + exception register (XER)
- 32 registers for the floating point unit (64-bit) + status and control register (FPSCR)
- Branch processing unit registers: 32-bit condition register, 64-bit count and link registers

Instruction cycle (state diagram)
States:
- Instruction address calculation
- Instruction fetch
- Instruction decoding
- Operand address calculation
- Operand fetch
- Data operation
- Operand address calculation
- Operand write
- Interrupt check
- Interrupt handling
Transition labels: instruction executed, fetch the next one; multiple operands; multiple results; no interrupts; return to data; indirect addressing

Instruction fetching cycle (diagram): the processor holds PC, MAR, MBR, IR and the control unit (CU); it is connected to memory by the address bus, control bus and data bus
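The data flow of the fetch cycle can be written as a sequence of register transfers. This is a sketch under simplifying assumptions: memory is a word-addressed Python list, the PC advances by one word, and the three-instruction program is made up for illustration.

```python
def fetch_cycle(cpu: dict, memory: list) -> None:
    """One instruction fetch, expressed as register transfers."""
    cpu["MAR"] = cpu["PC"]            # address of the next instruction goes to MAR
    cpu["MBR"] = memory[cpu["MAR"]]   # memory read: the word arrives over the data bus
    cpu["PC"] += 1                    # PC now points at the following instruction
    cpu["IR"] = cpu["MBR"]            # the instruction moves to IR for decoding

memory = ["LOAD 5", "ADD 6", "STORE 7"]   # made-up program, one instruction per word
cpu = {"PC": 0, "MAR": None, "MBR": None, "IR": None}
fetch_cycle(cpu, memory)
print(cpu)
```

In hardware the control unit issues the read over the control bus, and the PC increment can overlap with the memory access; the sequential order here only mirrors the logical dependencies.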

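Indirect cycle (diagram): the processor holds MAR, MBR and the control unit (CU); it is connected to memory by the address bus, control bus and data bus

Interrupt cycle (diagram): the processor holds PC, MAR, MBR and the control unit (CU); it is connected to memory by the address bus, control bus and data bus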

Pipeline:
- Problem: during the instruction cycle only one instruction is processed
- Solution: divide the cycle into smaller stages
- Condition: there must be moments in the cycle when no main memory access is required!
Diagram: Cycle 1, Cycle 2, Cycle 3

Pipeline example: laundry
Each load passes through three stages (LA, DR, PA); the diagram shows three loads (cycles 1-3).
- Without pipelining: 3 hours per cycle, 9 hours for all
- With pipelining: 3 hours per cycle, 5 hours for all!
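The 9-hour and 5-hour totals follow from simple arithmetic. A minimal sketch, assuming three one-hour stages per load; the function name total_time is hypothetical.

```python
def total_time(loads: int, stages: int, hours_per_stage: int = 1,
               pipelined: bool = True) -> int:
    """Hours needed to push `loads` loads through `stages` equal stages."""
    if loads == 0:
        return 0
    if pipelined:
        # The first load needs all stages; each further load finishes one stage later.
        return (stages + loads - 1) * hours_per_stage
    return loads * stages * hours_per_stage

sequential = total_time(3, 3, pipelined=False)
overlapped = total_time(3, 3, pipelined=True)
print(sequential, overlapped)   # 9 5
```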

Prefetch
NOTE: the speedup is less than twofold, because a memory access takes longer than instruction execution.
Diagram labels: two stages (instruction fetch, execution) passing the instruction and the result; the extended version adds waiting, new address and discard

Basic phases of the instruction cycle:
- Instruction fetching (FI)
- Instruction decoding (DI)
- Operand address calculation (CO)
- Operand fetching (FO)
- Instruction execution (EI)
- Writing the outcome (WO)
Timing diagram: instructions I1-I4 pass through the stages FI, DI, CO, FO, EI, WO
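The staggered timing diagram for the six phases can be generated programmatically. This sketch assumes an ideal pipeline (equal stage durations, no stalls or branches); the function name timing_diagram is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions: int) -> list:
    """Return one text row per instruction; each column is one clock cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage      # instruction i is in stage s during cycle i + s
        rows.append(f"I{i + 1} " + " ".join(cells))
    return rows

diagram = timing_diagram(4)
print("\n".join(diagram))
```

Each row is shifted one column to the right of the previous one, so four instructions complete in 6 + 4 - 1 = 9 cycles instead of 24.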

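Branches and pipelining
Timing diagram: instructions I1-I6 enter the pipeline (stages FI, DI, CO, FO, EI, WO); the instructions that were only partially processed when the branch is resolved are abandoned, and fetching continues with I21 and I22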

Pipeline implementation algorithm

Problems of pipelining:
- Successive pipeline stages do not all take the same amount of time
- Transferring data between the buffers may significantly increase pipeline execution time
- Register and memory dependencies that limit pipeline optimization can be minimized only at a high cost

Efficiency of pipelining:
- Cycle execution time
- Time required to execute all the instructions
- Instruction pipeline speedup ratio
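The three quantities can be sketched numerically using the usual textbook model. The formulas and symbols here are assumptions taken from that model rather than from the slide: k stages, n instructions, cycle time tau equal to the slowest stage delay plus a latch delay d, total time (k + n - 1) * tau, and speedup n * k / (k + n - 1).

```python
def cycle_time(stage_delays, latch_delay):
    """tau = slowest stage delay + delay of the latch between stages (assumed model)."""
    return max(stage_delays) + latch_delay

def pipelined_time(k, n, tau):
    """Time for n instructions on a k-stage pipeline: k cycles for the first, 1 for each other."""
    return (k + (n - 1)) * tau

def speedup(k, n):
    """Ratio of non-pipelined time (n * k * tau) to pipelined time."""
    return (n * k) / (k + (n - 1))

tau = cycle_time([1.0, 2.0, 1.5], latch_delay=0.1)   # made-up delays
print(tau, pipelined_time(6, 9, tau), speedup(6, 9))
```

As n grows, the speedup approaches k, which is why deeper pipelines look attractive until stage imbalance, buffer overhead and hazards are taken into account.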

Example of the pipeline efficiency

Pipelines of modern processors:
- Pentium 3: 10 stages
- Athlon: 10 stages for ALU, 15 stages for FPU
- Pentium M: 12 stages
- Athlon 64 / 64 X2: 12 stages for ALU, 17 stages for FPU
- Pentium 4 Northwood: 20 stages (hyperpipeline!!)
- Pentium 4 Prescott: 31 stages
- Core 2 Duo: 14 stages

Hazards:
- They are disturbances in pipelining
- There are data, resource and control hazards
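A data hazard of the read-after-write kind can be found by comparing the registers written by one instruction with those read by the next. This is an illustrative sketch: the Instr record, the raw_hazards function, the window parameter and the three-instruction program are all hypothetical, and real hardware performs this check in the decode logic.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    text: str
    writes: Tuple[str, ...]   # registers the instruction modifies
    reads: Tuple[str, ...]    # registers the instruction needs as inputs

def raw_hazards(program, window=1):
    """Pairs (i, j): instruction j reads a register written by instruction i,
    where i precedes j by at most `window` positions."""
    found = []
    for j, later in enumerate(program):
        for i in range(max(0, j - window), j):
            if set(program[i].writes) & set(later.reads):
                found.append((i, j))
    return found

program = [
    Instr("ADD EAX, EBX", writes=("EAX",), reads=("EAX", "EBX")),
    Instr("SUB ECX, EAX", writes=("ECX",), reads=("ECX", "EAX")),
    Instr("MOV EDX, ESI", writes=("EDX",), reads=("ESI",)),
]
hazards = raw_hazards(program)
print(hazards)   # [(0, 1)]
```

Here SUB needs the EAX value that ADD has not yet written back, so the pipeline must stall or forward the result; the MOV is independent and can proceed.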

Branch handling:
- Multiple pipelines
- Prefetch of the branch target instruction
- Loop buffer
- Branch prediction
- Delayed branch

Multiple pipelines:
- Both instructions that may follow a branch are loaded into two pipelines and processed simultaneously
- The main problem is gaining memory access for both instructions

Prefetch and loop buffer
Prefetch:
- When a branch instruction is decoded, the target instruction is fetched
- It is stored until the branch is executed
Loop buffer:
- A buffer memory is created to store consecutive instructions
- It is useful when conditional branch instructions and loops are involved

Conditional branch prediction:
- Static:
  - Branch never taken (Sun SPARC, MIPS)
  - Branch always taken
  - Prediction by operation code
- Dynamic:
  - Taken/not taken switch
  - Branch history table

Static prediction:
- The simplest method, used as a fallback, for instance in the Motorola MPC7450 processor
- The Pentium 4 allows a hint to be inserted into the code, suggesting whether static prediction should assume that the branch is taken or not (a so-called prediction hint)

Dynamic prediction of conditional branches:
- The history of each conditional branch instruction is stored
- It is represented by bits stored in the cache memory
- Every instruction has its own history bits
- Another solution is a table storing information about the outcomes of conditional branches
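History bits are often organized as a two-bit saturating counter per branch. The sketch below assumes that scheme, which is one common variant and not necessarily the one drawn in the lecture; the class name, the accuracy helper and the loop pattern are illustrative.

```python
class TwoBitPredictor:
    """States 0-1 predict 'not taken', states 2-3 predict 'taken'."""
    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes, predictor) -> float:
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)

loop_branch = ([True] * 9 + [False]) * 3   # a loop branch: taken 9 times, then exit, 3 runs
rate = accuracy(loop_branch, TwoBitPredictor())
print(rate)
```

With two bits, the single not-taken outcome at each loop exit does not flip the prediction, so only one misprediction per run remains once the counter has warmed up.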

History bits prediction

Branch history table (columns): branch instruction address | history bits | target instruction
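The three columns map naturally onto a lookup table keyed by the branch address. This is a simplified sketch with several assumptions: a single history bit (the last outcome), an unbounded Python dict instead of a fixed-size cache, and the target stored as an address rather than as the target instruction itself. All names and addresses are hypothetical.

```python
class BranchHistoryTable:
    def __init__(self):
        self.entries = {}   # branch address -> [history bit, target address]

    def next_fetch_address(self, pc: int, fallthrough: int) -> int:
        """Predict where to fetch from after the branch at `pc`."""
        entry = self.entries.get(pc)
        if entry is not None and entry[0] == 1:
            return entry[1]        # the branch was taken last time: fetch its target
        return fallthrough         # unknown branch or not taken: fetch sequentially

    def record(self, pc: int, taken: bool, target: int) -> None:
        """Store the actual outcome once the branch has executed."""
        self.entries[pc] = [1 if taken else 0, target]

bht = BranchHistoryTable()
first = bht.next_fetch_address(0x40, fallthrough=0x44)    # no entry yet
bht.record(0x40, taken=True, target=0x10)
second = bht.next_fetch_address(0x40, fallthrough=0x44)   # entry says 'taken'
print(hex(first), hex(second))
```

Keeping the target in the table lets the fetch stage redirect immediately on a predicted-taken branch, without waiting for the target address to be computed.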

Local branch prediction:
- Requires a separate history buffer for each instruction, although the history table can be common to all instructions
- The Pentium MMX, Pentium 2 and Pentium 3 processors have local prediction circuits with 4 history bits and 16 positions for every type of instruction
- Local prediction efficiency is estimated at 97%

Global branch prediction:
- A common history for all branch instructions is stored in memory, which makes it possible to take dependencies between different branch instructions into account
- It is rarely a better solution than local prediction
- Hybrid solutions: a shared global prediction unit combined with the history table (AMD processors, Pentium M, Core, Core 2)

Branch Prediction Unit:
- A processor circuit responsible for predicting disturbances in sequential code execution
- Often connected with the micro-operation cache
- In the Pentium 4 the branch prediction buffer has 4096 entries, in the Pentium 3 only 512; as a result the former has a 33 percent better hit ratio than the latter

Location of the Branch Prediction Unit