POWERPC ARCHITECTURE Term Paper Presentation by Umut Yazkurt, CMPE 511, Fall 2003-2004



History
PowerPC is a RISC architecture. It was jointly designed by Apple, IBM, and Motorola in the early 1990s. The aim was to form the basis of a new generation of high-performance, low-cost products ranging from low-cost embedded controllers to massively parallel supercomputers. Because of its already large installed software base, the designers began with IBM's POWER architecture, which was developed for RS/6000 systems.

History
Apple, IBM, and Motorola designed the first four members of the PowerPC microprocessor family simultaneously.
* PowerPC 601™: the first 32-bit implementation of the PowerPC architecture, providing medium levels of performance for desktop computers and workstations.
* PowerPC 603™: a 32-bit low-power processor primarily for cost-sensitive desktop and portable personal computer systems.
* PowerPC 604™: a 32-bit implementation of the PowerPC architecture designed for use in high-performance desktop, workstation, and symmetric multiprocessing computer systems.
* PowerPC 620™: a 64-bit implementation of the PowerPC architecture providing high levels of performance for technical and scientific workstations, application and LAN servers, and symmetric multiprocessing computer systems.

History
Comparison of PowerPC generations (table):
* Generations: (G1), (G1), 604/604E (G2), 740/750 (G3), G4, G5
* First shipping year
* Clock speed (MHz): up to 2000
* L1 cache: -, 32 KB inst / 32 KB data, 32 KB inst / 32 KB data, 32 KB inst / 32 KB data, 64 KB inst / 32 KB data
* L2 cache support: -, -, 256 KB to 1 MB, 256 KB to 1 MB, 512 KB on die
* Number of transistors (10^6): over 58

General
The PowerPC architecture specifies an instruction set architecture (ISA). It is independent of implementation aspects. This allows anyone to design and fabricate compatible PowerPC processors, and lets implementations differ as the technology advances.

General
All PowerPC processors run the same core PowerPC instruction set. They differ primarily in the degree of dedicated hardware support for multiple execution units, cache size and capability, length of pipeline, and interface buses. These differences result in different tradeoffs in processing performance, die area, and power dissipation.

Programming Model
The PowerPC architecture is a full 64-bit architecture with 64-bit integers and 64-bit logical address pointers. It also has a well-defined 32-bit subset. Designers may implement either 32-bit or 64-bit machines. To enable 32-bit applications to run on all PowerPC processors, 64-bit machines are required to support a 32-bit operating mode. The 32-bit processors have 32-bit-wide general registers and branch-address registers; 64-bit processors have 64-bit-wide registers.

Programming Model
Instructions always operate on the machine's full register width: 32 or 64 bits. Instructions are mode independent; a given instruction operates the same on 32-bit machines, 64-bit machines, and 64-bit machines operating in 32-bit mode. A 64-bit machine operating in 32-bit mode passes only the low-order 32 bits of an address to the address translation mechanism, and the ALU calculates carry and overflow based on a 32-bit result.
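The 32-bit mode behaviour described on this slide can be sketched as a small Python model. This is an illustration only, not processor code: the function names `effective_address` and `add_with_carry` are hypothetical, and the model assumes a 64-bit machine whose register result is always computed at full width while the carry is judged on the width selected by the mode.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def effective_address(addr64: int, mode32: bool) -> int:
    """Address handed to translation: 32-bit mode keeps only the low-order 32 bits."""
    addr64 &= MASK64
    return addr64 & MASK32 if mode32 else addr64

def add_with_carry(a: int, b: int, mode32: bool) -> tuple:
    """Add two 64-bit register values.

    The result is always the full 64-bit sum (the instruction itself is
    mode independent); only the carry-out is taken from bit 32 or bit 64.
    """
    result = (a + b) & MASK64
    width = 32 if mode32 else 64
    mask = (1 << width) - 1
    carry = ((a & mask) + (b & mask)) >> width != 0
    return result, carry
```

For example, adding `0xFFFFFFFF` and `1` gives the same register result in both modes, but only the 32-bit mode reports a carry.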

Logical Address Space
For 32-bit machines and 64-bit machines operating in 32-bit mode, the linear array of bytes that can be addressed by a pointer is 4 gigabytes (2^32 bytes). For 64-bit machines operating in 64-bit mode, 2^64 bytes can be addressed, which is 16 exbibytes, or roughly 18 exabytes.
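The address-space sizes follow directly from the pointer width: an n-bit pointer can name 2^n bytes. A minimal Python check of that arithmetic (the helper name `addressable_bytes` is made up for this sketch):

```python
GIB = 1 << 30  # bytes in one gibibyte
EIB = 1 << 60  # bytes in one exbibyte

def addressable_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses an n-bit pointer can name."""
    return 1 << pointer_bits

print(addressable_bytes(32) // GIB)  # 4  (gibibytes)
print(addressable_bytes(64) // EIB)  # 16 (exbibytes, about 18.4 * 10**18 bytes)
```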

Initialization
When the processor is first initialized, it is in supervisor (also called privileged) mode. In this mode, all processor resources, including registers and instructions, are accessible. The processor can limit access to certain privileged registers and instructions by placing itself in user mode. This protection prevents application code from modifying global and sensitive resources, such as the caches, memory management system, and timers.

Registers
The architecture defines five types of registers:
* Special Purpose Registers (SPRs)
* General Purpose Registers (GPRs)
* Floating Point Registers (FPRs)
* Device Control Registers (DCRs)
* Machine State Register (MSR)

Registers
SPRs give status and control of resources within the processor core.

Registers
Five important user-mode registers are:
* The Fixed-Point Exception Register (XER) indicates conditions for integer operations, such as carries and overflows.
* The Floating-Point Status and Control Register (FPSCR) is a 32-bit register that holds the status and control bits of floating-point operations.
* The Count Register (CTR) holds a loop count that can be decremented during the execution of branch instructions.
* The Condition Register (CR) is a 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation: Less Than (LT), Greater Than (GT), Equal (EQ), and Summary Overflow (SO).
* The Link Register (LR) contains the address to return to at the end of a function call.
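The Condition Register layout can be illustrated by extracting one 4-bit field from a 32-bit value. The sketch below assumes the usual PowerPC convention, which the slide does not spell out: field CR0 occupies the four most significant bits, and within a field the bits run LT, GT, EQ, SO from most to least significant. The names `cr_field` and `decode_field` are hypothetical.

```python
FLAG_NAMES = ("LT", "GT", "EQ", "SO")  # assumed bit order within one field

def cr_field(cr: int, n: int) -> int:
    """Return the 4 bits of field CRn (0..7) from a 32-bit CR value."""
    if not 0 <= n <= 7:
        raise ValueError("CR has eight fields, CR0 to CR7")
    shift = 4 * (7 - n)  # CR0 sits in the most significant nibble
    return (cr >> shift) & 0xF

def decode_field(bits: int) -> dict:
    """Map a 4-bit field to its named flags."""
    return {name: bool(bits & (8 >> i)) for i, name in enumerate(FLAG_NAMES)}

# A compare that found its operands equal would leave EQ set in the chosen field.
flags = decode_field(cr_field(0x20000000, 0))
print(flags["EQ"])  # True
```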

Registers
General Purpose Registers:
The architecture specifies that all implementations have 32 GPRs (GPR0 - GPR31). GPRs are the source and destination of all fixed-point operations and load/store operations. They also provide access to SPRs and DCRs. They are all available for use in every instruction, with one exception: in certain instructions, GPR0 simply means "0" and no lookup is done for GPR0's contents.

Registers
Floating Point Registers:
The PowerPC architecture provides thirty-two 64-bit floating-point registers.
Device Control Registers:
DCRs are similar to SPRs in that they give status and control information, but DCRs are for resources outside the processor core. DCRs provide the kind of I/O control usually done through memory-mapped registers, without using up portions of the memory address space.

Registers
Machine State Register:
The MSR represents the state of the machine. It is accessed only in supervisor mode, and contains the settings for things such as memory translation, cache settings, interrupt enables, user/privileged state, and floating-point availability. Exact control bits vary by implementation. The MSR does not readily fit into the SPR/DCR/GPR classification, as it has its own pair of instructions to move its contents to and from a GPR.

Data Types
PowerPC can deal with data types of 8 bits (byte), 16 bits (halfword), 32 bits (word), and 64 bits (doubleword) in length. It can use either little-endian or big-endian byte order; that is, the least significant byte is stored at the lowest address (little-endian) or at the highest address (big-endian).
Fixed-point data types include:
* Unsigned byte
* Unsigned halfword
* Signed halfword
* Unsigned word
* Signed word
* Unsigned doubleword
* Byte strings: from 0 to 128 bytes in length
Floating-point data types include IEEE-754 single- and double-precision types.
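The two byte orders can be seen by packing the same 32-bit word both ways with Python's standard `struct` module. The value `0x12345678` is just an example chosen for this sketch.

```python
import struct

value = 0x12345678  # example 32-bit word

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

# Index 0 is the lowest address of the stored word.
print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

In the little-endian layout the least significant byte (`0x78`) sits at the lowest address; in the big-endian layout it sits at the highest.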

Instruction Format
The architecture encodes all instructions in 32 bits and aligns them on word address boundaries in memory. Instructions are first decoded by the upper 6 bits, in a field called the primary opcode. The remaining 26 bits contain operands and/or reserved fields. The instruction types defined are: ALU, Floating Point, Load/Store, Branch, Condition, and Synchronization instructions.
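The first decoding step, splitting a 32-bit instruction word into the 6-bit primary opcode and the remaining 26 bits, can be sketched as follows. The slide gives only that split; the further breakdown in `decode_d_form` (two 5-bit register fields and a 16-bit signed immediate) and the example word `0x38600001` are added here as assumptions for illustration, and the function names are hypothetical.

```python
def primary_opcode(word: int) -> int:
    """Upper 6 bits of a 32-bit instruction word."""
    return (word >> 26) & 0x3F

def operand_bits(word: int) -> int:
    """Remaining 26 bits: operands and/or reserved fields."""
    return word & 0x03FFFFFF

def decode_d_form(word: int) -> dict:
    """Assumed register + register + 16-bit immediate layout."""
    imm = word & 0xFFFF
    if imm & 0x8000:           # sign-extend the 16-bit literal
        imm -= 0x10000
    return {
        "opcode": primary_opcode(word),
        "rt": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        "imm": imm,
    }

print(decode_d_form(0x38600001))
# {'opcode': 14, 'rt': 3, 'ra': 0, 'imm': 1}
```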

Instruction Types

Addressing Modes
There are three types of operand addressing:
* Memory operand addressing:
  * Indirect addressing: base address in a GPR + a 16-bit sign-extended literal
  * Indirect-indexed addressing: base address in a GPR + displacement from another GPR
* ALU and floating-point instruction operand addressing: three-register format
* Branch operand addressing:
  * Absolute: use the literal as the absolute address.
  * Relative: use the literal as the displacement from the branch instruction address.
  * Indirect: take the target address from the LR or CTR register.
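The memory and branch addressing modes listed above can be sketched as simple address calculations. This is a simplified model under stated assumptions: a 32-bit implementation, a register file represented as a Python list, a base register number of 0 read as the value zero (the GPR0 exception mentioned on the Registers slide), and branch literals treated as byte values that are already scaled. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit implementation

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def base_value(gpr: list, ra: int) -> int:
    """Base register contents; register number 0 means the constant 0."""
    return 0 if ra == 0 else gpr[ra]

def ea_indirect(gpr: list, ra: int, literal16: int) -> int:
    """Base address in a GPR + 16-bit sign-extended literal."""
    return (base_value(gpr, ra) + sign_extend(literal16, 16)) & MASK32

def ea_indirect_indexed(gpr: list, ra: int, rb: int) -> int:
    """Base address in a GPR + displacement from another GPR."""
    return (base_value(gpr, ra) + gpr[rb]) & MASK32

def branch_absolute(literal: int) -> int:
    return literal & MASK32

def branch_relative(branch_addr: int, literal: int) -> int:
    return (branch_addr + literal) & MASK32

def branch_indirect(lr: int, ctr: int, use_ctr: bool = False) -> int:
    return (ctr if use_ctr else lr) & MASK32

gpr = [0] * 32
gpr[4], gpr[5] = 0x1000, 0x20
print(hex(ea_indirect(gpr, 4, 0xFFFC)))    # 0xffc  (0x1000 - 4)
print(hex(ea_indirect_indexed(gpr, 4, 5)))  # 0x1020
```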

PowerPC G4e Pipelining
* Seven-stage pipeline
* Superscalar microprocessor: allows multiple instructions to be executed in parallel
* Nine execution units:
  * BPU: Branch Processing Unit
  * VPU: Vector Permute Unit
  * VIU: Vector Integer Unit
  * VCIU: Vector Complex Integer Unit
  * VFPU: Vector Floating Point Unit
  * FPU: Floating Point Unit
  * IU: Integer Unit
  * CIU: Complex Integer Unit
  * LSU: Load/Store Unit

G4e’s microarchitecture with emphasis on pipeline stages of the front end and the functional units.

PowerPC G4e Pipeline Stages
Stages 1 and 2 - Instruction Fetch:
These two stages are both dedicated primarily to grabbing an instruction from the L1 cache. The G4e can fetch four instructions per clock cycle from the L1 cache and send them on to the next stage.
Stage 3 - Decode/Dispatch:
Once an instruction has been fetched, it goes into a 12-entry instruction queue to be decoded. The G4e's decoder can dispatch up to three instructions per clock cycle to the next stage.
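The interplay of the three numbers on this slide (four instructions fetched per cycle, a 12-entry instruction queue, three dispatched per cycle) can be explored with a toy simulation. It is a deliberately simplified sketch, not a model of the real G4e: the two fetch stages are collapsed into one step, there are no branches or stalls, and an instruction can be dispatched no earlier than the cycle after it was fetched. The name `simulate_front_end` is hypothetical.

```python
from collections import deque

FETCH_WIDTH = 4      # instructions fetched per cycle
QUEUE_SIZE = 12      # instruction queue entries
DISPATCH_WIDTH = 3   # instructions dispatched per cycle

def simulate_front_end(num_instructions: int) -> list:
    """Return, for each cycle, the list of instruction numbers dispatched."""
    queue = deque()
    next_to_fetch = 0
    done = 0
    trace = []
    while done < num_instructions:
        # Dispatch first: only instructions fetched in an earlier cycle are eligible.
        group = [queue.popleft() for _ in range(min(DISPATCH_WIDTH, len(queue)))]
        done += len(group)
        # Fetch: limited by fetch width, free queue slots, and instructions left.
        room = QUEUE_SIZE - len(queue)
        count = min(FETCH_WIDTH, room, num_instructions - next_to_fetch)
        queue.extend(range(next_to_fetch, next_to_fetch + count))
        next_to_fetch += count
        trace.append(group)
    return trace

print(simulate_front_end(8))
# [[], [0, 1, 2], [3, 4, 5], [6, 7]]
```

Because fetch is wider than dispatch, the queue fills over time in this model and dispatch becomes the limiting step.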

PowerPC G4e Pipeline Stages
Stage 4 - Issue:
Dispatched instructions wait in one of three issue queues. The first is the Floating-Point Issue Queue (FIQ), which holds floating-point (FP) instructions that are waiting to be executed. The second is the Vector Issue Queue (VIQ), which holds vector operations. The third is the General Instruction Queue (GIQ), which holds everything else. Once an instruction leaves its issue queue, it goes to the execution engine to be executed.

PowerPC G4e Pipeline Stages
Stage 5 - Execute:
Instructions can pass out of order from their issue queues into their respective functional units and be executed.
Stages 6 and 7 - Complete and Write-Back:
In these two stages, the instructions are put back into the order in which they came into the processor, and their results are written back to the register file.
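The combination of out-of-order execution and in-order completion can be sketched with a small model. It assumes, purely for illustration, that every instruction starts executing at the same time and finishes after its own latency; an instruction is then allowed to complete only once all earlier instructions in program order have completed. The function name `completion_order` and the latencies are hypothetical.

```python
def completion_order(latencies: list) -> tuple:
    """latencies[i] is the execution time of instruction i in program order.

    Returns (finish_order, retire_order): the order in which instructions
    finish executing, and the order in which they are completed.
    """
    # Shorter latency finishes first; ties keep program order.
    finish_order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))
    finished = set()
    retire_order = []
    next_to_retire = 0
    for instr in finish_order:
        finished.add(instr)
        # Retire as many of the oldest instructions as have finished.
        while next_to_retire in finished:
            retire_order.append(next_to_retire)
            next_to_retire += 1
    return finish_order, retire_order

print(completion_order([3, 1, 2]))
# ([1, 2, 0], [0, 1, 2])
```

Instruction 0 finishes last here, so instructions 1 and 2 wait for it before completing, which restores program order.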

Inside of IBM PowerPC 405lp Processor