

PowerPC 601 Stephen Tam

To be tackled today
- Architecture
- Execution Units
  - Fixed-Point (Integer) Unit
  - Floating-Point Unit
  - Branch Processing Unit
- Cache Unit
- Memory Management Unit (MMU)
- Pipeline Structure
- Instruction buffer
- Multiply-Add
- Benchmark

PowerPC Processors
- The PowerPC 6XX line of microprocessors from IBM, Motorola and Apple was built on the view that personal computers would have to accommodate more power- and resource-intensive applications, such as those associated with multimedia.
- Four implementations of the PowerPC architecture were initially announced:
  - PowerPC 601: the original PowerPC microprocessor
  - PowerPC 603: low-cost; the least powerful, with the lowest power consumption
  - PowerPC 604: faster, higher performance
  - PowerPC 620: the first 64-bit implementation of the PowerPC architecture
- The PowerPC 601 is a high-performance superscalar processor with 3 independent execution units and 2 register files.
- Execution (pipeline processing) units:
  - Integer Unit (IU), also called the Fixed-Point Unit (FXU)
  - Floating-Point Unit (FPU)
  - Branch Processing Unit (BPU)

Features                           PowerPC 601
Basic architecture                 Load/store
Instruction length                 32 bit
Byte/halfword load and store       Yes
Condition codes                    Yes
Conditional moves                  No
# of integer registers             32
Integer register size              32/64 bit
# of floating-point registers      32
Floating-point register size       64 bit
Floating-point format              IEEE 32 bit, 64 bit
Virtual address                    52-80 bit
32/64 bit mode bit                 Yes
Segmentation                       Yes
Page size                          4 Kbytes
Instruction/data cache size        32 Kbytes
Clock speed                        MHz

PowerPC 601 Architecture
- The 601 has wide buses for memory, internal processor transfers, registers and on-board processing units.

Fixed-Point (Integer) Unit & Floating-Point Unit
FXU (IU)
- Executes one instruction at a time
- Most instructions are single-cycle instructions
- Interfaces with the cache and the MMU
FPU
- Contains:
  a) Single-precision multiply-add array
  b) Floating-point status and control register
  c) 32 64-bit registers
- Buffers 2 extra instructions when the FPU is busy
- Supports IEEE 754 FP data types

Branch Processing Unit
- Contains:
  a) An adder to compute the target address
  b) 3 special-purpose registers
     i) Link Register (LR)
     ii) Count Register (CTR)
     iii) Condition Register (CR)
- Looks ahead into the CR to resolve conditional branches
- Uses these dedicated registers rather than the General Purpose Registers (GPRs)

Branching & Branch Prediction
- The 601 has special-purpose registers in the BPU for holding, operating on and testing conditions.
- A single branch instruction may implement a loop-closing branch by decrementing the hardware counter CTR, testing its value and branching if it is non-zero.
- An unconditional branch, or one that depends only on the CTR, is executed immediately and is considered a zero-cycle branch.
- Branch prediction is static: the prediction is made by the compiler.
- To protect against wrong predictions, the contents of the instruction buffer are saved for a short period of time, until instructions from the taken path are delivered from memory. This keeps the instructions for the non-taken path available immediately if the prediction turns out to be wrong.
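The loop-closing branch described above can be sketched in a few lines of Python. This is only a behavioural model, not 601 code: the function name `run_counted_loop` and its arguments are made up for illustration, and the model assumes the count loaded into CTR is at least 1. In PowerPC assembly the same pattern is written with `mtctr` to load the counter and `bdnz` to close the loop.

```python
def run_counted_loop(iterations, body):
    """Model a loop closed by one decrement-test-branch on CTR.

    `iterations` and `body` are hypothetical names for this sketch;
    the model assumes iterations >= 1.
    """
    ctr = iterations              # count loaded into CTR before the loop
    passes = 0
    while True:
        body(passes)              # the loop body
        passes += 1
        ctr -= 1                  # the branch instruction decrements CTR,
        if ctr == 0:              # tests its value,
            break                 # and falls through once it reaches zero;
        # otherwise it branches back to the top of the loop
    return passes


seen = []
run_counted_loop(4, seen.append)
print(seen)
```

Because the decrement, the test and the branch are one instruction handled in the BPU, the loop needs no integer compare in the FXU and no GPR for the counter.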

Cache Unit & Memory Management Unit
- 32 Kbytes
- 8-way set-associative
- Unified (instruction and data)
- Has 2 ports:
  1) Instruction fetch
  2) "Snooping" transactions on the system interface
- Supports (externally) 4 petabytes (2^52 bytes) of virtual memory and 4 gigabytes (2^32 bytes) of physical memory
- Implements demand paging for virtual memory
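The address-space figures on this slide follow directly from the address widths. The short Python check below uses only numbers stated in the slides (52-bit virtual address, 32-bit physical address, 4-Kbyte pages); the constant and variable names are chosen here for illustration.

```python
VIRTUAL_BITS = 52          # virtual address width from the slide
PHYSICAL_BITS = 32         # physical address width from the slide
PAGE_BYTES = 4 * 1024      # page size from the feature table

virtual_bytes = 1 << VIRTUAL_BITS
physical_bytes = 1 << PHYSICAL_BITS

virtual_pib = virtual_bytes // (1 << 50)         # petabytes (2^50 bytes each)
physical_gib = physical_bytes // (1 << 30)       # gigabytes (2^30 bytes each)
virtual_pages = virtual_bytes // PAGE_BYTES      # 2^40 virtual pages
physical_frames = physical_bytes // PAGE_BYTES   # 2^20 physical page frames

print(virtual_pib, physical_gib, virtual_pages, physical_frames)
```

The gap between 2^40 virtual pages and 2^20 physical frames is why demand paging is needed: only a small fraction of the virtual pages can be resident at once.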

Pipeline Structure
Stage      Description
Fetch      Up to eight instructions are fetched into an instruction buffer.
Dispatch   Instructions are dispatched to either the FXU or the FPU.
Decode     Instructions are decoded, with the source registers being read. Instructions to the FXU are decoded together in the dispatch stage.
Execute    This stage exists in the BPU as well as the FXU, where integer instructions execute and cache lookup and address processing also occur.
Execute1   FPU multiplication.
Execute2   FPU addition.
Cache      Floating-point operands are sent to the FPU and integer operands are sent to the FXU.
Write      Register file write.

Instruction Buffer
- The 601 has several buffers in its pipelines that hold multiple fetched instructions as well as several dispatched instructions.
- This allows out-of-order dispatching: when one pipeline is blocked, dispatching may still continue to the non-blocked ones.
- The cache is unified, meaning that instructions and data share one cache, so they must contend for cache access.
- The fetched-instruction buffer holds 8 instructions, even though the maximum processing rate is 3 instructions per cycle.
- Data accesses have priority; instructions are fetched and stored in the buffer whenever the cache is free.

Hence…
- Up to three 32-bit instructions may be dispatched each cycle, one each to the FXU, FPU and BPU.
- The unified cache provides:
  - A 32-bit interface to the FXU
  - A 64-bit interface to the FPU
  - A 256-bit interface to both the instruction and memory queues
- The I/O has a 32-bit address bus and a 64-bit data bus.
- These buses are logically and physically decoupled from one another to support pipelined, non-pipelined, or even split bus transactions.
- To reduce latency and increase performance, the 601 itself is capable of pipelining up to two outstanding operations onto the bus.
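The dispatch behaviour described on the last two slides (an 8-entry buffer, at most one instruction per cycle to each of the FXU, FPU and BPU, and dispatch continuing past a blocked pipeline) can be illustrated with a toy model. This is a deliberately simplified sketch: the function `dispatch_cycle`, the `busy` argument and the instruction-to-unit pairs are assumptions made for illustration, and the real 601 applies further dispatch restrictions that are not modelled here.

```python
UNITS = ("FXU", "FPU", "BPU")
BUFFER_SIZE = 8   # fetched-instruction buffer capacity from the slides


def dispatch_cycle(buffer, busy=()):
    """Dispatch at most one buffered instruction per non-busy unit.

    `buffer` is a list of (name, unit) pairs, oldest first (toy format).
    `busy` names the units whose pipelines are blocked this cycle.
    Returns (dispatched, remaining), both in original buffer order.
    """
    assert len(buffer) <= BUFFER_SIZE
    assert all(unit in UNITS for _, unit in buffer)
    claimed = set(busy)               # units that cannot accept work
    dispatched, remaining = [], []
    for name, unit in buffer:
        if unit in claimed:
            remaining.append((name, unit))    # stays in the buffer
        else:
            claimed.add(unit)                 # one instruction per unit
            dispatched.append((name, unit))
    return dispatched, remaining


buffer = [("add", "FXU"), ("lwz", "FXU"), ("fmadd", "FPU"), ("bdnz", "BPU")]
print(dispatch_cycle(buffer))
print(dispatch_cycle(buffer, busy=("FXU",)))
```

In the second call the FXU is blocked, yet the floating-point and branch instructions behind the two integer instructions are still dispatched, which is the out-of-order dispatch the slide refers to.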

Multiply-Add
- The PowerPC 601 takes three operands and computes (A x B + C) or (A x B - C) in a single instruction.
- Assuming program and data are cached, a 100-MHz 601 can sustain 100 million MACs (multiply-accumulate operations) per second on some digital filters.
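To make the MAC figure concrete, here is a minimal Python sketch of one output sample of a digital (FIR) filter, where every tap costs exactly one A x B + C step. The names `fir_output` and `peak_macs_per_second` are illustrative, and the rate estimate assumes one multiply-add completes per clock cycle, which is the assumption behind the 100 million MACs per second quoted above. Python evaluates the multiply and the add as two separate operations, so the sketch models the operation count only, not the hardware's rounding behaviour.

```python
def fir_output(coeffs, samples):
    """Compute one filter output as a chain of multiply-adds.

    Returns (result, number_of_macs). Names are hypothetical.
    """
    acc = 0.0
    macs = 0
    for c, x in zip(coeffs, samples):
        acc = c * x + acc      # one multiply-add: A x B + C
        macs += 1
    return acc, macs


def peak_macs_per_second(clock_hz, macs_per_cycle=1):
    """Upper bound on MAC rate, assuming `macs_per_cycle` per clock."""
    return clock_hz * macs_per_cycle


result, macs = fir_output([0.5, 0.25, 0.25], [4.0, 8.0, 12.0])
print(result, macs)
print(peak_macs_per_second(100_000_000))
```

A filter with N taps therefore needs N multiply-add instructions per output sample, so at this peak rate a 100-MHz part could produce at most 100,000,000 / N output samples per second.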

Benchmarking

Direct Benchmark Comparison with ADSP-2106x

More Benchmark Comparisons

General Purpose Processor
- Why use a DSP when benchmarks show that GPPs like the PowerPC perform better?
  - GPP performance is gained from complicated dynamic features
  - These features are not suited to real-time applications
  - Real-time predictability is decreased
  - Optimizing code becomes complicated
