ARCHITECTURE OF APPLE’S G4 PROCESSOR
BY RON WEINWURZEL
MICROPROCESSORS, PROFESSOR DEWAR, SPRING 2002

• What makes a supercomputer “super” is its ability to execute at least one billion floating-point operations per second. This staggering rate is known as a “gigaflop”.
• Apple G4: can deliver performance of over one gigaflop, and has a theoretical peak performance of 3.6 gigaflops.
• Apple billed it as the first personal-computer processor to deliver over one gigaflop.
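The 3.6-gigaflop peak can be reproduced as clock rate multiplied by floating-point operations per cycle. This is a sketch under stated assumptions: the 450 MHz clock and the count of eight operations per cycle (four single-precision vector lanes, each doing a multiply-add counted as two operations) are not on the slide, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: clock rate x floating-point ops per cycle."""
    return clock_hz * flops_per_cycle / 1e9


# Assumed values, not taken from the slide:
CLOCK_HZ = 450e6            # hypothetical 450 MHz G4
FLOPS_PER_CYCLE = 4 * 2     # 4 single-precision lanes x (multiply + add)

print(f"theoretical peak: {peak_gflops(CLOCK_HZ, FLOPS_PER_CYCLE):.1f} GFLOPS")
```

Sustained performance (the “over one gigaflop” figure) is lower than this peak, because real code cannot issue a full vector multiply-add on every cycle.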

• The G4 processor is also known as the MPC7400.
• One of its advantages is a shorter processor pipeline, meaning fewer stages to process each instruction:
  Pentium 4: 20 stages
  G4: 7 stages (MPC7450 revision of the G4)
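Why a shorter pipeline helps can be sketched with a simple cycles-per-instruction (CPI) model in which each mispredicted branch flushes the pipeline. This is an illustration only: it assumes the refill penalty roughly equals the number of stages, and the branch fraction and misprediction rate are hypothetical values, not measurements of either processor.

```python
def effective_cpi(depth: int, branch_fraction: float = 0.2,
                  mispredict_rate: float = 0.05, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction in a simplified pipeline model.

    Assumption: every mispredicted branch flushes the pipeline and costs
    roughly `depth` cycles to refill. branch_fraction and mispredict_rate
    are hypothetical workload parameters.
    """
    penalty = depth
    return base_cpi + branch_fraction * mispredict_rate * penalty


for name, depth in [("G4", 7), ("Pentium 4", 20)]:
    print(f"{name:<10} {depth:>2} stages -> CPI {effective_cpi(depth):.2f}")
```

With these made-up rates the 7-stage pipeline loses about 0.07 cycles per instruction to mispredictions, versus about 0.20 for 20 stages. The model compares cycles only; it ignores clock rate, which a deeper pipeline is normally used to raise.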

• The G4 has an L3 cache built from 2 MB of DDR SRAM running at a data rate of up to 500 MHz. It boosts processor performance by providing fast access to data and application code at up to 4 gigabytes per second.
• Cache levels: the G4 goes up to an L3 cache; the Pentium 4 goes up to an L2 cache.
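The 4 GB/s figure follows from bytes moved per transfer multiplied by transfers per second. The sketch assumes a 64-bit (8-byte) L3 data path, which the slide does not state; the 500 MHz data rate is from the slide (DDR, so two transfers per 250 MHz clock). `bandwidth_bytes_per_s` is a hypothetical helper.

```python
def bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth = bytes moved per transfer x transfers per second."""
    return (bus_width_bits // 8) * transfers_per_s


# Assumption: 64-bit L3 data path. Data rate from the slide: 500 MHz
# (double data rate, i.e. two transfers per cycle of a 250 MHz clock).
L3_BUS_BITS = 64
L3_DATA_RATE = 500e6

l3 = bandwidth_bytes_per_s(L3_BUS_BITS, L3_DATA_RATE)
print(f"L3 peak bandwidth: {l3 / 1e9:.1f} GB/s")
```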

More on L3 cache
• The high-speed L3 cache, with its dedicated bus, lets the G4 processor receive data up to five times faster than it could from main memory. This low latency keeps the processors fed with data, so they do not sit idle waiting for the next task to arrive.
• The L3 cache is large enough to hold active application code and data. When an application runs, most of its active code, along with most of the data it is using, sits in the L3 cache, so the information the processor needs most is close at hand.
• This is analogous to a web browser caching pages on the hard disk drive: when the ‘Back’ button is clicked, the browser reuses the data it already loaded for the previous page instead of downloading it again, so the page appears more quickly.
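The idea of keeping the active working set close to the processor can be illustrated with a toy cache simulator that counts hits and misses. The slide does not describe the G4 L3’s organization or replacement policy, so this sketch assumes a fully associative cache with least-recently-used (LRU) replacement; the `LRUCache` class is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used replacement.

    Illustrative only: real caches are set-associative and store whole lines.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # block address -> True, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        if block in self.lines:
            self.lines.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                    # would be fetched from slower memory
        self.lines[block] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used block
        return False


# A loop over a working set that fits in the cache: only the first pass misses.
cache = LRUCache(capacity=4)
for _ in range(3):
    for block in (10, 11, 12, 13):
        cache.access(block)
print(f"hits={cache.hits} misses={cache.misses}")
```

If the working set is larger than the cache (for example, cycling through three blocks with room for two), LRU evicts each block just before it is reused and every access misses, which is why the slide stresses that the L3 is large enough for active code and data.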

• The G4 processor was designed for both portable and desktop computer systems. This dual target strongly shaped its design: a 32-bit architecture (shown on the next slide) combined with a 128-bit vector unit named the Velocity Engine (Apple’s name for Motorola’s AltiVec technology). The architecture provides 32-bit effective addresses, integer data types of 8, 16 and 32 bits, and floating-point data types of 32 and 64 bits.
• See the diagram on the next slide.

G4 Hardware Design Diagram:
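To make the split between the 32-bit core and the 128-bit Velocity Engine concrete, this sketch counts how many elements of each data type fit side by side in one 128-bit vector register. It assumes the vector unit handles 8-, 16- and 32-bit integers and 32-bit floats, with 64-bit floats left to the scalar floating-point unit; `lanes` is a hypothetical helper.

```python
VECTOR_BITS = 128  # width of one Velocity Engine register


def lanes(element_bits: int, register_bits: int = VECTOR_BITS) -> int:
    """Number of elements of `element_bits` that fit in one register."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# Element types the vector unit operates on (64-bit floats use the scalar FPU).
for name, bits in [("8-bit int", 8), ("16-bit int", 16),
                   ("32-bit int", 32), ("32-bit float", 32)]:
    print(f"{name:<12}: {lanes(bits):>2} per 128-bit register")
```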

Standard features in the G4 architecture:
• Branch processing unit: processes one branch per clock cycle, fetches four instructions per cycle and can resolve two speculated branches. It incorporates a 512-entry branch history table (BHT) and a 64-entry, 4-way set-associative branch target instruction cache (BTIC).
• Dispatch unit
• Completion unit: tracks instructions and completes up to two instructions per cycle, using an 8-entry completion buffer.
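A branch history table can be sketched as an array of small saturating counters indexed by the branch address. The slide gives only the table size, so this sketch assumes classic 2-bit counters (values 0 to 1 predict not taken, 2 to 3 predict taken) indexed by the low bits of the word-aligned address; the `TwoBitBHT` class and the loop trace are hypothetical.

```python
class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (sketch).

    Assumption: indexed by the low bits of the word-aligned branch address;
    counters 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, entries: int = 512):
        self.entries = entries
        self.counters = [1] * entries  # start every entry as 'weakly not taken'

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # PowerPC instructions are 4 bytes

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch taken 9 times, then not taken once on exit; the loop runs 10 times.
bht = TwoBitBHT()
pc = 0x1000
outcomes = ([True] * 9 + [False]) * 10
correct = 0
for taken in outcomes:
    correct += bht.predict(pc) == taken
    bht.update(pc, taken)
print(f"accuracy: {correct}/{len(outcomes)}")

# The BTIC on the slide: 64 entries, 4 ways -> 64 // 4 = 16 sets.
BTIC_SETS = 64 // 4
```

The 2-bit hysteresis means the single not-taken exit only weakens the prediction, so the first iteration of the next loop run is still predicted correctly.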

Features continued…
• Fixed-point units (FXUs) that share 32 general-purpose registers (GPRs) for integer operands.
• Three-stage floating-point unit with a 32-entry floating-point register (FPR) file.
• System unit
• Load/store unit: provides the usual features, such as single-cycle load and store cache access, effective address generation, zero padding and sign extension. It also provides internal floating-point conversion, sequencing for load/store multiple instructions, and support for both big- and little-endian addressing modes.
• Memory management unit
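This sketch shows what zero padding, sign extension and byte order mean for a 16-bit (halfword) load into a 32-bit register. Python’s `struct` module stands in for the hardware; `load_halfword` is a hypothetical helper, and its big-endian default reflects the PowerPC’s native byte order.

```python
import struct


def load_halfword(memory: bytes, addr: int, signed: bool,
                  big_endian: bool = True) -> int:
    """Load a 16-bit value and return it as a 32-bit register image.

    signed=True models a sign-extending load (upper 16 bits copy bit 15);
    signed=False models a zero-padding load (upper 16 bits are 0).
    """
    fmt = (">" if big_endian else "<") + ("h" if signed else "H")
    value = struct.unpack_from(fmt, memory, addr)[0]
    return value & 0xFFFFFFFF  # view the result as a 32-bit register


mem = bytes([0xFF, 0x80, 0x12, 0x34])
print(hex(load_halfword(mem, 0, signed=False)))                    # zero-padded
print(hex(load_halfword(mem, 0, signed=True)))                     # sign-extended
print(hex(load_halfword(mem, 2, signed=False, big_endian=False)))  # little-endian
```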

The Velocity Engine
• Behind the G4’s performance is its Velocity Engine, which processes data in 128-bit chunks instead of the 32-bit or 64-bit chunks used by traditional processors. It is the 128-bit vector processing technology used in scientific supercomputers, plus 162 new instructions to speed up computations.
• The G4 can perform four 32-bit floating-point calculations in a single cycle, or eight when each multiply-add is counted as two operations. Apple claimed this is two to four times faster than the processors found in PCs.
• See the diagram on the next page.

Structural Overview For G4 Velocity Engine Technology:
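The “four (in some cases eight)” figure can be modeled in software as one lane-wise multiply-add across four 32-bit lanes. This is a data-flow sketch only, not the Velocity Engine’s instruction set: Python floats are 64-bit and the multiply and add round separately, so results are not bit-exact with single-precision hardware. `vector_madd` and `flops` are hypothetical helpers.

```python
from typing import List

LANES = 4  # four 32-bit floats per 128-bit vector register


def vector_madd(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Lane-wise a*b + c on four-element vectors (software model of one SIMD op)."""
    if not len(a) == len(b) == len(c) == LANES:
        raise ValueError("each operand must have exactly four lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


def flops(num_ops: int, multiply_add: bool) -> int:
    """Floating-point operations done by num_ops vector instructions."""
    per_lane = 2 if multiply_add else 1  # a multiply-add counts as two operations
    return num_ops * LANES * per_lane


result = vector_madd([1.0, 2.0, 3.0, 4.0], [0.5] * 4, [1.0] * 4)
print(result)
print(flops(1, multiply_add=False), flops(1, multiply_add=True))
```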

Applications
• G4: resource-consuming software has been tested and compared. Adobe Photoshop 6 (20 Actions), time in seconds (shorter is faster):
  Athlon 1.4GHz       48
  Pentium 4 1.8GHz    59
  Dual G4/1000        47
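The Photoshop timings can be turned into relative speedups by dividing each machine’s time by the G4’s time. The numbers below are exactly those on the slide; note that this is a single benchmark and that the G4 entry is a dual-processor system.

```python
# Adobe Photoshop 6 (20 Actions), time in seconds, from the slide.
times = {
    "Athlon 1.4GHz": 48,
    "Pentium 4 1.8GHz": 59,
    "Dual G4/1000": 47,
}

g4 = times["Dual G4/1000"]
for machine, seconds in times.items():
    # A speedup above 1.00 means the dual G4 finished faster than this machine.
    print(f"{machine:<17} {seconds:>3} s   G4 speedup: {seconds / g4:.2f}x")
```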

• Digital media production and streaming perform very well on the G4 architecture: by using the G4’s processing power, the highest-quality streaming audio available can be produced in significantly less time.

Memory
• The G4 microprocessor contains separate memory management units (MMUs) for instructions and data, supporting 4 petabytes (2^52 bytes) of virtual memory and 4 gigabytes (2^32 bytes) of physical memory.
• The MMUs also provide four instruction block address translation (IBAT) and four data block address translation (DBAT) registers.
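The 2^52-byte virtual space comes from combining a 32-bit effective address with a segment identifier. This sketch assumes the classic 32-bit PowerPC segmented scheme: the top 4 bits of the effective address select one of 16 segment registers, each holding a 24-bit virtual segment ID (VSID), which is joined to the remaining 28 offset bits. The segment register contents and the `effective_to_virtual` helper are hypothetical.

```python
def effective_to_virtual(ea: int, segment_regs: list) -> int:
    """Map a 32-bit effective address to a 52-bit virtual address (sketch)."""
    if not 0 <= ea < 2 ** 32:
        raise ValueError("effective address must fit in 32 bits")
    sr_index = ea >> 28                       # top 4 bits pick a segment register
    offset = ea & 0x0FFF_FFFF                 # low 28 bits: offset in the segment
    vsid = segment_regs[sr_index] & 0xFF_FFFF  # 24-bit virtual segment ID
    return (vsid << 28) | offset              # 24 + 28 = 52 bits


segment_regs = [0] * 16
segment_regs[3] = 0xABCDE1                    # hypothetical VSID for segment 3
va = effective_to_virtual(0x3012_3456, segment_regs)
print(hex(va))
print(f"virtual:  2**52 = {2**52:,} bytes ({2**52 // 2**50} PiB)")
print(f"physical: 2**32 = {2**32:,} bytes ({2**32 // 2**30} GiB)")
```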

Buses Provided By The G4
• The G4 has separate 32-bit address and 64-bit data buses, each with its own set of arbitration and control signals. This decouples the data tenure of a transaction from its address tenure and allows a wide range of system bus implementations.
• Two interface protocols are supported: the 60x bus interface and the MPX bus interface. The 60x protocol implements the standard 32-bit PowerPC bus interface, while the MPX protocol adds several features that provide higher memory bandwidth and more efficient use of the system bus in a multiprocessing environment.
• These protocols are intended to make the most of the Velocity Engine’s features and to reduce data and instruction transfer times.
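The benefit of decoupling data tenures from address tenures can be seen with an idealized timing model in which the next transaction’s address tenure overlaps the current data tenure. The cycle counts are hypothetical, and the model ignores arbitration, wait states and bus turnaround; `bus_cycles` is a hypothetical helper, not a description of the 60x or MPX protocols.

```python
def bus_cycles(transactions: int, addr_cycles: int, data_cycles: int,
               decoupled: bool) -> int:
    """Idealized bus time for back-to-back transactions.

    Coupled: each transaction runs its address tenure, then its data tenure.
    Decoupled: address and data tenures form a two-stage pipeline, so after
    the first transaction the slower of the two tenures sets the pace.
    """
    if transactions <= 0:
        return 0
    if not decoupled:
        return transactions * (addr_cycles + data_cycles)
    return addr_cycles + data_cycles + (transactions - 1) * max(addr_cycles, data_cycles)


for mode in (False, True):
    label = "decoupled" if mode else "coupled"
    print(f"{label:<9}: {bus_cycles(4, 2, 4, mode)} cycles for 4 transactions")
```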

Balance of Power
• Optimum performance requires efficient operation at every level of the system architecture. To enhance performance at the system level, the G4 architecture is designed to handle the high volume of system traffic that complex processing generates. The major features of this balanced design are reduced memory traffic, integrated high-speed I/O and a fast, direct PCI bus.

To Sum It Up:
• The G4 processor is an innovative design that grew out of increased demand for speed in processor-intensive tasks, combining impressive memory management with a short pipeline.
• Because the G4 pipeline is short, the processor recovers from pipeline bubbles more quickly, which raises processor utilization.
• Fewer pipeline stages, faster recovery and higher utilization together maximize processor output.

HAVE A NICE SUMMER! RON WEINWURZEL