IA-64 Microarchitecture --- Itanium Processor

Slides:



Advertisements
Similar presentations
CS 7810 Lecture 22 Processor Case Studies, The Microarchitecture of the Pentium 4 Processor G. Hinton et al. Intel Technology Journal Q1, 2001.
Advertisements

EZ-COURSEWARE State-of-the-Art Teaching Tools From AMS Teaching Tomorrow’s Technology Today.
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
1 Lecture 18: VLIW and EPIC Static superscalar, VLIW, EPIC and Itanium Processor (First introduce fast and high- bandwidth L1 cache design)
Computer Architecture Lec 8 – Instruction Level Parallelism.
DAP.F96 1 Lecture 4: Hazards, Introduction to Compiler Techniques, Chapter 2.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
CS 211: Computer Architecture Lecture 5 Instruction Level Parallelism and Its Dynamic Exploitation Instructor: M. Lancaster Corresponding to Hennessey.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
Instruction Level Parallelism (ILP) Colin Stevens.
Mult. Issue CSE 471 Autumn 011 Multiple Issue Alternatives Superscalar (hardware detects conflicts) –Statically scheduled (in order dispatch and hence.
Chapter 15 IA-64 Architecture No HW, Concentrate on understanding these slides Next Monday we will talk about: Microprogramming of Computer Control units.
EECS 470 Superscalar Architectures and the Pentium 4 Lecture 12.
DAP.F96 1 Lecture 9: Introduction to Compiler Techniques Chapter 4, Sections L.N. Bhuyan CS 203A.
The PowerPC Architecture  IBM, Motorola, and Apple Alliance  Based on the IBM POWER Architecture ­Facilitate parallel execution ­Scale well with advancing.
Chapter 15 IA-64 Architecture. Reflection on Superscalar Machines Superscaler Machine: A Superscalar machine employs multiple independent pipelines to.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)
Intel IA-64 Architecture Chehun Kim Glenn Ramos. Contents *Pipelining - Stages of pipelining *Microprogramming *Interconnection Structures.
Chapter 15 IA-64 Architecture or (EPIC – Extremely Parallel Instruction Computing)
Intel Pentium 4 Processor Presented by Presented by Steve Kelley Steve Kelley Zhijian Lu Zhijian Lu.
Intel Itanium Matt Layman Adam Sanders Aaron Still.
Lect 13-1 Lect 13: and Pentium. Lect Microprocessor Family  Microprocessor  Introduced in 1989  High Integration  On-chip 8K.
Intel Itanium IA64 - Merced Barbora Petrtýlová Tomáš Kubeš LS 2002/2003 Presentaiton for E36APS presented:
IA-64 ISA A Summary JinLin Yang Phil Varner Shuoqi Li.
Basic Microcomputer Design. Inside the CPU Registers – storage locations Control Unit (CU) – coordinates the sequencing of steps involved in executing.
Is Out-Of-Order Out Of Date ? IA-64’s parallel architecture will improve processor performance William S. Worley Jr., HP Labs Jerry Huck, IA-64 Architecture.
Anshul Kumar, CSE IITD CS718 : VLIW - Software Driven ILP Example Architectures 6th Apr, 2006.
Hardware Support for Compiler Speculation
Spring 2003CSE P5481 VLIW Processors VLIW (“very long instruction word”) processors instructions are scheduled by the compiler a fixed number of operations.
CASH: REVISITING HARDWARE SHARING IN SINGLE-CHIP PARALLEL PROCESSOR
1 Chapter 2: ILP and Its Exploitation Review simple static pipeline ILP Overview Dynamic branch prediction Dynamic scheduling, out-of-order execution Hardware-based.
Introducing The IA-64 Architecture - Kalyan Gopavarapu - Kalyan Gopavarapu.
OOE vs. EPIC Emily Evans Prashant Nagaraddi Lin Gu.
Dynamic Pipelines. Interstage Buffers Superscalar Pipeline Stages In Program Order In Program Order Out of Order.
StaticILP.1 2/12/02 Static ILP Static (Compiler Based) Scheduling Σημειώσεις UW-Madison Διαβάστε κεφ. 4 βιβλίο, και Paper on Itanium στην ιστοσελίδα.
André Seznec Caps Team IRISA/INRIA 1 High Performance Microprocessors André Seznec IRISA/INRIA
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
Pentium Architecture Arithmetic/Logic Units (ALUs) : – There are two parallel integer instruction pipelines: u-pipeline and v-pipeline – The u-pipeline.
1 Lecture 12: Advanced Static ILP Topics: parallel loops, software speculation (Sections )
Lecture 1: Introduction Instruction Level Parallelism & Processor Architectures.
Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001 ACSAC-2001 Gold Coast, January 30, 2001.
ALPHA 21164PC. Alpha 21164PC High-performance alternative to a Windows NT Personal Computer.
Use of Pipelining to Achieve CPI < 1
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
CS 352H: Computer Systems Architecture
Protection in Virtual Mode
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue
Out of Order Processors
/ Computer Architecture and Design
Introduction to Pentium Processor
The Microarchitecture of the Pentium 4 processor
Superscalar Processors & VLIW Processors
The EPIC-VLIW Approach
Superscalar Pipelines Part 2
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Comparison of Two Processors
Yingmin Li Ting Yan Qi Zhao
Alpha Microarchitecture
Lecture 8: Dynamic ILP Topics: out-of-order processors
Lecture 23: Static Scheduling for High ILP
Sampoorani, Sivakumar and Joshua
CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue
Chapter 3: ILP and Its Exploitation
8 – Simultaneous Multithreading
Presentation transcript:

IA-64 Microarchitecture --- Itanium Processor Jun Feng Jun Xie Huafeng Lü

Outline Introduction Pipeline Issue Performance Comparison Summary

Itanium Processor First implementation of IA-64 Compiler based exploitation of ILP Also has many features of superscalar

10-stage Pipeline Front-end Instruction delivery Operand delivery Execution

Front-end IPG, Fetch, Rotate Prefetches up to 32 bytes per cycle (2 bundles) into a prefetch buffer (up to hold 8 bundles) Branch prediction is done using a multilevel adaptive predictor

Instruction delivery EXP and REN Distributes up to 6 instructions to the 9 functional units Implements registers renaming for both rotation and register stacking

Operand delivery WLD and REG Accesses the register file Performs register bypassing Accesses and updates a register scoreboard Checks predicate dependences

Execution EXE, DET and WRB Executes instructions through ALUs and load/store units Detects exceptions and posts NaTs Retires instructions and performs write-back

Integer Performance SPECint benchmark: considerably slower Itanium is considerably slower than Alpha 21264 and Pentium 4. Only: 60% of of P4, 68% of Alpha Itanium: HP rx4610, 800MHz, 4MB off-chip L3 cache Alpha 21264: Compaq GS320, 1GHz, on-chip L2 cache Pentium 4: Compaq Precision 330, 2GHz, 256KB on-chip L2 cache

Floating Point Performance SPECfp benchmarks: a different story Itanium is quicker than Alpha 21264 and Pentium 4. 108% of of P4, 120% of Alpha Itanium: HP rx4610, 800MHz, 4MB off-chip, L3 cache Alpha 21264: Compaq GS320, 1GHz, on-chip L2 cache Pentium 4: Compaq Precision 330, 2GHz, on-chip L2 cache

Discussion on SPECfp Floating point app: competitive .higher degrees of ILP .aggressive memory system Art benchmark: 4 times of Pentium 4 Alpha: outperform when tuned In terms of power: worse than P4 56% of floating point performance per watt

Summary By Us Good floating point performance Poor integer performance Overall: not so good as Intel has advertised

Conclusion Large code size Only static instruction-level parallelism Cannot manage cache misses/hits flexibly Lack of applications