Intel IA-64 Architecture Chehun Kim Glenn Ramos. Contents *Pipelining - Stages of pipelining *Microprogramming *Interconnection Structures.

Presentation transcript:

Intel IA-64 Architecture Chehun Kim Glenn Ramos

Contents
* Pipelining - Stages of pipelining
* Microprogramming
* Interconnection Structures

Pipelining
Pipelining is a means of introducing parallelism into the essentially sequential nature of a machine-instruction program. IA-64 uses the following methods for branches:
- Prefetching with predication
- Prefetching with rotating registers
- Architectural support for software pipelining
- Modulo scheduling

IA-64 Instruction Pipeline
- The Intel IA-64 architecture's pipeline is divided into 10 stages.
- This design was developed for a single-cycle ALU (4 ALUs, globally bypassed) and for low latency from the data cache.
IPG | FET | ROT | EXP | REN | WLD | REG | EXE | DET | WRB

IA-64 Pipeline
IPG | FET | ROT | EXP | REN | WLD | REG | EXE | DET | WRB
- IPG - Instruction Pointer Generation
- FET - Fetch
- ROT - Rotate
- EXP - Expand
- REN - Rename
- WLD - Word-Line Decode
- REG - Register Read
- EXE - Execute
- DET - Exception Detect
- WRB - Write-Back

IA-64 Pipeline
The IA-64 instruction pipeline's 10 stages are grouped into 4 phases:
- Front End
- Instruction Delivery
- Operand Delivery
- Execution

Front End
- The front end consists of the IPG, FET and ROT stages.
- This phase is responsible for fetching up to 32 bytes (two 128-bit instruction words) into a prefetch buffer.
[IPG | FET | ROT] | EXP | REN | WLD | REG | EXE | DET | WRB

Instruction Delivery
- This phase is composed of the EXP and REN stages.
- Up to 6 instructions are dispersed to 9 functional units.
- Registers are renamed to implement register rotation and register stacking.
IPG | FET | ROT | [EXP | REN] | WLD | REG | EXE | DET | WRB

Operand Delivery
- This phase is composed of the WLD and REG stages.
- The register files are accessed.
- A register scoreboard is accessed and updated. The scoreboard is used to detect when individual instructions can proceed, so that a stall of 1 instruction in a bundle does not cause the entire bundle to stall.
- Dependencies are checked.
IPG | FET | ROT | EXP | REN | [WLD | REG] | EXE | DET | WRB
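The scoreboard idea on this slide can be illustrated with a toy model. This is only a sketch under simple assumptions, not the Itanium hardware logic: the class name `Scoreboard`, the helper `ready_instructions`, and the register names are all hypothetical, and the model tracks only which registers still have a result pending.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Toy scoreboard: remembers registers whose results are not yet available."""
    pending: set = field(default_factory=set)

    def mark_pending(self, dest: str) -> None:
        # Called when an instruction that will write `dest` has been issued.
        self.pending.add(dest)

    def mark_ready(self, dest: str) -> None:
        # Called when the result for `dest` has been produced.
        self.pending.discard(dest)

    def can_issue(self, sources) -> bool:
        # An instruction may proceed only if none of its sources is pending.
        return not any(reg in self.pending for reg in sources)


def ready_instructions(scoreboard: Scoreboard, bundle) -> dict:
    """Check each (name, sources) pair in a bundle independently."""
    return {name: scoreboard.can_issue(sources) for name, sources in bundle}


if __name__ == "__main__":
    sb = Scoreboard()
    sb.mark_pending("r4")  # hypothetical: a load into r4 is still in flight
    bundle = [("add", ["r1", "r2"]), ("sub", ["r4", "r5"]), ("ld", ["r6"])]
    print(ready_instructions(sb, bundle))
```

In this model only the instruction that reads the pending register is held back; the other two in the same bundle are reported as ready, which is the behaviour the slide describes.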

Execution
- This phase consists of the EXE, DET and WRB stages.
- In this phase, instructions are executed through the ALU and load/store units.
- Exceptions are detected and NaTs are posted.
- Instructions are retired and write-backs are performed.
IPG | FET | ROT | EXP | REN | WLD | REG | [EXE | DET | WRB]
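The grouping of the 10 stages into 4 phases, as given on the four phase slides, can be written down as a small lookup table. This is a minimal sketch for reference; the names `STAGES`, `PHASES` and `phase_of` are hypothetical and chosen only for illustration.

```python
# Stage mnemonics in pipeline order, as listed in the slides.
STAGES = ["IPG", "FET", "ROT", "EXP", "REN", "WLD", "REG", "EXE", "DET", "WRB"]

# Phase -> stages, following the Front End, Instruction Delivery,
# Operand Delivery and Execution slides.
PHASES = {
    "Front End": ["IPG", "FET", "ROT"],
    "Instruction Delivery": ["EXP", "REN"],
    "Operand Delivery": ["WLD", "REG"],
    "Execution": ["EXE", "DET", "WRB"],
}


def phase_of(stage: str) -> str:
    """Return the name of the phase that contains the given stage."""
    for phase, stages in PHASES.items():
        if stage in stages:
            return phase
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage}: {phase_of(stage)}")
```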

Microprogramming
- The Intel IA-64 instruction set is based on EPIC, which is derived from VLIW.
- EPIC stands for Explicitly Parallel Instruction Computing.
- VLIW stands for Very Long Instruction Word.
- VLIW can perform multiple operations per cycle using horizontal microinstructions.
- Remember that the IA-64 instruction word is 128 bits long and consists of 3 instructions.
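To make the 128-bit, 3-instruction word concrete, here is a sketch that splits such a word into its fields. The slides give only the total width and the instruction count; the layout used below (a 5-bit template in the low bits followed by three 41-bit slots, 5 + 3 × 41 = 128) is taken from the IA-64 bundle format as commonly documented and should be treated as an assumption here. The function names are hypothetical.

```python
TEMPLATE_BITS = 5   # assumed width of the template field (not stated in the slides)
SLOT_BITS = 41      # assumed width of each instruction slot (not stated in the slides)
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 128

TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots) -> int:
    """Combine a template and three slot values into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly 3 instruction slots")
    value = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        value |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return value


def unpack_bundle(bundle: int):
    """Split a 128-bit integer into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)
    )
    return template, slots


if __name__ == "__main__":
    word = pack_bundle(0x10, (1, 2, 3))  # arbitrary example values
    print(unpack_bundle(word))
```

The slot contents are treated as opaque integers; decoding the individual instructions is outside the scope of this sketch.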

Interconnection Structure

- The processor uses a multidrop, shared system bus to provide four-way glueless multiprocessor system support.
- The 64-bit system bus uses source-synchronous data transfer to achieve 266 Mtransfers/s, which enables a bandwidth of 2.1 Gbytes/s.
- The combination of these features makes the Itanium processor system a scalable building block for large multiprocessor systems.
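The quoted bandwidth follows directly from the bus width and the transfer rate. A quick check, assuming 1 Gbyte = 10^9 bytes and one full 64-bit transfer per data beat (the function name is hypothetical):

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: int) -> int:
    """Peak bandwidth = bytes moved per transfer x transfers per second."""
    return bus_width_bits * transfers_per_s // 8


if __name__ == "__main__":
    bw = peak_bandwidth_bytes_per_s(64, 266_000_000)
    # 8 bytes x 266,000,000 transfers/s = 2,128,000,000 bytes/s
    print(f"{bw / 1e9:.2f} Gbytes/s")
```

The result, about 2.13 Gbytes/s, matches the 2.1 Gbytes/s figure on the slide.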

Interconnection Structure
Source Synchronous Mode
- In source synchronous mode, the clock-to-data phase relationship at the input pins is maintained at the clock and data ports of the IOE input register. This mode is recommended for source-synchronous data transfers. Data and clock signals at the IOE experience similar buffer delays as long as the same I/O standard is used.
Multidrop Bus System
- In multidrop bus systems, when one device transmits, all others within range can receive.

Summary
Pipeline
- IA-64 has 10 stages
Microprogramming
- EPIC
Interconnection Structure
- IA-64 uses source-synchronous data transfer
- The processor uses a multidrop, shared system bus
