IA-64 Architecture Muammer YÜZÜGÜLDÜ CMPE 511 16/12/2004.

IA-64 Architecture Muammer YÜZÜGÜLDÜ CMPE 511 16/12/2004

IA-64 Architecture
- Agenda
  - Architecture Basics
  - Predication
  - Speculation
  - Software Pipelining

Architectural Basics
- A full 64-bit address space
- Large directly accessible register files
- Enough instruction bits to communicate information from the compiler to the hardware
- The ability to express arbitrarily large amounts of ILP

Register Resources
- 128 64-bit general registers
- 128 82-bit floating-point registers
- Space for up to 128 64-bit special-purpose application registers
- 8 64-bit branch registers for function call linkage and return
- 64 one-bit predicate registers that hold the results of conditional expression evaluation

Register Resources

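Instruction Encoding
- Instructions are packed into 128-bit bundles: three 41-bit instruction slots plus a 5-bit template
- A typical instruction uses 14 bits for the opcode, 7 bits for each of three register fields (enough to address 128 registers), and 6 bits to select a predicate register
- The 5-bit template helps decode and route instructions and indicates the location of stops, which mark the end of groups of instructions that can execute in parallel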
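The bundle format can be made concrete with a short sketch. The Python below is illustrative only. It assumes the IA-64 bundle layout of a 5-bit template in the low bits followed by three 41-bit slots, and a simplified register-register instruction form (6-bit qualifying predicate, three 7-bit register fields, 14 remaining opcode bits). All function names are hypothetical, and real instruction formats subdivide the opcode bits further.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1


def encode_slot(opcode, r1, r2, r3, qp=0):
    """Pack one simplified 41-bit instruction (assumed field layout)."""
    assert opcode < (1 << 14) and max(r1, r2, r3) < 128 and qp < 64
    return (opcode << 27) | (r3 << 20) | (r2 << 13) | (r1 << 6) | qp


def decode_slot(slot):
    """Split a 41-bit slot back into its fields."""
    return {
        "qp": slot & 0x3F,             # 6 bits: 1 of 64 predicate registers
        "r1": (slot >> 6) & 0x7F,      # 7 bits: 1 of 128 registers
        "r2": (slot >> 13) & 0x7F,
        "r3": (slot >> 20) & 0x7F,
        "opcode": (slot >> 27) & 0x3FFF,  # remaining 14 bits
    }


def make_bundle(template, slots):
    """Combine a 5-bit template and three slots into one 128-bit integer."""
    assert template < 32 and len(slots) == 3
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle


def split_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) for a 128-bit bundle."""
    template = bundle & 0x1F
    slots = [
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots
```

The arithmetic checks out: 5 + 3 x 41 = 128 bits per bundle, and 6 + 3 x 7 + 14 = 41 bits per instruction.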

Instruction Encoding

IA-64 Virtual Memory Model
- Can map 16 million Gbytes of virtual space
- The top three bits (63:61) of a virtual address index into eight region registers that contain 24-bit region identifiers (RIDs)
- The 24-bit RID is concatenated with the virtual page number (VPN) to form a unique lookup into the translation look-aside buffer (TLB)

IA-64 Virtual Memory Model
- The TLB lookup generates two main items: the physical page number and the access privileges
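The lookup path described on these slides can be sketched in a few lines of Python. This is a simplified model, not the hardware algorithm: the 16 KB page size, the dictionary-based TLB, and the names `translate`, `region_regs`, and `tlb` are assumptions made for illustration.

```python
PAGE_SHIFT = 14          # assumed 16 KB pages; IA-64 supports several sizes
REGION_SHIFT = 61        # bits 63:61 select one of eight region registers


def translate(va, region_regs, tlb):
    """Translate a 64-bit virtual address using region registers and a TLB.

    region_regs: list of eight region identifiers (RIDs)
    tlb: dict mapping (rid, vpn) -> (physical page number, access rights)
    """
    vrn = (va >> REGION_SHIFT) & 0x7                  # virtual region number
    rid = region_regs[vrn]                            # region identifier
    vpn = (va & ((1 << REGION_SHIFT) - 1)) >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)

    entry = tlb.get((rid, vpn))                       # RID + VPN form the key
    if entry is None:
        raise LookupError("TLB miss")
    ppn, rights = entry
    return (ppn << PAGE_SHIFT) | offset, rights
```

Because the RID is part of the lookup key, two processes can use the same virtual page number without their translations colliding in the TLB.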

IA-64 virtual memory Model

Predication
- Removes branches, converts to predicated execution
  - Executes multiple paths simultaneously
- Increases performance by exposing parallelism and reducing critical path
  - Better utilization of wider machines
  - Reduces mispredicted branches
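A small simulation may help show what "converts to predicated execution" means. The Python below is a sketch, not IA-64 code: a compare writes two complementary predicates, both arms of the original if/else are issued, and an instruction whose qualifying predicate is false simply has no effect. The function names and the tuple-based instruction format are hypothetical.

```python
def with_branch(a, b):
    """Reference version: ordinary control flow."""
    if a > b:
        return a - b
    return a + b


def with_predicates(a, b):
    """If-converted version: no branch, both arms are issued."""
    regs = {"a": a, "b": b, "x": None}
    preds = [True] + [False] * 63      # p0 always reads as true

    # One compare sets a predicate and its complement.
    preds[1] = a > b
    preds[2] = not preds[1]

    # Each instruction: (qualifying predicate, destination, operation).
    program = [
        (1, "x", lambda r: r["a"] - r["b"]),   # (p1) x = a - b
        (2, "x", lambda r: r["a"] + r["b"]),   # (p2) x = a + b
    ]
    for qp, dest, op in program:
        if preds[qp]:                  # a false predicate nullifies the write
            regs[dest] = op(regs)
    return regs["x"]
```

The two guarded instructions do not depend on each other, so a wide machine can issue them in the same cycle, and there is no branch left to mispredict.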

Predication

Predication Benefits
- Reduces branches and mispredict penalties
  - 50% fewer branches and 37% faster code*
- Parallel compares further reduce critical paths
- Greatly improves code with hard-to-predict branches
  - Large server apps: capacity limited
  - Sorting, data mining: large database apps
  - Data compression
- Traditional architectures' "bolt-on" approach can't efficiently approximate predication
  - Cmove: 39% more instructions, 23% slower performance*
  - Instructions must all be speculative

Data Speculation
- Compiler can issue a load prior to a preceding, possibly conflicting store

Architectural Support for Data Speculation
- Instructions
  - ld.a: advanced loads
  - ld.c: check loads
  - chk.a: advanced load checks
- Speculative advanced load (ld.sa): an advanced load with deferral
- ALAT (Advanced Load Address Table): HW structure containing outstanding advanced loads
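The interaction between advanced loads, stores, and check loads can be modeled roughly as follows. This Python sketch makes several simplifying assumptions: the ALAT is a dictionary keyed by target register, stores match on exact address only, and a check load always clears its entry. The class and method names are hypothetical.

```python
class Alat:
    """Toy model of the table that tracks outstanding advanced loads."""

    def __init__(self):
        self.entries = {}     # target register name -> load address
        self.reloads = 0      # how many check loads had to re-read memory

    def ld_a(self, mem, reg, addr):
        """Advanced load: read early and remember (reg, addr)."""
        self.entries[reg] = addr
        return mem[addr]

    def store(self, mem, addr, value):
        """Store: write memory and invalidate matching entries."""
        mem[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def ld_c(self, mem, reg, addr, speculative_value):
        """Check load: keep the early value if the entry survived."""
        if self.entries.pop(reg, None) == addr:
            return speculative_value
        self.reloads += 1     # a conflicting store intervened: reload
        return mem[addr]


mem = {0x100: 7, 0x200: 1}
alat = Alat()
early = alat.ld_a(mem, "r8", 0x100)    # load hoisted above the store
alat.store(mem, 0x200, 9)              # store to a different address
value = alat.ld_c(mem, "r8", 0x100, early)
```

In this run the store does not alias the load, so the check load keeps the early value and no reload occurs. A store to 0x100 instead would remove the entry and force the reload.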

Speculation Benefits
- Reduces impact of memory latency
  - Study demonstrates performance improvement of 79% when combined with predication*
- Greatest improvement to code with many cache accesses
  - Large databases
  - Operating systems
- Scheduling flexibility enables new levels of performance headroom

Speculation Drawbacks
- Drawbacks even if speculation is correct
  - Registers used speculatively must be kept alive until the check (increases register pressure)
  - For each speculation, recovery code is needed, which increases code size
- Drawbacks if speculation is incorrect
  - Recovery code has to be executed; additional cycles
  - Recovery code may not be in cache; loading delay

Software Pipelining
- Overlapping execution of different loop iterations
- More iterations in same amount of time

Software Pipelining
- IA-64 features that make this possible
  - Full predication
  - Special branch handling features
  - Register rotation: removes loop copy overhead
  - Predicate rotation: removes prologue & epilogue
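To see how rotation removes copies, prologue, and epilogue, consider a three-stage loop (load, multiply, store) run as a single kernel. The Python below is a simplified model under stated assumptions: rotation is done by shifting small lists at the end of each kernel iteration, and the first stage predicate is refilled from a simple iteration count rather than the hardware loop counters. The function names are hypothetical.

```python
STAGES = 3   # stage 0: load, stage 1: multiply, stage 2: store


def scale_plain(src, k):
    """Reference version: one element at a time."""
    return [v * k for v in src]


def scale_pipelined(src, k):
    """Software-pipelined version: one kernel, no prologue or epilogue."""
    n = len(src)
    dst = [None] * n
    rot = [None] * STAGES             # rotating data registers
    preds = [n > 0, False, False]     # rotating stage predicates

    for t in range(n + STAGES - 1):
        # The three stages work on three different iterations at once.
        if preds[0]:
            rot[0] = src[t]           # load element t
        if preds[1]:
            rot[1] = rot[1] * k       # multiply element t - 1
        if preds[2]:
            dst[t - 2] = rot[2]       # store element t - 2

        # Rotation at the loop branch: every value moves up one register,
        # so no explicit copy instructions are needed between stages.
        rot = [None] + rot[:-1]
        # Stage predicates rotate too; stage 0 stays on while elements remain.
        preds = [t + 1 < n] + preds[:-1]
    return dst
```

During the first two kernel iterations the later stage predicates are still false, which plays the role of a prologue. During the last two, the load predicate is false, which plays the role of an epilogue. The same kernel code handles all three phases.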

Software Pipelining Benefits
- Loop pipelining maximizes performance; minimizes overhead
  - Avoids code expansion of unrolling and code explosion of prologue and epilogue
  - Smaller code means fewer cache misses
  - Greater performance improvements in higher latency conditions
- Reduced overhead allows S/W pipelining of small loops with unknown trip counts
  - Typical of integer scalar codes

Performance
- Backward compatible with previous instruction sets (RISC – IA32) through emulation, although emulated code performs poorly
- IA-64 code (EPIC instruction set) will run on any member of the Itanium family
- To get optimum performance, code must be recompiled with processor-specific information (different numbers of functional units, pipeline changes)
- Itanium 2 is two times faster than Itanium

Performance

Target Market
- High-end servers
- Database machines
- Development shops
- NOT suitable for home PCs

Questions?