NYU DARPA DIS kick-off, September 24, 1999: Comparing IA-64 and HPL-PD

Presentation transcript:


DARPA DIS kick-off September 24, Overview  IA-64 has a number of novel features for supporting ILP: –Predication –Data Speculation –Control Speculation –Software Pipelining –Compiler-directed Caching  These features all exist in HPL-PD! –also great similarity in ISA (arithmetic, logic operations, etc). –there are few extensions –Multimedia Instructions –Semaphore Instructions

Predication Support
• IA-64 and HPL-PD (Trimaran) both support conditional execution of instructions through predicate registers and instructions that manipulate them.
• Both support "parallel" compare operations
  – i.e., assigning to two predicate registers simultaneously
  – through a modifier in HPL-PD
  – through a completer in IA-64
  – wired-and, wired-or
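The predication ideas on this slide can be illustrated with a small Python model. This is only a sketch, not either architecture's actual semantics: the function names (`cmp_gt_two_targets`, `predicated_move`, `wired_or`, `if_converted_max`) are hypothetical, and predicates are modeled as plain booleans.

```python
def cmp_gt_two_targets(a, b):
    """Model a compare that writes a predicate and its complement at once."""
    taken = a > b
    return taken, not taken


def predicated_move(pred, dest, value):
    """Return the new destination value; it changes only if the guard is true."""
    return value if pred else dest


def wired_or(conditions):
    """Model several compares that may only set (never clear) one predicate."""
    p = False  # the predicate is cleared before the compares run
    for cond in conditions:
        if cond:
            p = True  # order of the compares does not matter
    return p


def if_converted_max(a, b):
    # Branchy original: if (a > b) x = a; else x = b;
    p1, p2 = cmp_gt_two_targets(a, b)
    x = 0
    x = predicated_move(p1, x, a)  # (p1) mov x = a
    x = predicated_move(p2, x, b)  # (p2) mov x = b
    return x
```

Both guarded moves are issued on every path; the predicates decide which one takes effect, so the branch disappears from the instruction stream.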

Control Speculation
• Control speculation is supported in both IA-64 and HPL-PD with the same semantics.
• IA-64
  – Each GPR includes a 1-bit speculation tag (the NaT bit)
  – FPRs use a special encoding called NaTVal, so no extra bit is needed
  – Only load instructions have a control-speculative version
  – A verification instruction is needed for exception handling
• HPL-PD
  – Both GPRs and FPRs have a speculation tag (an extra bit, like NaT in IA-64)
  – All integer and floating-point instructions have control-speculative versions
  – Exceptions are tracked automatically by the hardware
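A rough Python model of the deferred-exception idea described above: a speculative load that would fault sets a tag instead of raising, later operations propagate the tag, and a check runs recovery code only if the value is actually needed. The names (`Reg`, `spec_load`, `check`, `guarded_increment`) and the dictionary standing in for memory are assumptions made for illustration, not part of either ISA.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    value: int = 0
    nat: bool = False  # speculation tag: True means "deferred exception"


MEMORY = {0x100: 41}  # toy memory; any other address counts as a fault


def load(addr):
    """Non-speculative load: faults immediately on a bad address."""
    if addr not in MEMORY:
        raise MemoryError(f"bad address {addr:#x}")
    return Reg(MEMORY[addr])


def spec_load(addr):
    """Control-speculative load: defers the fault by setting the tag."""
    if addr not in MEMORY:
        return Reg(0, nat=True)
    return Reg(MEMORY[addr])


def add(a, b):
    """Arithmetic propagates the tag from either source operand."""
    return Reg(a.value + b.value, a.nat or b.nat)


def check(reg, recovery):
    """Verification step: run recovery code only if the tag is set."""
    return recovery() if reg.nat else reg


def guarded_increment(addr, cond):
    # Original code: if cond: r = mem[addr] + 1
    t = spec_load(addr)  # load hoisted above the branch
    r = add(t, Reg(1))   # dependent use hoisted as well
    if cond:
        r = check(r, lambda: add(load(addr), Reg(1)))
        return r.value
    return None          # speculated result is simply discarded
```

When `cond` is false, a bad address causes no exception at all, which is the point of hoisting the load speculatively.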

[Figure: Control Speculation, IA-64 example]

[Figure: Control Speculation, HPL-PD example]

Data Speculation
• Data speculation is supported in both IA-64 and HPL-PD in a similar manner,
  – i.e., moving a load above a store that may write to the same address.
• IA-64
  – Supports load checking (ld.c) as well as checking with recovery (chk.a)
  – With chk.a, the compiler can move up not only the load (the definition) but also one or more of its uses
• HPL-PD
  – Also supports recovery in load checking (BRDV)
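The following Python sketch models the mechanism in a simplified way: an advanced load records its address in a small table, any store to the same address removes the entry, and the check runs recovery code when the entry is gone. The `Machine` class, its method names, and the register names are hypothetical; the table is only loosely modeled on IA-64's advanced load address table.

```python
class Machine:
    def __init__(self, memory):
        self.mem = dict(memory)
        self.regs = {}
        self.table = {}  # register name -> address of its advanced load

    def advanced_load(self, reg, addr):
        """Load early and remember which address the register came from."""
        self.regs[reg] = self.mem[addr]
        self.table[reg] = addr

    def store(self, addr, value):
        """Store, invalidating every table entry that matches the address."""
        self.mem[addr] = value
        for reg in [r for r, a in self.table.items() if a == addr]:
            del self.table[reg]

    def check_advanced(self, reg, recovery):
        """Run recovery code if an intervening store hit the same address."""
        if reg not in self.table:
            recovery(self)
            return True
        return False


def run(p, q):
    # Original order: mem[p] = 5; x = mem[q] + 1
    m = Machine({0x10: 1, 0x20: 2})
    m.advanced_load("r1", q)         # load hoisted above the store
    m.regs["r2"] = m.regs["r1"] + 1  # its use hoisted as well
    m.store(p, 5)

    def recovery(mach):
        # Redo the load and every hoisted use.
        mach.regs["r1"] = mach.mem[q]
        mach.regs["r2"] = mach.regs["r1"] + 1

    recovered = m.check_advanced("r1", recovery)
    return m.regs["r2"], recovered
```

With distinct addresses the speculated result stands; when `p` and `q` alias, the check detects the conflict and the recovery code recomputes the result from the stored value.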

[Figure: Data Speculation Examples, IA-64 and HPL-PD]

[Figure: Data Speculation Recovery Examples (IA-64)]

[Figure: Data Speculation Recovery Examples (HPL-PD)]

Compiler Directed Cache
• The memory hierarchy is visible to the compiler in both HPL-PD and IA-64.
• IA-64
  – The compiler can supply hints in store, load, and prefetch instructions on where in the cache hierarchy the data will be found or left.
  – For prefetching, the "lfetch" instruction requests that cache lines be moved between different levels of the memory hierarchy.
  – lfetch maintains cache coherence
• HPL-PD
  – The compiler can also supply hints in store and load instructions
  – Prefetching is simply a load to R0

[Figure: Compiler Directed Cache, IA-64]

[Figure: Compiler Directed Cache, HPL-PD]

Support for Software Pipelining
• Both IA-64 and Trimaran implement rotating registers, loop counters, and epilogue counters in combination with predication.
  – Used to implement modulo scheduling of loops.

Software Pipelining Example: HPL-PD
[Figure: example of software pipelining in Trimaran. Caption: "A slice executed as a single VLIW instruction." Taken from the Trimaran Tutorial.]

Software Pipelining on the IA-64
C source:
    for (i=0; i<n; i++) y[i] = x[i] + 1;
IA-64 code (taken from the Intel web tutorial):
    loop:
      (p16) ld1 r32 = [r12],1
      (p17) add r34 = 1, r33
      (p18) st1 [r13] = r35,1
      br.ctop loop
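To see how rotating registers, stage predicates, and the loop and epilogue counters cooperate in the three-stage loop above (load, add, store), here is a simplified Python simulation. It is a sketch under stated assumptions: `rot` stands in for r32..r35, `pred` for the three rotating stage predicates, `lc` and `ec` for the loop and epilogue counters, and list indices replace the post-incremented pointers. Values are treated as plain integers rather than bytes.

```python
def pipelined_increment(x):
    """Compute y[i] = x[i] + 1 with a 3-stage software pipeline."""
    n = len(x)
    y = [None] * n
    if n == 0:
        return y

    stages = 3
    rot = [0, 0, 0, 0]            # models rotating registers r32..r35
    pred = [True, False, False]   # stage predicates: load, add, store
    lc, ec = n - 1, stages        # loop counter and epilogue counter
    src = dst = 0                 # stand-ins for the post-incremented pointers

    while True:
        if pred[0]:               # stage 1: load the next element
            rot[0] = x[src]
            src += 1
        if pred[1]:               # stage 2: add 1 to last iteration's load
            rot[2] = rot[1] + 1
        if pred[2]:               # stage 3: store last iteration's sum
            y[dst] = rot[3]
            dst += 1

        # Model of the counted loop branch at the bottom of the kernel.
        if lc > 0:                # kernel: start another source iteration
            lc -= 1
            new_pred, again = True, True
        elif ec > 1:              # epilogue: drain the pipeline
            ec -= 1
            new_pred, again = False, True
        else:                     # last stage finished: fall through
            ec -= 1
            new_pred, again = False, False

        # Rotation: what was written to r32 is read as r33 next time.
        rot = [rot[-1]] + rot[:-1]
        pred = [new_pred] + pred[:-1]
        if not again:
            return y
```

For n elements the kernel runs n + 2 times: the first two passes fill the pipeline and the last two drain it, with the predicates switching stages on and off so no separate prologue or epilogue code is needed.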

Differences
• Multimedia instructions
• Semaphore instructions
• Register stack engine

Register Stack Engine
• IA-64 implements a mechanism called the register stack engine (RSE) that manages the dynamic allocation of stack frames using registers gpr32-gpr127.
• The operations of the RSE are transparent to the software.
• It ensures that the contents of registers are always available.

Multimedia Instructions
• IA-64 has multimedia instructions that treat a GPR as a concatenation of eight 8-bit, four 16-bit, or two 32-bit elements and operate on each element independently and in parallel.
  – Inspired by MMX
• The instructions include
  – parallel addition and subtraction
  – parallel average
  – parallel shift left and add
  – parallel compare
  – parallel multiply right
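As an illustration of operating on eight 8-bit elements packed in one 64-bit word, the Python sketch below performs a wraparound parallel add using ordinary integer arithmetic. The function names (`pack`, `unpack`, `parallel_add_bytes`) are hypothetical, and only the modular (non-saturating) form is modeled.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every byte lane
HIGH = 0x8080808080808080   # top bit of every byte lane


def pack(lanes):
    """Pack eight byte values (lane 0 first) into one 64-bit integer."""
    return int.from_bytes(bytes(lanes), "little")


def unpack(word):
    """Split a 64-bit integer back into its eight byte lanes."""
    return list(word.to_bytes(8, "little"))


def parallel_add_bytes(a, b):
    """Add eight 8-bit lanes independently, wrapping within each lane."""
    # Adding only the low 7 bits guarantees no carry crosses a lane boundary.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is then fixed up with a carry-free XOR.
    return low ^ ((a ^ b) & HIGH)
```

For example, `unpack(parallel_add_bytes(pack([255, 0, 0, 0, 0, 0, 0, 0]), pack([1, 0, 0, 0, 0, 0, 0, 0])))` yields all zeros: lane 0 wraps around without carrying into lane 1.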

Semaphore Instructions
• IA-64 has semaphore instructions that
  – atomically load a general register from memory,
  – perform an operation, and
  – then store a result to the same memory location.
• The instructions include
  – exchange
  – compare and exchange
  – fetch and add
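The load, operate, and store-back pattern of these three instructions can be sketched in Python as follows. The `AtomicCell` class and `count_with_threads` helper are hypothetical, and a `threading.Lock` stands in for the atomicity that the hardware provides; each method returns the value that was loaded, as the slide describes.

```python
import threading


class AtomicCell:
    """One memory location supporting read-modify-write operations."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stand-in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def exchange(self, new):
        """Store new and return the previous contents."""
        with self._lock:
            old, self._value = self._value, new
            return old

    def compare_and_exchange(self, expected, new):
        """Store new only if the contents equal expected; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

    def fetch_and_add(self, increment):
        """Add increment to the contents and return the previous value."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old


def count_with_threads(n_threads, per_thread):
    """Have several threads bump one shared counter with fetch_and_add."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(per_thread):
            cell.fetch_and_add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

Because each update is a single indivisible step, no increments are lost even when the threads interleave.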