Intel Itanium 2 Processor: Intel’s Server Solution. Raymond Ball, April 2, 2004
Presentation Overview: Why Intel Itanium 2 in a DSP class? General specifications and features. Instruction set. DSP in Itanium 2. Itanium 2 vs. TigerSHARC (?)
Why Itanium 2: The Itanium 2 is designed for heavily loaded, number-crunching servers, a workload with some similarities to DSP. It’s always a good idea to see what other solutions are available. Over time, designs tend to borrow ideas from other fields, which may give insight. Other reasons: to see whether the processor’s power is really worth the cost, and because I was interested.
Specifications (April 2004): Clock: GHz; L3 cache: up to 6 MB; 64-bit processor; 128-bit bus (400 MHz); Price: $3k to $5k each; IA-32 “compatible”; Considered RISC; Pipeline: 8 stages deep; 6 instructions per cycle, in 2 bundles of 3; Power consumption: 110 W (130 W max); Registers: 128 general, 128 floating-point, 64 predicate, and 8 branch registers
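The bus figures above determine its peak bandwidth. Assuming one 128-bit transfer per cycle at the quoted 400 MHz (that is, 400 million transfers per second), the arithmetic reproduces the 6.4 GB/s figure quoted for the single bus:

```latex
B_{\text{peak}} = \frac{128\ \text{bits}}{8\ \text{bits/byte}} \times 400 \times 10^{6}\ \text{transfers/s}
               = 16\ \text{B} \times 4 \times 10^{8}\ \text{s}^{-1}
               = 6.4 \times 10^{9}\ \text{B/s} = 6.4\ \text{GB/s}
```

This is a peak number; sustained bandwidth is lower once protocol overhead and bus turnarounds are counted.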
Register Stack Engine (RSE): The first 32 general registers (GR0 to GR31) are global (static). GR0 is hardwired to 0; I’ve seen this in SHARC, because an immediate would otherwise stall the pipeline. The remaining 96 registers (GR32 to GR127) are stacked: each procedure gets its own local register frame starting at GR32. If more room is needed, older frames are spilled to memory and restored when the procedure returns, transparently maintaining the illusion of an infinite number of registers. This applies only to the GRs (the other registers are all global).
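To make the spill/fill idea concrete, here is a toy model of a register stack. It is a minimal sketch, not the real RSE algorithm: the class name `RegisterStackSketch` is hypothetical, frames are spilled and filled whole (the hardware works register by register and can do it in the background), and the 96-register capacity is taken from the slide’s stacked-register count.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterStackSketch:
    """Toy register stack: whole frames spill to memory when registers run out."""
    physical: int = 96  # stacked registers (GR32..GR127), per the slide
    resident: list = field(default_factory=list)       # frame sizes in registers, oldest first
    backing_store: list = field(default_factory=list)  # spilled frame sizes, oldest first
    spills: int = 0
    fills: int = 0

    def in_use(self) -> int:
        return sum(self.resident)

    def call(self, frame_size: int) -> None:
        if frame_size > self.physical:
            raise ValueError("frame larger than the physical register stack")
        # Spill the oldest resident frames until the callee's frame fits.
        while self.in_use() + frame_size > self.physical:
            self.backing_store.append(self.resident.pop(0))
            self.spills += 1
        self.resident.append(frame_size)

    def ret(self) -> None:
        self.resident.pop()  # discard the callee's frame
        # If the caller's frame was spilled, fill it back from memory.
        if not self.resident and self.backing_store:
            self.resident.append(self.backing_store.pop())
            self.fills += 1


rs = RegisterStackSketch()
for _ in range(3):      # three nested calls, 40 registers each
    rs.call(40)
rs.ret()
rs.ret()
```

With 96 registers, the third 40-register frame forces one spill, and returning to the outermost procedure forces one fill. The procedure code never sees either; that is the “infinite registers” illusion.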
Instruction set: Instructions come in bundles of 3 operations, and 2 bundles are fetched each cycle. The processor uses a special Explicitly Parallel Instruction Computing (EPIC) format, which moves the responsibility for resource management onto the compiler. A 5-bit template value tells the hardware which type of execution unit each slot’s operation goes to (and where the stops fall). Bundle layout (128 bits): Slot 2 (bits 87 to 127) | Slot 1 (bits 46 to 86) | Slot 0 (bits 5 to 45) | Template (bits 0 to 4)
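The bundle layout is fixed-width bit packing: a 5-bit template plus three 41-bit slots make exactly 128 bits. The sketch below packs and unpacks those fields. It assumes only the bit positions from the layout above; the slot values are arbitrary placeholder bit patterns, not real instruction encodings, and `pack_bundle`/`unpack_bundle` are hypothetical helper names.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slot0: int, slot1: int, slot2: int) -> int:
    """Pack a template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    for s in (slot0, slot1, slot2):
        if not 0 <= s <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
    # Template in bits 0-4, slot 0 in bits 5-45, slot 1 in 46-86, slot 2 in 87-127.
    return template | (slot0 << 5) | (slot1 << 46) | (slot2 << 87)


def unpack_bundle(bundle: int) -> tuple:
    """Split a 128-bit bundle back into (template, slot0, slot1, slot2)."""
    return (
        bundle & 0x1F,
        (bundle >> 5) & SLOT_MASK,
        (bundle >> 46) & SLOT_MASK,
        (bundle >> 87) & SLOT_MASK,
    )


bundle = pack_bundle(0x0C, 0x1, 0x2, 0x3)  # placeholder template and slot values
print(hex(bundle), unpack_bundle(bundle))
```

Because 5 + 3 × 41 = 128, a bundle with every field at its maximum is exactly 2^128 − 1, which is a quick check that the offsets leave no gaps or overlaps.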
Bundled Code Example:
    { .mii
        add r1 = r2, r3
        sub r4 = r5, r6 ;;
        shr r7 = r8, r9
    }
    { .mfi
        ld4 r14 = [r56]
        fadd f10 = f12, f13
        add r16 = r18, r19 ;;
    }
    { .mmi
        st4 [r16] = r67 ;;
        add r24 = r56, r57
        add r28 = r58, r59
    }
Each ;; is a stop: it ends an instruction group, and the operations between two stops can issue together. Cycle 0: the first two operations of a Memory-Integer-Integer bundle. Cycle 1: the last operation of that bundle plus the whole Memory-Float-Integer bundle. Cycle 2: a single operation (the store, which must wait for r16 to be written). Cycle 3: the last two operations in the snippet.
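The cycle count follows from the stops, so it can be checked mechanically. The sketch below copies the snippet’s operations (bundle braces omitted, since grouping depends on stops, not bundle boundaries), with a stop after `add r16 = r18, r19` so the store gets its own cycle as described. It splits the code into instruction groups and flags any operation that reads a register written earlier in the same group. It is a minimal sketch assuming the simple `op dest = src, src` syntax shown, only `r`/`f` registers, and no predicates; the function names are hypothetical.

```python
import re

SNIPPET = """
add r1 = r2, r3
sub r4 = r5, r6 ;;
shr r7 = r8, r9
ld4 r14 = [r56]
fadd f10 = f12, f13
add r16 = r18, r19 ;;
st4 [r16] = r67 ;;
add r24 = r56, r57
add r28 = r58, r59
"""

REG = re.compile(r"\b[rf]\d+\b")  # general (r) and floating-point (f) registers


def parse(op: str):
    """Return (registers written, registers read) for one operation."""
    _mnemonic, operands = op.split(None, 1)
    lhs, rhs = operands.split("=")
    if lhs.strip().startswith("["):  # store: the address register is read, not written
        return set(), set(REG.findall(lhs)) | set(REG.findall(rhs))
    return set(REG.findall(lhs)), set(REG.findall(rhs))


def instruction_groups(code: str):
    """Split code into groups; a trailing ';;' closes the current group."""
    groups, current = [], []
    for line in code.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        stop = line.endswith(";;")
        current.append(line[:-2].strip() if stop else line)
        if stop:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def raw_hazards(group):
    """List (op, registers) where op reads a register written earlier in the group."""
    written, hazards = set(), []
    for op in group:
        writes, reads = parse(op)
        clash = reads & written
        if clash:
            hazards.append((op, sorted(clash)))
        written |= writes
    return hazards


for cycle, group in enumerate(instruction_groups(SNIPPET)):
    print(cycle, group, raw_hazards(group))
```

This yields four groups of 2, 4, 1, and 2 operations with no hazards. Dropping the stop after `add r16 = r18, r19` merges the store into the previous group, and the checker reports the read of `r16`: the programmer (or compiler), not the hardware, is responsible for placing that stop.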
Save me, compiler! The instruction set and pipeline are so difficult to handle that you won’t do much better than the compiler. With the EPIC architecture, more resource management is put on the compiler, which means extra work for human compilers. The most efficient DSP algorithms tend to come from human compilers, but here it is difficult to use all of the system resources the way a hand-made DSP algorithm does. What’s wrong with r1 = r2 + r3?
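Part of the resource management EPIC hands to the compiler is deciding where stops go. The sketch below is a deliberately naive, in-order version of that job: it closes the current instruction group when the next operation reads or writes a register already written in the group, or when the group reaches 6 operations (the slide’s per-cycle limit). The `Op` type and `insert_stops` function are hypothetical; a real compiler also reorders operations, picks bundle templates, uses predication, and software-pipelines loops, which is exactly what makes hand-coding hard.

```python
from typing import NamedTuple

MAX_ISSUE = 6  # assumption: at most 6 operations per cycle (2 bundles of 3)


class Op(NamedTuple):
    text: str
    writes: frozenset
    reads: frozenset


def op(text, writes=(), reads=()):
    return Op(text, frozenset(writes), frozenset(reads))


def insert_stops(ops):
    """Greedy, in-order grouping: emit a stop on a register conflict or a full group."""
    groups, current, written = [], [], set()
    for o in ops:
        conflict = (o.reads & written) or (o.writes & written)
        if current and (conflict or len(current) == MAX_ISSUE):
            groups.append(current)
            current, written = [], set()
        current.append(o)
        written |= o.writes
    if current:
        groups.append(current)
    return groups


def render(groups):
    """Print groups in assembler style, with ';;' between instruction groups."""
    return " ;; ".join(", ".join(o.text for o in g) for g in groups)


prog = [
    op("r1 = r2 + r3", writes={"r1"}, reads={"r2", "r3"}),
    op("r4 = r1 + r5", writes={"r4"}, reads={"r1", "r5"}),  # needs r1: new group
    op("r6 = r7 + r8", writes={"r6"}, reads={"r7", "r8"}),  # independent: joins it
]
print(render(insert_stops(prog)))
```

Because it never reorders, this scheduler leaves parallelism on the table whenever an independent operation sits behind a dependent one; closing that gap is where most of the compiler’s real effort goes.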
DSP Relation: How does the instruction set compare to a DSP processor? It is a RISC-type instruction set; for example, there is no memory-to-memory move. Still, the Itanium 2 could easily run a DSP algorithm efficiently: it includes basically every trick in the book so far, including ideas borrowed from DSP.
Pro-DSP: Many single-cycle instructions. Instructions are designed for a heavily pipelined environment. The processor can access data in a SIMD fashion (8x8-bit, 4x16-bit, 2x32-bit, 1x64-bit). High-precision registers (82-bit floating-point accumulator). People wonder whether 64-bit processing is necessary; well, THIS is where it’s necessary. A high number of registers for fast access.
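The 4x16-bit SIMD mode treats one 64-bit register as four independent 16-bit lanes. The sketch below emulates a lane-wise add in plain integer arithmetic (the “SIMD within a register” trick): it adds the low 15 bits of each lane so no carry can cross into the next lane, then fixes each lane’s top bit with XOR. It approximates what a 16-bit parallel add such as Itanium’s `padd2` does in a single instruction, including a wraparound and an unsigned-saturating variant; the function names are hypothetical and the saturating version is written lane by lane for clarity, not speed.

```python
MASK64 = (1 << 64) - 1
LOW15 = 0x7FFF7FFF7FFF7FFF  # low 15 bits of each 16-bit lane
HIGH1 = 0x8000800080008000  # top bit of each 16-bit lane


def pack4x16(lanes):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFFFF) << (16 * i)
    return word


def unpack4x16(word):
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def padd4x16_wrap(a, b):
    """Lane-wise add modulo 2**16, with no carries crossing lane boundaries."""
    low = (a & LOW15) + (b & LOW15)  # each lane sum <= 0xFFFE, so no cross-lane carry
    return (low ^ ((a ^ b) & HIGH1)) & MASK64  # top bit = a15 xor b15 xor carry-in


def padd4x16_sat_unsigned(a, b):
    """Lane-wise unsigned add that clamps at 0xFFFF instead of wrapping."""
    return pack4x16([min(x + y, 0xFFFF) for x, y in zip(unpack4x16(a), unpack4x16(b))])


a = pack4x16([1, 0xFFFF, 0x8000, 100])
b = pack4x16([2, 1, 0x8000, 200])
print(unpack4x16(padd4x16_wrap(a, b)), unpack4x16(padd4x16_sat_unsigned(a, b)))
```

Saturation is the DSP-friendly choice: an audio sample that overflows should clip at full scale rather than wrap around to a large value of the opposite sense.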
Anti-DSP: No hardware loops. No hardware circular buffers. Only a single bus (although a fast one, 6.4 GB/s). High power usage.
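Without hardware circular buffers, the wrap-around addressing a DSP gets for free has to be done explicitly. The sketch below is a direct-form FIR filter over a circular delay line where every index update includes a software wrap; on a DSP with circular addressing, those wraps would be folded into the address generator. The class name `CircularFIR` is hypothetical and the code illustrates the bookkeeping, not Itanium-specific code.

```python
class CircularFIR:
    """FIR filter whose delay line is a circular buffer with software wrap-around."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.buf = [0.0] * len(self.coeffs)
        self.head = 0  # index of the newest sample

    def step(self, x):
        n = len(self.buf)
        self.head = (self.head + 1) % n  # explicit wrap when advancing the write pointer
        self.buf[self.head] = x
        acc = 0.0
        idx = self.head
        for c in self.coeffs:  # newest sample pairs with coeffs[0]
            acc += c * self.buf[idx]
            idx = idx - 1 if idx else n - 1  # explicit wrap on every read
        return acc


fir = CircularFIR([1, 2, 3])
print([fir.step(x) for x in [1, 0, 0, 0]])  # impulse in, coefficients out
```

Feeding in an impulse returns the coefficients, which confirms the wrap logic; the point is that every tap costs an extra compare-and-select that dedicated DSP hardware would hide.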
TigerSHARC vs. Itanium 2: COST! ($0.3k vs. $3k). Both are heavily pipelined. Both are very hard to code by hand. There really is no comparison: the processors were made for two different purposes, and the framework that is typically built around each chip makes them even harder to compare.
Conclusion: You get what you pay for… or maybe a little less. The Itanium 2 is considered a high-end server processor, and anything high-end tends to be very overpriced (rack-mount equipment, for example). Sure, it can work as a DSP processor, but for that price it should make you toast in the morning too.
References: Intel Itanium 2 Processor Hardware Developer’s Manual. Intel Itanium 2 Processor Reference Manual. S. Rusu, J. Stinson, S. Tam, J. Leung, H. Muljono, and B. Cherkauer, “A 1.5-GHz 130-nm Itanium 2 Processor With 6MB On-die L3 Cache,” IEEE Journal of Solid-State Circuits, vol. 38, no. 11, Nov. 2003.