Published by Arlene Lawson. Modified over 8 years ago.
1
IA-64 Architecture
Muammer YÜZÜGÜLDÜ
CMPE 511
16/12/2004
2
IA-64 Architecture: Agenda
- Architecture Basics
- Predication
- Speculation
- Software Pipelining
3
Architectural Basics
- A full 64-bit address space
- Large, directly accessible register files
- Enough instruction bits to communicate information from the compiler to the hardware
- The ability to express arbitrarily large amounts of instruction-level parallelism (ILP)
4
Register Resources
- 128 general registers, each 65 bits wide (64 data bits plus a NaT, "Not a Thing", bit)
- 128 82-bit floating-point registers
- Space for up to 128 64-bit special-purpose application registers
- 8 64-bit branch registers for function call linkage and return
- 64 one-bit predicate registers that hold the results of conditional-expression evaluation
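As a quick check on the sizes listed above, the sketch below totals the architectural state of these register files and shows why a 7-bit field is enough to name one of 128 registers. It is a back-of-the-envelope calculation under stated assumptions: it counts only the files on this slide, treats the application-register space as fully populated (the slide gives 128 only as an upper bound), and takes the 65-bit general-register width as 64 data bits plus the NaT bit.

```python
# Register files as listed on the slide: (name, count, width in bits).
# Assumption: all 128 application-register slots are counted, although
# the slide describes that number as "space for up to 128".
REGISTER_FILES = [
    ("general", 128, 65),          # 64 data bits + 1 NaT bit
    ("floating-point", 128, 82),
    ("application", 128, 64),
    ("branch", 8, 64),
    ("predicate", 64, 1),
]

def total_state_bits(files):
    """Sum count * width over the listed register files."""
    return sum(count * width for _, count, width in files)

def specifier_bits(count):
    """Bits needed to name one of `count` registers (count assumed a power of two)."""
    return (count - 1).bit_length()

if __name__ == "__main__":
    for name, count, width in REGISTER_FILES:
        print(f"{name:15s} {count:4d} x {width:2d} bits = {count * width:6d} bits")
    print("total:", total_state_bits(REGISTER_FILES), "bits")
    print("specifier for 128 registers:", specifier_bits(128), "bits")
```

The same `specifier_bits` call gives 6 bits for the 64 predicate registers, which is the width of the qualifying-predicate field in each instruction.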
5
Register Resources
6
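Instruction Encoding
- Each instruction is 41 bits long; three instructions are packed into a 128-bit bundle
- 14 bits for the opcode
- 7 bits per register field, enough to address any of the 128 registers in a file
- 6 bits for the qualifying predicate register
- 5 bits per bundle for the template, which helps decode and route the instructions and indicates the location of stops that mark the end of groups of instructions that can execute in parallel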
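The sketch below unpacks a bundle, assuming the standard IA-64 layout: three 41-bit instruction slots per 128-bit bundle, with the 5-bit template in bits 4:0 and slots 0, 1 and 2 in bits 45:5, 86:46 and 127:87. The 41 bits of a slot break down as 14 opcode bits, three 7-bit register fields and a 6-bit qualifying predicate. The field positions decoded by `common_fields` (a hypothetical helper) follow register-register formats such as A1; other formats reuse some of these bits for immediates, so treat it as illustrative only.

```python
TEMPLATE_BITS = 5   # template occupies bundle bits 4:0
SLOT_BITS = 41      # each instruction slot is 41 bits wide

def bits(value, lo, width):
    """Extract `width` bits of `value`, starting at bit position `lo`."""
    return (value >> lo) & ((1 << width) - 1)

def split_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) from a 128-bit integer."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must be a 128-bit unsigned value")
    template = bits(bundle, 0, TEMPLATE_BITS)
    slots = [bits(bundle, TEMPLATE_BITS + i * SLOT_BITS, SLOT_BITS) for i in range(3)]
    return template, slots

def make_bundle(template, slots):
    """Inverse of split_bundle, used here to build test inputs."""
    value = template
    for i, slot in enumerate(slots):
        value |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return value

def common_fields(insn):
    """Hypothetical helper: decode fields laid out as in register-register
    formats such as A1. Bits 36:27 (opcode extensions in A1) are not decoded."""
    return {
        "qp": bits(insn, 0, 6),             # qualifying predicate, bits 5:0
        "r1": bits(insn, 6, 7),             # destination register, bits 12:6
        "r2": bits(insn, 13, 7),            # source register, bits 19:13
        "r3": bits(insn, 20, 7),            # source register, bits 26:20
        "major_opcode": bits(insn, 37, 4),  # major opcode, bits 40:37
    }

if __name__ == "__main__":
    insn = (8 << 37) | (3 << 20) | (2 << 13) | (1 << 6) | 5
    template, slots = split_bundle(make_bundle(0x10, [0, insn, 0]))
    print(hex(template), common_fields(slots[1]))
```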
7
Instruction Encoding
8
IA-64 virtual memory Model
- Can map 2^64 bytes (about 16 million terabytes) of virtual space
- Bits 63-61 of a virtual address index into eight region registers that contain 24-bit region identifiers (RIDs)
- The 24-bit RID is concatenated with the virtual page number (VPN) to form a unique lookup into the translation look-aside buffer (TLB)
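The region-register indexing can be sketched as plain bit manipulation. This is a simplified model: `region_registers` is a hypothetical list standing in for the eight region registers, and `page_shift` (log2 of the page size) is an assumed parameter, since real page sizes are configured per region. The constant at the end checks the size arithmetic: 2^64 bytes is 2^24 terabytes, roughly 16.8 million TB.

```python
REGION_SHIFT = 61   # bits 63-61 select one of the eight region registers

def region_split(va):
    """Split a 64-bit virtual address into (region number, offset within the region)."""
    if not 0 <= va < (1 << 64):
        raise ValueError("virtual address must fit in 64 bits")
    vrn = va >> REGION_SHIFT                   # 0..7
    offset = va & ((1 << REGION_SHIFT) - 1)    # bits 60..0
    return vrn, offset

def tlb_key(va, region_registers, page_shift):
    """Form the (RID, VPN) pair used to search the TLB.

    region_registers: hypothetical list of eight 24-bit region identifiers.
    page_shift: assumed log2 of the page size (e.g. 14 for 16 KB pages).
    """
    vrn, offset = region_split(va)
    rid = region_registers[vrn]
    if not 0 <= rid < (1 << 24):
        raise ValueError("RIDs are 24-bit values")
    vpn = offset >> page_shift
    return rid, vpn

# Size arithmetic: 2**64 bytes expressed in terabytes (2**40 bytes each).
TERABYTES_IN_64BIT_SPACE = (1 << 64) >> 40

if __name__ == "__main__":
    rrs = [0x100 + i for i in range(8)]        # made-up RIDs for illustration
    print(tlb_key(0x2000_0000_0001_0000, rrs, page_shift=14))
```

Because the RID is part of the key, two processes using the same virtual address in different regions produce different TLB keys without a TLB flush.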
9
IA-64 virtual memory Model
The TLB lookup generates two main items: the physical page number and the access privileges.
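A toy model of the lookup result: the TLB is represented here as a Python dictionary keyed by the (RID, VPN) pair, returning the physical page number and a set of access rights. The 16 KB page size, the `"r"`/`"w"`/`"x"` rights encoding and the exception classes are assumptions for illustration; a real miss would be serviced by the hardware page walker or a fault handler, which this sketch does not model.

```python
PAGE_SHIFT = 14   # assumed 16 KB pages for this sketch

class TLBMiss(Exception):
    """Raised when no entry matches (RID, VPN); real hardware would walk or fault."""

class AccessViolation(Exception):
    """Raised when the entry exists but does not grant the requested access."""

def translate(tlb, rid, vpn, page_offset, access):
    """Return the physical address for (rid, vpn, page_offset) if `access` is allowed.

    tlb maps (rid, vpn) -> (physical page number, set of rights such as {"r", "x"}).
    """
    try:
        ppn, rights = tlb[(rid, vpn)]
    except KeyError:
        raise TLBMiss((rid, vpn)) from None
    if access not in rights:
        raise AccessViolation(f"{access!r} not permitted for RID {rid:#x}, VPN {vpn:#x}")
    return (ppn << PAGE_SHIFT) | page_offset

if __name__ == "__main__":
    tlb = {(0x101, 4): (0x80, {"r", "x"})}
    print(hex(translate(tlb, 0x101, 4, 0x10, "r")))
```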
10
IA-64 virtual memory Model
11
Predication
- Removes branches by converting them to predicated execution
- Executes multiple paths simultaneously
- Increases performance by exposing parallelism and shortening the critical path
- Makes better use of wider machines
- Reduces branch mispredictions
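A minimal if-conversion sketch, modeled on the IA-64 idiom `cmp.gt p1, p2 = a, b` followed by `(p1) sub` and `(p2) sub`: both subtractions are issued and the complementary predicates decide which result is written, so no branch (and no misprediction) is involved. The `predicated` helper is hypothetical and only mimics the commit-or-discard behavior of a predicated instruction.

```python
def predicated(pred, value, old):
    """A predicated instruction: its result is written only if `pred` is true."""
    return value if pred else old

def abs_diff_branchy(a, b):
    # Branching version: one path runs, selected by a branch that may be mispredicted.
    if a > b:
        return a - b
    return b - a

def abs_diff_predicated(a, b):
    # If-converted version, modeled on:
    #   cmp.gt p1, p2 = a, b     // p1 = (a > b), p2 = not p1
    #   (p1) sub x = a, b
    #   (p2) sub x = b, a
    p1 = a > b
    p2 = not p1
    x = 0
    x = predicated(p1, a - b, x)   # both subtractions are issued...
    x = predicated(p2, b - a, x)   # ...but only one result is committed
    return x

if __name__ == "__main__":
    for a, b in [(3, 7), (7, 3), (5, 5)]:
        print(a, b, abs_diff_branchy(a, b), abs_diff_predicated(a, b))
```

The cost of this transformation is that both paths occupy issue slots, which is why it pays off mainly on wide machines and for hard-to-predict branches.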
12
Predication
13
Predication Benefits
- Reduces branches and misprediction penalties: 50% fewer branches and 37% faster code*
- Parallel compares further shorten critical paths
- Greatly improves code with hard-to-predict branches:
  - Large server applications (capacity limited)
  - Sorting and data mining (large database applications)
  - Data compression
- Traditional architectures' "bolt-on" approach can't efficiently approximate predication:
  - Conditional move (cmove): 39% more instructions, 23% slower performance*
  - Instructions must all be speculative
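Parallel compares can be illustrated with the and-type compare (for example `cmp.eq.and`): if its condition is false it clears the target predicate, otherwise it leaves the predicate unchanged. Because the outcome does not depend on the order in which such compares execute, several of them can target the same predicate in one instruction group, evaluating a condition like `a == 0 && b == 0 && c == 0` without a serial chain of dependent compares. The sketch models only the first target predicate and is a simplification of the architectural semantics.

```python
from itertools import permutations

def cmp_and(pred, condition):
    """And-type compare on its first target: cleared if `condition` is false,
    otherwise left unchanged (simplified from e.g. cmp.eq.and)."""
    return pred and condition

def all_zero(values, order):
    """Evaluate values[0] == 0 && values[1] == 0 && ... using and-type
    compares applied in the given order."""
    p = True                      # predicate initialized to 1 beforehand
    for i in order:
        p = cmp_and(p, values[i] == 0)
    return p

def order_independent(values):
    """True if every execution order of the compares yields the same predicate."""
    outcomes = {all_zero(values, order) for order in permutations(range(len(values)))}
    return len(outcomes) == 1

if __name__ == "__main__":
    for vals in ([0, 0, 0], [0, 1, 0]):
        print(vals, all_zero(vals, range(len(vals))), order_independent(vals))
```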
14
Data Speculation
The compiler can issue a load before a preceding, possibly conflicting store.
15
Architectural Support for Data Speculation
Instructions:
- ld.a: advanced load
- ld.c: check load
- chk.a: advanced load check
- ld.sa: speculative advanced load, i.e. an advanced load with exception deferral
ALAT (Advanced Load Address Table): a hardware structure that tracks outstanding advanced loads
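The interplay of `ld.a`, the ALAT and `ld.c` can be sketched as a toy simulation. Assumptions: memory is a dictionary of aligned 8-byte words, the ALAT is keyed by target register name, and a store removes every entry whose address range overlaps it. A real ALAT has finite capacity and can also drop entries for other reasons, which only makes a check fall back to reloading. `chk.a` is modeled as a boolean test rather than a branch to recovery code, and `hoisted_load` is a hypothetical example sequence.

```python
class ALAT:
    """Toy advanced-load address table: target register -> (address, size)."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, memory, reg, addr, size=8):
        # ld.a: load early and record the address range in the ALAT.
        self.entries[reg] = (addr, size)
        return memory.get(addr, 0)

    def store(self, memory, addr, value, size=8):
        # Every store removes ALAT entries whose range overlaps the stored bytes.
        memory[addr] = value
        for reg, (a, s) in list(self.entries.items()):
            if a < addr + size and addr < a + s:
                del self.entries[reg]

    def check_load(self, memory, reg, addr, current):
        # ld.c: keep the early value if the entry survived, otherwise reload.
        return current if reg in self.entries else memory.get(addr, 0)

    def chk_a(self, reg):
        # chk.a: True means the speculation held; False means run recovery code.
        return reg in self.entries

def hoisted_load(store_addr, load_addr):
    """Original order: store 99 to [store_addr]; r8 = [load_addr].
    Speculative order: ld.a hoisted above the store, ld.c at the original spot."""
    memory = {0x100: 1, 0x200: 2}
    alat = ALAT()
    r8 = alat.advanced_load(memory, "r8", load_addr)   # ld.a r8 = [load_addr]
    alat.store(memory, store_addr, 99)                 # st8 [store_addr] = 99
    r8 = alat.check_load(memory, "r8", load_addr, r8)  # ld.c r8 = [load_addr]
    return r8

if __name__ == "__main__":
    print(hoisted_load(0x200, 0x100))   # no conflict: early value kept
    print(hoisted_load(0x100, 0x100))   # conflict: value reloaded
```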
16
Speculation Benefits
- Reduces the impact of memory latency
- A study demonstrates a performance improvement of 79% when speculation is combined with predication*
- Greatest improvement for code with many cache accesses:
  - Large databases
  - Operating systems
- Scheduling flexibility enables new levels of performance headroom
17
Speculation Drawbacks
Drawbacks even if the speculation is correct:
- Registers used speculatively must be kept alive until the check, which increases register pressure
- Each speculation needs recovery code, which increases code size
Drawbacks if the speculation is incorrect:
- The recovery code has to be executed, costing additional cycles
- The recovery code may not be in the cache, adding a loading delay
18
Software Pipelining
- Overlaps the execution of different loop iterations
- Completes more iterations in the same amount of time
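The overlap can be made concrete with a schedule in which a loop body split into `n_stages` stages runs one stage per kernel cycle, with stage `s` in cycle `k` working on iteration `k - s`. This assumes every stage takes one cycle and stages of different iterations never compete for the same functional units; under those assumptions `n_iters` iterations finish in `n_iters + n_stages - 1` cycles instead of `n_iters * n_stages`.

```python
def pipelined_schedule(n_iters, n_stages):
    """Return, for each kernel cycle, the (iteration, stage) pairs active in it.
    In cycle k, stage s works on iteration k - s when that iteration exists."""
    schedule = []
    for k in range(n_iters + n_stages - 1):
        active = [(k - s, s) for s in range(n_stages) if 0 <= k - s < n_iters]
        schedule.append(active)
    return schedule

def cycle_counts(n_iters, n_stages):
    """(cycles without overlap, cycles with overlap), one cycle per stage assumed."""
    return n_iters * n_stages, len(pipelined_schedule(n_iters, n_stages))

if __name__ == "__main__":
    for cycle, active in enumerate(pipelined_schedule(4, 3)):
        print(cycle, active)
    print(cycle_counts(4, 3))
```

The first `n_stages - 1` cycles (only early stages active) form the prologue and the last `n_stages - 1` form the epilogue; in between, the kernel runs every stage at once.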
19
Software Pipelining
IA-64 features that make this possible:
- Full predication
- Special branch-handling features
- Register rotation: removes loop copy overhead
- Predicate rotation: removes the prologue and epilogue
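A simplified model of both rotation mechanisms. `physical_gr` maps a rotating general register (from r32 upward) through a rotating register base, so a value written to r32 in one pass is read as r33 in the next once the base is decremented, with no copy instruction. `run_ctop_loop` mimics how a loop-closing `br.ctop` branch uses the loop count (LC) and epilogue count (EC): while LC is nonzero a 1 rotates into the first stage predicate (p16), afterwards 0s rotate in, so the same kernel code also serves as prologue and epilogue. Both functions are hypothetical illustrations that follow the general behavior of `br.ctop` but omit details such as the exact register-base arithmetic; they are not a specification.

```python
def physical_gr(logical, rrb, rot_size):
    """Map a logical general register to a physical slot. Registers r32 up to
    r32+rot_size-1 rotate by the rotating register base `rrb`; others are fixed."""
    if 32 <= logical < 32 + rot_size:
        return 32 + (logical - 32 + rrb) % rot_size
    return logical

def run_ctop_loop(n_iters, n_stages):
    """Simplified pipelined loop closed by br.ctop. stage_preds[s] stands for
    predicate p16+s, which guards stage s of the loop body. Returns the
    (iteration, stage) pairs executed in each pass through the kernel."""
    lc, ec = n_iters - 1, n_stages                    # loop count, epilogue count
    stage_preds = [True] + [False] * (n_stages - 1)   # only p16 set initially
    passes, k = [], 0
    while True:
        passes.append([(k - s, s) for s in range(n_stages) if stage_preds[s]])
        # br.ctop: pick the predicate that rotates into p16 and update counters.
        if lc > 0:
            lc -= 1
            new_pred, taken = True, True     # start another iteration
        elif ec > 1:
            ec -= 1
            new_pred, taken = False, True    # epilogue: drain in-flight iterations
        else:
            ec -= 1
            new_pred, taken = False, False   # last pass: fall out of the loop
        stage_preds = [new_pred] + stage_preds[:-1]   # rotate predicates
        k += 1
        if not taken:
            return passes

if __name__ == "__main__":
    for p in run_ctop_loop(4, 3):
        print(p)
```

With 4 iterations and 3 stages the kernel runs 6 times; the predicates alone switch stages on during the first passes and off during the last ones, so no separate prologue or epilogue code is emitted.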
20
Software Pipelining Benefits
- Loop pipelining maximizes performance and minimizes overhead
- Avoids the code expansion of unrolling and the code explosion of separate prologue and epilogue code
- Smaller code means fewer cache misses
- Greater performance improvements under higher-latency conditions
- Reduced overhead allows software pipelining of small loops with unknown trip counts, which are typical of integer scalar code
21
Performance
- Backward compatible with previous instruction sets (RISC, IA-32) through emulation, although emulated code performs poorly
- IA-64 code (the EPIC instruction set) will run on any member of the Itanium family
- For optimum performance, code must be recompiled with processor-specific information (different numbers of functional units, pipeline changes)
- Itanium 2 is two times faster than Itanium
22
Performance
23
Target Market
- High-end servers
- Database machines
- Development shops
- NOT suitable for home PCs
24
Questions?