Published by Morris Scott. Modified over 9 years ago.
1
Microprocessor system architectures – IA64
Jakub Yaghob
2
Application architecture
3
Application architecture features – I
  Instruction set architecture
    Load-execute-store architecture, no stack, no division instruction
    Explicit parallelism
    Massive resources (128 integer and 128 FP registers, 64 predicate registers, 8 branch registers)
  Enhancements
    Speculation, predication, software pipelining, branch prediction, multimedia instructions
  Instruction-level parallelism
    Independent instructions in bundles
    Multiple bundles per clock
4
Application architecture features – II
  Explicit parallelism
    Instruction group
      Defined by the compiler
      Instructions in a group execute in parallel
      Strict requirements on dependencies
        Register RAW and WAW dependencies are forbidden within a group
  Memory model
    Relatively weak
    The only ordering restriction is on RAW, WAW and WAR dependencies to the same memory location
    Explicit memory access synchronization
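The dependency rule for instruction groups can be illustrated with a small checker. This is a minimal sketch, not the architecture's actual rule set: the `Instr` record and `group_violations` function are hypothetical names, registers are plain strings, and the architecture's special-case exceptions are not modelled. It reports register RAW and WAW pairs and deliberately ignores WAR, which the slide does not list as forbidden.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical instruction record: the registers it writes and reads."""
    name: str
    writes: set = field(default_factory=set)
    reads: set = field(default_factory=set)


def group_violations(group):
    """Return (kind, register, earlier, later) tuples for RAW/WAW pairs in one group."""
    violations = []
    written_by = {}  # register -> name of the most recent earlier writer in the group
    for instr in group:
        for reg in sorted(instr.reads):
            if reg in written_by:
                violations.append(("RAW", reg, written_by[reg], instr.name))
        for reg in sorted(instr.writes):
            if reg in written_by:
                violations.append(("WAW", reg, written_by[reg], instr.name))
        for reg in instr.writes:
            written_by[reg] = instr.name
    return violations
```

A compiler would close the current group (insert a stop) before the later instruction of any reported pair.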
5
Speculation
  Early memory load
  Control speculation
    Hoisting a load above the condition that guards it
    The load is sometimes executed “uselessly”, when the condition is not met
  Data speculation
    Hoisting a load above a store that may alias it
    Checked using the ALAT (Advanced Load Address Table)
  Speculation check
    A control-speculative load is not completed if it would cause an exception
    Data speculation is invalid if there is a write to the memory location
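Data speculation can be sketched as a toy ALAT model. This is an illustration under simplifying assumptions: the `Alat` class and its method names are hypothetical, memory is a plain dictionary, and entries match on the full address only (real hardware also considers access size and may match on partial addresses).

```python
class Alat:
    """Toy ALAT: remembers which target registers hold still-valid advanced loads."""

    def __init__(self):
        self.entries = {}  # target register -> address of the advanced load

    def advanced_load(self, memory, reg, addr):
        """Load early (before a possibly aliasing store) and record the entry."""
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, memory, addr, value):
        """Perform a store and invalidate every entry for the same address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg):
        """True if the advanced load into reg is still valid, False if it must be redone."""
        return reg in self.entries
```

When `check` returns False, the compiler-generated recovery code would repeat the load and everything that depended on it.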
6
Predication
  Predicate registers
    64 1-bit predicate registers PR0-PR63
    PR0 is hardwired to 1; writes are ignored
    No specialized arithmetic/logic flags
  Set by compare instructions
    A pair of PRs (one for the comparison, one for the complementary comparison)
    Several modes of setting (some of them breach the WAW rule inside an instruction group)
  Nearly all instructions are conditioned by a PR
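The predicate register behaviour described on this slide can be modelled in a few lines. This is a minimal sketch: `PredicateFile`, `compare_eq` and `predicated` are hypothetical names, and only the plain compare mode (one predicate gets the result, the other its complement) is shown.

```python
class PredicateFile:
    """64 one-bit predicate registers; PR0 always reads as 1."""

    def __init__(self):
        self._bits = [False] * 64
        self._bits[0] = True

    def read(self, index):
        return self._bits[index]

    def write(self, index, value):
        if index != 0:  # writes to PR0 are ignored
            self._bits[index] = bool(value)

    def compare_eq(self, p_true, p_false, a, b):
        """Set one predicate to (a == b) and the other to its complement."""
        result = a == b
        self.write(p_true, result)
        self.write(p_false, not result)


def predicated(pf, pr, action):
    """Run action only when the qualifying predicate is 1; otherwise do nothing."""
    if pf.read(pr):
        action()
```

With this model an if/else becomes two predicated instruction streams guarded by the two predicates of one compare, with no branch.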
7
Register stack
  Support for function calls
  GR0-GR31 are global registers
  GR32-GR127 form a register stack
  Each procedure has a register frame
    2 variable-sized areas: local and output
  Register renaming using the alloc instruction
    On a call, the caller's first output register becomes GR32 of the callee
  If the register stack overflows, the CPU frees some registers by saving them to memory
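The renaming of output registers on a call can be sketched as follows. This is a simplified model under stated assumptions: the `RegisterStack` class is hypothetical, the physical file is a fixed ring of 96 registers, and overflow handling (saving registers to memory) is not modelled.

```python
class RegisterStack:
    """Toy model of the stacked registers GR32 and up with frame renaming."""

    def __init__(self, physical_size=96):
        self.phys = [0] * physical_size
        self.base = 0    # physical index that the current GR32 maps to
        self.sol = 0     # size of the local area of the current frame
        self.sof = 0     # size of the whole current frame (local + output)
        self.saved = []  # frame markers of the callers

    def alloc(self, locals_count, outputs_count):
        """Resize the current frame, as the alloc instruction does."""
        self.sol = locals_count
        self.sof = locals_count + outputs_count

    def _index(self, gr):
        if not 32 <= gr < 32 + self.sof:
            raise IndexError(f"GR{gr} is outside the current frame")
        return (self.base + gr - 32) % len(self.phys)

    def read(self, gr):
        return self.phys[self._index(gr)]

    def write(self, gr, value):
        self.phys[self._index(gr)] = value

    def call(self):
        """Enter a callee: its frame starts at the caller's first output register."""
        self.saved.append((self.base, self.sol, self.sof))
        self.base = (self.base + self.sol) % len(self.phys)
        self.sof -= self.sol  # the callee initially sees only the caller's outputs
        self.sol = 0

    def ret(self):
        """Return to the caller and restore its frame."""
        self.base, self.sol, self.sof = self.saved.pop()
```

For example, a caller with 3 locals and 2 outputs writes an argument to GR35; after `call()` the callee reads the same value from GR32.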
8
Privilege levels and serialization
  Privilege levels
    Like IA-32, levels 0-3
    System instructions and registers are accessible only with CPL=0
  Serialization
    Data dependency
      All application and system resources excluding control registers
      Values written to a register are observed by instructions in subsequent instruction groups
    Instruction serialization
      Modifications are observed before subsequent instruction group fetches are re-initiated
    Data serialization
      Modifications affecting both execution and data memory access are observed
    In-flight
      Non-serialized resources have “some” value for reads
9
System registers
10
Processor Status Register (PSR)
  Current execution environment
  Divided into four overlapped sections
  Accessed using special instructions
11
Control registers
  128 control registers
    Most are reserved, only 26 are used
  Groups
    Global control registers
      CR0 (DCR = Default Control Register)
      CR2 (IVA = Interruption Vector Address)
      CR8 (PTA = Page Table Address)
    Global interrupt control registers
    Control of an active interrupt
  Writes are not serialized
12
Banked general registers
  Fast switching of GR16-GR31 for interrupt handlers
  Current bank in PSR.bn
  Bank switching
    Interrupt selects bank 0
    rfi sets the bank from IPSR.bn
    bsw switches to the specified bank
  Switching includes the NaT bits
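The three ways of switching banks can be sketched with a small model. This is an illustration only: `BankedRegisters` is a hypothetical class, NaT bits are left out, and the sketch assumes that normal code runs in bank 1 and that an interrupt saves the current bank number to IPSR.bn before selecting bank 0.

```python
class BankedRegisters:
    """Toy model of the two banks of GR16-GR31."""

    def __init__(self):
        self.banks = [[0] * 16, [0] * 16]
        self.bn = 1       # PSR.bn: currently selected bank (assumed 1 for normal code)
        self.ipsr_bn = 1  # IPSR.bn: bank number saved on interrupt

    def read(self, gr):
        return self.banks[self.bn][gr - 16]

    def write(self, gr, value):
        self.banks[self.bn][gr - 16] = value

    def interrupt(self):
        """Interrupt delivery: remember the current bank, then select bank 0."""
        self.ipsr_bn = self.bn
        self.bn = 0

    def rfi(self):
        """Return from interrupt: restore the bank from IPSR.bn."""
        self.bn = self.ipsr_bn

    def bsw(self, bank):
        """Explicit switch to the specified bank."""
        self.bn = bank
```

The handler gets 16 scratch registers immediately, without first saving the interrupted code's GR16-GR31 to memory.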
13
Virtual memory model
  Virtual regions
    Support OSes with multiple address spaces
  Protection domain mechanism
    Supports OSes with a single address space
  TLB
    Paging algorithms are deferred to the OS
  VHPT (Virtual Hash Page Table)
    Augments TLB performance
    Inverted page tables
  Other mechanisms
    Various page sizes, fixed translations, …
14
Address translation
15
TLB
  Separate for code and data
    Data TLB also translates accesses to the VHPT or by the RSE
  Each TLB is divided into two parts
    Translation registers (TR)
      Fully associative array
      OS can explicitly set the translation
      No automatic replacement
    Translation cache (TC)
      Entries can be inserted by an instruction
      Automatic replacement (from VHPT)
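The split between translation registers and the translation cache can be sketched as a lookup routine. This is a minimal model under stated assumptions: the `Tlb` class is hypothetical, the VHPT is represented by a plain dictionary rather than a hashed table in memory, and the oldest-first eviction policy of the TC is an assumption because the slide does not specify the replacement policy.

```python
from collections import OrderedDict


class Tlb:
    """Toy TLB with pinned translation registers and a replaceable translation cache."""

    def __init__(self, tc_capacity=4):
        self.tr = {}             # virtual page -> physical page, set explicitly by the OS
        self.tc = OrderedDict()  # virtual page -> physical page, replaced automatically
        self.tc_capacity = tc_capacity

    def set_tr(self, vpn, pfn):
        """Pin a translation; it is never replaced automatically."""
        self.tr[vpn] = pfn

    def insert_tc(self, vpn, pfn):
        """Insert into the TC, evicting the oldest entry when full (assumed policy)."""
        self.tc[vpn] = pfn
        self.tc.move_to_end(vpn)
        while len(self.tc) > self.tc_capacity:
            self.tc.popitem(last=False)

    def lookup(self, vpn, vhpt=None):
        """Return the physical page, or None when the OS would have to handle a miss."""
        if vpn in self.tr:
            return self.tr[vpn]
        if vpn in self.tc:
            return self.tc[vpn]
        if vhpt is not None and vpn in vhpt:
            self.insert_tc(vpn, vhpt[vpn])  # a VHPT hit refills the TC
            return vhpt[vpn]
        return None
```

Pinned TR entries suit translations that must never miss, such as the code of the miss handler itself.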
16
Access rights on pages
  Defined by TLB.ar and TLB.pl
  Using TLB.ar
    Read only
    Read, execute
    Read, write
    Read, write, execute
    Read only/read, write
    Read, execute/read, write, execute
    Read, write, execute/read, write
    Exec, promote/read, execute
17
Virtual addressing – other – I
  Page sizes
    4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 4G
  Region registers (RR)
    Highest 3 bits of VA form an index into RR
    rid – region identifier
    ps – preferred page size
    ve – VHPT enable
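Selecting a region register from a virtual address is plain bit arithmetic. The sketch below assumes a 64-bit virtual address and represents each region register as a dictionary with `rid` and `ps` fields, where `ps` is taken to be the base-2 logarithm of the page size (for example 14 for 16k pages); the function names are hypothetical.

```python
def split_virtual_address(va):
    """Split a 64-bit VA into (region register index, offset within the region)."""
    va &= (1 << 64) - 1
    vrn = va >> 61                  # highest 3 bits select one of 8 region registers
    offset = va & ((1 << 61) - 1)   # remaining 61 bits
    return vrn, offset


def page_number(va, region_registers):
    """Return (rid, virtual page number, offset in page) using the region's page size."""
    vrn, offset = split_virtual_address(va)
    rr = region_registers[vrn]
    ps = rr["ps"]                   # assumed to hold log2(page size)
    return rr["rid"], offset >> ps, offset & ((1 << ps) - 1)
```

The region identifier, not the 3-bit index, is what tags the translation, so two address spaces can map different regions to the same `rid` to share memory.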
18
Virtual addressing – other – II
  Protection keys
    At least 16 keys
    The key in a TLB entry is compared with the protection keys; if none matches, a “key miss fault” exception is raised
19
VHPT – I
20
VHPT – II
  Features
    The CPU writes nothing into the VHPT
    The CPU does not maintain coherence between the TLB and the VHPT
  Two formats
    Short – one table per region, 8B entries
    Long – one large table for the whole system, 32B entries
  Various sizes, always a power of 2
  Searched when the TLB lookup fails
    If the translation is found in the VHPT, it is automatically inserted into the TC
  Fixed hash functions
21
Physical addressing and memory attributes
  Only 63 bits
    Current architecture and implementation only 50 bits
  Memory attributes
    Virtual – like IA-32 (WB, WC, …)
    Physical – using bit 63 of the physical address
      0 – WB, speculative
      1 – UC, nonspeculative
  Nontrivial rules for memory ordering
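The use of bit 63 as an attribute selector in physical addressing mode can be shown directly. This is a small sketch with hypothetical function names; the 50-bit implemented width is the figure quoted on the slide and is passed as a parameter because it is implementation specific.

```python
def decode_physical_address(addr):
    """Decode bit 63 of a physical-mode address into its memory attribute."""
    addr &= (1 << 64) - 1
    uncacheable = bool(addr >> 63)
    return {
        "attribute": "UC" if uncacheable else "WB",
        "speculative": not uncacheable,
        "address": addr & ((1 << 63) - 1),  # the 63 address bits
    }


def fits_implementation(addr, implemented_bits=50):
    """True if the 63-bit address part fits into the implemented address width."""
    return (addr & ((1 << 63) - 1)) < (1 << implemented_bits)
```

Setting bit 63 therefore lets physical-mode code reach device registers without caching or speculative access.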
22
Interrupts – I
  Kinds depending on handlers
    IVA
      Handled by the OS, a vector defined by CR2
    PAL
      Handled by PAL or by system firmware, possibly by the OS
  Kinds depending on behavior
    Abort
    Interrupt
      External, asynchronous
    Fault
    Trap
  Interrupts are disabled during interrupt handling
23
Interrupts – II
  81 exceptions currently defined
    5 for “hard” exceptions
      RESET, INIT, INT, MCA, PMI
    23 for IA-32 emulation
  IVA interrupts
    Vectors have fixed addresses
    Groups of exceptions share one vector
  External interrupts
    256 vectors
    Priority determined by the vector number
    Current vector in CR65 (IVR = Interrupt Vector Register)
    Current priority in CR66 (TPR = Task Priority Register)
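Deriving the priority of an external interrupt from its vector number can be sketched as below. The details are assumptions not stated on the slide: the sketch takes the priority class to be the upper 4 bits of the vector (16 vectors per class, higher class means higher priority), and models the task priority as a `mic` field that masks every class up to and including its value, plus an `mmi` flag that masks all maskable vectors. Special vectors such as NMI are not modelled, and both function names are hypothetical.

```python
def priority_class(vector):
    """Priority class of an external interrupt vector (assumed: upper 4 bits)."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    return vector >> 4


def is_masked(vector, mic, mmi=False):
    """True if the vector is held pending under the given task priority settings."""
    if mmi:  # assumed: masks every maskable external interrupt
        return True
    return priority_class(vector) <= mic
```

Raising `mic` while a handler runs is how an OS would keep equal and lower priority interrupts from nesting.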
24
RSE – I
  Register Stack Engine (RSE)
    Transfers the register stack from/to memory
    Works in the background, without software intervention
    Different activity modes (lazy, store intensive, load intensive, eager)
    Physical register stack must have at least 96 registers
      More in multiples of 16
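The size rule for the physical register stack, and the amount of work the RSE has to do on overflow, can be expressed as two small helpers. Both names are hypothetical, and the spill calculation is a deliberately simple model that ignores which activity mode is selected.

```python
def is_valid_stacked_size(count, minimum=96, step=16):
    """True if count is at least 96 and exceeds it by a multiple of 16."""
    return count >= minimum and (count - minimum) % step == 0


def registers_to_spill(in_use, requested, physical=96):
    """Registers the RSE must save to memory before a frame of `requested` registers fits."""
    return max(0, in_use + requested - physical)
```

In the eager modes the RSE would perform such transfers ahead of time in the background, while in lazy mode it would wait until they are unavoidable.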
25
RSE – II
26
Firmware
  Processor Abstraction Layer (PAL)
    Unified interface to the CPU firmware
  System Abstraction Layer (SAL)
    Separates the OS from implementation variations of platforms
  Extensible Firmware Interface (EFI)
    OS booting
  Each FW layer (including the OS) has a defined entry point
  PAL and SAL are placed in the 16M of memory immediately below 4G
    Fixed structure
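The placement of PAL and SAL translates into a concrete address range. The arithmetic below is a sketch with hypothetical names; it only computes the 16M window that ends at the 4G boundary and says nothing about the layout inside it.

```python
FOUR_GIB = 1 << 32
FIRMWARE_WINDOW_SIZE = 16 * (1 << 20)  # 16M


def firmware_window():
    """Return the (first, last) physical address of the 16M window just below 4G."""
    return FOUR_GIB - FIRMWARE_WINDOW_SIZE, FOUR_GIB - 1


def in_firmware_window(addr):
    """True if addr lies inside the firmware window."""
    first, last = firmware_window()
    return first <= addr <= last
```

The window spans 0xFF000000 to 0xFFFFFFFF.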
27
Firmware model