Page 1 Trace Caches
Michele Co, CS 451
Page 2 Motivation
High-performance superscalar processors need high instruction throughput
Exploit ILP
–Wider dispatch and issue paths
Execution units designed for high parallelism
–Many functional units
–Large issue buffers
–Many physical registers
Fetch bandwidth becomes the performance bottleneck
Page 3 Fetch Performance Limiters
Cache hit rate
Branch prediction accuracy
Branch throughput
–Need to predict more than one branch per cycle
Non-contiguous instruction alignment
Fetch unit latency
Page 4 Problems with the Traditional Instruction Cache
Contains instructions in compiled (static) order
Works well only for sequential code with little branching, or for code with large basic blocks
Page 5 Suggested Solutions
Multiple branch target address prediction
Branch address cache (Yeh, Marr, Patt, 1993)
–Provides quick access to multiple target addresses
–Disadvantages: complex alignment network, additional latency
Page 6 Suggested Solutions (cont’d)
Collapsing buffer: multiple accesses to the BTB (Conte, Mills, Menezes, Patel, 1995)
–Allows fetching non-adjacent cache lines
–Disadvantages: bank conflicts, poor scalability for interblock branches, significant logic added before and after the instruction cache
Fill unit (Melvin, Shebanow, Patt, 1988)
–Caches RISC-like instructions derived from the CISC instruction stream
Page 7 Problems with Prior Approaches
Pointers for all noncontiguous instruction blocks must be generated BEFORE fetching can begin
–Extra stages, additional latency
Complex alignment network necessary
Multiple simultaneous accesses to the instruction cache
–Multiporting is expensive
Sequencing
–Additional stages, additional latency
Page 8 Potential Solution – Trace Cache
Rotenberg, Bennett, Smith (1996)
Advantages
–Caches dynamic instruction sequences, so fetch proceeds past multiple branches per cycle
–No additional fetch unit latency
Disadvantages
–Redundant instruction storage, both between the trace cache and the instruction cache and within the trace cache itself
Page 9 Trace Cache Details
Trace: a sequence of instructions potentially containing branches and their targets
–Terminates on branches with an indeterminate number of targets: returns, indirect jumps, traps
Trace identifier: start address + branch outcomes
Trace cache line: valid bit, tag, branch flags, branch mask, trace fall-through address, trace target address
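The trace cache line fields listed above can be sketched as a small Python model. The limits of at most 16 instructions and 3 branches per trace follow the Rotenberg et al. design; the addresses and instruction names below are invented for illustration.

```python
# Minimal sketch of a trace cache line and its hit test.
MAX_INSNS = 16     # maximum instructions per trace (Rotenberg et al. config)
MAX_BRANCHES = 3   # maximum branches per trace

class TraceCacheLine:
    def __init__(self, start_addr, branch_outcomes, insns,
                 fall_through, target):
        assert len(insns) <= MAX_INSNS
        assert len(branch_outcomes) <= MAX_BRANCHES
        self.valid = True
        self.tag = start_addr                  # start address of the trace
        self.branch_mask = len(branch_outcomes)  # how many branches the trace holds
        self.branch_flags = branch_outcomes    # taken/not-taken for each branch
        self.insns = insns                     # the cached dynamic instruction sequence
        self.fall_through = fall_through       # next fetch addr if last branch not taken
        self.target = target                   # next fetch addr if last branch taken

    def hit(self, fetch_addr, predicted_outcomes):
        """A trace cache hit needs a tag match on the start address AND a
        match of the predicted branch outcomes against the stored flags."""
        if not self.valid or self.tag != fetch_addr:
            return False
        return predicted_outcomes[:self.branch_mask] == self.branch_flags

line = TraceCacheLine(0x400, [True, False], ["i0", "i1", "i2"], 0x40C, 0x800)
print(line.hit(0x400, [True, False, True]))   # True: tag and outcomes match
print(line.hit(0x400, [False, False, True]))  # False: predicted path differs
```

On a hit, the whole dynamic sequence issues in one fetch; on an outcome mismatch the baseline design falls back to the instruction cache.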
Page 11 Next Trace Prediction (NTP)
History register
Correlating table with complex history indexing
Secondary table indexed by the most recently committed trace ID
Index-generating function
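One plausible index-generating function can be sketched as follows: fold a history register of recent trace IDs into a correlating-table index, letting more recent traces contribute more bits. This is a hypothetical hash for illustration; the published scheme's exact bit selection differs.

```python
# Hypothetical sketch of NTP index generation from a trace-ID history.
TABLE_BITS = 14  # assumed correlating-table size of 2**14 entries

def ntp_index(history):
    """history: list of recent trace IDs, most recent last.
    Older trace IDs are shifted right so they contribute fewer bits,
    then all are folded together with XOR."""
    index = 0
    for age, trace_id in enumerate(reversed(history)):
        index ^= trace_id >> age
    return index & ((1 << TABLE_BITS) - 1)

print(hex(ntp_index([0x12AB, 0x3C04, 0x0F51])))  # → 0x15f9
```

A secondary table, indexed by only the most recently committed trace ID, would back up this predictor when the correlating table has no confident entry.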
Page 12 NTP Index Generation
Page 13 Return History Stack
Page 14 Trace Cache vs. Existing Techniques
Page 15 Trace Cache Optimizations
Performance
–Partial matching [Friendly, Patel, Patt (1997)]
–Inactive issue [Friendly, Patel, Patt (1997)]
–Trace preconstruction [Jacobson, Smith (2000)]
Power
–Sequential access trace cache [Hu et al. (2002)]
–Dynamic direction prediction based trace cache [Hu et al. (2003)]
–Micro-operation cache [Solomon et al. (2003)]
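The idea behind partial matching can be sketched in a few lines: on a tag hit whose predicted branch outcomes diverge partway through the stored trace, issue the matching prefix of instructions instead of declaring a full miss. The block layout below is an invented example, not the paper's exact structures.

```python
# Hypothetical sketch of partial matching in a trace cache.
def partial_match(stored_flags, stored_blocks, predicted):
    """stored_blocks[i] holds the instructions of basic block i of the trace;
    stored_flags[i] is the stored outcome of the branch ending block i.
    Returns the instructions that can still be issued from this line."""
    issued = list(stored_blocks[0])        # first block matches by the tag alone
    for i, flag in enumerate(stored_flags):
        if i >= len(predicted) or predicted[i] != flag:
            break                          # diverged at this branch: stop here
        issued += stored_blocks[i + 1]     # outcome matches: issue the next block
    return issued

blocks = [["i0", "i1"], ["i2"], ["i3", "i4"]]
print(partial_match([True, False], blocks, [True, True]))  # → ['i0', 'i1', 'i2']
```

Compared with the all-or-nothing hit test of the baseline design, this salvages fetch bandwidth whenever only the later branches of a trace are mispredicted.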
Page 16 Trace Processors
Trace processor architecture
–Processing elements (PEs), each with a trace-sized instruction buffer, multiple dedicated functional units, a local register file, and a copy of the global register file
–Uses hierarchy to distribute execution resources
Addresses superscalar complexity issues
–Simplified multiple branch prediction (next trace prediction)
–Eliminated local dependence checking (local register file)
–Decentralized instruction issue and result bypass logic
Addresses architectural limitations
–Reduced bandwidth pressure on the global register file (local register files)
Page 17 Trace Processor
Page 18 Trace Cache Variations
Block-based trace cache (BBTC): Black, Rychlik, Shen (1999)
–Needs less storage capacity than a conventional trace cache
Page 19 Trace Table: BBTC Trace Prediction
Page 20 Block Cache
Page 21 Rename Table
Page 22 BBTC Optimization
Completion-time multiple branch prediction (Rakvic et al., 2000)
–Improves on trace table predictions
Page 23 Tree-based Multiple Branch Prediction
Page 24 Tree-PHT
Page 25 Tree-PHT Update
Page 26 Trace Cache Variations (cont’d)
Software trace cache: Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
Profile-directed code reordering to maximize sequentiality
–Convert taken branches to not-taken
–Move unused basic blocks out of the execution path
–Inline frequent basic blocks
–Map the most popular traces to a reserved area of the i-cache
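The reordering step can be sketched as a greedy layout pass: chain each basic block to its most frequently executed successor, so the hot path becomes fall-through (taken branches on that path turn into not-taken) and cold blocks sink to the end. The block names and edge counts below are invented; real software trace caches use profile data and more refined heuristics.

```python
# Hypothetical sketch of profile-directed basic-block reordering.
def reorder(blocks, edge_counts, entry):
    """blocks: set of block labels; edge_counts: {(src, dst): execution count};
    entry: the function's entry block. Returns the new block layout."""
    layout, placed = [], set()
    work = [entry]
    while work:
        b = work.pop(0)
        # Follow the hottest chain of unplaced successors starting at b.
        while b is not None and b not in placed:
            layout.append(b)
            placed.add(b)
            succs = [(dst, n) for (src, dst), n in edge_counts.items()
                     if src == b and dst not in placed]
            b = max(succs, key=lambda s: s[1])[0] if succs else None
        # Any still-unplaced (cold) blocks start new chains at the end.
        work = [x for x in sorted(blocks) if x not in placed]
    return layout

counts = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
print(reorder({"A", "B", "C", "D"}, counts, "A"))  # → ['A', 'B', 'D', 'C']
```

In the example, the hot path A→B→D is laid out contiguously, so a hardware trace cache (or a plain i-cache) fetches it sequentially, while the cold block C is moved out of the execution path.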