Published by Donavan Baxley. Modified over 9 years ago.
1
Temporal Distribution Based Software Cache Partition To Reduce I-Cache Misses
Xiaomi An, Jiqiang Song, Wendong Wang
SimpLight Nanoelectronics Ltd
2008/03/24
2
SimpLight Confidential, Patent pending
Outline
- Traditional code layout optimizations
- Code layout optimizations in the Open64 compiler
- Temporal distribution based software cache partition to reduce I-cache misses
- Future work
3
Traditional code layout optimizations
Code layout is an optimization that changes how code is organized in memory. Main benefits of code layout:
- Improve branch prediction through the placement of basic blocks
- Reduce I-cache misses by changing how code maps onto the cache (mainly compulsory and conflict misses)
- Fit code into complex memory hierarchies (e.g. scratch-pad memory plus cache)
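To make "changing how code maps onto the cache" concrete, here is a minimal sketch. It assumes a direct-mapped I-cache of 8 KB with 32-byte lines; that geometry is illustrative and does not come from the slides. The sketch computes which cache lines a code block occupies. It then shows that two hot functions placed exactly one cache size apart compete for the same lines, and that moving one of them removes the conflict.

```python
# Assumed I-cache geometry (not from the slides): direct-mapped, 8 KB, 32-byte lines.
LINE_SIZE = 32
CACHE_SIZE = 8 * 1024
NUM_LINES = CACHE_SIZE // LINE_SIZE


def cache_lines(start: int, size: int) -> set[int]:
    """Cache line indices occupied by code placed at [start, start + size)."""
    first = start // LINE_SIZE
    last = (start + size - 1) // LINE_SIZE
    return {mem_line % NUM_LINES for mem_line in range(first, last + 1)}


def conflicts(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two (start, size) code blocks compete for at least one cache line."""
    return bool(cache_lines(*a) & cache_lines(*b))


if __name__ == "__main__":
    f = (0x0000, 96)                  # hot function f: lines 0-2
    g = (0x0000 + CACHE_SIZE, 64)     # hot function g, one cache size later: lines 0-1
    print(conflicts(f, g))            # True: f and g evict each other
    g_moved = (g[0] + 96, 64)         # relayout: start g right after f's lines
    print(conflicts(f, g_moved))      # False: g now uses lines 3-4
```

Only the start address changes between the two placements. This is the lever every layout algorithm below works with.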
4
Traditional code layout optimizations
Representation of temporal relationships:
- Control flow graph with edge frequencies
- Weighted call graph
- Temporal relation graph
Consideration of cache architecture:
- Linearize the code without considering the cache architecture (Pettis and Hansen)
- Distribute temporally interleaved code onto different cache lines (Hashemi, Gloy, etc.)
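A temporal relation graph differs from a call graph because its edge weights come from a reference trace rather than from call counts. Below is a rough sketch. It assumes the trace is a list of procedure names, and it ignores the size-bounded window that practical TRG construction uses. In this simplified form, the edge between p and q is incremented each time q is referenced between two consecutive references to p.

```python
from collections import Counter


def build_trg(trace: list[str]) -> Counter:
    """Simplified temporal relation graph.

    weights[frozenset({p, q})] counts how often q was referenced between two
    consecutive references to p. No window bound is applied (simplification).
    """
    weights: Counter = Counter()
    recency: list[str] = []              # least recently referenced first; no duplicates
    for p in trace:
        if p in recency:
            idx = recency.index(p)
            for q in recency[idx + 1:]:  # everything touched since p's last reference
                weights[frozenset((p, q))] += 1
            recency.pop(idx)
        recency.append(p)                # p is now the most recent reference
    return weights


if __name__ == "__main__":
    print(build_trg(list("ABACA")))      # A-B and A-C interleave once each; B-C never
```

A heavy edge means the two procedures alternate in time. TRG-style layouts therefore try to keep such pairs on different cache lines.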
5
Code layout optimizations in the Open64 compiler
- Profile-based basic block reordering and procedure splitting in CG (the code generator)
  - Based on a control flow graph with edge frequencies
  - Pettis and Hansen based algorithm
- Procedure reordering in IPA (interprocedural analysis)
  - Based on a weighted call graph with call-edge frequencies
  - A variant of the Pettis and Hansen algorithm
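The greedy "closest is best" idea behind Pettis and Hansen procedure ordering can be sketched as follows. This is a simplified illustration, not Open64's implementation. Call edges are visited from heaviest to lightest, and the chains containing the two endpoints are merged. The sketch tries four orientations of the merged chains and keeps the one whose junction has the heaviest original edge. The function name and the edge format are assumptions.

```python
def pettis_hansen_order(edges: dict[tuple[str, str], int]) -> list[str]:
    """Simplified Pettis-Hansen procedure ordering on a weighted call graph.

    edges maps a (caller, callee) pair to its call frequency.
    """
    def w(a: str, b: str) -> int:
        # Undirected weight between two procedures.
        return edges.get((a, b), 0) + edges.get((b, a), 0)

    chain_of: dict[str, list[str]] = {}          # procedure -> chain containing it
    for a, b in edges:
        for p in (a, b):
            chain_of.setdefault(p, [p])

    for (a, b), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        ca, cb = chain_of[a], chain_of[b]
        if ca is cb:
            continue                             # already laid out together
        # Try the four orientations; keep the one with the hottest junction.
        options = [(ca, cb), (ca, cb[::-1]), (ca[::-1], cb), (ca[::-1], cb[::-1])]
        left, right = max(options, key=lambda lr: w(lr[0][-1], lr[1][0]))
        merged = left + right
        for p in merged:
            chain_of[p] = merged

    order, seen = [], set()                      # emit each final chain once
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


if __name__ == "__main__":
    calls = {("main", "f"): 100, ("f", "g"): 80, ("main", "h"): 5, ("g", "h"): 1}
    print(pettis_hansen_order(calls))            # ['g', 'f', 'main', 'h']
```

The output is a single linear order. This matches the slide's point that the approach does not model the cache architecture itself.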
6
Software cache partition
What is software cache partition? Through code layout optimization, different code blocks are mapped to different regions of the I-cache.
Benefits of software cache partition:
- Reduce cache misses
- Remove interference between multiple programs without additional hardware support (embedded systems)
- Soft implementation of scratch-pad memory on top of the I-cache
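With a direct-mapped cache, a layout pass could enforce a partition by padding each block's start address. The padding ensures the block maps only onto its assigned range of cache lines. The sketch below assumes a 256-line cache with 32-byte lines. The helper `place_in_region` is hypothetical.

```python
# Assumed geometry (not from the slides): direct-mapped I-cache, 256 lines of 32 bytes.
LINE_SIZE = 32
NUM_LINES = 256


def place_in_region(next_free: int, size: int, region: range) -> int:
    """Lowest line-aligned address >= next_free at which a `size`-byte block maps
    only onto the cache lines in `region` (a range within 0..NUM_LINES)."""
    lines_needed = -(-size // LINE_SIZE)         # ceiling division
    if lines_needed > len(region):
        raise ValueError("block does not fit in its cache region")
    line = -(-next_free // LINE_SIZE)            # first line boundary >= next_free
    while True:
        idx = line % NUM_LINES                   # cache line this address maps to
        if region.start <= idx and idx + lines_needed <= region.stop:
            return line * LINE_SIZE              # padding = result - next_free
        line += 1


if __name__ == "__main__":
    rt_region = range(0, 64)        # lines reserved for one partition
    other_region = range(64, 256)   # lines for everything else
    print(place_in_region(0, 100, other_region))             # 2048: skips lines 0-63
    print(place_in_region(0x2000 + 62 * 32, 96, rt_region))  # 16384: wraps to line 0
```

The cost of this approach is padding, which grows the code image. The benefit is that no block outside a region can ever evict the lines inside it.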
7
Benefits of software cache partition (1)
Remove interference between multiple programs without additional hardware support
[Figure: I-cache divided between a video app and an audio app]
The I-cache is partitioned according to the performance demands and code locality of the video application and the audio application.
8
Benefits of software cache partition (2)
Soft implementation of scratch-pad memory on top of the I-cache
[Figure: I-cache divided between code with real-time requirements and other code]
The I-cache is partitioned to guarantee that code with real-time requirements is not replaced once it has been brought into the cache.
9
Benefits of software cache partition (3)
Reduce I-cache misses
Runtime trace of code blocks: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
[Figure: two layouts of the code blocks onto cache lines, with "/" separating blocks that share a line: A | B | C | D | E/U/P/X | F/V/Q/Y and A/E | B/F | C | D | U/P/X | V/Q/Y]
Layout 1: 24 misses
Layout 2: 18 misses
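The two miss counts can be reproduced with a tiny simulator. This is a sketch under stated assumptions: a direct-mapped cache with six lines, each code block occupying exactly one line, and (UV)^5 meaning the pair UV repeated five times. Under this model, the layout that gives A, B, C and D their own lines (A | B | C | D | E/U/P/X | F/V/Q/Y) incurs 18 misses. The layout A/E | B/F | C | D | U/P/X | V/Q/Y incurs 24. These are the two counts shown on the slide.

```python
# ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF, one entry per block reference.
TRACE = list("ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF")


def count_misses(trace: list[str], layout: list[str]) -> int:
    """Direct-mapped cache: layout[i] lists the blocks that map to cache line i."""
    line_of = {blk: i for i, blocks in enumerate(layout) for blk in blocks}
    resident = [None] * len(layout)       # block currently held by each line
    misses = 0
    for blk in trace:
        line = line_of[blk]
        if resident[line] != blk:         # compulsory or conflict miss
            misses += 1
            resident[line] = blk          # the new block evicts the old one
    return misses


if __name__ == "__main__":
    print(count_misses(TRACE, ["A", "B", "C", "D", "EUPX", "FVQY"]))  # 18
    print(count_misses(TRACE, ["AE", "BF", "C", "D", "UPX", "VQY"]))  # 24
```

In the 18-miss layout, A to D are never evicted, and each E/F reference collides only once per burst. In the 24-miss layout, A and B are also displaced on every pass.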
10
Temporal distribution based layout of code blocks in the partitioned cache
Selection of good candidates for holding cache lines exclusively, based on hotness, density and temporal distribution
[Figure: code blocks classified as "hot, dense and good regularity", "hot and good locality" and "cold", then mapped into the I-cache; the "hot and good locality" and "cold" blocks are shown sharing cache lines]
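The selection criteria can be sketched as a three-way classifier. The slides give neither the threshold values nor a concrete regularity score, so `HOT_MIN`, `DENSITY_MIN`, `SKEW_MAX` and the `skew` field are all hypothetical. In this sketch, only blocks classified as `exclusive` would get cache lines of their own.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    size: int          # code size in bytes
    exec_count: int    # profiled dynamic instruction fetches from this block
    skew: float        # hypothetical temporal-distribution score: low = regular


# Hypothetical thresholds; the slides do not give concrete values.
HOT_MIN = 1000
DENSITY_MIN = 10.0
SKEW_MAX = 1.0


def classify(b: Block) -> str:
    """'exclusive' blocks own cache lines; 'hot_shared' and 'cold' blocks share them."""
    if b.exec_count < HOT_MIN:
        return "cold"
    if b.exec_count / b.size >= DENSITY_MIN and b.skew <= SKEW_MAX:
        return "exclusive"        # hot, dense and temporally regular
    return "hot_shared"           # hot but bursty (good locality) or sparse


if __name__ == "__main__":
    for blk in (Block("A", 64, 5000, 0.7), Block("U", 64, 5000, 1.5), Block("Z", 64, 10, 0.0)):
        print(blk.name, classify(blk))
```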
11
Temporal distribution
Temporal locality and temporal regularity
Trace: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
- A, B, C, D, E and F have good temporal regularity, since their references are uniformly distributed along the trace.
- U, V, P, Q, X and Y have good temporal locality, since their reference distribution is highly skewed (each pair is referenced in only one burst).
[Figure: our mapping onto cache lines, labeled UVUV, ABCDABCD, PQPQ, XYXY, EFEF, with some blocks sharing cache lines]
Our mapping: 18 misses in total
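The slides name their metrics (variance of reuse distance, weighted temporal distribution) but do not give the formulas. The sketch below is only an illustrative stand-in for "uniform versus skewed": it is not the authors' measure. It splits the trace into equal windows, counts each block's references per window, and uses the coefficient of variation of those counts. Evenly spread blocks such as A score low, and bursty blocks such as U score high. The window count of 6 is an arbitrary assumption.

```python
from statistics import mean, pstdev

# ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
TRACE = list("ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF")


def window_skew(trace: list[str], block: str, windows: int = 6) -> float:
    """Coefficient of variation of per-window reference counts (stand-in metric).

    Low for references spread evenly along the trace; high for one burst.
    The block must occur in the trace, otherwise the mean is zero.
    """
    size = -(-len(trace) // windows)     # window length, rounded up
    counts = [0] * windows
    for t, b in enumerate(trace):
        if b == block:
            counts[t // size] += 1
    return pstdev(counts) / mean(counts)


if __name__ == "__main__":
    for blk in "AEUP":
        print(blk, round(window_skew(TRACE, blk), 3))
```

On this trace, A and E score about 0.707 and U and P about 1.456. This matches the slide's split into regular and local blocks.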
12
Quantification of temporal distribution
[Equations not recoverable from the source: variance of reuse distance; weighted temporal distribution; temporal distribution]
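The slide does not define "reuse distance". A common definition is the number of distinct other blocks referenced between two consecutive references to the same block (the LRU stack distance). Assuming that definition, the variance of a block's reuse distances can be computed as below. Both function names are hypothetical.

```python
from statistics import pvariance


def reuse_distances(trace: list[str], block: str) -> list[int]:
    """Distinct other blocks touched between consecutive references to `block`."""
    dists, last = [], None
    for t, b in enumerate(trace):
        if b == block:
            if last is not None:
                dists.append(len(set(trace[last + 1:t])))
            last = t
    return dists


def reuse_variance(trace: list[str], block: str) -> float:
    """Population variance of the reuse distances (0 if fewer than one reuse)."""
    d = reuse_distances(trace, block)
    return pvariance(d) if d else 0.0


if __name__ == "__main__":
    t = list("XAXABCDX")
    print(reuse_distances(t, "X"), reuse_variance(t, "X"))  # [1, 4] 2.25
```

On its own, this variance cannot tell a steady block from a bursty one whose reuses are evenly spaced within the burst. In the slide's trace, both A and U have zero variance. That is presumably why the slide pairs it with a weighted temporal distribution.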
13
Iterative partition and layout
  Func Partition(RB, IRB)
    Sort nodes in RB by instruction density   // highest instruction density first
    RB_SIZE = Calc_rb_size(RB)
    IRB_SIZE = Calc_irb_size(IRB)
    While (RB_SIZE + IRB_SIZE > CACHE_SIZE) {
      Adjust(RB, IRB)
      RB_SIZE = Calc_rb_size(RB)
      IRB_SIZE = Calc_irb_size(IRB)
    }
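The pseudocode can be turned into a runnable sketch, but only under several assumptions, because the slide does not define RB, IRB, Calc_rb_size, Calc_irb_size or Adjust. Here RB is assumed to hold blocks that own cache lines exclusively, so their sizes add up. IRB is assumed to hold blocks that share one region sized for the largest of them. Adjust is assumed to demote the lowest-density RB block to IRB. A guard on an empty RB is added so the loop cannot spin forever.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    size: int        # code size in bytes
    density: float   # instruction density (assumed: dynamic fetches per byte)


def calc_rb_size(rb: list[Node]) -> int:
    # Assumption: RB blocks own their cache lines, so their footprints add up.
    return sum(n.size for n in rb)


def calc_irb_size(irb: list[Node]) -> int:
    # Assumption: IRB blocks share one region sized for the largest of them.
    return max((n.size for n in irb), default=0)


def adjust(rb: list[Node], irb: list[Node]) -> None:
    # Assumption: demote the lowest-density RB node (last after sorting) to IRB.
    irb.append(rb.pop())


def partition(rb: list[Node], irb: list[Node], cache_size: int) -> None:
    rb.sort(key=lambda n: n.density, reverse=True)   # highest density first
    rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)
    while rb and rb_size + irb_size > cache_size:    # guard: stop if RB is empty
        adjust(rb, irb)
        rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)


if __name__ == "__main__":
    rb = [Node("A", 1024, 9.0), Node("B", 2048, 2.0), Node("C", 3072, 5.0)]
    irb = [Node("U", 1024, 1.0)]
    partition(rb, irb, cache_size=4096)
    print([n.name for n in rb], [n.name for n in irb])  # ['A'] ['U', 'B', 'C']
```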
14
Experiments and results (1)
[Chart: cumulative effect of optimizations on I-cache miss reduction (y-axis: 0% to 100%) for H264 enc, H264 dec, AVSM dec, MPEG4 dec and G729.A; series: BB reorder; BB reorder + layout; BB reorder + pu split + layout]
15
Experiments and results (2)
[Chart: reduction of I-cache misses by TD (temporal distribution), PH (Pettis and Hansen) and TRG (temporal relation graph) (y-axis: 0% to 100%) for H264 enc, H264 dec, AVSM dec, MPEG4 dec and G729.A]
16
Experiments and results (3)
[Chart: H264 codec I-cache miss reduction by TD, PH and TRG with various inputs (y-axis: 0% to 80%); inputs: HE:stefan, HE:akiyo, HE:football, HD:stefan, HD:akiyo, HD:football (HE: H264 encoder, HD: H264 decoder)]
17
Future work
- Improve the current iterative partition algorithm
- Incorporate more cache configuration parameters into the layout algorithm, e.g. cache line size, L2 cache …
- Develop an effective software cache partition method for multi-threaded programs on our memory hierarchy
18
Thank You!