“NAHALAL: Cache Organization for Chip Multiprocessors”: New LSU Policy. By: Ido Shayevitz and Yoav Shargil. Supervisor: Zvika Guz
NAHALAL Architecture: The NAHALAL architecture defines how the L2 cache is organized into banks. Each processor has a private "backyard" bank, and all processors share a small public bank. The architecture is based on the hot shared line phenomenon: a small set of shared lines receives a large share of the accesses.
LSU Improvement (placement and replacement policies): Replacement from a private bank uses LRU. Replacement from the public bank uses LSU instead of NAHALAL's LRU. The LSU policy selects the Least Shared Used line for eviction from the public bank.
LSU Implementation: Each line has a shift register with N cells, and each cell holds a CPU number. When CPUi must evict a line, every cell of every shift register is XORed with the ID of CPUi. The line whose shift register produces 0 (every cell holds the ID of CPUi) is chosen for eviction. If no shift register produces 0, regular LRU is used. To reduce the memory overhead, N = 4; the overhead is therefore 2^14 × 4 × 3 = 196,608 bits, an 18.75% memory overhead. The algorithm is simple and runs in a short time in hardware.
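A minimal C sketch of the selection step described above, assuming that "the XOR produces 0" means every cell of the register equals the evicting CPU's ID, that CPU IDs fit in 3 bits, and that the first matching line in scan order wins when several qualify. The names lsu_line, lsu_record, lsu_only_cpu, lsu_select_victim and the lru_age field are hypothetical; this models the logic in software and is not the authors' hardware design.

```c
#include <assert.h>
#include <stdint.h>

#define LSU_N 4 /* N = 4 cells per shift register, as chosen on the slide */

/* Hypothetical model of the LSU state kept for one public-bank line. */
typedef struct {
    uint8_t hist[LSU_N]; /* CPU IDs of the last N accessors (3 bits each) */
    unsigned lru_age;    /* hypothetical LRU field: larger = less recently used */
} lsu_line;

/* On an access by `cpu`, shift the register one cell and insert the ID. */
void lsu_record(lsu_line *l, uint8_t cpu) {
    for (int k = LSU_N - 1; k > 0; k--)
        l->hist[k] = l->hist[k - 1];
    l->hist[0] = cpu & 0x7;
}

/* XOR every cell with the CPU ID; OR-ing the results gives 0 only when
   every cell holds that ID, i.e. no other CPU appears in the history. */
int lsu_only_cpu(const lsu_line *l, uint8_t cpu) {
    uint8_t acc = 0;
    for (int k = 0; k < LSU_N; k++)
        acc |= (uint8_t)(l->hist[k] ^ (cpu & 0x7));
    return acc == 0;
}

/* Victim for an eviction by `cpu`: the first line whose register XORs to 0
   (scan-order tie-break is an assumption); otherwise fall back to LRU. */
int lsu_select_victim(const lsu_line lines[], int n, uint8_t cpu) {
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (lsu_only_cpu(&lines[i], cpu))
            return i;
    int victim = 0; /* regular LRU: pick the oldest line */
    for (int i = 1; i < n; i++)
        if (lines[i].lru_age > lines[victim].lru_age)
            victim = i;
    return victim;
}
```

A line whose recent history contains only the evicting CPU is, by this reading, not a shared line, so evicting it leaves the hot shared lines in the public bank. The sketch ignores how empty registers are initialized.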
Simulation Structure in Simics: Using a Python script, we defined:
Writing Benchmarks: Benchmarks are written in the console of the simulated target:
Writing Benchmarks: Benchmarks use threads from the pthread library, and each thread is bound to a CPU using the sched library. The benchmark itself contains parallel code; OS code and pthread library code also run in parallel. Each benchmark is run first without LSU and then with LSU.
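A minimal C sketch of such a benchmark, assuming a Linux target where sched_setaffinity with pid 0 binds the calling thread. The thread count, the shared array, the iteration count, and the names run_benchmark and bench_worker are hypothetical placeholders, not the authors' benchmark code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define BENCH_THREADS 4      /* hypothetical: one thread per CPU */
#define BENCH_WORDS   64     /* hypothetical size of the shared array */
#define BENCH_ITERS   100000 /* hypothetical iteration count per thread */

static volatile long shared_data[BENCH_WORDS]; /* read by every thread */

typedef struct {
    int cpu;  /* CPU this thread is bound to */
    long sum; /* per-thread result */
} bench_arg;

static void *bench_worker(void *p) {
    bench_arg *a = p;
    assert(a->cpu >= 0 && a->cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);
    /* pid 0 = the calling thread on Linux. A failed binding (e.g. the host
       has fewer CPUs than BENCH_THREADS) is reported but not fatal. */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        fprintf(stderr, "could not bind thread to CPU %d\n", a->cpu);
    long sum = 0;
    for (long i = 0; i < BENCH_ITERS; i++)
        sum += shared_data[i % BENCH_WORDS]; /* parallel reads of shared data */
    a->sum = sum;
    return NULL;
}

/* Runs the parallel section; returns the sum of all per-thread results,
   or -1 if not every thread could be created. */
long run_benchmark(void) {
    pthread_t tid[BENCH_THREADS];
    bench_arg args[BENCH_THREADS];
    for (int i = 0; i < BENCH_WORDS; i++)
        shared_data[i] = i;
    int created = 0;
    for (int t = 0; t < BENCH_THREADS; t++) {
        args[t].cpu = t;
        args[t].sum = 0;
        if (pthread_create(&tid[t], NULL, bench_worker, &args[t]) != 0)
            break;
        created++;
    }
    long total = 0;
    for (int t = 0; t < created; t++) { /* join every thread we started */
        pthread_join(tid[t], NULL);
        total += args[t].sum;
    }
    return created == BENCH_THREADS ? total : -1;
}
```

Compiled with `gcc -pthread`, a small main() would call run_benchmark(); the same binary would then be run once with LSU disabled in the simulator and once with it enabled.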
Collecting Statistics
Cache statistics: l2c
Total number of transactions:
Total memory stall time:
Total memory hit stall time:
Device data reads (DMA): 0
Device data writes (DMA): 0
Uncacheable data reads: 17
Uncacheable data writes:
Uncacheable instruction fetches: 0
Data read transactions:
Total read stall time:
Total read hit stall time:
Data read remote hits: 0
Data read misses:
Data read hit ratio: 97.43%
Instruction fetch transactions: 0
Instruction fetch misses: 0
Data write transactions:
Total write stall time:
Total write hit stall time:
Data write remote hits: 0
Data write misses: 0
Data write hit ratio: %
Copy back transactions: 0
Number of replacements in the middle (NAHALAL): 557
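A minimal C sketch of how two per-run metrics could be derived from these counters, assuming the average stall time per transaction is "Total memory stall time" divided by "Total number of transactions" (stall time assumed to be in cycles), and the replacement rate is "Number of replacements in the middle" divided by the same transaction count. The struct l2c_stats and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for counters copied from the l2c statistics dump. */
typedef struct {
    double transactions;        /* "Total number of transactions" */
    double memory_stall_time;   /* "Total memory stall time" (assumed cycles) */
    double middle_replacements; /* "Number of replacements in the middle" */
} l2c_stats;

/* Average stall time per transaction (assumed definition). */
double avg_stall_per_transaction(const l2c_stats *s) {
    assert(s->transactions > 0);
    return s->memory_stall_time / s->transactions;
}

/* Percentage of transactions that caused a replacement in the middle. */
double middle_replacement_pct(const l2c_stats *s) {
    assert(s->transactions > 0);
    return 100.0 * s->middle_replacements / s->transactions;
}
```

Filling one l2c_stats per run (without LSU and with LSU) gives the two values that the results compare.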
Results
1. Improvement of 54% in average stall time per transaction.
2. Improvement of 61% in average stall time per transaction.
With LSU, only 0.09% of the transactions cause a replacement in the middle, an improvement of ∆ = 8.28% over the run without LSU!
With LSU, only 0.02% of the transactions cause a replacement in the middle, an improvement of ∆ = 8.73% over the run without LSU!
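A minimal C sketch of the two kinds of comparison on this slide, under stated assumptions: the stall-time improvement is taken to be a relative reduction, and ∆ is taken to be the plain difference between the two replacement percentages (without LSU minus with LSU). Both interpretations and the function names are assumptions, and the test values are illustrative, not measured data.

```c
#include <assert.h>

/* Assumed: relative improvement, in percent, of a lower-is-better metric. */
double improvement_pct(double without_lsu, double with_lsu) {
    assert(without_lsu > 0);
    return 100.0 * (without_lsu - with_lsu) / without_lsu;
}

/* Assumed meaning of ∆: difference between two percentages. */
double delta_points(double pct_without_lsu, double pct_with_lsu) {
    return pct_without_lsu - pct_with_lsu;
}
```

For example, an average stall time falling from 200 to 92 cycles per transaction would be a 54% improvement under this definition.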
Conclusions: The LSU policy significantly improves the average stall time per transaction; therefore, the LSU policy implemented in the NAHALAL architecture significantly reduces the number of cycles a benchmark takes. The LSU policy significantly reduces the number of replacements in the middle; therefore, with LSU the NAHALAL architecture keeps the hot shared lines in the public bank better. In our implementation, LRU is activated whenever LSU does not find a line; therefore, the LSU policy as we implemented it is always preferable to LRU.