Reducing Cache Traffic and Energy with Macro Data Load
Lei Jin and Sangyeun Cho*
Dept. of Computer Science, University of Pittsburgh
Motivation
- Data cache access is a frequent event: 20~40% of all instructions access the data cache
- Data cache energy can be significant (~16% of chip energy in the StrongARM [Montanaro et al. 1997])
- Reducing cache traffic leads to energy savings
- Existing approaches:
  - Store-to-load forwarding
  - Load-to-load forwarding
  - Use available resources to keep data for reuse: LSQ [Nicolaescu et al. 2003], reorder buffer [Önder and Gupta 2001]
Macro Data Load (ML)
- Previous works are limited by exact data matching: same address and same data type
- ML instead exploits spatial locality in cache-port-wide data
  - Accessing port-wide data is free: it naturally fits the datapath and LSQ width
  - Recent processors support 64-bit ports, yet many accesses are narrower than 64 bits
- (Figure: forwarding opportunities w/o ML vs. w/ ML)
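The matching rule above can be sketched as follows. This is an illustrative model, not the authors' hardware: `PORT_BYTES = 8` assumes the 64-bit cache port mentioned on the slide, and the function names are hypothetical.

```python
PORT_BYTES = 8  # assumed 64-bit cache-port width

def exact_match(prev, new):
    """Conventional load-to-load forwarding: forwards only when the
    address and access size (data type) match exactly."""
    return prev == new  # (address, size) tuples

def ml_match(prev_addr, new_addr):
    """Macro data load: forwards whenever both accesses fall in the
    same port-wide (64-bit) block, regardless of size or offset."""
    return prev_addr // PORT_BYTES == new_addr // PORT_BYTES

# Two 4-byte loads to adjacent words of the same 64-bit block:
a, b = 0x1000, 0x1004
print(exact_match((a, 4), (b, 4)))  # False: cache must be accessed again
print(ml_match(a, b))               # True: data can come from the LSQ
```

The second load illustrates the extra opportunity ML uncovers: a conventional scheme sees a different address and goes to the cache, while ML forwards from the port-wide copy already held in the LSQ.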
ML Potential
- ML uncovers more forwarding opportunities across CINT2k, CFP2k, and MiBench
- ML is especially effective with limited resources
ML Implementation
- Architectural changes:
  - Relocated data alignment logic
  - Sequential LSQ-cache access
- Net impact: the LSQ becomes a small fully associative cache with FIFO replacement
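The net impact can be sketched as a tiny fully associative cache of port-wide blocks with FIFO replacement. This is a behavioral sketch under stated assumptions (an 8-byte port, a hypothetical `MacroLSQ` class), not the paper's circuit-level design; the sequential LSQ-then-cache access shows up as checking the LSQ before touching the cache.

```python
from collections import OrderedDict

PORT_BYTES = 8  # assumed 64-bit port width

class MacroLSQ:
    """Sketch of an LSQ acting as a small fully associative cache of
    port-wide blocks with FIFO replacement (hypothetical model)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.blocks = OrderedDict()  # block address -> port-wide data

    def load(self, addr, cache):
        blk = addr - addr % PORT_BYTES    # align to 64-bit block
        if blk in self.blocks:            # LSQ hit: forward, no cache
            return self.blocks[blk]       # access (this saves energy)
        data = cache[blk]                 # LSQ miss: access cache next
        if len(self.blocks) >= self.entries:
            self.blocks.popitem(last=False)  # FIFO: evict oldest entry
        self.blocks[blk] = data
        return data

# Usage: the second load hits the same 64-bit block held in the LSQ.
cache = {0x1000: "blockA", 0x1008: "blockB"}
lsq = MacroLSQ(entries=2)
lsq.load(0x1000, cache)  # miss: fetched from the cache
lsq.load(0x1004, cache)  # hit: forwarded from the LSQ
```

FIFO (rather than LRU) keeps the replacement logic trivial, matching the LSQ's existing in-order allocation of entries.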
Result: Energy Reduction
- Evaluated on CINT, CFP, and MiBench
- Up to 35% energy reduction (MiBench)!
- More effective than previous techniques