1 Memory Hierarchy Adaptivity: An Architectural Perspective
Alex Veidenbaum
AMRM Project, sponsored by DARPA/ITO

2 Opportunities for Adaptivity
– Cache organization
– Cache performance “assist” mechanisms
– Hierarchy organization
– Memory organization (DRAM, etc.)
– Data layout and address mapping
– Virtual memory
– Compiler assist

3 Opportunities - Cont’d
Cache organization: adapt what?
– Size: NO
– Associativity: NO
– Line size: MAYBE
– Write policy: YES (fetch, allocate, write-back/through)
– Mapping function: MAYBE
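The write-policy entry is the clearest candidate on this slide. A minimal sketch (a toy direct-mapped cache model, not the AMRM hardware; all names are illustrative) of why write-back vs. write-through is worth adapting, since the better choice depends on how often stores hit the same line:

```python
# Toy direct-mapped cache tracking traffic to the next level under
# two write policies. Illustrative assumption, not the AMRM design.

class ToyCache:
    def __init__(self, n_lines=4, policy="write-back"):
        self.lines = {}          # index -> (tag, dirty)
        self.n_lines = n_lines
        self.policy = policy
        self.mem_writes = 0      # writes issued to the next level

    def write(self, addr):
        index, tag = addr % self.n_lines, addr // self.n_lines
        old = self.lines.get(index)
        if old and old[0] != tag and old[1]:
            self.mem_writes += 1            # write back an evicted dirty line
        if self.policy == "write-through":
            self.mem_writes += 1            # every store goes to memory
            self.lines[index] = (tag, False)
        else:
            self.lines[index] = (tag, True) # mark dirty, defer the write

wb, wt = ToyCache(policy="write-back"), ToyCache(policy="write-through")
for addr in [0, 0, 0, 0]:       # four stores to the same line
    wb.write(addr)
    wt.write(addr)
print(wb.mem_writes, wt.mem_writes)   # → 0 4
```

With a store-heavy, high-locality interval write-back wins; a stream of scattered single stores would favor write-through, which is exactly the kind of run-time variation an adaptive policy could exploit.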

4 Opportunities - Cont’d
Cache “assist”: prefetch, write buffer, victim cache, etc., between different levels. Adapt what?
– Which mechanism(s) to use
– Mechanism “parameters”
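As a concrete example of one such assist, here is a hedged sketch of a small victim cache in the style of Jouppi's proposal (the interface and FIFO replacement are assumptions for illustration): a few fully associative entries holding lines recently evicted from L1, so conflict misses can be serviced without going to the next level.

```python
from collections import deque

class VictimCache:
    """Tiny fully associative buffer of recently evicted L1 line tags."""

    def __init__(self, n_entries=4):
        self.entries = deque(maxlen=n_entries)  # FIFO of evicted tags

    def insert(self, tag):
        # Called on L1 eviction; oldest victim falls off automatically.
        self.entries.append(tag)

    def lookup(self, tag):
        # Called on L1 miss; a hit moves the line back into L1.
        if tag in self.entries:
            self.entries.remove(tag)
            return True
        return False

vc = VictimCache()
vc.insert(0x1A)
assert vc.lookup(0x1A)      # conflict miss caught by the victim cache
assert not vc.lookup(0x2B)  # true miss: go to the next level
```

The adaptable "parameters" the slide mentions would be things like `n_entries` and the replacement discipline.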

5 Opportunities - Cont’d
Hierarchy organization:
– Where are cache assist mechanisms applied?
  – Between L1 and L2
  – Between L1 and memory
  – Between L2 and memory
– What are the data paths like?
  – Is prefetch, victim cache, or write buffer data written into the cache?
  – How much parallelism is possible in the hierarchy?

6 Opportunities - Cont’d
Memory organization:
– Cached DRAM?
– Interleave change?
– PIM (processing-in-memory)

7 Opportunities - Cont’d
Data layout and address mapping:
– In theory, something can be done, but…
– The MP case is even worse
– Adaptive address mapping or hashing based on ???

8 Opportunities - Cont’d
Compiler assist:
– Can select the initial configuration
– Can pass hints on to hardware
– Can generate code to collect run-time info and adjust execution
– Can adapt the configuration after being “called” at certain intervals during execution
– Can select/run-time-optimize code

9 Opportunities - Cont’d
Virtual memory can adapt:
– Page size?
– Mapping?
– Page prefetching/read-ahead
– Write buffer (file cache)
– The above under multiprogramming?
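The page read-ahead point can be illustrated with a minimal sequential-fault heuristic (an assumption modeled on common OS read-ahead behavior, not any specific kernel): if a process faults on consecutive pages, speculatively bring in the next few.

```python
def pages_to_prefetch(prev_page, page, depth=2):
    """Sequential read-ahead sketch: on a fault at `page`, prefetch the
    next `depth` pages only if the previous fault hit the adjacent page.
    `depth` itself is the kind of parameter the VM layer could adapt."""
    if prev_page is not None and page == prev_page + 1:
        return [page + i for i in range(1, depth + 1)]
    return []

print(pages_to_prefetch(41, 42))  # → [43, 44]
print(pages_to_prefetch(10, 42))  # → []
```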

10 Applying Adaptivity
What drives adaptivity?
– Performance impact, overall and/or relative
– “Effectiveness”, e.g. miss rate
– Processor stall introduced
– Program characteristics
When to perform adaptive action:
– Run time: use feedback from hardware
– Compile time: insert code, set up hardware
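The run-time feedback path might look like the following sketch (the threshold and counters are illustrative assumptions): hardware counts misses and accesses over an interval, and the controller adapts when effectiveness degrades.

```python
def should_adapt(misses, accesses, threshold=0.05):
    """Return True when the interval's miss rate exceeds the threshold,
    signalling the adaptive controller to reconsider its configuration."""
    return accesses > 0 and misses / accesses > threshold

print(should_adapt(3, 1000))   # 0.3% miss rate → False
print(should_adapt(80, 1000))  # 8% miss rate → True
```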

11 Where to Implement
In software: compiler and/or OS
+ (Static) knowledge of program behavior
+ Factored into optimization and scheduling
– Extra code and overhead
– Lack of dynamic run-time information
– Limited rate of adaptivity
– Requires recompilation and/or OS changes

12 Where to Implement - Cont’d
Hardware
+ Dynamic information available
+ Fast decision mechanisms possible
+ Transparent to software (thus safe)
– Delay and clock rate limit algorithm complexity
– Difficult to maintain long-term trends
– Little knowledge of program behavior

13 Where to Implement - Cont’d
Hardware/software
+ Software can set coarse hardware parameters
+ Hardware can supply software with dynamic info
+ Perhaps more complex algorithms can be used
– Software modification required
– Communication mechanism required

14 Current Investigation
L1 cache assist:
– See wide variability in assist-mechanism effectiveness
  – Between individual programs
  – Within a program, as a function of time
– Propose hardware mechanisms to select between assist types and allocate buffer space
– Give the compiler an opportunity to set parameters

15 Mechanisms Used
– Prefetching:
  – Stream buffers
  – Stride-directed, based on address alone
  – Miss stride: prefetch the same address using the number of intervening misses
– Victim cache
– Write buffer
All mechanisms sit after L1.
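The stride-directed scheme, driven by miss addresses alone, can be sketched as follows (an assumption for illustration, not the AMRM implementation): if consecutive misses are a constant stride apart, the next address in the sequence is prefetched.

```python
class StridePrefetcher:
    """Stride-directed prefetch from miss addresses only (no PC)."""

    def __init__(self):
        self.last_miss = None
        self.last_stride = None

    def on_miss(self, addr):
        """Return an address to prefetch, or None."""
        prefetch = None
        if self.last_miss is not None:
            stride = addr - self.last_miss
            if stride != 0 and stride == self.last_stride:
                prefetch = addr + stride   # stride seen twice: predict next
            self.last_stride = stride
        self.last_miss = addr
        return prefetch

pf = StridePrefetcher()
print([pf.on_miss(a) for a in (100, 108, 116, 124)])
# → [None, None, 124, 132]
```

The miss-stride variant named on the slide would instead record, per address, how many other misses occurred before it recurred, and prefetch it again after that count; the confirmation-before-prefetch structure is the same.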

16 Mechanisms Used - Cont’d
– A mechanism can be used by itself, or all can be used at once
– Buffer space size and organization are fixed
– No adaptivity involved

17 Observed Behavior
– Programs benefit differently from each mechanism; none is a consistent winner
– Within a program, the same holds between mechanisms over time

18 Observed Behavior - Cont’d
Both facts indicate a likely improvement from adaptivity:
– Select the better mechanism among those available
Even more can be expected from adaptively re-allocating from the combined buffer pool:
– To reduce stall time
– To reduce the number of misses

19 Proposed Adaptive Mechanism
Hardware:
– A common pool of 2-4 word buffers
– A set of possible policies, a subset of:
  – Stride-directed prefetch
  – PC-based prefetch
  – History-based prefetch
  – Victim cache
  – Write buffer
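The common-pool idea can be sketched as follows (the allocation interface and proportional policy are assumptions; only the mechanism names come from the slide): a fixed number of small buffers that the allocator divides among whichever mechanisms are active.

```python
POOL_SIZE = 8  # e.g. eight small (2-4 word) buffers shared by all mechanisms

class BufferPool:
    """Common pool of buffers divided among assist mechanisms."""

    def __init__(self, size=POOL_SIZE):
        self.size = size
        self.allocation = {}  # mechanism -> number of buffers

    def allocate(self, shares):
        """shares: mechanism -> weight; assign whole buffers proportionally."""
        total = sum(shares.values()) or 1
        self.allocation = {m: (w * self.size) // total
                           for m, w in shares.items()}
        # hand any leftover buffers to the highest-weight mechanism
        leftover = self.size - sum(self.allocation.values())
        best = max(shares, key=shares.get)
        self.allocation[best] += leftover
        return self.allocation

pool = BufferPool()
print(pool.allocate({"prefetch": 3, "victim": 1, "write": 0}))
# → {'prefetch': 6, 'victim': 2, 'write': 0}
```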

20 Adaptive Hardware - Cont’d
– Performance monitors for each type/buffer: misses, stall time on hit, thresholds
– Dynamic buffer allocator among mechanisms
– Allocation and monitoring policy:
  – Predict future behavior from observed past
  – Observe over a time interval dT, set for the next interval
  – Save performance trends in next-level tags (<8 bits)
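The interval-based policy (predict the next interval's behavior from the last one) might be sketched as follows; the hit counters and the floor-share rule are illustrative assumptions, not the monitored quantities the slide lists:

```python
def next_allocation(hits, pool_size, min_share=1):
    """hits: mechanism -> useful hits observed over the last interval dT.
    Give each mechanism a floor of min_share buffers, then split the rest
    in proportion to observed hits (past predicts future)."""
    mechs = list(hits)
    alloc = {m: min_share for m in mechs}
    spare = pool_size - min_share * len(mechs)
    total = sum(hits.values())
    if total:
        for m in mechs:
            alloc[m] += (hits[m] * spare) // total
    # any remainder from integer division goes to the most useful mechanism
    used = sum(alloc[m] - min_share for m in mechs)
    alloc[max(mechs, key=hits.get)] += spare - used
    return alloc

print(next_allocation({"prefetch": 90, "victim": 10, "write": 0}, pool_size=8))
# → {'prefetch': 6, 'victim': 1, 'write': 1}
```

The `min_share` floor keeps every mechanism observable in the next interval, so a mechanism that was useless in one phase of the program can still be detected when it becomes useful later.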

21 Further Opportunities to Adapt
– L2 cache organization: variable-size lines
– L2 non-sequential prefetch
– In-memory assists (DRAM)

22 MP Opportunities
– Even longer latency
– Coherence, hardware or software
– Synchronization
– Prefetch under and beyond the above:
  – Avoid coherence if possible
  – Prefetch past synchronization
– Assist adaptive scheduling

