1 Adaptive History-Based Memory Schedulers
Ibrahim Hur and Calvin Lin
IBM Austin / The University of Texas at Austin
2 Memory Bottleneck
Memory system performance is not increasing as fast as CPU performance.
Latency: use caches, prefetching, ...
Bandwidth: use parallelism inside the memory system.
3 How to Increase Memory Command Parallelism?
[Figure: issuing two reads to Bank 0 back-to-back causes a bank conflict; interleaving a read to Bank 1 between them is a better order across the four DRAM banks.]
As with instruction scheduling, commands can be reordered for higher bandwidth.
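The reordering idea can be sketched with a toy cost model. This is an illustration only: the one-cycle issue time and the fixed same-bank penalty are invented for the example, not Power5 or DDR2 timings.

```python
# Toy model of command reordering for bank-level parallelism.
# Assumption (not from the slides): each command takes 1 cycle to issue,
# plus CONFLICT_PENALTY stall cycles if it targets the same bank as the
# previous command.

CONFLICT_PENALTY = 3

def total_cycles(commands):
    """Cycles to issue a command sequence; each command is a bank number."""
    cycles, prev_bank = 0, None
    for bank in commands:
        cycles += 1
        if bank == prev_bank:
            cycles += CONFLICT_PENALTY  # back-to-back same-bank conflict
        prev_bank = bank
    return cycles

naive = [0, 0, 1]   # Read Bank 0, Read Bank 0, Read Bank 1
better = [0, 1, 0]  # interleave the Bank 1 read between the Bank 0 reads
print(total_cycles(naive), total_cycles(better))  # the reordered sequence is cheaper
```

The same three commands cost fewer cycles in the reordered sequence because the conflicting pair is no longer adjacent, which is exactly the bandwidth argument on the slide.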
4 Inside the Memory System
[Figure: the memory controller sits between the caches and DRAM. Reads and writes enter a Read Queue and a Write Queue (not FIFO); the arbiter schedules memory operations from them into a FIFO Memory Queue.]
5 Our Work
Study memory command scheduling in the context of the IBM Power5.
Present new memory arbiters: 20% increased bandwidth at very little cost (0.04% increase in chip area).
6 Outline
The Problem
Characteristics of DRAM
Previous Scheduling Methods
Our Approach
  History-based schedulers
  Adaptive history-based schedulers
Results
Conclusions
7 Understanding the Problem: Characteristics of DRAM
Multi-dimensional structure: banks, rows, and columns (IBM Power5: ranks and ports as well).
Access time is not uniform:
  bank-to-bank conflicts
  read-after-write-to-the-same-rank conflicts
  write-after-read-to-a-different-port conflicts
  ...
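These non-uniform access costs can be viewed as a transition-cost function over consecutive commands. The sketch below encodes the three conflict types listed above; the penalty values and the command tuple layout are assumptions for illustration, not real Power5/DDR2 timings.

```python
# Sketch: non-uniform DRAM access cost as a function of the previous and
# next command. Penalty cycles are illustrative, not real timings.

def conflict_cost(prev, nxt):
    """Extra delay for issuing `nxt` right after `prev`.
    A command is (op, rank, bank, port), with op in {'R', 'W'}."""
    cost = 0
    if prev[1] == nxt[1] and prev[2] == nxt[2]:
        cost += 3  # bank-to-bank conflict: same rank and bank reused
    if prev[0] == 'W' and nxt[0] == 'R' and prev[1] == nxt[1]:
        cost += 2  # read after write to the same rank
    if prev[0] == 'R' and nxt[0] == 'W' and prev[3] != nxt[3]:
        cost += 1  # write after read to a different port
    return cost

# e.g. a read right after a write to the same rank (different bank):
print(conflict_cost(('W', 0, 0, 0), ('R', 0, 1, 0)))
```

A scheduler that knows this cost function for adjacent commands can estimate the penalty of each candidate ordering, which is the information the history-based arbiters exploit.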
8 Previous Scheduling Approaches: FIFO Scheduling
[Figure: commands flow from the caches through the Read and Write Queues and a FIFO Memory Queue to DRAM; the arbiter issues them in arrival order.]
9 Memoryless Scheduling
[Figure: memory system diagram as before, with a long delay highlighted in the pipeline. Adapted from Rixner et al., ISCA 2000.]
10 What We Really Want
Keep the pipeline full: don't hold commands in the reorder queues until conflicts are totally resolved; forward them to the memory queue in an order that minimizes future conflicts.
[Figure: among the available commands in the reorder queues, the arbiter compares candidate costs and finds that issuing D is better.]
To do this we need to know the history of the commands.
11 Another Goal: Match the Application's Memory Command Behavior
The arbiter should select commands from the queues in roughly the ratio in which the application generates them; otherwise the read or write queue may become congested.
Command history is useful here too.
12 Our Approach: History-Based Memory Schedulers
Benefits:
  minimize contention costs
  consider multiple constraints
  match the application's memory access behavior (2 reads per write? 1 read per write? ...)
The result: a less congested memory system, i.e. more bandwidth.
13 How Does It Work?
Use a finite state machine (FSM):
  each state in the FSM represents one possible history
  transitions out of a state are prioritized
  at any state, the scheduler selects the available command with the highest priority
The FSM is generated at design time.
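A minimal sketch of such a scheduler: the state is the recent command history, and a design-time table gives a priority order per state. The two-letter R/W alphabet and the particular priority tables below are invented for illustration (here biased toward a 2R:1W mix); the published design encodes richer histories and hardware-derived priorities.

```python
# Sketch of a history-based FSM arbiter. The 'R'/'W' history alphabet and
# the priority tables are illustrative, not the published Power5 design.

HISTORY_SIZE = 2  # the talk reports that a history size of 2 works well

# Design-time table: for each history (state), command types in priority order.
PRIORITIES = {
    ('R', 'R'): ['W', 'R'],  # after two reads, prefer a write (2R:1W mix)
    ('R', 'W'): ['R', 'W'],
    ('W', 'R'): ['R', 'W'],
    ('W', 'W'): ['R', 'W'],  # after two writes, strongly prefer a read
}

def schedule(history, available):
    """Pick the highest-priority available command type; return it and the
    next state (the history shifted by one)."""
    for cmd in PRIORITIES[history]:
        if cmd in available:
            return cmd, history[1:] + (cmd,)
    return None, history  # nothing available this cycle

cmd, hist = schedule(('R', 'R'), {'R', 'W'})
print(cmd, hist)  # prefers the write after two reads
```

Because the table is fixed at design time, the per-cycle decision is just a small table lookup plus an availability check, which is why the hardware cost stays tiny.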
14 An Example
[Figure: from the current state, the arbiter checks its first, second, third, and fourth preferences in order, sends the most appropriate available command from the reorder queues to memory, and moves to the next state.]
15 How to Determine Priorities?
Two criteria:
  A: minimize contention costs
  B: satisfy the program's read/write command mix
First method: use A, break ties with B.
Second method: use B, break ties with A.
Which method to use? Combine the two methods probabilistically (details in the paper).
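One way to read the probabilistic combination is a biased per-decision choice of which criterion ranks first. This is a guess at the mechanism, not the paper's exact scheme; the scoring functions, the command representation, and the 0.7 bias are all hypothetical.

```python
import random

# Sketch: probabilistically combine two ranking criteria.
# The scoring functions and the bias p_cost_first are assumptions.

def contention_cost(cmd):
    return cmd['cost']      # criterion A: expected conflict cost (lower is better)

def mix_mismatch(cmd):
    return cmd['mismatch']  # criterion B: deviation from the target R/W ratio

def pick(commands, p_cost_first=0.7, rng=random):
    """With probability p_cost_first, rank by A and break ties with B;
    otherwise rank by B and break ties with A."""
    if rng.random() < p_cost_first:
        key = lambda c: (contention_cost(c), mix_mismatch(c))
    else:
        key = lambda c: (mix_mismatch(c), contention_cost(c))
    return min(commands, key=key)

cmds = [{'cost': 2, 'mismatch': 0}, {'cost': 1, 'mismatch': 3}]
print(pick(cmds))  # usually the low-contention-cost command wins
```

Mixing the two orderings randomly means neither criterion can starve the other, which matches the slide's motivation for not committing to either method alone.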
16 Limitation of the History-Based Approach
Designed for one particular mix of reads and writes.
Solution: adaptive history-based schedulers.
  Create multiple state machines, one for each read/write mix.
  Periodically select the most appropriate state machine.
17 Adaptive History-Based Schedulers
[Figure: three arbiters, tuned for 2R:1W, 1R:1W, and 1R:2W command mixes; arbiter selection logic driven by a read counter, a write counter, and a cycle counter selects among them.]
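The selection logic might look like the sketch below: count reads and writes over an epoch, then switch to the arbiter whose tuned ratio best matches the observed mix. The epoch length and the exact matching rule are assumptions; only the three tuned ratios come from the slide.

```python
# Sketch of adaptive arbiter selection. The read/write counters accumulate
# over an epoch (length is an assumption), then the arbiter whose tuned
# reads-per-write ratio is closest to the observed mix is selected.

EPOCH_CYCLES = 10_000  # hypothetical epoch length

# Each arbiter is tuned for a reads-per-write ratio (slide: 2R:1W, 1R:1W, 1R:2W).
ARBITER_RATIOS = {'arbiter1': 2.0, 'arbiter2': 1.0, 'arbiter3': 0.5}

def select_arbiter(read_count, write_count):
    """Pick the arbiter whose tuned ratio best matches the observed one."""
    observed = read_count / max(write_count, 1)  # avoid division by zero
    return min(ARBITER_RATIOS, key=lambda a: abs(ARBITER_RATIOS[a] - observed))

print(select_arbiter(180, 95))   # roughly 2 reads per write
print(select_arbiter(100, 210))  # roughly 1 read per 2 writes
```

In hardware this reduces to two counters, a cycle counter to delimit epochs, and a small comparator tree, consistent with the slide's three-counter selection logic.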
18 Evaluation
Used a cycle-accurate simulator for the IBM Power5 (1.6 GHz, DDR2-266, 4 ranks, 4 banks, 2 ports).
Evaluated and compared our approach with previous approaches on data-intensive applications: Stream, NAS, and microbenchmarks.
19 The IBM Power5 Memory Controller
2 cores on a chip, SMT capability, large on-chip L2 cache, hardware prefetching.
276 million transistors; the memory controller occupies 1.6% of chip area.
20 Results 1: Stream Benchmarks
21 Results 2: NAS Benchmarks (1 core active)
22 Results 3: Microbenchmarks
23 [Figure: memory system diagram as before, now showing 12 concurrent commands in flight.]
24 DRAM Utilization
[Chart: number of active commands in DRAM, comparing our approach with the memoryless approach.]
25 Why Does It Work?
[Figure: annotated memory system diagram contrasting low occupancy in the reorder queues with full reorder queues, a full memory queue, and a busy memory system.]
Detailed analysis in the paper.
26 Other Results
We obtain >95% of the performance of a perfect DRAM configuration (no conflicts).
Results at higher frequency and with no data prefetching are in the paper.
A history size of 2 works well.
27 Conclusions
Introduced adaptive history-based schedulers; evaluated on a highly tuned system, the IBM Power5.
Performance improvement:
  over FIFO: Stream 63%, NAS 11%
  over memoryless: Stream 19%, NAS 5%
Little cost: 0.04% chip area increase.
28 Conclusions (cont.)
Similar arbiters can be used in other places as well, e.g. cache controllers.
Can optimize for other criteria, e.g. power, or power + performance.
29 Thank you