1 Adaptive History-Based Memory Schedulers
Ibrahim Hur and Calvin Lin
IBM Austin / The University of Texas at Austin
2 Memory Bottleneck
Memory system performance is not increasing as fast as CPU performance.
Latency: use caches, prefetching, ...
Bandwidth: use parallelism inside the memory system.
3 How to Increase Memory Command Parallelism?
[Figure: issuing two reads to Bank 0 back-to-back causes a bank conflict; interleaving a read to Bank 1 between them is a better order across the four DRAM banks.]
As with instruction scheduling, commands can be reordered for higher bandwidth.
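The reordering idea can be sketched with a toy cost model. This is an illustration only: the one-cycle issue time and the fixed same-bank penalty are invented for the example, not Power5 or DDR2 timings.

```python
# Toy model of command reordering for bank-level parallelism.
# Assumption (not from the slides): each command takes 1 cycle to issue,
# plus CONFLICT_PENALTY stall cycles if it targets the same bank as the
# previous command.

CONFLICT_PENALTY = 3

def total_cycles(commands):
    """Cycles to issue a command sequence; each command is a bank number."""
    cycles, prev_bank = 0, None
    for bank in commands:
        cycles += 1
        if bank == prev_bank:
            cycles += CONFLICT_PENALTY  # back-to-back same-bank conflict
        prev_bank = bank
    return cycles

naive = [0, 0, 1]   # Read Bank 0, Read Bank 0, Read Bank 1
better = [0, 1, 0]  # interleave the Bank 1 read between the Bank 0 reads
print(total_cycles(naive), total_cycles(better))  # the reordered sequence is cheaper
```

The same three commands cost fewer cycles in the reordered sequence because the conflicting pair is no longer adjacent, which is exactly the bandwidth argument on the slide.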
4 Inside the Memory System
[Figure: the memory controller sits between the caches and DRAM. Reads and writes enter a Read Queue and a Write Queue (not FIFO); the arbiter schedules memory operations from them into a FIFO Memory Queue.]
5 Our Work
Study memory command scheduling in the context of the IBM Power5.
Present new memory arbiters: 20% increased bandwidth at very little cost (0.04% increase in chip area).
6 Outline
The Problem
Characteristics of DRAM
Previous Scheduling Methods
Our Approach
  History-based schedulers
  Adaptive history-based schedulers
Results
Conclusions
7 Understanding the Problem: Characteristics of DRAM
Multi-dimensional structure: banks, rows, and columns (IBM Power5: ranks and ports as well).
Access time is not uniform:
  bank-to-bank conflicts
  read-after-write-to-the-same-rank conflicts
  write-after-read-to-a-different-port conflicts
  ...
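These non-uniform access costs can be viewed as a transition-cost function over consecutive commands. The sketch below encodes the three conflict types listed above; the penalty values and the command tuple layout are assumptions for illustration, not real Power5/DDR2 timings.

```python
# Sketch: non-uniform DRAM access cost as a function of the previous and
# next command. Penalty cycles are illustrative, not real timings.

def conflict_cost(prev, nxt):
    """Extra delay for issuing `nxt` right after `prev`.
    A command is (op, rank, bank, port), with op in {'R', 'W'}."""
    cost = 0
    if prev[1] == nxt[1] and prev[2] == nxt[2]:
        cost += 3  # bank-to-bank conflict: same rank and bank reused
    if prev[0] == 'W' and nxt[0] == 'R' and prev[1] == nxt[1]:
        cost += 2  # read after write to the same rank
    if prev[0] == 'R' and nxt[0] == 'W' and prev[3] != nxt[3]:
        cost += 1  # write after read to a different port
    return cost

# e.g. a read right after a write to the same rank (different bank):
print(conflict_cost(('W', 0, 0, 0), ('R', 0, 1, 0)))
```

A scheduler that knows this cost function for adjacent commands can estimate the penalty of each candidate ordering, which is the information the history-based arbiters exploit.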
8 Previous Scheduling Approaches: FIFO Scheduling
[Figure: commands flow from the caches through the Read and Write Queues and a FIFO Memory Queue to DRAM; the arbiter issues them in arrival order.]
9 Memoryless Scheduling
[Figure: memory system diagram as before, with a long delay highlighted in the pipeline. Adapted from Rixner et al., ISCA 2000.]
10 What We Really Want
Keep the pipeline full: don't hold commands in the reorder queues until conflicts are totally resolved; forward them to the memory queue in an order that minimizes future conflicts.
[Figure: among the available commands in the reorder queues, the arbiter compares candidate costs and finds that issuing D is better.]
To do this we need to know the history of the commands.
11 Another Goal: Match the Application's Memory Command Behavior
The arbiter should select commands from the queues in roughly the ratio in which the application generates them; otherwise the read or write queue may become congested.
Command history is useful here too.
12 Our Approach: History-Based Memory Schedulers
Benefits:
  minimize contention costs
  consider multiple constraints
  match the application's memory access behavior (2 reads per write? 1 read per write? ...)
The result: a less congested memory system, i.e. more bandwidth.
13 How Does It Work?
Use a finite state machine (FSM):
  each state in the FSM represents one possible history
  transitions out of a state are prioritized
  at any state, the scheduler selects the available command with the highest priority
The FSM is generated at design time.
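A minimal sketch of such a scheduler: the state is the recent command history, and a design-time table gives a priority order per state. The two-letter R/W alphabet and the particular priority tables below are invented for illustration (here biased toward a 2R:1W mix); the published design encodes richer histories and hardware-derived priorities.

```python
# Sketch of a history-based FSM arbiter. The 'R'/'W' history alphabet and
# the priority tables are illustrative, not the published Power5 design.

HISTORY_SIZE = 2  # the talk reports that a history size of 2 works well

# Design-time table: for each history (state), command types in priority order.
PRIORITIES = {
    ('R', 'R'): ['W', 'R'],  # after two reads, prefer a write (2R:1W mix)
    ('R', 'W'): ['R', 'W'],
    ('W', 'R'): ['R', 'W'],
    ('W', 'W'): ['R', 'W'],  # after two writes, strongly prefer a read
}

def schedule(history, available):
    """Pick the highest-priority available command type; return it and the
    next state (the history shifted by one)."""
    for cmd in PRIORITIES[history]:
        if cmd in available:
            return cmd, history[1:] + (cmd,)
    return None, history  # nothing available this cycle

cmd, hist = schedule(('R', 'R'), {'R', 'W'})
print(cmd, hist)  # prefers the write after two reads
```

Because the table is fixed at design time, the per-cycle decision is just a small table lookup plus an availability check, which is why the hardware cost stays tiny.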
14 An Example
[Figure: from the current state, the arbiter checks its first, second, third, and fourth preferences in order, sends the most appropriate available command from the reorder queues to memory, and moves to the next state.]
15 How to Determine Priorities?
Two criteria:
  A: minimize contention costs
  B: satisfy the program's read/write command mix
First method: use A, break ties with B.
Second method: use B, break ties with A.
Which method to use? Combine the two methods probabilistically (details in the paper).
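One way to read the probabilistic combination is a biased per-decision choice of which criterion ranks first. This is a guess at the mechanism, not the paper's exact scheme; the scoring functions, the command representation, and the 0.7 bias are all hypothetical.

```python
import random

# Sketch: probabilistically combine two ranking criteria.
# The scoring functions and the bias p_cost_first are assumptions.

def contention_cost(cmd):
    return cmd['cost']      # criterion A: expected conflict cost (lower is better)

def mix_mismatch(cmd):
    return cmd['mismatch']  # criterion B: deviation from the target R/W ratio

def pick(commands, p_cost_first=0.7, rng=random):
    """With probability p_cost_first, rank by A and break ties with B;
    otherwise rank by B and break ties with A."""
    if rng.random() < p_cost_first:
        key = lambda c: (contention_cost(c), mix_mismatch(c))
    else:
        key = lambda c: (mix_mismatch(c), contention_cost(c))
    return min(commands, key=key)

cmds = [{'cost': 2, 'mismatch': 0}, {'cost': 1, 'mismatch': 3}]
print(pick(cmds))  # usually the low-contention-cost command wins
```

Mixing the two orderings randomly means neither criterion can starve the other, which matches the slide's motivation for not committing to either method alone.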
16 Limitation of the History-Based Approach
Designed for one particular mix of reads and writes.
Solution: adaptive history-based schedulers.
  Create multiple state machines, one for each read/write mix.
  Periodically select the most appropriate state machine.
17 Adaptive History-Based Schedulers
[Figure: three arbiters, tuned for 2R:1W, 1R:1W, and 1R:2W command mixes; arbiter selection logic driven by a read counter, a write counter, and a cycle counter selects among them.]
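The selection logic might look like the sketch below: count reads and writes over an epoch, then switch to the arbiter whose tuned ratio best matches the observed mix. The epoch length and the exact matching rule are assumptions; only the three tuned ratios come from the slide.

```python
# Sketch of adaptive arbiter selection. The read/write counters accumulate
# over an epoch (length is an assumption), then the arbiter whose tuned
# reads-per-write ratio is closest to the observed mix is selected.

EPOCH_CYCLES = 10_000  # hypothetical epoch length

# Each arbiter is tuned for a reads-per-write ratio (slide: 2R:1W, 1R:1W, 1R:2W).
ARBITER_RATIOS = {'arbiter1': 2.0, 'arbiter2': 1.0, 'arbiter3': 0.5}

def select_arbiter(read_count, write_count):
    """Pick the arbiter whose tuned ratio best matches the observed one."""
    observed = read_count / max(write_count, 1)  # avoid division by zero
    return min(ARBITER_RATIOS, key=lambda a: abs(ARBITER_RATIOS[a] - observed))

print(select_arbiter(180, 95))   # roughly 2 reads per write
print(select_arbiter(100, 210))  # roughly 1 read per 2 writes
```

In hardware this reduces to two counters, a cycle counter to delimit epochs, and a small comparator tree, consistent with the slide's three-counter selection logic.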
18 Evaluation
Used a cycle-accurate simulator for the IBM Power5 (1.6 GHz, DDR2-266, 4 ranks, 4 banks, 2 ports).
Evaluated and compared our approach with previous approaches on data-intensive applications: Stream, NAS, and microbenchmarks.
19 The IBM Power5 Memory Controller
2 cores on a chip, SMT capability, large on-chip L2 cache, hardware prefetching.
276 million transistors; the memory controller occupies 1.6% of chip area.
20 Results 1: Stream Benchmarks
21 Results 2: NAS Benchmarks (1 core active)
22 Results 3: Microbenchmarks
23 [Figure: memory system diagram as before, now showing 12 concurrent commands in flight.]
24 DRAM Utilization
[Chart: number of active commands in DRAM, comparing our approach with the memoryless approach.]
25 Why Does It Work?
[Figure: annotated memory system diagram contrasting low occupancy in the reorder queues with full reorder queues, a full memory queue, and a busy memory system.]
Detailed analysis in the paper.
26 Other Results
We obtain >95% of the performance of a perfect DRAM configuration (no conflicts).
Results at higher frequency and with no data prefetching are in the paper.
A history size of 2 works well.
27 Conclusions
Introduced adaptive history-based schedulers; evaluated on a highly tuned system, the IBM Power5.
Performance improvement:
  over FIFO: Stream 63%, NAS 11%
  over memoryless: Stream 19%, NAS 5%
Little cost: 0.04% chip area increase.
28 Conclusions (cont.)
Similar arbiters can be used in other places as well, e.g. cache controllers.
Can optimize for other criteria, e.g. power, or power + performance.
29 Thank you