Published by Vincent Owens. Modified over 8 years ago.
1
Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems
Tsinghua University
Tsinghua National Laboratory for Information Science and Technology
2
Background
"Memory wall"
– High memory access latency
DRAM structure
– Channel, rank, bank, row, column, …
– Various timing constraints
Challenges of multi-core
– High parallelism
– More data contention
Solutions
– More memory channels
– An efficient memory scheduler
3
Motivation
Thread classification [TCM:Kim:2008]
– Latency-sensitive threads
– Bandwidth-sensitive threads
A memory scheduler should
– Improve system throughput
– Avoid starvation
– Maintain fairness among threads
4
Goals
Requests of latency-sensitive threads
– Should be issued as soon as possible
Requests of bandwidth-sensitive threads
– Should not be treated unfairly
Our proposal: PBFS
– Prioritize latency-sensitive threads
– Avoid starvation of bandwidth-sensitive threads
5
Basic Idea
Each thread gets a priority
– Ranges from -1 to n
Top priority (n): latency-sensitive threads
Bottom priority (0): bandwidth-sensitive threads
Medium priority (1 to n-1): intermediate threads
Idle (-1): finished threads or compute-intensive threads
6
Priority Updating Rules
Priorities are updated dynamically
– Once a request is issued
    The corresponding thread's priority is decreased by 1
– When no thread has top priority
    All threads' priorities are increased by 1
– When a time threshold is reached
    Identify idle threads and adjust the top priority
    Extremely unbalanced: increase the top priority
    Extremely balanced: decrease the top priority
    Otherwise: leave it unchanged
    Upper and lower boundaries are adjusted according to the number of active threads
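The first two rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PriorityTable` and its methods are hypothetical, priorities are assumed to be clamped at the bottom value 0, idle threads (-1) are assumed to be skipped when priorities are raised, and ties are assumed to go to the lowest thread id. The periodic top-priority adjustment is left out because the slide does not give its thresholds.

```python
IDLE = -1  # priority value the slides assign to idle threads


class PriorityTable:
    """Per-thread priorities in the range -1..top (hypothetical helper)."""

    def __init__(self, num_threads, top):
        self.top = top
        # Every thread starts at top priority, as in the slides' examples.
        self.prio = [top] * num_threads

    def on_issue(self, thread):
        # Rule 1: issuing a request lowers that thread's priority by one.
        # Assumption: the value is clamped at the bottom priority 0.
        if self.prio[thread] > 0:
            self.prio[thread] -= 1

    def promote_if_no_top(self):
        # Rule 2: if no thread holds top priority, raise all priorities by one.
        # Assumption: idle threads (-1) are skipped.
        if any(p == self.top for p in self.prio):
            return False
        self.prio = [p if p == IDLE else p + 1 for p in self.prio]
        return True

    def pick(self, pending):
        # Choose the pending, non-idle thread with the highest priority.
        # Assumption: ties go to the lowest thread id.
        candidates = [t for t in pending if self.prio[t] != IDLE]
        if not candidates:
            return None
        return max(candidates, key=lambda t: (self.prio[t], -t))


table = PriorityTable(num_threads=2, top=2)
table.on_issue(1)            # thread 1 issues: priorities become [2, 1]
table.on_issue(0)            # thread 0 issues: priorities become [1, 1]
table.promote_if_no_top()    # no thread at top: priorities become [2, 2]
```

Because a promotion only happens when every active thread is below the top value, adding 1 can never push a priority past the top, so no upper clamp is needed in this sketch.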
7
System throughput
Latency-sensitive threads
– Easily reach top priority
– Have their requests issued as soon as possible
Example
– 2-core CMP
    Thread A: latency-sensitive
    Thread B: bandwidth-sensitive
    Top priority = 2
    Initially, both threads have priority 2
8
Example
[Figure: request queues of Thread A and Thread B (Rq 0 to Rq 9) and the resulting execution order]
Mem. cycle    0  1  2  3  4  5  6  7  8  9 10 11
Priority A    2  2  2  1  2  2  2  2  1  2  2  2
Priority B    1  0  0  0  0  0  0  0  0  0  0  0
9
Starvation Avoidance
When a thread continuously issues too many requests
– It is classified as a bandwidth-sensitive thread
– Other threads get more chances to raise their priorities
Example
– 2-core CMP
    Thread A: less bandwidth-sensitive
    Thread B: bandwidth-sensitive
    Top priority = 2
    Initially, both threads have priority 2
10
Example
[Figure: request queues of Thread A and Thread B and the resulting execution order]
Mem. cycle    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Priority A    2  2  2  1  1  2  1  2  1  2  1  2  1  2  2  2
Priority B    1  0  0  0  1  1  1  1  1  1  1  1  1  1  0  0
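To see why the decrement and promote rules keep any one thread from monopolizing the memory, the following Python sketch simulates the simplest case, in which every thread always has a request pending. It is a toy model and does not reproduce the timeline in the figure: the function name `simulate` is hypothetical, one request is assumed to issue per cycle, priorities are assumed to be clamped at 0, and ties are assumed to go to the lowest thread id.

```python
def simulate(num_threads, top, cycles):
    """Count issued requests per thread when every thread always has one pending."""
    prio = [top] * num_threads      # all threads start at top priority
    issued = [0] * num_threads
    for _ in range(cycles):
        # Serve the highest-priority thread; ties go to the lowest id (assumption).
        t = max(range(num_threads), key=lambda i: (prio[i], -i))
        issued[t] += 1
        prio[t] = max(prio[t] - 1, 0)   # issuing costs one priority level
        if top not in prio:             # nobody at top: promote everyone
            prio = [p + 1 for p in prio]
    return issued


print(simulate(2, 2, 12))  # prints [6, 6]
```

In this saturated case the thread that has just issued drops below the other one, so service alternates and each thread receives the same share.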
11
Hardware overhead
Hardware support is needed to
– Record the priority of each thread
– Monitor each thread's behavior (read counts within a time interval)
– Maintain flags indicating whether a row buffer can be closed
The storage overhead is small, and the design is easy to implement
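A back-of-the-envelope estimate shows how little state the three items above need. The sketch below is illustrative only: the function name `storage_bits`, the 16-bit read counter, the one-flag-per-bank layout, and the example configuration are all assumptions, since the slides give no widths or counts. The one derived figure is the priority width: a range of -1 to n has n + 2 distinct values.

```python
def storage_bits(num_threads, top, counter_bits, num_banks):
    """Rough storage estimate in bits; all parameters are illustrative."""
    # Priorities range over -1..top, i.e. top + 2 distinct values,
    # which need as many bits as the largest encoded value, top + 1.
    prio_bits = (top + 1).bit_length()
    per_thread = prio_bits + counter_bits   # priority + read counter
    # Assumption: one "row buffer may close" flag per bank.
    return num_threads * per_thread + num_banks


# Hypothetical configuration: 4 threads, top priority 2, 32 banks.
print(storage_bits(num_threads=4, top=2, counter_bits=16, num_banks=32))  # prints 104
```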
12
Evaluation
Simulator: Usimm-1.3
Memory configurations
– 1 channel
– 4 channels
Benchmarks
Metrics
– Execution time
– Maximum slowdown
– EDP (energy-delay product)
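The slides do not define the metrics, so the Python sketch below uses the definitions commonly used for them, stated here as assumptions: maximum slowdown is the largest ratio of a thread's execution time in the shared run to its time when running alone, and EDP is energy multiplied by execution time. The function names and the sample numbers are made up for illustration and are not taken from the evaluation.

```python
def max_slowdown(shared_times, alone_times):
    """Largest per-thread ratio of shared-run time to run-alone time."""
    return max(s / a for s, a in zip(shared_times, alone_times))


def energy_delay_product(energy_joules, exec_time_seconds):
    """EDP: energy multiplied by execution time (lower is better)."""
    return energy_joules * exec_time_seconds


def percent_reduction(baseline, value):
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Illustrative numbers only.
print(max_slowdown([12.0, 30.0], [10.0, 20.0]))   # prints 1.5
print(percent_reduction(200.0, 190.0))            # prints 5.0
```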
13
Execution Time
Overall
– CLOSE: 4.2% reduction
– PBFS: 7.5% reduction
14
Maximum Slowdown
Overall
– CLOSE: 4.7% reduction
– PBFS: 7.0% reduction
15
EDP
Overall
– CLOSE: 9.1% reduction
– PBFS: 13.8% reduction
16
Summary
We proposed PBFS, which
– Classifies threads by priority
– Dynamically updates threads' priorities
– Preserves system throughput
– Avoids starvation of bandwidth-sensitive threads
– Has low hardware overhead
17
Thanks