A Performance Comparison of DRAM Memory System Optimizations for SMT Processors
Zhichun Zhu, Zhao Zhang
ECE Department
Univ. Illinois at Chicago, Iowa State Univ.

Feb. 15, 2005, HPCA-11

DRAM Memory Optimizations
- Optimizations at the DRAM side can make a big difference on single-threaded processors
  - Enhancement of chip interface/interconnect
  - Access scheduling [Hong et al. HPCA’99, Mathew et al. HPCA’00, Rixner et al. ISCA’00]
  - DRAM-side locality [Cuppu et al. ISCA’99, ISCA’01, Zhang et al., MICRO’00, Lin et al. HPCA’01]

How Does SMT Impact the Memory Hierarchy?
- Less performance loss per cache miss to DRAM memories
  - Lower benefit from DRAM-side optimizations?
- But more cache misses due to cache contention
  - Much more pressure on main memory
- Is DRAM memory design more important or not?

Outline
- Motivation
- Memory optimization techniques
- Thread-aware memory access scheduling
  - Outstanding request-based
  - Resource occupancy-based
- Methodology
- Memory performance analysis on SMT systems
  - Effectiveness of single-thread techniques
  - Effectiveness of thread-aware schemes
- Conclusion

Memory Optimization Techniques
- Page modes
  - Open page: good for programs with good locality
  - Close page: good for programs with poor locality
- Mapping schemes
  - Exploitation of concurrency (multiple channels, chips, banks)
  - Row buffer conflicts
- Memory access scheduling
  - Reordering of concurrent accesses
  - Reducing average latency and improving bandwidth utilization
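
The trade-off between the two page modes can be illustrated with a small latency model. This is a minimal sketch, not the simulator used in the talk: the function names are hypothetical, and the 15 ns timing defaults are assumptions that mirror the DRAM access latency listed under Simulation Parameters. It also assumes a close-page precharge is fully hidden from the next access.

```python
# Hypothetical timing parameters in nanoseconds (assumed values for illustration).
T_ROW = 15   # row activation
T_COL = 15   # column access
T_PRE = 15   # precharge


def open_page_latency(requested_row, open_row):
    """Latency of one access in open-page mode.

    open_row is the row currently held in the row buffer,
    or None if the bank is already precharged.
    """
    if open_row == requested_row:
        return T_COL                    # row buffer hit
    if open_row is None:
        return T_ROW + T_COL            # idle bank: activate, then column access
    return T_PRE + T_ROW + T_COL        # row buffer conflict


def close_page_latency():
    """Close-page mode precharges after every access (assumed off the critical path)."""
    return T_ROW + T_COL


def average_open_page_latency(hit_rate):
    """Expected open-page latency when every miss is a row buffer conflict."""
    return hit_rate * T_COL + (1 - hit_rate) * (T_PRE + T_ROW + T_COL)
```

Under these assumptions, open page wins only when the row buffer hit rate exceeds 50%, which is one way to read "good for programs with good locality".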

Memory Access Scheduling for Single-Threaded Systems
- Hit-first: a row buffer hit has a higher priority than a row buffer miss
- Read-first: a read has a higher priority than a write
- Age-based: an older request has a higher priority than a new one
- Criticality-based: a critical request has a higher priority than a non-critical one
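
Three of these policies can be combined into a single sort key. The sketch below is an illustration only: the `Request` fields and the precedence order (hit-first, then read-first, then age) are assumptions, since the slide lists the policies without ranking them, and criticality is left out because it needs processor-side information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    arrival: int     # arrival time; a smaller value means an older request
    is_read: bool
    row_hit: bool    # True if the target row is already in the row buffer


def priority_key(req):
    # Tuples sort ascending and False < True, so each flag is negated
    # to place hits before misses and reads before writes.
    return (not req.row_hit, not req.is_read, req.arrival)


def schedule(requests):
    """Return the requests in the order they would be issued."""
    return sorted(requests, key=priority_key)
```

For example, a queue holding an old write miss, a read miss, a write hit, and a newly arrived read hit is issued as read hit, write hit, read miss, write miss.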

Memory Access Concurrency with Multithreaded Processors
[Figure: memory accesses between processor and memory, single-threaded vs. multi-threaded]

Thread-Aware Memory Scheduling
- New dimension in memory scheduling for SMT systems: considering the current state of each thread
- States related to memory accesses
  - Number of outstanding requests
  - Number of processor resources occupied

Outstanding Request-Based Scheme
- Request-based: a request generated by a thread with fewer pending requests has a higher priority
- Timeline figure (text residue; two request orderings over time, threads A and B):
    H A1, H A2, H B1, H A3, H A4, H B2
    H A1, H A2, H A3, H A4, H B1, H B2

Outstanding Request-Based Scheme (continued)
- Request-based: hit-first and read-first are applied on top
- For SMT processors, sustained memory bandwidth is more important than the latency of an individual access
- Timeline figure (text residue; two request orderings over time, threads A and B):
    H A1, H A2, M B1, H A3, H A4, M B2
    H A1, H A2, H A3, H A4, M B1, M B2
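
A request-based scheduler with hit-first and read-first applied on top might be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it reads "on top" as hit-first and read-first taking precedence over the pending-request count, breaks remaining ties by age, recounts pending requests after every issue, and treats each request's hit/miss status as fixed. All names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    thread: str
    name: str
    arrival: int
    is_read: bool = True
    row_hit: bool = True


def request_based_order(queue):
    """Issue requests one at a time, preferring hits, then reads,
    then the thread with the fewest pending requests, then age."""
    pending = list(queue)
    issued = []
    while pending:
        # Recount outstanding requests per thread before every decision.
        counts = Counter(r.thread for r in pending)
        best = min(
            pending,
            key=lambda r: (not r.row_hit, not r.is_read, counts[r.thread], r.arrival),
        )
        pending.remove(best)
        issued.append(best)
    return issued
```

With four row buffer hits from thread A and two misses from thread B, this ordering issues A1 to A4 before B1 and B2, because hit-first outranks the pending-request count. If all six requests were hits, thread B, with fewer pending requests, would be served first.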

Resource Occupancy-Based Scheme
- ROB-based: higher priority to requests from threads holding more ROB entries
- IQ-based: higher priority to requests from threads holding more IQ entries
- Hit-first and read-first are applied on top
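
The ROB-based and IQ-based variants differ only in which occupancy count drives the priority, so one sketch can cover both. As before, this is a hypothetical illustration: the `occupancy` mapping stands in for whatever per-thread counters the processor would expose, and the precedence of hit-first and read-first over occupancy is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    thread: int
    arrival: int
    is_read: bool = True
    row_hit: bool = True


def occupancy_order(requests, occupancy):
    """Order requests by hit-first, read-first, then descending occupancy.

    occupancy maps a thread id to the number of entries that thread
    currently holds (ROB entries for ROB-based, IQ entries for IQ-based).
    """
    return sorted(
        requests,
        key=lambda r: (not r.row_hit, not r.is_read, -occupancy[r.thread], r.arrival),
    )


# Example per-thread snapshots (made-up numbers).
rob_entries = {0: 200, 1: 50}
iq_entries = {0: 10, 1: 40}
```

Passing `rob_entries` favors thread 0 and passing `iq_entries` favors thread 1, which shows how the two variants can disagree on the same request queue.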

Methodology
- Simulator
  - SMT extension of sim-Alpha
  - Event-driven memory simulator (DDR SDRAM and Direct Rambus DRAM)
- Workload
  - Mixture of SPEC 2000 applications
  - 2-, 4-, and 8-thread workloads
  - “ILP”, “MIX”, and “MEM” workload mixes

Simulation Parameters
  Processor speed          3 GHz
  Fetch width              8 inst.
  Baseline fetch policy    DWarn.2.8
  Pipeline depth           11
  Issue queue size         64 Int., 32 FP
  Reorder buffer size      256/thread
  Physical register num    384 Int., 384 FP
  Load/store queue size    64 LQ, 64 SQ
  L1 caches                64KB I/D, 2-way, 1-cycle latency
  L2 cache                 512KB, 2-way, 10-cycle latency
  L3 cache                 4MB, 4-way, 20-cycle latency
  MSHR entries             (16+4 prefetch)/cache
  Memory channels          2/4/8
  Memory BW/channel        200 MHz, DDR, 16B width
  Memory banks             4 banks/chip
  DRAM access latency      15ns row, 15ns column, 15ns precharge
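
A quick calculation shows what these memory parameters imply. The sketch assumes that "200 MHz, DDR, 16B width" means two 16-byte transfers per bus cycle on each channel, and that peak bandwidth scales linearly with the channel count; the constant and function names are illustrative.

```python
CPU_GHZ = 3                 # processor speed
BUS_MHZ = 200               # memory bus clock per channel
TRANSFERS_PER_CYCLE = 2     # DDR: data on both clock edges (assumed reading)
BUS_WIDTH_BYTES = 16        # assumed to be the data width per channel
DRAM_STEP_NS = 15           # row, column, and precharge latency


def peak_bandwidth_bytes_per_s(channels):
    """Theoretical peak bandwidth across all channels, in bytes per second."""
    return channels * BUS_MHZ * 1_000_000 * TRANSFERS_PER_CYCLE * BUS_WIDTH_BYTES


def ns_to_cpu_cycles(ns):
    """Convert nanoseconds to processor cycles at CPU_GHZ."""
    return ns * CPU_GHZ


for channels in (2, 4, 8):
    gb_per_s = peak_bandwidth_bytes_per_s(channels) / 1e9
    print(f"{channels} channels: {gb_per_s:.1f} GB/s peak")

print("one DRAM step:", ns_to_cpu_cycles(DRAM_STEP_NS), "CPU cycles")
print("precharge + row + column:", ns_to_cpu_cycles(3 * DRAM_STEP_NS), "CPU cycles")
```

Under these assumptions each channel peaks at 6.4 GB/s, and a single 15 ns DRAM step costs 45 processor cycles at 3 GHz, so an access that needs precharge, row activation, and column access spends at least 135 cycles in the DRAM alone.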

Workload Mixes
  2-thread  ILP  bzip2, gzip
            MIX  gzip, mcf
            MEM  mcf, ammp
  4-thread  ILP  bzip2, gzip, sixtrack, eon
            MIX  gzip, mcf, bzip2, ammp
            MEM  mcf, ammp, swim, lucas
  8-thread  ILP  gzip, bzip2, sixtrack, eon, mesa, galgel, crafty, wupwise
            MIX  gzip, mcf, bzip2, ammp, sixtrack, swim, eon, lucas
            MEM  mcf, ammp, swim, lucas, equake, applu, vpr, facerec

Results (figure slides; only the titles survive)
- Performance Loss Due to Memory Access
- Memory Access Concurrency
- Memory Channel Configurations (two slides)
- Mapping Schemes
- Memory Access Concurrency
- Thread-Aware Schemes

Conclusion
- DRAM optimizations have a significant impact on the performance of SMT (and likely CMP) processors
  - Mostly effective when a workload mix includes some memory-intensive programs
  - Performance is sensitive to memory channel organization
  - DRAM-side locality is harder to exploit due to contention
  - Thread-aware access scheduling schemes do bring good performance