Published by Prudence Gilbert. Modified over 6 years ago.
1
SystemC Simulation Based Memory Controller Optimization
Primary Author: Ashutosh Pandey
Secondary Author(s): Nitin Gupta, Amit Garg
Presenter: Ashutosh Pandey
Company/Organization: Synopsys
2
Agenda
Background
Challenges – System Level
Memory Controller Architecture – An Example
Optimization & Configuration Requirements
Methodology
A Case Study
Conclusion
3
Background
SDRAM controllers are an integral piece of today's System on Chip (SoC)
SDRAM access performance is one of the primary bottlenecks
The memory controller is responsible for optimizing SDRAM accesses:
  Across the system
  Optimizing JEDEC interface utilization
4
Challenges – System Level
Early design-space and architecture exploration
System level optimization for targeted use cases
Interconnect configuration
Memory hierarchy (buffers / caches / on-chip / off-chip memories)
Memory architecture optimization
Meeting bandwidth/latency requirements for each application / master in the system
System level architecture and design for the targeted use cases and applications
SDRAM hardware architecture optimization
5
Memory Controller Architecture – An Example
[Figure: A sample memory controller. Block labels: AXI IF (three instances), AXI Port, Request Multiplexer, Port Arbiter, Command Queue, Scheduler, Memory Access Controller, RdData / WrRsp Demux, Programmable interface, JEDEC IF, SDRAM.]
6
Optimization & Configuration - Requirements
System level visibility (end-to-end latency/throughput)
Memory access correlation with system traffic
Visualization and analysis of memory interface activity
Root cause analysis for various bottlenecks / limitations
SDRAM architecture exploration
7
Methodology
Specify system constraints such as latency, throughput or utilization
Simulate and analyze constraint violations
Analyze system characteristics to identify bottleneck(s)
Investigate to identify the root cause of the problem
Re-configure the system to address bottleneck(s)
Re-run / re-analyze the refined configuration until constraints are satisfied
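The refinement loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Constraint`, `refine`, `toy_simulate` and `toy_reconfigure` are hypothetical names, the 250 ns target is invented for the example, and the stand-in "simulator" simply replays the three CORE0 latencies reported later in the case study instead of running a SystemC model.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    metric: str    # name of a measured quantity, e.g. "core0_latency_ns"
    limit: float   # maximum allowed value for that quantity


def violated(results, constraints):
    """Return the constraints that the simulation results do not meet."""
    return [c for c in constraints if results[c.metric] > c.limit]


def refine(config, simulate, reconfigure, constraints, max_runs=10):
    """Simulate, check constraints, reconfigure, and repeat until all pass."""
    for _ in range(max_runs):
        results = simulate(config)
        failed = violated(results, constraints)
        if not failed:
            return config, results
        config = reconfigure(config, failed)
    raise RuntimeError("constraints not satisfied within the run budget")


# Stand-in for a real simulation: replays the CORE0 latencies from the
# case-study slides, keyed by (arbitration scheme, page size in KB).
REPORTED_LATENCY_NS = {
    ("round_robin", 1): 428.0,
    ("priority", 1): 310.0,
    ("priority", 2): 222.0,
}


def toy_simulate(config):
    key = (config["arbitration"], config["page_kb"])
    return {"core0_latency_ns": REPORTED_LATENCY_NS[key]}


def toy_reconfigure(config, failed):
    # Hypothetical refinement order: fix arbitration first, then page size.
    if config["arbitration"] == "round_robin":
        return {**config, "arbitration": "priority"}
    return {**config, "page_kb": 2}


final_config, final_results = refine(
    {"arbitration": "round_robin", "page_kb": 1},
    toy_simulate,
    toy_reconfigure,
    [Constraint("core0_latency_ns", 250.0)],  # assumed target, not from the slides
)
print(final_config, final_results)
```

In a real flow, `simulate` would launch the system-level simulation and `reconfigure` would encode the outcome of the root-cause analysis; the loop structure is the only part taken from the methodology.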
8
Memory Controller Optimization – A case study
[Figure: Case-study system. Block labels: CORE0, CORE1, AXI Bus, MEMORY CONTROLLER, AXI PORT INTERFACE (XPI), ARBITER, SCHEDULER, SDRAM INTERFACE, SDRAM (DDR3).]
Objective: Optimize the memory controller to achieve the desired latencies for CORE0
Optimization of throughput and utilization is also possible
9
System Level – Latency Analysis
[Figure: Average duration for read transactions, shown as cumulative average duration decomposed per component.]
10
System Level – Latency Analysis
[Figure annotations: CORE0 memory access latency; interconnect latencies; delay in different components of the memory controller for transactions from CORE0; delays for memory access for CORE0.]
Analysis result for the round-robin arbitration scheme:
Average delay for CORE0 transactions in Memory Controller Arbiter is 262 ns, but the arbiter alone is causing a delay of 100 ns.
Average SDRAM access delay for CORE0 is 72 ns.
11
Priority Based Arbitration for Memory Controller Arbiter
Result for the priority-based arbitration scheme:
CORE0 memory access latency (average delay for CORE0 transactions) reduces from 428 ns to ~310 ns.
Average SDRAM access delay for CORE0 is ~68 ns, down from 72 ns previously, but it is still 22% of the total latency.
[Figure: Delay in different components of the memory controller for transactions from CORE0.]
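To illustrate why the arbitration scheme matters for one master's latency, here is a toy cycle-based arbiter model in Python. It is a sketch under simplifying assumptions, not the controller from the slides: one grant per cycle, no SDRAM timing, master 0 always highest priority, and a hypothetical workload in which both masters issue bursts at cycle 0. The function name `average_wait` is an invention for this example.

```python
from collections import deque


def average_wait(arrivals, policy):
    """Average wait (in cycles) per master for a one-grant-per-cycle arbiter.

    arrivals[m] is a sorted list of cycles at which master m issues a request.
    policy is "round_robin" or "priority" (lower master index wins).
    """
    if policy not in ("round_robin", "priority"):
        raise ValueError(f"unknown policy: {policy}")
    n = len(arrivals)
    pending = [deque() for _ in range(n)]
    next_idx = [0] * n
    waits = [[] for _ in range(n)]
    remaining = sum(len(a) for a in arrivals)
    last = n - 1  # so that round robin starts with master 0
    cycle = 0
    while remaining:
        # Move requests that have arrived by this cycle into the pending queues.
        for m in range(n):
            while next_idx[m] < len(arrivals[m]) and arrivals[m][next_idx[m]] <= cycle:
                pending[m].append(arrivals[m][next_idx[m]])
                next_idx[m] += 1
        if policy == "priority":
            order = list(range(n))
        else:
            order = [(last + 1 + k) % n for k in range(n)]
        # Grant the first master in the search order that has a pending request.
        for m in order:
            if pending[m]:
                waits[m].append(cycle - pending[m].popleft())
                last = m
                remaining -= 1
                break
        cycle += 1
    return [sum(w) / len(w) if w else 0.0 for w in waits]


# Hypothetical workload: CORE0 issues 4 requests and CORE1 issues 8, all at cycle 0.
workload = [[0, 0, 0, 0], [0] * 8]
rr = average_wait(workload, "round_robin")
pr = average_wait(workload, "priority")
print("round robin:", rr)
print("priority:   ", pr)
```

In this toy workload the average wait of master 0 drops from 3.0 to 1.5 cycles when switching to priority arbitration, while master 1 rises from 6.75 to 7.5 cycles, which mirrors the trade-off in direction only; the absolute numbers have no relation to the measured nanosecond figures.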
12
Memory Channel Utilization Analysis for CORE0
[Figure: Detailed memory channel utilization for the entire system.]
[Figure: COMMAND and DATA PHASE utilization for CORE0 only.]
HIT = 8.4 %
MISS = %
For an optimum architecture, HIT % >> MISS %
13
Initial Inferences
A large percentage of accesses result in page misses, causing:
  Increased access latency
  Inefficient usage of the JEDEC interface
  Higher power consumption due to increased "precharges" and "activates"
Possible reasons for the inefficiency:
  Mapping of application addresses to memory addresses
  Page policy
  Page crossovers
  Rank crossovers
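How address mapping and page policy turn into hits and misses can be shown with a small classifier. The sketch below assumes an open-page policy and a hypothetical bank : row : column address layout; `decode`, `classify`, and the parameter values (1 KB page, 16 rows per bank) are illustrative choices, not the configuration used in the case study.

```python
def decode(addr, page_bytes, rows_per_bank):
    """Split a byte address into (bank, row, column) for a bank:row:column layout."""
    column = addr % page_bytes
    row = (addr // page_bytes) % rows_per_bank
    bank = addr // (page_bytes * rows_per_bank)
    return bank, row, column


def classify(addresses, page_bytes=1024, rows_per_bank=16):
    """Count page hits and misses under an open-page policy.

    A miss is split by cause: the bank had no open row yet, or the bank had a
    different row open (which costs a precharge followed by an activate).
    """
    open_row = {}  # bank -> row currently open in that bank
    counts = {"hit": 0, "miss_bank_closed": 0, "miss_row_conflict": 0}
    for addr in addresses:
        bank, row, _ = decode(addr, page_bytes, rows_per_bank)
        if bank not in open_row:
            counts["miss_bank_closed"] += 1
        elif open_row[bank] == row:
            counts["hit"] += 1
        else:
            counts["miss_row_conflict"] += 1
        open_row[bank] = row
    return counts


# Hypothetical trace: two streams 1 KB apart, accessed alternately in 64-byte steps.
trace = [base + i * 64 for i in range(8) for base in (0, 1024)]
print(classify(trace))
```

With this trace every access after the first lands in the same bank but a different row than the previous one, so the classifier reports one closed-bank miss and fifteen row conflicts.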
14
Memory Channel Utilization Analysis for CORE0
Most MISSES are due to transactions to the same bank but different rows.
[Figure: COMMAND and DATA PHASE utilization for CORE0, with CMD_PHASE divided into command setup by reason for the MISS.]
15
Refined Inferences
Almost all page misses in this system are caused by transactions to the same bank but a different page
Possible solutions:
  Change the traffic (not always possible)
  Use a memory with a bigger page size
16
Increasing Page Size from 1KB to 2KB
Command setup phase reduces drastically, from % to 15%
Increased page size results in the desired performance
Zero page misses: all memory accesses result in a page hit
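The effect of a larger page can be reproduced on a toy trace. The following sketch assumes a single bank with an open-page policy, where the row is simply the address divided by the page size; `page_hit_rate` and the trace are hypothetical and only show the mechanism, not the measured 15% figure.

```python
def page_hit_rate(addresses, page_bytes):
    """Fraction of accesses that hit the currently open row of a single bank."""
    open_row = None  # no row is open before the first access
    hits = 0
    for addr in addresses:
        row = addr // page_bytes
        if row == open_row:
            hits += 1
        open_row = row
    return hits / len(addresses)


# Hypothetical trace: two streams 1 KB apart, accessed alternately in 64-byte steps.
trace = [base + i * 64 for i in range(8) for base in (0, 1024)]

for page_bytes in (1024, 2048):
    print(f"{page_bytes} B page: hit rate {page_hit_rate(trace, page_bytes):.4f}")
```

With a 1 KB page the two streams fall into different rows and every access misses; with a 2 KB page both streams share one row, so only the very first access (which opens the row) counts as a miss in this model.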
17
Effect of increasing Page Size on overall Delay
CORE0 memory access latency reduces from 310 ns to ~222 ns.
SDRAM access delay for CORE0 reduces from 68 ns to 52 ns.
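The reported figures can be cross-checked with a little arithmetic. The snippet below uses only the CORE0 numbers quoted on the slides (several of which are marked approximate there); the stage labels and helper names are added for the example.

```python
# (stage label, total CORE0 latency in ns, SDRAM access delay in ns), from the slides
stages = [
    ("round-robin arbitration, 1KB page", 428, 72),
    ("priority arbitration, 1KB page", 310, 68),
    ("priority arbitration, 2KB page", 222, 52),
]


def sdram_share(total_ns, sdram_ns):
    """SDRAM access delay as a percentage of the total latency."""
    return 100.0 * sdram_ns / total_ns


def reduction(before_ns, after_ns):
    """Percentage reduction in latency from one stage to the next."""
    return 100.0 * (before_ns - after_ns) / before_ns


for name, total, sdram in stages:
    print(f"{name}: total {total} ns, SDRAM share {sdram_share(total, sdram):.1f}%")

for (_, before, _), (name, after, _) in zip(stages, stages[1:]):
    print(f"-> {name}: {reduction(before, after):.1f}% lower than previous stage")

overall = reduction(stages[0][1], stages[-1][1])
print(f"overall reduction: {overall:.1f}%")
```

The 68 ns out of 310 ns case works out to about 22%, matching the share quoted for priority arbitration, and the two refinements together cut the CORE0 latency by roughly 48%.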
18
Conclusions
System level performance analysis allows detection of performance problems
Detailed data path visibility allows identification of hot-spots, e.g. arbitration scheme and hit/miss ratio of SDRAM
Analyzing the hot-spots allows identification of root causes
Systematic refinement allows creation of an optimum architecture for targeted use cases
19
Q&A