University Workshop: Introduction to Internal & External FPGA Memory


1 University Workshop: Introduction to Internal & External FPGA Memory
January 2019

2 Objectives
Understand the basics of RAM memory: SRAM vs. DRAM vs. SDRAM; SDRAM evolution
Understand the basics of FPGA on-chip RAM memory: MLAB, M9K, M20K, and eSRAM
Understand the basics of SDRAM memory: how the memory is organized and how it operates
Understand memory interface components: what are the PHY, controller, and front-end? What IPs are offered?

3 Memories and Storage Major Components in Computer

4 Computer System Memory Hierarchy
Cost $10 / MByte $10 / GByte $100 / TByte

5 Static Random-Access Memory (SRAM) vs. Dynamic Random-Access Memory (DRAM)
DRAM:
Built from 1 transistor (1T) plus a capacitor
Less expensive and higher in density
Bits stored as charge on node capacitance
Bit cells lose charge over time and when read, so they must be periodically refreshed to retain data
Typically used as mass main/system memory
SRAM:
Built from 6 transistors (6T) or 8T
More expensive and lower in density
Bits stored by a cross-coupled inverter pair; bit lines driven by transistors
Faster response
Typically used as local memory (cache); stores lookup tables for applications due to faster access times

6 FPGA On-Chip Memory Basics

7 Stratix 10 FPGA Memory Hierarchy Building Blocks
On-Chip:
CRAM / FF / MLUT – distributed storage; fast local storage; local CC/MC FIFOs (variable width, depth); fixed program/data storage
M20K – variable-sized buffers; wide/deep FIFOs, video line buffers
eSRAM – specialized storage; fast-path/low-latency control; memory management
In-Package:
HBM – medium-capacity, high-bandwidth storage; 200G–2Tbit wireline packet buffering; processor code and data storage; video frame storage
On-Board:
DDRx – high-capacity storage; 1G–200G wireline packet buffering; processor code and data storage; video frame storage
QDR/RLDRAM – fast-path/low-latency storage; memory management; statistics

8 Multi-ported SRAM Memory
Single-port, dual-port, n-port: the number of ports specifies the number of address ports. Associated with the number of ports is the number of read and write data ports (<address ports><read ports><write ports>).

General Term | ASIC Terminology | Intel FPGA Terminology | Comments
Single Port  | 1RW              | —                      | —
Dual Port    | 1R1W             | Simple Dual Port       | Used for FIFOs
Dual Port    | 2RW              | True Dual Port         | Shared memory
Triple Port  | 2R1W             | Not available          | Network-type applications
ROM          | 1R               | —                      | Read only

9 Intel FPGA RAM Structures: Native Block Sizes
MAX 10 (smallest, low-cost parts): M9K (9 x 1024 total bits)
Stratix V, Arria 10, Stratix 10 (highest level of integration): M20K (20 x 1024 bits)
MLAB: memories built from lookup tables
The Quartus fitter groups multiple blocks to create larger memories; the FPGA fabric wrapper can make deeper and wider memories by grouping blocks.
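As a rough illustration of how the fitter's block grouping scales, the sketch below estimates an M20K count for a requested memory. The 512 x 40 block configuration and the simple depth/width slicing are assumptions for illustration; the real fitter chooses among several aspect ratios.

```python
import math

# Rough M20K count for a depth x width memory, assuming each block is
# used in a 512-deep x 40-wide configuration (an assumption; the fitter
# may pick other aspect ratios and pack more cleverly).
def m20k_estimate(depth, width, block_depth=512, block_width=40):
    return math.ceil(depth / block_depth) * math.ceil(width / block_width)

print(m20k_estimate(2048, 64))  # 4 deep x 2 wide = 8 blocks
```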

10 Example: RAM: 2-Port IP

11 Example: Byte Enable Functional Waveform
Data is written with an active-high byte enable, then read back from memory.

12 Quartus Chip Planner – RAM blocks

13 SDRAM Memory Basics (DDR3 as an example)

14 SDRAM vs. DDR SDRAM
SDRAM = Synchronous Dynamic Random-Access Memory
Synchronized with the system bus, which can run at much higher clock speeds
Pipelines instructions for better efficiency
DDR SDRAM = Double Data Rate SDRAM
Data is captured on both rising and falling clock edges
(Waveform: clock and data traces comparing single data rate SDRAM with double data rate SDRAM)

15 SDRAM Evolution

Type  | Name                     | I/O Standard (Volts) | Benefits
SDRAM | Synchronous Dynamic RAM  | LVTTL (3.3V)         | Synchronized to the system clock
DDR1  | Double data rate 1 SDRAM | SSTL_2 (2.5V)        | Greater bandwidth (transfers data on both rising and falling clock edges)
DDR2  | Double data rate 2 SDRAM | SSTL_18 (1.8V)       | 2x faster than DDR; improved I/O bus signaling
DDR3  | Double data rate 3 SDRAM | SSTL_15 (1.5V)       | 40% less power than DDR2
DDR4  | Double data rate 4 SDRAM | SSTL_12 / POD (1.2V) | Better efficiency from four new bank groups; each bank group can operate independently, processing 4 data streams within a clock cycle

For example, PC266 is equivalent to PC2100 (64 bits x 2 x 133 MHz = 2.1 GB/s, or 2100 MB/s).
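The PC266/PC2100 arithmetic above can be checked with a short script; the function and its name are ours, only the formula comes from the slide:

```python
# Peak bandwidth of a DDR module in MB/s:
# (bus width in bytes) x (transfers per clock) x (clock in MHz).
def peak_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock=2):
    return bus_bits // 8 * transfers_per_clock * clock_mhz

# PC266 example from the slide: 64-bit bus, double data rate, 133 MHz.
print(peak_bandwidth_mb_s(64, 133))  # 2128 -> marketed as PC2100
```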

16 External Memory Terminology
Memory                      | Description                            | Use                                    | Vendors
DDR3, DDR4                  | Double Data Rate DRAM                  | Main system memory                     | Samsung, Micron, SK hynix
Hybrid Memory Cube (HMC)    | Serial DRAM                            |                                        | Micron
High Bandwidth Memory (HBM) | In-package (2.5D) DRAM                 |                                        | Samsung, SK hynix
QDR II, QDR IV              | Quad Data Rate SRAM                    | Networking control plane memory        | Cypress, GSI, ISSI
RLDRAM3                     | Reduced Latency DRAM                   | Networking control plane table lookups | Micron, Renesas
Non-volatile Flash (NAND)   | Higher capacity, sequential access     | Storage                                | Samsung, Micron, SK hynix, Toshiba, etc.
Non-volatile Flash (NOR)    | Faster, random access                  | FPGA configuration                     | Cypress, Samsung, Micron, etc.
Non-volatile 3D XPoint      | Emerging storage-class memory          | Storage                                | Intel, Micron

Note: this section focuses on DDR3 as an example; the other protocols are not discussed.

17 External Memory Interface (EMIF) Subsystem
(Diagram: an FPGA, CPU, or SoC accessing external memory through a memory-mapped address space, e.g. 0x000fffff–0x7fffffff)

18 DRAM Modules – Overview
DRAM chips have narrow data widths; typical widths are x4, x8, and x16.
DRAM modules are a collection of DRAM chips cascaded to form wider data widths, typically referred to as a Dual In-line Memory Module (DIMM).
Chips on a module share the command, control, and address lines, but not the data strobes and data lines.
Modules have notches at different positions along the fingers to differentiate DRAM types.
Each module contains a Serial Presence Detect (SPD) EEPROM, which stores information about the module type so the memory controller can configure the memory correctly.
Example: 8 DRAM chips of x8 form a 64-bit DIMM.
Pros: provides high-capacity DRAM with a wide data width.
Cons: all accesses must use the full data width (i.e. loss of lower-granularity accesses).
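The 8 x8-chip example scales mechanically: width and capacity both grow with the chip count. A minimal sketch (function name and the 1 Gbit chip density in the example are ours):

```python
# A DIMM's data width is the chip width times the chip count, since each
# chip contributes its own data lines; capacity scales the same way
# because the chips share command/address and are accessed in lockstep.
def dimm_geometry(chip_width_bits, chip_capacity_mbit, num_chips):
    width = chip_width_bits * num_chips
    capacity_mbyte = chip_capacity_mbit * num_chips // 8
    return width, capacity_mbyte

# Slide example: 8 chips of x8; assume 1 Gbit (1024 Mbit) per chip.
print(dimm_geometry(8, 1024, 8))  # (64, 1024): a 64-bit, 1 GByte DIMM
```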

19 DDR3 Memory Organization
Each column stores one data word
Each read/write transfer consists of 8 adjacent words
Each row consists of multiple columns; the active row is called a page
Each bank consists of multiple rows; each component consists of multiple banks
(Diagram: a component containing banks 0..z, each bank containing rows 0..y, each row containing columns 0..x)

While the I/O timing parameters for SDRAM memories are similar to those of other memories like SRAM, the memory organization is very different. Unlike SRAMs, which use 6 transistors to store a single bit of information, SDRAM devices like DDR3 use a single transistor plus a capacitor to store the same bit. The benefit of this implementation is density and cost, and these factors lead to increased interest in DRAMs in our customer base. These density/cost benefits are achieved at the expense of memory controller complexity. In DDR3, each burst contains 8 beats of data; in other words, typical read/write transactions to DDR3 have a burst length of 8.

20 DDR3 Memory Operation
To write/read a specific row and column address in a bank:
Issue an activate command to "open" the desired row address
Issue a write/read command to the desired column address
Issue a precharge command to "close" the opened row (precharging the sense amps to be ready for the next row)
Activate and precharge are also referred to as row commands; write and read are also referred to as column commands
Each bank can be accessed independently
To preserve memory contents, issue refresh commands every 7.8 µs on average

A DDR SDRAM device can have a number of banks open at once, each with a currently selected row. Changing the column within the selected row of an open bank requires no additional bank-management commands. Changing the row in an active bank, or changing the bank, incurs a protocol penalty: the precharge (PCH) command closes the active row or bank, and the activate (ACT) command then opens the new row or bank combination. The duration of this penalty is a function of the controller clock frequency, the memory clock frequency, and the memory device characteristics. Calculating the impact of a change of memory and controller configuration on a given system is not a trivial task, as it depends on the nature of the accesses that are performed.
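The open-row bookkeeping described above can be sketched as a toy model. This is an illustration of the command-sequencing rule only, not a real controller; the function, dictionary-based state, and command strings are our own simplifications.

```python
# Toy model of per-bank row management: an access to an already-open row
# needs only the column command; a different row costs a precharge (to
# close the old row) plus an activate (to open the new one) first.
def commands_for_access(open_rows, bank, row):
    cmds = []
    if open_rows.get(bank) != row:
        if bank in open_rows:
            cmds.append("PRE")   # close the currently open row
        cmds.append("ACT")       # open the requested row
        open_rows[bank] = row
    cmds.append("RD/WR")         # column command
    return cmds

state = {}
print(commands_for_access(state, 0, 5))  # ['ACT', 'RD/WR']        row miss, bank idle
print(commands_for_access(state, 0, 5))  # ['RD/WR']               row hit
print(commands_for_access(state, 0, 9))  # ['PRE', 'ACT', 'RD/WR'] row change
```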

21 Example: Single Read from DDR3
Read operation sequence:
Activate the row (page) containing the data
Issue the read command after tRCD (ACTIVATE to internal READ or WRITE delay time)
Data is available tCL clock cycles later (internal READ to first bit of output data delay time)
A single read requires 18 clock cycles. Consider a 533 MHz memory device with CAS latency (tCL) = 7 cycles and tRCD = 7 cycles; 4 memory clock cycles complete the burst-length-8 transfer.
(Waveform: ACT with the row address on the command bus, RD with the column/bank address tRCD = 7 cycles later, and data beats D0–D7 appearing tCL = 7 cycles after the read)
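The 18-cycle figure is just the sum of the delays above; a sketch (function name is ours):

```python
# Cycle count for the single-read example: tRCD before the read command
# can issue, tCL before the first data beat, then burst_len/2 clocks of
# data since DDR transfers two beats per clock.
def single_read_cycles(t_rcd=7, t_cl=7, burst_len=8):
    return t_rcd + t_cl + burst_len // 2

print(single_read_cycles())  # 18
```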

22 Example: Single Read vs. Back-to-Back Reads from DDR3
Single read: 18 clock cycles. Consider a 533 MHz memory device with CAS latency (tCL) = 7 cycles and tRCD = 7 cycles; 4 memory clock cycles complete the burst-length-8 transfer.
Back-to-back reads: 22 clock cycles. There is no additional delay between back-to-back read commands to the same row (tCCD = 4 cycles, read to read); only 4 more clock cycles are needed to complete the second burst-length-8 transfer.
(Waveform: ACT followed by two RD commands spaced tCCD = 4 cycles apart, producing two seamless D0–D7 bursts after the tRCD and tCL delays)

23 Efficiency
Efficiency measures data bus utilization. From the previous examples (single read and back-to-back reads):
Efficiency of a single read = 4 cycles of data / 18 cycles = 22%
Efficiency of two reads to the same page = 8 / 22 = 36%
Efficiency of reading a full page = (128 x 4) / ((128 x 4) + (18 − 4)) = 512 / 526 = 97%
Note: 128 columns per page (row)
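The three figures can be reproduced directly, which makes the trend explicit: the fixed activate/CAS overhead is amortized over more data cycles as accesses stay within one page.

```python
# Efficiency = cycles spent transferring data / total cycles on the bus.
def efficiency(data_cycles, total_cycles):
    return data_cycles / total_cycles

print(round(efficiency(4, 18) * 100))    # 22  (single read)
print(round(efficiency(8, 22) * 100))    # 36  (two reads, same page)
print(round(efficiency(512, 526) * 100)) # 97  (full 128-column page)
```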

24 External Memory Interface IP Solution

25 Memory Interface Layers
(Block diagram: user commands enter the controller over Avalon-ST/MM or AXI through an input adaptor into a command pool (TBP); command-ordering logic with per-target command queues and a scheduler/arbitrator feed the DDR command generator (burst adaptor). The controller connects through the AFI interface to the PHY — clocking, address/command path, calibration sequencer, DDIO data path, FIFOs, and I/O buffers — and out across the PCB to the memory chips/DIMMs.)

26 Memory Interface Layers (cont.)
Controller:
Interfaces between the PHY and user logic using Avalon-MM
Handles DRAM bank management and command sequencing
PHY:
Physical interface between the FPGA and the memory device
Handles I/O timing requirements imposed by the memory device
Implemented in the FPGA periphery using dedicated circuits: IOE registers, DQS clock trees, DLL, PLL, OCT, delay chains, etc.
PCB / Memory:
Memory pins on DRAM chips / DIMMs

27 DDR3 IP Read – Interface Layer Signals
(Diagram: read-path signals at each layer boundary — Avalon-MM between user logic and the controller, AFI between the controller and the PHY, and the external interface from the FPGA to the memory device)

28 DDR3 Memory Controller IP
Fully parameterizable IP:
Specify memory and board parameters
Parameterize PHY and controller settings
Generate HDL design files
Comprehensive IP solution:
Clear-text RTL
SDC timing constraints
I/O logic assignments
Example design with traffic generator
Simulation testbench and scripts

29 Generated Synthesis and Simulation Example Designs

30 Intel EMIF Support Center
EMIF Support Center website Documentation (User Guides) Training Tools

31 Lab Preview

32 Addressing from Programmer View to SDRAM
Programmer's view: a 32-bit integer address (bits 31–0) decodes the SDRAM space and is mapped through the bus protocol.
SDRAM controller's view: the address is split into bank bits (e.g. B1:B0), a row address, and a column address; the controller issues the commands (activate, etc.), the row address, then column address 1, column address 2, and so on.
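The split above can be sketched as plain bit-slicing. The field widths here (10 column bits, 14 row bits, 3 bank bits) are hypothetical; the actual mapping is fixed by the controller configuration and device geometry.

```python
# Hypothetical split of a flat byte address into bank/row/column fields.
COL_BITS, ROW_BITS, BANK_BITS = 10, 14, 3

def decode(addr):
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

# Build an address from known fields, then recover them.
addr = (1 << (COL_BITS + ROW_BITS)) | (42 << COL_BITS) | 7
print(decode(addr))  # (1, 42, 7)
```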

33 Comparing Efficiency of On-Chip FPGA RAM to SDRAM
Differences in cost and efficiency drive use of memory hierarchy

34 Thank you

