Memory Interfaces for the PPC-H Card (Still a work in progress…)


John DeHart
Applied Research Laboratory
Computer Science and Engineering Department
http://www.arl.wustl.edu/arl/projects/techX

Objectives
- Learn how DDR SDRAM and QDR SRAM really work.
- Understand the memory interfaces on the PPH Chip so we can get the board-level design right.
- Understand the reference designs available from Xilinx.
- Understand what will need to change in the RAD circuit to get it to run on the PPH Chip for the port of the IPv4 router:
  - SDRAM changes
  - SRAM changes
- Confirm that the PPH Chip design is supportable:
  - Bank differences: top/bottom vs. left/right
  - SSO (Simultaneous Switching Output) restrictions per bank

ATCA – Common Substrate Base Card
[Block diagram. Labeled components:
- PP/H (v2Pro P100) with 2x DDR400 DRAM (64Mx64 each, 2x200 MHz), 3x QDRII SRAM (512Kx36 each, 2x200 MHz), TCAM (250Kx72, 100-125 MHz), PAROLI/RX and PAROLI/TX
- Substrate (v2Pro P100) with 1x DDR DRAM, Sys Flash, Xilinx ACE, Ethernet 100/BT, Power
- PPH-Serial: 16x625MHz (2*18 diff pairs: 16 data, 1 clock, 1 frame/flow_control)
- GPIO/SelectMap (15 single-ended signals)]

Hardware Packet Processor: DDR SDRAM
[Block diagram: PP/H (v2Pro P100) with 2x DDR400 SDRAM (64Mx64 each, 2x200 MHz), 3x QDRII SRAM (512Kx36 each, 2x200 MHz), and TCAM (250Kx72, 100-125 MHz).]

DDR SDRAM
MICRON MT16VDDF6464H
- Datasheet location: /project/techX/DataSheets/Micron/DDR_1GB3200_09_2004.pdf
- 512MB: 64Meg x 64 bits
- 4 banks
- 200 MHz differential clock
- DDR: Double Data Rate
  - Access on rising and falling edge of clock
- 200-pin SODIMM
- 16 devices per module
- Read or CAS Latency (CL)
  - Depending on clock frequency, can be set to 2, 2.5, or 3 clock cycles
    - 75 MHz <= f <= 133 MHz: CL=2
    - 75 MHz <= f <= 167 MHz: CL=2.5
    - 133 MHz <= f <= 200 MHz: CL=3
  - If we operate at 200 MHz, CL=3
  - Other vendors, or other options from Micron, might have different CL.
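A quick cross-check of the module figures above, as a sketch only: it recomputes the capacity and the theoretical peak transfer rate, and picks the lowest CAS latency allowed at a given clock from the ranges listed on the slide. The function and variable names are illustrative and do not come from the datasheet.

```python
# Cross-check of the module figures quoted on the slide.
# All names are illustrative; the numbers come from the slide.

WORDS = 64 * 2**20        # 64 Meg addressable words
WIDTH_BITS = 64           # module data width
CLOCK_HZ = 200e6          # differential clock frequency

capacity_bytes = WORDS * WIDTH_BITS // 8
# DDR moves one word on each clock edge, so two words per cycle.
peak_bytes_per_s = CLOCK_HZ * 2 * WIDTH_BITS / 8

# (min MHz, max MHz, CAS latency) ranges as listed on the slide.
CL_RANGES = [(75, 133, 2.0), (75, 167, 2.5), (133, 200, 3.0)]


def lowest_cas_latency(freq_mhz):
    """Return the smallest CL whose frequency range covers freq_mhz."""
    options = [cl for lo, hi, cl in CL_RANGES if lo <= freq_mhz <= hi]
    if not options:
        raise ValueError(f"{freq_mhz} MHz is outside the listed ranges")
    return min(options)


print(capacity_bytes // 2**20, "MiB")        # 512 MiB
print(peak_bytes_per_s / 1e9, "GB/s peak")   # 3.2 GB/s peak
print(lowest_cas_latency(200))               # 3.0
```

The peak figure ignores refresh, precharge, and bus turnaround, so sustained throughput will be lower.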

DDR SDRAM (continued)
MICRON MT16VDDF6464H
- Burst lengths of 2, 4, or 8
  - Types: sequential or interleaved
  - Data retrieved is the same, just the order of words is different.
- Bidirectional Data Strobe (DQS)
  - Intermittent (not free running), i.e. only present when data is present
  - Edge-aligned with data for reads
  - Center-aligned with data for writes
- Auto Refresh and Self Refresh modes available
  - Auto Refresh:
    - The normal mode of operating, like CAS#-Before-RAS# refresh
    - Addressing generated by internal refresh controller
    - Average refresh interval: 7.8125us (maximum)
    - What exactly is an “average maximum”?
  - Self Refresh:
    - Used to retain data in SDRAM even if the rest of the system is powered down.
    - No external clocking required while in Self Refresh
    - Not what we want to use…
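To illustrate "same data, different order", here is a small sketch of the two burst types. It assumes the conventional SDRAM definitions apply to this part: sequential bursts wrap within the aligned block, and interleaved bursts XOR the starting offset with the beat count. `burst_order` is a made-up helper name.

```python
def burst_order(start, length, interleaved=False):
    """Column offsets visited by one burst, relative to the aligned block.

    Assumes the conventional SDRAM burst definitions (not checked
    against this specific datasheet).
    """
    if length not in (2, 4, 8):
        raise ValueError("burst length must be 2, 4, or 8")
    start %= length
    if interleaved:
        return [start ^ i for i in range(length)]
    return [(start + i) % length for i in range(length)]


print(burst_order(1, 4))                    # [1, 2, 3, 0]
print(burst_order(1, 4, interleaved=True))  # [1, 0, 3, 2]
```

Both orders touch the same four columns; only the sequence differs.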

DDR SDRAM (continued)
MICRON MT16VDDF6464H
- Auto precharge option
  - Performs a PRECHARGE command automatically upon completion of a READ or WRITE burst
  - Enabled or disabled with each individual READ or WRITE burst
    - Via A10 pin
- A precharge (either auto or explicit PRECHARGE) must be issued before opening a different row in the same bank.
- ACTIVE command activates a specific row in a bank
  - That row remains active for accesses until a PRECHARGE is issued for that bank.
  - Then a different row in that bank may be activated and accessed.
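The open-row rule can be sketched as a toy model of a single bank. This is not cycle accurate and ignores all timing parameters; the class and method names are invented for illustration, and `auto_precharge=True` stands in for driving A10 high with the READ or WRITE command.

```python
class Bank:
    """Toy model of one SDRAM bank's open-row rule (no timing)."""

    def __init__(self):
        self.open_row = None          # None means the bank is precharged

    def active(self, row):
        """ACTIVE command: open a row, only legal on a precharged bank."""
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE required before opening another row")
        self.open_row = row

    def access(self, auto_precharge=False):
        """READ or WRITE burst; auto_precharge models A10 high."""
        if self.open_row is None:
            raise RuntimeError("no active row in this bank")
        row = self.open_row
        if auto_precharge:
            self.open_row = None      # row closes when the burst completes
        return row

    def precharge(self):
        """Explicit PRECHARGE command."""
        self.open_row = None


bank = Bank()
bank.active(5)
bank.access()                      # row 5 stays open for further accesses
bank.access(auto_precharge=True)   # burst ends with an automatic precharge
bank.active(9)                     # allowed: the bank was precharged
```

Calling `active()` twice without a precharge in between raises an error, which is the constraint the slide describes.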

DDR SDRAM (continued)

DDR SDRAM (continued)

DDR SDRAM SODIMM Module

DDR SDRAM Interface Signals
(between FPGA and SDRAM)

  Signal                Function
  DQ[0:63]              Data
  DQS[0:7]              Data Strobes
  DM[0:7]               Data Write Mask
  A[0:12]               Address
  BA[0:1]               Bank Address
  S[0:1]                Chip Selects
  SA[0:2]               Presence-Detect Addr
  SDA                   Serial Presence Detect data
  SCL                   Serial Clock for Presence Detect
  CKE[0:1]              Clock Enable
  CK0/CK0#, CK1/CK1#    Clocks
  WE, CAS, RAS          Command Inputs

Total of 111 signals
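The total can be checked by summing the width of each group. The dictionary below is only a tally of the signal names listed on the slide.

```python
# Width of each signal group listed on the slide.
SIGNALS = {
    "DQ": 64, "DQS": 8, "DM": 8, "A": 13, "BA": 2, "S": 2,
    "SA": 3, "SDA": 1, "SCL": 1, "CKE": 2,
    "CK0/CK0#, CK1/CK1#": 4, "WE, CAS, RAS": 3,
}

total = sum(SIGNALS.values())
print(total)  # 111
```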

DDR SDRAM: Questions
- Why two clocks?
  - CK0/CK0# and CK1/CK1#
  - Do we just tie them to the same clock?
- 8 data strobes?
  - We certainly have to provide 8 for writes.
  - Do we use all 8 for reads?
  - We could probably get away with using one, but there may not be any point to it.

SDRAM In FPX/NSP Implementation
- FPX/NSP PSM implementation
  - It looks like our PSM implementation in the NSP/FPX does not use auto precharge (it is available in the SDRAM on the FPX).
  - It does an explicit PRECHARGE command after each READ and WRITE.
- Refresh:
  - Every 200 clock ticks it sets up to do a refresh
  - After doing the refresh it waits 6 ticks to go back to operational state
  - Clock rate: 62.5 MHz, period 16 ns
    - 200 clock ticks = 3.2 us
    - 6 clock ticks = 96 ns
- Data width
- Address width
- Fully synchronous SDR vs. source-synchronous DDR
- Single-ended clock signal vs. differential clock signals
- Initialization?
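The tick arithmetic above, plus the equivalent tick budget for the 7.8125 us average refresh interval quoted for the new DDR module, can be reproduced as follows. This is a sketch: the helper names are invented, and whether the FPX's 200-tick counter should carry over to the new part is not something the slides settle.

```python
def ticks_to_ns(ticks, clock_mhz):
    """Duration of a number of clock ticks, in nanoseconds."""
    return ticks * 1000.0 / clock_mhz


def ticks_per_refresh(interval_us, clock_mhz):
    """Whole clock ticks that fit inside one refresh interval."""
    return int(interval_us * clock_mhz)


# FPX/NSP figures from the slide (62.5 MHz clock).
print(ticks_to_ns(200, 62.5))   # 3200.0 ns = 3.2 us
print(ticks_to_ns(6, 62.5))     # 96.0 ns

# Budget for the DDR module's 7.8125 us average interval.
print(ticks_per_refresh(7.8125, 200))    # 1562
print(ticks_per_refresh(7.8125, 62.5))   # 488
```

Refreshing every 3.2 us is well inside a 7.8125 us average requirement, so the existing interval looks conservative under that assumption.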

SDRAM State Machine in FPX/PSM

    case StateMc0.PresentState is
      when e_Start =>                       -- idle until a request arrives
        if StateMc0.Run = '1' then
          StateMc0.NextState <= e_Act;
        end if;
      when e_Act =>                         -- ACTIVE: open the row
        StateMc0.NextState <= e_Nop0;
      when e_Nop0 =>                        -- choose READ or WRITE
        if StateMc0.OpType = '0' then
          StateMc0.NextState <= e_Rd;
        else
          StateMc0.NextState <= e_Wr;
        end if;
      when e_Rd =>
        StateMc0.NextState <= e_Nop1;
      when e_Wr =>
        StateMc0.NextState <= e_Nop1;
      when e_Nop1 =>
        if StateMc0.BurstLength /= "00000000" then
          StateMc0.NextState <= e_Nop2;
        end if;
      when e_Nop2 =>
        StateMc0.NextState <= e_Nop3;
      when e_Nop3 =>
        StateMc0.NextState <= e_Nop4;
      when e_Nop4 =>
        StateMc0.NextState <= e_Pch;
      when e_Pch =>                         -- explicit PRECHARGE after every access
        StateMc0.NextState <= e_Start;
      when others =>
        StateMc0.NextState <= e_Start;
    end case;
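To see how long one access occupies the state machine, the transitions can be mirrored in a few lines of Python. This is a sketch under two assumptions that the slide does not confirm: each state lasts exactly one 16 ns clock tick, and the machine holds in `Nop1` when the burst length is zero (the VHDL shows no else branch there). State names drop the `e_` prefix.

```python
def next_state(state, run=True, op_type=0, burst_nonzero=True):
    """Next state of the FPX/PSM SDRAM machine, mirroring the case statement."""
    if state == "Start":
        return "Act" if run else "Start"
    if state == "Nop0":
        return "Rd" if op_type == 0 else "Wr"
    if state == "Nop1":
        # Assumption: with no else branch shown, the machine stays in Nop1.
        return "Nop2" if burst_nonzero else "Nop1"
    fixed = {"Act": "Nop0", "Rd": "Nop1", "Wr": "Nop1", "Nop2": "Nop3",
             "Nop3": "Nop4", "Nop4": "Pch", "Pch": "Start"}
    return fixed.get(state, "Start")   # 'when others' falls back to Start


def access_trace(op_type):
    """States visited by one access; op_type 0 is a read, 1 is a write."""
    state, trace = "Start", ["Start"]
    while True:
        state = next_state(state, op_type=op_type)
        if state == "Start":
            return trace
        trace.append(state)


read = access_trace(0)
print(read)
print(len(read) * 16, "ns per access at 62.5 MHz")
```

Under those assumptions a single read or write passes through nine states, the last of which is the explicit precharge.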

Hardware Packet Processor: QDRII SRAM
[Block diagram: PP/H (v2Pro P100) with 2x DDR400 SDRAM (64Mx64 each, 2x200 MHz), 3x QDRII SRAM (512Kx36 each, 2x200 MHz), and TCAM (250Kx72, 100-125 MHz).]

QDRII SRAM
SAMSUNG K7R163682B
- Datasheet location: /project/techX/DataSheets/Samsung/QDR_k7r16xx82b_rev30.pdf
- 2MB: 512K x 36 bits
- 200 MHz differential clock
- Clock cycle time:
  - -25: min 4.0 ns, max 6.3 ns (250 MHz to 159 MHz)
  - -20: min 5.0 ns, max 7.88 ns (200 MHz to 127 MHz)
  - -16: min 6.0 ns, max 8.4 ns (167 MHz to 119 MHz)
- QDR II
  - DDR with separate Read and Write data bus
  - Common Address bus for Read and Write
- 165-pin FBGA package
- Fixed two-word bursts
- Support for byte write operations
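The frequency ranges in parentheses follow directly from the cycle-time limits. The sketch below converts them and lists which speed grades cover a target clock; the table and function names are illustrative.

```python
# (min cycle ns, max cycle ns) per speed grade, from the slide.
SPEED_GRADES_NS = {"-25": (4.0, 6.3), "-20": (5.0, 7.88), "-16": (6.0, 8.4)}


def mhz(cycle_ns):
    """Clock frequency in MHz for a given cycle time in ns."""
    return 1000.0 / cycle_ns


def grades_supporting(freq_mhz):
    """Speed grades whose cycle-time range includes the target clock."""
    period = 1000.0 / freq_mhz
    return [g for g, (tmin, tmax) in SPEED_GRADES_NS.items()
            if tmin <= period <= tmax]


for grade, (tmin, tmax) in SPEED_GRADES_NS.items():
    print(grade, round(mhz(tmin)), "MHz max,", round(mhz(tmax)), "MHz min")

print(grades_supporting(200))  # ['-25', '-20']
```

At 200 MHz the -20 grade sits exactly on its minimum cycle time, so it has no margin in clock period.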

QDRII SRAM
SAMSUNG K7R163682B
- Clocks:
  - K/K#: Input Clock
    - Address, data inputs, and all control signals are synchronized to K or K#
  - C/C#: Input Clock for Output Data
    - Normally: data outputs synchronized to output clocks (C and C#)
    - If C and C# are tied high, data outputs are synchronized to K and K#
    - We will do this. Any reason not to? All reference designs we have seen do it.
    - Tie them high on board or in FPGA?
  - CQ/CQ#: Output Echo Clock
    - Read data are referenced to echo clock outputs (CQ or CQ#).
    - CQ/CQ# High to Data Valid (tCQHQV) (ns): 0.30 (-25), 0.35 (-20), 0.40 (-16)
    - CQ/CQ# High to Output Hold (tCQHQX) (ns): -0.30 (-25), -0.35 (-20), -0.40 (-16)
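The two echo-clock parameters bound how long each read word is guaranteed stable. A rough estimate, assuming a word becomes valid at most tCQHQV after one echo-clock edge and is held until tCQHQX relative to the next edge half a period later, and ignoring duty-cycle error, board skew, and FPGA setup/hold:

```python
# Echo-clock timing per speed grade, in ns, from the slide.
T_CQHQV = {"-25": 0.30, "-20": 0.35, "-16": 0.40}
T_CQHQX = {"-25": -0.30, "-20": -0.35, "-16": -0.40}


def read_valid_window_ns(grade, clock_mhz):
    """Estimated data-valid window at the SRAM pins for one read word."""
    half_period = 500.0 / clock_mhz
    return half_period - T_CQHQV[grade] + T_CQHQX[grade]


print(round(read_valid_window_ns("-20", 200), 2))  # 1.8
```

So at 200 MHz roughly 1.8 ns of each 2.5 ns half cycle is usable before any board or FPGA effects are subtracted.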

QDR SRAM

QDR SRAM
(between FPGA and QDR SRAM)

  Signal        Function
  D0[0:35]      Data Input (Write)
  Q0[0:35]      Data Output (Read)
  SA[0:20]      Address (with expansion addr bits)
  BW[0:3]       Byte Write Control
  R#, W#        Read and Write Control
  ZQ            Output Driver Impedance Control (board connect but not to FPGA?)
  VREF[0:1]     Input Reference Voltage
  Doff#         DLL Disable
  JTAG[0:3]     JTAG Test Signals
  K, K#         Input Clock
  C, C#         Input Clock for Output Data
  CQ, CQ#       Output Echo Clock

Total of 105(+8) signals

QDRII SRAM Operations
- Reads
  - Initiated by activating R# at rising edge of positive input clock K.
  - First pipelined data is transferred out of the device, triggered by the C# clock following the next K# clock rising edge.
  - Next burst data is triggered by the following C clock rising edge.
- Writes
  - Initiated by activating W# at rising edge of positive input clock K.
  - Address is presented with the following K# clock rising edge.
  - First data word is registered with the same rising edge of the K clock as W#.
  - Second data word is registered with the following K# clock rising edge.

QDR SRAM

QDR SRAM

QDR SRAM: Questions
- Three sets of clocks: K/K#, C/C#, CQ/CQ#
  - Which do we need/want to use?
- Reference design: xapp750/xapp770c.

SRAM In FPX/NSP Implementation
- 1MB (8Mb), 256Kx36b
- Operates up to 166 MHz
  - We use it at 62.5 MHz
- Two uses of SRAM, each with a separate interface
  - CARL
  - QM
- Burst operations available with SRAM
  - Do we use them?
  - Controlled by ADV/LD# signal
- Write operations:
  - Assert W# and give Address on clock cycle 1
  - Two clock cycles later give data
- Read operations:
  - Assert R and give Address on clock cycle 1
  - Two clock cycles later, data appears.

SRAM Differences for Port
- SDR vs. QDR
- Asynchronous interface
  - Core clock: 125 MHz
  - Memory clock: 200 MHz DDR
- Burst operations
  - Do we use them in current implementation?
- Write operations:
  - Timing of data and address presentation are different
- Read operations:
  - Use of Data Strobes instead of synchronous clock
- Each new SRAM is twice the size of old, same width
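One consequence of the asynchronous interface is that the 125 MHz core side has to be wider than the 36-bit memory port if it is ever to keep a port busy. A back-of-the-envelope sketch, assuming sustained back-to-back transfers on one QDR port (the function name is invented):

```python
import math


def min_core_width_bits(mem_bits, mem_mhz, core_mhz, ddr=True):
    """Smallest core-side width that matches one memory port's raw rate."""
    mem_rate = mem_bits * mem_mhz * (2 if ddr else 1)   # Mbit/s on the port
    return math.ceil(mem_rate / core_mhz)


print(min_core_width_bits(36, 200, 125))  # 116
```

A 72-bit core-side path (one two-word burst per core cycle) would therefore fall short of the port's peak rate, while 144 bits would cover it. Whether the application needs the peak rate is a separate question.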

PPH Chip Memory Interfaces
- Our PPH Chip design has 5 memory interfaces:
  - 2 DDR SDRAM @ 200 MHz
  - 3 QDR SRAM @ 200 MHz
- Each of the above consumes a whole bank
- 4 banks available for these 5 memory interfaces
- The interface to the Substrate (16b x 625MHz) will probably have the same bank constraint.
- 4 banks for 6 interfaces: does not fit!
- We are still awaiting confirmation from Xilinx and information about the details of why these banks support higher speeds than other banks.
- Options if we cannot do all 5 at 200 MHz and the Substrate interface:
  - I still need to revisit the clock resources issues to see if any of these are doable. But I believe we can.
  - Can we run both of the SDRAM interfaces at 166 MHz?
  - Can we run one of the SDRAM and one of the SRAM interfaces at 166 MHz?
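The bank budget can be tallied for the baseline and for the two fallback options. This sketch assumes that each interface needs one whole bank, that only interfaces above 167 MHz need one of the 4 restricted banks (the threshold quoted for the Xilinx tool on the reference-design slide), and that slower interfaces can go in other banks, which is still awaiting confirmation from Xilinx. Interface names are illustrative.

```python
FAST_BANKS = 4   # banks available for the high-speed interfaces


def fast_banks_needed(interfaces, threshold_mhz=167):
    """Count interfaces running above the threshold (one bank each)."""
    return sum(1 for mhz in interfaces.values() if mhz > threshold_mhz)


baseline = {"SDRAM0": 200, "SDRAM1": 200, "SRAM0": 200, "SRAM1": 200,
            "SRAM2": 200, "Substrate": 625}
both_sdram_166 = dict(baseline, SDRAM0=166, SDRAM1=166)
one_each_166 = dict(baseline, SDRAM0=166, SRAM0=166)

plans = [("baseline", baseline),
         ("both SDRAM at 166", both_sdram_166),
         ("one SDRAM + one SRAM at 166", one_each_166)]
for name, plan in plans:
    need = fast_banks_needed(plan)
    print(name, need, "fits" if need <= FAST_BANKS else "does not fit")
```

Under these assumptions the baseline needs 6 fast banks, while either fallback brings the count down to 4.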

Xilinx Reference Design: DDR SDRAM
- XAPP678c: Xilinx reference design for data capture using CLB FFs
  - For 72-bit-wide data path to memory, 144-bit-wide data path to user logic
  - 34 VHDL files
  - 24300 lines of VHDL (Verilog version also available)
    - 12400 of that is a chip-level file with architectures, components, and black boxes.
    - So, more like about 10000 lines of VHDL
  - Still looking through it all
- XMIL 007: Tool from Xilinx for generating the DDR memory interface.
  - Generates a controller that uses methods from XAPP678c
    - Local clock inversion
    - DQS delay
    - Etc.
  - Puts it in the bank(s) you select.
    - For designs with > 167MHz, banks 2, 3, 6, or 7 must be used!
  - Produces a .ucf file to place and constrain the things that need it
  - Also seems to work for QDRII.
    - Also has the bank limitations for > 167MHz
- Major challenge is the Read operation
  - Data strobes from DDR SDRAM:
    - Not free-running
    - Not phase aligned with internal controller clock
    - Edge aligned, so they need to be shifted/delayed

Xilinx Reference Design: DDR SDRAM
- 8-bit example shown in docs
- Implemented as two stages:
  - Stage 1: Four clock-enabled CLB flip-flops per data bit
    - Clocked by delayed version of data strobe (delayed_dqs)
    - Clock-enable inputs generated by dividing strobe by 2
    - Data valid window for next stage increases from half a clock cycle to two clock cycles
  - Stage 2: Four FIFOs
    - Clocked by internal controller clock (200MHz)
    - This is still in the SDRAM Controller clock region, not our internal logic
- Interface to user logic is twice as wide as memory interface
  - Our memory interface is 64 bits
  - Our interface to memory controller will be 128 bits
  - If this is not what we want, we can probably look at replacing the Mux with a FIFO at the interface between us and the Controller
- Next slide . . .
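The effect of stage 1 can be pictured by dealing the incoming DDR words round-robin into four registers per data bit. This is a behavioral sketch only: the register names are invented, and the exact enable scheme in XAPP678c may differ from this simple round-robin.

```python
def stage1_capture(words):
    """Distribute a DDR word stream over four registers per data bit.

    words[0] arrives on the first rising strobe edge, words[1] on the
    following falling edge, and so on. With enables derived from the
    strobe divided by 2, each register is written once every two cycles.
    """
    order = ["rise0", "fall0", "rise1", "fall1"]   # hypothetical names
    regs = {name: [] for name in order}
    for i, word in enumerate(words):
        regs[order[i % 4]].append(word)
    return regs


regs = stage1_capture(list("ABCDEFGH"))
print(regs)

CLOCK_MHZ = 200
raw_window_ns = 500 / CLOCK_MHZ        # one DDR word lasts half a cycle
held_window_ns = 2 * 1000 / CLOCK_MHZ  # each register holds for two cycles
print(raw_window_ns, held_window_ns)   # 2.5 10.0
```

Each register is rewritten only every fourth word, which is where the jump from a half-cycle window to a two-cycle window comes from.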

DDR SDRAM Controller Interface Signals
[Diagram labels: User Logic, SDRAM]

    dip1                  : in    std_logic;
    dip3                  : in    std_logic;
    rst_dqs_div_in        : in    std_logic;
    rst_dqs_div_out       : out   std_logic;
    reset_in              : in    std_logic;
    -- user-logic side (128-bit data path)
    user_input_data       : in    std_logic_vector(127 downto 0);
    user_output_data      : out   std_logic_vector(127 downto 0) := (OTHERS => 'Z');
    user_data_valid       : out   std_logic;
    user_input_address    : in    std_logic_vector(((row_address_p + column_address_p) - 1) downto 0);
    user_bank_address     : in    std_logic_vector((bank_address_p - 1) downto 0);
    user_config_register  : in    std_logic_vector(9 downto 0);
    user_command_register : in    std_logic_vector(2 downto 0);
    user_cmd_ack          : out   std_logic;
    burst_done            : in    std_logic;
    init_val              : out   std_logic;
    ar_done               : out   std_logic;
    -- SDRAM side (64-bit data path)
    ddr_dqs               : inout std_logic_vector(7 downto 0);
    ddr_dq                : inout std_logic_vector(63 downto 0) := (OTHERS => 'Z');
    ddr_cke               : out   std_logic;
    ddr_csb               : out   std_logic;
    ddr_rasb              : out   std_logic;
    ddr_casb              : out   std_logic;
    ddr_web               : out   std_logic;
    ddr_dm                : out   std_logic_vector(7 downto 0);
    ddr_ba                : out   std_logic_vector((bank_address_p - 1) downto 0);
    ddr_address           : out   std_logic_vector((row_address_p - 1) downto 0);
    ddr1_clk0             : out   std_logic;
    ddr1_clk0b            : out   std_logic;
    ddr1_clk1             : out   std_logic;
    ddr1_clk1b            : out   std_logic;
    ddr1_clk2             : out   std_logic;
    ddr1_clk2b            : out   std_logic;
    -- internal clocks, delay select, and resets
    clk_int               : in    std_logic;
    clk90_int             : in    std_logic;
    delay_sel_val         : in    std_logic_vector(4 downto 0);
    sys_rst               : in    std_logic;
    sys_rst90             : in    std_logic;
    sys_rst180            : in    std_logic;
    sys_rst270            : in    std_logic

Xilinx Reference Design: DDR SDRAM

Xilinx Reference Design: DDR SDRAM
[Figure: data path between User Logic and memory, with x8 groups labeled. Annotation: the User Logic interface is twice as wide as the memory interface.]

Xilinx Reference Design: DDR SDRAM

Xilinx Reference Design: DDR SDRAM

Xilinx Reference Design: DDR SDRAM
- Transfer_done signals are used as Clock Enable for the FIFOs in the second stage

Xilinx Reference Design: QDRII SRAM
- XAPP770c: Xilinx reference design for local clocking for QDRII
  - 3834 lines of Verilog
  - Physical layer of read data capture is just like in the DDR SDRAM reference design.
    - Four CLB flip-flops per data bit in stage 1
    - FIFOs in stage 2
  - …

Xilinx Reference Design: QDR SRAM

Xilinx Reference Design: QDRII SRAM

Xilinx Reference Design: QDRII SRAM