Memory Interfaces for the PPC-H Card (Still a work in progress…)


1 Memory Interfaces for the PPC-H Card (Still a work in progress…)
John DeHart Applied Research Laboratory Computer Science and Engineering Department

2 Objectives
- Learn about how DDR SDRAM and QDR SRAM really work.
- Understand the memory interfaces on the PPH Chip so we can get the board-level design right.
- Understand the reference designs available from Xilinx.
- Understand what will need to change in the RAD circuit to get it to run on the PPH Chip for the port of the IPv4 router:
  - SDRAM changes
  - SRAM changes
- Confirm that the PPH Chip design is supportable:
  - Bank differences: top/bottom vs. left/right
  - SSO (Simultaneous Switching Output) restrictions per bank

3 ATCA – Common Substrate Base Card
[Block diagram; recoverable labels:]
- PP/H: v2Pro P100
- 2xDDR400 DRAM: 64Mx64 (each), 2x200 MHz
- 3xQDRII SRAM: 512Kx36 (each), 2x200 MHz
- TCAM: 250Kx MHz
- PAROLI/RX, PAROLI/TX
- PPH-Serial: 16x625MHz, 2*18 diff pairs (16 data, 1 clock, 1 frame/flow_control)
- Substrate: v2Pro P100
- Sys Flash, Xilinx ACE, Ethernet 100/BT, 1xDDR DRAM, Power
- GPIO/SelectMap: 15 single-ended signals

4 Hardware Packet Processor: DDR SDRAM
[Block diagram: PP/H (v2Pro P100) with 2xDDR400 SDRAM (64Mx64 each, 2x200 MHz), 3xQDRII SRAM (512Kx36 each, 2x200 MHz), and TCAM (250Kx MHz)]

5 DDR SDRAM
- MICRON MT16VDDF6464H
- Datasheet location: /project/techX/DataSheets/Micron/DDR_1GB3200_09_2004.pdf
- 512MB: 64Meg x 64 bits
- 4 banks
- 200 MHz differential clock
- DDR: Double Data Rate
  - Access on rising and falling edge of clock
- 200-pin SODIMM
- 16 devices per module
- Read or CAS Latency (CL)
  - Depending on clock frequency, can be set to 2, 2.5 or 3 clock cycles
    - 75 MHz <= f <= 133 MHz: CL=2
    - 75 MHz <= f <= 167 MHz: CL=2.5
    - 133 MHz <= f <= 200 MHz: CL=3
  - If we operate at 200 MHz, CL=3
  - Other vendors, or other options from Micron, might have different CL.
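The numbers on this slide can be cross-checked with a little arithmetic. The Python sketch below takes the module geometry and clock from the slide (64Meg x 64 bits, 200 MHz, two transfers per clock) and encodes the slide's CL ranges as a lookup table. The function and constant names are illustrative only and do not come from the Micron datasheet.

```python
# Capacity, peak bandwidth, and permitted CAS latencies for the
# 64Meg x 64 module, using only figures quoted on the slide.
# All names below are illustrative, not datasheet terminology.

WORDS = 64 * 2**20      # 64 Meg locations
WIDTH_BITS = 64         # module data width
CLOCK_MHZ = 200         # differential clock frequency

def capacity_mbytes(words=WORDS, width_bits=WIDTH_BITS):
    """Module capacity in megabytes (2**20 bytes)."""
    return words * width_bits // 8 // 2**20

def peak_bandwidth_mbytes_per_s(clock_mhz=CLOCK_MHZ, width_bits=WIDTH_BITS):
    """Peak rate in 10**6 bytes/s: two transfers per clock cycle."""
    return clock_mhz * 2 * width_bits // 8

# (CL, min MHz, max MHz) exactly as listed on the slide.
CL_RANGES = [(2, 75, 133), (2.5, 75, 167), (3, 133, 200)]

def allowed_cas_latencies(clock_mhz):
    """CL settings whose frequency range includes clock_mhz."""
    return [cl for cl, lo, hi in CL_RANGES if lo <= clock_mhz <= hi]

print(capacity_mbytes())               # 512
print(peak_bandwidth_mbytes_per_s())   # 3200
print(allowed_cas_latencies(200))      # [3]
```

The peak figure is a ceiling only; it ignores refresh, precharge, and CAS latency overhead.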

6 DDR SDRAM (continued)
- MICRON MT16VDDF6464H
- Bursts
  - Lengths of 2, 4, or 8
  - Types: sequential or interleaved
    - Data retrieved is the same; only the order of words differs.
- Bidirectional Data Strobe (DQS)
  - Intermittent (not free-running), i.e. only present when data is present
  - Edge-aligned with data for reads
  - Center-aligned with data for writes
- Auto Refresh and Self Refresh modes available
  - Auto Refresh:
    - The normal mode of operation, like CAS#-Before-RAS# refresh
    - Addressing generated by internal refresh controller
    - Average refresh interval: us (maximum)
      - What exactly is an "average maximum"?
  - Self Refresh:
    - Used to retain data in SDRAM even if the rest of the system is powered down.
    - No external clocking required while in Self Refresh
    - Not what we want to use…

7 DDR SDRAM (continued)
- MICRON MT16VDDF6464H
- Auto precharge option
  - Performs a PRECHARGE command automatically upon completion of a READ or WRITE burst
  - Enabled or disabled with each individual READ or WRITE burst
    - Via A10 pin
- A precharge (either auto or explicit PRECHARGE) must be issued before opening a different row in the same bank.
  - ACTIVE command activates a specific row in a bank
  - That row remains active for accesses until a PRECHARGE is issued for that bank.
  - Then a different row in that bank may be activated and accessed.
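The row-open/row-close rule above can be sketched as a tiny behavioural model. This is an illustration in Python under simplifying assumptions (one bank, no timing parameters such as tRCD or tRP). The class and method names are hypothetical and are not taken from the datasheet or from any controller code.

```python
class Bank:
    """One SDRAM bank: at most one row is open (active) at a time."""

    def __init__(self):
        self.open_row = None   # None means the bank is precharged

    def activate(self, row):
        """ACTIVE command: open a row; illegal if another row is still open."""
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE required before opening another row")
        self.open_row = row

    def precharge(self):
        """Explicit PRECHARGE command: close whatever row is open."""
        self.open_row = None

    def access(self, row, auto_precharge=False):
        """READ or WRITE burst; auto_precharge models the A10 option."""
        if self.open_row is None or self.open_row != row:
            raise RuntimeError("row is not active")
        if auto_precharge:
            self.precharge()   # row is closed when the burst completes

bank = Bank()
bank.activate(5)
bank.access(5)                        # row 5 stays open
bank.access(5, auto_precharge=True)   # row 5 is closed after this burst
bank.activate(9)                      # legal: the bank was precharged
```

Calling `activate` a second time without an intervening precharge raises an error, which is the constraint the slide describes.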

8 DDR SDRAM (continued)

9 DDR SDRAM (continued)

10 DDR SDRAM SODIMM Module

11 DDR SDRAM Interface Signals
Signals between FPGA and SDRAM:
  DQ[0:63]             Data
  DQS[0:7]             Data Strobes
  DM[0:7]              Data Write Mask
  A[0:12]              Address
  BA[0:1]              Bank Address
  S[0:1]               Chip Selects
  SA[0:2]              Presence-Detect Addr
  SDA                  Serial Presence Detect data
  SCL                  Serial Clock for Presence Detect
  CKE[0:1]             Clock Enable
  CK0/CK0#, CK1/CK1#   Clocks
  WE, CAS, RAS         Command Inputs
Total of 111 Signals
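As a quick sanity check on the pin budget, the signal widths listed on this slide can be summed. The Python sketch below assumes each differential clock counts as two pins; the dictionary is simply a transcription of the slide.

```python
# Pin count per signal group, transcribed from the slide.
# Each differential clock pair (CKn/CKn#) is counted as two pins.
DDR_SIGNALS = {
    "DQ[0:63]": 64,
    "DQS[0:7]": 8,
    "DM[0:7]": 8,
    "A[0:12]": 13,
    "BA[0:1]": 2,
    "S[0:1]": 2,
    "SA[0:2]": 3,
    "SDA": 1,
    "SCL": 1,
    "CKE[0:1]": 2,
    "CK0/CK0#": 2,
    "CK1/CK1#": 2,
    "WE, CAS, RAS": 3,
}

total = sum(DDR_SIGNALS.values())
print(total)   # 111
```

The sum agrees with the "Total of 111 Signals" figure on the slide.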

12 DDR SDRAM: Questions
- Why two clocks (CK0/CK0# and CK1/CK1#)?
  - Do we just tie them to the same clock?
- 8 data strobes?
  - We certainly have to provide 8 for writes.
  - Do we use all 8 for reads?
    - We could probably get away with using one, but there may not be any point to it.

13 SDRAM In FPX/NSP Implementation
- FPX/NSP PSM implementation
  - It looks like our PSM implementation in the NSP/FPX does not use auto precharge (it is available in the SDRAM on the FPX).
  - It does an explicit PRECHARGE command after each READ and WRITE.
- Refresh:
  - Every 200 clock ticks it sets up to do a refresh
  - After doing the refresh it waits 6 ticks to go back to operational state
  - Clock rate: 62.5 MHz, period 16 ns
    - 200 clock ticks -> 3.2 us
    - 6 clock ticks -> 96 ns
- Data width
- Address width
- Fully synchronous SDR vs. source-synchronous DDR
- Single-ended clock signal vs. differential clock signals
- Initialization?
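The tick-to-time conversions above follow directly from the 62.5 MHz clock. The Python sketch below reproduces them and also shows how many ticks the same 3.2 us interval would span at 200 MHz. That last step is only an assumption for illustration (that the port would keep the same wall-clock refresh interval); the slides do not say this.

```python
FPX_CLOCK_MHZ = 62.5    # SDRAM clock in the FPX/NSP implementation

def period_ns(clock_mhz):
    """Clock period in nanoseconds."""
    return 1000.0 / clock_mhz

def ticks_to_ns(ticks, clock_mhz=FPX_CLOCK_MHZ):
    """Duration of a number of clock ticks, in nanoseconds."""
    return ticks * period_ns(clock_mhz)

def ns_to_ticks(duration_ns, clock_mhz):
    """Number of clock ticks that span a duration at a given clock."""
    return duration_ns / period_ns(clock_mhz)

print(period_ns(FPX_CLOCK_MHZ))   # 16.0
print(ticks_to_ns(200))           # 3200.0  (3.2 us between refreshes)
print(ticks_to_ns(6))             # 96.0    (wait after a refresh)

# Assumption: keep the same 3.2 us interval on a 200 MHz controller.
print(ns_to_ticks(ticks_to_ns(200), 200))   # 640.0
```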

14 SDRAM State Machine in FPX/PSM
case StateMc0.PresentState is
  when e_Start =>
    -- Idle until a request is pending
    if StateMc0.Run = '1' then
      StateMc0.NextState <= e_Act;
    end if;
  when e_Act =>
    -- ACTIVE: open the row
    StateMc0.NextState <= e_Nop0;
  when e_Nop0 =>
    -- OpType '0' selects a read, otherwise a write
    if StateMc0.OpType = '0' then
      StateMc0.NextState <= e_Rd;
    else
      StateMc0.NextState <= e_Wr;
    end if;
  when e_Rd =>
    StateMc0.NextState <= e_Nop1;
  when e_Wr =>
    StateMc0.NextState <= e_Nop1;
  when e_Nop1 =>
    if StateMc0.BurstLength /= " " then
      StateMc0.NextState <= e_Nop2;
    end if;
  when e_Nop2 =>
    StateMc0.NextState <= e_Nop3;
  when e_Nop3 =>
    StateMc0.NextState <= e_Nop4;
  when e_Nop4 =>
    StateMc0.NextState <= e_Pch;
  when e_Pch =>
    -- Explicit PRECHARGE, then back to idle
    StateMc0.NextState <= e_Start;
  when others =>
    StateMc0.NextState <= e_Start;
end case;

15 Hardware Packet Processor: QDRII SRAM
[Block diagram: PP/H (v2Pro P100) with 2xDDR400 SDRAM (64Mx64 each, 2x200 MHz), 3xQDRII SRAM (512Kx36 each, 2x200 MHz), and TCAM (250Kx MHz)]

16 QDRII SRAM
- SAMSUNG K7R163682B
- Datasheet location: /project/techX/DataSheets/Samsung/QDR_k7r16xx82b_rev30.pdf
- 2MB: 512K x 36 bits
- 200 MHz differential clock
- Clock cycle time:
  - -25: min 4.0 ns, max 6.3 ns (250 MHz – 159 MHz)
  - -20: min 5.0 ns, max 7.88 ns (200 MHz – 127 MHz)
  - -16: min 6.0 ns, max 8.4 ns (167 MHz – 119 MHz)
- QDR II
  - DDR with separate Read and Write data bus
  - Common Address bus for Read and Write
- 165-pin FBGA package
- Fixed two-word bursts
- Support for byte write operations
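The frequency ranges in parentheses are just the reciprocals of the cycle-time limits. The Python sketch below recomputes them from the cycle times on the slide and checks which speed grades cover a 200 MHz clock. The table and helper names are illustrative.

```python
# Speed grade -> (min cycle time, max cycle time) in ns, from the slide.
SPEED_GRADES = {"-25": (4.0, 6.3), "-20": (5.0, 7.88), "-16": (6.0, 8.4)}

def freq_mhz(period_ns):
    """Clock frequency in MHz for a given period in ns."""
    return 1000.0 / period_ns

def grades_supporting(clock_mhz):
    """Speed grades whose cycle-time range includes the requested clock."""
    period = 1000.0 / clock_mhz
    return [g for g, (t_min, t_max) in SPEED_GRADES.items()
            if t_min <= period <= t_max]

for grade, (t_min, t_max) in SPEED_GRADES.items():
    # Shortest period gives the highest frequency, and vice versa.
    print(grade, round(freq_mhz(t_min)), round(freq_mhz(t_max)))

print(grades_supporting(200))   # ['-25', '-20']
```

By these figures the -16 grade tops out at about 167 MHz, so a 200 MHz interface needs a -20 or -25 part.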

17 QDRII SRAM
- SAMSUNG K7R163682B
- Clocks:
  - K/K#: Input Clock
    - Address, data inputs and all control signals are synchronized to K or K#
  - C/C#: Input Clock for Output Data
    - Normally: data outputs synchronized to output clocks (C and C#)
    - If C and C# are tied high, data outputs are synchronized to K and K#
      - We will do this. Any reason not to?
      - All reference designs we have seen do it.
      - Tie them high on the board or in the FPGA?
  - CQ/CQ#: Output Echo Clock
    - Read data are referenced to echo clock outputs (CQ or CQ#).
    - CQ/CQ# High to Data Valid (tCQHQV) (ns): 0.30(-25) (-20) (-16)
    - CQ/CQ# High to Output Hold (tCQHQX) (ns): -0.30(-25) (-20) (-16)

18 QDR SRAM

19 QDR SRAM
Signals between FPGA and QDR SRAM:
  D0[0:35]    Data Input (Write)
  Q0[0:35]    Data Output (Read)
  SA[0:20]    Address (with expansion addr bits)
  BW[0:3]     Byte Write Control
  R#, W#      Read and Write Control
  ZQ          Output Driver Impedance Control (board connect but not to FPGA?)
  VREF[0:1]   Input Reference Voltage
  Doff#       DLL Disable
  JTAG[0:3]   JTAG Test Signals
  K, K#       Input Clock
  C, C#       Input Clock for Output Data
  CQ, CQ#     Output Echo Clock
Total of 105(+8) Signals
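Summing the listed widths gives one possible reading of the "105(+8)" total. The Python sketch below is a transcription of the slide plus one assumption, flagged in the code: that the "+8" are ZQ, VREF, Doff# and JTAG, i.e. signals that might be handled on the board rather than routed to the FPGA. The slide only raises that question for ZQ, so treat the grouping as a guess that happens to match the arithmetic.

```python
# Pin count per signal group, transcribed from the slide.
QDR_SIGNALS = {
    "D0[0:35]": 36,
    "Q0[0:35]": 36,
    "SA[0:20]": 21,
    "BW[0:3]": 4,
    "R#, W#": 2,
    "ZQ": 1,
    "VREF[0:1]": 2,
    "Doff#": 1,
    "JTAG[0:3]": 4,
    "K, K#": 2,
    "C, C#": 2,
    "CQ, CQ#": 2,
}

# ASSUMPTION (not stated on the slide): these are the "+8" signals.
ASSUMED_BOARD_ONLY = {"ZQ", "VREF[0:1]", "Doff#", "JTAG[0:3]"}

total = sum(QDR_SIGNALS.values())
board_only = sum(n for name, n in QDR_SIGNALS.items()
                 if name in ASSUMED_BOARD_ONLY)

print(total)                # 113
print(board_only)           # 8
print(total - board_only)   # 105
```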

20 QDRII SRAM Operations
- Reads
  - Initiated by activating R# at the rising edge of the positive input clock K.
  - First pipelined data is transferred out of the device, triggered by the C# clock following the next K# clock rising edge.
  - Next burst data is triggered by the following C clock rising edge.
- Writes
  - Initiated by activating W# at the rising edge of the positive input clock K.
  - Address is presented with the following K# clock rising edge.
  - First data word is registered with the same rising edge of the K clock as W#.
  - Second data word is registered with the following K# clock rising edge.

21 QDR SRAM

22 QDR SRAM

23 QDR SRAM: Questions
- Three sets of clocks: K/K#, C/C#, CQ/CQ#
  - Which do we need/want to use?
  - Reference design xapp750/xapp770c.

24 SRAM In FPX/NSP Implementation
- 1MB (8Mb), 256Kx36b
- Operates up to 166 MHz
  - We use it at 62.5 MHz
- Two uses of SRAM, each with a separate interface
  - CARL
  - QM
- Burst operations available with SRAM
  - Do we use them?
  - Controlled by ADV/LD# signal
- Write operations:
  - Assert W# and give Address on clock cycle 1
  - Two clock cycles later give data
- Read operations:
  - Assert R and give Address on clock cycle 1
  - Two clock cycles later, data appears.

25 SRAM Differences for Port
- SDR vs. QDR
- Asynchronous interface
  - Core clock: 125 MHz
  - Memory clock: 200 MHz DDR
- Burst operations
  - Do we use them in the current implementation?
- Write operations:
  - Timing of data and address presentation is different
- Read operations:
  - Use of data strobes instead of synchronous clock
- Each new SRAM is twice the size of the old, same width

26 PPH Chip Memory Interfaces
- Our PPH Chip design has 5 memory interfaces:
  - 2 DDR SDRAM @ 200 MHz
  - 3 QDR SRAM @ 200 MHz
- Each of the above consumes a whole bank
  - 4 banks available for these 5 memory interfaces
- The interface to the Substrate (16b x 625MHz) will probably have the same bank constraint.
  - 4 banks for 6 interfaces: Does not fit!
- We are still awaiting confirmation from Xilinx and information about the details of why these banks support higher speeds than other banks.
- Options if we can't do all 5 at 200 MHz and the Substrate interface
  - I still need to revisit the clock resources issues to see if any of these are doable. But, I believe we can.
  - Can we run both of the SDRAM interfaces at 166 MHz?
  - Can we run one of the SDRAM and one of the SRAM interfaces at 166 MHz?
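The bank-counting argument can be written down as a small check. The Python sketch below rests on several assumptions that are still open questions in these slides: that every interface faster than 167 MHz needs one of the four high-speed banks (banks 2, 3, 6, 7 per the XAPP678c slide), that the Substrate interface has the same constraint, and that a 166 MHz interface can live in some other bank. Clock resources and SSO limits are ignored. The interface names are made up for the example.

```python
THRESHOLD_MHZ = 167          # above this, a high-speed bank is required
FAST_BANKS = {2, 3, 6, 7}    # banks quoted for > 167 MHz designs

def fast_interfaces(plan):
    """Interfaces in the plan that need a high-speed bank."""
    return [name for name, mhz in plan.items() if mhz > THRESHOLD_MHZ]

def fits(plan):
    """True if each fast interface can have its own high-speed bank."""
    return len(fast_interfaces(plan)) <= len(FAST_BANKS)

# Everything at full speed: 5 memory interfaces plus the Substrate link.
baseline = {"SDRAM0": 200, "SDRAM1": 200,
            "SRAM0": 200, "SRAM1": 200, "SRAM2": 200,
            "Substrate": 625}

# The two fallback options raised on the slide.
both_sdram_166 = dict(baseline, SDRAM0=166, SDRAM1=166)
one_each_166 = dict(baseline, SDRAM0=166, SRAM0=166)

print(fits(baseline))         # False (6 fast interfaces, 4 banks)
print(fits(both_sdram_166))   # True  (4 fast interfaces)
print(fits(one_each_166))     # True  (4 fast interfaces)
```

Under these assumptions either fallback brings the count of fast interfaces down to four, which is the most the high-speed banks can hold.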

27 Xilinx Reference Design: DDR SDRAM
- XAPP678c: Xilinx reference design for data capture using CLB FFs
  - For 72-bit-wide data path to memory, 144-bit-wide data path to user logic
  - 34 VHDL files
  - 24300 lines of VHDL (Verilog version also available)
    - 12400 of that is a chip-level file with architectures, components and black boxes.
    - So, more like about 11900 lines of VHDL
  - Still looking through it all
- XMIL 007: Tool from Xilinx for generating the DDR memory interface.
  - Generates a controller that uses methods from XAPP678c
    - Local clock inversion
    - DQS delay
    - Etc.
  - Puts it in the bank(s) you select.
    - For designs with > 167 MHz, banks 2, 3, 6 or 7 must be used!
  - Produces a .ucf file to place and constrain things that need to be
  - Also seems to work for QDRII.
    - Also has the bank limitations for > 167 MHz
- Major challenge is the Read operation
  - Data strobes from DDR SDRAM:
    - Not free-running
    - Not phase-aligned with internal controller clock
    - Edge-aligned, so they need to be shifted/delayed

28 Xilinx Reference Design: DDR SDRAM
- 8-bit example shown in docs
- Implemented as two stages:
  - Stage 1: Four clock-enabled CLB flip-flops per data bit
    - Clocked by delayed version of data strobe (delayed_dqs)
    - Clock-enable inputs generated by dividing strobe by 2
    - Data valid window for next stage increases from half a clock cycle to two clock cycles
  - Stage 2: Four FIFOs
    - Clocked by internal controller clock (200 MHz)
      - This is still in the SDRAM controller clock region, not our internal logic
- Interface to user logic is twice as wide as memory interface
  - Our memory interface is 64 bits
  - Our interface to the memory controller will be 128 bits
  - If this is not what we want, we can probably look at replacing the Mux with a FIFO at the interface between us and the Controller
    - Next slide . . .
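Two of the claims above are easy to illustrate numerically: the wider valid window from spreading DDR words over four flip-flops, and the doubling of width toward user logic. The Python sketch below is a behavioural illustration only, not the Xilinx logic. The function names are hypothetical, and placing the first (rising-edge) word in the upper half of the wide word is an assumption; the reference design may order the halves differently.

```python
def valid_window_cycles(num_flops):
    """Each DDR word lasts half a clock cycle. If words are loaded into
    num_flops flip-flops in rotation, each flop keeps its word for
    num_flops half-cycles before it is overwritten."""
    return num_flops * 0.5

def pack_ddr_pair(first_word, second_word, width=64):
    """Combine the two words of one clock cycle into one wide word.
    ASSUMPTION: the first word goes in the upper half."""
    mask = (1 << width) - 1
    return ((first_word & mask) << width) | (second_word & mask)

def to_user_words(ddr_words, width=64):
    """Turn a DDR burst (two words per clock) into double-width words."""
    if len(ddr_words) % 2:
        raise ValueError("a DDR burst holds an even number of words")
    return [pack_ddr_pair(ddr_words[i], ddr_words[i + 1], width)
            for i in range(0, len(ddr_words), 2)]

print(valid_window_cycles(1))   # 0.5  (direct capture)
print(valid_window_cycles(4))   # 2.0  (four flip-flops per data bit)

# 8-bit example: a burst of four bytes becomes two 16-bit words.
print([hex(w) for w in to_user_words([0x11, 0x22, 0x33, 0x44], width=8)])
# ['0x1122', '0x3344']
```

With `width=64` the same packing yields the 128-bit user interface mentioned on the slide.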

29 DDR SDRAM Controller Interface Signals
[Diagram labels: User Logic, SDRAM]
  dip                  : in    std_logic;
  dip                  : in    std_logic;
  rst_dqs_div_in       : in    std_logic;
  rst_dqs_div_out      : out   std_logic;
  reset_in             : in    std_logic;
  user_input_data      : in    std_logic_vector(127 downto 0);
  user_output_data     : out   std_logic_vector(127 downto 0) := (OTHERS => 'Z');
  user_data_valid      : out   std_logic;
  user_input_address   : in    std_logic_vector(((row_address_p + column_address_p) - 1) downto 0);
  user_bank_address    : in    std_logic_vector((bank_address_p - 1) downto 0);
  user_config_register : in    std_logic_vector(9 downto 0);
  user_command_register: in    std_logic_vector(2 downto 0);
  user_cmd_ack         : out   std_logic;
  burst_done           : in    std_logic;
  init_val             : out   std_logic;
  ar_done              : out   std_logic;
  ddr_dqs              : inout std_logic_vector(7 downto 0);
  ddr_dq               : inout std_logic_vector(63 downto 0) := (OTHERS => 'Z');
  ddr_cke              : out   std_logic;
  ddr_csb              : out   std_logic;
  ddr_rasb             : out   std_logic;
  ddr_casb             : out   std_logic;
  ddr_web              : out   std_logic;
  ddr_dm               : out   std_logic_vector(7 downto 0);
  ddr_ba               : out   std_logic_vector((bank_address_p - 1) downto 0);
  ddr_address          : out   std_logic_vector((row_address_p - 1) downto 0);
  ddr1_clk0            : out   std_logic;
  ddr1_clk0b           : out   std_logic;
  ddr1_clk1            : out   std_logic;
  ddr1_clk1b           : out   std_logic;
  ddr1_clk2            : out   std_logic;
  ddr1_clk2b           : out   std_logic;
  clk_int              : in    std_logic;
  clk90_int            : in    std_logic;
  delay_sel_val        : in    std_logic_vector(4 downto 0);
  sys_rst              : in    std_logic;
  sys_rst              : in    std_logic;
  sys_rst              : in    std_logic;
  sys_rst              : in    std_logic

30 Xilinx Reference Design: DDR SDRAM

31 Xilinx Reference Design: DDR SDRAM
[Figure; recoverable labels: User Logic, x8, x8]
This interface is twice as wide as the memory interface.

32 Xilinx Reference Design: DDR SDRAM

33 Xilinx Reference Design: DDR SDRAM

34 Xilinx Reference Design: DDR SDRAM
Transfer_done signals are used as Clock Enable for the FIFOs in the second stage.

35 Xilinx Reference Design: QDRII SRAM
- XAPP770c: Xilinx reference design for local clocking for QDRII
  - 3834 lines of Verilog
  - Physical layer of read data capture is just like in the DDR SDRAM reference design.
    - Four CLB flip-flops per data bit in stage 1
    - FIFOs in stage 2

36 Xilinx Reference Design: QDR SRAM

37 Xilinx Reference Design: QDRII SRAM

38 Xilinx Reference Design: QDRII SRAM

