ECE 448 – FPGA and ASIC Design with VHDL Lecture 10 Memories (RAM/ROM)
2 Outline Memory Distributed RAM Block RAM Instantiation versus Inference VHDL Inference Code Distributed RAM Block RAM ROM VHDL Instantiation Code
3 Memory Types
4 Memory RAMROM Single portDual port With asynchronous read With synchronous read Memory
5 FPGA Distributed Memory
6 COUT D Q CK S R EC D Q CK R EC O G4 G3 G2 G1 Look-Up Table Carry & Control Logic O YB Y F4 F3 F2 F1 XB X Look-Up Table F5IN BY SR S Carry & Control Logic CIN CLK CE SLICE CLB Slice
7 The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN Copyright © 2004 Mentor Graphics Corp. ( Xilinx Multipurpose LUT
8 RAM16X1S O D WE WCLK A0 A1 A2 A3 RAM32X1S O D WE WCLK A0 A1 A2 A3 A4 RAM16X2S O1 D0 WE WCLK A0 A1 A2 A3 D1 O0 = = LUT or LUT RAM16X1D SPO D WE WCLK A0 A1 A2 A3 DPRA0DPO DPRA1 DPRA2 DPRA3 or Distributed RAM CLB LUT configurable as Distributed RAM An LUT equals 16x1 RAM Cascade LUTs to increase RAM size Synchronous write Asynchronous read Can create a synchronous read by using extra flip-flops Naturally, distributed RAM read is asynchronous Two LUTs can make 32 x 1 single-port RAM 16 x 2 single-port RAM 16 x 1 dual-port RAM
9 FPGA Block RAM
10 Block RAM Spartan-3 Dual-Port Block RAM Port A Port B Block RAM Most efficient memory implementation Dedicated blocks of memory Ideal for most memory requirements 4 to 104 memory blocks 18 kbits = 18,432 bits per block (16 k without parity bits) Use multiple blocks for larger memories Builds both single and true dual-port RAMs Synchronous write and read (different from distributed RAM)
11 RAM Blocks and Multipliers in Xilinx FPGAs The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN Copyright © 2004 Mentor Graphics Corp. (
12 Spartan-3 Block RAM Amounts
13 Block RAM can have various configurations (port aspect ratios) 0 16, , , k x 1 8k x 2 4k x 4 2k x (8+1) 1024 x (16+2)
14 Block RAM Port Aspect Ratios
15 Single-Port Block RAM
16 Dual-Port Block RAM
17 RAMB4_S16_S8 Port A Out 18-Bit Width Port B In 2k-Bit Depth Port A In 1K-Bit Depth Port B Out 9-Bit Width DOA[17:0] DOB[8:0] WEA ENA RSTA ADDRA[9:0] CLKA DIA[17:0] WEB ENB RSTB ADDRB[10:0] CLKB DIB[8:0] Dual-Port Bus Flexibility Each port can be configured with a different data bus width Provides easy data width conversion without any additional logic
18 0, ADDR[12:0] 1, ADDR[12:0] RAMB4_S1_S1 Port B Out 1-Bit Width DOA[0] DOB[0] WEA ENA RSTA ADDRA[12:0] CLKA DIA[0] WEB ENB RSTB ADDRB[12:0] CLKB DIB[0] Port B In 8K-Bit Depth Port A Out 1-Bit Width Port A In 8K-Bit Depth Two Independent Single-Port RAMs To access the lower RAM Tie the MSB address bit to Logic Low To access the upper RAM Tie the MSB address bit to Logic High Added advantage of True Dual-Port No wasted RAM Bits Can split a Dual-Port 16K RAM into two Single-Port 8K RAM Simultaneous independent access to each RAM
19 Inference vs. Instantiation
20
21
22 Generic Inferred RAM
23 Distributed versus Block RAM Inference Examples: 1.Distributed RAM with asynchronous read 2.Distributed RAM with "false" synchronous read 3.Block RAM with synchronous read 4.Distributed dual-port RAM with asynchronous read More excellent RAM examples from XST Coding Guidelines: (Click on RAMs)
24 Distributed RAM with asynchronous read
25 Distributed RAM with asynchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; entity raminfr is generic ( bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(bits-1 downto 0); do : out std_logic_vector(bits-1 downto 0)); end raminfr;
26 Distributed RAM with asynchronous read architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0); signal RAM : ram_type; begin process (clk) begin if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; end process; do <= RAM(conv_integer(unsigned(a))); end behavioral;
27 Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use RAM16X4S 8 uses I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Single Port Rams (RAM16X4S): 8 Global Clock Buffers: 1 of 8 (12%) Mapping Summary: Total LUTs: 32 (2%)
28 Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of occupied Slices: 16 out of 768 2% Number of Slices containing only related logic: 16 out of % Number of Slices containing unrelated logic: 0 out of 16 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 32 out of 1,536 2% Number used as 16x1 RAMs: 32 Number of bonded IOBs: 69 out of % Number of GCLKs: 1 out of 8 12%
29 Distributed RAM with "false" synchronous read
30 Distributed RAM with "false" synchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; entity raminfr is generic ( bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(bits-1 downto 0); do : out std_logic_vector(bits-1 downto 0)); end raminfr;
31 Distributed RAM with "false" synchronous read architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0); signal RAM : ram_type; begin process (clk) begin if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; do <= RAM(conv_integer(unsigned(a))); end if; end process; end behavioral;
32 Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: FD 32 uses GND 1 use RAM16X4S 8 uses I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 32 (2%) RAM/ROM usage summary Single Port Rams (RAM16X4S): 8 Global Clock Buffers: 1 of 8 (12%) Mapping Summary: Total LUTs: 32 (2%)
33 Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Number of Slice Flip Flops: 32 out of 1,536 2% Logic Distribution: Number of occupied Slices: 16 out of 768 2% Number of Slices containing only related logic: 16 out of % Number of Slices containing unrelated logic: 0 out of 16 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 32 out of 1,536 2% Number used as 16x1 RAMs: 32 Number of bonded IOBs: 69 out of % Number of GCLKs: 1 out of 8 12% Total equivalent gate count for design: 4,355
34 Block RAM with synchronous read (read through)
35 Block RAM with synchronous read (read through) LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; library synplify; -- XST does not need this entity raminfr is generic ( bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(bits-1 downto 0); do : out std_logic_vector(bits-1 downto 0)); end raminfr;
36 Block RAM with synchronous read (read through) cont'd architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0); signal RAM : ram_type; signal read_a : std_logic_vector(addr_bits-1 downto 0); attribute syn_ramstyle : string; -- XST does not need this attribute syn_ramstyle of RAM : signal is "block_ram"; -- XST does not need this begin process (clk) begin if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; read_a <= a; end if; end process; do <= RAM(conv_integer(unsigned(read_a))); end behavioral;
37 Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use RAMB16_S36 1 use VCC 1 use I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Block Rams : 1 of 4 (25%) Global Clock Buffers: 1 of 8 (12%) Mapping Summary: Total LUTs: 0 (0%)
38 Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of Slices containing only related logic: 0 out of 0 0% Number of Slices containing unrelated logic: 0 out of 0 0% *See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs: 69 out of % Number of Block RAMs: 1 out of 4 25% Number of GCLKs: 1 out of 8 12%
39 Distributed dual-port RAM with asynchronous read
40 Distributed dual-port RAM with asynchronous read library ieee; use ieee.std_logic_1164.all; use ieee.std_logic_unsigned.all; use ieee.std_logic_arith.all; entity raminfr is generic ( bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); dpra : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(bits-1 downto 0); spo : out std_logic_vector(bits-1 downto 0); dpo : out std_logic_vector(bits-1 downto 0)); end raminfr;
41 Distributed dual-port RAM with asynchronous read architecture syn of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0); signal RAM : ram_type; begin process (clk) begin if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; end process; spo <= RAM(conv_integer(unsigned(a))); dpo <= RAM(conv_integer(unsigned(dpra))); end syn;
42 Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use I/O ports: 104 I/O primitives: 103 IBUF 39 uses OBUF 64 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Dual Port Rams (RAM16X1D): 32 Global Clock Buffers: 1 of 8 (12%) Mapping Summary: Total LUTs: 64 (4%)
43 Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of occupied Slices: 32 out of 768 4% Number of Slices containing only related logic: 32 out of % Number of Slices containing unrelated logic: 0 out of 32 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 64 out of 1,536 4% Number used for Dual Port RAMs: 64 (Two LUTs used per Dual Port RAM) Number of bonded IOBs: 104 out of % Number of GCLKs: 1 out of 8 12%
44 Specification of memory types recognized by Synplify Pro attribute syn_ramstyle : string; attribute syn_ramstyle of memory : signal is "block_ram"; attribute syn_ramstyle : string; attribute syn_ramstyle of memory : signal is “select_ram"; LUT-based Distributed Memory: Block RAM Memory: SIGNAL memory : vector_array;
45 Generic Inferred ROM
46 Distributed dual-port RAM with asynchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; entity rominfr is generic ( bits : integer := 10; -- number of bits per ROM word addr_bits : integer := 3); -- 2^addr_bits = number of words in ROM port (a : in std_logic_vector(addr_bits-1 downto 0); do : out std_logic_vector(bits-1 downto 0)); end rominfr;
47 Distributed dual-port RAM with asynchronous read architecture behavioral of rominfr is type rom_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0); constant ROM : rom_type := (" ", " ", " ", " ", " ", " ", " ", " "); begin do <= ROM(conv_integer(unsigned(a))); end behavioral;
48 Report from Synthesis Resource Usage Report for rominfr Mapping to part: xc3s50pq208-5 Cell usage: VCC 1 use LUT2 2 uses LUT3 7 uses I/O ports: 13 I/O primitives: 13 IBUF 3 uses OBUF 10 uses I/O Register bits: 0 Register bits not including I/Os: 0 (0%) Mapping Summary: Total LUTs: 9 (0%)
49 Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Number of 4 input LUTs: 9 out of 1,536 1% Logic Distribution: Number of occupied Slices: 5 out of 768 1% Number of Slices containing only related logic: 5 out of 5 100% Number of Slices containing unrelated logic: 0 out of 5 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 9 out of 1,536 1% Number of bonded IOBs: 13 out of %
50 FPGA specific memories (Instantiation)
51 RAM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X1_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC; DATA_OUT : out STD_LOGIC ); end RAM_16X1_DISTRIBUTED;
52 RAM 16x1 (2) architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; ignored during simulation attribute INIT : string; attribute INIT of RAM_16x1s_1: label is "0000"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; -- note std_ulogic not std_logic A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component;
53 RAM 16x1 (3) begin RAM_16x1s_1: ram16x1s generic map (INIT => X"0000") port map (O => DATA_OUT, A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3), D => DATA_IN, WCLK => CLK, WE => WE ); end RAM_16X1_DISTRIBUTED_STRUCTURAL;
54 RAM 16x8 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X8_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC_VECTOR(7 downto 0); DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0) ); end RAM_16X8_DISTRIBUTED;
55 RAM 16x8 (2) architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; ignored during simulation attribute INIT : string; --attribute INIT of RAM_16x1s_1: label is "0000"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component;
56 RAM 16x8 (3) begin GENERATE_MEMORY: for I in 0 to 7 generate RAM_16x1_S_1: ram16x1s generic map (INIT => X"0000") port map (O => DATA_OUT(I), A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3), D => DATA_IN(I), WCLK => CLK, WE => WE ); end generate; end RAM_16X8_DISTRIBUTED_STRUCTURAL;
57 ROM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity ROM_16X1_DISTRIBUTED is port( ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_OUT : out STD_LOGIC ); end ROM_16X1_DISTRIBUTED;
58 ROM 16x1 (2) architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; ignored during simulation attribute INIT : string; attribute INIT of rom16x1s_1: label is "F0C1"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component; signal Low : std_ulogic := '0';
59 ROM 16x1 (3) begin rom16x1s_1: ram16x1s generic map (INIT => X"F0C1") port map (O=>DATA_OUT, A0=>ADDR(0), A1=>ADDR(1), A2=>ADDR(2), A3=>ADDR(3), D=>Low, WCLK=>Low, WE=>Low ); end ROM_16X1_DISTRIBUTED_STRUCTURAL;
60 Block RAM library components ComponentData CellsParity CellsAddress BusData BusParity Bus DepthWidthDepthWidth RAMB16_S (13:0)(0:0)- RAMB16_S (12:0)(1:0)- RAMB16_S (11:0)(3:0)- RAMB16_S (10:0)(7:0)(0:0) RAMB16_S (9:0)(15:0)(1:0) RAMB16_S (8:0)(31:0)(3:0)
61 Component declaration for BRAM (1) -- Component Declaration for RAMB16_{S1 | S2 | S4} -- Should be placed after architecture statement but before begin keyword component RAMB16_{S1 | S2 | S4} -- synthesis translate_off generic ( INIT : bit_vector := X"0"; INIT_00 : bit_vector := X" "; ………………………………… INIT_3F : bit_vector := X" "; SRVAL : bit_vector := X"0"; WRITE_MODE : string := "WRITE_FIRST"); -- synthesis translate_on port (DO : out STD_LOGIC_VECTOR (0 downto 0) ADDR : in STD_LOGIC_VECTOR (13 downto 0); CLK : in STD_ULOGIC; DI : in STD_LOGIC_VECTOR (0 downto 0); EN : in STD_ULOGIC; SSR : in STD_ULOGIC; WE : in STD_ULOGIC); end component;
62 Genaral template of BRAM instantiation (1) -- Component Attribute Specification for RAMB16_{S1 | S2 | S4} -- Should be placed after architecture declaration but before the begin keyword -- Put attributes, if necessary -- Component Instantiation for RAMB16_{S1 | S2 | S4} -- Should be placed in architecture after the begin keyword RAMB16_{S1 | S2 | S4}_INSTANCE_NAME : RAMB16_S1 -- synthesis translate_off generic map ( INIT => bit_value, INIT_00 => vector_value, INIT_01 => vector_value, …………………………….. INIT_3F => vector_value, SRVAL=> bit_value, WRITE_MODE => user_WRITE_MODE) -- synopsys translate_on port map (DO => user_DO, ADDR => user_ADDR, CLK => user_CLK, DI => user_DI, EN => user_EN, SSR => user_SSR, WE => user_WE);
63 INIT_00 : BIT_VECTOR := X"014A0C0F09170A A A01C5020A A "; INIT_01 : BIT_VECTOR := X" A A A A0C0F03AA "; INIT_02 : BIT_VECTOR := X" "; INIT_03 : BIT_VECTOR := X" "; …………………………………………………………………………………………………………………………………… INIT_0F : BIT_VECTOR := X" ") 0000 F F F F F FE 0000 FF INIT_0F ADDRESS AA 12 0C0F A E F INIT_01 ADDRESS A C0F 0E 014A 0F INIT_00 ADDRESS Addresses are shown in red and data corresponding to the same memory location is shown in black ADDRESS DATA Initializing Block RAMs 256x16
64 Component declaration for BRAM (2) VHDL Instantiation Template for RAMB16_S9, S18 and S36 -- Component Declaration for RAMB16_{S9 | S18 | S36} component RAMB16_{S9 | S18 | S36} -- synthesis translate_off generic ( INIT : bit_vector := X"0"; INIT_00 : bit_vector := X" "; INIT_3E : bit_vector := X" "; INIT_3F : bit_vector := X" "; INITP_00 : bit_vector := X" "; INITP_07 : bit_vector := X" "; SRVAL : bit_vector := X"0"; WRITE_MODE : string := "WRITE_FIRST"; ); -- synthesis translate_on port (DO : out STD_LOGIC_VECTOR (0 downto 0); DOP : out STD_LOGIC_VECTOR (1 downto 0); ADDR : in STD_LOGIC_VECTOR (13 downto 0); CLK : in STD_ULOGIC; DI : in STD_LOGIC_VECTOR (0 downto 0); DIP : in STD_LOGIC_VECTOR (0 downto 0); EN : in STD_ULOGIC; SSR : in STD_ULOGIC; WE : in STD_ULOGIC); end component;
65 -- Component Attribute Specification for RAMB16_{S9 | S18 | S36} -- Component Instantiation for RAMB16_{S9 | S18 | S36} -- Should be placed in architecture after the begin keyword RAMB16_{S9 | S18 | S36}_INSTANCE_NAME : RAMB16_S1 -- synthesis translate_off generic map ( INIT => bit_value, INIT_00 => vector_value, INIT_3F => vector_value, INITP_00 => vector_value, …………… INITP_07 => vector_value SRVAL => bit_value, WRITE_MODE => user_WRITE_MODE) -- synopsys translate_on port map (DO => user_DO, DOP => user_DOP, ADDR => user_ADDR, CLK => user_CLK, DI => user_DI, DIP => user_DIP, EN => user_EN, SSR => user_SSR, WE => user_WE); Genaral template of BRAM instantiation (2)
66 Block RAM Waveforms – WRITE_FIRST
67 Block RAM Waveforms – READ_FIRST
68 Block RAM Waveforms – NO_CHANGE