George Mason University FPGA Memories ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 10.


1 George Mason University FPGA Memories ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 10

2 Recommended reading
    Spartan-6 FPGA Block RAM Resources: User Guide (Google search: UG383)
    Spartan-6 FPGA Configurable Logic Block: User Guide (Google search: UG384)
    Xilinx FPGA Embedded Memory Advantages: White Paper (Google search: WP360)

3 Recommended reading
    XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices, Chapter 7, HDL Coding Techniques. Sections: RAM HDL Coding Techniques; ROM HDL Coding Techniques
    ISE In-Depth Tutorial. Section: Creating a CORE Generator Tool Module

4 Memory Types

5 Memory
    RAM or ROM
    Single port or dual port
    With asynchronous read or with synchronous read

6 Memory Types specific to Xilinx FPGAs
    Memory: distributed (MLUT-based) or block RAM-based (BRAM-based)
    Memory: inferred or instantiated (manually or using CORE Generator)

7 FPGA Distributed Memory

8 Location of Distributed RAM
    [Figure labels: DSP units; RAM blocks; Logic resources (CLB slices); device parameters (#Logic resources, #Multipliers/DSP units, #RAM_blocks)]
    Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

9 Three Different Types of Slices
    [Figure labels: 50%, 25%]

10 Spartan-6 Multipurpose LUT (MLUT)
    64 x 1 ROM (logic)
    64 x 1 RAM
    32-bit SR
    Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

11 Single-port 64 x 1-bit RAM

12 Memories Built of Neighboring MLUTs
    Memories built of 2 MLUTs:
        Single-port 128 x 1-bit RAM: RAM128x1S
        Dual-port 64 x 1-bit RAM: RAM64x1D
    Memories built of 4 MLUTs:
        Single-port 256 x 1-bit RAM: RAM256x1S
        Dual-port 128 x 1-bit RAM: RAM128x1D
        Quad-port 64 x 1-bit RAM: RAM64x1Q
        Simple-dual-port 64 x 3-bit RAM: RAM64x3SDP (one address for read, one address for write)

13 Dual-port 64 x 1 RAM
    Dual-port 64 x 1-bit RAM: 64x1D
    Single-port 128 x 1-bit RAM: 128x1S

14 Total Size of Distributed RAM

15 FPGA Block RAM

16 Location of Block RAMs
    [Figure labels: DSP units; RAM blocks; Logic resources (CLB slices); device parameters (#Logic resources, #Multipliers/DSP units, #RAM_blocks)]
    Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

17 Spartan-6 Block RAM Amounts

18 Block RAM can have various configurations (port aspect ratios)
    Configuration    Address range    Data width (bits)
    16k x 1          0 .. 16,383      1
    8k x 2           0 .. 8,191       2
    4k x 4           0 .. 4,095       4
    2k x (8+1)       0 .. 2,047       8+1
    1024 x (16+2)    0 .. 1,023       16+2
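
The aspect ratios on this slide can be cross-checked with a little arithmetic. The sketch below is a Python calculation, not HDL; the names `CONFIGS`, `address_bits`, and `capacity_bits` are made up for illustration. It shows that every listed configuration holds the same 16,384 data bits, and that the two widest ones add 2,048 parity bits (the "+1" and "+2").

```python
# (depth in words, data bits per word, parity bits per word), as listed on the slide
CONFIGS = [
    (16 * 1024, 1, 0),   # 16k x 1
    (8 * 1024, 2, 0),    # 8k x 2
    (4 * 1024, 4, 0),    # 4k x 4
    (2 * 1024, 8, 1),    # 2k x (8+1)
    (1024, 16, 2),       # 1024 x (16+2)
]


def address_bits(depth):
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


def capacity_bits(depth, data_bits, parity_bits):
    """Return (data capacity, parity capacity) in bits."""
    return depth * data_bits, depth * parity_bits


for depth, data_bits, parity_bits in CONFIGS:
    data, parity = capacity_bits(depth, data_bits, parity_bits)
    print(f"{depth:>6} x ({data_bits}+{parity_bits}): "
          f"{address_bits(depth)} address bits, last address {depth - 1}, "
          f"{data} data bits, {parity} parity bits")
```

Halving the depth doubles the word width, so one address line is dropped for each doubling of the data width.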


21 Block RAM Port Aspect Ratios

22 Block RAM Interface

23 Block RAM Ports

24 Block RAM Waveforms – READ_FIRST mode

25 Block RAM with synchronous read in Read-First Mode
    [Figure label: CE]

26 Block RAM Waveforms – WRITE_FIRST mode

27 Block RAM Waveforms – NO_CHANGE mode

28 Features of Block RAMs in Spartan-6 FPGAs

29 Inference vs. Instantiation

31 Using CORE Generator

32 CORE Generator

33 CORE Generator

34 Generic Inferred ROM

35 Distributed ROM with asynchronous read
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;
    -- entity name matches the architectures on the next two slides
    entity rominfr is
      generic (
        w : integer := 12;  -- number of bits per ROM word
        r : integer := 3);  -- 2^r = number of words in ROM
      port (
        addr : in  std_logic_vector(r-1 downto 0);
        dout : out std_logic_vector(w-1 downto 0));
    end rominfr;

36 Distributed ROM with asynchronous read (contents as binary literals)
    architecture behavioral of rominfr is
      type rom_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      -- the array range is "downto", so the first literal is word 7 and the last is word 0
      constant ROM_array : rom_type :=
        ("000011000100", "010011010010", "010011011011", "011011000010",
         "000011110001", "011111010110", "010011010000", "111110011111");
    begin
      -- asynchronous read: dout follows addr without a clock
      dout <= ROM_array(conv_integer(unsigned(addr)));
    end behavioral;

37 Distributed ROM with asynchronous read (same contents as hexadecimal literals)
    architecture behavioral of rominfr is
      type rom_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      -- the array range is "downto", so the first literal is word 7 and the last is word 0
      constant ROM_array : rom_type :=
        (X"0C4", X"4D2", X"4DB", X"6C2", X"0F1", X"7D6", X"4D0", X"F9F");
    begin
      -- asynchronous read: dout follows addr without a clock
      dout <= ROM_array(conv_integer(unsigned(addr)));
    end behavioral;
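
One detail of the ROM slides is easy to miss: the array is declared `(2**r-1 downto 0)`, and a positional VHDL aggregate fills the array starting at the left bound. The first literal therefore lands at address 7, not address 0. The following Python sketch models that mapping; `ROM_LITERALS` and `rom_read` are illustrative names, not part of the lecture code.

```python
# ROM words in the order they appear in the VHDL aggregate on the slide
ROM_LITERALS = [0x0C4, 0x4D2, 0x4DB, 0x6C2, 0x0F1, 0x7D6, 0x4D0, 0xF9F]


def rom_read(addr, literals=ROM_LITERALS):
    """Model dout <= ROM_array(addr) for an array declared (2**r-1 downto 0)."""
    top = len(literals) - 1
    # Positional association starts at the left bound (index 7 here),
    # so literal number i holds the word for address top - i.
    return literals[top - addr]


for a in range(len(ROM_LITERALS)):
    print(f"addr {a}: {rom_read(a):03X}")
```

If the first literal is meant to be word 0, one option is to declare the array with an ascending range `(0 to 2**r-1)` instead.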

38 Generic Inferred RAM

39 Distributed versus Block RAM Inference
    Examples:
    1. Distributed single-port RAM with asynchronous read
    2. Distributed dual-port RAM with asynchronous read
    3. Block RAM with synchronous read (no version with asynchronous read!)
    More excellent RAM examples from XST Coding Guidelines.

40 Distributed single-port RAM with asynchronous read
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;
    entity raminfr is
      generic (
        w : integer := 32;  -- number of bits per RAM word
        r : integer := 6);  -- 2^r = number of words in RAM
      port (
        clk : in  std_logic;
        we  : in  std_logic;
        a   : in  std_logic_vector(r-1 downto 0);
        di  : in  std_logic_vector(w-1 downto 0);
        do  : out std_logic_vector(w-1 downto 0));
    end raminfr;

41 Distributed single-port RAM with asynchronous read - cont'd
    architecture behavioral of raminfr is
      type ram_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      signal RAM : ram_type;
    begin
      -- synchronous write
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (we = '1') then
            RAM(conv_integer(unsigned(a))) <= di;
          end if;
        end if;
      end process;
      -- asynchronous read
      do <= RAM(conv_integer(unsigned(a)));
    end behavioral;

42 Distributed dual-port RAM with asynchronous read
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;
    use ieee.std_logic_arith.all;
    entity raminfr is
      generic (
        w : integer := 32;  -- number of bits per RAM word
        r : integer := 6);  -- 2^r = number of words in RAM
      port (
        clk  : in  std_logic;
        we   : in  std_logic;
        a    : in  std_logic_vector(r-1 downto 0);   -- write address and first read address
        dpra : in  std_logic_vector(r-1 downto 0);   -- second (read-only) address
        di   : in  std_logic_vector(w-1 downto 0);
        spo  : out std_logic_vector(w-1 downto 0);
        dpo  : out std_logic_vector(w-1 downto 0));
    end raminfr;

43 Distributed dual-port RAM with asynchronous read - cont'd
    architecture syn of raminfr is
      type ram_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      signal RAM : ram_type;
    begin
      -- synchronous write through address a
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (we = '1') then
            RAM(conv_integer(unsigned(a))) <= di;
          end if;
        end if;
      end process;
      -- two independent asynchronous reads
      spo <= RAM(conv_integer(unsigned(a)));
      dpo <= RAM(conv_integer(unsigned(dpra)));
    end syn;

44 Block RAM with synchronous read in Read-First Mode

45 Block RAM Waveforms – READ_FIRST mode

46 Block RAM with synchronous read
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;
    entity raminfr is
      generic (
        w : integer := 32;  -- number of bits per RAM word
        r : integer := 9);  -- 2^r = number of words in RAM
      port (
        clk  : in  std_logic;
        we   : in  std_logic;
        en   : in  std_logic;
        addr : in  std_logic_vector(r-1 downto 0);
        di   : in  std_logic_vector(w-1 downto 0);
        do   : out std_logic_vector(w-1 downto 0));
    end raminfr;

47 Block RAM with synchronous read, Read-First Mode - cont'd
    architecture behavioral of raminfr is
      type ram_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      signal RAM : ram_type;
    begin
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (en = '1') then
            -- the output register captures the old contents of the addressed word
            do <= RAM(conv_integer(unsigned(addr)));
            if (we = '1') then
              RAM(conv_integer(unsigned(addr))) <= di;
            end if;
          end if;
        end if;
      end process;
    end behavioral;

48 Block RAM Waveforms – WRITE_FIRST mode

49 Block RAM with synchronous read, Write-First Mode - cont'd
    architecture behavioral of raminfr is
      type ram_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      signal RAM : ram_type;
    begin
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (en = '1') then
            if (we = '1') then
              RAM(conv_integer(unsigned(addr))) <= di;
              do <= di;  -- the newly written data appears on the output
            else
              do <= RAM(conv_integer(unsigned(addr)));
            end if;
          end if;
        end if;
      end process;
    end behavioral;

50 Block RAM Waveforms – NO_CHANGE mode

51 Block RAM with synchronous read, No-Change Mode - cont'd
    architecture behavioral of raminfr is
      type ram_type is array (2**r-1 downto 0) of std_logic_vector (w-1 downto 0);
      signal RAM : ram_type;
    begin
      process (clk)
      begin
        if (clk'event and clk = '1') then
          if (en = '1') then
            if (we = '1') then
              -- during a write the output register keeps its previous value
              RAM(conv_integer(unsigned(addr))) <= di;
            else
              do <= RAM(conv_integer(unsigned(addr)));
            end if;
          end if;
        end if;
      end process;
    end behavioral;
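
The three write modes differ only in what the output register shows on a clock edge that performs a write. The Python sketch below is a cycle-level behavioral model, not synthesizable code and not a Xilinx API. The class `BlockRamModel` and the helper `demo` are hypothetical names, and the depth of 512 follows the default `r = 9` of the entity above.

```python
class BlockRamModel:
    """Behavioral model of a single-port RAM with synchronous read."""

    MODES = ("READ_FIRST", "WRITE_FIRST", "NO_CHANGE")

    def __init__(self, depth, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mem = [0] * depth
        self.mode = mode
        self.do = 0  # registered output; changes only on a clock edge

    def clock(self, en, we, addr, di):
        """Apply one rising clock edge and return the new output value."""
        if not en:
            return self.do            # disabled: nothing changes
        if we:
            old = self.mem[addr]
            self.mem[addr] = di
            if self.mode == "READ_FIRST":
                self.do = old         # old contents appear on the output
            elif self.mode == "WRITE_FIRST":
                self.do = di          # new data appears on the output
            # NO_CHANGE: the output keeps its previous value
        else:
            self.do = self.mem[addr]  # plain synchronous read
        return self.do


def demo(mode):
    ram = BlockRamModel(depth=512, mode=mode)
    ram.mem[5] = 0xAA                               # word that will be overwritten
    ram.mem[7] = 0x11
    ram.clock(en=1, we=0, addr=7, di=0)             # read: output becomes 0x11
    do = ram.clock(en=1, we=1, addr=5, di=0x55)     # write 0x55 to address 5
    return do, ram.mem[5]


for mode in BlockRamModel.MODES:
    do, stored = demo(mode)
    print(f"{mode:11s} do={do:#04x} stored={stored:#04x}")
```

In all three modes the memory ends up holding the new word; only the value seen on `do` during the write cycle differs.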

52 Criteria for Implementing Inferred RAM in BRAMs

53 George Mason University ATHENa

54 Resources
    ATHENa website: http://cryptography.gmu.edu/athena

55 ATHENa – Automated Tool for Hardware EvaluatioN
    Supported in part by the National Institute of Standards & Technology (NIST)

56 ATHENa Team
    Venkata “Vinny”, MS CpE student
    Ekawat “Ice”, PhD CpE student
    Marcin, PhD ECE student
    Rajesh, PhD ECE student
    Michal, PhD exchange student from Slovakia
    John, MS CpE student

57 ATHENa – Automated Tool for Hardware EvaluatioN
    Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.
    Currently under development at George Mason University.
    http://cryptography.gmu.edu/athena

58 Why Athena?
    “The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.”
    from "Athena, Greek Goddess of Wisdom and Craftsmanship"

59 Basic Dataflow of ATHENa
    [Figure: dataflow among Designer, ATHENa Server, FPGA Synthesis and Implementation (HDL + FPGA Tools), and User. Flow labels: Interfaces + Testbenches; HDL + scripts + configuration files; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs; Download scripts and configuration files]

60 [Figure labels: synthesizable source files; configuration files; testbench; constraint files; result summary (user-friendly); database entries (machine-friendly)]

61 ATHENa Major Features (1)
    synthesis, implementation, and timing analysis in batch mode
    support for devices and tools of multiple FPGA vendors
    generation of results for multiple families of FPGAs of a given vendor
    automated choice of a best-matching device within a given family

62 ATHENa Major Features (2)
    automated verification of designs through simulation in batch mode
    support for multi-core processing
    automated extraction and tabulation of results
    several optimization strategies aimed at finding
    – optimum options of tools
    – best target clock frequency
    – best starting point of placement

63 Generation of Results Facilitated by ATHENa
    batch mode of FPGA tools
    ease of extraction and tabulation of results: Text Reports, Excel, CSV (Comma-Separated Values)
    optimized choice of tool options: GMU_optimization_1 strategy

64 Relative Improvement of Results from Using ATHENa
    Virtex 5, 256-bit Variants of Hash Functions
    Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools

65 Other (Somewhat) Similar Tools
    ExploreAhead (part of PlanAhead)
    Design Space Explorer (DSE)
    Boldport Flow
    EDAx10 Cloud Platform

66 Distinguishing Features of ATHENa
    Support for multiple tools from multiple vendors
    Optimization strategies aimed at the best possible performance rather than design closure
    Extraction and presentation of results
    Seamless integration with the ATHENa database of results

67 How To Start Working With ATHENa? One-Time Tasks
    Download and unzip ATHENa: http://cryptography.gmu.edu/athena/
    Read the Tutorial!
    Install the Required Tools (see Tutorial - Part 1 – Tools Installation)
    Run ATHENa_setup

68 How To Start Working With ATHENa? Repetitive Tasks
    Prepare or modify your source files & source_list.txt
    Modify design.config.txt + possibly other configuration files
    Run ATHENa

69 design.config.txt: Your Design
    # directory containing synthesizable source files for the project
    SOURCE_DIR =
    # A file list containing list of files in the order suitable for synthesis and implementation
    # low level modules first, top level entity last
    SOURCE_LIST_FILE = source_list.txt
    # project name
    # it will be used in the names of result directories
    PROJECT_NAME = SHA256
    # name of top level entity
    TOP_LEVEL_ENTITY = sha256
    # name of top level architecture
    TOP_LEVEL_ARCH = rs_arch
    # name of clock net
    CLOCK_NET = clk
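
The file uses a plain `KEY = value` layout with `#` comment lines. As a rough illustration of that layout only, here is a small Python reader. ATHENa itself is written in Perl and its real parser is not shown in these slides, so `parse_config` and `SAMPLE` are hypothetical stand-ins.

```python
SAMPLE = """\
# directory containing synthesizable source files for the project
SOURCE_DIR =
# file list in the order suitable for synthesis and implementation
SOURCE_LIST_FILE = source_list.txt
PROJECT_NAME = SHA256
TOP_LEVEL_ENTITY = sha256
TOP_LEVEL_ARCH = rs_arch
CLOCK_NET = clk
"""


def parse_config(text):
    """Return a dict of KEY -> value for lines of the form KEY = value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = value line (for example END FAMILY)
        settings[key.strip()] = value.strip()
    return settings


config = parse_config(SAMPLE)
print(config["PROJECT_NAME"], config["TOP_LEVEL_ENTITY"], config["CLOCK_NET"])
```

A key with nothing after the equals sign, such as `SOURCE_DIR` above, comes back as an empty string.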

70 design.config.txt: Timing Formulas
    #formula for latency
    LATENCY = TCLK*65
    #formula for throughput
    THROUGHPUT = 512/(TCLK*65)
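
To see what these two formulas produce, the sketch below evaluates them in Python for one clock period. Several things here are assumptions rather than facts from the slide: TCLK is taken to be in nanoseconds, the factor of 1000 converts bits/ns to Mbit/s (the units listed in the report_timing.txt legend), the constants presumably stand for a 512-bit block processed in 65 clock cycles, and the 10 ns period is an arbitrary example value.

```python
def latency_ns(tclk_ns, cycles=65):
    """LATENCY = TCLK*65, with TCLK assumed to be in nanoseconds."""
    return tclk_ns * cycles


def throughput_mbps(tclk_ns, block_bits=512, cycles=65):
    """THROUGHPUT = 512/(TCLK*65), converted from bits/ns to Mbit/s."""
    return 1000.0 * block_bits / (tclk_ns * cycles)


tclk = 10.0  # hypothetical clock period in ns (100 MHz)
print(f"latency    = {latency_ns(tclk):.1f} ns")
print(f"throughput = {throughput_mbps(tclk):.3f} Mbit/s")
```

Both figures depend only on the achieved clock period, which is why the tool can fill them in automatically after implementation.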

71 design.config.txt: Application & Optimization Target
    # OPTIMIZATION_TARGET = speed | area | balanced
    OPTIMIZATION_TARGET = speed
    # OPTIONS = default | user
    OPTIONS = default
    # APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
    #               GMU_Optimization_1 | GMU_Xilinx_optimization_1
    APPLICATION = single_run
    # TRIM_MODE = off | zip | delete
    TRIM_MODE = zip

72 design.config.txt: FPGA Families
    # commenting the next line removes all families of Xilinx
    FPGA_VENDOR = xilinx
    #commenting the next line removes a given family
    FPGA_FAMILY = spartan3
    # FPGA_DEVICES = | best_match | all
    FPGA_DEVICES = best_match
    SYN_CONSTRAINT_FILE = default
    IMP_CONSTRAINT_FILE = default
    REQ_SYN_FREQ = 120
    REQ_IMP_FREQ = 100
    MAX_SLICE_UTILIZATION = 0.8
    MAX_BRAM_UTILIZATION = 0.8
    MAX_MUL_UTILIZATION = 1
    MAX_PIN_UTILIZATION = 0.9
    END FAMILY
    END VENDOR

73 design.config.txt: FPGA Families
    # commenting the next line removes all families of Altera
    FPGA_VENDOR = altera
    #commenting the next line removes a given family
    FPGA_FAMILY = Stratix III
    # FPGA_DEVICES = | best_match | all
    FPGA_DEVICES = best_match
    SYN_CONSTRAINT_FILE = default
    IMP_CONSTRAINT_FILE = default
    REQ_IMP_FREQ = 120
    MAX_LOGIC_UTILIZATION = 0.8
    MAX_MEMORY_UTILIZATION = 0.8
    MAX_DSP_UTILIZATION = 0
    MAX_MUL_UTILIZATION = 0
    MAX_PIN_UTILIZATION = 0.8
    END FAMILY
    END VENDOR

74 Library Files
    device_lib/xilinx_device_lib.txt
    device_lib/altera_device_lib.txt
    Files created during ATHENa setup.
    Characterize FPGA families and devices available in the version of Xilinx and Altera tools installed on your computer.
    Currently supported tool versions:
    – Xilinx WebPACK 9.1, 9.2, 10.1, 11.1, 11.5, 12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3, 14.1, 14.2, 14.3
    – Xilinx Design Suite 11.1, 12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3, 14.1, 14.2, 14.3
    – Altera Quartus II Web Edition 8.1, 8.2, 9.0, 9.1, 10.0, 10.1, 11.0, 11.1, 12.0, 12.1
    – Altera Quartus II Subscription Edition 9.1, 10.0, 10.1, 11.0, 11.1, 12.0, 12.1
    In case a library for a given version is not available yet, use a library from the closest available version.

75 Library Files: device_lib/xilinx_device_lib.txt
    VENDOR = Xilinx
    #Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
    ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO
    FAMILY = spartan3
    xc3s50pq208-5, 768, 4, 0, 4, 124
    xc3s200ft256-5, 1920, 12, 0, 12, 173
    xc3s400fg456-5, 3584, 16, 0, 16, 264
    xc3s1000fg676-5, 7680, 24, 0, 24, 391
    xc3s1500fg676-5, 13312, 32, 0, 32, 487
    END_FAMILY
    FAMILY = virtex5
    xc5vlx30ff676-3, 4800, 32, 32, 0, 400
    xc5vfx30tff665-3, 5120, 68, 64, 0, 360
    xc5vlx30tff665-3, 4800, 36, 32, 0, 360
    xc5vlx50ff1153-3, 7200, 48, 48, 0, 560
    xc5vlx50tff1136-3, 7200, 60, 48, 0, 480
    END_FAMILY
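
The slides do not spell out how `FPGA_DEVICES = best_match` picks a device, so the Python sketch below only illustrates one plausible rule: walk the family in library order and take the first device whose resources, scaled by the maximum-utilization limits, cover what the design needs. The selection rule, the mapping of the `MAX_*_UTILIZATION` keys onto the `ITEM_ORDER` columns, and the names `parse_family` and `best_match` are all assumptions made for illustration.

```python
DEVICE_LIB = """\
FAMILY = spartan3
xc3s50pq208-5, 768, 4, 0, 4, 124
xc3s200ft256-5, 1920, 12, 0, 12, 173
xc3s400fg456-5, 3584, 16, 0, 16, 264
xc3s1000fg676-5, 7680, 24, 0, 24, 391
xc3s1500fg676-5, 13312, 32, 0, 32, 487
END_FAMILY
"""
ITEM_ORDER = ["SLICE", "BRAM", "DSP", "MULT", "IO"]


def parse_family(text, family):
    """Return [(device name, {resource: amount})] for one FAMILY block."""
    devices, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("FAMILY"):
            inside = line.split("=", 1)[1].strip() == family
        elif line == "END_FAMILY":
            inside = False
        elif inside:
            fields = [f.strip() for f in line.split(",")]
            amounts = dict(zip(ITEM_ORDER, (int(f) for f in fields[1:])))
            devices.append((fields[0], amounts))
    return devices


def best_match(devices, needed, max_util):
    """First device (in library order) that fits within the utilization limits."""
    for name, available in devices:
        if all(needed.get(k, 0) <= available[k] * max_util.get(k, 1.0)
               for k in ITEM_ORDER):
            return name
    return None


# Resource needs taken from the spartan3 row of report_resource_utilization.txt;
# limits taken from the spartan3 block of design.config.txt.
needed = {"SLICE": 74, "BRAM": 4, "MULT": 7, "IO": 20}
limits = {"SLICE": 0.8, "BRAM": 0.8, "MULT": 1.0, "IO": 0.9}
print(best_match(parse_family(DEVICE_LIB, "spartan3"), needed, limits))
```

With these inputs the smallest device is rejected because 4 block RAMs exceed 80% of its 4, and the next one in the list is chosen.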

76 Result Files: report_resource_utilization.txt
    xilinx : spartan3
    +---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
    | GENERIC | DEVICE          | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | %  | DSPs | % | IO | %  |
    +---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
    | default | xc3s200ft256-5* | 1   | 142  | 3 | 74     | 3 | 4     | 33 | 7     | 58 | 0    | 0 | 20 | 11 |
    +---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
    xilinx : spartan6
    +---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
    | GENERIC | DEVICE           | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | %  | IO | %  |
    +---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
    | default | xc6slx9csg324-3* | 1   | 41   | 1 | 22     | 1 | 4     | 6 | 0     | 0 | 9    | 56 | 20 | 10 |
    +---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
    xilinx : virtex5
    +---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
    | GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | % | DSPs | %  | IO | %  |
    +---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
    | default | xc5vlx20tff323-2* | 1   | 101  | 1 | 56     | 1 | 4     | 15 | 0     | 0 | 9    | 37 | 20 | 11 |
    +---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
    xilinx : virtex6
    +---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
    | GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
    +---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
    | default | xc6vlx75tff784-3* | 1   | 44   | 1 | 21     | 1 | 4     | 1 | 0     | 0 | 9    | 3 | 20 | 5 |
    +---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+

77 Result Files: report_timing.txt
    REQ SYN FREQ – Requested synthesis clk. freq.
    SYN FREQ – Achieved synthesis clk. freq.
    REQ SYN TCLK – Requested synthesis clk. period
    SYN TCLK – Achieved synthesis clk. period
    REQ IMP FREQ – Requested implement. clk. freq.
    IMP FREQ – Achieved implement. clk. freq.
    REQ IMP TCLK – Requested implement. clk. period
    IMP TCLK – Achieved implement. clk. period
    LATENCY – Latency [ns]
    THROUGHPUT – Throughput [Mbits/s]
    TP/Area – Throughput/Area [(Mbits/s)/CLB slices]
    Latency*Area – Latency*Area [ns*CLB slices]
    xilinx : spartan3
    +---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | GENERIC | DEVICE          | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
    +---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | default | xc3s200ft256-5* | 1   | default      | 207.370  | default      | 4.822    | default      | 112.448  | default      | 8.893    | 17.786  | 449.792    | 6.078      | 1316.164     |
    +---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    xilinx : spartan6
    +---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | GENERIC | DEVICE           | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
    +---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | default | xc6slx9csg324-3* | 1   | default      | 75.751   | default      | 13.201   | default      | 78.119   | default      | 12.801   | 25.602  | 312.476    | 14.203     | 563.244      |
    +---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    xilinx : virtex5
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | default | xc5vlx20tff323-2* | 1   | default      | 156.347  | default      | 6.396    | default      | 126.952  | default      | 7.877    | 15.754  | 507.808    | 9.068      | 882.224      |
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    xilinx : virtex6
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
    | default | xc6vlx75tff784-3* | 1   | default      | 158.053  | default      | 6.327    | default      | 135.410  | default      | 7.385    | 14.770  | 541.638    | 25.792     | 310.170      |
    +---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
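
The last two columns of the timing report follow from the other numbers: TP/Area divides the throughput by the slice count from report_resource_utilization.txt, and Latency*Area multiplies the latency by it. The short Python check below recomputes both from the values printed in the two reports; the dictionary `RESULTS` and the function `derived_metrics` are illustrative names only.

```python
# family: (CLB slices, latency in ns, throughput in Mbit/s), copied from the reports
RESULTS = {
    "spartan3": (74, 17.786, 449.792),
    "spartan6": (22, 25.602, 312.476),
    "virtex5": (56, 15.754, 507.808),
    "virtex6": (21, 14.770, 541.638),
}


def derived_metrics(slices, latency_ns, throughput_mbps):
    """Return (TP/Area, Latency*Area), rounded to three decimals like the report."""
    tp_per_area = round(throughput_mbps / slices, 3)
    latency_area = round(latency_ns * slices, 3)
    return tp_per_area, latency_area


for family, values in RESULTS.items():
    tp_area, lat_area = derived_metrics(*values)
    print(f"{family:9s} TP/Area = {tp_area:7.3f}   Latency*Area = {lat_area:9.3f}")
```

The recomputed values agree with the TP/Area and Latency*Area columns for all four families.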

78 Result Files: report_options.txt
    xilinx : spartan3
    +---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
    | GENERIC | DEVICE          | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
    +---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
    | default | xc3s200ft256-5* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
    +---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
    xilinx : spartan6
    +---------+------------------+-----+------------+------------------------------+---------------+--------------+
    | GENERIC | DEVICE           | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
    +---------+------------------+-----+------------+------------------------------+---------------+--------------+
    | default | xc6slx9csg324-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
    +---------+------------------+-----+------------+------------------------------+---------------+--------------+
    xilinx : virtex5
    +---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
    | GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
    +---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
    | default | xc5vlx20tff323-2* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
    +---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
    xilinx : virtex6
    +---------+-------------------+-----+------------+------------------------------+---------------+--------------+
    | GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
    +---------+-------------------+-----+------------+------------------------------+---------------+--------------+
    | default | xc6vlx75tff784-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
    +---------+-------------------+-----+------------+------------------------------+---------------+--------------+
    COST TABLE – parameter determining the starting point of placement
    Synthesis Options – options of the synthesis tool
    Map Options – options of the mapping tool
    PAR Options – options of the place & route tool

79 Result Files: report_execution_time.txt
    xilinx : spartan3
    +---------+-----------------+-----+----------------+---------------------+--------------+
    | GENERIC | DEVICE          | RUN | Synthesis Time | Implementation Time | Elapsed Time |
    +---------+-----------------+-----+----------------+---------------------+--------------+
    | default | xc3s200ft256-5* | 1   | 0d 0h:0m:12s   | 0d 0h:0m:36s        | 0d 0h:0m:48s |
    +---------+-----------------+-----+----------------+---------------------+--------------+
    xilinx : spartan6
    +---------+------------------+-----+----------------+---------------------+--------------+
    | GENERIC | DEVICE           | RUN | Synthesis Time | Implementation Time | Elapsed Time |
    +---------+------------------+-----+----------------+---------------------+--------------+
    | default | xc6slx9csg324-3* | 1   | 0d 0h:0m:21s   | 0d 0h:1m:13s        | 0d 0h:1m:34s |
    +---------+------------------+-----+----------------+---------------------+--------------+
    xilinx : virtex5
    +---------+-------------------+-----+----------------+---------------------+--------------+
    | GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
    +---------+-------------------+-----+----------------+---------------------+--------------+
    | default | xc5vlx20tff323-2* | 1   | 0d 0h:0m:39s   | 0d 0h:1m:50s        | 0d 0h:2m:29s |
    +---------+-------------------+-----+----------------+---------------------+--------------+
    xilinx : virtex6
    +---------+-------------------+-----+----------------+---------------------+--------------+
    | GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
    +---------+-------------------+-----+----------------+---------------------+--------------+
    | default | xc6vlx75tff784-3* | 1   | 0d 0h:0m:22s   | 0d 0h:3m:22s        | 0d 0h:3m:44s |
    +---------+-------------------+-----+----------------+---------------------+--------------+
    Synthesis Time – time of synthesis
    Implementation Time – time of implementation
    Elapsed Time – total time

80 design.config.txt: Functional Simulation (1)
    # FUNCTIONAL_VERIFICATION_MODE =
    FUNCTIONAL_VERIFICATION_MODE =
    # directory containing source files of the testbench
    VERIFICATION_DIR =
    # A file containing a list of testbench files in the order suitable for compilation;
    # low level modules first, top level entity last.
    # Test vector files should be located in the same directory and listed
    # in the same file, unless fixed path is used. Please refer to tutorial for more detail.
    VERIFICATION_LIST_FILE =
    # name of testbench's top level entity
    TB_TOP_LEVEL_ENTITY =
    # name of testbench's top level architecture
    TB_TOP_LEVEL_ARCH =

81 design.config.txt: Functional Simulation (2)
    # MAX_TIME_FUNCTIONAL_VERIFICATION =
    # supported units are: ps, ns, us, and ms
    # if blank, simulation will run until it finishes =
    # = no changes in signals, i.e., clock is stopped and no more inputs coming in.
    MAX_TIME_FUNCTIONAL_VERIFICATION = <>
    # Perform only verification (synthesis and implementation parameters are ignored)
    # VERIFICATION_ONLY =
    VERIFICATION_ONLY =

82 ATHENa – Database of Results

83 ATHENa Database
    http://cryptography.gmu.edu/athenadb

84 ATHENa Database – Result View
    Algorithm parameters
    Design parameters
        Optimization target
        Architecture type
        Datapath width
        I/O bus widths
        Availability of source code
    Platform
        Vendor, Family, Device
    Timing
        Maximum clock frequency
        Maximum throughput
    Resource utilization
        Logic blocks (Slices/LEs/ALUTs)
        Multipliers/DSP units
    Tools
        Names & versions
        Detailed options
    Credits
        Designers & contact information

85 ATHENa Database – Compare Feature
    Matching fields in grey
    Non-matching fields in red and blue

86 Possible Future Customizations
    The same basic database can be customized and adapted for other domains, such as
        Digital Signal Processing
        Bioinformatics
        Communications
        Scientific Computing, etc.

87 ATHENa - Website

88 ATHENa Website
    http://cryptography.gmu.edu/athena/
    Download of ATHENa Tool
    Links to related tools
    SHA-3 Competition in FPGAs & ASICs
        Specifications of candidates
        Interface proposals
        RTL source codes
        Testbenches
        ATHENa database of results
        Related papers & presentations

89 GMU Source Codes and Block Diagrams
    GMU source codes for all Round 3 SHA-3 candidates & SHA-2 are available at the ATHENa website: http://cryptography.gmu.edu/athena
    Included in this release:
        Basic architectures
        Folded architectures
        Unrolled architectures
    Each code supports two variants: with 256-bit and 512-bit output.
    Each source code is accompanied by comprehensive hierarchical block diagrams.

90 ATHENa Result Replication Files
    Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
    Automatically created by ATHENa for all results generated using ATHENa
    Stored in the ATHENa Database
    In the same spirit of Reproducible Research as:
        Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), “Reproducible research in signal processing - what, why, and how,” IEEE Signal Processing Magazine, May 2009. http://rr.epfl.ch/17/
        J. Claerbout (Stanford University), “Electronic documents give reproducible research a new meaning,” in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992, http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92.....

91 Benchmarking Goals Facilitated by ATHENa
    Comparing multiple:
    1. cryptographic algorithms
    2. hardware architectures or implementations of the same cryptographic algorithm
    3. hardware platforms from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
    4. tools and languages in terms of quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 12.3)

92 George Mason University Modern FPGA Families

93 Major FPGA Vendors (slide from ECE 448 – FPGA and ASIC Design with VHDL)
    SRAM-based FPGAs
        Xilinx, Inc. (~51% of the market)
        Altera Corp. (~34% of the market)
        Xilinx and Altera together: ~85%
        Lattice Semiconductor
        Atmel
        Achronix
        Tabula
    Flash & antifuse FPGAs
        Actel Corp. (Microsemi SoC Products Group)
        Quick Logic Corp.

94 Xilinx FPGA Devices
    Technology    Low-cost      High-performance
    220 nm        Spartan II    Virtex
    120/150 nm                  Virtex II, II Pro
    90 nm         Spartan 3     Virtex 4
    65 nm                       Virtex 5
    45 nm         Spartan 6
    40 nm                       Virtex 6
    28 nm         Artix 7       Virtex 7

95 Altera FPGA Devices
    Technology    Low-cost       Mid-range    High-performance
    130 nm        Cyclone                     Stratix
    90 nm         Cyclone II                  Stratix II
    65 nm         Cyclone III    Arria I      Stratix III
    40 nm         Cyclone IV     Arria II     Stratix IV
    28 nm         Cyclone V      Arria V      Stratix V

96 Resources
    Xcell Journal, available for FREE online @ http://www.xilinx.com/about/xcell-publications/xcell-journal.html
    Electronic Engineering Journal, available for FREE by e-mail after subscribing @ http://www.eejournal.com/subscribe or on the web @ http://www.eejournal.com/design/fpga

97 George Mason University Follow-up Courses

98 ECE Department
    Programs
        MS in Electrical Engineering (MS EE)
        MS in Computer Engineering (MS CpE)
    Specializations
        COMMUNICATIONS & NETWORKING
        SIGNAL PROCESSING
        CONTROL & ROBOTICS
        MICROELECTRONICS/NANOELECTRONICS
        SYSTEM DESIGN
        DIGITAL SYSTEMS DESIGN
        COMPUTER NETWORKS
        MICROPROCESSORS & EMBEDDED SYSTEMS
        NETWORK & SYSTEM SECURITY
        BIOENGINEERING
        DIGITAL SIGNAL PROCESSING

99 DIGITAL SYSTEMS DESIGN
    1. ECE 545 Digital System Design with VHDL (Fall) – K. Gaj, project, FPGA design with VHDL, Aldec/Synplicity/Xilinx/Altera
    2. ECE 645 Computer Arithmetic (Spring) – K. Gaj, project, FPGA design with VHDL or Verilog, Aldec/Synplicity/Xilinx/Altera
    3. ECE 586 Digital Integrated Circuits (Spring) – D. Ioannou
    4. ECE 681 VLSI Design for ASICs (Fall) – H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
    5. ECE 682 VLSI Test Concepts (Spring) – T. Storey, homework
    6. ECE 699 Digital Signal Processing Hardware Architectures (Spring) – A. Cohen, project, FPGA design with VHDL or Verilog

100 DIGITAL SIGNAL PROCESSING
    Concentration advisors: Aaron Cohen, Kris Gaj, Ken Hintz, Jill Nelson, Kathleen Wage
    1. ECE 535 Digital Signal Processing – L. Griffiths, J. Nelson, Matlab
    2. ECE 545 Digital System Design with VHDL – K. Gaj, project, FPGA design with VHDL
    3. ECE 645 Computer Arithmetic – K. Gaj, project, FPGA design with VHDL
    4. ECE 699 Digital Signal Processing Hardware Architectures – A. Cohen, project, FPGA design with VHDL and Matlab/Simulink
    5a. ECE 537 Introduction to Digital Image Processing – K. Hintz
    5b. ECE 738 Advanced Digital Signal Processing – K. Wage

101 Possible New Graduate Computer Engineering Courses
    5xx Digital System Design with Verilog
    6xx Reconfigurable Computing (looking for instructors)

102 NETWORK AND SYSTEM SECURITY
    1. ECE 542 Computer Network Architectures and Protocols (Fall, Spring) – S.-C. Chang, et al.
    2. ECE 646 Cryptography and Computer Network Security (Fall) – K. Gaj, J-P. Kaps – lab, project: software/hardware/analytical
    3. ECE 746 Advanced Applied Cryptography (every 2nd Spring, 2015) – K. Gaj, J-P. Kaps – lab, project: software/hardware/analytical
    4. ECE 699 Cryptographic Engineering (every 2nd Spring, 2014) – J-P. Kaps – lectures + student/invited guests seminars
    5. ISA 656 Network Security (Fall, Spring) – A. Stavrou

103 ECE 645 Computer Arithmetic Instructor: Dr. Kris Gaj

104 Advanced digital circuit design course covering
    addition and subtraction
    multiplication
    division and modular reduction
    exponentiation
    Efficient architectures for
        Integers: unsigned and signed
        Real numbers: fixed point; single and double precision floating point
        Elements of the Galois field GF(2^n): polynomial base

105 Course Objectives
    At the end of this course you should be able to:
        Understand mathematical and gate-level algorithms for computer addition, subtraction, multiplication, division, and exponentiation
        Understand tradeoffs involved with different arithmetic architectures between performance, area, latency, scalability, etc.
        Synthesize and implement computer arithmetic blocks on FPGAs
        Be comfortable with different number systems, and have familiarity with floating-point and Galois field arithmetic for future study
        Understand sources of error in computer arithmetic and basics of error analysis
    This knowledge will come about through homework, project and practice exams.

106 Lecture topics: INTRODUCTION
    1. Applications of computer arithmetic algorithms. Initial Discussion of Project Topics.

107 ADDITION AND SUBTRACTION
    1. Basic addition, subtraction, and counting
    2. Addition in Xilinx and Altera FPGAs
    3. Carry-lookahead, carry-select, and hybrid adders
    4. Adders based on Parallel Prefix Networks
    5. Pipelined Adders
    6. Modular addition and subtraction

108 MULTIOPERAND ADDITION
    1. Carry-save adders
    2. Wallace and Dadda Trees
    3. Adding multiple unsigned and signed numbers

109 NUMBER REPRESENTATIONS
    Unsigned Integers
    Signed Integers
    Fixed-point real numbers
    Floating-point real numbers
    Elements of the Galois Field GF(2^n)

110 LONG INTEGER ARITHMETIC
    1. Modular Exponentiation
    2. Montgomery Multipliers and Exponentiation Units

111 MULTIPLICATION
    1. Tree and array multipliers
    2. Sequential multipliers
    3. Multiplication of signed numbers and squaring
    4. Multiplication in Xilinx and Altera FPGAs
        - using distributed logic
        - using embedded multipliers
        - using DSP blocks
    5. Multiple clock systems

112 DIVISION
    1. Basic restoring and non-restoring sequential dividers
    2. SRT and high-radix dividers
    3. Array dividers
    4. Division by Convergence

113 FLOATING POINT AND GALOIS FIELD ARITHMETIC
    1. Floating-point units
    2. Galois Field GF(2^n) units

