Introduction to VHDL
Course web page: ECE web page > Courses > Course web pages > ECE 545 (http://ece.gmu.edu/courses/ECE545/index.htm)
Kris Gaj
Research and teaching interests: reconfigurable computing, computer arithmetic, cryptography, network security
Contact: Science & Technology II, room 223; kgaj01@yahoo.com, kgaj@gmu.edu; (703) 993-1575
Office hours: Wednesday and Thursday, 7:30-8:30 PM, and by appointment
ECE 545 is part of:
MS in Computer Engineering: required course in two concentration areas (Digital Systems Design; Microprocessor and Embedded Systems), elective course in the remaining concentration areas
MS in Electrical Engineering: elective
Courses by design level (from algorithmic, register-transfer, gate, and transistor down to layout and devices):
ECE 645 Computer Arithmetic; ECE 545 Introduction to VHDL; ECE 681 VLSI Design Automation; ECE 682 VLSI Test Concepts; ECE 586 Digital Integrated Circuits; ECE 699 Mixed Signals VLSI; ECE 584 Semiconductor Device Fundamentals; ECE 684 MOS Device Electronics
DIGITAL SYSTEMS DESIGN (concentration advisor: Ken Hintz)
ECE 545 Introduction to VHDL (K. Gaj, D. Hwang, K. Hintz): project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys
ECE 645 Computer Arithmetic: HW and SW Implementation (K. Gaj): project, VHDL/Verilog, Aldec/Synplicity/Xilinx and ModelSim/Synopsys
ECE 586 Digital Integrated Circuits (D. Ioannou)
ECE 681 VLSI Design Automation (T. Storey): project/lab, back-end design with Synopsys tools
MICROPROCESSOR AND EMBEDDED SYSTEMS (concentration advisor: Ron Barnes)
ECE 511 Microprocessors (R. Barnes, P. Pachowicz)
ECE 545 Introduction to VHDL (K. Gaj, D. Hwang, K. Hintz): project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys
ECE 611 Advanced Microprocessors (R. Barnes)
ECE 612 Real-Time Embedded Systems (H. Camp, K. Hintz, D. Hwang)
Concentration area advisors
DIGITAL SYSTEMS DESIGN: Ken Hintz
COMPUTER NETWORKS: Brian Mark
NETWORK AND SYSTEM SECURITY: Kris Gaj
MICROPROCESSOR AND EMBEDDED SYSTEMS: Ron Barnes
Core courses
There are TWO core courses common to all concentration areas:
CS 571 Operating Systems (H. Aydin, S. Setia, C. Snow): project in C/C++ or Java. Pros: prerequisite for many other courses and projects; HLL (high-level language) refresher; offered regularly in Fall and Spring.
ECE 548 Sequential Machine Theory (K. Hintz, R. Schneider): common theoretical and mathematical foundation used in all concentrations; offered regularly in Spring; not a strong prerequisite for any other course, so it can be taken at any time during the curriculum.
Fall 2006 enrollment (as of August 31, 2006): ENGR in IT 1; PhD in ECE 1; BS in EE 2; MS in CpE 7; Non-degree 7; MS in EE 17
Fall 2005 enrollment (as of August 31, 2005): MS in IS 1; PhD in IT 1; PhD in ECE 1; MS in CpE 13; MS in EE 12
VLSI
Courses by design level, annotated by degree program (MS CpE, MS EE):
ECE 645 Computer Arithmetic; ECE 545 Introduction to VHDL; ECE 681 VLSI Design Automation; ECE 682 VLSI Test Concepts; ECE 586 Digital Integrated Circuits; ECE 699 Mixed Signals VLSI; ECE 584 Semiconductor Device Fundamentals; ECE 684 MOS Device Electronics
CpE: Digital Systems Design
Core courses: CS 571 Operating Systems; ECE 548 Sequential Machine Theory
Required courses: ECE 545 Introduction to VHDL; ECE 645 Computer Arithmetic; ECE 681 VLSI Design Automation; ECE 586 Digital Integrated Circuits
Electives including: ECE 584, 684, … (technology); ECE 511, 611, … (microprocessors); ECE 646, 746, … (applications)
Professors: K. Gaj, J. Kaps, D. Hwang, K. Hintz, R. Barnes
EE: Microelectronics
Core courses: ECE 584 Semiconductor Device Fundamentals; ECE 521 or 528 or 548
Required courses: ECE 586 Digital Integrated Circuits + 3 out of 4: ECE 684 MOS Device Electronics, ECE 699 Mixed Signals VLSI, ECE 745 ULSI Microelectronics, ECE 699 Nanoelectronics
Electives including: ECE 545, 645 (digital design); ECE 587 (analog design); ECE 513, 563 (electromagnetics); ECE 565, 567 (optics)
Professors: D. Ioannou, R. Mulpuri
Robotics
CpE: Microprocessors and Embedded Systems
Core courses: CS 571 Operating Systems; ECE 548 Sequential Machine Theory
Required courses: ECE 511 Microprocessors; ECE 545 Introduction to VHDL; ECE 611 Advanced Microprocessors; ECE 612 Real Time Embedded Systems
Electives including: CS 540, 583 (languages, algorithms); CS 635 (parallel machines); ECE 542, 642, 742 (networks); ECE 645, 681 (digital design)
Professors: R. Barnes, P. Pachowicz, K. Hintz, D. Hwang, K. Gaj
EE: Control and Robotics
Core courses: ECE 521 Modern Systems Theory and ECE 528 or 548 or 584
Required courses: 3 out of 4: ECE 612 Real Time Embedded Systems, ECE 620 Optimal Control Theory, ECE 624 Control Systems, ECE 673 Discrete Event Systems
Electives including: ECE 670, 671 (C4I); ECE 542, 642 (communications); ECE 535, 635 (signal processing)
Professors: J. Gertler, G. Cook, K. Hintz, A. Levis
ECE 545
ECE 545 grading
Projects: Project 1 30 %; Project 2 (primary part) 15 %
Lecture: Homework 10 %; Midterm 1 (in class) 20 %; Midterm 2 (take-home) 20 %
Lecture (1)
Lecture 1: Introduction to VHDL for Synthesis
Lecture 2: Data Flow and Structural Modeling of Combinational Logic. Packages and Components.
Hands-on Session 1: VHDL Simulators (Active-HDL and ModelSim)
Lecture 3: Behavioral Modeling of Sequential Logic. Registers, Counters, Shift Registers. Simple Testbenches.
Lecture 4: Introduction to FPGA Devices & Tools
Hands-on Session 2: Tools for FPGA Synthesis and Implementation
Lecture 5: Finite State Machines
Lecture 6: Algorithmic State Machines. Memories: RAM, ROM.
Lecture 7: Advanced Testbenches. File I/O.
Lecture 8: Mixed Style RTL Modeling. Advanced Examples: Sorting, Average, MAX, MIN
Midterm Exam 1
Lecture (2)
Lecture 9: ASIC Logic Synthesis with Synopsys Design Compiler
Hands-on Session 3: ASIC Synthesis (Synopsys Design Compiler)
Lecture 10: Timing of Digital Systems
Hands-on Session 4: ASIC Timing Analysis (Synopsys PrimeTime)
Lecture 11: Variables, Functions and Procedures
Lecture 12: Advanced Data Types. Operators and Attributes.
Lecture 13: Behavioral Modeling. The DLX Computer System.
Lecture 14: Discrete Event Simulators. VHDL vs. Verilog.
Midterm Exam 2
Textbooks
Required textbooks:
Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004
Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998
Supplementary textbooks:
Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd Edition, McGraw-Hill, 2005
Peter J. Ashenden, The Designer's Guide to VHDL, 2nd Edition, Morgan Kaufmann, San Francisco, 1996, 2002
Midterm Exam 1: in class, 2 hours 30 minutes; design-oriented; open books, open notes; practice exams will be available on the web. Tentative date: Thursday, October 26th
Midterm Exam 2: take-home, 48 hours; full design, including logic synthesis and timing analysis for FPGAs or ASICs. Tentative date: Saturday-Sunday, December 9-10
Project technologies: FPGA (Field Programmable Gate Arrays) and ASIC (semi-custom Application Specific Integrated Circuits)
World of integrated circuits (figure): full-custom ASICs; semi-custom ASICs; user-programmable devices, comprising PLDs (PAL, PLA, PML) and FPGAs (LUT (look-up table)-based, MUX-based, gate-based)
Two competing implementation approaches
FPGA (Field Programmable Gate Array): bought off the shelf and reconfigured by designers themselves; no physical layout design; the design ends with a bitstream used to configure a device
ASIC (Application Specific Integrated Circuit): designed all the way from behavioral description to physical layout; designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
Which way to go?
ASICs: high performance; low power; low cost in high volumes
FPGAs: off-the-shelf; low development cost; short time to market; reconfigurability
What is an FPGA chip?
Field Programmable Gate Array: a chip that can be configured by the user to implement different digital hardware
Architecture: configurable logic blocks (CLBs), programmable switch matrices (PSMs), and I/O blocks
Bitstream to configure: the function of each block and the interconnections between logic blocks
FPGAs are reconfigurable off-the-shelf chips; in SRAM-based devices the configuration is held in SRAM cells. The logic is placed inside the CLBs, so an FPGA is more than a large array of gates with programmable interconnections: its power comes from the CLB components, such as LUTs (look-up tables), carry logic, gates, FFs (flip-flops), and MUXes (multiplexers).
Source: [Brown99]
CLB structure
The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUTs), carry & control logic, and two registers. Two 3-state buffers are associated with each CLB, and they can be driven by any of the CLB outputs. Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase performance and lower CLB utilization for wide multiplexer functions. The Xilinx internal bus can also be extended off chip.
CLB slice (figure): two look-up tables (inputs G1-G4 and F1-F4), carry & control logic (CIN, COUT), and two D flip-flops with clock (CLK), clock enable (CE), and set/reset (SR) inputs; outputs X, Y, XB, YB
LUT (Look-Up Table) functionality
Look-up tables are the primary elements for logic implementation. Each LUT can implement any Boolean function of 4 inputs, because it stores the function's complete 16-entry truth table.
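A 4-input LUT is essentially a 16-bit memory addressed by its inputs. The following is a minimal behavioral sketch in VHDL, assuming a hypothetical entity `lut4_sketch` whose generic `INIT` holds the truth table (bit k is the output for input value k); it is not a vendor primitive and does not describe how any particular device family stores its configuration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-input LUT: INIT is the 16-entry truth table and the
-- inputs i(3 downto 0) select which entry drives the output.
entity lut4_sketch is
  generic (INIT : std_logic_vector(15 downto 0) := x"8000");  -- x"8000" = 4-input AND
  port (
    i : in  std_logic_vector(3 downto 0);
    o : out std_logic
  );
end entity lut4_sketch;

architecture behavioral of lut4_sketch is
begin
  -- Propagate 'X' when any select input is unknown; otherwise read INIT(i).
  o <= 'X' when is_x(i) else INIT(to_integer(unsigned(i)));
end architecture behavioral;
```

Changing only `INIT` changes the implemented function: for example, `x"6996"` turns the same LUT into a 4-input XOR (odd parity).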
Major FPGA vendors
SRAM-based FPGAs: Xilinx, Inc.; Altera Corp.; Atmel; Lattice Semiconductor
Flash & antifuse FPGAs: Actel Corp.; QuickLogic Corp.
Xilinx and Altera share over 60% of the market
Xilinx FPGA families
Old families: XC3000, XC4000, XC5200; old 0.5 µm, 0.35 µm, and 0.25 µm technology; not recommended for modern designs
Low-cost families: Spartan/XL (derived from XC4000); Spartan-II (derived from Virtex); Spartan-IIE (derived from Virtex-E); Spartan-3
High-performance families: Virtex (0.22 µm); Virtex-E, Virtex-EM (0.18 µm); Virtex-II, Virtex-II PRO (0.13 µm); Virtex-4 (0.09 µm)
Design process (1)
Specification: "Design and implement a simple unit permitting to speed up encryption with RC5-similar cipher with fixed key set on 8031 microcontroller. Unlike in the experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds….."
VHDL description (your VHDL source files):

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input              : in  std_logic_vector(31 downto 0);
    data_output             : out std_logic_vector(31 downto 0);
    out_full                : in  std_logic;
    key_input               : in  std_logic_vector(31 downto 0);
    key_read                : out std_logic   -- no semicolon after the last port
  );
end RC5_core;  -- the end label must match the entity name

Functional simulation → Synthesis → Post-synthesis simulation
Design process (2)
Implementation (mapping, placing & routing) → Timing simulation → Configuration → On-chip testing
Design Process control from Active-HDL
Simulation Tools Many others…
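Logic synthesis: VHDL description → circuit netlist

library ieee;
use ieee.std_logic_1164.all;

-- entity declaration inferred from the signals used in the architecture
entity MLU is
  port(
    A, B, NEG_A, NEG_B, NEG_Y, L1, L0 : in  std_logic;
    Y                                 : out std_logic
  );
end MLU;

architecture MLU_DATAFLOW of MLU is
  signal A1, B1, Y1                 : std_logic;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : std_logic;
  signal SEL                        : std_logic_vector(1 downto 0);
begin
  -- optional inversion of the inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- the four candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- a named select signal avoids the VHDL-93 restriction on using
  -- a concatenation directly as the selector expression
  SEL <= L1 & L0;
  with SEL select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;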
Synthesis Tools … and others
Features of synthesis tools: interpret RTL code; produce a synthesized circuit netlist in the standard EDIF format; give preliminary performance estimates; some can display circuit schematics corresponding to the EDIF netlist
Implementation: after synthesis, the entire implementation process is performed by FPGA vendor tools
Mapping (figure): netlist elements (LUT0-LUT5, FF1, FF2) mapped onto device resources
Placing (figure): mapped elements assigned to CLB slices of the FPGA
Routing (figure): placed slices connected through the FPGA's programmable connections
Top-level ASIC digital design flow: Design inception → RTL design → Synthesis + macro development → Place + route → Physical verification → Design complete
RTL design (input: design inception; output: synthesis + macro development)
Design function: digital tool
RTL design tools: Cadence NC Verilog, Mentor Graphics ModelSim
Lint checking (user's discretion): Cadence HAL
FPGA verification (user's discretion): Xilinx ISE
Code coverage (user's discretion): Cadence ICT
Testbench development: Cadence NC Verilog, Mentor Graphics ModelSim
Mixed-mode simulation: Cadence AMS Designer
Formal verification: Cadence Conformal
System interface simulation: Agilent ADS, Matlab
Synthesis + macro development (input: RTL; output: place + route)
Design function: digital tool
RTL synthesis: Synopsys DC, Cadence RC
DFT: Synopsys DFT Compiler, Cadence RC
Static timing analysis: Synopsys PrimeTime
Logical equivalency verification: Cadence Conformal
Gate-level simulation: Cadence NC Verilog, Mentor Graphics ModelSim
Macro generation: Artisan
Macro verification: Mentor Graphics Calibre
Macro rules generation / library generation: Artisan / Cadence DFII
Place + route (input: synthesis; output: verification)
Design function: digital tool
Floorplan, macro placement / std cell placement, placement-based optimization, clock tree synthesis: Cadence Encounter
Static timing analysis: Synopsys PrimeTime
Route: Cadence NanoRoute
Spare cells / decoupling cap / filler cells: Cadence Encounter
ATPG: Mentor Graphics FastScan
RC extraction: Cadence Fire & Ice QX
Signal integrity: Cadence CeltIC / VoltageStorm
Metal fill: Cadence Encounter
Physical verification (input: placed + routed design; output: design complete)
Design function: digital tool
GDSII preparation: Cadence DFII
Layout / chip finishing: Cadence Virtuoso
DRC, LVS, ERC: Mentor Graphics Calibre
Simulation preparation / schematic preparation: Cadence DFII
Back-annotated simulation: Cadence NC Verilog, Synopsys Nanosim
Top-level simulation: Cadence AMS Designer
CAD software available at GMU (1): VHDL simulators
Aldec Active-HDL (under Windows): available in the FPGA Lab, S&T II, room 203; a student edition can be purchased individually ($59.95 + S&H)
ModelSim (under Unix): available from all PCs in the ECE educational labs using an X-terminal emulator, and remotely from home using a fast Internet connection
CAD software available at GMU (2): tools used for logic synthesis
FPGA synthesis: Synplicity Synplify Pro (under Windows) and Xilinx XST (under Windows), available in the FPGA Lab, S&T II, room 203
ASIC synthesis: Synopsys Design Compiler (under Unix), available from all PCs in the ECE educational labs using an X-terminal emulator, and remotely from home using a fast Internet connection
CAD software available at GMU (3): tools used for implementation (mapping, placing & routing) in the FPGA technology
Xilinx ISE (under Windows), available in the FPGA Lab, S&T II, room 203
Projects: overview
Project 1 (30 points), mid-September to October (~6 weeks). Application: cryptography OR digital signal processing. Technology: FPGA. Target: synthesizable code, timing, resource usage.
Project 2a (15 points or 5 points), November (~3 weeks or 2 weeks). Application: the same as in Project 1. Technology: ASIC. Target: revised synthesizable code, synthesis scripts, timing analysis, resource usage, comparison.
Project 2b (5 points or 15 points), November-December (~2 weeks or 3 weeks). Application: simple microprocessor/microcontroller. Target: behavioral code.
Projects: overview by concentration
CpE: Digital Systems Design, and EE: Project 1 FPGA (30 points); Project 2 primary: ASIC (15 points), secondary: behavioral (5 points)
CpE: Microprocessors and Embedded Systems: Project 1 FPGA (30 points); Project 2 primary: behavioral (15 points), secondary: ASIC (5 points)
Projects 1, 2
Choice between two project topics: cryptography (e.g., encryption, authentication, hash) or digital signal processing (e.g., digital filter, FFT, image processing); both topics specified by the instructor.
Initial specification in the form of: pseudocode and/or flowchart; detailed interface.
Design and source code are required to be scalable, i.e., work for different parameters and operand sizes, specified at the time of synthesis.
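One common way to meet the scalability requirement is to expose operand sizes as VHDL generics, so that a width is fixed only when the design is elaborated for synthesis. A minimal sketch, using a hypothetical register `reg_n` (the entity name and ports are illustrative assumptions, not part of the project specification):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical width-generic register: the same source synthesizes to
-- 8, 32, or 64 bits depending on the WIDTH value chosen at elaboration.
entity reg_n is
  generic (WIDTH : positive := 32);
  port (
    clk, reset, en : in  std_logic;
    d              : in  std_logic_vector(WIDTH-1 downto 0);
    q              : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity reg_n;

architecture rtl of reg_n is
begin
  process (clk, reset)
  begin
    if reset = '1' then
      q <= (others => '0');      -- asynchronous, active-high reset
    elsif rising_edge(clk) then
      if en = '1' then
        q <= d;                  -- load only when enabled
      end if;
    end if;
  end process;
end architecture rtl;
```

Instantiating it as `entity work.reg_n generic map (WIDTH => 64)` yields a 64-bit register from the same source; a scalable project would propagate such generics from the top-level entity down to every datapath component.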
Example: last year's project, the RC6 cipher (all operations on w-bit words; +, – and × are modulo 2^w; <<< and >>> are rotations by the amount given in the low log2 w bits; ⊕ is bitwise XOR)
Encryption
Input: (A, B, C, D); table S[0..2r+3]
B = B + S[0]
D = D + S[1]
for i = 1 to r do {
  t = (B × (2B + 1)) <<< log2 w
  u = (D × (2D + 1)) <<< log2 w
  A = ((A ⊕ t) <<< u) + S[2i]
  C = ((C ⊕ u) <<< t) + S[2i+1]
  (A, B, C, D) = (B, C, D, A)
}
A = A + S[2r+2]
C = C + S[2r+3]
Output: (A, B, C, D)
Decryption
Input: (A, B, C, D); table S[0..2r+3]
C = C – S[2r+3]
A = A – S[2r+2]
for i = r downto 1 do {
  (A, B, C, D) = (D, A, B, C)
  u = (D × (2D + 1)) <<< log2 w
  t = (B × (2B + 1)) <<< log2 w
  C = ((C – S[2i+1]) >>> t) ⊕ u
  A = ((A – S[2i]) >>> u) ⊕ t
}
D = D – S[1]
B = B – S[0]
Output: (A, B, C, D)
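To make the datapath concrete, here is a combinational sketch of one RC6 encryption round (the body of the `for` loop in the encryption pseudocode above), written with `ieee.numeric_std`. The entity name `rc6_round_sketch` and its ports are hypothetical; the word size is a generic `W` with `LGW = log2 W`, in line with the scalability requirement. The key schedule, the initial and final additions, and all control logic are left out, and this is only one of many possible architectures (a real design would register, and possibly pipeline, this logic).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One RC6 encryption round, including the (A,B,C,D) = (B,C,D,A) word rotation.
entity rc6_round_sketch is
  generic (
    W   : positive := 32;  -- word size in bits
    LGW : positive := 5    -- log2(W); 2**LGW must equal W
  );
  port (
    a_in, b_in, c_in, d_in     : in  unsigned(W-1 downto 0);
    s_even, s_odd              : in  unsigned(W-1 downto 0);  -- S[2i], S[2i+1]
    a_out, b_out, c_out, d_out : out unsigned(W-1 downto 0)
  );
end entity rc6_round_sketch;

architecture comb of rc6_round_sketch is
  subtype word_t is unsigned(W-1 downto 0);

  -- f(x) = (x * (2x + 1)) <<< LGW, with the product taken modulo 2**W
  function f (x : word_t) return word_t is
    variable prod : unsigned(2*W-1 downto 0);
  begin
    prod := x * (shift_left(x, 1) + 1);
    return rotate_left(prod(W-1 downto 0), LGW);
  end function f;

  signal t, u : word_t;
begin
  t <= f(b_in);
  u <= f(d_in);

  -- Rotation amounts use only the low LGW bits of u and t.
  -- New A = ((A xor t) <<< u) + S[2i]   becomes D after the word rotation.
  -- New C = ((C xor u) <<< t) + S[2i+1] becomes B after the word rotation.
  a_out <= b_in;
  b_out <= rotate_left(c_in xor u, to_integer(t(LGW-1 downto 0))) + s_odd;
  c_out <= d_in;
  d_out <= rotate_left(a_in xor t, to_integer(u(LGW-1 downto 0))) + s_even;
end architecture comb;
```

Because the multiplication keeps only the low `W` bits of the `2W`-bit product, `2x + 1` can be computed in `W` bits without changing the result modulo 2^W.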
Encryption/decryption: required interface (figure)
Blocks: encryption/decryption unit with control & I/O interface; key memory unit
Signals: clock, reset, enc_dec, data_in and data_out (m bits each), write, full, data_available, data_read, round number, round key(s) S_i (w bits), key_available, key_read, ready
Projects 1, 2: optimization criteria
Maximum ratio Throughput / Circuit Area, or minimum product Latency × Circuit Area
Primary timing parameters
Latency: time to process a single block of data
Throughput: number of bits processed in a unit of time
$$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$
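As a worked illustration of the throughput formula with hypothetical numbers (not results from any project): assume a 128-bit block, one block processed at a time, and a latency of 20 clock cycles at 100 MHz.

```latex
\text{Latency} = 20 \cdot 10\,\text{ns} = 200\,\text{ns}, \qquad
\text{Throughput} = \frac{128\,\text{bit} \cdot 1}{200\,\text{ns}} = 640\,\text{Mbit/s}
```

If the same circuit processed two blocks simultaneously with unchanged latency, the throughput would double to 1280 Mbit/s, which is why the number of blocks in flight appears in the numerator.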
Infinite Impulse Response (IIR) Filter Equations (1) Transfer function
Two investigated architectures
Architecture 1: Direct Form II
Architecture 2: cascade of second-order systems F_i(z)
Example of coefficients: Butterworth filter, order O = 10, passband Fp = 0.3. Architecture 1 (Direct Form II): coefficients a[1..10] and b[1..10]. Architecture 2: coefficients of the cascade of second-order systems.
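Both architectures rest on the same Direct Form II recurrence. For a second-order section F_i(z) = (b0 + b1·z^-1 + b2·z^-2) / (1 + a1·z^-1 + a2·z^-2), it keeps the intermediate state w[n] = x[n] - a1·w[n-1] - a2·w[n-2] and outputs y[n] = b0·w[n] + b1·w[n-1] + b2·w[n-2]. The sketch below implements one such section in VHDL; the entity name `biquad_df2_sketch`, the fixed-point format (coefficients scaled by 2**FRAC), and the port list are assumptions for illustration, and overflow, rounding, and saturation are deliberately not handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One Direct Form II second-order section (hypothetical interface).
-- Coefficients are signed fixed-point numbers scaled by 2**FRAC.
entity biquad_df2_sketch is
  generic (
    DW   : positive := 16;  -- data width
    CW   : positive := 16;  -- coefficient width
    FRAC : natural  := 14   -- fractional bits of the coefficients
  );
  port (
    clk, reset, en     : in  std_logic;
    x                  : in  signed(DW-1 downto 0);
    b0, b1, b2, a1, a2 : in  signed(CW-1 downto 0);
    y                  : out signed(DW-1 downto 0)
  );
end entity biquad_df2_sketch;

architecture rtl of biquad_df2_sketch is
  signal w1, w2 : signed(DW-1 downto 0) := (others => '0');  -- w[n-1], w[n-2]
begin
  process (clk)
    variable acc_w, acc_y : signed(DW+CW+1 downto 0);  -- 2 guard bits for the sums
    variable w0           : signed(DW-1 downto 0);     -- w[n]
  begin
    if rising_edge(clk) then
      if reset = '1' then
        w1 <= (others => '0');
        w2 <= (others => '0');
        y  <= (others => '0');
      elsif en = '1' then
        -- w[n] = x[n] - a1*w[n-1] - a2*w[n-2]  (x aligned to the coefficient scale)
        acc_w := shift_left(resize(x, acc_w'length), FRAC)
                 - resize(a1 * w1, acc_w'length)
                 - resize(a2 * w2, acc_w'length);
        w0 := resize(shift_right(acc_w, FRAC), DW);
        -- y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
        acc_y := resize(b0 * w0, acc_y'length)
                 + resize(b1 * w1, acc_y'length)
                 + resize(b2 * w2, acc_y'length);
        y  <= resize(shift_right(acc_y, FRAC), DW);
        w2 <= w1;  -- shift the delay line
        w1 <= w0;
      end if;
    end if;
  end process;
end architecture rtl;
```

The output is registered, so `y` reflects an input sample one clock cycle after it is presented. In Architecture 2, several such sections would be chained, each section's `y` feeding the next section's `x`.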
Required interface (figure): IIR filter with control unit & I/O interface
Signals: clock, reset, process, data_in (wi bits), data_out (wo bits), valid, a_i and b_i (wc bits each), ab_write, ready
Project 2a from Fall 2005 (to be modified in Fall 2006)
Project 2a: platform & tools
Target devices: standard-cell ASICs
Libraries: 90 nm TCBN90G TSMC library; 130 nm TCB013GHP TSMC library
Tools: VHDL simulation: Aldec Active-HDL or ModelSim; VHDL synthesis: Synopsys Design Compiler
Task 1 Adjust your synthesizable code for Project 1 in such a way that it can be synthesized using Synopsys and TSMC libraries of standard cells.
Task 2: Prepare a comprehensive testbench capable of verifying the operation of your entire circuit, and run it under ModelSim. This testbench should read test vectors from a text file, with all values stored in hexadecimal notation. Verify the function of your circuit using this testbench.
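A minimal sketch of such a file-driven testbench, using `std.textio` and `hread`. To keep it self-contained, the device under test is a hypothetical 8-bit incrementer `inc8_sketch`, and each line of the vector file (default name `vectors.txt`, an assumption) holds one input and the expected output in hexadecimal, e.g. `7F 80`. In VHDL-93 tools, `hread` for `std_logic_vector` comes from the widely shipped `ieee.std_logic_textio` package; in VHDL-2008 it is provided by `ieee.std_logic_1164`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical DUT, used only so the testbench has something to check:
-- an 8-bit incrementer (q = d + 1, wrapping from x"FF" to x"00").
entity inc8_sketch is
  port (
    d : in  std_logic_vector(7 downto 0);
    q : out std_logic_vector(7 downto 0)
  );
end entity inc8_sketch;

architecture rtl of inc8_sketch is
begin
  q <= std_logic_vector(unsigned(d) + 1);
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_textio.all;  -- hread for std_logic_vector in VHDL-93 tools
use std.textio.all;

-- Reads "<input> <expected>" pairs in hexadecimal, one pair per line.
entity tb_inc8_file is
  generic (VECTOR_FILE : string := "vectors.txt");
end entity tb_inc8_file;

architecture test of tb_inc8_file is
  signal d, q : std_logic_vector(7 downto 0);
begin
  dut : entity work.inc8_sketch port map (d => d, q => q);

  stimulus : process
    file vectors       : text open read_mode is VECTOR_FILE;
    variable l         : line;
    variable din       : std_logic_vector(7 downto 0);
    variable expected  : std_logic_vector(7 downto 0);
    variable n, errors : natural := 0;
  begin
    while not endfile(vectors) loop
      readline(vectors, l);
      hread(l, din);       -- first field: stimulus
      hread(l, expected);  -- second field: expected response
      d <= din;
      wait for 10 ns;      -- let the combinational DUT settle
      n := n + 1;
      if q /= expected then
        errors := errors + 1;
        report "mismatch in vector " & integer'image(n) severity error;
      end if;
    end loop;
    assert errors = 0
      report integer'image(errors) & " of " & integer'image(n) & " vectors failed"
      severity failure;
    report "all " & integer'image(n) & " vectors passed";
    wait;
  end process stimulus;
end architecture test;
```

This sketch assumes the vector file contains no blank lines, since `hread` on an empty line fails; the same loop structure carries over to a real project by replacing the DUT and widening the vector fields.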
Task 3: Synthesize your code for at least two sets of circuit parameters, using the following tools and libraries: Synopsys with the 90 nm TCBN90G TSMC library; Synopsys with the 130 nm TCB013GHP TSMC library; Synplify Pro, targeting the smallest device of the Xilinx Spartan-II family capable of holding the largest of the implemented circuits. Use at least one set of parameters recommended in the specification. Analyze, compare, and discuss the obtained netlists.
Task 4: For all synthesized circuits, determine the maximum clock frequency, the maximum throughput, the area, and the ratio of maximum throughput to area. Compare, discuss, and explain the results obtained for all analyzed cases. Explain the dependence between the values of parameters (such as word size in RC6, or filter range in the IIR filter) and the area and timing of your circuit.
Task 5 Optimize your circuit for the maximum throughput to area ratio. Compare, discuss, and explain results before and after the optimization.
Project 2b from Fall 2005 (to be modified in Fall 2006)
Microcontroller: Using high-level behavioral VHDL, describe the 8-bit microcontroller MC68HC11E1, working in the expanded mode, with the following simplifications:
1. Inputs and outputs of the microcontroller are reduced to E (clock), RESETn (reset, active low), RW (read/write), AS (address strobe), ADDR15..8 (also denoted as PB7..0), ADDR7..0/DATA7..0 (multiplexed address & data, also denoted as PC7..0), PORTD, and PORTE.
2. Internal registers are reduced to the registers A, IX, SP, CC (condition codes NZVC), and PC.
3. The only parts of the 68HC11E1 implemented in your model are: a. CPU; b. RAM (512 B in the range $0000-$01FF); c. parallel I/O (PORTD and PORTE).
4. The internally generated clock E has a frequency of 2 MHz.
5. Internal I/O registers are limited to: PORTD at the memory address $1008; DDRD at the memory address $1009; PORTE at the memory address $100A.
6. The instruction set of the microcontroller is reduced to the following instructions:
Data transfer instructions: LDAA, LDX, LDS, STAA, STX
Arithmetic instructions: CLRA, NEGA, ADDA, SUBA, ASRA, ASLA
Logic instructions: ANDA, ORAA, EORA
Data test instructions: CMPA, CPX, TSTA
Control instructions: BEQ, BGT, BHI, BSR, JSR, RTS, JMP
Stack instructions: PSHA, PULA, PSHX, PULX
7. Addressing modes of the microcontroller are reduced to the following modes a. immediate b. extended c. indexed d. inherent e. relative 8. Main program is stored in the external RAM starting at the address $4000. 9. After reset, PC is set to the address $0000 (internal RAM of MC68HC11) where the instruction JMP $4000 is located.
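The memory map given in items 3, 5, and 8 can be captured in a small address-decoding function inside the behavioral model. A minimal sketch, assuming a hypothetical package `hc11_map_sketch`; treating every other address as external (including the rest of the real chip's register block) is an assumption of this sketch, not part of the specification.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package hc11_map_sketch is
  -- Regions of the simplified MC68HC11E1 memory map.
  type region_t is (INTERNAL_RAM, REG_PORTD, REG_DDRD, REG_PORTE, EXTERNAL);
  function decode(addr : unsigned(15 downto 0)) return region_t;
end package hc11_map_sketch;

package body hc11_map_sketch is
  function decode(addr : unsigned(15 downto 0)) return region_t is
  begin
    if addr <= 16#01FF# then
      return INTERNAL_RAM;   -- 512 B internal RAM, $0000-$01FF
    elsif addr = 16#1008# then
      return REG_PORTD;
    elsif addr = 16#1009# then
      return REG_DDRD;
    elsif addr = 16#100A# then
      return REG_PORTE;
    else
      return EXTERNAL;       -- assumption: all other addresses use the expanded bus
    end if;
  end function decode;
end package body hc11_map_sketch;
```

With this function, the CPU process can decide on each access whether to touch an internal array or to drive a bus cycle on ADDR15..8 and ADDR7..0/DATA7..0, e.g. for the program stored from $4000.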
Microcontroller system
The implemented microcontroller system should consist of: the MC68HC11E1 microcontroller; 8 kB RAM, such as the 6164; a 74HC373 8-bit latch; a 74HC138 decoder chip; auxiliary gates, if needed.
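In expanded mode the low address byte and the data byte share the ADDR7..0/DATA7..0 pins, so the 74HC373 latch is what holds a stable A7..0 for the external RAM. A behavioral sketch of such a latch follows; the entity name `latch373_sketch` is hypothetical, propagation delays are omitted, and the usual wiring (AS driving the latch enable LE, the multiplexed bus driving D) is an assumption to check against the 68HC11 and 74HC373 data sheets.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 8-bit transparent latch in the style of the 74HC373:
-- while le = '1', q follows d; when le falls, the last value is held;
-- oe_n = '1' puts the outputs in high impedance.
entity latch373_sketch is
  port (
    d    : in  std_logic_vector(7 downto 0);
    le   : in  std_logic;
    oe_n : in  std_logic;
    q    : out std_logic_vector(7 downto 0)
  );
end entity latch373_sketch;

architecture behavioral of latch373_sketch is
  signal stored : std_logic_vector(7 downto 0);
begin
  latch : process (d, le)
  begin
    if le = '1' then
      stored <= d;   -- transparent phase
    end if;          -- no else branch: the value is held (level-sensitive latch)
  end process latch;

  q <= stored when oe_n = '0' else (others => 'Z');
end architecture behavioral;
```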
Write Cycle
Features of the model
1. Your model should allow cycle-accurate modeling of the circuit behavior.
2. Your model should contain debugging features equivalent to those of the DLX model discussed in class and described in Ashenden, Chapter 15.
3. Generic parameters passed to the model should include: a. the name of the file with the contents of the external RAM; b. the clk-to-output delay; c. the debugging mode.
4. Your model should report all undefined opcodes, treat them as NOP, and proceed to the next RAM address.
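Requirement 4 maps naturally onto a single `case` statement whose `others` branch reports the opcode and falls back to NOP. A minimal sketch, assuming a hypothetical package `hc11_decode_sketch`; only a few single-byte opcodes of the required instruction set are listed (e.g. $86 LDAA immediate, $7E JMP extended, $39 RTS), so every other opcode is treated as undefined here, and the complete table should be taken from the M68HC11 reference manual.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package hc11_decode_sketch is
  subtype byte_t is std_logic_vector(7 downto 0);
  -- Small subset of the required instructions; NOP_UNDEFINED marks opcodes
  -- the model does not implement.
  type op_t is (NOP, LDAA_IMM, LDAA_EXT, STAA_EXT, CLRA, NEGA, ASRA, ASLA,
                TSTA, PSHA, PULA, BEQ, JMP_EXT, RTS, NOP_UNDEFINED);
  function decode_opcode (opcode : byte_t) return op_t;
end package hc11_decode_sketch;

package body hc11_decode_sketch is
  function decode_opcode (opcode : byte_t) return op_t is
  begin
    case opcode is
      when x"01" => return NOP;
      when x"86" => return LDAA_IMM;
      when x"B6" => return LDAA_EXT;
      when x"B7" => return STAA_EXT;
      when x"4F" => return CLRA;
      when x"40" => return NEGA;
      when x"47" => return ASRA;
      when x"48" => return ASLA;
      when x"4D" => return TSTA;
      when x"36" => return PSHA;
      when x"32" => return PULA;
      when x"27" => return BEQ;
      when x"7E" => return JMP_EXT;
      when x"39" => return RTS;
      when others =>
        -- Undefined (or not yet modeled) opcode: report it (in decimal) and execute as NOP.
        report "undefined opcode " & integer'image(to_integer(unsigned(opcode)))
               & " treated as NOP" severity warning;
        return NOP_UNDEFINED;
    end case;
  end function decode_opcode;
end package body hc11_decode_sketch;
```

The CPU process would then simply advance the PC by one byte when it sees `NOP_UNDEFINED`, which is one way to realize "proceed to the next RAM address".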
Testing and debugging The behavior of your model should be carefully verified using a testbench instantiating your model with a. the external RAM containing a valid program composed of a substantial subset of instructions implemented in the model b. debugging mode set to the most detailed mode (trace_each_step)
Deliverables: all source code files; the contents of the external RAM used for model verification, in hexadecimal notation and expressed using the corresponding 68HC11 assembly language mnemonics; the detailed log/report generated by your model for the given contents of RAM, with the debugging mode set to trace_each_step.
All projects: organization
Projects divided into phases
Intermediate code submitted through WebCT at selected checkpoints and evaluated by the instructor and/or TA
Penalty points for falling behind the schedule (below 50% of the work that was supposed to be done by a certain deadline)
Feedback provided to students on a fair and best-effort basis
Final report and code submitted through WebCT and graded using a full scale
Contest for the best results (bonus points awarded to the winners)
Penalty and bonus points added to the final grade
Honor Code rules
All students are expected to write and debug their code individually.
Students are encouraged to help and support each other in all problems related to:
- the operation of the CAD tools,
- basic understanding of the problem.