Intro to Timing Analysis via the TimeQuest Timing Analyzer
By Waleed Atallah and Larry Landis
Objective
You will learn the fundamental theory behind timing analysis, including:
- Key terminology
- The math behind timing analysis
- Setup and hold slack equations, with exercises
- How timing constraints are described in Quartus (.sdc files)
- Creating I/O constraints, with exercises
- Other terms and considerations
Requirements
You will need:
- Basic background in digital logic
- Quartus Prime 17.0 Lite or newer
Materials:
- Lab document: http://www.alterawiki.com/wiki/File:TimingAnalysisLab.pdf
- Slides: http://www.alterawiki.com/wiki/File:TimeQuest_class.pptx
Part 1: Timing Analysis Fundamentals
Key terms, definitions, and math
Why Semiconductors Fail
Many opinions... here are some:
- Functionally incorrect
- Bad timing constraints (or no timing constraints)
- Didn't follow manufacturing guidelines (LVS, DRC)
- Test escapes
- Too much power consumption
- Signal integrity
- Crosstalk
- Wearout mechanisms
What affects circuit timing?
- Length of wire
- R and C of wire
- Logic depth of the path
- Size of the transistors
- Process (deposition)
- Voltage
- Temperature
[Figure: logic depicting paths of different depths]
"Timing Arc"
- It takes a finite time for a change at an input to produce a change at the output
- Very often the high-to-low delay (tphl) and the low-to-high delay (tplh) are different amounts of time
Terminology
Types of paths that TimeQuest analyzes:
- Data paths: paths a signal travels between inputs, sequential elements, and outputs
- Clock paths: paths from device ports or internally generated clocks to the clock pins of sequential elements
- Asynchronous paths: paths between an input port and the asynchronous set or clear pin of a sequential element. Used in recovery and removal checks
Timing Paths
Three data path types:
- From input to a sequential element
- From sequential element to sequential element
- From sequential element to output
Setup and Hold Time
- Setup: the minimum time the data signal must be stable BEFORE the clock edge
- Hold: the minimum time the data signal must be stable AFTER the clock edge
- Data valid window: the range of time around the clock edge in which data must remain stable to be properly captured
A register samples its input on the rising edge of a clock signal and updates its output with the value sampled. For sampling to take place correctly, the input data must be stable for a period of time before and after the rising edge of the clock.
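The data valid window can be checked with a few lines of code. This is only a sketch: the function name and the numbers (clock edge at 10 ns, 2 ns setup, 1 ns hold) are made up for illustration and do not come from the slides.

```python
def is_captured_cleanly(data_changes_ns, clock_edge_ns, t_setup_ns, t_hold_ns):
    """Return True if no data transition falls inside the data valid window."""
    window_start = clock_edge_ns - t_setup_ns   # data must be stable from here...
    window_end = clock_edge_ns + t_hold_ns      # ...until here
    return not any(window_start < t < window_end for t in data_changes_ns)

# Hypothetical numbers: clock edge at 10 ns, tsu = 2 ns, th = 1 ns.
# Window is 8 ns to 11 ns.
settled_early = is_captured_cleanly([3.0, 7.5], 10.0, 2.0, 1.0)    # last change at 7.5 ns
changed_in_window = is_captured_cleanly([3.0, 9.2], 10.0, 2.0, 1.0)  # change at 9.2 ns
print(settled_early, changed_in_window)
```

The first call passes because the data stops changing before the window opens; the second fails because a transition lands inside the setup portion of the window.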
Example of Data Settling in Gate Level Simulation
Launch and Latch Edge
- Launch edge: the clock edge that activates or launches the source register
- Latch edge: the clock edge that latches the data into the destination register
Data and Clock Arrival Time
- Data Arrival Time (DAT): the time it takes for the data to arrive at the destination register input
- Clock Arrival Time (Tclk): the time it takes for the clock to arrive at the destination register
- Clock-to-Out Time (tco): the time it takes for a signal to propagate out after a clock edge
Data Arrival Time
[Figure: waveform marking tdata]
Clock Arrival Time
Data Required Time
- Data Required Time (setup): the minimum time required BEFORE the latch edge for data to get latched into the destination register
  = Clock Arrival Time - Setup Time
- Data Required Time (hold): the minimum time required AFTER the latch edge for the data to remain valid for successful latching
  = Clock Arrival Time + Hold Time
Setup Slack
Setup slack = minimum data required time - maximum data arrival time
Dependent on frequency!
[Figure: waveform marking Tclk1, tco, tdata, and the data valid window]
Hold Slack
Hold slack = minimum data arrival time - maximum data required time
NOT dependent on frequency!
[Figure: waveform marking tdata]
Math: Test Yourself!
Setup slack =
(a) min data req time (setup) - max data arrival time
(b) max data req time (setup) - min data arrival time
(c) min data req time (hold) - max data req time (setup)
(d) max data arrival time - min data req time (hold)
Answer: (a) min data req time (setup) - max data arrival time
Review
Remember these three lines and you'll always know how to calculate slack:
- Slack = min <something> - max <something>
- Setup = min DRT - max DAT
- Hold = min DAT - max DRT
Example: Setup
T = 20 ns (50 MHz)
Setup slack = min delay along clock path - max delay along data path
= (min data required time) - (max data arrival time)
= (20 + 2 + 5 + 2 - 4) - (2 + 11 + 2 + 9 + 2)
= 25 - 26 = -1 ns
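The same arithmetic, written out as a small script. The variable names are illustrative; the grouping of terms follows the later fmax slide, which gives the clock delay to the destination register as 9 ns and the setup time as 4 ns.

```python
# Delay values in ns, taken from the setup example.
clock_period = 20
clock_delay_to_destination = [2, 5, 2]   # sums to 9 ns, per the fmax slide
t_setup = 4                              # setup time of the destination register
data_arrival_terms = [2, 11, 2, 9, 2]    # the five terms summed for max data arrival time

min_data_required_time = clock_period + sum(clock_delay_to_destination) - t_setup
max_data_arrival_time = sum(data_arrival_terms)
setup_slack = min_data_required_time - max_data_arrival_time

print(min_data_required_time, max_data_arrival_time, setup_slack)  # 25 26 -1
```

A negative result means the data arrives after it was required, so this path fails setup timing at 50 MHz.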
Example: Setup, Calculate Maximum Frequency
T = 20 ns (50 MHz)
Setup slack = min delay along clock path - max delay along data path = 25 - 26 = -1 ns
You fail setup timing when the clock is too fast compared to the arrival of the data. To fix this, one "nuclear" option is to slow the clock down by increasing the period.
- Setup slack = -1 ns, so make the period 1 ns longer: new period = 20 + 1 = 21 ns
- Tmin = DATmax + tsu - tclk,min = 26 + 4 - 9 = 21 ns
  (data arrival time + setup time of destination register - clock delay to destination register)
- fmax = 1/Tmin = 1/(21 ns) = 47.6 MHz
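A short sketch of the Tmin and fmax formula above. The function name is an assumption made for illustration; the inputs are the example's numbers.

```python
def max_frequency_mhz(max_data_arrival_ns, t_setup_ns, min_clock_delay_ns):
    """Tmin = DATmax + tsu - tclk,min; fmax = 1 / Tmin (ns converted to MHz)."""
    t_min_ns = max_data_arrival_ns + t_setup_ns - min_clock_delay_ns
    return t_min_ns, 1000.0 / t_min_ns

t_min, f_max = max_frequency_mhz(26, 4, 9)
print(t_min, round(f_max, 1))  # 21 47.6
```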
Example: Hold
T = 20 ns (50 MHz)
Hold slack = min delay along data path - max delay along clock path
= (min data arrival time) - (max data required time)
= (1 + 9 + 1 + 6 + 1) - (3 + 9 + 3 + 2)
= 18 - 17 = 1 ns
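The hold calculation can be sketched the same way. Note that the clock period never appears in it, which is why hold slack does not depend on frequency. The list names are illustrative.

```python
# Delay values in ns, taken from the hold example.
min_data_arrival_terms = [1, 9, 1, 6, 1]   # terms summed for min data arrival time
max_data_required_terms = [3, 9, 3, 2]     # terms summed for max data required time (hold)

min_data_arrival_time = sum(min_data_arrival_terms)
max_data_required_time = sum(max_data_required_terms)
hold_slack = min_data_arrival_time - max_data_required_time

print(min_data_arrival_time, max_data_required_time, hold_slack)  # 18 17 1
```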
Exercise One
End of Part 1
Problem: Exercise 1(a)
T = 10 ns
Setup slack = min DRT - max DAT = ?
Solution: Exercise 1(a)
T = 10 ns
Setup slack = min DRT - max DAT = (10 + 1 + 1 - 2) - (1 + 2 + 5) = 2 ns
Hold slack = min DAT - max DRT = (1 + 2 + 3) - (1 + 1 + 1.5) = 2.5 ns
Solution: Exercise 1(b)
T = 6.6 ns
Setup slack = min DRT - max DAT = (6.6 + 1 + 1 - 2) - (1 + 2 + 5) = -1.4 ns (Fails!)
Hold slack = min DAT - max DRT = (6.6 + 1 + 2 + 3) - (6.6 + 1 + 1 + 1.5) = 2.5 ns (Doesn't change!)
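Running both exercise periods through one function makes the pattern visible: only the setup slack moves with the period. This is a sketch using the delay terms from the solutions above; the function name is made up.

```python
def exercise1_slacks(period_ns):
    """Return (setup_slack, hold_slack) in ns for the Exercise 1 circuit."""
    min_drt_setup = period_ns + 1 + 1 - 2   # latch edge + clock delays - setup time
    max_dat = 1 + 2 + 5                     # max data arrival time
    min_dat = 1 + 2 + 3                     # min data arrival time
    max_drt_hold = 1 + 1 + 1.5              # max data required time (hold)
    # Rounding hides floating-point noise from values such as 6.6.
    return round(min_drt_setup - max_dat, 3), round(min_dat - max_drt_hold, 3)

for period in (10, 6.6):
    setup, hold = exercise1_slacks(period)
    print(f"T = {period} ns: setup slack = {setup} ns, hold slack = {hold} ns")
```

The period cancels out of the hold calculation because launch and latch happen on the same edge, so it is left out of the hold terms here.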
Part 2: Describing Timing Constraints
Synopsys Design Constraints, set_input_delay, set_output_delay
Synopsys Design Constraints (SDC)
- Synopsys Design Constraints are the industry standard for describing timing constraints and exceptions
- Quartus Prime uses .sdc files to define clocks, I/O constraints, etc.
- The fitter needs this information to make the best decisions
- You can create and edit these using the TimeQuest GUI or by editing the .sdc file in a text editor
I/O Constraints
Using SDC, specify set_input_delay and set_output_delay:
- set_input_delay (min, max)
- set_output_delay (min, max)
NOTE: Tw is wire delay, or board delay. Tclk is clock delay. Every value has both a maximum and a minimum.
I/O Constraints: set_input_delay
Use this command to specify external delays feeding into the FPGA's input ports.
set_input_delay (min) = min clock-to-out of ext chip + min board delay - max clock skew
= Tco,min + Tw1,min - (Tclk1,max - Tclk2,min)
set_input_delay (max) = max clock-to-out of ext chip + max board delay - min clock skew
= Tco,max + Tw1,max - (Tclk1,min - Tclk2,max)
Syntax:
set_input_delay [-add_delay] -clock <name> [-clock_fall] [-fall] [-max] [-min] [-reference_pin <name>] [-rise] [-source_latency_included] <delay> <targets>
I/O Constraints: set_output_delay
Use this command to specify external delays leaving the FPGA's output ports.
set_output_delay (min) = -(min hold time of ext chip) + min board delay - max clock skew
= -Th,min + Tw2,min - (Tclk3,max - Tclk1,min)
set_output_delay (max) = max setup time of ext chip + max board delay - min clock skew
= Tsu,max + Tw2,max - (Tclk3,min - Tclk1,max)
Syntax:
set_output_delay [-add_delay] -clock <name> [-clock_fall] [-fall] [-max] [-min] [-reference_pin <name>] [-rise] [-source_latency_included] <delay> <targets>
I/O Constraints Check the datasheets
External Signals and your FPGA Check the datasheets!
I/O Constraints Example: set_input_delay
Given:
- Tw1 = 4 ± 1.0 ns
- Tw2 = 3 ± 0.5 ns
- Tsu = 2 ns
- Th = 3 ns
- Tco = 3 ± 1.0 ns (from data sheet!)
- Tclk1 = 1.0 ± 0.5 ns
- Tclk2 = 1.5 ± 0.5 ns
- Tclk3 = 1.5 ± 1.0 ns
set_input_delay (min) = Tco,min + Tw1,min - (Tclk1,max - Tclk2,min) = 2 + 3 - (1.5 - 1) = 4.5 ns
set_input_delay (max) = Tco,max + Tw1,max - (Tclk1,min - Tclk2,max) = 4 + 5 - (0.5 - 2) = 10.5 ns
I/O Constraints Example: set_output_delay
Given:
- Tw1 = 4 ± 1.0 ns
- Tw2 = 3 ± 0.5 ns
- Tsu = 2 ns (also from data sheet!)
- Th = 3 ns (also from data sheet!)
- Tco = 3 ± 1.0 ns
- Tclk1 = 1.0 ± 0.5 ns
- Tclk2 = 1.5 ± 0.5 ns
- Tclk3 = 1.5 ± 1.0 ns
set_output_delay (min) = -Th,min + Tw2,min - (Tclk3,max - Tclk1,min) = -3 + 2.5 - (2.5 - 0.5) = -2.5 ns
set_output_delay (max) = Tsu,max + Tw2,max - (Tclk3,min - Tclk1,max) = 2 + 3.5 - (0.5 - 1.5) = 6.5 ns
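The four I/O delay values in this example can be reproduced from the nominal ± tolerance figures. The sketch below also prints matching SDC lines. The clock name clk and the port names data_in and data_out are hypothetical placeholders, not names from the slides.

```python
def bounds(nominal, tolerance):
    """Turn 'nominal ± tolerance' into a (min, max) pair."""
    return nominal - tolerance, nominal + tolerance

# Values in ns from the example.
tw1_min, tw1_max = bounds(4.0, 1.0)
tw2_min, tw2_max = bounds(3.0, 0.5)
tco_min, tco_max = bounds(3.0, 1.0)
tclk1_min, tclk1_max = bounds(1.0, 0.5)
tclk2_min, tclk2_max = bounds(1.5, 0.5)
tclk3_min, tclk3_max = bounds(1.5, 1.0)
tsu, th = 2.0, 3.0

input_delay_min = tco_min + tw1_min - (tclk1_max - tclk2_min)
input_delay_max = tco_max + tw1_max - (tclk1_min - tclk2_max)
output_delay_min = -th + tw2_min - (tclk3_max - tclk1_min)
output_delay_max = tsu + tw2_max - (tclk3_min - tclk1_max)

# Hypothetical clock and port names, for illustration only.
clock_name, in_port, out_port = "clk", "data_in", "data_out"
sdc_lines = [
    f"set_input_delay -clock {clock_name} -min {input_delay_min} [get_ports {in_port}]",
    f"set_input_delay -clock {clock_name} -max {input_delay_max} [get_ports {in_port}]",
    f"set_output_delay -clock {clock_name} -min {output_delay_min} [get_ports {out_port}]",
    f"set_output_delay -clock {clock_name} -max {output_delay_max} [get_ports {out_port}]",
]
print("\n".join(sdc_lines))
```

In a real .sdc file the clock would also need to be defined first, and the targets would be the design's actual port names.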
Exercise Two
End of Part 2
Solution: Exercise 2(a), input delays (assumes no delay between wires!)
set_input_delay (min) = clock-to-out delay of driving chip + board delay = 0 ns + 0 ns = 0 ns
set_input_delay (max) = clock-to-out delay of driving chip + board delay = 4 ns + 0 ns = 4 ns
Solution: Exercise 2(a), output delays (assumes no delay between wires!)
set_output_delay (min) = -(hold time of the receiving chip) + board delay = -1 ns + 0 ns = -1 ns
set_output_delay (max) = setup time of the receiving chip + board delay = 2.1 ns + 0 ns = 2.1 ns
Solution: Exercise 2(b), setup
Setup slack = (Latch edge + Tclk1,min - Tsu) - (Launch edge + Tclk2,max + Tco + Tw1,max + Tpd,max)
= (10 + 0 - 2) - (0 + 0 + 4 + 0 + 5)
= 8 - 9 = -1 ns (fails timing)
Solution: Exercise 2(b), hold
Hold slack = (Latch edge + Tclk2,min + Tco + Tw1,min + Tpd,min) - (Latch edge + Tclk1,max + Th)
= (10 + 0 + 0 + 0 + 3) - (10 + 0 + 1.5)
= 13 - 11.5 = 1.5 ns (passes timing)
Solution: Exercise 2(c)
Setup FAILS by 1 ns! The clock period must be increased by 1 ns to meet setup requirements: 10 + 1 = 11 ns.
fmax = 1/(11 ns) = 90.91 MHz
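The "stretch the period by the failing amount" rule used here can be sketched as a helper. The function name is an assumption. It simply subtracts the setup slack from the current period, so it only makes sense for a path whose setup slack is negative.

```python
def fmax_from_setup_slack(period_ns, setup_slack_ns):
    """Stretch the period by the amount setup fails, then convert to MHz."""
    new_period_ns = period_ns - setup_slack_ns   # slack of -1 ns adds 1 ns
    return new_period_ns, 1000.0 / new_period_ns

new_period, f_max = fmax_from_setup_slack(10, -1)
print(new_period, round(f_max, 2))  # 11 90.91
```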
Solution: Exercise 2(d), input delays
Tclk1 = Tclk2 = Tclk3 = 1 ± 0.5 ns
Tw1 = 1.5 ± 1 ns
Tw2 = 2 ± 0.5 ns
set_input_delay (min) = Tco,min + Tw1,min - (Tclk1,max - Tclk2,min) = 0 + 0.5 - (1.5 - 0.5) = -0.5 ns
set_input_delay (max) = Tco,max + Tw1,max - (Tclk1,min - Tclk2,max) = 4 + 2.5 - (0.5 - 1.5) = 7.5 ns
Solution: Exercise 2(d), output delays
Tclk1 = Tclk2 = Tclk3 = 1 ± 0.5 ns
Tw1 = 1.5 ± 1 ns
Tw2 = 2 ± 0.5 ns
set_output_delay (min) = -Th,min + Tw2,min - (Tclk3,max - Tclk1,min) = -1 + 1.5 - (1.5 - 0.5) = -0.5 ns
set_output_delay (max) = Tsu,max + Tw2,max - (Tclk3,min - Tclk1,max) = 2.1 + 2.5 - (0.5 - 1.5) = 5.6 ns
Part 3: Demonstration
Using the TimeQuest Timing Analyzer
TimeQuest GUI, following the DRT and DAT in TimeQuest
Part 4: Advanced Considerations
False paths, multicycle paths
Exceptions: False Path
- Paths in a design that physically exist, but will never be exercised by the circuit
- These paths have no functional purpose and do not need to be constrained
- Specifying a false path removes it from timing analysis
Examples:
- A start-up configuration routine that only runs once on power-on
- A test mode that won't be used when the device is running
Exceptions: False Path
TimeQuest will evaluate both the functional path and the test path, despite the guarantee that the test path will NEVER be exercised while the circuit is operational!
set_false_path -from [get_pins {B}] -to [get_pins {OUT}]
[Figure labels: MUX, A, B, S, Z, OUT; RUN path: some logic; TEST path: a ton of logic and test logic]
Multicycle Path: Default Behavior (setup default, hold default)
- The path has 0.5 to 4.5 clock cycles of delay
- Default setup and hold check points are shown
- Data can take as long as 4.5 clock cycles to settle
- The setup check violates timing because it checks at the wrong cycle
[Figure: two registers clocked by CLK1 with enable EN; waveforms for CLK1 (edges 0 to 5), X, and EN, with the setup check, hold check, and violation marked]
Multicycle Path: Setup = 5, Hold Default
set_multicycle_path 5 -setup -end -from CLK1 -to CLK1
- Moves the setup check 5 cycles over
- Automatically moves the hold check N-1 cycles over (wrong cycle!)
[Figure: same circuit and waveforms; the hold check is still marked as a violation]
Multicycle Path: Setup = 5, Hold = 4
set_multicycle_path 5 -setup -end -from CLK1 -to CLK1
set_multicycle_path 4 -hold -end -from CLK1 -to CLK1
- The second command moves the hold check back 4 cycles
[Figure: same circuit and waveforms; setup and hold checks now fall on the proper cycles]
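The way the two multicycle values move the check points can be modeled in a few lines. This is a simplified sketch that assumes a single clock, a launch edge at cycle 0, and end (latch clock) multicycles as in the commands above. The function name is made up and it is not how TimeQuest itself is implemented.

```python
def check_edges(setup_multicycle=1, hold_multicycle=0):
    """Return the latch-clock cycles where the setup and hold checks land.

    The setup check sits at cycle N. The hold check defaults to one cycle
    before the setup check and moves back by the hold multicycle value.
    """
    setup_edge = setup_multicycle
    hold_edge = setup_edge - 1 - hold_multicycle
    return setup_edge, hold_edge

print(check_edges())        # default behavior: (1, 0)
print(check_edges(5))       # setup = 5 only: (5, 4), hold check on the wrong cycle
print(check_edges(5, 4))    # setup = 5, hold = 4: (5, 0)
```

With setup = 5 alone, the hold check follows the setup check to cycle 4; adding hold = 4 returns it to cycle 0, where the data is actually launched.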
Multicycle Path: Effects of PVT
- X_FAST shows the data valid window for the same system with increased voltage
- The data settles to its final value quicker because the entire system runs quicker
- The data valid window for setup gets wider
- The data valid window for hold moves earlier (to the left)
- Similar effects can be seen for decreasing temperature
[Figure: same circuit; waveforms for CLK1 (edges 0 to 5), X, X_FAST, and EN, with the setup and hold checks marked]
Multicycle Path: Definition
When specifying a multicycle path:
- From, Through, and To define the path
- Value is the number of clock cycles, N
- Analysis type is set to either setup or hold, depending on which check you want to edit
- Reference clock, either Start (launch clock) or End (latch clock), describes how the check points for setup and hold timing are moved backwards or forwards
[Table: direction the check would be moved given the settings]
Multicycle Path: Test Your Knowledge
Which of the two designs requires a multicycle path?
- Design 1: a 50 MHz register feeds a 100 MHz register
- Design 2: a 100 MHz register feeds a 50 MHz register
[Figure: 50 MHz and 100 MHz clock waveforms, edges 0 to 5]
Multicycle Path: Solution, Design 1
- 100 MHz on the receiving flip-flop
- This requires a setup MCP of 2 and a hold MCP of 1 to shift the checks back to the proper cycle
[Figure: 50 MHz and 100 MHz clock waveforms, edges 0 to 5, with signal X]
Multicycle Path: Solution, Design 2
- The receiving register is half as fast and will only record data on every other edge of the sending clock
- No MCP required: the default checks are in the proper cycle
[Figure: 100 MHz and 50 MHz clock waveforms, edges 0 to 5, with signal X]
Clock Effects: Clock Latency
There are two types of clock latency:
- Network latency: the delay between the clock definition and register clock pins
- Source latency: the delay between the clock definition and its source
The two sources of latency can make I/O timing difficult to meet. Use a PLL inside the FPGA to remove the network latency component (derive_pll_clocks command in TimeQuest).
[Figure: clock source with source latency and network latency; with a PLL, the network latency is removed]
Clock Effects: Clock Uncertainty
Ideally, the clock signal arrives at every clock pin at the same instant (assuming a perfectly balanced clock). This is not the case. Reasons for this include:
- Skew: the difference in arrival time of a clock signal at different components in the design
- Jitter: deviations from the exact defined period
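Clock Effects: Clock Pessimism
Clock pessimism is the extra apparent clock skew introduced because the source and destination clock paths are reported with different types of delay (one with maximum delays, the other with minimum delays), even on the circuitry they share. The shared buffers are the same physical buffers, so they must contribute the same delay to both paths.
[Figure: clock source and PLL feeding both registers through common buffers]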
Timing Closure: Fixing Setup Violations
- A setup violation occurs when the data arrives too late for the destination register's clock speed
- A setup violation is much harder to fix than a hold violation
Methods:
- Tell the fitter to try harder, or confine logic to a smaller area (LogicLock)
- Rewrite code to remove logic stages
- Add pipelining
- Slow down the clock (undesirable)
Timing Closure: Fixing Hold Violations
- A hold violation occurs when the data path is fast and the clock to the latching register has latency, so new data arrives before the hold time has elapsed
- To solve this, introduce delay into the data path by adding buffers or pairs of inverters
Methods:
- Quartus will attempt to fix hold violations automatically by adding delay in the data path
- On some occasions, fixing hold breaks setup time
Summary
- Don't take timing analysis for granted: it can be very expensive if not dealt with in the design stage
- Understand scenarios and think about your constraints
- Many scenarios are not discussed here, such as double data rate and source-synchronous I/O
- This class should help you understand digital electronics timing fundamentals so you can design reliable FPGAs and ASICs