1 Performance Analysis (Clock Signal)
2 Unbalanced delays Logic with unbalanced delays leads to inefficient use of logic. [Figure: long clock period vs. short clock period]
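A rough illustration of why balance matters. The sketch assumes a two-stage design in which a single clock period must cover the slowest stage; the stage delays (in ns) and the helper names are made up for illustration and do not come from the slides.

```python
def clock_period(stage_delays):
    """One clock drives every stage, so the slowest stage sets the period."""
    return max(stage_delays)


def idle_fraction(stage_delays):
    """Fraction of the available stage-time spent waiting for the clock edge."""
    period = clock_period(stage_delays)
    available = period * len(stage_delays)
    return (available - sum(stage_delays)) / available


# Hypothetical delays in ns: the same total logic, split two different ways.
unbalanced = [8.0, 2.0]
balanced = [5.0, 5.0]

print(clock_period(unbalanced), idle_fraction(unbalanced))
print(clock_period(balanced), idle_fraction(balanced))
```

With these numbers the unbalanced split needs the longer period and leaves part of the logic idle in every cycle, while the balanced split uses the whole period in both stages.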
3 Flip-flop-based system performance analysis
4 Flip-flop-based system model Clock signal is perfect (no rise/fall), with period P. The clock event is on the rising edge. Setup time s: time from arrival of the combinational logic event to the clock event. Propagation time p: time for a value to go from flip-flop input to output (t_co?). Worst-case combinational delay C: time from the output of one flip-flop to the input of the next.
5 Clock period constraint $P \geq p + C + s$. [Figure: diagram labeling s, p, and C]
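The constraint can be turned into a quick calculation. This is a minimal sketch: the function names and the numeric values for p, C, and s (in ns) are assumptions chosen for illustration, not figures from the slides.

```python
def min_clock_period(p, C, s):
    """Smallest period satisfying P >= p + C + s (all times in the same unit)."""
    return p + C + s


def max_frequency_mhz(period_ns):
    """Clock frequency in MHz for a period given in ns."""
    return 1000.0 / period_ns


# Hypothetical values in ns.
p, C, s = 0.5, 4.0, 0.5
period = min_clock_period(p, C, s)
print(period, max_frequency_mhz(period))
```

Here C is the worst-case combinational delay, so when a design has several register-to-register paths the largest C is the one to pass in.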
6 Clock parameters
7 Clock with rise/fall The rise time $t_r$ is large because the clock wire is long and has high capacitance.
8 Rise/fall clock period constraint $P \geq t_r + p + C + s$. [Figure: diagram labeling s, p, C, and $t_r$]
9 Skew Skew: relative delay between events. Clock skew: can harm any sequential system.
10 Clock skew Clock must arrive at all memory elements in time to load data.
11 Clock skew in system [Figure: two D flip-flops (D, Q) with logic between them]
12 Clock skew and qualified (gated) clocks
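13 Clock skew analysis model $s_{12} = \delta_1 - \delta_2$, $s_{21} = \delta_2 - \delta_1$, where $\delta_i$ is the delay of the clock $\phi$ to flip-flop $i$. Assume $\delta_1 > \delta_2$ ($s_{12} > 0$).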
14 Skew and clock period Assume that each flip-flop operates instantaneously. If the clock arrives at FF1 after FF2, then there is less time for the signal to propagate through the combinational logic. Given the clock period, determine the allowable skew: $P \geq \Delta_2 + s_{12}$, where $\Delta_2$ is the combinational logic delay into FF2.
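The relationship between skew and clock period can be sketched numerically. The sketch keeps the slide's assumption that the flip-flops are instantaneous, and further assumes that the term added to the skew in the constraint is the combinational logic delay between FF1 and FF2. All names and values (in ps) are hypothetical.

```python
def skew(arrival_ff1, arrival_ff2):
    """s12: clock arrival time at FF1 minus clock arrival time at FF2."""
    return arrival_ff1 - arrival_ff2


def max_allowable_skew(period, logic_delay):
    """Largest s12 for which period >= logic_delay + s12 still holds."""
    return period - logic_delay


def meets_period(period, logic_delay, s12):
    """True when the data launched by FF1 reaches FF2 within one period."""
    return period >= logic_delay + s12


# Hypothetical values in ps: the clock reaches FF1 20 ps later than FF2.
s12 = skew(30, 10)
print(s12)
print(max_allowable_skew(1000, 950))
print(meets_period(1000, 950, s12))
print(meets_period(1000, 990, s12))
```

A positive s12 eats into the time available to the logic, so the second check fails even though the logic delay alone fits within the period.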
15 Clock distribution Often one of the hardest problems in clock design. –Fast edges. –Minimum skew.
16 Clock skew example [Figure: D flip-flops with clock delays labeled 10 ps, 20 ps, and 30 ps]
17 Clock H-Tree
18 Digital Clock Manager (DCM) A hard block in FPGAs that takes a clock input and generates daughter clocks. An FPGA can have multiple DCMs. A DCM provides clocks to internal circuitry and to external devices on the board.
19 DCM
20 DCM Functions Jitter removal
21 DCM Functions Frequency synthesis: the clock frequency generated outside the FPGA often differs from the frequency needed inside it, so the DCM multiplies/divides it to generate daughter clocks. It can even generate other ratios (3/4, 4/5 of the original). In Spartan 6: n times or n/2 times (n ∈ [1,16]).
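To make the ratio rule concrete, the sketch below enumerates the output frequencies reachable under one reading of the slide: the input clock is scaled by n or by n/2 for n from 1 to 16. That reading, the function names, and the 50 MHz / 125 MHz figures are assumptions for illustration; the device data sheet remains the authority on the real limits.

```python
from fractions import Fraction


def slide_ratios(n_max=16):
    """Ratios n and n/2 for n in 1..n_max, following the slide's wording."""
    ratios = set()
    for n in range(1, n_max + 1):
        ratios.add(Fraction(n))
        ratios.add(Fraction(n, 2))
    return sorted(ratios)


def closest_output(f_in_mhz, f_target_mhz, ratios):
    """Pick the ratio whose output frequency is nearest to the target."""
    best = min(ratios, key=lambda r: abs(f_in_mhz * r - f_target_mhz))
    return best, float(f_in_mhz * best)


ratios = slide_ratios()
print(len(ratios))
print(closest_output(50, 125, ratios))
```

Using exact fractions avoids floating-point surprises when comparing candidate ratios.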
22 DCM Functions Phase shifting: some designs need phase-shifted clocks. Some DCMs allow common values: 120 and 240 degrees for a 3-phase clocking scheme; 90, 180, and 270 degrees for a 4-phase clocking scheme. Some DCMs allow exact values to be set.
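A phase shift in degrees corresponds to a time delay that depends on the clock period. The sketch below converts between the two; the function name and the 100 MHz clock are assumptions for illustration.

```python
def phase_to_delay_ns(phase_deg, freq_mhz):
    """Delay in ns equivalent to a phase shift of phase_deg at freq_mhz."""
    period_ns = 1000.0 / freq_mhz
    return (phase_deg % 360) / 360.0 * period_ns


# Hypothetical 100 MHz clock (10 ns period), 4-phase scheme.
for phase in (90, 180, 270):
    print(phase, phase_to_delay_ns(phase, 100.0))
```

The same phase setting therefore gives a different absolute delay whenever the clock frequency changes.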
23 DCM Functions Clock deskewing: the DCM gets a special input, compares the two signals, and adds additional delay to the daughter clock to align it with the main clock. Two types: PLL (phase-locked loop), analogue; DLL (delay-locked loop), digital.
24 DCM Functions Auto skew correction:
25 DCM You can enter DCM parameters in the tools: frequency, duty cycle, phase shift, … See Xilinx ISE In-Depth Tutorial, UG695 (v 14.1).
26 Case Study 16 x 16 multiplier example.
27 The FPGA design process Xilinx ISE (Integrated Synthesis Environment) –Translation from HDL –Logic synthesis –Placement and routing –Configuration generation
28 Design experiments Synthesize with no constraints. Synthesize with a timing constraint, then tighten it. Synthesize with placement constraints. Power: many tools don’t allow us to specify power consumption directly; some tools allow us to specify power as an objective; we may need to rewrite our hardware description for better power consumption characteristics.
29 Commercial Tools The XST “-power” option reduces dynamic power consumption. The Xilinx MAP and PAR “-power” option reduces dynamic power, but increases runtime and decreases design performance. Quartus-II has Power-Driven Synthesis and Place & Route.
30 Post-translation simulation model No timing or area constraints HDL model in terms of FPGA primitives. Example: X_LUT4 \p12_Madd__n0015_Mxor_Result_Xo 1 (.ADR0(x_7_IBUF),.ADR1(y_13_IBUF),.ADR2(c12[7]),.ADR3(row12[8]),.O(row13[7]) );
31 Mapping report
Design Summary
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 501 out of 1,024 48%
Logic Distribution:
  Number of occupied Slices: 255 out of %
  Number of Slices containing only related logic: 255 out of %
  Number of Slices containing unrelated logic: 0 out of 255 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 501 out of 1,024 48%
Number of bonded IOBs: 64 out of 92 69%
Total equivalent gate count for design: 3,006
Additional JTAG gate count for IOBs: 3,072
Peak Memory Usage: 64 MB
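The utilization percentages in the report can be reproduced from the "used out of available" counts. The sketch assumes the tool truncates to a whole percent rather than rounding, which is consistent with the LUT and IOB figures quoted above; the function name is hypothetical.

```python
def utilization_percent(used, available):
    """Whole-number utilization, truncated (assumed to match the report)."""
    return used * 100 // available


print(utilization_percent(501, 1024))  # 4-input LUTs
print(utilization_percent(64, 92))     # bonded IOBs
print(utilization_percent(0, 255))     # slices containing unrelated logic
```

Rounding to the nearest percent would give 49% for the LUTs, so truncation is the reading that fits the report.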
32 Static timing analysis report Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" uS ; items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors) Maximum delay is ns After Mapping: estimated delays (no information about interconnects)
33 Static timing report: delays along paths
Data Sheet report: All values displayed in nanoseconds (ns)
Pad to Pad
Source Pad | Destination Pad | Delay |
x | p | 5.824 |
x | p | |
x | p | |
x | p | |
34 Routing report
Phase 1: 1975 unrouted; REAL time: 11 secs
Phase 2: 1975 unrouted; REAL time: 11 secs
Phase 3: 619 unrouted; REAL time: 12 secs
Phase 4: 619 unrouted; (0) REAL time: 12 secs
Phase 5: 619 unrouted; (0) REAL time: 12 secs
Phase 6: 619 unrouted; (0) REAL time: 12 secs
Phase 7: 0 unrouted; (0) REAL time: 12 secs
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
Note: REAL time is the routing algorithm's run time.
35 Static timing after routing Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" uS ; items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors) Maximum delay is ns (vs ns in mapping report). The difference is due to interconnect delays.
36 Timing constraint Use timing constraint editor:
37 Post-map static timing report Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ; items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors) Maximum delay is ns. The pad-to-pad delay hasn’t changed, since this design has limited opportunities for logic synthesis to change delays by restructuring logic.
38 Post-routing static timing report Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ; items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors) Maximum delay is ns. Tools generally try to meet the delay goal as closely as possible to minimize area.
39 Tighter timing constraints Tighten requirement to 25 ns. Post-place-route timing report: Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 25 nS ; items analyzed, 11 timing errors detected. (11 setup errors, 0 hold errors) Maximum delay is ns.
40 Report on a violated path Slack: ns (requirement - data path) Source: y (PAD) Destination: p (PAD) Requirement: ns Data Path Delay: ns (Levels of Logic = 31) Modify the logic and/or physical design to improve the delay.
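The slack line in the report follows directly from the formula it quotes (requirement minus data path). In the sketch below the 25 ns requirement is the tightened constraint from the preceding experiment, while the data path delays and the function names are hypothetical, since the report's own figures are not available here.

```python
def slack_ns(requirement_ns, data_path_ns):
    """Slack as defined in the report: requirement minus data path delay."""
    return requirement_ns - data_path_ns


def is_violated(requirement_ns, data_path_ns):
    """A path violates timing when its slack is negative."""
    return slack_ns(requirement_ns, data_path_ns) < 0


# 25 ns requirement; hypothetical data path delays in ns.
print(slack_ns(25.0, 27.5), is_violated(25.0, 27.5))
print(slack_ns(25.0, 24.0), is_violated(25.0, 24.0))
```

A negative slack gives the amount of delay that must be removed from the path, through logic or physical design changes, before the constraint is met.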
41 Power report Power summary: I(mA) P(mW) Total estimated power consumption: Vccint 1.50V: 0 0 Vccaux 3.30V: Vcco V: Inputs: 0 0 Logic: 0 0 Outputs: Vcco Signals: Quiescent Vccaux 3.30V: Quiescent Vcco V: 1 3 Thermal summary: Estimated junction temperature: 36C Ambient temp: 25C Case temp: 35C Theta J-A: 34C/W Helps us determine whether we need additional cooling.
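The thermal summary can be related to power with the usual steady-state model, junction temperature = ambient temperature + Theta J-A × power. The sketch below applies that model to the temperatures quoted in the report. The model is an assumption about how the tool derives its estimate, and the function names and the 0.5 W example are hypothetical.

```python
def junction_temp_c(ambient_c, theta_ja_c_per_w, power_w):
    """Steady-state junction temperature for a given dissipated power."""
    return ambient_c + theta_ja_c_per_w * power_w


def implied_power_w(junction_c, ambient_c, theta_ja_c_per_w):
    """Power that the same model implies for a given junction temperature."""
    return (junction_c - ambient_c) / theta_ja_c_per_w


# Values from the thermal summary: 36C junction, 25C ambient, 34C/W.
print(round(implied_power_w(36.0, 25.0, 34.0), 3))

# Hypothetical design dissipating 0.5 W in the same package.
print(junction_temp_c(25.0, 34.0, 0.5))
```

Because the reported temperatures are rounded, the implied power is only approximate; the point is that a higher estimated power raises the junction temperature proportionally, which is what decides whether extra cooling is needed.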
42 Improving area Floorplanner window: Floorplanner, View/Edit Placed Design. [Figure labels: LEs, chip floorplan] Green rectangles: components mapped to CLBs.
43 Rat’s nest wiring If you click on a component in the design hierarchy window, its rat’s nest is shown.
44 Routing editor view FPGA Editor View/Edit Routed Design
45 Editing constraints Use the constraints editor to place constraints. This tool allows you to constrain: 1. placement of logic; 2. assignment of chip I/Os to IOBs (e.g., useful for PCB design).
46 Design browser pane
47 Drag and drop constraints
48 Change the shape of constraints
49 Full set of placement constraints We place the rows of the multiplier one below the other to create the row structure of the floorplan.
50 Placement results
51 New timing report After placement constraints: items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors) Maximum delay is ns. Compares to ns for unconstrained placement.