Traditional SOC Design Flow
Key problem: The timing assumed during pre-layout synthesis differs widely from the post-layout reality. This happens because interconnect delay dominates the overall propagation delay in DSM (Deep Sub-Micron) technologies. As a result, achieving timing closure becomes a challenge.
Source: Advanced ASIC Chip Synthesis. 2nd Ed. Himanshu Bhatnagar. Kluwer Academic Publishers
Develop HDL files
Specify Libraries
  Library objects: link_library, target_library, symbol_library, synthetic_library
Read Design
  analyze, elaborate, read_file
Define Design Environment
  set_operating_conditions, set_wire_load_model, set_drive, set_driving_cell, set_load, set_fanout_load, set_min_library
Set Design Constraints
  Design rule constraints: set_max_transition, set_max_fanout, set_max_capacitance
  Design optimisation constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area
Select Compile Strategy
  Top down, bottom up
Optimize the Design
  compile
Analyze and Resolve Design Problems
  check_design, report_area, report_constraint, report_timing
Save the Design Database
  write
Design Compiler Setup Files
.synopsys_dc.setup holds library paths; company-wide and project-wide design environment variables and commands; and UNIX variables.
There are three such files at three locations. All three are read, in the following order:
  Synopsys root ($SYNOPSYS/admin/setup): Affects all users. Only the system administrator can modify this file. In a small startup with a single ASIC project, this is the place to enforce project-wide discipline.
  Home directory: Its content affects all DC activities. Project-wide enforcement could happen at this level if the designer is involved in a single project (less likely).
  Working directory: Affects the current invocation of DC. If a person is working on more than one Synopsys project (more likely), project-wide enforcement should happen at this level, with one working directory for each project.
Repeated commands are overridden by the file that is read later.
Libraries & Search Path
Technology Library: Created by the ASIC vendor in Synopsys format, which is now an open standard. Cells are defined by their names, function, timing, net delay, parasitic information, units for time, resistance, capacitance etc.
Target Library: The technology library that Design Compiler maps to during optimization.
Link Library: The technology library that contains the definition of the cells used in the mapped design. In principle it should be the same as target_library unless a technology translation is being performed.
Symbol Library: Definition of graphic symbols. Cells in Symbol Library must match
DesignWare Library: A DesignWare component library is a collection of reusable circuit-design building blocks that are tightly integrated into the Synopsys synthesis environment.
GTECH Library: The GTECH library is the Synopsys generic technology library. It is technology-independent and included with the Design Compiler software. GTECH parts are Synopsys unmapped representations of Boolean functions (library cell placeholders). GTECH instantiation allows for a technology-independent HDL description and the accuracy of instantiation.
search_path: If the library variables only specify file names, search_path is used to locate the libraries. By default it points to the current working directory and $SYNOPSYS/libraries/syn.
Synopsys Design Objects
Design: A circuit that performs one or more logical functions
Cell: An instance of a design or library primitive within a design
Reference: The name of the original design that a cell instance points to
Port: The input or output of a design
Pin: The input or output of a cell
Net: A wire that connects ports to ports or ports to pins
Clock: A timing reference object that describes a waveform for timing analysis
Synopsys Design Objects - Schematic
Synopsys Design Objects - VHDL
Synopsys Design Objects - VHDL
Reading Assignment
Read about these commands in the Synopsys documentation: find and filter; read / analyze / elaborate; compile; report_timing.
Also read about what attributes and variables are.
Outline of this course module Synopsys Design Environment Essentials CMOS essentials for logic synthesis Constraint Classification Load and Drive Constraints Clocking constraints Operating Conditions Constraints Static Timing Analysis Chip Level Timing and Multiple Clock Domains
MOSFET Transistor Source: MIT. Course 6.375. Lecture L06. 2006
Key qualitative Characteristics of MOSFET transistors Source: MIT. Course 6.375. Lecture L06. 2006
RC Model of an inverter Source: MIT. Course 6.375. Lecture L06. 2006
Wires Source: MIT. Course 6.375. Lecture L06. 2006
Distributed RC wire model This is also known as Elmore Delay model Source: MIT. Course 6.375. Lecture L06. 2006
Manual insertion of Repeaters Source: MIT. Course 6.375. Lecture L06. 2006
Lumped RC wire model Source: MIT. Course 6.375. Lecture L06. 2006
Estimate the rise time Source: MIT. Course 6.375. Lecture L06. 2006
The factor 2.2 comes from the 10% to 90% Vdd swing: loge(0.9 Vdd / 0.1 Vdd) = loge(9) ≈ 2.2.
The width of a transistor is found by multiplying its scaling factor (16/8/2/1) by the minimum transistor width, which is 0.5 µm.
Multiply Cg,N / Cg,P / Cd,N / Cd,P by the width of the transistor to get the gate and drain capacitances of the N and P transistors. A wider transistor has more capacitance.
Divide Reff,N / Reff,P by the width of the transistor to get the resistance of the N and P transistors. A wider transistor has less resistance.
The sheet resistance (0.07) is for a unit square. Since the wire width is 0.25 µm, the resistance of a 1 µm × 0.25 µm piece of wire is 0.07/0.25. This factor is multiplied by the length, 250 µm.
The wire capacitance is made up of two parts. The bottom (area) capacitance is found as 250 × 0.25 (area) × CA,M2. The side capacitance is found by multiplying the length, 250, by CL,M32.
Source: MIT. Course 6.375. Lecture L06. 2006
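The arithmetic in these notes can be sketched as a short calculation. This is a minimal sketch, assuming a single lumped RC stage. The sheet resistance (0.07), wire width (0.25) and wire length (250) are the values quoted in the notes; the two capacitance coefficients C_AREA and C_SIDE are hypothetical placeholders, not the lecture's values, and the function names are illustrative only.

```python
import math

# ln(0.9 / 0.1): time for an RC step response to go from 10% to 90% of Vdd, in units of RC
FACTOR_10_90 = math.log(0.9 / 0.1)

def wire_resistance(sheet_res, length, width):
    """Sheet resistance (ohm per square) times the number of squares (length / width)."""
    return sheet_res * (length / width)

def wire_capacitance(length, width, c_area, c_side):
    """Bottom (area) capacitance plus side capacitance along the wire length."""
    return length * width * c_area + length * c_side

def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of a lumped RC stage, about 2.2 * R * C."""
    return FACTOR_10_90 * r_ohm * c_farad

# Values quoted in the notes: 0.07 ohm per square, length 250, width 0.25 (same length unit)
r_wire = wire_resistance(0.07, 250.0, 0.25)

# Hypothetical capacitance coefficients, for illustration only
C_AREA = 0.03e-15   # farad per unit area
C_SIDE = 0.04e-15   # farad per unit length
c_wire = wire_capacitance(250.0, 0.25, C_AREA, C_SIDE)

t_rise = rise_time_10_90(r_wire, c_wire)
print(f"R = {r_wire:.1f} ohm, C = {c_wire:.3e} F, rise time = {t_rise:.3e} s")
```

The wire alone contributes 1000 squares, so its resistance does not depend on which length unit is used as long as length and width share it.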
Constraints Technology, Operating and Manufacturing Constraints Max rise time, max capacitance Operating Conditions – Vdd, Temperature Drive current, Load Process Variations Fast corner, Slow corner Physical Design Antenna rules Optimisation Constraints Performance – clock Area Power
Generic Synthesis Flow Design Create a solution Technology, Operating & Manufacturing Constraints Optimisation Constraints Evaluate the solution Analysis Constraints Met
Static Timing Analysis (STA)
Exhaustively verifies that the timing constraints (clock) are met for a design, for a given technology (standard cell library) and a set of specified operating conditions.
Limitations of the alternative, simulation:
  Not exhaustive
  Accuracy
    RTL
    Gate level: SDF back annotation, dependent on STA
    Circuit level: SPICE simulations are impractical
  Time (STA also takes time, but is bounded)
PROCESS (clk)
BEGIN
  IF rising_edge (clk) THEN
    s <= a * b;
  END IF;
END PROCESS;
Timing Models - Accuracy Untimed Transaction Level - SystemC Multiple Cycles Bus Transactions, Transmit/Receive, Encode/Decode Cycle Accurate – RTL What happens in each clock cycle is accurately known Gate Level – Event Driven Physical details of computation, storage and interconnect operations known Delay in wire is not known Clock is ideal Layout Level Delay in wire known Clock is real Relative position of standard cell is known
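Delay Parameters – Intrinsic Delay & Slew
[Figure: input B and output Z waveforms; the 0.5 Vdd points are marked for the delay measurement and the 0.3 Vdd and 0.7 Vdd points for the slew measurement, each between times t1 and t2]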
Path Delay Calculation
The intrinsic delays and the slews are characterised using SPICE simulation by sweeping the many parameters that affect intrinsic delay and slew.
All the paths are exhaustively covered.
Inputs: library and design; environment conditions for analysis.
Delay computation: through gate, through wire.
Delay and slew: at gate output, at next gate input.
[Figure: example path through points A, B, C, D]
Paths & Path Groups Paths Start point: Input ports or clock pins of sequential devices and End point: Output ports or Data input pins of sequential devices. Path groups Paths are organised in groups identified by clocks controlling their endpoints.
Timing Arcs
Positive unate timing arc: Combines rise delays with rise delays, and fall delays with fall delays. An example is an AND gate cell delay or an interconnect (net) delay.
Negative unate timing arc: Combines incoming rise delays with local fall delays, and incoming fall delays with local rise delays. An example is a NAND gate.
Nonunate timing arc: Combines local delay with the worst-case incoming delay value. Nonunate timing arcs are present in logic functions whose output value change cannot be predicted from the direction of the change on the input value. An example is an XOR gate.
Accuracy of estimates is critical:
  Intrinsic delays are accurate after logic synthesis.
  Slew and net delays are estimated, and are known accurately only after physical synthesis.
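The way the three arc types combine incoming and local delays can be sketched as follows. This is a minimal sketch, assuming each arc has one local delay for a rising output and one for a falling output; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def propagate(arc_type, in_rise, in_fall, d_rise, d_fall):
    """Return (out_rise, out_fall) arrival times at the output of a timing arc.

    in_rise / in_fall: arrival times of the rising / falling edge at the arc input.
    d_rise / d_fall:   local arc delay when the OUTPUT rises / falls.
    """
    if arc_type == "positive_unate":
        # Input rise causes output rise, input fall causes output fall (e.g. AND, net)
        return in_rise + d_rise, in_fall + d_fall
    if arc_type == "negative_unate":
        # Input fall causes output rise, input rise causes output fall (e.g. NAND)
        return in_fall + d_rise, in_rise + d_fall
    if arc_type == "non_unate":
        # Output direction is not predictable, so use the worst incoming arrival (e.g. XOR)
        worst_in = max(in_rise, in_fall)
        return worst_in + d_rise, worst_in + d_fall
    raise ValueError(f"unknown arc type: {arc_type}")

# Hypothetical arrival times and delays, in ns
pos = propagate("positive_unate", 1.0, 1.2, 0.3, 0.2)
neg = propagate("negative_unate", 1.0, 1.2, 0.3, 0.2)
non = propagate("non_unate", 1.0, 1.2, 0.3, 0.2)
print(pos, neg, non)
```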
Factors Affecting Delay and Slew
Discrete factors: geometry and dimension; specific path; transition direction; related pin.
[Figure: 4 input NAND gate, with labels B, P1, P2, N1, N2 and Z]
Factors Affecting Delay and Slew Load on the Gate Load of all the inputs that this output has to drive Load of the interconnect wires Tri-stated wires Input Slew Transition time at the previous gate The interconnect Primary input – drive strength, driver cell
Constraints
Technology Constraints (design rule constraints): Max Transition, Max Fanout, Max Capacitance, Min Capacitance
Design Constraints: Set Load, Set Drive (inverse of resistance)
Technology constraint: cannot be relaxed.
Design constraint: set_load; set_drive or set_driving_cell.
If load is not specified, the synthesis tool assumes zero load.
If drive or driving cell is not specified, the synthesis tool assumes infinite drive strength.
[Figure: block with input A and outputs Z1, Z2, Z3]
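Interpolation and Extrapolation
Piecewise linear model.
[Figure: delay table indexed by slew (S1, S2) and load (L1, L2) with entries D11, D12, D21, D22; the delay D at a point (S, L) is obtained through the intermediate values D1 and D2]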
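The piecewise linear lookup suggested by the figure labels can be sketched as a two-step linear interpolation. This is a minimal sketch, assuming Dij is the characterised delay at slew Si and load Lj, that D1 and D2 are the values interpolated along the load axis at S1 and S2, and that D is then interpolated between them along the slew axis. The table values are hypothetical.

```python
def interpolate_delay(s, l, s1, s2, l1, l2, d11, d12, d21, d22):
    """Delay at (slew s, load l) from four characterised points.

    Assumption: dij is the delay at slew si and load lj.
    The same formula extrapolates when (s, l) lies outside the table range.
    """
    load_frac = (l - l1) / (l2 - l1)
    slew_frac = (s - s1) / (s2 - s1)
    d1 = d11 + (d12 - d11) * load_frac   # along the load axis at slew s1
    d2 = d21 + (d22 - d21) * load_frac   # along the load axis at slew s2
    return d1 + (d2 - d1) * slew_frac    # along the slew axis

# Hypothetical table: slew in ns, load in pF, delay in ns
S1, S2, L1, L2 = 0.1, 0.5, 0.01, 0.05
D11, D12, D21, D22 = 0.10, 0.20, 0.15, 0.30

d_mid = interpolate_delay(0.3, 0.03, S1, S2, L1, L2, D11, D12, D21, D22)
d_out = interpolate_delay(0.3, 0.07, S1, S2, L1, L2, D11, D12, D21, D22)  # load beyond L2
print(d_mid, d_out)
```

Extrapolation beyond the characterised range uses the slope of the nearest table segment, which is why it is generally less trustworthy than interpolation.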
Process, Voltage, Temperature (PVT) Variation & Operating Conditions
[Figure: delay (best, nominal, worst) as a function of voltage and temperature]
Operating Conditions
Name   Library  Process  Temp  Volt  Interconnect Model
WCCOM  my_lib   1.50     70    1.1   worst_case_tree
WCIND  my_lib   1.50     80    1.1   worst_case_tree
WCMIL  my_lib   1.50     125   1.0   worst_case_tree
BCCOM  my_lib   1.50     0     1.2   best_case_tree
BCIND  my_lib   1.50     -40   1.2   best_case_tree
BCMIL  my_lib   1.50     -55   1.3   best_case_tree
PVT Variation: An Example
Consider a minimum size NMOS device in a 1.2 µm CMOS process, with VGS = VDS = 5 V.
Take the nominal saturation current for the device size W = 1.8 µm, Leff = 0.9 µm.
Now consider the variation in the following parameters:
  25% variation in threshold voltage Vt
  10% variation in transconductance k'n, mainly due to variation in oxide thickness
  ±0.15 µm (about 10%) variation in W and L; the variations in W and L are uncorrelated
  ±0.5 V (10%) variation in power supply voltage
The speed of a device is proportional to its drain current, so these variations can result in variation of the speed of the circuit.
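The effect of these variations on the drain current can be sketched with the first-order square-law model. This is a minimal sketch under stated assumptions: the square-law saturation current without channel-length modulation, and hypothetical nominal values K_NOM and VT_NOM that are not given on the slide. Only W, Leff, the 5 V bias and the variation percentages come from the example.

```python
def saturation_current(k_prime, w, l, v_gs, v_t):
    """First-order square-law saturation current: (k'/2) * (W/L) * (VGS - Vt)^2."""
    return 0.5 * k_prime * (w / l) * (v_gs - v_t) ** 2

# Hypothetical nominal process values (assumptions, not from the slide)
K_NOM = 20e-6    # transconductance k'n in A/V^2
VT_NOM = 0.75    # threshold voltage in V

# Values from the example: W = 1.8, Leff = 0.9 (same length unit), VGS = 5 V
W_NOM, L_NOM, V_NOM = 1.8, 0.9, 5.0

i_nominal = saturation_current(K_NOM, W_NOM, L_NOM, V_NOM, VT_NOM)

# Fast corner: every parameter moved in the direction that increases current
i_fast = saturation_current(K_NOM * 1.10, W_NOM + 0.15, L_NOM - 0.15,
                            V_NOM + 0.5, VT_NOM * 0.75)

# Slow corner: every parameter moved in the direction that decreases current
i_slow = saturation_current(K_NOM * 0.90, W_NOM - 0.15, L_NOM + 0.15,
                            V_NOM - 0.5, VT_NOM * 1.25)

print(f"slow {i_slow * 1e6:.1f} uA, nominal {i_nominal * 1e6:.1f} uA, fast {i_fast * 1e6:.1f} uA")
```

Stacking all variations in the same direction, as done here, gives the extreme corners; since the parameters are largely independent, such corners are pessimistic bounds rather than typical outcomes.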
Derating Libraries are characterized for various operating conditions Further characterisation is done to see how the delay model responds to change in process, voltage and temperature. This is done by holding two parameters constant and sweeping the third. This yields derating factors for Process, Voltage and Temperature
Sequential Arcs
A timing relationship between two input pins, or between two consecutive events on the same input pin:
  Pulse Width
  Setup
  Hold
  Recovery
  Removal
Pulse Width
Width of the high and low phases of clocks.
Width of the active level of asynchronous inputs like reset.
[Figure: rst_n pulse against the pulse width requirement. Not met: reset may have no effect]
Setup
Data should be stable for the setup time before the arrival of the clock edge.
What happens if the setup time is violated?
[Figure: clk and data waveforms against the setup requirement. Not met: new data may not get latched]
Hold
Data should be stable for the hold time after the arrival of the clock edge.
What happens if the hold time is violated?
[Figure: clk and data waveforms against the hold requirement. Not met: old data may not get latched]
Recovery and Removal
Recovery: Minimum time between de-assertion of an asynchronous control signal and the next active clock edge. Can be formulated as a setup check.
Removal: Minimum time after an active clock edge for which an asynchronous control signal should remain asserted. Can be formulated as a hold check.
[Figure: rst_n and clk waveforms. Recovery requirement not met: clk may not have effect. Removal requirement not met: clk may override rst_n]
What is the reason for setup and hold
[Figure: two cross-coupled inverters with Vin1 = Vout2 and Vin2 = Vout1, and their transfer curves with the operating points a, b and c]
Transistor Level Schematic of a D-Flop http://www. edn
Working of the D-Flop at Transistor Level http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Setup and Hold Time at Circuit Level The time it takes data D to reach node Z is called the setup time. The time it takes data D to reach node W is called the hold time. http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Negative Hold Time http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Generalizing Setup & Hold Constraints
[Figure: boundary of the flop, containing the latching element F1 with Delay D1 on the data path and Delay C1 on the clk path]
Setup Constraint
  Assume C1 is zero. clk reaches F1 before data has arrived at F1, and F1 registers the wrong data.
  To avoid this, data should stabilize D1 time before the arrival of clk.
  In reality C1 is never zero, so data should stabilize D1 - C1 time before the arrival of clk.
  As there are multiple D1 paths and multiple C1 paths, the complete and safe setup constraint is max (data path delays) – min (clock path delays).
Hold Constraint
  Assume D1 is zero. Data reaches F1 before clk has arrived at F1. When the clk arrives, new data has overwritten the previous data.
  To avoid this, data should remain stable C1 time after the arrival of clk.
  In reality D1 is never zero, so data should remain stable C1 - D1 time after the arrival of clk.
  The complete and safe hold constraint is max (clock path delays) – min (data path delays).
Negative Hold
[Figure: boundary of the flop, containing F1 with Delay D1 on the data path and Delay C1 on the clk path]
Typically clock paths are well buffered and faster.
There can be substantial data path delay, especially in scan flops.
max (data path delays) – min (clock path delays) is always positive. This implies that the setup constraint is never negative.
max (clock path delays) – min (data path delays) can be negative. This implies that the hold constraint can be negative.
Setup + Hold (cannot be negative) = Max(clock path) + Max(data path) – Min(clock path) – Min(data path)
[Figure: negative hold as seen at the device interface; the window in which data must be stable is shown at the device interface and at the latching element]
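The interface-level setup and hold constraints can be sketched directly from the max/min expressions. This is a minimal sketch, assuming the latching element itself needs no setup or hold time of its own (as in the simplified argument on these slides); the path delay values are hypothetical.

```python
def interface_setup(data_delays, clock_delays):
    """Setup constraint at the flop boundary: max(data path) - min(clock path)."""
    return max(data_delays) - min(clock_delays)

def interface_hold(data_delays, clock_delays):
    """Hold constraint at the flop boundary: max(clock path) - min(data path)."""
    return max(clock_delays) - min(data_delays)

# Hypothetical internal path delays in ns: slow data paths, fast buffered clock paths
data_paths = [0.30, 0.45]
clock_paths = [0.10, 0.15]

setup = interface_setup(data_paths, clock_paths)
hold = interface_hold(data_paths, clock_paths)

# The window setup + hold equals the spread of the data paths plus the spread of
# the clock paths, so it cannot be negative even when hold alone is negative.
window = setup + hold
print(f"setup = {setup:.2f} ns, hold = {hold:.2f} ns, window = {window:.2f} ns")
```

With these numbers the hold constraint is negative: data may change at the device pin slightly before the clock edge arrives at the pin, because every data path inside the flop is slower than every clock path.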
Specifying Input Delay
Good design practice mandates that inBlock does not have combinatorial logic ("m") driving its output.
These days "m" is more likely to be the result of global interconnect delay.
Early floorplanning is a good way to estimate the delay due to "m".
If floorplanning is not done, a good bet is 50-60% of the clock cycle.
The characterize command automatically calculates the input delay from the parent design.
set_input_delay -clock Clock 8 "data_in_2"
Specifying Output Delay set_output_delay -clock Clk -max -fall 10 {"Z<0>" "Z<1>"}
General Timing Constraints
[Figure: clk driving flops F1, F2 and F3; combinatorial blocks C0 to C4; ports I2, O1 and O2]
Four kinds of path groups exist:
  Input to Output, e.g. I2 to O2
  Input to Register, e.g. I1 to F1
  Register to Register, e.g. F1 to F2
  Register to Output, e.g. F3 to O1
TI1, TI2 are input delays
DQ1, DQ2 and DQ3 are clk-to-Q delays
S1, S2 and S3 are setup constraints
H1, H2 and H3 are hold constraints
C0-C4 are combinatorial delays
P is the clock period
Input to Output: O2 = TI2 + C4
Input to Register: TI1 + C0 ≤ P – S1 and TI1 + C0 ≥ H1
  Setup Slack: P – S1 – TI1 – C0
  Hold Slack: TI1 + C0 – H1
Register to Register: DQ1 + C1 ≤ P – S2 and DQ1 + C1 ≥ H2
  Setup Slack: P – S2 – DQ1 – C1
  Hold Slack: DQ1 + C1 – H2
Setup and hold slacks should be positive.
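The slack expressions can be sketched as two small functions that apply to both the input-to-register and the register-to-register case. This is a minimal sketch: the "launch delay" is the input delay for a path starting at a port, or the clk-to-Q delay for a path starting at a register, and all numbers are hypothetical.

```python
def setup_slack(period, setup, launch_delay, comb_delay):
    """Setup slack: P - S - (launch delay + combinatorial delay)."""
    return period - setup - launch_delay - comb_delay

def hold_slack(hold, launch_delay, comb_delay):
    """Hold slack: (launch delay + combinatorial delay) - H."""
    return launch_delay + comb_delay - hold

P = 10.0  # clock period in ns (hypothetical)

# Input to register: input delay 6.0 ns, combinatorial delay 2.0 ns, S = 0.5, H = 0.2
in2reg_setup = setup_slack(P, 0.5, 6.0, 2.0)
in2reg_hold = hold_slack(0.2, 6.0, 2.0)

# Register to register: clk-to-Q 0.4 ns, combinatorial delay 8.9 ns, S = 0.5, H = 0.2
reg2reg_setup = setup_slack(P, 0.5, 0.4, 8.9)
reg2reg_hold = hold_slack(0.2, 0.4, 8.9)

for name, slack in [("in2reg setup", in2reg_setup), ("in2reg hold", in2reg_hold),
                    ("reg2reg setup", reg2reg_setup), ("reg2reg hold", reg2reg_hold)]:
    status = "met" if slack >= 0 else "VIOLATED"
    print(f"{name}: slack = {slack:.2f} ns ({status})")
```

A real timing analysis would use the maximum path delay for the setup check and the minimum path delay for the hold check; the sketch uses a single delay per path for brevity.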
Gate Level Simulation Gate Level Design Simulator Timing Analysis Tool Simulation Library Timing Library SDF File
Clock Distribution Source: MIT. Course 6.375. Lecture L06. 2006
Clock Skew Clock Skew in Alpha Processor The basic assumption in synchronous system is that all the sequential elements in the design sample their input at the same time, marked by a clock signal. In reality, the clock signal does not arrive at the sequential elements at the same time. The difference in time between the reference clock signal and the local clock signal at a sequential element is called the clock skew. In fact clock skew would not be a problem if the clock signal was uniformly delayed at all the sequential elements. It is the non-uniform delay of the clock signal that creates the problem. The delay depends on the distance of the sequential element from the clock source and the local load. The primary reason for the delay is the large amount of load seen by the clock signal. The load consists of all the sequential elements in the design and clock net itself which behaves as a distributed RC line (or higher order models ) and can be several cms long in a large chip. The total capacitance of a single clock line easily measures hundreds of pF and can easily reach into nF range. The total clock capacitance of the Alpha processor equals 3.25 nF, which is 40% of the total switching capacitance of the entire chip.
Clock Skew Source: MIT. Course 6.375. Lecture L06. 2006
Clock Jitter Source: MIT. Course 6.375. Lecture L06. 2006
Clock Skew and Sequential Circuit Performance
Each synchronous module is composed of combinational logic CL and a flop, and is characterised by six timing parameters: the min. and max. propagation (pg) delays of the register, tr,min and tr,max, and of the combinational logic, tl,min and tl,max; the propagation delay of the interconnect, ti; and the local clock skew, tf.
The max pg. delay corresponds to the time taken by the slowest output to respond to any transition at the input. This delay constrains the max. allowable clock speed.
The min pg. delay corresponds to the time taken by at least one output to start responding to a transition at the input. This delay is typically much smaller than the max delay and determines the amount of skew a circuit can tolerate before a race condition occurs.
Let tf' and tf'' be the local clock arrival times at the launching and the receiving register, and d = tf'' – tf' the skew between them. If d is greater than tr,min + ti + tl,min, then the inputs at R2 can change before the previous inputs are latched. The two conditions are:
  tf'' ≤ tf' + tr,min + ti + tl,min, or d ≤ tr,min + ti + tl,min
  tf'' + T ≥ tf' + tr,max + ti + tl,max, or T ≥ tr,max + ti + tl,max – d
Positive and Negative Clock Skew
Positive Skew: d > 0: In this case the clock is routed in the same direction as the data, and the first equation needs to be satisfied. Violating it will result in malfunctioning of the circuit. Observe that increasing the clock period does not help. The positive skew actually helps improve the clock speed, as it is a negative term in the constraint on the clock period T.
Negative Skew: d < 0: Negative skew occurs when the data is routed in the direction opposite to the clock signal. The first equation is unconditionally satisfied and the circuit works correctly independent of the skew. Unfortunately, negative skew will limit the clock speed and thus lower the performance, as predicted by the second equation: the skew reduces the time available for computation by |d|.
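The two skew conditions can be sketched numerically, reading the first as d ≤ tr,min + ti + tl,min (no race) and the second as T ≥ tr,max + ti + tl,max – d (minimum clock period). This is a minimal sketch; the function names and all delay values are hypothetical.

```python
def race_free(d, tr_min, ti, tl_min):
    """First condition: the skew d must not exceed the fastest path through the stage."""
    return d <= tr_min + ti + tl_min

def min_clock_period(d, tr_max, ti, tl_max):
    """Second condition: the smallest clock period T allowed for a given skew d."""
    return tr_max + ti + tl_max - d

# Hypothetical delays in ns
TR_MIN, TR_MAX = 0.2, 0.4   # register propagation delay
TI = 0.1                    # interconnect delay
TL_MIN, TL_MAX = 0.3, 3.0   # combinational logic delay

for d in (0.5, 0.7, -0.5):
    ok = race_free(d, TR_MIN, TI, TL_MIN)
    period = min_clock_period(d, TR_MAX, TI, TL_MAX)
    print(f"d = {d:+.1f} ns: race free = {ok}, minimum T = {period:.1f} ns")
```

With these numbers a positive skew of 0.5 ns shortens the minimum period but leaves only a small margin against the race condition; a skew of 0.7 ns breaks the circuit regardless of the period; a negative skew of 0.5 ns is always race free but lengthens the minimum period.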
[Figure: launch clock and capture clock waveforms with data values a, b, c, d. Setup time met, hold time met]
[Figure: launch clock and capture clock waveforms with data values a, b, c, d and a', b'. Setup time violated, hold time violated]
[Figure: launch clock and capture clock waveforms with data values a, b, c, d. Setup time violated, hold time met]
Setup violations result from worst case timing.
Hold violations result from best case timing.
[Figure: FF 1 (startpoint) driving logic into FF 2 (endpoint), with the setup and hold relationships marked]
Chip Level Timing Issues
[Figure: two chip floorplans, each with a CGU and blocks 1 to 8]
Blocks 4 & 8 communicate and need their clocks to be skew aligned.
The data signals between Blocks 4 & 8 could take more than one clock cycle and can get routed through blocks 5 and 6.
This makes chip level timing closure difficult and sensitive to geometry.
A hierarchical design style is preferable, where each chiplet is timing closed independently and the chip is composed from such chiplets.
Solution: Latency insensitive design.
Categories of Synchronization
Clock based: GS; GALS; GRLS (KTH Technology)
Data based: Double Latch; Handshake (2 phase, 4 phase); Asynchronous 2 Clock FIFO
[Figure: data based versus clock based synchronization compared on ambiguity, latency, constraints and complexity]
The techniques for synchronization can be divided into two broad categories, one based on clock and the other based on data. On the clock based side we have the Globally Synchronous (GS) style which, in spite of all the problems and criticism, is the most deeply entrenched, and it is unlikely that we will completely abandon it anytime soon. The Globally Asynchronous and Locally Synchronous (GALS) style has become the most talked about synchronisation style, and in its purest form involves some fancy techniques for stretching the clock until the data is safely exchanged. As it involves some non-standard design techniques, the purest form of GALS is yet to become mainstream, but there are other GALS styles that do the same thing in spirit while using a standard tool flow, and this is what I would like to emphasize in the next few slides. On the data side, we have the ever popular double latching which, once again in spite of its weaknesses, is going to remain at the heart of most clock domain crossing techniques, including the one used in the Islands of Synchronicity (IOS). The 2 and 4 phase handshake is also used in IOS and is in many respects at the heart of the IOS methodology as far as clock domain crossing techniques are concerned.
Send and Forget – Double Latching
ACL: Asynchronous Communication Link
[Figure: source S (clocked by CLKs) sending payload PS across the ACL to destination D (clocked by CLKD), where it is double latched to give PD]
Double latching may not be the ideal island hopping technique for several reasons, but it is still very useful. The problem this technique tries to solve is to avoid metastability and increase the MTBF to acceptable levels. To see why this technique works and why it fails, let us consider a simple model that describes the dynamics of the voltage ramp-up in a simple synchronizer. Two factors decide the ramp-up. One is a constant, the metastable value Vms at the midpoint between the high and the low region. The second is the difference between v(0), the instantaneous voltage sampled by the clock at time 0, and Vms, which is exponentially amplified. So if this difference is zero or close to it, the ramp up or down would be very slow, but the voltage will eventually settle to one or zero. We therefore want to give these dynamics the maximum time to settle before the value is consumed in the synchronous world. An entire clock period is the maximum one has, and that is all that double latching is about. So what is the problem? There are a few issues. The first is that if the same signal is double latched at different places and, because of skew, they sample different values, i.e. v(0) is on different sides of the Vms value, then the double latches will saturate to different values in different parts of the design, and that can be potentially disastrous.
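The ramp-up model described in the notes can be sketched as v(t) = Vms + (v(0) - Vms) * exp(t / tau), where tau is the regeneration time constant of the latch. This is a minimal sketch; the exponential form follows the description above, while the values of tau, Vms and the valid-level offset are hypothetical assumptions.

```python
import math

def resolve_time(v0, v_ms, valid_offset, tau):
    """Time for v(t) = v_ms + (v0 - v_ms) * exp(t / tau) to move valid_offset away from v_ms.

    Returns math.inf when v0 sits exactly on the metastable point.
    """
    diff = abs(v0 - v_ms)
    if diff == 0.0:
        return math.inf
    # Already outside the metastable region if diff >= valid_offset
    return max(0.0, tau * math.log(valid_offset / diff))

# Hypothetical values: 1.2 V supply, metastable point at midrail, 50 ps time constant
V_MS = 0.6          # volts
VALID_OFFSET = 0.5  # volts away from V_MS that counts as a clean logic level
TAU = 50e-12        # seconds

t_10mv = resolve_time(0.61, V_MS, VALID_OFFSET, TAU)      # sampled 10 mV from V_MS
t_1uv = resolve_time(0.600001, V_MS, VALID_OFFSET, TAU)   # sampled 1 uV from V_MS
print(f"10 mV offset: {t_10mv * 1e12:.0f} ps, 1 uV offset: {t_1uv * 1e12:.0f} ps")
```

The resolution time grows only logarithmically as the sampled voltage approaches Vms, which is why waiting one extra clock period (the second latch) is usually enough to make a failure very unlikely, though never impossible.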
Send and Forget – Double Latching
The second problem is that, to be absolutely safe, the payload frequency has to be slightly lower than the destination frequency, to factor in the sample and hold window and the jitter. The third issue with double latching is that there is no flow control, and that is the reason why it is called send and forget. But all in all it is a good, robust, low cost clock domain crossing method that is widely used for single bit control data, which often qualifies a data transfer that is itself typically not double latched. Data can also be double latched if it is Gray coded. The MTBF falls as the frequency increases; without double latching it can be down to thousands of seconds at a relatively modest frequency, as can be seen from this data from TI.
Advantages
  Good choice for single bit control data
  Gray coded multi bit data payloads are also a target
Disadvantages
  No flow control: send and forget
  A metastable signal sent to multiple targets could resolve to different values
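The reason Gray coded multi bit data can be double latched is that consecutive values differ in exactly one bit, so a sample taken during a change yields either the old or the new value. A minimal sketch of the standard binary-reflected Gray code follows; the function names are illustrative.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# A 4 bit counter: binary 7 -> 8 flips four bits, Gray coded 7 -> 8 flips one
print(bits_changed(7, 8), bits_changed(to_gray(7), to_gray(8)))
```

If the binary value 0111 were sampled while changing to 1000, any of the sixteen 4 bit patterns could be captured; with the Gray coded value only the two legitimate neighbours are possible.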
Handshake
ACL: Asynchronous Communication Link
Pd: Destination Payload. Ps: Source Payload.
[Figure: source and destination, clocked by CLKs and CLKD, exchanging the payload PS/PD across the ACL together with the control signals RS, AS, RD and AD; an FSM generates the control signals]
Handshaking for data transfer is a very reliable clock domain crossing method and comes in two variants: the two phase handshake, in which every transition (edge) of a control signal is an event, and the four phase protocol, which is level based and returns the control signals to zero. The logic that generates the handshake signals, request and acknowledge, often uses a Muller C element, which is easily realised using an SR flop. The flow control signals could suffer from metastability, but the protocol guarantees a clean transfer of data. To reduce metastability in the flow control signals, they are double latched.
The data payload period must be at least the worst-case round trip delay of the flow control.
2-phase: 3Ts + 3Td ≥ TPs
4-phase: 6Ts + 6Td ≥ TPs
While the flow control adds robustness, there is a performance penalty to be paid. The data payload period must be at least the worst case round trip delay of the flow control. In the case of the 2 phase protocol with double latching of the flow control signals, the round trip involves 3 cycles on the source side and 3 cycles on the destination side; the round trip delay for the 4 phase protocol is even worse, involving 6 cycles on each side. If you translate this to a real life scenario, where isochronous traffic comes in at 27 MHz and is consumed by a bursty destination at 200 MHz, and we decide to use the 2 phase protocol, the maximum payload rate we can sustain would be down to about 8 MHz.
Example: Source: 27 MHz, Destination: 200 MHz
Maximum isochronous data rate using the 2 phase protocol: 3*(37 ns) + 3*(5 ns) = 126 ns, i.e. 7.9 MHz
2-phase: 3Ts + 3Td ≥ TPs
4-phase: 6Ts + 6Td ≥ TPs
TPs is the period for which data remains valid/asserted.
Note that TPs does not decide the data payload frequency. TPs is less than the round trip delay, to enable the next payload to be transferred immediately after the round trip delay is over. The period (TPL) corresponding to the data payload frequency has to be more than the worst case round trip delay, i.e. 3Ts + 3Td ≤ TPL and 6Ts + 6Td ≤ TPL for the 2 and 4 phase protocols respectively. This is illustrated in the example below.
Example: Source: 27 MHz, Destination: 200 MHz
Maximum isochronous data rate using the 2 phase protocol: 3*(37 ns) + 3*(5 ns) = 126 ns, i.e. 7.9 MHz
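The round trip arithmetic of the example can be sketched as follows. This is a minimal sketch, assuming 3 cycles per side for the 2 phase protocol and 6 cycles per side for the 4 phase protocol, as stated above; the function names are illustrative. The unrounded source period (about 37.04 ns) is used, so the result differs slightly from the hand calculation with 37 ns.

```python
def round_trip_ns(f_src_mhz, f_dst_mhz, cycles_per_side):
    """Worst-case flow control round trip: cycles_per_side * (Ts + Td), in ns."""
    ts = 1e3 / f_src_mhz   # source clock period in ns
    td = 1e3 / f_dst_mhz   # destination clock period in ns
    return cycles_per_side * (ts + td)

def max_payload_mhz(f_src_mhz, f_dst_mhz, cycles_per_side):
    """Highest sustainable payload rate: one payload per round trip."""
    return 1e3 / round_trip_ns(f_src_mhz, f_dst_mhz, cycles_per_side)

rt_2phase = round_trip_ns(27.0, 200.0, 3)
rt_4phase = round_trip_ns(27.0, 200.0, 6)
f_2phase = max_payload_mhz(27.0, 200.0, 3)
f_4phase = max_payload_mhz(27.0, 200.0, 6)

print(f"2 phase: {rt_2phase:.1f} ns round trip, {f_2phase:.2f} MHz")
print(f"4 phase: {rt_4phase:.1f} ns round trip, {f_4phase:.2f} MHz")
```

The slow source clock dominates the round trip, so speeding up the destination beyond 200 MHz would barely change the sustainable payload rate.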
2 Clock Asynchronous FIFO
An improvement in terms of the data transfer rate that can be sustained is the 2 clock asynchronous FIFO. The asynchronous FIFO completely decouples the source and destination islands. The source island can write data to a non-full FIFO and the read island can read data from a non-empty FIFO. Each island can process data on every clock cycle and no interaction is necessary between the two islands to pass data. Flow control is achieved by monitoring the full and empty flags. Alternatively, the FIFO levels can be monitored and action can be taken at different trigger points. The write interface consists of a Gray code counter that generates the write pointer, where the newest element is written. The read pointer is synchronised by double latching before it is processed and, together with the write pointer, helps decide whether the FIFO is full and what the write level is. The read interface is very similar to the write interface, and the read pointer indicates the oldest element in the FIFO. A flush signal loads the read pointer with the synchronised write pointer value, thereby clearing the FIFO. The design is tolerant to metastability because either the current or the previous value will be sampled. This means that the FULL flag may be active in spite of there being room to write, but at least nothing gets overwritten. Similarly, the empty flag may be active in spite of the FIFO not being empty, but once again this is better than reading an empty FIFO. While this FIFO provides robust clock domain crossing, there are some issues when deploying it as an island hopping technique, because typically the read side must be able to send the read address to the FIFO storage and get the read data back in one cycle.
So the solution is to move the storage to the read island, but then we have the issue of the write data and the write clock having to travel an arbitrary distance between islands. We do not want a write clock originating in one island to be used as it is in another island, as we would run into the problems of skew and clock tree balancing that we set out to avoid.
Fail safe, self correcting:
  Write logic could think the FIFO is full when it is not
  Read logic could think that the FIFO is empty when it is not
Not suitable for island hopping:
  Storage in the write island is a problem
  Typically the read side needs to be read every cycle
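The fail safe behaviour of the flags can be sketched by letting each side compare its own pointer with a possibly stale copy of the other side's pointer. This is a minimal sketch, assuming free-running pointers that count modulo twice the FIFO depth and ignoring the Gray encoding used on the actual crossing; DEPTH and the function names are hypothetical.

```python
DEPTH = 8  # hypothetical FIFO depth; pointers count modulo 2 * DEPTH

def occupancy(wr_ptr, rd_ptr):
    """Number of stored elements implied by a write and a read pointer."""
    return (wr_ptr - rd_ptr) % (2 * DEPTH)

def full_seen_by_writer(wr_ptr, rd_ptr_synced):
    """Write side uses its own pointer and the synchronised (possibly stale) read pointer."""
    return occupancy(wr_ptr, rd_ptr_synced) == DEPTH

def empty_seen_by_reader(rd_ptr, wr_ptr_synced):
    """Read side uses its own pointer and the synchronised (possibly stale) write pointer."""
    return occupancy(wr_ptr_synced, rd_ptr) == 0

# True state: 7 elements stored, so one free slot
true_wr, true_rd = 10, 3
# The writer still sees the read pointer from before the latest read
stale_rd = 2
print(full_seen_by_writer(true_wr, true_rd), full_seen_by_writer(true_wr, stale_rd))

# True state: 1 element stored; the reader still sees the older write pointer
true_wr2, true_rd2, stale_wr2 = 10, 9, 9
print(empty_seen_by_reader(true_rd2, true_wr2), empty_seen_by_reader(true_rd2, stale_wr2))
```

A stale read pointer can only make the writer overestimate the occupancy, and a stale write pointer can only make the reader underestimate it, so the errors are always on the safe side and correct themselves once the newer pointer value is synchronised.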
GALS Globally Asynchronous Locally Synchronous Source: ETH, Zurich
GALS
Clocking and Communication Schemes
Synchronous Design – phase and skew aligned
Mesochronous Design – same clk freq and phase aligned
Ratiochronous Design – different clock freqs but with a rational relationship; phase aligned (KTH research)
Plesiochronous – no rational clock relationship; phase relationship drifts
Asynchronous
Ideal vs Real Clock
During the initial phase of synthesis the clock is ideal.
The set_auto_disable_drc_nets command should be used to prevent DC from wasting time on fixing DRC violations on high fanout nets like resets and clocks.
Model skew and jitter effects using the set_clock_uncertainty command.
Model clock network latency using the set_clock_latency command.
Once the clock tree has been inserted, use the set_propagated_clock command to use the actual clock. Back annotation using the read_sdf command is required.
Modelling Clock Skew