Reconfigurable Computing - Verifying Circuit Performance! John Morris Chung-Ang University The University of Auckland ‘Iolanthe II’ in a good breeze on the Bay of Islands

Measuring Circuit Performance
• Don’t believe the simulators!
  • Some experience has shown that predictions can be reasonably accurate …
  • but the potential for gross error is very large
    • A large number of small values need to be summed
    • Possibility of large statistical errors
• Professional engineers always check
  • That’s what makes them professional!
• Scientists always want to be able to repeat an experiment
  • That’s a principle of scientific theory
  • Don’t accept anything as fact unless you can repeat it!
• Whatever your background or reason …
  • Measurement on an actual device is needed
  • You can use the simulator’s numbers for guidance though!

Measuring Circuit Performance
• Use the simulator’s results as a guide
• But what does it tell you?
  • It calculates propagation delays from inputs to outputs along various circuit paths
  • Simulators try to identify the longest (in time) path for you
• In a simple combinatorial block that’s fine, eg
  • a one-stage (no registers) adder
    • Should identify the carry chain in a ripple carry adder or its equivalent in a more complex adder
  • a single-stage parallel array multiplier
    • Again, in all types of multipliers, there’s a carry chain that limits performance
• In a pipelined circuit, you want the longest path between two clocked flip-flops
  • In principle, easy for the simulator to find!
  • In practice, you may need to spend more time checking that it selected the right path!

Measuring Circuit Performance
• Checking the simulator’s predictions
  • Do a sanity check!
• Using the manufacturer’s published propagation delays for individual circuit elements
  • Estimate the path delay yourself
    • Count the number of logic blocks needed for the computation
    • Will additional multiplexers be needed for steering or selection logic?
    • Are I/O buffers needed?
      • These typically have a considerable delay (relative to other circuit elements)
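The sanity check above amounts to a short sum. A minimal sketch, assuming made-up per-element delays (the values in `DELAYS_NS`, the element names and the example path are all hypothetical; real figures come from the manufacturer’s data sheet for your device):

```python
# Hypothetical per-element delays in ns. Real values come from the
# manufacturer's data sheet for your device and speed grade.
DELAYS_NS = {
    "logic_block": 1.0,
    "multiplexer": 0.8,
    "io_buffer": 3.0,
}


def estimate_path_delay_ns(counts, delays=DELAYS_NS):
    """Sum (element count x element delay) along one path."""
    return sum(n * delays[kind] for kind, n in counts.items())


# Example path (an assumption, not taken from the slides): 32 logic
# blocks in series, 2 steering multiplexers, and an I/O buffer on the
# way in and on the way out.
path = {"logic_block": 32, "multiplexer": 2, "io_buffer": 2}
estimate = estimate_path_delay_ns(path)
print(f"estimated path delay: {estimate:.1f} ns")
```

The result is only a rough figure to compare with the synthesizer’s report, not a replacement for it.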

Measuring Circuit Performance
• Using the manufacturer’s published propagation delays for individual circuit elements
  • Estimate the path delay yourself
  • …
• You can use the synthesizer to help you here
  • Its count of the total number of logic blocks will be 100% accurate
  • From this, you infer the number of logic blocks in a path, eg
    • For a 32-bit adder, you can obviously start by dividing the total number of logic blocks by 32
    • Then try to estimate how many logic blocks are needed for overheads, eg multiplexers needed in a carry select adder
• For FPGAs, remember …

Measuring Circuit Performance
• Using the manufacturer’s published propagation delays for individual circuit elements
  • Estimate the path delay yourself
• For FPGAs, remember …
  1. Look up tables (LUTs) are usually used for boolean logic
     • This means that, using Xilinx’s 9-input CLBs,
       • y <= a AND b probably takes about the same time as
       • y <= a AND b AND c AND d AND … (up to 9 inputs)
       • Beyond 9 inputs, add a considerable delay to connect to a neighbouring CLB
     • Using Altera’s 4-input logic elements,
       • y <= a AND b probably takes about the same time as
       • y <= a AND b AND c AND d (up to 4 inputs)
       • Beyond 4 inputs, add a small delay to use the fast cascade chain logic
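One rough way to turn the LUT rule into a number is to count how many levels of logic blocks a wide AND needs. This sketch assumes the function is built as a plain tree of identical k-input blocks and ignores cascade chains and routing; `lut_levels` is an illustrative name, not a vendor tool function.

```python
def lut_levels(n_inputs, lut_width):
    """Levels of lut_width-input blocks needed to AND n_inputs signals,
    assuming a simple tree (no cascade chains, no routing delay)."""
    levels = 1
    capacity = lut_width          # inputs one level can absorb
    while capacity < n_inputs:
        capacity *= lut_width     # each extra level multiplies the fan-in
        levels += 1
    return levels


for n in (2, 4, 9, 10):
    print(n, "inputs:",
          lut_levels(n, 9), "level(s) of 9-input blocks,",
          lut_levels(n, 4), "level(s) of 4-input blocks")
```

Each extra level adds a block delay plus whatever it costs to connect the blocks, which is where the two architectures described above differ.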

Measuring Circuit Performance
• Using the manufacturer’s published propagation delays for individual circuit elements
  • Estimate the path delay yourself
• For FPGAs, remember …
  2. Paths between logic blocks may have large numbers of transmission gates on them!
     • As noted before, there’s a considerable advantage to being able to keep critical logic on one logic block
     • But Altera’s cascade chains attempt to mitigate the penalty for not fitting critical logic into a single logic element
     • And all manufacturers now provide for fast adder carry chains!
• This makes estimation of path delays difficult
  • Nevertheless, you should make a rough estimate!!

Measuring Circuit Performance
• Estimate the path delay yourself
• If your estimate matches that from the synthesizer, then we’re in good shape
  • ‘Matches’ here can be interpreted liberally
  • If the synthesizer reports 50ns and you calculate 30ns then this is a reasonable match
    • You probably didn’t count enough transmission gates, etc, on the connections between logic blocks!
    • You don’t need to do a very precise calculation
    • The synthesizer has done that for you!
    • Your aim is to ensure that you are reading the correct number from the synthesizer’s report!
• With a reasonable match (say within 50% - either way), believe the synthesizer and continue …
• With a serious mismatch
  1. Read the synthesizer’s report more carefully
     • You may be looking at the wrong figure!
  2. Check your estimate more carefully
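The ‘within 50% either way’ rule can be written down as a one-line check. This sketch assumes the tolerance is taken relative to the synthesizer’s figure, which is one possible reading of the rule; `reasonable_match` is an illustrative name.

```python
def reasonable_match(reported_ns, estimated_ns, tolerance=0.5):
    """True if the hand estimate lies within +/- tolerance of the
    synthesizer's reported delay (measured relative to the report)."""
    ratio = estimated_ns / reported_ns
    return (1 - tolerance) <= ratio <= (1 + tolerance)


print(reasonable_match(50, 30))   # the 50ns / 30ns case: a reasonable match
print(reasonable_match(50, 10))   # serious mismatch: re-read the report
```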

Now we believe we know how fast the circuit is …
• What does this speed mean in practice?
• You have a longest delay of x ns
• A synchronous (clocked) circuit can run at 1/x GHz?
  • Almost!
• Don’t forget to allow for
  1. Propagation delay in the registers
  2. Temperature
     • Circuits run slower at high T
     • Make sure that your estimate of t_pd is a good one for the highest temperature your circuit will need to withstand
     • Don’t think that this will be low!
     • Try touching a modern high performance processor! (Make sure you have some burn cream nearby!)
     • or simply work out that all those fans hiding that chip aren’t there for decoration!
  3. Chip-to-chip variations in fabrication …

32-bit adder - inputs a, b, c
• Naïve approach - test all possibilities
  • a - 4 × 10^9 (all possible 32-bit numbers)
  • b - 4 × 10^9 (ditto)
  • c - 2 (0 or 1)
  • Total 4 × 10^9 × 4 × 10^9 × 2 = 1.6 × …
  • 4 GHz machine - 10^9 cases / sec (optimistic!)
  • 1.6 × … seconds - about 6 months will do it!
• What about the rest of the machine?
  • -, x, /, ^, v, >, …
  • We should be finished in about 5 years
  • Hmmmm … our 4 GHz machine should be about 30 GHz now!
• Clearly we need to be more efficient about testing!
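A quick recomputation of the naive exhaustive-test cost for the 32-bit adder, assuming exactly 2^32 values for each of a and b, 2 values for the carry in, and the optimistic rate of 10^9 cases per second quoted above. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_cases(width_bits):
    """All (a, b, c_in) combinations for an adder of the given width."""
    return (2 ** width_bits) * (2 ** width_bits) * 2


def years_to_test(cases, cases_per_second=10 ** 9):
    """Run time in years at a fixed test rate."""
    return cases / cases_per_second / SECONDS_PER_YEAR


cases = exhaustive_cases(32)
print(f"{cases:.2e} cases, about {years_to_test(cases):.0f} years")
```

Under these assumptions the run time comes out in the order of a thousand years for the adder alone, so exhaustive testing is clearly impractical.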

Now we believe we know how fast the circuit is …
• What does this speed mean in practice?
• You have a longest delay of x ns
• A synchronous (clocked) circuit can run at 1/x GHz?
  • Almost!
• Don’t forget to allow for
  1. Propagation delay in the registers
     • More on pipelines later!
  2. Temperature
  3. Chip-to-chip variations in fabrication
     • The gates will only be nominally 0.18 μ!
     • Some may actually be 0.15 μ and others 0.25 μ …
• A maximum clock frequency of 1/(x+δ) GHz
  • δ may be quite large!
• Now you’re ready to design an experiment to verify that the circuit does actually run as predicted!
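Because x is in ns, its reciprocal is directly in GHz, and the allowances listed above simply add to the delay before taking the reciprocal. A minimal sketch; the 5 ns path delay and the three allowance values are hypothetical numbers chosen only to make the arithmetic visible.

```python
def max_clock_ghz(path_delay_ns, margin_ns):
    """Maximum clock frequency in GHz for a longest path of path_delay_ns
    plus a total allowance of margin_ns (1 / ns = GHz)."""
    return 1.0 / (path_delay_ns + margin_ns)


# Hypothetical allowances, in ns, for the three effects listed above.
register_delay_ns = 0.5
temperature_ns = 0.5
process_variation_ns = 0.25
margin_ns = register_delay_ns + temperature_ns + process_variation_ns

x_ns = 5.0
print(f"naive limit : {1.0 / x_ns * 1000:.0f} MHz")
print(f"with margin : {max_clock_ghz(x_ns, margin_ns) * 1000:.0f} MHz")
```

With these numbers the naive 200 MHz drops to 160 MHz, which shows how quickly a modest allowance eats into the clock rate.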

A word of warning!
• Experimental design!
  • If you don’t make an estimate of what you expect to measure before starting,
  • you will waste a lot of time doing the experiment!
• Working out the expected delay time is formally equivalent to setting out a hypothesis for the experiment
  • The simulator says the delay will be x ns, so I hypothesise (predict) that we will measure a delay of about x ns
  • This (simple) hypothesis guides your experimental design and set up!
  • For example, assume you have a 150MHz oscilloscope available …

Experimental hypothesis
• The simulator says the delay will be x ns, so I hypothesise (predict) that we will measure a delay of about x ns
• This (simple) hypothesis guides your experimental design and set up!
• For example, assume you have a 150MHz oscilloscope available
  • You try to make measurements of the delay, but are surprised to find that there appears to be no delay at all!
  • Somebody then remembers to go back and read the synthesis report …
  • which tells you to expect a 5ns delay,
  • or one that will be difficult to measure on a slow ‘scope!

Experimental Hypothesis
• The simulator says the delay will be x ns, so I hypothesise (predict) that we will measure a delay of about x ns
• This (simple) hypothesis guides your experimental design and set up!
• You now know that you have to design your experiment differently, eg
  1. Build a wider adder
     • So that the delay is long enough to measure easily
  2. Work out how to measure n repeats of the calculation
     • So that 5 × n > 20ns (or some time that you can be certain to measure accurately!)
  3. Devise an entirely new technique
     • Which doesn’t require direct measurement of such a small delay
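For option 2, the number of repeats follows directly from the predicted delay and the shortest time you trust your instrument to measure. A small sketch using the 5 ns and 20 ns figures from the slide; `repeats_needed` is an illustrative name.

```python
def repeats_needed(delay_ns, min_measurable_ns):
    """Smallest n such that n * delay_ns is strictly greater than
    min_measurable_ns."""
    return int(min_measurable_ns // delay_ns) + 1


n = repeats_needed(5, 20)
print(n, "repeats give", n * 5, "ns")
```

Four repeats give exactly 20 ns, which does not satisfy the strict inequality, so five are needed.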

Measuring the delay
• Usual strategy
  • Design a test bench that will drive the component you are testing, the Component Under Test (CUT)
  • This test bench will be different from the one used to verify the component’s correctness!
• First task:
  • What will the test bench do?
  • What is the worst case input(s)?
    • ie the ones that will take the longest time to produce a result!
    • For an adder, there are several possibilities: ?
  • Set up the test bench to produce one of these inputs
    • Make a small state machine

Measuring the delay
• You’ve identified a suitable worst case …
• Set up the test bench to produce these inputs
  • Make a small state machine
  • 2 states may be adequate:
    • State 1: Clear the outputs
    • State 2: Apply the test case
• How will you know that the worst case has completed computation?
  • In the case of an adder, it’s easy
  • For other circuits, you may need to add some ‘probe’ circuitry
    • For example, the worst case is when TWO outputs go high
    • Add an AND gate to your driver and route the output of this gate to an external pin
• Set up your scope to measure the delay
  • from the start of the clock cycle
  • to the output signalling completion

Measuring the delay
• You’ve identified a suitable worst case …
• Set up the test bench to produce these inputs
  • Make a small state machine
  • 2 states may be adequate:
    • State 1: Clear the outputs
    • State 2: Apply the test case

    PROCESS( clk )
    BEGIN
      IF clk'EVENT AND clk = '1' THEN
        CASE state IS
          WHEN state1 =>            -- State 1: clear the outputs
            a <= zero;
            b <= zero;
            start <= '0';
            state <= state2;
          WHEN state2 =>            -- State 2: apply the test case
            a <= one;
            b <= minus_one;
            start <= '1';
            state <= state1;
        END CASE;
      END IF;
    END PROCESS;

• Set up your scope to measure the delay
  • from the start of the clock cycle (or the start signal)
  • to the output signalling completion (carry out for an adder)
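The test case applied in state 2 (one plus minus one) is a worst case for a ripple carry adder because the carry generated in bit 0 has to ripple through every remaining bit. The sketch below counts the longest run of stages a carry passes through for a given pair of inputs; it assumes a plain ripple carry structure with carry in at 0, and `longest_carry_chain` is an illustrative name.

```python
def longest_carry_chain(a, b, width=32):
    """Longest run of adder stages that a single carry ripples through,
    for a plain ripple carry adder with c_in = 0."""
    longest = 0
    run = 0                      # stages the current carry has passed through
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:              # generate: a new carry starts here
            run = 1
        elif ai ^ bi:            # propagate: extends a carry if one is arriving
            run = run + 1 if run > 0 else 0
        else:                    # kill: any incoming carry stops here
            run = 0
        longest = max(longest, run)
    return longest


minus_one = 0xFFFFFFFF           # 32-bit two's complement -1
print(longest_carry_chain(1, minus_one))   # 32: the carry crosses every stage
print(longest_carry_chain(1, 1))           # 1: the carry stops after bit 0
```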

Measuring a delay
• Second strategy
  • Use the FPGA to do everything!
  • Drive a fast counter with the fastest clock available
  • Stop the counter when the operation is complete
• Requires fast input clock
  • Resolution of this clock determines timing accuracy
  • Can use FPGA PLLs to multiply clock

    PROCESS( clk )
    BEGIN
      IF clk'EVENT AND clk = '1' THEN
        CASE state IS
          WHEN state1 =>              -- clear counter and inputs
            counter <= zero;
            a <= zero;
            b <= zero;
            IF c_out = '0' THEN
              state <= state2;
            END IF;
          WHEN state2 =>              -- apply the test case, start counting
            a <= one;
            b <= minus_one;
            counter_enable <= '1';
            state <= state3;
          WHEN state3 =>              -- wait for the carry out
            IF c_out = '1' THEN
              counter_enable <= '0';  -- stop the counter
              --- transfer counter output to
              --- LED display, etc
              state <= state1;
            END IF;
        END CASE;
      END IF;
    END PROCESS;
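Converting the final counter value into a delay is a single multiplication, and the counter clock period sets the resolution. A minimal sketch; the 200 MHz counter clock and the count of 7 are hypothetical values, and `counter_to_delay_ns` is an illustrative name.

```python
def counter_to_delay_ns(count, clock_mhz):
    """Return (delay estimate, resolution) in ns for a counter that
    advanced `count` times on a clock of `clock_mhz` MHz."""
    period_ns = 1000.0 / clock_mhz
    return count * period_ns, period_ns


delay_ns, resolution_ns = counter_to_delay_ns(7, 200)
print(f"delay about {delay_ns:.0f} ns, resolution {resolution_ns:.0f} ns")
```

A faster counter clock (for example, one multiplied up by a PLL) shrinks the period and so improves the resolution.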

Measuring Delay
• Strategy three
  • If the circuit is clocked too fast, it won’t operate correctly
    • ie it won’t complete the computation before the next clock edge arrives
    • eg in the adder example, the next clock edge arrives before the carry has rippled through to carry out, so it never becomes ‘1’
  • Set up the test circuit as in the first case, but gradually increase the clock speed until carry_out never becomes ‘1’
• Use a secondary clock derived from a (fast) master clock, whose frequency is gradually increased until the circuit stops operating correctly (ie never produces a completion signal)
  • Count pulses of the master clock with a loadable counter
  • Secondary clock is derived from the counter completion signal
  • Reduce the loaded count value to reduce the secondary clock cycle time (reciprocal scale)
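The sweep in strategy three brackets the delay between the last clock period that worked and the first that failed. The sketch below models that procedure in software; `completes` stands in for observing the completion signal on the real device, and the 2.5 ns master period and 23 ns ‘true’ delay are hypothetical.

```python
def bracket_delay_ns(master_period_ns, start_count, completes):
    """Reduce the loaded count until the circuit stops completing.

    completes(period_ns) is a stand-in for checking the completion
    signal on the device. Returns (first failing period, last working
    period); either entry is None if that case was never observed.
    """
    last_ok = None
    for count in range(start_count, 0, -1):
        period_ns = count * master_period_ns   # secondary clock period
        if not completes(period_ns):
            return period_ns, last_ok
        last_ok = period_ns
    return None, last_ok


# Model of a circuit whose real delay is 23 ns (a made-up figure).
true_delay_ns = 23.0
fail_ns, ok_ns = bracket_delay_ns(2.5, 20, lambda p: p >= true_delay_ns)
print(f"delay lies between {fail_ns} ns and {ok_ns} ns")
```

The width of the bracket is one master clock period, so a faster master clock gives a tighter result.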

Measuring the delay
• Timing accuracy
  • Don’t forget the I/O buffer delay!
    • A signal from either your driver or the CUT has to go through a pin, which implies passing through an I/O buffer
  • Don’t
    • Put one probe on the clock input and one on an output
    • Your result will be increased by the I/O buffer delay!
  • Instead
    • Feed the clock back off the FPGA through an I/O buffer
    • Now both the clock and the completion signal will be delayed by one I/O buffer
    • Still some possibility of error
      • The two I/O buffers will not have exactly the same delay
      • But this error is likely to be of the same magnitude as other unavoidable errors, so …
  • In general, make sure that both your ‘start’ and ‘stop’ signals have very similar delays

Measuring the delay
• Timing accuracy
  • Don’t forget the I/O buffer delay!
  • In general, make sure that both your ‘start’ and ‘stop’ signals have very similar delays
  • If you had to combine your ‘stop’ signals in some logic, eg one logic block to AND two signals together,
  • then pass the start signal through an artificial similar delay
    • Your completion signal is the AND of two outputs
        complete <= a_out AND b_out
    • So your start signal should be similarly delayed
        start_delayed <= start AND start
    • ‘start’ will often be the clock driving your test circuit
  • Be careful: Some compilers are clever enough to realize that this is a ‘do nothing’ piece of logic and will remove it!!
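The effect of unequal ‘start’ and ‘stop’ paths is simple arithmetic: the scope sees the true delay plus the difference between the extra delays on the two paths. A short sketch with hypothetical figures (a 20 ns circuit delay and I/O buffers of about 3 ns); `scope_reading_ns` is an illustrative name.

```python
def scope_reading_ns(true_delay_ns, start_path_ns, stop_path_ns):
    """Delay seen on the scope: the true delay plus the mismatch between
    the extra delays on the stop and start signal paths."""
    return true_delay_ns + (stop_path_ns - start_path_ns)


# Probe on the clock input: only the stop signal passes through an I/O buffer.
unmatched = scope_reading_ns(20.0, start_path_ns=0.0, stop_path_ns=3.0)

# Clock fed back off the chip: both signals pass through an I/O buffer,
# leaving only the small difference between the two buffers.
matched = scope_reading_ns(20.0, start_path_ns=3.0, stop_path_ns=3.2)

print(f"unmatched paths: {unmatched:.1f} ns, matched paths: {matched:.1f} ns")
```

The same reasoning applies to any extra logic on the stop path, which is why the start signal is passed through a similar dummy gate.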

[Figure: FPGA containing the CUT; labels: a, b, c_in, sum, c_out, Clock]