May. 2009, Wu Jinyuan, Fermilab IEEE RT09 Short Course 1 FPGA Structure, Programming Principals and Applications: Part II Wu, Jinyuan.

Slide 1: FPGA Structure, Programming Principals and Applications: Part II
Wu, Jinyuan, Fermilab, jywu168@fnal.gov
IEEE Real Time Conference Short Course, May 2009

Slide 2: Outline
Counting:
- Example: LED brightness and DAC
- Simple Sequencing
Bandwidth and Noise Issues:
- General Remarks on Sampling Theorem and Dithering
- Example: Huffman Coding
- Example: Decimation & Dynamic Decimation
After-fact Calibration:
- Several Topics on FPGA Based TDC
- Serial Communication with Independent Crystals
- Minimum Synchronization

Slide 3: Flashing LED, The First Thing First
[Figure: counter Q[23..0] driving an LED]
Design at least one LED into every FPGA board. When a board is first powered up, test the LED flashing function first. Many things have to be right for the LED to flash:
- Power pins must all be connected.
- Configuration devices must be in the correct mode.
- Design software must be correct.

Slide 4: FP LED Brightness Variation
[Figure: counter Q[23..0] feeding comparator input B, brightness value at input A, output A<B; a second version with a LUT in front of input A]
The LED brightness is varied by changing the duty cycle of the output pulse. Comparator input A is the brightness and B is the clock cycle count. A look-up table can be added at input A to obtain a different brightness variation curve.

Slide 5: FP LED Brightness Exponential Drop
[Figure: counter with carry-out CO, accumulator register Q, comparator A<B; the register is updated as: if (CO==1) {Q = Q - Q/32;}]
Narrow pulses are typically stretched for LED display with fixed brightness. The circuit here dims the LED gradually for a better visual effect.
Possible Student Lab

Slide 6: Exponential Sequence Generator
[Figure: accumulator register Q updated as: if (CO==1) {Q = Q - Q/32;}]
An exponential sequence is generated using the accumulator shown above. Note that not even one multiplier is used. Other function sequences (sine, cosine, tangent, cotangent, etc.) can be generated similarly.
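The update rule on this slide can be modeled in a few lines of software. The sketch below is only an illustration, not the FPGA design itself: the function name, the starting value and the use of a right shift for Q/32 are assumptions made for the example.

```python
def exponential_sequence(q0, steps, shift=5):
    """Model of Q <= Q - Q/32, using only a shift and a subtraction.

    q0, steps and shift are illustrative parameters; shift=5 gives Q/32.
    """
    q = q0
    out = []
    for _ in range(steps):
        out.append(q)
        q -= q >> shift  # Q/32 is a 5-bit right shift, no multiplier needed
    return out


sequence = exponential_sequence(1 << 16, 5)
print(sequence)
```

Each step multiplies Q by roughly 31/32, so the values decay approximately exponentially, with truncation error from the integer shift.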

Slide 7: Duty-Cycle Based Single-Pin DAC (1)
[Figure: counter Q feeding comparator input B, DAC input at port A, output A>B]
The duty cycle, or pulse width, of the comparator output is proportional to the DAC input at port A. Use an external RC circuit as a low-pass filter. The output voltage of an ideal low-pass filter is proportional to the DAC input.

Slide 8: Duty-Cycle Based Single-Pin DAC (2)
[Figure: accumulator with register Q, DAC input at D, carry-out CO as the output]
Use the carry-out of the accumulator as the output. The number of pulses is proportional to the DAC input. The rounding error is carried to later cycles, so the output is smoother.
Possible Student Lab
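The difference between the two single-pin DAC schemes can be seen in a small software model. This is a sketch under assumptions: a 4-bit width, the function names and the test value 3 are chosen only for illustration.

```python
def counter_dac(value, bits, cycles):
    """Scheme (1): output is high while value > free-running counter."""
    mask = (1 << bits) - 1
    return [1 if value > (n & mask) else 0 for n in range(cycles)]


def accumulator_dac(value, bits, cycles):
    """Scheme (2): output is the carry-out of an accumulator adding value."""
    mask = (1 << bits) - 1
    acc = 0
    out = []
    for _ in range(cycles):
        acc += value
        out.append(acc >> bits)  # carry-out bit
        acc &= mask              # the remainder is carried to later cycles
    return out


pwm = counter_dac(3, 4, 16)
spread = accumulator_dac(3, 4, 16)
print(pwm)     # three adjacent high cycles
print(spread)  # three high cycles spread over the period
```

Both outputs have the same average value, but the accumulator version spreads its pulses across the period, which is why a simple low-pass filter smooths it more easily.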

Slide 9: The Frequency Spectrum of DAC (2)
[Figure: accumulator-based DAC and its output spectrum]
The first harmonic may be suppressed. Works better with regular low-pass filters.

Slide 10: The Frequency Spectrum of DAC (1)
[Figure: counter-and-comparator DAC and its output spectrum]
The first harmonic dominates the spectrum. Works better with a notch filter.

Slide 12: Start, Count: A Single Layer Loop
[Timing diagram: ST, CLK, QA[5], QA[4..0]]
The ST signal starts the sequence. Counting is enabled. Counting stops.

Slide 13: A Double-Layer + Single-Layer Sequencer
A double-layer loop is followed by a single-layer loop.
[Figure: table of counter values for the inner loop, the outer loop and the state control]

Slide 14: An Array Adder

Slide 16: Cares Must Be Taken Outside FPGA (1)
[Figure: signal chain Shaper (band limiting) -> ADC -> FPGA -> DAC -> LP Filter (band limiting). Panels: spectrum of original signal, ADC input, sampling in ADC, aliasing without LP filtering, spectrum of DAC output, output of LP filter]
Nyquist Frequency < (1/2) Sampling Frequency

Slide 17: The “Trend” vs. The Sampling Theorem
The “trend”: There will be no hardware analog processing. Everything is done digitally in software. It sounds very stylish.
The sampling theorem: A shaper/low-pass filter is a minimum requirement.

Slide 18: Cares Must Be Taken Outside FPGA (2)
[Figure: signal chain Shaper -> ADC -> FPGA -> DAC -> LP Filter, with dither noise n added at the ADC input]
Resolution finer than the ADC LSB can be achieved by adding noise at the ADC input and filtering digitally.
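A short simulation illustrates why dithering helps. The sketch assumes an ideal quantizer with a step of 1 LSB, uniform noise of ±0.5 LSB, and a plain average as the digital filter; none of these specifics come from the slides.

```python
import math
import random


def quantize(x):
    """Ideal ADC with a step of 1 LSB (round to nearest)."""
    return math.floor(x + 0.5)


def averaged_reading(signal, samples, dither, rng):
    """Average many ADC readings, optionally adding uniform dither first."""
    total = 0
    for _ in range(samples):
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += quantize(signal + noise)
    return total / samples


rng = random.Random(1)
signal = 0.3  # a constant input of 0.3 LSB, chosen for illustration
without_dither = averaged_reading(signal, 20000, False, rng)
with_dither = averaged_reading(signal, 20000, True, rng)
print(without_dither, with_dither)
```

Without noise every reading is 0, so averaging cannot recover the 0.3 LSB input. With noise, the fraction of readings equal to 1 tracks the input, and the average approaches 0.3.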

Slide 19: Adding Noise for Finer Resolution
Mechanical pressure gauges usually do not track small pressure changes well. Gauge readers may tap the gauge lightly to get a more accurate reading. The idea of dithering at the ADC input is similar.
Photo Credit: www.telegraph.co.uk, trinities.org

Slide 20: Some Notes on Philosophy
[Figure: "Wideband, Low Noise" and "Narrowband, Noisy", each labeled Good or Bad]
Something good in one condition can be bad in another condition, and vice versa.

Slide 21: Why Are Band Limiting & Dithering Ignored?
Pre-amplifiers usually have a naturally limited bandwidth and an intrinsic noise larger than the LSB of the ADC. So much of the time, band limiting and dithering can be “safely” ignored since they are satisfied automatically.
High-bandwidth, low-noise devices are now easily accessible. A design can be too fast and too quiet.
- Do not forget to review the band limiting and dithering requirements for each design.

Slide 23: Data Reduction on Liquid Argon TPC Data
Hit waveforms in a TPC carry useful information. Digitizing the waveforms creates a large volume of data. Data reduction without losing useful information is necessary.
[Figure: drift time vs. wire number. Data from BO detector of FNAL]

Slide 24: Slow Variation of Raw Data
More than 99% of the points differ from the previous point by -1, 0 or +1. Huffman coding can be applied to the differences of the data points.
[Figure: a DFF holds U(n); a subtractor A-B forms U(n+1)-U(n) from the input U(n+1)]

Slide 25: The Huffman Coding
The U(n+1)-U(n) value with the highest probability is assigned the shortest code, i.e., the single bit 1. Values with lower probabilities are assigned longer codes, e.g., 01, 001, 0001, etc. Huffman coded words and regular words are distinguished by bit 15.

U(n+1)-U(n)      Code
-4 and others    Full 16-bit word
-3               000001
-2               0001
-1               01
0                1
+1               001
+2               00001
+3               0000001

Regular word: 1 00 followed by the 13-bit ADC value. It carries regular ADC data for the first point or when U(n+1)-U(n) is outside ±3.
Huffman coded word: packed codes (e.g., for 0, 0, 0, +1, +2), followed by padding or continuing into the next word. In this example, 6 differences of the data samples are packed in the 16-bit data word.
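The coding rule can be sketched in software as follows. This is a simplified model with assumptions: the code table is the one read from this slide (the entries for -1 and 0 are inferred from the text), the function names are made up, and the packing of codes into 16-bit words with the bit-15 flag is left out.

```python
# Difference -> code, shortest code for the most probable difference.
CODES = {0: "1", -1: "01", +1: "001", -2: "0001",
         +2: "00001", -3: "000001", +3: "0000001"}


def encode(samples):
    """Return tokens: ('raw', value) or ('huff', code) for each sample."""
    tokens = [("raw", samples[0])]  # first point is always sent in full
    for prev, cur in zip(samples, samples[1:]):
        diff = cur - prev
        if diff in CODES:
            tokens.append(("huff", CODES[diff]))
        else:
            tokens.append(("raw", cur))  # difference outside +-3
    return tokens


def decode(tokens):
    """Rebuild the samples from the token stream."""
    inverse = {code: diff for diff, code in CODES.items()}
    samples = []
    for kind, payload in tokens:
        if kind == "raw":
            samples.append(payload)
        else:
            samples.append(samples[-1] + inverse[payload])
    return samples


data = [100, 100, 101, 101, 100, 97, 90, 90]
tokens = encode(data)
restored = decode(tokens)
print(tokens)
```

Because each code is a run of zeros ended by a single 1, the codes are prefix-free and can be separated again in a packed bit stream.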

Slide 26: The Huffman Coding Block
The block is able to operate at a clock of up to 250 MHz in Altera Cyclone III FPGA devices. The block uses 245 logic cells, 0.6% of an EP3C40F484C6 device ($129) containing 39600 logic cells.
Cost of the 245 logic cells: (245/39600)*$129 = $0.80
[Figure: raw data in, Huffman coded data out, with the regular and Huffman coded word formats]

Slide 27: The Schematics of the Huffman Coding Block
[Schematic blocks: Difference of Data Points; Huffman Code Lookup Table; Huffman Code Composer; Huffman Code or Raw Data Selector]

Slide 28: The Compression Ratio of Huffman Coding
On typical TPC events a compression ratio of about 10 can be achieved (N -> N/10.7 in the example shown). The compression ratio is sensitive to high-frequency noise.

Slide 30: A “Mystery” of Huffman Coding Ratios on Down Sampled Data
The 5 MHz data is down sampled to 1 MHz. The Huffman coding compression ratio drops from 10.7 to 7.5 when the data is down sampled.
[Figure: N -> N/10.7 for the original data; N/5 -> (N/5)/7.5 for the down sampled data]

Slide 31: Averaging in Decimation: A Re-discovery
Simple “down-sampling” is not good. When the decimation factor is D, averaging over D samples is not good either. Averaging over 2*D samples is necessary. There is still aliasing when averaging over 2*D samples, but it is less severe than when averaging over D samples.
Nyquist Frequency < (1/2) Sampling Frequency

Slide 32: Weighted Average, The CIC-2 Filter
Filter performance can be further improved with a weighted average over 4*D samples. This filter is called a Cascaded Integrator-Comb filter of order 2 (CIC-2). The CIC-1 filter is the moving average.
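The two averaging schemes can be written as weight lists applied before keeping one output per D inputs. The sketch below assumes that the CIC-2 weights are the convolution of two length-2*D boxcars (a triangle with 4*D-1 non-zero taps); this is a common form of such a filter and is not confirmed by the slides. All function names are illustrative.

```python
def convolve(a, b):
    """Plain discrete convolution of two weight lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def decimate(samples, weights, d):
    """Weighted average over len(weights) samples, one output every d inputs."""
    total = sum(weights)
    n = len(weights)
    return [sum(w * s for w, s in zip(weights, samples[i:i + n])) / total
            for i in range(0, len(samples) - n + 1, d)]


D = 5
cic1 = [1] * (2 * D)          # moving average over 2*D samples
cic2 = convolve(cic1, cic1)   # triangular weights spanning about 4*D samples

constant = decimate([7] * 100, cic2, D)
fastest = decimate([1, -1] * 50, cic1, D)  # highest-frequency input pattern
print(cic2)
```

A constant input passes through unchanged, while the alternating +1/-1 pattern, which would alias badly under plain down-sampling, averages to zero.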

Slide 33: Huffman Coding Ratios for 5MHz to 1MHz
The Huffman coding compression ratio improves as the filter used in the decimation improves.

Slide 34: Dynamic Decimation (DD)
Only small time intervals, i.e., regions of interest (ROI), must be sampled at a high rate. Most time intervals can be sampled at a lower rate without losing useful information.

Slide 35: A Mystery of Dynamic Decimation & Huffman Coding
Dynamic Decimation reduces the number of samples by a factor of 10. Huffman Coding reduces the number of bits from the raw data by a factor of 10. When cascaded, the combination reduces the number of bits by a factor of 60.
[Figure: Dynamic Decimation alone: N -> N/10.6. Huffman Coding alone: N -> N/10.7. Dynamic Decimation followed by Huffman Coding: N -> N/60]

Slide 36: Huffman Coding Ratios for Dynamic Decimation
The Huffman coding compression ratio improves as the filter in Dynamic Decimation improves.

Slide 37: Any Differences?
[Figure: two event displays, "Raw" and "With Dynamic Decimation"]

Slide 39: TDC Using FPGA Logic Chain Delay
[Figure: input IN propagating along a logic chain, registered by CLK]
This scheme uses current FPGA technology. A low-cost chip family can be used (e.g., EP2C8T144C6, $31.68). Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip).

Slide 40: Two Major Issues In a Free Operating FPGA
1. Widths of bins are different and vary with supply voltage and temperature.
2. Some bins are ultra-wide due to LAB boundary crossing.

Slide 41: Digital Calibration Using Twice-Recording Method
[Figure: input IN on a longer delay line, registered by CLK]
Use a longer delay line. Some signals may be registered twice, at two consecutive clock edges, giving two TDC numbers N1 and N2:
$N_2 - N_1 = (1/f)/\Delta t$
where 1/f is the clock period and $\Delta t$ is the average bin width.
The two measurements can be used:
- to calibrate the delay.
- to reduce digitization errors.
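As a worked example of this relation, the sketch below derives the average bin width from one twice-recorded hit and converts the first TDC number to a time. The function names, the 400 MHz clock and the bin numbers are illustrative assumptions, and taking N1 times the average bin width as the corrected time is one plausible reading of the method.

```python
def average_bin_width_ps(n1, n2, clock_period_ps):
    """Average bin width from a hit registered at two consecutive clock edges."""
    return clock_period_ps / (n2 - n1)


def corrected_time_ps(n1, n2, clock_period_ps):
    """Convert the first TDC number to time using the measured bin width."""
    return n1 * average_bin_width_ps(n1, n2, clock_period_ps)


period = 2500.0  # 400 MHz clock, in picoseconds (assumed for the example)
width = average_bin_width_ps(10, 60, period)
time_ps = corrected_time_ps(10, 60, period)
print(width, time_ps)
```

Since the bin width is measured with every twice-recorded hit, the conversion follows changes in supply voltage and temperature without a separate calibration run.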

Slide 42: Digital Calibration Result
The power supply voltage changes from 2.5 V to 1.8 V (about the same effect as 100 °C to 0 °C). The delay speed changes by 30%. The difference of the two TDC numbers reflects the delay speed.
[Figure: N2, N1 and the corrected time]
Warning: the calibration is based on the average bin width, not bin-by-bin widths.

Slide 43: Auto Calibration Using Histogram Method
It provides a bin-by-bin calibration at a given temperature. It is a turn-key solution (bin in, ps out). It is semi-continuous (the LUT is updated automatically every 16K events).
[Figure: 16K events fill a DNL histogram, which builds the LUT; In (bin) -> LUT -> Out (ps)]
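A histogram-based lookup table can be sketched as below. The sketch assumes that hits arrive uniformly in time relative to the clock, so each bin's count is proportional to its width, and that the LUT stores the center time of each bin. Both are assumptions for illustration; the function name and numbers are made up.

```python
def build_lut(histogram, clock_period_ps):
    """Map each TDC bin to its center time in ps from hit counts per bin."""
    total = sum(histogram)
    lut = []
    edge = 0.0
    for count in histogram:
        width = clock_period_ps * count / total  # wider bins collect more hits
        lut.append(edge + width / 2)             # center of this bin
        edge += width
    return lut


# Toy histogram with three bins of unequal width, 2400 ps clock period.
lut = build_lut([1, 3, 4], 2400.0)
print(lut)
```

Rebuilding the table after each batch of events, as the slide describes with 16K events, lets the table follow slow drifts in bin widths.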

Slide 44: Good, However
Auto calibration solved some problems. However, it won’t eliminate the ultra-wide bins.

Slide 45: Cell Delay-Based TDC + Wave Union Launcher
[Figure: input In -> Wave Union Launcher -> delay chain, registered by CLK]
The wave union launcher creates multiple logic transitions after receiving an input logic step. Wave union launchers can be classified into two types:
- Finite Step Response (FSR)
- Infinite Step Response (ISR)
This is similar to the classification of filters or other linear systems:
- Finite Impulse Response (FIR)
- Infinite Impulse Response (IIR)

Slide 46: Wave Union Launcher A (FSR Type)
[Figure: Wave Union Launcher A between In and the delay chain, registered by CLK. Input states: 1: Unleash, 0: Hold]

Slide 47: Wave Union Launcher A: 2 Measurements/hit
[Figure: wave union pattern in the delay chain after the input goes to 1 (Unleash)]

Slide 48: Sub-dividing Ultra-wide Bins
[Figure: bins of edges 1 and 2 of the wave union interleaving after Unleash]
Device: EP2C8T144C6
Plain TDC:
- Max. bin width: 160 ps.
- Average bin width: 60 ps.
Wave Union TDC A:
- Max. bin width: 65 ps.
- Average bin width: 30 ps.

Slide 49: Measurement Result for Wave Union TDC A
[Figure: histograms of raw TDC + LUT and of the wave union TDC, measured with a 53 MHz separate crystal]
Plain TDC:
- delta t RMS width: 40 ps.
- 25 ps single hit.
Wave Union TDC A:
- delta t RMS width: 25 ps.
- 17 ps single hit.

Slide 50: More Measurements
Two measurements are better than one. Let’s try 16 measurements?

Slide 51: Wave Union Launcher B (ISR Type)
[Figure: Wave Union Launcher B between In and the delay chain, registered by CLK. Input states: 1: Oscillate, 0: Hold]

Slide 52: Wave Union Launcher B: 16 Measurements/hit
1 hit gives 16 measurements at 400 MHz.
[Figure: measurements at VCCINT = 1.20 V and VCCINT = 1.18 V]

Slide 53: Delay Correction
Delay correction process:
- Raw hits TN(m) in bins are first calibrated into TM(m) in picoseconds.
- Jumps are compensated for in the FPGA so that TM(m) become T0(m), which have the same value for each hit.
- Take the average of T0(m) to get better resolution.
The raw data contains:
- U-Type Jumps: [48-63] → [16-31]
- V-Type Jumps: other small jumps.
- W-Type Jumps: [16-31] → [48-63]
The processes are all done in the FPGA.

Slide 54: The Test Module
[Photo labels: two NIM inputs; FPGA with 8-channel TDC; data output via Ethernet; BNC adapter to add delay in 150 ps steps]

Slide 55: Test Result
[Figure: NIM inputs -> LeCroy 429A NIM fan-out -> two NIM/LVDS converters -> Wave Union TDC B; BNC adapters add delays in 140 ps steps. Measured RMS: 10 ps]

Slide 56: Multi-Sampling TDC FPGA
[Block diagram: multiple sampling with clock phases c0, c90, c180, c270 (Q0-Q3); clock domain changing (QF, QE, QD); transition detection & encode (DV, T0, T1); coarse time counter (TS)]
Ultra low-cost: 48 channels in a $18.27 EP2C5Q208C7. Sampling rate: 360 MHz x 4 phases = 1.44 GHz. LSB = 0.69 ns.
Logic elements with non-critical timing are freely placed by the fitter of the compiler. The picture represents a placement of 4 channels in a Cyclone FPGA.

Slide 57: Issues of Coarse Time Counter
There are some common misunderstandings about coarse time counters in a TDC:
- Two coarse time counters are needed, driven by clocks with a 180 degree phase difference.
- The coarse time counter should be a Gray code counter.
Dual counters and/or Gray code counters are only needed in one ASIC TDC architecture. In the architectures used by FPGA TDCs and some ASIC TDCs, only one plain binary counter is needed as the coarse time counter.
[Figure: coarse time counters; Gray code counter sequence 000 001 011 010 110 111 101 100]
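For reference, the Gray code sequence shown in the figure can be reproduced with the standard binary-to-Gray conversion. This is only a software illustration of the sequence; the function name is chosen for the example.

```python
def to_gray(n):
    """Standard binary-to-Gray conversion: adjacent codes differ in one bit."""
    return n ^ (n >> 1)


gray_sequence = [format(to_gray(n), "03b") for n in range(8)]
print(" ".join(gray_sequence))
```

Only one bit changes between consecutive counts, which is why Gray counters are attractive when a counter is sampled asynchronously. As the slide notes, that situation arises in only one of the TDC architectures.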

Slide 58: Delay Line Based TDC Architectures
[Figure: architectures that delay the hit, delay the clock, or delay both, with either CLK or HIT used as the register clock. One architecture is marked: "Only this architecture needs dual coarse time counters."]

Slide 59: Implementation of Coarse Time Counter
[Block diagram: In and CLK feed the Fine Time Encoder and Hit Detect Logic; the Coarse Time Counter is captured through ENA; outputs are Fine Time, Coarse Time and Data Ready]

Slide 61: Classical Picture of Serial Communications
[Block diagram: Parallel-to-Serial Converter (X1) -> Serial-to-Parallel Converter with PLL and recovered clock -> FIFO -> Local Logic (X2)]
The parallel data is converted to serial bits, driven by crystal oscillator X1 in the transmitter device. At the receiver device, the serial data stream is used to generate a recovered clock with a phase-locked loop (PLL). The recovered clock drives the serial-to-parallel converter and stores the data into a first-in-first-out (FIFO) buffer. The FIFO buffer transfers data from the recovered clock domain to the local clock domain generated by crystal oscillator X2.

Slide 62: Serial Data Receiving Without PLL etc.
[Block diagram: Parallel-to-Serial Converter (X1) -> Digital Serial-to-Parallel Converter -> Local Logic (X2)]
Generating a recovered clock with a PLL, VCO, VCXO, etc. is an analog process and is not convenient in an FPGA, especially for applications with multiple receiving channels. There are purely digital methods to receive the serial data:
- Digital Phase Follower: 1 bit/CLK
- The Two-Cycle Serial IO: 1 bit/(2 CLK)
- FM Encoder and Decoder: 1 bit/(2-16 CLK)
- Clock-Command Combined Carrier Coding (C5): 4 bits/(20 CLK)
The transmitter and receiver can be driven by two independent free-running crystal oscillators.

Slide 63: Digital Phase Follower
The input data rate is 1 bit per clock cycle. Four clock phases, c0, c90, c180 and c270, are used to detect the input transition edge. The phase used to sample the data follows the variation of the transition edge.

Slide 64: Schematics of Digital Phase Follower
CLK: 375 MHz. Data rate: 375 Mbits/s.

Slide 65: The Two-Cycle Serial IO
This scheme is slower than the digital phase follower but the logic is simpler. CLK1 and CLK2 can be generated with two free-running crystal oscillators.
[Timing diagram: transmitter (CLK1, Data Out: start bit = 1, b15, b14, ...) and receiver (CLK2, Data In)]
One data bit is transmitted every 2 clock cycles. At the receiver, a logic transition is detected between two falling clock edges, and the input data are stable at the clock edges used for sampling.

Slide 66: Schematics of the Two-Cycle Serial IO
CLK: 200 MHz. Data rate: 100 Mbits/s.

Slide 67: The FM Coding
A bit is transmitted in two unit time intervals, usually two internal clock cycles at frequency f. For bit=1, the output toggles every cycle, i.e., with frequency f/2; for bit=0, the output toggles every two cycles, i.e., with frequency f/4. When not transmitting data, the output toggles at frequency f/4 until the start bit is seen. The data stream is naturally DC balanced and therefore suitable for AC-coupled transmission. The polarity of the interconnection doesn’t matter.
[Timing diagram: idle 0s, start bit = 1, then data bits 0, 0, 1, 1]
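The toggling rule can be modeled with one list entry per unit time interval. The sketch below is a simplified illustration: the function names are made up, and the decoder assumes it already knows where bit cells begin, so idle-pattern and start-bit detection are left out.

```python
def fm_encode(bits, level=0):
    """Two unit intervals per bit: always toggle at the start of a bit,
    and toggle again in mid-bit only when the bit is 1."""
    out = []
    for bit in bits:
        level ^= 1
        out.append(level)
        if bit:
            level ^= 1
        out.append(level)
    return out


def fm_decode(levels):
    """A bit is 1 if the level changes inside its two-interval cell."""
    return [int(levels[i] != levels[i + 1])
            for i in range(0, len(levels) - 1, 2)]


message = [1, 0, 0, 1, 1]
line = fm_encode(message)
inverted = [1 - v for v in line]  # swapped polarity of the interconnection
print(line)
print(fm_decode(line), fm_decode(inverted))
```

The decoder looks only at level changes, never at absolute levels, which is why an inverted connection decodes to the same bits.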

Slide 68: Schematics of FM Decoder
CLK: 212 MHz. Data rate: 26.5 Mbits/s.
The ratio of 8 CLK cycles/bit in this design is not an intrinsic limit.

Slide 69: The Clock-Command Combined Carrier Coding (C5)
A data train contains 5 pulses, and each pulse is transmitted in four unit time intervals, usually four internal clock cycles at frequency f. Information is carried by wide, normal and narrow pulses, and the first pulse is always wide or narrow. When not transmitting data, all pulses have normal width. The data stream is DC balanced over 5 pulses and therefore suitable for AC-coupled transmission. All leading edges are evenly spaced so that the pulse train can be used to drive the receiver-side logic or PLL directly.

Slide 70: Schematics of C5 Decoder
Data rate: 36 ns/bit or 27.7 Mbits/s. Internal clock: 111 MHz.

Slide 72: Fixed Latency Everywhere?
In a classical trigger system, all cables must have fixed propagation delay. Serial links intrinsically do not have fixed latency. Do we need fixed latency at all? No.
[Figure: front ends connected to the trigger by cables with a timing reference, compared with front ends connected through SER/DESER serial links]

Slide 73: Hit Time Coding and Transmitting
Hits in each channel are coded as bits representing small time intervals. Bit patterns are merged in a front-end module.
[Figure: detector hits coded as bit patterns such as 01000001, with 5 ns and 40 ns intervals marked, sent to the Detector Processing Board together with CLK&CMD]

Slide 74: Cable Delay Self Timing
At system initialization, all the Detector Processing Boards send out a special word in the same clock cycle as a start mark. At the receiving end, the absolute arrival time from each board can be unknown and different. However, the start mark is recognized and stored at address 0 of the corresponding receiving buffer. The words after the start mark are stored in sequence.
[Figure: several Detector Processing Boards feeding one Processing Support Board]

Slide 75: An Example
[Figure: two received streams, each showing the initial marker followed by data]

Slide 76: Hit Merging and Coincidence
[Figure: Processing Support Board -> Coincidence Module]
Hits from different inputs of the Processing Support Board are merged together with an OR function and sent out as a serial data stream. The Coincidence Module re-aligns the different streams in the receiver buffers. Inside the Coincidence Module, the coincidence is searched for as AND functions of the hit streams from opposite detector sectors. Very likely, boundary coverage logic is applied, e.g.:
- Trigger T[N] = HA[N]&&(HB[N] || HC[N]).
Boundary coverage in the time domain is also necessary. This is satisfied by checking adjacent bits in the buffered words, e.g.:
- Trigger T[N] = (HA[N+1] || HA[N] || HA[N-1])&&(HB[N] || HC[N]).
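The second trigger expression can be tried out on short bit streams. In this sketch the streams are plain lists with one bit per time interval, positions outside the list are treated as 0, and the function name is chosen for the example; these details are assumptions, not taken from the slides.

```python
def coincidence(ha, hb, hc):
    """T[N] = (HA[N+1] || HA[N] || HA[N-1]) && (HB[N] || HC[N])."""
    n = len(ha)

    def at(stream, i):
        # Bits outside the buffered range are treated as no hit.
        return stream[i] if 0 <= i < n else 0

    return [int((at(ha, i + 1) or at(ha, i) or at(ha, i - 1))
                and (hb[i] or hc[i]))
            for i in range(n)]


ha = [0, 0, 1, 0, 0, 0]  # hit in interval 2
hb = [0, 0, 0, 1, 0, 0]  # hit in interval 3, one interval later
hc = [0, 0, 0, 0, 0, 0]
trigger = coincidence(ha, hb, hc)
print(trigger)
```

The hits in HA and HB fall in adjacent intervals, so a strict same-interval AND would miss them, while the time-domain boundary coverage produces a trigger in interval 3.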

Slide 77: Post-Scripts
Some Extra Words for the Young & Old

Slide 78: About FPGA: Myths & Thinking
We commonly hear about FPGA:
- FPGA is cheap. (FPGA: $16-$1500, Micro-Processor: $100-$500.)
- FPGA is fast. (FPGA: 500MHz, Micro-Processor: 1-3GHz.)
- FPGA is large. (FPGA logic consumes more transistors.)
- FPGA can do anything. (Only if the information is collected in FPGA.)
Not really. At least it is not always the case. Good design tricks are needed in order to take full advantage of FPGA devices and to avoid their drawbacks.

Slide 79: Moore’s Law
Number of transistors in a package: x2 every 18 months.
[Figure taken from www.intel.com]

Slide 80: Status of Moore’s Law: an Inconvenient Truth
- Number of transistors: Yes, via multi-core.
- Clock speed: ?
[Figure taken from www.intel.com]

Slide 81: Complexity in FPGA Designs
Excessive Complexity in FPGA Designs = Fevers of Moore’s Law + Myths + No Thinking
Complexity causes higher FPGA cost. Complexity creates indirect costs such as PCB layout, assembly, power consumption, cooling, etc. Complexity confuses people, including designers.

Slide 82: Indirect Cost of Complexity
If something like this can do the job… … why do these?
[Figure: a simple board compared with more complex boards]

Slide 83: The Winning Line of FPGA Design
We commonly hear:
- FPGA devices contain millions of gates.
- High parallelism can be implemented in FPGA.
- FPGA cost drops by half every 18 months.
O Freunde, nicht diese Töne!
We want to emphasize, especially to our young students: 1. Creativity, 2. Creativity, 3. Creativity, on Arithmetic ops, on Algorithms, on Architectures & on All Aspects.

Slide 84: The End
Thanks
