Level 1 Pre Processor and Interface (L1PPI)
Guido Haefeli
L1 Review, 14 June 2002
Introduction
- L1PPI tasks
- Common mode suppression algorithm
- Cluster collection on the readout board
- Prototype implementation
- FPGA technology available and suitable for the L1PPI
L1 preprocessing tasks
Processing chain from the ADC to the L1 trigger (L1T), with the table or parameters each step uses:
- Input to the processor (ADC data)
- Pedestal subtraction: pedestal table
- Faulty channel masking: channel mask
- Channel reordering
- Common mode suppression: parameters
- Hit detection: hit threshold table
- Cluster encoding: parameters, alignment table
- Cluster encapsulation: event and board information
- Output to L1T
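The first steps of this chain can be sketched in C for one FE chip. This is a minimal sketch with several assumptions. ADC samples are unsigned and below 32768. There is a per-channel pedestal table. The mask uses 1 to mark a faulty channel, whose value is forced to zero. The reordering table gives, for each output position, the input channel to read. These conventions and the names `preprocess`, `mask` and `order` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define N_CHANNELS 128  /* detector channels per FE chip (from the slides) */

/* Hypothetical first pipeline stages for one FE chip.
 * adc[i]      : raw ADC sample (assumed < 32768 so the difference fits int16_t)
 * pedestal[i] : per-channel pedestal table
 * mask[i]     : 1 = faulty channel, forced to 0 (assumed convention)
 * order[j]    : input channel that becomes output position j (assumed mapping)
 */
void preprocess(const uint16_t adc[N_CHANNELS],
                const uint16_t pedestal[N_CHANNELS],
                const uint8_t mask[N_CHANNELS],
                const uint8_t order[N_CHANNELS],
                int16_t out[N_CHANNELS])
{
    int16_t tmp[N_CHANNELS];

    for (size_t i = 0; i < N_CHANNELS; i++) {
        /* Pedestal subtraction: signed result, may become negative. */
        int16_t v = (int16_t)((int32_t)adc[i] - (int32_t)pedestal[i]);
        /* Faulty channel masking. */
        tmp[i] = mask[i] ? 0 : v;
    }
    /* Channel reordering into strip order. */
    for (size_t j = 0; j < N_CHANNELS; j++)
        out[j] = tmp[order[j]];
}
```

Keeping the reordering as a lookup table means a different cabling or strip layout only changes table contents, not the processing code.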
Common mode suppression algorithms
- LCMS (linear common mode suppression)
  - The LCMS algorithm is the baseline version for CM suppression. It has been used in several simulations of the trigger performance and performs very well (see "LHCb VeLo Off Detector Electronics Preprocessor and Interface to the Level 1 Trigger", LHCb VeLo).
- Full-precision LCMS
  - The LCMS algorithm has also been tested with floating-point precision. Its performance gain over the 8-bit version is not significant.
- Regions (RCMS)
  - The same algorithm as LCMS, but the linear correction is applied to regions of only 8 or 16 detector channels. It therefore performs better at suppressing non-linear CM noise.
- FIR filter
  - By its nature, it suppresses non-linear noise much better than the LCMS type. With an FIR filter, the algorithm can be fine-tuned to suppress specific CM noise.
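The region idea behind RCMS can be sketched in C. This sketch assumes floating-point data and treats the "linear correction" as a least-squares straight line (mean plus slope) fitted and subtracted per region. The names `linear_correct` and `rcms_correct` are hypothetical. Only the per-region fit and subtraction are shown; the hit handling that the LCMS adds around it is left out.

```c
#include <stddef.h>

/* Subtract a least-squares straight line (mean + slope) from n channels.
 * The channel index is centred so that mean and slope decouple. */
static void linear_correct(float *y, size_t n)
{
    if (n == 0) return;
    float mean = 0.0f, sxy = 0.0f, sxx = 0.0f;
    float xc = (float)(n - 1) / 2.0f;          /* centre of the block */

    for (size_t k = 0; k < n; k++)
        mean += y[k];
    mean /= (float)n;

    for (size_t k = 0; k < n; k++) {
        float x = (float)k - xc;
        sxy += x * (y[k] - mean);
        sxx += x * x;
    }
    float slope = (sxx > 0.0f) ? sxy / sxx : 0.0f;

    for (size_t k = 0; k < n; k++)
        y[k] -= mean + slope * ((float)k - xc);
}

/* RCMS-style correction: the same linear correction applied independently
 * to consecutive regions of region_size channels (8 or 16 on the slide). */
void rcms_correct(float *y, size_t n_channels, size_t region_size)
{
    for (size_t start = 0; start < n_channels; start += region_size) {
        size_t left = n_channels - start;
        size_t len = left < region_size ? left : region_size;
        linear_correct(y + start, len);
    }
}
```

With `region_size` equal to the channel count, this applies one straight line to the whole block, which cannot follow a curved CM shape. Smaller regions approximate such a shape piecewise.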
[Figure: LCMS steps: calculate the slope → correct the slope → RMS calculation (RMS value) → hit detection]
Principle of the LCMS algorithm
[Figure: input data after pedestal correction → mean value calculation → mean value correction]
[Figure: second iteration: set detected hits to zero → calculate the mean value → correct the mean value → calculate the slope again → correct the slope]
[Figure: final step: apply the strip-individual hit threshold mask (channels passing it are also hits) → insert the hits previously set to zero]
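The LCMS flow on the preceding slides can be sketched in C. This is a floating-point sketch, not the 8-bit integer version run in the FPGA, and several details are assumptions:
- Mean and slope come from a least-squares straight line over the 128 channels of one FE chip.
- A first-iteration hit is a channel with |y| above `k_rms` times the RMS.
- Signals are positive.
- Following the slide order, the strip-individual thresholds are applied to the second-iteration data. The hits set to zero are then inserted again and kept as hits.

`lcms`, `correct_mean_and_slope` and `k_rms` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define N_CH 128  /* channels per FE chip (from the slides) */

/* Mean value calculation/correction, then slope calculation/correction
 * (least-squares line with the channel index centred). */
static void correct_mean_and_slope(float *y, size_t n)
{
    float xc = (float)(n - 1) / 2.0f, mean = 0.0f, sxy = 0.0f, sxx = 0.0f;
    for (size_t k = 0; k < n; k++) mean += y[k];
    mean /= (float)n;                                   /* mean value */
    for (size_t k = 0; k < n; k++) y[k] -= mean;        /* mean correction */
    for (size_t k = 0; k < n; k++) {
        float x = (float)k - xc;
        sxy += x * y[k];
        sxx += x * x;
    }
    float slope = sxx > 0.0f ? sxy / sxx : 0.0f;        /* slope */
    for (size_t k = 0; k < n; k++)
        y[k] -= slope * ((float)k - xc);                /* slope correction */
}

/* Two-iteration LCMS sketch. y: pedestal-corrected data, corrected in place.
 * k_rms: hypothetical RMS cut factor. thr: strip-individual thresholds.
 * hit: output hit flags. Returns the number of hits. */
size_t lcms(float y[N_CH], float k_rms, const float thr[N_CH], uint8_t hit[N_CH])
{
    float saved[N_CH];
    uint8_t first_hit[N_CH];

    /* First iteration. */
    correct_mean_and_slope(y, N_CH);

    float ms = 0.0f;                                    /* RMS^2 */
    for (size_t i = 0; i < N_CH; i++) ms += y[i] * y[i];
    ms /= (float)N_CH;

    for (size_t i = 0; i < N_CH; i++) {
        /* |y| > k_rms * RMS, compared on squares to avoid a square root. */
        first_hit[i] = y[i] * y[i] > k_rms * k_rms * ms;
        saved[i] = y[i];
        if (first_hit[i]) y[i] = 0.0f;                  /* set hit to zero */
    }

    /* Second iteration on the data with detected hits set to zero. */
    correct_mean_and_slope(y, N_CH);

    size_t n_hits = 0;
    for (size_t i = 0; i < N_CH; i++) {
        hit[i] = y[i] > thr[i];                         /* strip threshold */
        if (first_hit[i]) {                             /* re-insert old hits */
            y[i] = saved[i];
            hit[i] = 1;
        }
        n_hits += hit[i];
    }
    return n_hits;
}
```

Zeroing the first-iteration hits keeps large signals from pulling the second mean and slope estimate. A small signal that stays below the RMS cut can then still pass its strip threshold.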
FPGA vs DSP implementation
- Experience with the DSP-based DAQ interface implementation and with the L1PPI prototype indicates that the data rate at the L1PPI requires an FPGA-based implementation.
  - Data rate: 1 event / 900 ns
- Even with pipelined processing, the data latency of the LCMS algorithm is small (~10 µs).
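The quoted rate and latency imply the following numbers. This assumes every event carries all 128 channels of an FE chip and takes the ~10 µs latency at face value. The 8-FE line uses the grouping quoted later for the resource estimate.

```latex
\begin{aligned}
f_{\text{event}} &= \frac{1}{900\,\text{ns}} \approx 1.11\,\text{MHz} \\
t_{\text{ch}} &= \frac{900\,\text{ns}}{128} \approx 7.0\,\text{ns per channel} \\
R_{\text{FE}} &= \frac{128\ \text{channels}}{900\,\text{ns}} \approx 1.42\times 10^{8}\ \text{channels/s} \\
R_{8\,\text{FE}} &= 8\,R_{\text{FE}} \approx 1.14\times 10^{9}\ \text{channels/s} \\
N_{\text{in flight}} &\approx \frac{10\,\mu\text{s}}{0.9\,\mu\text{s}} \approx 11\ \text{events}
\end{aligned}
```

A pipelined design therefore accepts a new event every 900 ns while roughly eleven events are being processed at once.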
Pipelined processing of the LCMS: first iteration

Pipelined processing of the LCMS: second iteration
FIR filter based CM suppression
- Illustration of a CM correction algorithm based on a 36-tap FIR filter
- This kind of algorithm could be implemented in an FPGA with a similar number of gates
[Figure: generated hits (pure signal)]
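A FIR-based CM correction can be sketched in C under stated assumptions. The filter runs along the channel axis of one event and acts as a low-pass that follows the slowly varying CM, while a narrow hit barely moves it. Its output is subtracted from the input. The 36 taps match the slide, but the coefficients here are a plain moving average, a hypothetical choice since the talk does not give them. Channels beyond the edges are replaced by the outermost channel. `fir_cm_correct` and `boxcar_coeffs` are hypothetical names.

```c
#include <stddef.h>

#define N_TAPS 36  /* tap count from the slide */

/* Estimate the CM at each channel with an N_TAPS FIR filter along the
 * channel axis and subtract it. Tap t reads channel i + t - N_TAPS/2,
 * i.e. a window from i-18 to i+17; out-of-range channels are clamped. */
void fir_cm_correct(const float *in, float *out, size_t n,
                    const float coeff[N_TAPS])
{
    for (size_t i = 0; i < n; i++) {
        float cm = 0.0f;
        for (size_t t = 0; t < N_TAPS; t++) {
            long j = (long)i + (long)t - N_TAPS / 2;
            if (j < 0) j = 0;
            if (j >= (long)n) j = (long)n - 1;
            cm += coeff[t] * in[j];
        }
        out[i] = in[i] - cm;  /* signal estimate = input - CM estimate */
    }
}

/* Hypothetical coefficients: a moving average (boxcar) low-pass. */
void boxcar_coeffs(float coeff[N_TAPS])
{
    for (size_t t = 0; t < N_TAPS; t++)
        coeff[t] = 1.0f / (float)N_TAPS;
}
```

A boxcar is the crudest low-pass. The fine tuning mentioned for FIR filters would correspond to designing the 36 coefficients for the CM spectrum to be rejected. With an even tap count the window is off-centre by half a channel, so a linear CM leaves a small constant residual.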
[Figure: CM and noise components used in the illustration: linear CM, sinusoidal CM, random noise]
[Figure: signal + CM + noise, and the output of the 36-tap FIR filter (FIR36)]
Prototype test
- The LCMS algorithm has been implemented in an ALTERA APEX FPGA
- The data processing has been verified to be bit-accurate against:
  - the FPGA simulation
  - a C-coded version of the algorithm
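Bit-accurate verification means the C model must reproduce the FPGA's fixed-width integer arithmetic exactly, including truncation and saturation. Only then can its output be compared word by word with the simulation dump. This minimal sketch makes several assumptions:
- data are signed 8-bit;
- the channel count is a power of two;
- the mean uses flooring division, as an arithmetic right shift would give;
- results saturate to 8 bits.

The prototype's actual rounding and overflow rules are not given in the talk. `sat8`, `mean_correct_8bit` and `first_mismatch` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturate a wider intermediate to the signed 8-bit range (assumed behaviour). */
static int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Integer mean correction over 2^log2_n channels. The mean is floored,
 * matching what an arithmetic right shift would produce in hardware. */
void mean_correct_8bit(const int8_t *in, int8_t *out, unsigned log2_n)
{
    int32_t n = (int32_t)1 << log2_n;
    int32_t sum = 0;
    for (int32_t i = 0; i < n; i++) sum += in[i];
    int32_t mean = (sum >= 0) ? sum / n : -((-sum + n - 1) / n);  /* floor */
    for (int32_t i = 0; i < n; i++) out[i] = sat8((int32_t)in[i] - mean);
}

/* Compare the C model output with the FPGA simulation dump.
 * Returns the index of the first mismatching word, or -1 if all agree. */
long first_mismatch(const int8_t *model, const int8_t *sim, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (model[i] != sim[i]) return (long)i;
    return -1;
}
```

Making the floor and the saturation explicit avoids relying on C's implementation-defined right shift of negative values, which could otherwise make the model and the hardware disagree.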
Prototype control access with RB2
- RB2 is used as the control interface via the VME bus
- Processing on the L1PPI achieves the required throughput (900 ns per event)
[Figure: Host PC (VME master) → RB2 (VME slave, parallel interface master) → L1PPI daughter card (parallel interface slave, L1PPI with input FIFO and output FIFO)]
RB2 with L1PPI daughter card
[Figure labels: VME slave, parallel interface, L1PPI]
RB3 with L1PPI on the motherboard
[Figure labels: L1 Pre Processor, L1 Link FPGA, L1 Link card]
L1PPI cluster collection
- Per FE chip: ADC data from 1 FE chip (128 detector channels, DCH) into the L1PP
- L1PP cluster FIFO: theoretical max. 64 clusters/event, limit settable (0..64)
- Link FPGA cluster FIFO: theoretical max. clusters/event, limit settable (0..255)
- Cluster encapsulation, event FIFO
- Link buffer FIFO for 8 worst-case events
- Off detector electronics (ODE): 64 ADC ×16, 16 FE; output to L1T
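The per-FE-chip collection step can be sketched in C with a few assumptions. A cluster is a run of adjacent hit strips, encoded (hypothetically) as first strip plus size. Clusters beyond the settable limit are dropped and the event is flagged. With 128 strips at most 64 separate clusters can occur (every second strip hit), which is consistent with the theoretical maximum of 64 on the slide. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define N_CH 128            /* strips per FE chip */
#define MAX_CLUSTERS_HW 64  /* theoretical max. clusters/event at the L1PP */

typedef struct {
    uint8_t first_strip;    /* hypothetical encoding: first strip ... */
    uint8_t size;           /* ... and number of adjacent hit strips */
} cluster_t;

typedef struct {
    cluster_t c[MAX_CLUSTERS_HW];
    uint8_t n;              /* clusters stored for this event */
    uint8_t truncated;      /* 1 if clusters were dropped by the limit */
} event_clusters_t;

/* Group adjacent hit strips into clusters, keeping at most max_clusters
 * (settable 0..64; larger values are clamped to 64). */
void collect_clusters(const uint8_t hit[N_CH], uint8_t max_clusters,
                      event_clusters_t *ev)
{
    if (max_clusters > MAX_CLUSTERS_HW) max_clusters = MAX_CLUSTERS_HW;
    ev->n = 0;
    ev->truncated = 0;

    for (size_t i = 0; i < N_CH; ) {
        if (!hit[i]) { i++; continue; }
        size_t start = i;
        while (i < N_CH && hit[i]) i++;         /* extend over adjacent hits */
        if (ev->n < max_clusters) {
            ev->c[ev->n].first_strip = (uint8_t)start;
            ev->c[ev->n].size = (uint8_t)(i - start);
            ev->n++;
        } else {
            ev->truncated = 1;                  /* limit reached: drop */
        }
    }
}
```

A settable limit bounds the worst-case data volume per event, which is what makes a fixed-size buffer for a known number of worst-case events possible.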
FPGA technology available on the market
- APEX 20K
  - Standard PLD (FPGA)
  - Number of LEs depends on the device size
  - ESBs: dual-port SRAM blocks (128 × 16-bit); their number depends on the device size
  - 2.5 V core voltage
    - relatively high power consumption
  - No low-voltage I/O standards supported
    - no QDR support
- APEX 20KE and 20KC
  - Same logic structure as APEX 20K
  - Increased speed, especially the 20KC (copper)
    - the 20KC is very expensive
  - Increased I/O standard support
    - QDR supported: good for high-speed on-board interconnect, reducing the I/O count on the devices
- APEX II
  - Essentially the same logic structure as APEX 20K
  - ESB (256 × 16-bit)
  - SOPC (Nios core or ARM processor)
    - not needed
  - Advanced support for high-speed I/O
    - not needed
- STRATIX
  - LCs
  - Enhanced embedded memory architecture
    - very useful for the LCMS implementation
  - Embedded DSP blocks (MACs)
    - very useful for the LCMS (multipliers take a lot of LCs)
- Low-cost version of STRATIX (not yet announced)
  - Reduced high-speed I/O support
  - No large SRAM blocks
- VIRTEX
  - Essentially the same structure as the ALTERA APEX 20K
- VIRTEX-E
  - Corresponds to the ALTERA APEX 20KE or 20KC
- VIRTEX-II
  - Embedded multiplier blocks
    - very useful for the LCMS (multipliers take a lot of LCs)
- VIRTEX-II PRO
  - Embedded multiplier blocks
    - very useful for the LCMS (multipliers take a lot of LCs)
  - Embedded PowerPC
    - not used
- SPARTAN
  - Low-cost version of the high-end devices, usually with less high-speed I/O and less on-chip memory
LCMS FPGA implementation
- LE: logic element (basic logic cell)
- ESB: embedded system block (dual-port SRAM block)
[Table: estimated resources needed by an L1PPI processing 1 FE chip (128 detector channels)]
[Table: resource statistics for an L1PPI FPGA processing 8 FE chips, with low-quantity chip prices]
Summary
- The L1PPI has been prototyped on an FPGA with the linear common mode correction algorithm
- The increasing speed and size of the next FPGA generation, together with its additional DSP capabilities, allow very high integration on the readout board
- A complete LCMS-based L1PPI will be available on RB3
- The data latency caused by the L1PPI is < 17 µs, which is sufficiently small