Alberto Stabile 1
Overview
This presentation describes the status of the research and development of the main boards for the FTK project. We are working on a new Associative Memory Mother-Board (AM M-Board), a new LAMB, and a new transmission protocol between the clustering mezzanine and the Mother-Board.
Sections: New AM M-Board, New LAMB, Clustering Mezzanine
M-Board main functions & requirements
Re-design P3 region: Baseline solution
Use TLK2521 SERDES for the 8 road buses:
- Temporarily store roads in Spartan FIFOs
- Serialize 18:1 toward the aux board
Use TLK2521 SERDES for the 8 hit buses:
- Temporarily store hits in Spartan FIFOs
- De-serialize 1:18 toward the LAMBs
- Spread hits to all LAMBs
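P3 region floor-plan
[Figure: floor-plan of the P3 region. Labels: TLK SERDES in a TQ 64 package (12.2 mm side, 1.2 mm thick); XC6SLX16 in a CPG196 package (8 x 8 mm2, 1.2 mm thick) for the road FIFOs and the hit FIFOs; board dimension labels of roughly 1.2 cm, 2.5 cm, 3.5 cm, 6 cm and 7 cm.]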
SI simulation results for P3 region
In this work we simulate the 16 LVDS lanes between the TLK2521 and the P3 connector, in both directions. Simulations have been done in all worst cases.
Simulations have been done:
- at 1 GHz working frequency
- with pattern data generated from 200 PRBS samples
- with Sig-Explorer EMS2D, using 128 frequency points from 1 MHz to 20 GHz
- with waveforms obtained in the worst speed case
- with the connector impedance emulated by a pair of 50 Ω resistors placed in series
- with a common LVDS IBIS model for the P3 connector; in particular, this model has a V_cm equal to 1250 mV
- with lane impedance and propagation delay extracted from the layout geometries
- with the Texas Instruments IBIS model for the TLK2521
- with LVDS lines obtained with an impedance equal to [value missing]
SI simulation results
- Exhaustive simulation in all worst cases demonstrates that the signals between the TLK2521 and the P3 connector, in both directions, keep their integrity
- Some reflections are due to irregular routing geometries; however, they do not compromise the integrity of the signals
- Eye diagrams are plotted for each lane
For example: Receiver eye diagram
For example: Transmitter eye diagram
Re-design P3 region: Upgrade solution
Candidate devices: XC6SLX150T (8 GTPs), XC6VLX75T (8 GTXs)
Use an FPGA for the 8 road buses:
- Temporarily store roads in FIFOs
- Serialize 18:1 toward the aux board
Use an FPGA for the 8 hit buses:
- Temporarily store hits in FIFOs
- De-serialize 1:18 toward the LAMBs
- Spread hits to all LAMBs
Use Micrel drivers
P3 region data signal floor-plan
[Figure: block diagram of the P3 region data path. Labels: Spartan-6 FPGA XC6SLX150T (8 GTPs); Virtex-6 FPGA XC6VLX75T (8 GTXs); LAMBs; P3 connector; aux board; Micrel receivers; Micrel drivers; 8 serial links (Gbit/s value missing); road bus width 18 x 2 (2 words); hit bus label "(15 bits x 8 layers) + 2 ctrl bits".]
The old project
[Figure: LAMB block diagram. Labels: AM; INDI; Boundary scan; Glue.]
- INDI: each INDI spreads 1 hit bus among 1 half-LAMB
- Boundary scan: chip with few pins
- Glue: parallel connection toward the M-Board
Chip pinout
[Figure: LAMB block diagram, unchanged from the old project: each INDI spreads 1 hit bus among 1 half-LAMB; Boundary scan chip with few pins; Glue with parallel connection toward the M-Board; AM.]

                         Old    New
Number of hit buses      6      8
Hit bits per bus         18     15
Number of road buses     8      8
Road bits per bus        18     14  = (20 [addr] + 8 [bitmap]) / 2
Glue
[Figure: LAMB block diagram with the new Glue. Labels: AM; FPGA.]
- INDI: each INDI spreads 1 hit bus among 1 half-LAMB
- Boundary scan: chip with few pins
- Glue (new): serial link using a Spartan FPGA with 2 or 4 GTPs (1.6 GHz or 0.8 GHz)
Boundary scan
[Figure: LAMB block diagram with the new Boundary scan. Labels: AM; FPGA.]
- INDI: each INDI spreads 1 hit bus among 1 half-LAMB
- Glue: serial link using a Spartan FPGA with 2 or 4 GTPs (1.6 GHz or 0.8 GHz)
- Boundary scan (new): BGA, dense in pins and logic
INDI
[Figure: LAMB block diagram with the new INDI. Labels: AM; FPGA.]
- INDI (new): each INDI spreads 1 hit bus + 3 bits of bus 6 or bus 7 among 1 half-LAMB
- Glue: serial link using a Spartan FPGA with 2 or 4 GTPs (1.6 GHz or 0.8 GHz)
- Boundary scan: BGA, dense in pins and logic
Context & goal
The clustering algorithm has several functions:
- Improve resolution (e.g. spatial or other parameters)
- Reduce the amount of data (N hits -> 1 cluster)
[Figure: data-flow diagram. Labels: pixel detector; detector interface; 132 S-links; clustering device; Fast TracKer input stage; Level-2 event buffers; 50~100 kHz event rate.]
Algorithm
[Figure: block diagram of the clustering device. Labels: FIFO input (Row, Col, ToT); FSM & control logic; 328x8 processing matrix (Row, Col); RAM storing ToT, ~21 kbits (328 x 8 x 8 bits); Row, Col by cluster; average calculator (with ToT); output: average (x, y) cluster centers.]
ToT = time over threshold
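The average calculator stage can be illustrated with a minimal Python sketch. It assumes a plain unweighted mean of the hit coordinates of one cluster. The slide does not say whether ToT is used as a weight, so the `weighted` option and the function name `cluster_center` are hypothetical.

```python
def cluster_center(hits, weighted=False):
    """Return the (row, col) center of one cluster.

    hits: list of (row, col, tot) tuples belonging to a single cluster.
    weighted: hypothetical option that uses ToT as the weight of each hit;
              the default is a plain average of the coordinates.
    """
    if not hits:
        raise ValueError("a cluster needs at least one hit")
    weights = [tot if weighted else 1 for _, _, tot in hits]
    total = sum(weights)
    if total == 0:
        raise ValueError("total weight is zero")
    row = sum(w * r for w, (r, _, _) in zip(weights, hits)) / total
    col = sum(w * c for w, (_, c, _) in zip(weights, hits)) / total
    return row, col
```

For two hits at rows 10 and 11 of the same column, the plain average lands halfway between them; with ToT weighting it moves toward the hit with the larger ToT.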
Core logic
Logic functions:
1. Load hits
2. Select left-top-most hit
3. Propagate “selected”
4. Readout cluster
[Figure: hit matrix with axes "Column index" and "Row index".]
Step 1: load hits regardless of readout order. Any readout order is allowed.
Pisa Meeting, May 27th, Alberto Annovi - INFN Frascati
Core logic (step 2)
[Figure: hit matrix with axes "Column index" and "Row index"; blocks labelled "Priority logic" and "Control logic".]
Step 2: find the left-most, top-most hit using priority logic and control logic.
Core logic (step 3)
[Figure: hit matrix with axes "Column index" and "Row index"; the selected hit is drawn as a black pixel.]
Step 3: propagate “selected” using local logic.
Core logic (step 4)
[Figure: hit matrix with axes "Column index" and "Row index"; the selected cluster is drawn as black pixels; blocks labelled "Priority logic" and "Control logic".]
Step 4: read out the cluster using priority logic and control logic.
The (3) select propagation runs in parallel with the (4) cluster readout.
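The four logic functions can be modelled in software with a short Python sketch. This is a sequential model for illustration only, not the parallel FPGA logic. It assumes that "top-most" means the smallest row index and that pixels touching by a side or a corner belong to the same cluster; the function name `find_clusters` is hypothetical.

```python
def find_clusters(hits):
    """Group pixel hits into clusters of touching pixels.

    hits: iterable of (col, row) pairs, in any order.
    Returns a list of clusters, each a sorted list of (col, row) pairs.
    """
    matrix = set(hits)            # 1. load hits; the input order is irrelevant
    clusters = []
    while matrix:
        seed = min(matrix)        # 2. left-most column first, then top-most row
        matrix.remove(seed)
        selected = [seed]
        frontier = [seed]
        while frontier:           # 3. propagate "selected" to touching hits
            col, row = frontier.pop()
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbour = (col + dc, row + dr)
                    if neighbour in matrix:
                        matrix.remove(neighbour)
                        selected.append(neighbour)
                        frontier.append(neighbour)
        clusters.append(sorted(selected))  # 4. read out the cluster
    return clusters
```

Because the hits are stored in a set before any selection happens, the result does not depend on the order in which they arrive, which mirrors the "any readout order is allowed" property of the load step.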
How to save logic resources?
Take advantage of partial readout ordering: use a sliding window to process the complete module.
Use a 328x8 matrix:
- full r-φ length (can be squeezed)
- larger than the maximum cluster size (5)
- clock frequency 58 MHz, 3 cycles/hit, ~20 MHz hit processing rate
- area usage 32% (xc5vlx155)
Use 2 matrices: 64% of an xc5vlx155 can safely process one 40 MHz S-Link.
Sliding window: far-away clusters are unrelated.
Pisa Meeting, May 27th, 2009, Alberto Annovi - INFN Frascati
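The rate and area figures on this slide follow from simple arithmetic, sketched below in Python. The sketch assumes that the hit rate and the area both scale linearly with the number of matrices; the constant and function names are hypothetical.

```python
CLOCK_MHZ = 58.0        # clock frequency quoted for one 328x8 matrix
CYCLES_PER_HIT = 3      # processing cycles needed per hit
AREA_PER_MATRIX = 0.32  # fraction of an xc5vlx155 used by one matrix


def hit_rate_mhz(n_matrices=1):
    """Hit processing rate in MHz, assuming linear scaling with matrices."""
    return n_matrices * CLOCK_MHZ / CYCLES_PER_HIT


def area_fraction(n_matrices=1):
    """Fraction of the FPGA used, assuming linear scaling with matrices."""
    return n_matrices * AREA_PER_MATRIX
```

One matrix gives 58 / 3, about 19.3 MHz, which the slide rounds to ~20 MHz; two matrices give about 38.7 MHz and use 64% of the device.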
Clustering by 328x8 slices?
[Figure: module data along the eta direction. Sequence: fill the 328x8 slice, read out the 1st cluster, read out the 2nd cluster, and so on.]
Shift of hits comes for free (no extra time)! Just use the slice as a circular buffer in the eta direction. Hits are then shifted by redefining the first column.
SLIDING WINDOW: one xc5vlx155 processes one S-Link. Implement 2 processing matrices. Process hits at a 40 MHz rate.
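The circular-buffer idea can be sketched in Python as follows. This is an illustrative software model, not the firmware: the class name `SlidingWindow`, its methods, and the choice to clear a column when it leaves the window are all assumptions.

```python
class SlidingWindow:
    """A rows x width slice used as a circular buffer along eta."""

    def __init__(self, rows=328, width=8):
        self.rows = rows
        self.width = width
        self.first_col = 0   # absolute eta column at the start of the window
        self.cells = [[False] * width for _ in range(rows)]

    def _physical(self, col):
        """Map an absolute eta column to its storage column."""
        offset = col - self.first_col
        if not 0 <= offset < self.width:
            raise IndexError("column outside the current window")
        return col % self.width

    def load(self, row, col):
        self.cells[row][self._physical(col)] = True

    def is_hit(self, row, col):
        return self.cells[row][self._physical(col)]

    def slide(self, n=1):
        """Advance the window by n columns without moving any stored hit."""
        for _ in range(n):
            physical = self.first_col % self.width
            for row in self.cells:
                row[physical] = False   # recycle the column leaving the window
            self.first_col += 1
```

Sliding only changes `first_col` and frees one storage column; every hit that is still inside the window stays in the same storage cell, which is why the shift costs no data movement.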
Implemented & simulated
Using a Xilinx Virtex-5 (xc5vlx330)
Timing (cycles): 2/hit + 2/cluster; could be reduced to 2/hit
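As a small worked example, the timing figures can be turned into a cycle count for one module. The Python sketch below simply applies the quoted per-hit and per-cluster costs; the function name and the `reduced` flag (for the possible 2/hit timing) are hypothetical.

```python
def processing_cycles(n_hits, n_clusters, reduced=False):
    """Clock cycles needed to process n_hits grouped into n_clusters.

    reduced: if True, use the possible reduced timing of 2 cycles/hit
             with no extra per-cluster cost.
    """
    per_cluster = 0 if reduced else 2
    return 2 * n_hits + per_cluster * n_clusters
```

For example, 10 hits forming 3 clusters take 26 cycles with the current timing and 20 cycles with the reduced one.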
Resources and clock speed
[Figure: FPGA usage and clock period versus matrix size; plot label: xc5vlx330.]
FPGA usage and clock period increase for large matrices. For a 328x144 matrix, area usage is ~250%. Now what? Take advantage of the readout order (which depends on the actual detector).
Conclusion
Buses organization
Roads:
- 20 bits for addresses on the AM chips, to address patterns that flow in 4-chip pipelines
- + 3 bits in the LAMB Glue to distinguish the 8 pipelines
- + 2 bits in the M-Board to distinguish the 4 LAMBs
- + 8 bits for bitmap signals
- + 3 bits for control signals (EE, EP, WR_EN)
- = 36 bits in total
We can split these 36 bits into 2 words of 18 bits:
- [First word] = WR_EN + 17 address bits
- [Second word] = EE + EP + 8 address bits + 8 bitmap bits
Hits:
- 15 bits for data
- 2 bits for control signals (EE, WR_EN)
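The split of a 36-bit road into two 18-bit words can be sketched in Python. The slide gives only the content of each word, not the bit positions, so the layout below is an assumption: control bits in the most significant positions, the upper 17 address bits in the first word, the lower 8 address bits and the bitmap in the second word. The function names are hypothetical.

```python
ADDR_BITS = 25    # 20 (AM chips) + 3 (LAMB Glue) + 2 (M-Board)
BITMAP_BITS = 8
WORD_BITS = 18


def pack_road(address, bitmap, wr_en, ee, ep):
    """Pack one road into (first_word, second_word), 18 bits each.

    Assumed layout (not specified on the slide):
      first  = WR_EN | address[24:8]
      second = EE | EP | address[7:0] | bitmap[7:0]
    """
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address must fit in 25 bits")
    if not 0 <= bitmap < (1 << BITMAP_BITS):
        raise ValueError("bitmap must fit in 8 bits")
    addr_high = address >> 8      # upper 17 address bits
    addr_low = address & 0xFF     # lower 8 address bits
    first = (int(bool(wr_en)) << 17) | addr_high
    second = ((int(bool(ee)) << 17) | (int(bool(ep)) << 16)
              | (addr_low << 8) | bitmap)
    return first, second


def unpack_road(first, second):
    """Inverse of pack_road: return (address, bitmap, wr_en, ee, ep)."""
    wr_en = (first >> 17) & 1
    address = ((first & 0x1FFFF) << 8) | ((second >> 8) & 0xFF)
    ee = (second >> 17) & 1
    ep = (second >> 16) & 1
    bitmap = second & 0xFF
    return address, bitmap, wr_en, ee, ep
```

Both words stay below 2^18, and unpacking a packed road returns the original fields, which is a quick check that the 25 + 8 + 3 = 36 bits fit the two-word format.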