A Low-Power High-Speed Serializer for the LHCb Pixel Detector
Antonio Pellegrino & Vladimir Gromov, Nikhef
Madrid INFIERI Workshop, 21-01-2014
Outline:
- Introduction to the LHCb Pixel Detector
- Pixel Readout ASIC
- Data Transmission
- serializer and wireline driver
- power consumption
- status and outlook

LHCb Forward Spectrometer
[Figure: spectrometer layout; dimensions ~10 m and ~20 m; angular acceptance 10–300 mrad and 10–250 mrad]
November 14, 2018 Antonio Pellegrino

VErtex LOcator (VELO)
- Installation foreseen during LHC LS2
- 26 stations along ~1 m
- Total active area 1237 cm² (= size of an A3 sheet of paper)

Sensor Layout
- Sensor tile: ~43 mm
- 4 sensors per side
- 3 chips per sensor
- 256 × 256 pixels per chip
- Distance from the beam to the closest chip: 5 mm
[Figure: sensor tiles and ASICs arranged around the beam]

Pixel and Readout
- Sensor tiles: 3 VeloPix readout ASICs (~15 mm each) on a ~43 mm sensor
- 55 × 55 µm² pixels
- 4 sensor tiles, 2 on each side of the substrate
- Power and readout traces on kapton
- Whole VELO: ~41 million pixels
- Si substrate with micro-channel cooling
- Material in active region: ~0.8 % X0
[Figure: module cross-section. Si substrate 400 µm with micro-channels 200 µm × 120 µm and cooling in/outlets; on the top and bottom faces: glue 50 µm, ASIC 200 µm, sensor 200 µm]

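The totals quoted on the sensor slides can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that each of the 26 stations carries two modules, one on each side of the beam, with 4 sensors per module, and that the pixel pitch is 55 µm. The two-modules-per-station reading and all names are assumptions of this sketch, not statements from the slides.

```python
# Cross-check of the VELO pixel count and active area.
# Assumption (not stated explicitly on the slides): every station has
# two modules, one per side of the beam, each with 4 sensors.

STATIONS = 26
SIDES_PER_STATION = 2        # assumed: one module on each side of the beam
SENSORS_PER_SIDE = 4
CHIPS_PER_SENSOR = 3
PIXELS_PER_CHIP_EDGE = 256   # 256 x 256 pixels per chip
PIXEL_PITCH_UM = 55          # assumed pitch of the square pixels, in micrometres


def velo_totals():
    """Return (number of chips, number of pixels, active area in cm^2)."""
    chips = STATIONS * SIDES_PER_STATION * SENSORS_PER_SIDE * CHIPS_PER_SENSOR
    pixels = chips * PIXELS_PER_CHIP_EDGE ** 2
    # 1 um = 1e-4 cm, so the active edge of one chip in cm is:
    chip_edge_cm = PIXELS_PER_CHIP_EDGE * PIXEL_PITCH_UM * 1e-4
    area_cm2 = chips * chip_edge_cm ** 2
    return chips, pixels, area_cm2


if __name__ == "__main__":
    chips, pixels, area = velo_totals()
    print(f"{chips} chips, {pixels / 1e6:.1f} million pixels, {area:.0f} cm^2")
```

Under these assumptions the sketch gives 624 chips, about 40.9 million pixels and about 1237 cm², in line with the figures quoted on the slides.
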
Start from Existing TimePix3
- 130 nm CMOS, 8 metal layers, 170 million transistors
- CERN, Nikhef and Bonn University
- 8 serial output links running at 0.64 Gbps
- SPIDR readout using a Xilinx Virtex-7 FPGA

Timepix3 → VeloPix Readout ASIC
So Timepix3 is a very nice start, but…
- Increase hit-rate capability by a factor of 8
  - grouping of pixel hits (2 × 4 super pixels): 30% data reduction
  - optimize buffering
  - increase output bandwidth
- Output bandwidth of VeloPix > 13 Gbit/s (average; 20 Gbit/s peak)
- Shorter-term future: 4 links at ~5 Gbit/s
  - Designing a chip for these readout rates is already a huge challenge
  - Higher speeds are not really possible in this 130 nm technology
- Longer-term future: overcome intrinsic limitations with 65 nm CMOS
- < 3 W per chip @ 1.5 V (1.5 W/cm²)

Overview of Data Transmission
- Data volume of whole VELO: ~2.5 Tbit/s
- Electrical-to-optical conversion outside the vacuum tank
  - lower radiation level, more easily accessible
- LHCb common DAQ boards (TELL40, ATCA)
  - 4 mezzanines with a powerful FPGA (Stratix V or VI)
  - 24 optical links in, max. 12 × 10 Gigabit Ethernet out
[Figure: readout chain. Differential copper links (~1 m) → vacuum feedthrough → electrical-to-optical conversion → optical links (~60 m) → TELL40 FPGAs, max. 24 optical links each → CPU farm]

Intermezzo on Gigabit Copper Link
- Electrical-to-optical conversion outside the vacuum tank: lower radiation level, more easily accessible
- Cu link must be rad-hard, low-outgassing and flexible
- Using DuPont Pyralux AP-Plus ‘kapton’, specially designed for HF applications
- Measurements compared to simulations with the ADS Momentum 3D simulator
- Transmission looks promising for 0.5–1 m of cable
[Figure: eye diagram for 100 cm cable length]

Data Rates
- In the hottest area: O(600) million hits/s per chip
- 30 bits per hit → ~18 Gbit/s per chip
- 4 × 5 Gbit/s links per chip
- Incidentally, radiation levels are also high: of order 400 Mrad in a 10-year lifetime and about 8 × 10^15 1 MeV n_eq

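The link count follows directly from the hit rate. As a minimal sketch in Python (function names are illustrative, not from the slides), the per-chip output rate and the number of ~5 Gbit/s links needed can be computed as:

```python
import math


def chip_output_rate_gbps(hits_per_second: int, bits_per_hit: int) -> float:
    """Raw output rate of one chip in Gbit/s."""
    return hits_per_second * bits_per_hit / 1e9


def links_needed(rate_gbps: float, link_rate_gbps: float) -> int:
    """Smallest number of links whose combined rate covers rate_gbps."""
    return math.ceil(rate_gbps / link_rate_gbps)


if __name__ == "__main__":
    # Numbers from the slide: O(600) million hits/s, 30 bits per hit.
    rate = chip_output_rate_gbps(600_000_000, 30)
    print(f"{rate:.1f} Gbit/s per chip, "
          f"{links_needed(rate, 5.0)} links at 5 Gbit/s")
```

With the slide's numbers this reproduces the ~18 Gbit/s per chip and the four links per chip.
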
Is GBT a Viable Option?
- The GBT chip set is developed at CERN, http://cern.ch/proj-gbt
- The GBT serializer uses:
  - a 120-bit register
  - a PLL and very fast clocks (1.6 and 4.8 GHz)

GBT vs Our Design (GWT)
GBT serializer (the GBT chip set is developed at CERN, http://cern.ch/proj-gbt)
- Input format 120 bits @ 40 MHz, 4.8 Gbps per serializer
- With four links, total bandwidth 19.2 Gbps
- Metal stack (LM) different from the one (DM) used in VeloPix
- High power consumption, ~1.44 W per chip
Our design (GWT: lower power, line driver with pre-emphasis)
- Input format 16 bits @ 320 MHz DDR, effective 5.12 Gbps per serializer
- With four links, total bandwidth 20.48 Gbps
- Same metal stack (DM) as used in VeloPix
- Low power consumption, 0.3 W per chip

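The bandwidth figures in this comparison are word width times word rate, times four links. The Python sketch below recomputes them and adds an energy-per-bit figure, which is a derived metric not shown on the slide; it assumes the quoted per-chip power covers all four serializers running at the full line rate. Class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Serializer:
    name: str
    word_bits: int        # bits loaded per word clock cycle
    word_rate_hz: int     # word clock frequency
    links_per_chip: int
    chip_power_w: float   # power quoted per chip on the slide

    def link_rate_bps(self) -> int:
        return self.word_bits * self.word_rate_hz

    def chip_rate_bps(self) -> int:
        return self.link_rate_bps() * self.links_per_chip

    def energy_per_bit_pj(self) -> float:
        # Assumes chip_power_w is spent on all links at full rate.
        return self.chip_power_w / self.chip_rate_bps() * 1e12


GBT = Serializer("GBT", word_bits=120, word_rate_hz=40_000_000,
                 links_per_chip=4, chip_power_w=1.44)
GWT = Serializer("GWT", word_bits=16, word_rate_hz=320_000_000,
                 links_per_chip=4, chip_power_w=0.3)

if __name__ == "__main__":
    for s in (GBT, GWT):
        print(f"{s.name}: {s.link_rate_bps() / 1e9:.2f} Gbps per link, "
              f"{s.chip_rate_bps() / 1e9:.2f} Gbps per chip, "
              f"{s.energy_per_bit_pj():.1f} pJ/bit")
```

Under that assumption the quoted numbers correspond to roughly 75 pJ/bit for GBT and roughly 15 pJ/bit for GWT.
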
Our Design (“GWT”)
GWT: a 5.12 Gbps byte-interleaved serializer / wireline transmitter
[Figure: block diagram. Two 8-bit registers at 320 MHz, Reg<0> on posedge and Reg<1> on negedge, feed a MUX that is stepped by 16 phases from a multi-phase DLL driven by the 320 MHz clock (16 × 195 ps = 1 / 320 MHz). The MUX output goes to a driver with pre-emphasis that sends data at 5.12 Gbps on out+/out- (20 mA, ∆U = ±350 mV) over 50 Ω lines of a low-mass flex cable into a 100 Ω termination.]

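The multiplexer-based serialization can be illustrated with a small behavioural model. The Python sketch below is not the circuit itself: it assumes that Reg<0> supplies the first eight bits and Reg<1> the last eight, least-significant bit first, and that DLL phase k selects multiplexer input k through a one-hot select. The bit ordering and all names are assumptions of this sketch.

```python
CLOCK_HZ = 320_000_000   # word clock from the slide
N_PHASES = 16            # DLL output phases per clock period


def unit_interval_ps() -> float:
    """Duration of one serial bit: one clock period split into 16 phases."""
    return 1e12 / (CLOCK_HZ * N_PHASES)


def serialize_word(reg0: int, reg1: int) -> list:
    """Serialize two 8-bit registers into 16 bits, one bit per DLL phase.

    Assumed ordering: reg0 bits 0..7 first, then reg1 bits 0..7.
    """
    if not (0 <= reg0 <= 0xFF and 0 <= reg1 <= 0xFF):
        raise ValueError("registers must hold 8-bit values")
    mux_inputs = [(reg0 >> i) & 1 for i in range(8)]
    mux_inputs += [(reg1 >> i) & 1 for i in range(8)]
    serial = []
    for phase in range(N_PHASES):
        # One-hot select: only the input matching the active phase passes.
        select = [1 if k == phase else 0 for k in range(N_PHASES)]
        serial.append(sum(s & d for s, d in zip(select, mux_inputs)))
    return serial


if __name__ == "__main__":
    print(f"unit interval: {unit_interval_ps():.1f} ps")
    print(serialize_word(0xA5, 0x3C))
```

The point of the scheme is that no clock faster than 320 MHz is needed: the 5.12 Gbps bit timing comes from the spacing of the 16 phases, about 195 ps each.
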
Our Design (“GWT”)
GWT is not a “better copy” of GBT, but a different design:
- no shift register, but selection from a multiplexer
- different multiplexer design
- no super-fast (1.6, 4.8 GHz) clocks (max. 320 MHz)
- DLL instead of PLL
- wireline transmitter included (GBT has only a laser driver)
- only DM metal stack (also usable in a 65 nm process)

Voltage-Controlled DLL
- 16 × delay cell, output phases ph_0, ph_1, ph_2 … ph_15
- current-starved delay cells
- full CMOS signals on each output phase
- not sensitive to the clock duty cycle (only the posedge delay is important)
[Figure: delay-line and delay-cell schematics with nodes Vcntr, VN, IN, OUT and dummy_PD blocks; transistor size labels 16.5/0.2, 2 × 4.96/0.12, 4 × 0.81/0.12, 15/0.2, 6/0.2, 4.5/0.12, 1.5/0.12]

Differential 16-to-1 Multiplexer
[Figure: multiplexer schematic with 16 branches; branch i (i = 0 … 15) has a select input sel_i and complementary data inputs data_i and _data_i; outputs mux_pos and mux_neg]

Wireline Driver
- Electrical-to-optical conversion outside the vacuum tank requires a wireline driver (GBT only has a laser driver)
- VeloPix drives 5.12 Gbps on out+/out- (Udiff = ±350 mV) into a lossy low-mass flex cable
- Due to the losses in the transmission line (dielectric, resistive, skin effect), successive symbols (bits) “blur” together, causing errors in the data
[Figure: Inter-Symbol Interference (ISI) after transmission in lossy cables]

Our Wireline Driver Design
- To mitigate inter-symbol interference in lossy cables we use pre-emphasis with AC coupling
- Good signal-transmission performance: wide opening of the eye diagram at the receiver end, ±350 mV

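Why pre-emphasis opens the eye can be shown with a toy simulation. The Python sketch below is not the AC-coupled driver described on the slide: it swaps in a discrete-time stand-in, with the lossy cable modelled as a single-pole low-pass filter sampled once per bit and the pre-emphasis as a two-tap FIR filter normalized to the same peak swing. The channel model, the coefficient 0.6 and all names are assumptions of this sketch.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list:
    """PRBS7 test pattern (x^7 + x^6 + 1) from a 7-bit shift register."""
    state, bits = seed, []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def lossy_channel(samples: list, alpha: float) -> list:
    """Toy cable: y[n] = (1 - alpha) * x[n] + alpha * y[n-1]."""
    out, prev = [], 0.0
    for s in samples:
        prev = (1.0 - alpha) * s + alpha * prev
        out.append(prev)
    return out


def pre_emphasize(symbols: list, beta: float) -> list:
    """Two-tap FIR boost of transitions, scaled so the peak swing stays 1."""
    out, prev = [], 0.0
    for s in symbols:
        out.append((s - beta * prev) / (1.0 + beta))
        prev = s
    return out


def eye_margin(symbols: list, received: list) -> float:
    """Worst-case margin at the sampling instant; negative means bit errors."""
    return min(s * r for s, r in zip(symbols, received))


def eye_margins(alpha: float = 0.6, n_bits: int = 127) -> tuple:
    """Return (margin without pre-emphasis, margin with pre-emphasis)."""
    symbols = [2.0 * b - 1.0 for b in prbs7(n_bits)]
    plain = eye_margin(symbols, lossy_channel(symbols, alpha))
    boosted = eye_margin(
        symbols, lossy_channel(pre_emphasize(symbols, alpha), alpha))
    return plain, boosted


if __name__ == "__main__":
    plain, boosted = eye_margins()
    print(f"eye margin without pre-emphasis: {plain:+.3f}")
    print(f"eye margin with pre-emphasis:    {boosted:+.3f}")
```

In this toy model the uncompensated eye is closed (negative margin after a long run of identical bits), while the pre-emphasized signal keeps a constant positive margin at the cost of a smaller steady-state amplitude.
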
Preliminary Layout (450 × 210 µm²)
[Figure: layout with labelled blocks: low-pass filter capacitance array, charge pump, MUX, wireline driver, in register, VCDL, LD, PD; pads: 3 × ref. clock pad, 2 × output pad]

BEOL Metallizations (cmrf8sf)
[Figure: side-by-side cross-sections of the metal stacks (MA 3-2-3, MA 4-1-3 and LM 6-2) for the test chip (MOSIS), the GBTX serializer and the VELOpix serializer. The front-end-of-line layers and the lowest metal layers are marked identical; the upper layers are marked different. Layer labels and values shown: M1 (0.29); M2 to M6 (0.32); V1 to V5 (0.35); MQ, MG (0.55); VL, VQ (0.65); LY (0.46); FY (1.4); E1 (3); MA, F1, FT (4).]
N.B. since only the DM stack is used, the GWT topology can be directly “exported” to a 65 nm process (higher speed, lower power, etc.)

Summary & Conclusions
- We are designing a 130 nm readout ASIC for the LHCb pixel detector
- Designing for on-chip rates of ~20 Gbps is very hard
- Importing the GBT scheme into our ASIC is an option, but not our favorite
- We designed an alternative, “GWT”: a 5.12 Gbps serializer / wireline transmitter
  - 128-bit frame @ 40 MHz (120 bits for pixel data)
  - input format 16 bits @ 320 MHz DDR, 5.12 Gbps per serializer
  - 4 GWTs per chip, total bandwidth 20.48 Gbps
  - DM metal stack used in the design
  - low power consumption, 0.3 W
- Simulations and functional studies done; schematics ready for submission on a multi-project wafer run in a few weeks

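The frame format ties the numbers in this summary together. The Python sketch below checks that a 128-bit frame at 40 MHz matches the 16-bit, 320 MHz input format, and estimates the hit capacity per chip. The capacity estimate assumes that whole 30-bit hits (the size quoted on the Data Rates slide) are packed into the 120 payload bits; that packing and all names are assumptions of this sketch, not statements from the slides.

```python
FRAME_BITS = 128            # bits per frame
PAYLOAD_BITS = 120          # bits per frame available for pixel data
FRAME_RATE_HZ = 40_000_000  # frame rate
WORD_BITS = 16              # serializer input word width
LINKS_PER_CHIP = 4          # GWTs per chip
BITS_PER_HIT = 30           # from the Data Rates slide


def frame_budget() -> dict:
    """Derive link rate, word clock and (assumed) hit capacity per chip."""
    words_per_frame = FRAME_BITS // WORD_BITS
    # Assumed packing: only whole hits fit into the payload.
    hits_per_frame = PAYLOAD_BITS // BITS_PER_HIT
    return {
        "link_rate_bps": FRAME_BITS * FRAME_RATE_HZ,
        "word_clock_hz": words_per_frame * FRAME_RATE_HZ,
        "payload_rate_per_chip_bps":
            PAYLOAD_BITS * FRAME_RATE_HZ * LINKS_PER_CHIP,
        "hits_per_second_per_chip":
            hits_per_frame * FRAME_RATE_HZ * LINKS_PER_CHIP,
    }


if __name__ == "__main__":
    for key, value in frame_budget().items():
        print(f"{key}: {value:,}")
```

Under these assumptions the four links carry 19.2 Gbps of pixel data, or 640 million hits/s per chip, slightly above the O(600) million hits/s expected in the hottest area.
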