Development of a low power 5.12Gbps Data Serializer and Wireline Transmitter circuit for the VeloPix chip
Vladimir Gromov1, Vladimir Zivkovic1, Martin van Beuzekom1, Xavi Llopart2, Ken Wyllie2, Jan Buytaert2, Michael Campbell2, Tuomas Poikela2,3, Massimiliano de Gaspari2
1 National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
2 CERN, Geneva, Switzerland
3 University of Turku, Finland
TWEPP 2014, Aix-en-Provence, France. September 25, 2014
Outline
- VeloPix pixel readout chip for the VELO detector upgrade in the LHCb experiment
- data serializer circuit: a shift-register-free topology
- circuit design aspects and experimental results
- take-aways
TWEPP 2014 V.Gromov 25/09/14
Upgrade of VELO detector in LHCb
LHCb upgrade:
- long shutdown 2 (LS2) (2018)
- luminosity of 2 x 10³³ cm⁻²s⁻¹ (5x present)
VELO:
- hybrid pixel detector
- trigger-less
- 26 stations (layers)
- total active area 1237cm² (A3 size)
- vacuum compatible
- low material budget
- lowest possible power
- radiation-hard electronics
VELO detector: station layout
- 2 modules per station
- 1 tile = 1 sensor + 3 ASICs (VeloPix)
- 4 tiles on both sides of the module
- planar silicon sensor, electron collection
- 5.1mm from the beam to the sensor
ASIC specifications:
- 500Mhits/sec/cm²
- 55µm x 55µm pixel size
- 50khits/sec/pixel (~ HL-LHC in 2025)
[Figure: sensor tile (~15mm x ~43mm) relative to the beam; module cross section: connector, top sensor 200µm, ASIC 150µm, substrate 400µm with cooling channel, bottom sensor 200µm, ASIC 150µm]
Data rates and VeloPix
VeloPix, successor of Timepix3:
- pixel size: 55µm x 55µm
- pixels: 65 536 (256 x 256)
- area: 1.4cm x 1.4cm
- binary readout (no ToT)
- resolution/range: 25ns, 9b
- power: < 1.5W/cm²
- up to 400Mrad, SEU tolerant
- technology: 130nm CMOS
- output data rate: > 15Gbps (5x HL-LHC)
VELO total: up to 2.9Tbps, highly non-uniform radiation pattern
[Figure: data rate per chip [Gbps]]
VeloPix readout chip
- analog FE pile-up losses < 1.6%
- pixel grouping for sharing BX ID & SP ID to reduce data rate (30%)
- fast and efficient readout architecture (losses < 1%, latency < BX ID range)
- data traffic equalization
- output electrical link: 4 x 5.12Gbps
[Block diagram: pixel matrix of 128 double columns (14.08 mm) x 256 rows (14.08 mm), with super pixels SP0 .. SP63 in each double column (Analog Front-end, Pixel processor, Super Pixel logic); Double Column buses (DC bus: 23bit @ 40MHz) → EoC1 .. EoC128 → Data Fabric and Packet Router (4 x 4 crossbar), bus: 30bit @ 160MHz → Periphery (2-3mm): 4 x Packet Converter → 8bit DDR @ 320MHz → 4 x GWT Serializer → 4 x Driver → 1bit @ 5.12 GHz each]
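The bus widths and clock rates quoted in the block diagram can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures from the slide; the helper name `bus_rate_gbps` is hypothetical, and the result is raw bus capacity, not measured occupancy.

```python
def bus_rate_gbps(width_bits, clock_mhz, ddr=False):
    """Raw capacity of a parallel bus in Gbps (DDR transfers on both clock edges)."""
    transfers_per_cycle = 2 if ddr else 1
    return width_bits * clock_mhz * transfers_per_cycle / 1000.0

dc_bus = bus_rate_gbps(23, 40)                # one double-column bus
router_bus = bus_rate_gbps(30, 160)           # one packet-router bus
gwt_input = bus_rate_gbps(8, 320, ddr=True)   # packet converter -> GWT serializer
chip_output = 4 * gwt_input                   # four serial links per chip

print(f"DC bus:            {dc_bus:.2f} Gbps")
print(f"router bus:        {router_bus:.2f} Gbps")
print(f"GWT input (DDR):   {gwt_input:.2f} Gbps")
print(f"chip output (4x):  {chip_output:.2f} Gbps")
```

The 8bit DDR input at 320MHz carries exactly the 5.12Gbps that each serial link transmits, and four links together exceed the > 15Gbps output requirement.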
Conventional Data Serializer (example GBTX)
[Block diagram: data_in 16bit @ 320MHz → Reg <0:15> → Shift Register (stages D0/Q0 .. D15/Q15, a MUX in front of each stage, common Sel) → data_out 1bit @ 5.12GHz; PLL 320MHz → 5.12GHz generates clock (5.12GHz) from clock (320MHz)]
significant power consumption due to:
- a shift register driven by a high-frequency clock (5.12GHz)
- on-board PLL to generate the high-frequency clock
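A behavioural sketch of the load-then-shift scheme makes the power argument concrete: every stage of the shift register is clocked on every fast-clock cycle. The bit order (bit 0 leaves first) and the function name are assumptions made for illustration; the slide does not specify them.

```python
def shift_register_serialize(word_bits):
    """Serialize one parallel word with a load-then-shift register.

    Returns the serial bit stream and the number of flip-flop clock events,
    counting one event per stage per fast-clock cycle.
    """
    n = len(word_bits)
    reg = [0] * n
    out = []
    ff_clock_events = 0
    for cycle in range(n):
        if cycle == 0:
            reg = list(word_bits)     # Sel = load: the MUXes pass data_in
        else:
            reg = reg[1:] + [0]       # Sel = shift: each stage takes its neighbour
        ff_clock_events += n          # all n flip-flops see the fast clock
        out.append(reg[0])            # assumed: bit 0 leaves first
    return out, ff_clock_events

word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
serial, events = shift_register_serialize(word)
fast_clock_mhz = 16 * 320             # 16 bits per 320 MHz word period
print(serial == word, events, fast_clock_mhz)
```

For a 16-bit word this gives 256 flip-flop clock events per word, all at the 5.12GHz rate, which is the cost the shift-register-free topology is meant to avoid.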
GWT : a low power 5.12 Gbps byte-interleaved serializer / wireline transmitter
[Block diagram: data 8bit @ 320MHz → Reg <8:15> (posedge, 0.05mW) and data 8bit @ 320MHz → Reg <0:7> (negedge, 0.05mW) → MUX 16 phases → serialized output data 1bit @ 5.12 Gbps; clock (320MHz) → Multi-phase DLL → ph_0 .. ph_15 (16 x 195ps = 1 / 320MHz) → Edge-combiner → sel_0 .. sel_15 → MUX; further power annotations on the diagram: 2mW, 0.6mW, 10mW]
[Timing diagram: clock (320MHz); MUX selection phases cycling through the 16 phases; data_in_reg < 3 : 10 > and data_in_reg < 11 : 2 >; data_out_serializer emits D<11>, D<12>, .. D<15>, D<0>, D<1>, .. D<10>, D<11>, .. in phase order, one bit per selection phase]
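The byte-interleaved scheme can be sketched behaviourally: two 8-bit registers are loaded on opposite edges of the 320MHz clock, and the 16 selection phases pick one register bit per unit interval, so no flip-flop runs at the bit rate. The model below is idealized and rests on assumptions not stated on the slide: even-numbered input bytes go to Reg<0:7>, odd-numbered ones to Reg<8:15>, and each register is read out during the half period after it was loaded.

```python
UI_PS = 1e6 / 5120.0              # one unit interval at 5.12 Gbps, in ps

def gwt_serialize(byte_stream):
    """Idealized byte-interleaved serializer.

    byte_stream: bytes as 8-element bit lists, arriving alternately on the
    falling and the rising edge of the 320 MHz clock (8bit DDR input).
    """
    reg_lo = [0] * 8              # Reg<0:7>, loaded on negedge
    reg_hi = [0] * 8              # Reg<8:15>, loaded on posedge
    out = []
    for i, byte in enumerate(byte_stream):
        if i % 2 == 0:
            reg_lo = list(byte)   # negedge: load the low byte ...
            out.extend(reg_hi)    # ... while sel_8..sel_15 read D<8>..D<15>
        else:
            reg_hi = list(byte)   # posedge: load the high byte ...
            out.extend(reg_lo)    # ... while sel_0..sel_7 read D<0>..D<7>
    return out

stream = [[(b >> i) & 1 for i in range(8)] for b in (0x3C, 0xA5, 0x0F, 0x81)]
bits = gwt_serialize(stream)
print(len(bits), 16 * UI_PS)      # 16 unit intervals span one 320 MHz period
```

In this model the output is the input byte stream delayed by half a clock period: the first eight bits are the reset content of Reg<8:15>, and each register has half a period to settle before the multiplexer reads it.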
GWT : a low power 5.12 Gbps byte-interleaved serializer / wireline transmitter
[Block diagram: data 8bit @ 320MHz → Reg <8:15> (posedge, 0.05mW) and data 8bit @ 320MHz → Reg <0:7> (negedge, 0.05mW) → MUX 16 phases → Driver with pre-emphasis (20mA, 45mW, 50Ω / 50Ω outputs) → serialized data @ 5.12 Gbps over 1m low-mass flex cable → 100Ω, ∆U = ± 450mV; Multi-phase DLL → ph_0 .. ph_15 (16 x 195ps = 1 / 320MHz) → Edge-combiner → sel_0 .. sel_15; further power annotations on the diagram: 2mW, 0.6mW, 10mW]
low-power topology:
- serializer: 15mW
- wireline transmitter: 45mW
delay-locked loop (DLL)-based topology:
- lower phase noise (no jitter accumulation)
- lower power
- harmonic (false) locking (solved by design)
- sensitive to the noise of the reference clock
Voltage-Controlled Delay Line (DLL)
[Schematic: 16 x delay cell with inputs IN, Vcntr and VN, output OUT, dummy_PD blocks, transistor sizes 15/0.2, 12/0.2, 6/0.2, 6/0.12; output phases ph_0, ph_1, ph_2, .. ph_15]
- current-starved delay cells
- full CMOS signals on each output phase
- full clock period delay in the VCDL, unlike that in the VCO
- duty cycle breakdown frequency: 160MHz (50% ref. freq.)
- internal time jitter: < 3ps RMS
- output phase mismatch: systematic 10ps p-p, stochastic 30ps p-p = 20% of the Unit Interval (UI = 195ps)
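The timing figures of the delay line follow from the reference clock and the number of cells. The sketch below reproduces the unit interval and expresses the quoted mismatch and jitter as a fraction of it; adding the systematic and stochastic peak-to-peak values linearly is an assumption, chosen because it reproduces the 20% quoted on the slide.

```python
REF_CLOCK_MHZ = 320.0
N_CELLS = 16

period_ps = 1e6 / REF_CLOCK_MHZ       # full delay of the line when locked
ui_ps = period_ps / N_CELLS           # delay of one cell = one unit interval

def percent_of_ui(picoseconds):
    """Express a time interval as a percentage of the unit interval."""
    return 100.0 * picoseconds / ui_ps

mismatch_pp_ps = 10 + 30              # systematic + stochastic, peak-to-peak
print(f"UI = {ui_ps:.1f} ps")
print(f"phase mismatch = {percent_of_ui(mismatch_pp_ps):.1f} % UI")
print(f"3 ps RMS jitter = {percent_of_ui(3):.1f} % UI")
```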
Edge Combiner circuit
[Schematic: each selection signal is derived from two adjacent inverted phases: _ph_0, _ph_1 → sel_0; _ph_1, _ph_2 → sel_1; .. ; _ph_15, _ph_0 → sel_15]
- custom-tailored gates (inv, nor) to obtain proper output signals
- output signal mismatch: systematic 16ps p-p, stochastic 32ps p-p ≈ 20% UI
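One way to read the edge combiner is that each selection pulse is high from the rising edge of one phase to the rising edge of the next, which an inverter and a NOR gate can produce. The sketch below checks that this logic yields exactly one active selection signal per unit interval. The specific gate arrangement, sel_k = NOR(_ph_k, ph_(k+1)), and the ideal 50% duty cycle are assumptions; the slide only names the gate types.

```python
N = 16  # number of DLL phases

def phase(k, t):
    """ph_k in unit-interval slot t: a 50% duty clock delayed by k UI."""
    return 1 if (t - k) % N < N // 2 else 0

def sel(k, t):
    """sel_k = NOR(_ph_k, ph_(k+1)), i.e. ph_k AND NOT ph_(k+1)."""
    n_ph_k = 1 - phase(k, t)                   # inverter
    ph_next = phase((k + 1) % N, t)            # wraps: sel_15 uses ph_0
    return 1 - (n_ph_k | ph_next)              # NOR gate

for t in range(2 * N):
    active = [k for k in range(N) if sel(k, t)]
    print(t, active)
```

Each slot prints a single index equal to the slot number modulo 16, so the multiplexer is driven by a one-hot selection that advances by one input per unit interval.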
16-to-1 Differential Multiplexer
- negligible mismatch of the internal delay (130ps ± 7ps p-p)
5.12 Gbps Wireline Transmitter
[Schematic: differential driver with inputs in_pos / in_neg and outputs out_pos / out_neg; Ron UP = 40Ω, Ron DOWN = 40Ω; annotated values: 570Ω, 570Ω, 170Ω, 170Ω, 50Ω, 50Ω, 100Ω, 600fF, 1.4mA, 5mA, 20mA, ∆U = ± 450mV; control input pre_emp_en]
- pre-emphasis: high-frequency boosting by a pair of feed-through capacitors (optional)
Velo_GWT test chip
- 1mm x 2mm test chip in MOSIS MPW run (18/02/2014)
- 3-2-3 MA metal stack
- on-chip Test Data Generator (repetitive pattern / PRBS 2¹⁶)
- 320MHz reference clock: external / ePll [1] (on-chip)
[1] F. Tavernier, "A Radiation-Hard PLL for Frequency Multiplication with Programmable Input Clock and Phase-Selectable Output Signals in 130nm CMOS", TWEPP 2012, Oxford, UK
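A pseudo-random test pattern of this kind is commonly produced by a linear-feedback shift register. The slide does not give the polynomial used on the chip, so the sketch below assumes a widely used maximal-length 16-bit example, x^16 + x^14 + x^13 + x^11 + 1; the seed and function names are likewise illustrative.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one bit (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def prbs_bits(seed, count):
    """Return `count` output bits, taking the bit shifted out at each step."""
    state = seed
    bits = []
    for _ in range(count):
        bits.append(state & 1)
        state = lfsr16_step(state)
    return bits

def sequence_period(seed=0xACE1):
    """Count steps until the register returns to its starting state."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps

period = sequence_period()
print(period, prbs_bits(0xACE1, 16))
```

With this polynomial the register visits every non-zero 16-bit state before repeating, so the pattern length is 2¹⁶ - 1 bits.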
Velo_GWT evaluation set-up
- 12.5GHz Agilent BERT
- power: 60mW (1.5V @ 40mA)
[Photo: VELO_GWT chip bonded on the pcb; evaluation board]
Phase mismatch measurements
[Oscilloscope capture: both outputs @ 5.12Gbps, 0101010101 data pattern, 100mV/div, 500ps/div, clock external 320MHz; successive bit slots labelled phase_0 .. phase_15]
- phase width mismatch: ~ 50ps p-p (25% UI)
Eye diagram measurements
[Eye diagrams, three panels:
- external (clean) ref. clock (320MHz), 0101010101 pattern
- ePll generated (noisy) ref. clock (320MHz), 0101010101 pattern
- ePll generated (noisy) ref. clock (320MHz), PRBS pattern]
- eye diagram opening: ~ 60ps @ ± 200mV (30% UI)
- GWT internal phase noise is low
- severe impact of the jitter on the ref. clock (ePll-generated)
Operation bandwidth limits
[Plot: eye opening [% Unit Interval] vs. bandwidth [Gbps], nominal: 5.12Gbps; curves: 0101010101 data pattern @ external ref. clock, and data pattern PRSG @ ePll generated ref. clock]
- GWT can operate in the range from 2.56Gbps to 6.24Gbps, corresponding to the reference clock range 160MHz to 390MHz
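Because the DLL divides each reference clock period into 16 unit intervals, the line rate scales directly with the reference clock. A minimal sketch of that relation, using the three frequencies quoted in the talk (the function name is hypothetical):

```python
PHASES_PER_PERIOD = 16   # one bit per DLL phase

def line_rate_gbps(ref_clock_mhz):
    """Serial bit rate for a given reference clock frequency."""
    return ref_clock_mhz * PHASES_PER_PERIOD / 1000.0

for f_mhz in (160, 320, 390):
    print(f"{f_mhz} MHz -> {line_rate_gbps(f_mhz):.2f} Gbps")
```

The lower limit of 160MHz coincides with the duty cycle breakdown frequency quoted for the voltage-controlled delay line.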
Operation power supply voltage limits
[Plot: eye opening [% Unit Interval] vs. VDD [Volt]; 0101010101 data pattern, eye diagram opening at ± 200mV]
- VDD must not be lower than 1.2V
Take-aways
- GWT is a 5.12Gbps Data Serializer and Wireline Driver circuit being developed for the VeloPix chip
- a shift-register-free topology has been chosen for the serializer to provide low power consumption (15mW) and avoid a high-speed PLL
- four GWT units on each VeloPix will transmit a large amount of data (> 15Gbps) over a 1 meter low-mass copper cable, contributing only 10% to the chip power budget (240mW = 4 x {15mW + 45mW})
- measurements of the prototypes submitted in 130nm technology demonstrate expected results
- the circuit will be re-designed in TSMC 130nm technology
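The power figures in the take-aways can be checked with simple arithmetic. The sketch below uses only numbers quoted in the talk; the energy per bit is a derived quantity that does not appear on the slides.

```python
SERIALIZER_MW = 15.0
TRANSMITTER_MW = 45.0
LINKS_PER_CHIP = 4
LINE_RATE_GBPS = 5.12

link_power_mw = SERIALIZER_MW + TRANSMITTER_MW        # one GWT unit
chip_gwt_power_mw = LINKS_PER_CHIP * link_power_mw    # all four units
chip_output_gbps = LINKS_PER_CHIP * LINE_RATE_GBPS

# mW / Gbps = 1e-3 W / 1e9 bit/s = 1e-12 J/bit, i.e. pJ per bit
energy_per_bit_pj = link_power_mw / LINE_RATE_GBPS

print(f"per link:   {link_power_mw:.0f} mW")
print(f"per chip:   {chip_gwt_power_mw:.0f} mW for {chip_output_gbps:.2f} Gbps")
print(f"energy/bit: {energy_per_bit_pj:.1f} pJ")
```

The 60mW per link agrees with the supply measurement quoted for the test chip (1.5V @ 40mA).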
Spare slides