THEATRE OF DREAMS: LHCb BEYOND THE PHASE 1 UPGRADE, April 7th 2016, Manchester
Niko Neufeld, CERN/EP (with a lot of help & input from Ken Wyllie)
- Don't believe anybody who claims to be able to predict electronics and computing more than 6 years from now. I am going to give you some guesses based on what is known, and on R&D programmes which have already started.
- Since the decisions on electronics have to be taken much earlier than those on computing, the landscape for future front- and back-end electronics is a bit clearer.
- This is IMHO the most relevant part today, and in it I follow Ken Wyllie very closely (although misunderstandings are of course my fault).
- Thanks to Ken Wyllie and Marco Cattaneo for comments on the slides.
- Special thanks to Ken for all the material in the electronics part.
- And to the online team (Beat, Rainer, etc…) for discussions.
- And apologies for this rather dry, almost picture-free presentation.
- We (the Online) strongly believe that the Run 3 system will work; the same system will be fine for Runs 4, 5 and 6.
- No reason to (re-)introduce a trigger to save the DAQ. Acceleration / assists with CPUs, GPGPUs, FPGAs or whatever should be done *after* event-building, using COTS hardware.
- For Run 5: continue with full event-building at 40 MHz. The output rate (to storage) is only limited by what we can afford to store long-term.
- Any increase in the number of channels & occupancy can be (relatively) easily absorbed with more and/or faster links (see the back-of-envelope sketch below).
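A minimal back-of-envelope sketch of what "full event-building at 40 MHz" implies for aggregate bandwidth; the ~100 kB average event size is my assumption of a Run 3 design-level figure, not a number from this talk.

    # Back-of-envelope event-building bandwidth (illustrative figures only).
    readout_rate_hz = 40e6            # full bunch-crossing readout rate, Hz
    event_size_bytes = 100e3          # assumed average event size, bytes (~Run 3 design level)
    throughput_tb_s = readout_rate_hz * event_size_bytes / 1e12   # ~4 TB/s
    throughput_tbit_s = throughput_tb_s * 8                       # ~32 Tbit/s
    print(f"Aggregate event-building throughput: ~{throughput_tb_s:.0f} TB/s (~{throughput_tbit_s:.0f} Tbit/s)")
    # A factor 5-10 more channels/occupancy scales this linearly: more and/or faster links, same architecture.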
Even another factor 5 to 10 in luminosity will leave most (all) of our FEE considerably below the (dose) rates anticipated for the GPDs. So what works for them will certainly work for us.
- The main R&D is the massive amount of work targeting LS3 for the ATLAS/CMS Phase-2 upgrades. (There is also a lot of work for the ATLAS/CMS Phase-1 upgrades, but these are for LS2, i.e. LHCb upgrade #1, and so will already be obsolete for LHCb upgrade 2.)
- There are many developments that could prove useful for front-end electronics on the detectors, depending on what is envisaged for LHCb upgrade 2.
- Also remember that much (all) of the infrastructure will be obsolete and have to be replaced (power supplies, optical fibres, monitoring etc…).
- Continued widespread use of optical links (huge numbers in the ATLAS & CMS trackers). Mainstream is again to use common projects driven by CERN: lpGBT (10 Gb/s up-link, 2.56 Gb/s down-link) and Versatile Link+ (4/8-way TX array).
- 1.28 Gb/s electrical links developed for FE chips to connect to the lpGBT (a small aggregation sketch follows below).
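How the 1.28 Gb/s electrical links relate to the lpGBT up-link, as a naive sketch; the 10.24 Gb/s line rate is my assumption behind the "10 Gb" figure quoted above, and the count ignores framing/FEC overhead, which in practice reduces the usable payload.

    # Naive e-link aggregation per lpGBT up-link (ignores framing and FEC overhead).
    uplink_gbps = 10.24    # assumed nominal up-link line rate, Gb/s (slide quotes ~10 Gb/s)
    elink_gbps = 1.28      # front-end electrical link speed, Gb/s
    print(f"At most {uplink_gbps / elink_gbps:.0f} x 1.28 Gb/s e-links per lpGBT up-link (before overhead)")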
- 65 nm will be mainstream (in HEP!): it has clear density benefits for smaller pixels with more functionality (e.g. timing), but radiation tolerance is not obvious and requires careful thought & mitigation.
- 130 nm will continue for some time…… (but will be VERY old at the time of LHCb upgrade #2).
- The µm-scale processes currently used by Calo/RICH will be VERY VERY old at the time of LHCb upgrade #2 (obsolete?).
- Attempts are being made to share blocks more coherently (IP sharing).
- A 3 ps resolution TDC is targeted on a 64-channel chip (looks very nice for TORCH; are the TORCH people actively discussing with this project…?).
- But so far no proposal for a new amplifier/discriminator chip (like NINO).
- In-pixel timing already demonstrated with Timepix3 (1.56 ns bins). More precision is certainly possible. Being discussed for helping with pile-up in ATLAS/CMS.
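For reference, the Timepix3 bin width simply reflects its fine-TDC clock; the 640 MHz figure is a published Timepix3 parameter quoted from memory, not from this talk.

    # Where the ~1.56 ns Timepix3 bin comes from (assuming its 640 MHz fine-TDC clock).
    tdc_clock_hz = 640e6
    print(f"Time bin = {1e9 / tdc_clock_hz:.4f} ns")   # 1.5625 ns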
- ESA/ESTEC have funded the design of a radiation-tolerant FPGA, but only to space levels of radiation, so it is not an obvious advantage over commercial FPGAs.
- Commercial vendors will continue to improve mitigation of SEUs in configuration memory. Unfortunately, it is not clear that they will improve the radiation performance of IP blocks like the PLLs used for serial links.
- ATLAS is already using a track trigger, with back-end cards based on associative-memory ASICs (very difficult to emulate in software). This adds track triggering and track information at Level-2 (the first-level software trigger in ATLAS).
- The CMS tracker Phase-2 upgrade will do more in the front-end (pixel-strip & strip-strip modules giving 'stubs' for triggering). This trigger is part of (the improved) Level-1 (corresponding to Level-0 in current LHCb-speak). This is a project driving many interconnect/module developments (flexes, TSVs……) for communicating hit information between tracker layers.
- High-granularity calorimetry: lots of developments for the ILC, and now a huge project for the CMS end-cap based on Si pads. Big challenges for PCB & module construction, and it requires low-power ADCs & 50 ps timing.
Power:
- Serial powering is unbeatable for low-mass stave-based detectors; it will be used for the CMS pixels -> future UT?
- New DC-DC converters in 130 nm.
- New bulk power supplies for long-distance powering.
Monitoring:
- First discussions on a new ELMB based on the GBT-SCA. No details yet. Do we need this?
- Moore's law... holds at the time of this writing: the number of transistors at equal cost still doubles every x months. Intel has recently increased x from 24 to 30.
- After today's 14 nm there will be 10 nm, 7 nm, 5 nm, …; there will probably be an end to this shrinking (at least for CMOS in Si) by the time of LS4.
- That does not necessarily mean that Moore's law in the sense above ends at the same time!
- In many commercially relevant areas this progress is invested not in more cores but in lower power consumption (a small projection sketch follows below).
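A small sketch of how much the doubling period matters; the ~10-year horizon (roughly 2016 to the LS4 era) is my assumption for illustration, not a number from the talk.

    # Effect of stretching the doubling period from 24 to 30 months over an assumed 10-year horizon.
    years = 10
    for doubling_months in (24, 30):
        growth = 2 ** (years * 12 / doubling_months)
        print(f"Doubling every {doubling_months} months -> x{growth:.0f} transistors at equal cost after {years} years")
    # 24 months gives ~x32, 30 months only ~x16: the slowdown halves the expected gain.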
“It’s an unwritten law in engineering that every generation thinks their challenges are the most difficult. Although new technical challenges continue to emerge, as they have every generation since the first VLSI chips were created, the outlook for Moore’s Law remains the same as it did twenty years ago; the path for the next few generations is visible, and after that, it gets hazy until we move forward.”
Mark Stettler (Director of Process Technology Modeling), Intel
- In the consumer area: portable devices and wearables --> low power.
- In the server area: hyper-scale cloud providers (Google, Amazon, Facebook, Microsoft) --> scale, power consumption, high network bandwidth.
- There seems to be a slowing down in the growth of the number of "cache-coherent" cores: most codes (not only in HEP) cannot really use a coherency domain with more than 32 cores.
- This does not necessarily mean fewer cores: the Cavium ARMv8 server has 96 real cores (on two sockets).
- The ever-increasing number of transistors is instead invested in co-processors, interconnects, assists to accelerate fixed functions (where it makes sense) and integrated FPGAs. E.g. the motion co-processor in Apple's A series for the i-devices, or Intel's integration of the high-speed interconnect in the CPU.
- Commercial clouds will offer select heterogeneous compute elements. Current best candidate: FPGAs (we should …).
- The current server world is x86 only. It could be that there will be more architectures.
- Impossible to say today if, and which, but we should (already today!) be ready to run on something else.
- Crucial to test for reproducibility between architectures, and with and without accelerators.
- There are many technologies being proposed already today. Time will tell which one is the most successful.
- We should use whichever we choose in a commercially available form: do *not* build our own boards (e.g. if we want to use FPGAs).
- We should always look at the overall cost-effectiveness: multi-purpose almost always has the lowest Total Cost of Ownership (FPGAs over ASICs).
- Power cost will be, and should be, a concern (FPGAs over GPGPUs…, ASICs over FPGAs).
- When choosing if, and which, accelerator to use in the HLT or offline, we should orient ourselves on industry == the commercial cloud providers, because this gives confidence in reproducibility, good software support and portability.
[Link-speed overview, all available today: network speed is typically 4x the SerDes lane speed.]
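The 4x rule just reflects the common four-lane port configurations; the specific lane/port pairings below are standard Ethernet examples chosen by me, not read off the slide.

    # "Network speed ~ 4x the SerDes lane speed" for common four-lane Ethernet ports.
    for lane_gbps in (10, 25, 50):
        print(f"4 x {lane_gbps} Gb/s lanes -> {4 * lane_gbps} GbE port")
    # e.g. four 25 Gb/s SerDes lanes make a 100 GbE port.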
- Video drives the global internet (Netflix…). It is safe to assume that this will continue.
- The bandwidth required for the DAQ should be extremely cheap by LS4, even if we add a factor 5 or so.
- With a very modest data reduction (factor 3, 2 or even 1) it should be possible, and probably commercially reasonable, to send assembled data off-site: we could use commercial or scientific (private) clouds instead of running our own HLT farm (rough estimate below).
- Trading CAPEX for OPEX: what do the funding agencies prefer?
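A rough sense of the off-site export bandwidth; the ~32 Tbit/s input reuses the assumed 40 MHz x ~100 kB event-building sketch from earlier, so it is an illustration, not an official number.

    # WAN export bandwidth for modest data-reduction factors (illustrative).
    eventbuilding_tbit_s = 32.0
    for reduction in (1, 2, 3):
        print(f"Reduction factor {reduction}: ~{eventbuilding_tbit_s / reduction:.0f} Tbit/s to ship off-site")
    # Even a factor 3 still means ~10 Tbit/s sustained, i.e. of order a hundred 100 Gbit/s WAN links.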
[Run 3 online architecture diagram: detector front-end electronics in UX85B connected by ~9000 Versatile Links (DAQ, clock & fast commands) to eventbuilder PCs with PCIe40 cards (software LLT) on the Point 8 surface; eventbuilder network; ~80 eventfilter subfarms behind subfarm switches; TFC distributing clock & fast commands and receiving the throttle from the PCIe40; ECS; 100 Gbit/s links to the subfarms and 6 x 100 Gbit/s to online storage.]
- The DAQ is designed to be easily scalable: just add eventbuilder servers with PCIe40 cards.
- By LS3 we will have: lpGBT (10 Gbit/s), PCIe40 Gen 2+ (> 200 Gbit/s), and cheap 400 Gbit/s links in the DAQ network (see the sketch below).
- The cost of the eventbuilder should have come down by at least 50%, so we could double the 2020 system at equal cost; part of this cost will be covered by natural refresh (servers after 6 - 7 years, network equipment after 8 - 10 years).
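To illustrate why faster links keep a doubled system affordable; this reuses the assumed ~32 Tbit/s Run 3 event-building figure from the earlier sketch, so the port counts are indicative only.

    # Port count for a doubled eventbuilder at two network link speeds (illustrative).
    run3_tbit_s = 32.0
    doubled_tbit_s = 2 * run3_tbit_s
    for link_gbit_s in (100, 400):
        ports = doubled_tbit_s * 1000 / link_gbit_s
        print(f"{link_gbit_s} Gbit/s links: ~{ports:.0f} ports for a {doubled_tbit_s:.0f} Tbit/s eventbuilder")
    # Going from 100 to 400 Gbit/s cuts the port count by 4x, so doubling the system still halves the ports.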
- The current LHC computing grid model does not make much economic sense; it is a political wish.
- Commercial clouds (hyper-scale providers) will drive the server market, and the smaller we are relative to these, the less economical it will be to run our own infrastructure.
- My personal guess: Tier 2 and Tier 3 centres will have disappeared by the end of Run 4; Tier 1s remain to be seen. Storage is still expensive in the cloud, and as of today there is no commercial driver for massive storage.
- For electronics a lot will be done for/with us by ATLAS and CMS; we need to carefully justify our own developments over adaptations where necessary, and in that case start not too late!
- For the DAQ there is no problem (up to a factor 10 at least); in particular there is no problem integrating a 25-50% increase for Run 4.
- HLT resources could be (partially) off-site; this depends on the data volume.
- Computing will be more heterogeneous in the next few years; difficult to know who the "winners" will be at Run 5.
- Accelerators look attractive today to save power and potentially also cost. Watch out for COTS and TCO.