Slide 1
Niko Neufeld, CERN
Slide 2
- Trigger-free read-out – every bunch-crossing! 40 MHz of events to be acquired, built and processed in software
- 40 Tbit/s aggregated throughput, from about 500 data sources with 100 Gigabit/s each (a quick consistency check follows below)
- More than 10,000 optical fibres from the detector
- At least 2,000 servers
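These headline numbers hang together; a minimal back-of-the-envelope sketch using only the figures quoted above (the average event size and the capacity headroom are derived here for illustration, they are not quoted on the slide):

```python
# Quick consistency check of the headline numbers (a sketch, not from the slides):
# 40 Tbit/s aggregated throughput at a 40 MHz event rate.

event_rate_hz = 40e6          # 40 MHz of events
throughput_bit_s = 40e12      # 40 Tbit/s aggregated
n_sources = 500               # ~500 data sources
link_bit_s = 100e9            # 100 Gigabit/s each

event_size_bytes = throughput_bit_s / event_rate_hz / 8
print(f"Average event size ~ {event_size_bytes / 1e3:.0f} kB")   # ~125 kB

# ~500 x 100 Gbit/s links give ~50 Tbit/s of raw input capacity,
# i.e. some headroom above the 40 Tbit/s average load.
print(f"Raw input capacity ~ {n_sources * link_bit_s / 1e12:.0f} Tbit/s")
```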
Slide 3
[System overview diagram] Detector front-end electronics in UX85B send data over ~9000 Versatile Links for DAQ (~300 m to the Point 8 surface) into ~500 event-builder PCs equipped with PCIe40 cards. The event-builder network feeds, via 6 x 100 Gbit/s links, an event-filter farm of ~80 subfarms (each behind a subfarm switch) and the online storage. TFC distributes the clock & fast commands and receives throttle signals back from the PCIe40; ECS provides experiment control.
Slide 4
The PCIe40 read-out card:
- Arria 10 FPGA
- PCIe Gen3 x16 == 100 Gbit/s (see the sketch below)
- Up to 48 optical input links
- Will have > 500 in the experiment
- Used also by ALICE, and … maybe … who knows…
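Why Gen3 x16 is quoted as roughly 100 Gbit/s follows from the link parameters; a small sketch, where the ~20% protocol-overhead figure is an illustrative assumption and not from the slides:

```python
# PCIe Gen3 runs at 8 GT/s per lane with 128b/130b line encoding;
# TLP/DLLP protocol overhead removes a further chunk that depends on payload size.

lanes = 16
gen3_gt_s = 8.0                      # 8 GT/s per lane
encoding = 128 / 130                 # 128b/130b line code
raw_gbit_s = lanes * gen3_gt_s * encoding
print(f"Raw Gen3 x16 bandwidth ~ {raw_gbit_s:.0f} Gbit/s")        # ~126 Gbit/s

# Assuming ~20% protocol overhead (typical ballpark, an assumption here):
usable_gbit_s = raw_gbit_s * 0.8
print(f"Usable payload bandwidth ~ {usable_gbit_s:.0f} Gbit/s")   # ~100 Gbit/s
```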
Slide 5
6.4 PB net storage / 12.8 PB raw
Slide 7
- 480 – 960 optical fibres (on 40 – 80 MPO-12 trunks; see the sketch below)
- 10 x 2U I/O servers with 2 x 100 Gbit/s interfaces
- 36 compute servers taking between 20 and 40 Gbit/s each
- 1 – 2 PB of storage
- ~ 40 Tbit/s network I/O (full duplex)
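A small arithmetic check, assuming the standard 12 fibres per MPO-12 trunk and reading this slide as one installation slice; the 2 Tbit/s figure is derived from the I/O-server numbers above, not quoted:

```python
# An MPO-12 trunk carries 12 fibres, so 40-80 trunks map onto 480-960 fibres.
FIBRES_PER_MPO12 = 12

for trunks in (40, 80):
    print(f"{trunks} MPO-12 trunks -> {trunks * FIBRES_PER_MPO12} fibres")

# The two 100 Gbit/s ports on each of the 10 I/O servers add up to
# 2 Tbit/s of external connectivity for this slice (derived, not quoted).
print(f"I/O server bandwidth: {10 * 2 * 100 / 1000:.0f} Tbit/s")
```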
Slide 8
- Vendor neutral: public tender every time
- Long-lived facility (> 10 years): has to grow “adiabatically” – unlike a super-computer, we can’t throw things away after 4 years
- Upgradeable
- Cost, cost, cost: tight, cost-efficient integration of compute, storage and network
- Should be flexible enough to also accommodate accelerators (Xeon Phi, FPGA) if they prove effective
- Power: electricity at CERN is cheap, but we want to be green and reduce running costs
Slide 10
- Need temporary storage to wait for calibration and alignment – and to profit from no-beam time
- Current model: completely local storage as a software RAID 1 of 4 TB on each node
- File management by scripts and control software; no common name-space
- 100% overhead, capacity oriented
- Streaming I/O only, single reader / single writer, typically max. 4 streams per RAID set; aggregated I/O is low, 10 – 20 MB/s (see the sketch below)
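A minimal sketch of what these numbers imply for buffering, assuming the full 4 TB per node is usable and taking the quoted 10 – 20 MB/s as the sustained ingest rate (both are assumptions for illustration, not slide statements):

```python
# How long a node's local RAID 1 buffer can absorb data while waiting
# for calibration/alignment (back-of-the-envelope sketch).

TB = 1e12                       # bytes (decimal terabyte)

usable_bytes = 4 * TB           # net capacity per node (slide figure)
raw_bytes = 2 * usable_bytes    # RAID 1 mirroring -> the quoted 100% overhead

for ingest_mb_s in (10, 20):    # quoted aggregated I/O range
    seconds = usable_bytes / (ingest_mb_s * 1e6)
    print(f"{ingest_mb_s} MB/s -> buffer fills in {seconds / 86400:.1f} days")

# ~4.6 days at 10 MB/s, ~2.3 days at 20 MB/s: margin for the calibration and
# alignment latency and for draining the backlog during no-beam time.
```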
Slide 11
Operational issues:
- No common name-space
- A disk failure during data-taking can cause several problems:
  - Controller or both disks failed → the node needs to be excluded from data-taking
  - A disk does not actually fail but becomes “slow” because of errors → the node accumulates a backlog of unprocessed data
  - A rebuild can affect performance
- Inaccessible data (even temporarily) blocks all data from further processing (because offline data-sets are treated as a “whole”)
Slide 12
- Basically, disk and I/O requirements per node go up by 10x – need a cost-efficient solution
- Having disks in each node still looks attractive vs. NAS per rack or disaggregated shelves (see challenge 1)
- Can we get better efficiency with RAID 5/6/7? (compared in the sketch below)
- Would love to have a common name-space – POSIX or not?
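A rough, illustrative comparison of usable-capacity fractions for mirroring versus parity RAID; the disk counts are hypothetical, and RAID 7 (which has no single standard definition) is left out:

```python
# Fraction of raw disk space that remains usable for n equal-sized disks.

def raid_efficiency(level: int, n_disks: int) -> float:
    """Usable fraction of raw capacity for common RAID levels."""
    if level == 1:      # mirroring: half the raw space (the current setup)
        return 0.5
    if level == 5:      # one disk's worth of parity
        return (n_disks - 1) / n_disks
    if level == 6:      # two disks' worth of parity
        return (n_disks - 2) / n_disks
    raise ValueError("unsupported RAID level")

for n in (4, 8, 12):
    print(f"{n:2d} disks:  RAID1 {raid_efficiency(1, n):.0%}  "
          f"RAID5 {raid_efficiency(5, n):.0%}  RAID6 {raid_efficiency(6, n):.0%}")

# With 8+ disks per set, parity RAID keeps ~75-90% of the raw capacity usable,
# versus the fixed 50% (100% overhead) of the current RAID 1 mirrors.
```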