ATLAS DAQ/HLT Infrastructure
H.P. Beck, M. Dobson, Y. Ermoline, D. Francis, M. Joos, B. Martin, G. Unel, F. Wickens. Acknowledgements: H. Burckhart, W. Iwanski, K. Korcyl, M. Leahu. 11th Workshop on Electronics for LHC and Future Experiments, September 2005, Heidelberg, Germany
ATLAS HLT and DAQ physical layout
Acronyms: ROS – Read-Out System, SV – Supervisor, DFM – Data Flow Manager
- Read-Out Subsystems underground in USA15: 152 ROSes (max 12 ROLs per ROS)
- HLT and Event Builder (EB) processor farms on the surface in SDX1
- Rack-mounted PCs and network switches: ~2500 components in ~120 racks
HLT and DAQ operational infrastructure components
- Housing of the HLT/DAQ system: two-floor counting room in the SDX1 building
  - Metallic structure (electronics is heavy nowadays)
  - Lighting, ventilation, mixed-water piping and cable trays
  - Power supply (from the network and UPS)
- Housing and operation of rack-mounted PCs and network switches
  - Rack selection and component mounting
  - Power distribution and heat removal
  - Power and network cabling
- Operational infrastructure monitoring
  - Standard ATLAS DCS tools for rack parameter monitoring
  - Linux and IPMI tools for individual PC internal monitoring
- Computing farm installation, operation and maintenance
  - Rack configuration and cabling databases
  - System administration
Counting room in SDX1 for DAQ/HLT system
- Size of the barrack constrained by the crane, the shaft to USA15 and the existing walls of SDX1
- Houses 100 racks of 600 mm x 1000 mm, up to 52U (floor-to-ceiling height ~2600 mm)
- Metallic structure initially designed for 500 kg/rack, re-designed for 800 kg/rack
- Air-conditioning removes ~10% of the dissipated heat; the other 90% is removed by water-cooling
(Photos: design, temporary power supply, construction, lighting and ventilation, cable trays, water piping)
Housing of equipment
Two rack options were investigated:
- Modified "standard" 52U ATLAS rack for the ROSes in USA15
  - positions already defined; weight not an issue in USA15; uniform with the other racks in USA15
- Industry-standard server rack for SDX1 (e.g. RITTAL TS8, 47/52U)
  - bayed racks with partition panels for fire protection
  - height and weight limits; lighter racks with more flexible mounting for PCs
Mounting of equipment:
- Front-mounted PCs on supplied telescopic rails
- Rear/front-mounted switches (on support angles if heavy)
- All cabling at the rear, inside the rack
Cooling of equipment
- Common horizontal air-flow cooling solution for the "standard" ATLAS 52U racks and the RITTAL 47U racks
- Outcome of the joint "LHC PC Rack Cooling Project"
- Requirement: ~10 kW per rack
- A water-cooled heat exchanger (CIAT cooler) fixed to the rear door of the rack (+150 mm in depth), 1800 x 300 x 150 mm3
  - Cooling capacity: 9.5 kW
  - Water: in 15 °C, out 19.1 °C; air: in 33 °C, out 21.5 °C (see the cross-check below)
  - 3 fan rotation sensors
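As a rough cross-check of the cooler figures quoted above (not from the slides): the 9.5 kW capacity together with the 15 °C to 19.1 °C water temperature rise implies the water flow per rack, using the standard specific heat of water. A minimal Python sketch:

```python
# Rough cross-check of the rear-door heat exchanger figures quoted above.
# The 9.5 kW capacity and the water temperatures are from the slide;
# the specific heat of water is a standard constant.

Q = 9.5e3                # cooling capacity [W]
c_p = 4186.0             # specific heat of water [J/(kg*K)]
dT_water = 19.1 - 15.0   # water temperature rise across the cooler [K]

m_dot = Q / (c_p * dT_water)   # required water mass flow [kg/s]
print(f"water flow ~ {m_dot:.2f} kg/s ~ {m_dot * 60:.0f} l/min")
# -> about 0.55 kg/s, i.e. roughly 33 l/min per rack at full load
```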
Powering of equipment in SDX1 (prototype rack study)
Powering problems and (as simple as possible) solutions:
- High inrush current: D-curve breakers and sequential powering
- Harmonic content: reinforced neutral conductor with breaker
- 11 kVA of apparent power delivered per rack on 3 phases, 16 A each
  - ~10 kW of real power available (and dissipated) with a typical power factor of 0.9 (see the check below)
- 3 individually controllable breakers with D-curve for the high inrush current (first level of sequencing)
- 3 sequential power distribution units inside the rack, e.g. SEDAMU from PGEP (second level of sequencing)
  - 4 groups of 4(5) outlets, max 5 A per group; power-on of the groups is separated by 200 ms
- Investigated possibilities for individual power control via IPMI or Ethernet-controlled power units
(Figure: prototype rack showing ventilation, auxiliary power, power to equipment and power control)
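A back-of-the-envelope check of the per-rack power budget stated above; the 230 V phase-to-neutral voltage is an assumption, while the 16 A per phase and the 0.9 power factor are from the slide:

```python
# Back-of-the-envelope check of the per-rack power budget.
# 230 V phase-to-neutral is an assumption; 16 A per phase and PF 0.9 are from the slide.

V_phase = 230.0       # phase-to-neutral voltage [V] (assumption)
I_phase = 16.0        # breaker rating per phase [A]
phases = 3
power_factor = 0.9    # typical for PCs with power-factor correction

S = phases * V_phase * I_phase   # apparent power [VA]
P = S * power_factor             # real power [W]
print(f"apparent power ~ {S/1e3:.1f} kVA, real power ~ {P/1e3:.1f} kW")
# -> ~11.0 kVA and ~9.9 kW, matching the ~10 kW per-rack cooling requirement
```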
Powering of equipment – proposed final implementation (1)
- A transformer (in front of SDX1) powers the switchboard
- Power from the switchboard is delivered to 6 distribution cupboards on the 2 levels
- Each cupboard controls and powers 1 row of racks (up to 18 racks)
- Two 3-phase cables run under the false floor from the cupboard to each rack
(Photos: switchboard, distribution cupboards)
Distribution cupboard:
- 400 A breaker on the input
- 1 Twido PLC, which reads the ON/OFF and Electrical Fault status of all breakers in the row
- 16 A breakers on the output, one breaker per power cable
- Each breaker is individually turned ON by DCS via the PLC (an illustrative sketch follows below)
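For illustration only: the slides do not specify the DCS-to-PLC protocol, but Twido PLCs are commonly accessed over Modbus. The sketch below, with an assumed Modbus/TCP connection and made-up addresses, shows what switching one output breaker on and reading back its status could look like; in ATLAS this is done from the standard DCS (PVSS) tools, not from a script.

```python
from pymodbus.client import ModbusTcpClient   # pymodbus 3.x import path

# Illustrative only: assumed Modbus/TCP access to the row's Twido PLC, with a
# made-up IP address and coil numbering. The real breaker control goes through
# the ATLAS DCS (PVSS), not through a script like this.

PLC_HOST = "10.0.0.50"   # placeholder address of the row's PLC
BREAKER_COIL = 4         # placeholder coil mapped to one 16 A output breaker

client = ModbusTcpClient(PLC_HOST)
if client.connect():
    client.write_coil(BREAKER_COIL, True)             # turn the breaker ON
    status = client.read_coils(BREAKER_COIL, count=1) # read back ON/OFF status
    print("breaker ON" if status.bits[0] else "breaker OFF")
    client.close()
```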
Powering of equipment – proposed final implementation (2)
- A 3U distribution box at the bottom of each rack (front side) provides the power distribution inside the rack
  - 2 manual 3-phase switches on the front panel to cut power on the 2 power lines
  - Input and output cables on the rear panel
  - Distribution from the two 3-phase power lines to 6 single-phase lines
- Flat-cable distribution on the back side of the rack
  - 6 cables from the distribution box feed 6 flat cables, with 6-7 connectors on each flat cable
- The installation is designed to sustain a 35 A peak inrush current per PC for ~35-40 PCs per rack
(Figure: rack side view showing the PLC, 2 x 16 A D-type 3-phase breakers and 2 x 3-phase manual switches)
Monitoring of operational infrastructure
- The SDX1 TDAQ computing room environment is monitored by the CERN infrastructure services (electricity, C&V, safety) and by the ATLAS central DCS (room temperature, etc.)
- Two complementary paths to monitor the TDAQ rack parameters:
  - by the available standard ATLAS DCS tools (sensors, ELMB, etc.)
  - by the PC itself (e.g. lm_sensors) or by farm management tools (e.g. IPMI)
- What is monitored by DCS inside the racks:
  - Air temperature – 3 sensors
  - Inlet water temperature – 1 sensor
  - Relative humidity – 1 sensor
  - Cooler's fan operation – 3 sensors
- What is NOT monitored by DCS inside the racks:
  - Status of the rear door (open/closed)
  - Water leak/condensation inside the cooler
  - Smoke detection inside the rack
(Sensor types shown: Quartz TF 25, NTC 10 kOhm, Precon HS-2000V RH; an illustrative NTC conversion follows below)
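The NTC 10 kOhm temperature sensors are read out as resistances by the DCS chain described on the next slide. Purely as an illustration (not the ATLAS DCS calibration), a resistance-to-temperature conversion with the simple beta equation might look as follows; the beta value is an assumption:

```python
import math

# Illustrative conversion of an NTC 10 kOhm reading to temperature using the
# simple beta equation. The beta value and the 25 degC reference are generic
# assumptions, not the actual ATLAS DCS calibration.

R0 = 10e3       # nominal resistance at 25 degC [Ohm]
T0 = 298.15     # 25 degC in Kelvin
BETA = 3435.0   # typical beta for a 10 kOhm NTC (assumption)

def ntc_temperature(resistance_ohm: float) -> float:
    """Return the temperature in degC for a measured NTC resistance."""
    inv_T = 1.0 / T0 + math.log(resistance_ohm / R0) / BETA
    return 1.0 / inv_T - 273.15

print(f"{ntc_temperature(8.0e3):.1f} degC")   # ~31 degC for ~8 kOhm
```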
Standard ATLAS DCS tools for sensor readout
- The SCADA system (PVSS II) with an OPC client/server
- The PC (Local Control Station) with a Kvaser PCIcan-Q card (4 ports)
- The CAN power crate (16 branches of 32 ELMBs) and the CANbus cable
- The ELMB, motherboards and sensor adapters
(Photos: Kvaser card, ELMB, CAN power crate, motherboard)
Sensor locations and connection to the ELMB
- Sensors are located on the rear door of the TDAQ rack
- All sensor signals (temperature, rotation, humidity) and power lines are routed to a connector on the rear door to simplify assembly
- Flat cables connect these signals to 1 of the 4 ELMB motherboard connectors
  - 3 connectors receive the signals from 3 racks; 1 spare connector is kept for upgrades
  - 1 ELMB may therefore be used for 3 racks
(Figure labels: CANbus cables, to next rack, to PSU)
Use of the PCs' internal monitoring
- Most PCs now come with a hardware monitoring chip (e.g. LM78): onboard voltages, fan status, CPU/chassis temperature, etc.
  - A program running on every TDAQ PC may use the lm_sensors package to access these parameters and send the information to DCS using DIM (see the sketch below)
- IPMI (Intelligent Platform Management Interface) specification
  - A platform management standard by Intel, Dell, HP, and NEC
  - A standard methodology for accessing and controlling bare-metal hardware, even without software installed or running
  - Based on a specialized micro-controller, the Baseboard Management Controller (BMC), which is available even if the system is powered down and no OS is loaded
(Photo: Supermicro IPMI card)
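A minimal sketch of the "program running on every TDAQ PC" idea: read the lm_sensors values and hand them to a publisher. The parsing of the `sensors -u` output and the publish() stub are illustrative assumptions; the real program would publish the values to DCS over DIM.

```python
import re
import subprocess

# Minimal sketch: read lm_sensors values on a TDAQ PC and hand them to a
# publisher. Parsing of `sensors -u` output and the publish() stub are
# illustrative assumptions; the real program would send the values to DCS via DIM.

def read_lm_sensors() -> dict[str, float]:
    """Parse `sensors -u` output into {chip/label/channel: value}."""
    out = subprocess.run(["sensors", "-u"], capture_output=True, text=True, check=True).stdout
    values, chip, label = {}, None, None
    for line in out.splitlines():
        if line and not line.startswith(" ") and ":" not in line:
            chip = line.strip()              # e.g. "coretemp-isa-0000"
        elif line.endswith(":") and not line.startswith("  "):
            label = line.strip(": ")         # e.g. "Core 0"
        else:
            m = re.match(r"\s+(\w+)_input:\s+([-\d.]+)", line)
            if m and chip and label:
                values[f"{chip}/{label}/{m.group(1)}"] = float(m.group(2))
    return values

def publish(values: dict[str, float]) -> None:
    # Placeholder: here the values would be sent to the DCS over DIM.
    for name, value in sorted(values.items()):
        print(f"{name} = {value}")

if __name__ == "__main__":
    publish(read_lm_sensors())
```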
Preparation for equipment installation
(Figures: design drawing, rack contents, the 6 Pre-Series racks, rack numbering (Y D1))
Computing farm – Pre-Series installation
- Pre-Series components at CERN (a few % of the final size)
  - A fully functional, small-scale version of the complete HLT/DAQ system
  - Installed in the SDX1 lower level and in USA15
- The effort of the physical installation highlights and solves procedural problems before the much larger-scale installation of the full implementation
- Will grow in time – 2006: +14 racks, 2007: +36 racks…
Cabling of equipment
All cables are defined and labeled:
- 608 individual optical fibers from the ROSes to patch panels
- 12 bundles of 60 fibers from the USA15 patch panels to the SDX1 patch panels
- Individual cables from the patch panels to the central switches and then to the PCs
- Cable labeling is updated after installation
Cable installation:
- Tries to minimize cabling between racks, to keep the cabling tidy and to respect minimum bend radii
- Cable arms are not used; instead, a unit is uncabled before removal
Computing farm – system management
System management of HLT/DAQ has been considered by the SysAdmin Task Force; topics addressed:
- Users / authentication
- Networking in general
- Booting / OS / images
- Software / file systems
- Farm monitoring
- How to switch nodes on/off
Remote access & reset with IPMI:
- IPMI daughter card for the PC motherboard – experience with v1.5, which allows access via the LAN: reset, power off & on, login as from the console, and monitoring of all sensors (fans, temperature, voltages, etc.)
- Cold-start procedure tested recently for the Pre-Series: IPMI used to power down, boot and monitor the machines
- Scripts for IPMI operations (booting all EF nodes, etc.) are being written (a sketch of such a wrapper follows below)
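The IPMI scripts themselves are site-specific, but a wrapper around the standard ipmitool CLI gives the flavour of the operations listed above (power status, power cycle). Host names, credentials and node names below are placeholders; "-I lan" matches the IPMI v1.5 LAN interface mentioned on the slide.

```python
import subprocess

# Sketch of the kind of IPMI wrapper script mentioned above, built on the
# standard ipmitool CLI. Credentials and node names are placeholders; the real
# Pre-Series scripts are site-specific.

IPMI_USER = "admin"    # placeholder credentials
IPMI_PASS = "secret"

def ipmi(node_bmc: str, *args: str) -> str:
    """Run one ipmitool command against a node's BMC over the LAN (IPMI v1.5 -> '-I lan')."""
    cmd = ["ipmitool", "-I", "lan", "-H", node_bmc,
           "-U", IPMI_USER, "-P", IPMI_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def power_cycle_all(nodes: list[str]) -> None:
    """Report the power status and then power-cycle a list of farm nodes, e.g. all EF nodes."""
    for node in nodes:
        print(node, ipmi(node, "chassis", "power", "status").strip())
        ipmi(node, "chassis", "power", "cycle")

if __name__ == "__main__":
    power_cycle_all(["ef-node-01-bmc", "ef-node-02-bmc"])   # placeholder node names
```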
Computing farm – Point 1 system architecture
(Diagram: two Central Servers sync to Local Servers 1..n, which serve the net-booted Clients 1..n; service, sync and alternative sync paths are shown. Network domains: CPN – CERN Public Network, CTN – CERN Technical Network, ATCN – ATLAS Technical and Control Network; user access from the CPN goes through a gateway (login necessary), with a CERN IT bypass.)
Computing farm – file servers/clients infrastructure
- Gateway & firewall services to the CERN Public Network; ssh access
- Tree-structured servers/clients: Central File Server – 6 Local File Servers – ~70 clients
  - All files come from a single Central File Server (later a mirrored pair)
  - All clients (PCs & SBCs) are net-booted and configured from the Local File Servers
  - A maximum of ~30 clients per boot server, to allow scaling up to 2500 nodes (see the estimate below)
- A top-down configuration (machine specific / detector specific / function specific / node specific) is provided
  - When modified, the ATLAS software is push-synced from the top down to each node with a disk; the software installation mechanism and responsibilities are being discussed with Offline and TDAQ
- Node logs are collected on the local servers for post-mortem analysis if needed
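A quick estimate (not on the slide) of what the ~30-clients-per-boot-server limit implies for the full 2500-node system:

```python
import math

# Scaling estimate for the boot-server tree described above: with at most
# ~30 net-booted clients per Local File Server, how many Local File Servers
# does a 2500-node farm need? Illustrative arithmetic only.

total_nodes = 2500
clients_per_lfs = 30

local_file_servers = math.ceil(total_nodes / clients_per_lfs)
print(f"~{local_file_servers} Local File Servers for {total_nodes} nodes")
# -> ~84 Local File Servers, each synced from the (mirrored) Central File Server
```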
Computing farm – Nagios farm management
All farm-management-related functions are unified under a generic management tool, Nagios (LFS and clients):
- A single tool to view the overall status, issue commands, etc.
- Uses IPMI where available, otherwise ssh and lm_sensors
- Mail notification for alarms (e.g. temperature); an example check is sketched below
- DCS tools will be integrated
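For illustration, a Nagios-style check plugin for node temperature in the spirit of the setup above: Nagios maps exit code 0 to OK, 1 to WARNING and 2 to CRITICAL. The sensor channel and the thresholds are assumptions.

```python
import subprocess
import sys

# Sketch of a Nagios-style check plugin for node temperature. Nagios interprets
# exit code 0 as OK, 1 as WARNING and 2 as CRITICAL. The sensor channel and the
# thresholds below are illustrative assumptions.

WARN_C, CRIT_C = 40.0, 50.0   # placeholder thresholds [degC]

def cpu_temperature() -> float:
    """Read one temperature value from lm_sensors (`sensors -u`)."""
    out = subprocess.run(["sensors", "-u"], capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "temp1_input:" in line:   # first temperature channel found (assumption)
            return float(line.split(":")[1])
    raise RuntimeError("no temperature reading found")

if __name__ == "__main__":
    t = cpu_temperature()
    if t >= CRIT_C:
        print(f"CRITICAL - temperature {t:.1f} C")
        sys.exit(2)
    if t >= WARN_C:
        print(f"WARNING - temperature {t:.1f} C")
        sys.exit(1)
    print(f"OK - temperature {t:.1f} C")
    sys.exit(0)
```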
Conclusions
- Good progress has been made in developing the final infrastructure for the DAQ/HLT system:
  - power and cooling, which has become a major challenge in computer centres
  - installation
  - monitoring and farm management
- The Pre-Series installation has provided invaluable experience to tune/correct the infrastructure, its handling and operation
- Making good progress towards ATLAS operation in 2007
- Looking forward to the start of physics running