HEPiX Autumn Meeting 2014
University of Nebraska, Lincoln
Arne Wiebalck, Liviu Valsan, Borja Aparicio
Wiebalck, Valsan, Aparicio: HEPiX Autumn 2014 Summary

HEPiX
Global organization of service managers and support staff providing computing facilities for the HEP community
Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …
Meetings are held twice per year
  - Spring: Europe, Autumn: U.S./Asia
Reports on status and recent work, work in progress & future plans
  - Usually no showing-off, honest exchange of experiences

Outline
Autumn Meeting & HEPiX News
Site Reports
End User Services & OS
Grids, Clouds, and Virtualization
Storage and File systems
Computing and Batch
IT Facilities
Networking and Security
Basic IT Services
Presenters: Arne, Liviu, Borja

HEPiX Autumn 2014
Oct 13 – 17, 2014 at the University of Nebraska Lincoln
  - Well organized, rich program
  - Eduroam, Indico (intervention, incident, power cut)
93 registered participants
  - Many first timers again
  - 6/8 US-CMS Tier-2 sites, 2/5 US-ATLAS Tier-2 sites
  - 45 sites represented
60 contributions
  - 96 slides (in 25 minutes!)
  - words per slide …

Lincoln, Nebraska
About 22 hours door to door …

HEPiX News
Tony Wong (BNL) new HEPiX co-chair
  - 3-year term
Next meetings
  - Spring 2015: Oxford (UK) March 23 – 27
  - Autumn 2015: BNL (US) Oct 12 – 16
  - Spring 2016: DESY Zeuthen (DE), Berlin/Potsdam (TBC)

HEPiX Working Groups
IPv6
  - Deployment/readiness following Tier structure
  - Experiments pushing for services at T1/T2
Benchmarking
  - Awaiting SPEC CPUv6
  - Suggestion of a “fast” benchmark (minutes)

Site Reports
15 site reports: T0, 7x T1s, 7x T2s
(Move to) HTCondor still very visible
  - Talk from HTCondor team
  - INFN (on LSF now) will start evaluation
KIT’s “Dropbox”: bwSync&Share
  - 8’000 users
  - Based on PowerFolder
Ganeti used at multiple sites
  - VM cluster management tool from Google
  - Overall positive experience

Site Reports (continued)
Ceph
  - Still gaining momentum: many PoCs (RAL: 1PB, BNL: 3PB)
  - Lively mail exchange, BoF session in Oxford?
Energy efficiency
  - No WG, but many activities (refurbishments)
  - “Energy accounting” discussions
INFN still investigating micro-server options
  - Moonshot and other Avoton-based solutions
  - Experiments seem fine with performance/power ratio
During a “dark data” cleanup, NDGF deleted all ALICE tape data due to a misunderstanding of what “NDGF data” means
  - ALICE::NDGF vs. ALICE::NDGF_tape
  - 200TB of data now being backfilled …

CERN Site Report
“What about CERN?”
“Are there ever power cuts at CERN?”

End User Services & OS
Six talks in total, three from CERN
  - Thomas: CC7
  - Borja: Issue tracking and VCS
  - Michail: FTS3
Scientific Linux / CentOS
  - FNAL SL team continues to provide Scientific Linux
  - No competition with other rebuilds
  - Rebuild from git.centos.org: difficult (as not supported)
So, after the initial discussions at the Annecy meeting, the community seems to part ways …

Virtualization
Six talks in total, five from CERN
  - Laurence: Experiment’s Cloud Computing Adoption
  - Andrea: WLCG Monitoring
  - Helge: Volunteer Computing
  - Arne: Cloud Report, VM IO Performance
RAL starting batch virtualization
  - “Burst batch into the cloud”
  - Successful PoC: Vacuum model integration with HTCondor
GSI: MS Windows on KVM
  - Windows domain restructuring: all on VMs, all on KVM
  - Partly in prod (CA, TS), partly in testing (DC, Exchange)
  - No support issue

Storage and Filesystems
Ten talks in total, five from CERN:
  - Luca:
    - EOS across 1000 km
    - CERNbox + EOS: Cloud Storage for Science
  - Andrea: DPM performance tuning hints for HTTP/WebDAV and XRootD
  - Ruben: Experience in running relational databases on clustered storage
  - Liviu: SSD Benchmarking at CERN

OpenZFS on Linux
OpenZFS
  - Large set of features
  - Independent of the Linux kernel
LLNL: Three Lustre filesystems, ~100 PB, OpenZFS backend
Moving to commodity JBODs
Work ongoing to improve Linux boot time with a large number of drives

Ceph Based Storage Systems for RACF
Deployment of same scale as at CERN
Lots of performance and stability tests
Object storage, block storage and file system (Ceph FS)
On several platforms (including HP Moonshot)
Different networking solutions

Using XRootD to Minimize Hadoop Replication
Hadoop replication via XRootD
Reduced local Hadoop replication to 1
In case of corrupt local blocks:
  - Request blocks via XRootD
  - Cache locally
  - Repair broken blocks locally in Hadoop

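The fallback path on this slide can be sketched roughly as below. This is an illustration only, not the presenters' implementation: the two dictionaries stand in for the local Hadoop block store and a table of known-good checksums, `fetch_remote` is a hypothetical callable standing in for an XRootD read from another site, and SHA-256 is an assumed checksum.

```python
import hashlib
from typing import Callable, Dict


def checksum(data: bytes) -> str:
    """Checksum used to detect corruption (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def read_block(
    block_id: str,
    local_store: Dict[str, bytes],
    expected: Dict[str, str],
    fetch_remote: Callable[[str], bytes],
) -> bytes:
    """Return a block, repairing the single local replica if it is bad."""
    data = local_store.get(block_id)
    if data is not None and checksum(data) == expected[block_id]:
        return data  # healthy local replica, no remote traffic needed

    # Local replica is missing or corrupt: request the block remotely
    # (fetch_remote is a hypothetical stand-in for an XRootD read).
    remote = fetch_remote(block_id)
    if checksum(remote) != expected[block_id]:
        raise IOError(f"remote copy of {block_id} is also corrupt")

    # Cache the good copy locally, which also repairs the broken block.
    local_store[block_id] = remote
    return remote
```

With a local replication factor of 1, the remote copy is the only recovery path, which is why the sketch refuses to cache a remote block that fails verification.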
Computing and Batch Systems
Six talks in total, one from CERN:
  - Two presentations on benchmarking
  - Four presentations on batch systems

Benchmarking activities
Intel Xeon E v3 (Haswell)
  - Showing good performance
Intel Avoton: very good HS06 / Watt ratio
ARM 32-bit
  - HS06 / Watt in between Xeon & Avoton

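The HS06 / Watt comparison above is a simple ratio of benchmark score to measured power draw. A minimal sketch of how such a ranking could be computed follows; the function names are hypothetical and the numbers in the usage example are made-up placeholders, not results from the talks.

```python
from typing import Dict, List, Tuple


def hs06_per_watt(hs06: float, watts: float) -> float:
    """Efficiency metric: HS06 score divided by power draw in watts."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return hs06 / watts


def rank_by_efficiency(machines: Dict[str, Tuple[float, float]]) -> List[str]:
    """Sort machine names from most to least efficient.

    `machines` maps a name to a (hs06_score, watts) pair.
    """
    return sorted(
        machines,
        key=lambda name: hs06_per_watt(*machines[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured data.
    placeholder = {"machine_a": (400.0, 200.0), "machine_b": (60.0, 20.0)}
    for name in rank_by_efficiency(placeholder):
        print(name, hs06_per_watt(*placeholder[name]))
```

A machine with a low absolute score can still rank first on this metric, which is the point made about the low-power platforms.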
Fast Benchmark
Some requirements are clear:
  - Open source
  - Easy to run
  - Small
Other requirements are not so clear:
  - How fast?
  - Reproducible?
  - Reliable?
  - Single core or multicore?

Fast Benchmark Proposals
Geant4 based
  - Linux x86-64 & ARM
  - Realistic detector geometry
  - Footprint: 1/4 to 1/3 of real experiment
  - CPU bound, no I/O
LHCb fast benchmark
  - Small Python script, single threaded

Next generation HEP-SPEC06
Next SPEC CPU benchmark (CPUv6) in beta
Should be released before the end of the year
Will probably not run with the default SLC 6 compiler
GCC on CentOS 7 should be fine, config file will be provided by GridKa

Batch Systems
All four talks about HTCondor:
  - Two talks from developers
  - Jérôme’s talk: HTCondor at CERN
  - Open Science Grid adopting HTCondor

IT Facilities and Business Continuity
Three talks, two from CERN
  - First Experience with the Wigner Data Centre
  - Joint procurement of IT equipment and services
UPS Monitoring with Sensaphone
  - Multi-level / SMS alerting
  - Gradual shutdown of servers in case of power cut or cooling failure
  - Wireless temperature sensors used to build 3D heatmap

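One way to picture the gradual shutdown mentioned above is a tiered policy that drops the least critical servers first as UPS runtime shrinks. The sketch below is an assumption about how such a policy could look: the tier names, thresholds and function name are invented for illustration and are not taken from the talk or from Sensaphone.

```python
from typing import Dict, List

# Hypothetical policy: minimum UPS runtime (in minutes) that must remain
# for a tier to stay powered on. Tiers and thresholds are invented.
MIN_RUNTIME_MINUTES: Dict[str, int] = {
    "batch": 30,        # worker nodes go first
    "interactive": 15,  # login nodes next
    "storage": 5,       # storage servers last
}


def servers_to_shut_down(
    remaining_minutes: float, servers: Dict[str, str]
) -> List[str]:
    """Return the servers whose tier threshold exceeds the remaining runtime.

    `servers` maps a server name to its tier.
    """
    return sorted(
        name
        for name, tier in servers.items()
        if remaining_minutes < MIN_RUNTIME_MINUTES[tier]
    )
```

Calling the function repeatedly as the reported runtime drops yields a growing list, so the load on the UPS falls in steps rather than all at once.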
NERSC
New Computational Research and Theory (CRT) Building
Year-round free air and water cooling
PUE < 1.1
42 MW to building
12.5 MW provisioned

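PUE (power usage effectiveness) is total facility power divided by IT equipment power, so a bound on PUE translates directly into a bound on total power. The sketch below works that arithmetic through, under the assumption (not stated on the slide) that the 12.5 MW provisioned figure is treated as IT load; the function names are hypothetical.

```python
def pue(total_facility_mw: float, it_load_mw: float) -> float:
    """Power usage effectiveness: total facility power / IT power."""
    if it_load_mw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_mw / it_load_mw


def max_facility_power(it_load_mw: float, pue_bound: float) -> float:
    """Upper bound on total facility power for a given IT load and PUE."""
    return it_load_mw * pue_bound


if __name__ == "__main__":
    # Assumption: 12.5 MW is taken as IT load for illustration.
    bound = max_facility_power(12.5, 1.1)
    print(f"total facility power stays below about {bound:.2f} MW")
```

Under that assumption, less than a tenth of the IT load would go to cooling and other overhead.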
Networking and Security
Four networking talks, two security talks, one from CERN
  - Stefan: Situational Awareness: Computer Security
IPv6 Deployment
  - HEPiX IPv6 Working Group: WLCG dual-stack services deployment. Testing
  - Open Science Grid: Client/Server are dual-stack? Server is but not the client?
InfiniBand Based Networking evaluation
  - Brookhaven National Laboratory (USA)
ESnet: Extension to Europe
  - US Department of Energy
  - “Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources or data”

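The dual-stack question raised above can be checked, at the DNS level at least, by looking at which address families a host name resolves to. The sketch below is a minimal illustration using the standard `socket` module; the function names are hypothetical, and a name resolving to both families does not prove the service actually listens on both.

```python
import socket
from typing import Iterable, Set


def classify(families: Iterable[int]) -> str:
    """Classify a host from the address families it resolves to."""
    seen = set(families)
    has_v4 = socket.AF_INET in seen
    has_v6 = socket.AF_INET6 in seen
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "ipv6-only"
    if has_v4:
        return "ipv4-only"
    return "unresolved"


def resolved_families(host: str) -> Set[int]:
    """Address families returned by name resolution (needs network access)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {info[0] for info in infos}
```

Running `classify(resolved_families(name))` from both the client and the server side gives a quick first answer to "server is, but not the client?", since the result also depends on the resolver configuration of the machine doing the lookup.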
Basic IT Services 1/2
Seven talks, three from CERN
  - Ben: Configuration Services at CERN: Update
  - Rubén: Database on Demand: insight into how to build your DBaaS
  - Aris: Ermis service for DNS Load Balancer configuration
Monitoring with Nagios
  - NERSC – US Department of Energy
  - Monitoring clusters of 1000s of compute nodes

Basic IT Services 2/2
CFEngine
  - ATLAS Great Lakes Tier 2 (AGLT2)
  - Change management: SVN → Push to production
Puppet at USCMS-T1 – Fermilab
  - Modules + Data in Hiera approach. Puppet Dashboard instead of The Foreman
  - Change management: Git branches → Push to production
  - Continuous Integration? Not yet, but Beaker is the main candidate
  - Secrets? “hiera-eyaml”: not a good solution
Puppet at BNL
  - RHIC and ATLAS Computing Facility
  - Emphasis on Change Management and Cultural Management
  - Test environments + self-approve delay
  - Looking for automatic testing