HEPiX Fall 2016 Summary Report


HEPiX Fall 2016 Summary Report
Martin Bly, HEPSysMan, QMUL, November 2016

Outline
- Stats
- Tracks
- Next meetings

Stats (yawn) ;-)
- 96 registrations
- 74 accepted abstracts:
  - Site Reports (25)
  - End-User IT Services & Operating Systems (2)
  - Security & Networking (11)
  - Storage & Filesystems (12)
  - Batch & Computing Services (7)
  - IT Facilities & Business Continuity (4)
  - Basic IT Services (5)
  - Grid, Cloud & Virtualization (8)
  - HPC BoF
- >60 people toured the NERSC data centre (in more than 4 groups)


Site Reports
- GPFS, Hadoop and Dropbox-like services (CERNBox at CERN) reported by many sites
- AFS is slowly dying, with little enthusiasm for supporting either OpenAFS or the commercial derivatives; no common one-to-one replacement solution
- Integration of HPC and HTC activities and resources
- More adoption of OpenStack/Docker to facilitate integration of cloud resources
- Lots of monitoring solutions, though the displays all look like Grafana to the untrained eye
- Some SDN R&D; upgrading of WAN capacity

End-User IT Services
- Usual Scientific Linux update from FNAL:
  - RHEL 7.3 in private beta: Custodia, XFS updates, performance updates for auditd; beta kernel has GPIO enabled
  - SL 7.3 in beta (targeted at cloud provisioning; feedback needed): support for container/VM provisioning
- Talk from DESY about their email systems:
  - Hosts 70+ email domains, 6500 mailboxes, 300k messages daily
  - Replacing a commercial stack with mostly open-source components
  - Quarantining to reduce spam, phishing, spam, viruses, spam, spam …

Network and Security
- Security @ NERSC: Bro provides IDS, monitoring and event-correlation data
- Networking to cloud providers:
  - Need better bandwidth to cloud providers
  - Issues with NRENs carrying cloud traffic; different approaches taken by ESnet and GÉANT
- Report by the IPv6 working group:
  - WLCG proposed timelines for IPv6 support; migration underway at many levels (a minimal readiness probe is sketched after this list)
  - Need to address security concerns
- CERN presentation on intrusion detection:
  - Decouples network decision logic from traffic flow
  - SDN-based (OpenFlow, OpenDaylight and BFO); better control over traffic distribution
- Network analytics project at U. of Chicago:
  - Instrumented to detect anomalies and raise alarms using existing tools (ELK, perfSONAR, …)
  - Can be used for data/job placement and improved productivity; applicable beyond ATLAS
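To give a flavour of what per-service IPv6 readiness testing involves, here is a minimal Python sketch using only the standard library; the hostnames are placeholders, not services named in the talks.

```python
import socket

def has_ipv6(host, port=443, timeout=5.0):
    """Return True if host resolves to at least one reachable IPv6 endpoint."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no AAAA record published
    for family, socktype, proto, _, sockaddr in infos:
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            return True
        except OSError:
            continue
        finally:
            s.close()
    return False

# Hypothetical endpoints; substitute the transfer/storage services your site runs dual-stack.
for host in ("fts3.example.org", "se01.example.org"):
    print(host, "reachable over IPv6" if has_ipv6(host) else "IPv4 only")
```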

Storage and Filesystems
- Spectrum Scale (formerly GPFS) upgrades:
  - Possible (partial) replacement for AFS
  - Several changes aimed at improving performance: new configuration parameters, better communication (lower latency)
- CephFS presentations from Australia, RAL and BNL:
  - Works well, but still buggy; users advised not to rely exclusively on it
  - Significant development effort and a steep learning curve for system administrators
- HA dCache presentation by NeIC:
  - HA pair deployed recently with two virtual machines; work in progress
- CERN presentation on EOS, DPM and FTS (a namespace-listing sketch follows this list):
  - EOS: high-performance, disk-only storage
  - DPM: storage manager for smaller sites
  - FTS: data distribution across the WLCG infrastructure; v3.6 to be released soon
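EOS is served over the XRootD protocol, so sites commonly script against it with the pyxrootd bindings; a minimal sketch, in which the endpoint and namespace path are placeholders rather than details from the talk.

```python
from XRootD import client
from XRootD.client.flags import DirListFlags

# Hypothetical EOS endpoint and namespace path; substitute your site's values.
fs = client.FileSystem("root://eos.example.org")
status, listing = fs.dirlist("/eos/demo/data", DirListFlags.STAT)
if status.ok:
    for entry in listing:
        print(entry.statinfo.size, entry.name)
else:
    print("dirlist failed:", status.message)
```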

Storage and Filesystems II
- ZFS on Linux
- OSiRiS update (AGLT2):
  - Software-defined storage (a project supported by NSF involving several US universities), based on Ceph
  - Provides a common storage infrastructure to participants, reducing cost
  - Participants include HEP, Earth Sciences, Life Sciences, etc.
  - Latency test at SC16 to measure how far it can be stretched
  - Working to integrate with ATLAS
- Status of AFS at CERN:
  - 35k users and 450 TB of storage; used for $HOME, working directories, shared space, etc.
  - Replacement studies: possibly a combination of CERNBox, EOS, CVMFS and others; software maturity issues, and workflow consequences need to be sorted out
  - Possible end of service ~2019 (before end of LS2)
- AuriStor presentation:
  - kAFS within the Linux kernel integrates with AFS and AuriStorFS servers
  - Migration from OpenAFS to AuriStorFS requires no downtime
  - Container support in AuriStorFS

Computing and Batch
- JLab presentation on the USQCD computing project:
  - KNL cluster production allocation began recently; additional hardware to be added soon
  - Rated at 329 Tflops without the additional hardware; benchmark to be rerun
- Integration of ARM64 and POWER8 at CERN:
  - Benchmarked with HS06: comparable to Xeon processors, but with many more cores
  - Significantly higher power consumption; not as efficient as Xeon
- UCSD: dynamic provisioning of cloud resources using HTCondor (a submission sketch follows this list):
  - Focused on AWS access, to evaluate the feasibility of moving workflows to clouds
- HTCondor presentation on the latest updates:
  - Slurm support; improved support for OpenStack and AWS
  - Support for Singularity (a container system similar to Docker)
- Data-intensive applications at PDSF and Genepool (NERSC):
  - Workflows pose different challenges than the MPI-based applications on Cori
  - Supported by Slurm (batch) and GPFS (storage)
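For flavour, HTCondor-driven provisioning is usually scripted through the htcondor Python bindings; a minimal sketch assuming recent bindings, with the executable and the CloudSite requirement as placeholders (the AWS-side glue is elided).

```python
import htcondor

# Describe one job; executable/arguments are placeholders.
job = htcondor.Submit({
    "executable": "/usr/bin/env",
    "arguments": "true",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    # Hypothetical machine attribute used to steer jobs to cloud-provisioned slots.
    "requirements": 'TARGET.CloudSite == "AWS"',
})

schedd = htcondor.Schedd()    # talk to the local scheduler daemon
result = schedd.submit(job)   # queue one job
print("submitted cluster", result.cluster())
```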

Facilities and Infrastructure
- CERN and the Open Compute Project (OCP):
  - Simplify and standardise hardware procurement to lower costs and minimise customisation
  - DC power at rack level
  - OCP responses not competitive in regular procurements (yet)
- New data centres at CERN:
  - Renovation and a new DC to accommodate more Tier-0 capacity and the HLT farms for LHCb and ALICE
  - Considering GSI's Green Cube model; want it ready by 2020
- GSI Green Cube status report:
  - Operational since spring 2016, with no major infrastructure issues since
  - 13k cores and 20 PB over 50 racks by end of year
  - New enhanced monitoring

Basic IT Services
- BNL presentation on HTCondor monitoring
- Load balancing at RAL:
  - HAProxy and Keepalived (floating IP)
  - Instrumented process to check whether the backup server is healthy (a minimal probe is sketched after this list)
  - Scalable solution that addresses multiple failure scenarios
  - Works well with FTS3 and OpenStack now; GridFTP and the S3 API to be added in the future
- Puppet at the Australian ATLAS sites:
  - Migrated from CFEngine to Puppet a few years ago
  - Gradually puppetising most (not all) servers, for legacy reasons
  - Migration to CentOS 7 and KVM prompted adoption of best practices
- ELK deployment at KEK:
  - Access control is an issue: Kerberos-based controls were installed, but ELK upgrades broke them
  - Developing plugins for better portability
  - Overhead of these enhancements is 80-120 ms per query (13% overhead on indexing throughput)
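A minimal sketch of the kind of TCP health probe such a setup layers on top of HAProxy/Keepalived; the hosts and ports are placeholders, and a production deployment would more likely rely on HAProxy's built-in checks or a Keepalived check script.

```python
import socket
import sys

def tcp_healthy(host, port, timeout=3.0):
    """Probe a server by opening a TCP connection; True if it accepts in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hypothetical FTS3 nodes behind the floating IP.
    servers = [("fts3-01.example.org", 8446), ("fts3-02.example.org", 8446)]
    healthy = [s for s in servers if tcp_healthy(*s)]
    print(f"{len(healthy)}/{len(servers)} servers healthy")
    sys.exit(0 if healthy else 1)
```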

Grid, Cloud and Virtualisation
- HEP workloads on NERSC HPC resources:
  - Shifter (now open source) enables Docker-like containers that provide the static environment HEP workflows need
  - Burst buffer: dynamic allocation of high-performance filesystems for data-intensive workflows
  - Slurm and SDN used to improve bandwidth to local resources
- Container orchestration at RAL:
  - Aim: manage existing services, and provide more services with less effort
  - Apache Mesos for service discovery and management, with Docker-based images in a private registry; scalable so far
  - Future plans: integration with the configuration management system; Ceph-based persistent storage for containers; OpenStack hypervisors in containers, so cloud and batch can share resources

Grid, Cloud and Virtualisation II
- CNAF expansion onto cloud providers:
  - CMS jobs with Aruba (an Italian cloud provider): works, but with lower efficiency (49% vs 80%)
  - Bari resources appear as local resources: good efficiency; increased CNAF throughput by ~8%
- KIT also working on extending onto local HPC resources and commercial cloud providers:
  - Resource integration with HTCondor/ROCED
  - ROCED virtualisation worked well; development continues
- Helix Nebula Science Cloud (HNSciCloud) report:
  - 30-month project to address the looming computing and data challenges by developing a hybrid cloud model for the European scientific community
  - Many institutions and science fields participating
  - More prominent usage of public cloud resources; several technical challenges still to be addressed
- IHEP Cloud project (a provisioning sketch follows this list):
  - Supports various projects: 2k cores, 900 TB of storage and 200 users
  - OpenStack-based provisioning: self-service virtual platforms and computing environments
  - VPManager provides a dynamic scheduling architecture; it has worked well and there are plans for future expansion
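For flavour, OpenStack self-service provisioning of the kind IHEP describes is typically scripted with the openstacksdk library; a minimal sketch in which the cloud name, image, flavor and network are placeholders, not details from the talk.

```python
import openstack

# Credentials come from clouds.yaml; "ihep-demo" is a placeholder cloud name.
conn = openstack.connect(cloud="ihep-demo")

# Look up hypothetical image/flavor/network by name.
image = conn.compute.find_image("cc7-base")
flavor = conn.compute.find_flavor("m1.large")
network = conn.network.find_network("private")

# Boot one worker VM and wait for it to become ACTIVE.
server = conn.compute.create_server(
    name="vcluster-worker-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, "is", server.status)
```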

Next meetings
- Next Spring meeting: Apr 24-28, 2017 @ Wigner (Budapest, Hungary)
- Next Fall meeting: HEPiX returns to Asia: Oct 16-20, 2017 @ KEK (Tsukuba, Japan)
- HEPiX website updated
Mark your calendars!

Questions?