AGLT2 Site Report Shawn McKee/University of Michigan


AGLT2 Site Report
Shawn McKee / University of Michigan
Bob Ball, Chip Brock, Philippe Laurens, Mike Nila, Forest Phillips
HEPiX Fall 2017 / KEK, Tsukuba, Japan

AGLT2 Numbers
The ATLAS Great Lakes Tier-2 (AGLT2) is a distributed LHC Tier-2 for ATLAS spanning UM/Ann Arbor and MSU/East Lansing, with roughly 50% of storage and compute at each site.
- 9408 logical cores
- MCORE slots: 950 (dynamic) + 10 (static)
- Additional 720 Tier-3 job slots usable by the Tier-2
- Average 10.21 HS06/slot; 96.1 kHS06 total
- 6.85 petabytes of storage
- Tier-2 services virtualized in VMware 5.5 (soon upgrading to 6.5)
- Networking: 2x40 Gb inter-site connectivity; UM has 100G to the WAN, MSU has 10G to the WAN; many 10Gb internal ports plus 20x40Gb ports and 32x100G/40G or 64x50G/25G ports
- High-capacity storage systems have 2x50Gb or 2x100Gb bonded links
- 40Gb link between the Tier-2 and Tier-3 physical locations

AGLT2 Monitoring
AGLT2 has a number of monitoring components in use. As shown before, we have:
- Customized “summary” page
- OMD (Open Monitoring Distribution) at both UM and MSU
- Ganglia
- Central syslog’ing via ELK: Elasticsearch, Logstash, Kibana
- SRMwatch to track dCache SRM status
- GLPI to track tickets (with FusionInventory)
We find this set of tools indispensable for proactively finding problems and diagnosing issues.
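For illustration only, here is a minimal Python sketch of the kind of custom probe that can feed OMD/check_mk as a "local check": it tests whether the dCache SRM port answers on a few storage servers and prints one check_mk local-check line per host. The hostnames are hypothetical placeholders and this is not the actual AGLT2/SRMwatch implementation; port 8443 is assumed as the usual dCache SRM port.

import socket

SRM_PORT = 8443  # assumed default dCache SRM port; adjust for the real deployment
HOSTS = ["storage01.example.aglt2.org", "storage02.example.aglt2.org"]  # hypothetical names

def port_open(host, port, timeout=5):
    # Return True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in HOSTS:
    ok = port_open(host, SRM_PORT)
    state = 0 if ok else 2  # check_mk local-check states: 0=OK, 2=CRIT
    text = "SRM port reachable" if ok else "SRM port unreachable"
    # check_mk local-check output format: <state> <service_name> <perfdata> <text>
    print(f"{state} SRM_{host.split('.')[0]} - {text}")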

Personnel Updates
- Over the last year we have lost two Tier-2 cluster administrators, one at UM and one at MSU.
- As of September 1, 2017 we have added a new physics grad student at MSU, Forest Phillips, working on AGLT2 at 50% time.
- I have the sad task of announcing that Bob Ball, our Tier-2 Manager since we started in 2006, plans to retire next year. Bob will likely start a phased retirement next April and be fully retired about a year from now. He will be missed! Hopefully Bob can make it to the Spring 2018 HEPiX!
- For those interested, look for a job opening announcement around January/February 2018.

Hardware Additions
- We are late on our FY17 hardware purchasing.
- The plan is to try to revive the Dell ATLAS portal (at least for USATLAS sites).
- Skylake processors look good, but pricing is higher than before.
- The focus will be on computing rather than storage.
- Will buy some VMware storage at MSU. We need storage under warranty: will go with a SAS-attached MD3460 with 20x1.2TB 10K SAS disks plus 40x6TB 7.2K NL-SAS disks.
- MSU will add a third VMware ESXi host (repurposing a worker node).
- Both the MSU and UM sites may also purchase more 10G network ports.
- Anticipate adding roughly 10 kHS06 of CPU.
- Overall storage will stay around 6.8 PB until our next purchase.

Software Updates Since Last Meeting
- Tier-2 VMs rebuilt to use SL7 (previously SL5)
- dCache updated to 3.0.28 (still PostgreSQL 9.5)
- HTCondor running version 8.4.11
- OSG CE updated to 3.3.25
- Various switch firmware updates applied
- Monitoring updates: OMD/check_mk to 1.4.0p13
- Two “major” updates: Lustre and dCache/OVS

Lustre at AGLT2
- Lustre at AGLT2 is used for “local” (Tier-3) users: about 1 PB of space running on SL6 / Lustre 2.7 / ZFS 0.6.5.4.
- Upgrade targets (OSS and metadata servers):
  - OS updated to SL7.4, kernel 3.10.0-693.2.2
  - Lustre 2.10.1
  - ZFS 0.7.2 on the OSSes, ldiskfs on the MGS/MGT
  - 20 Gb/s bonded Ethernet on the OSSes
- Lustre is re-exported to the MSU worker nodes via NFS from an SL7.4 Lustre client (same OS and kernel as on the OSSes).
- Data read rates show a significant increase (~2x; up to 2 GB/s aggregate seen) over the older SL6 / Lustre 2.7 combination with older SL6/2.7 clients.
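As a rough illustration of how a client read rate (whether on a native Lustre mount or the NFS re-export) can be sanity-checked, here is a minimal Python sketch that streams one large file and reports MB/s. The file path is a placeholder and this is not an AGLT2 tool; it simply shows the kind of quick measurement behind comparisons like the ~2x improvement quoted above.

import time

TEST_FILE = "/lustre/test/largefile.dat"  # placeholder path; point at a real file on the mount
CHUNK = 4 * 1024 * 1024                   # read in 4 MiB chunks

def read_throughput(path):
    # Stream the whole file and return the average read rate in MB/s.
    # Use a file larger than client RAM (or drop caches first) to avoid
    # measuring the page cache instead of the filesystem.
    total = 0
    start = time.time()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
    elapsed = time.time() - start
    return total / elapsed / 1e6

if __name__ == "__main__":
    print(f"read rate: {read_throughput(TEST_FILE):.1f} MB/s")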

dCache and Open vSwitch
- AGLT2 has been planning to implement Open vSwitch (OVS) for almost a year.
- Goal: provide a mechanism to test Software Defined Networking on real LHC physics tasks. OVS gives us a means to get at the “source & sink” of the data (visibility and control).
- Starting in September we began testing the OVS instructions worked out with help from Ben and Galen Mack-Crane / CORSA: https://www.aglt2.org/wiki/bin/view/Main/Open_vSwitch/InstallOVSonSL73
- As of the beginning of this month, all the AGLT2 dCache storage servers are running SL7.4 and have their public IPs on OVS.
- Installing OVS and cutting the IP over to the virtual switch was done “hot” on production servers; the roughly 40-second downtime was not a problem for transfers. https://www.aglt2.org/wiki/bin/view/Main/Open_vSwitch/ConfigureOpenvSwitchSL6
- We have been running for almost two weeks without any issues.
- We are now ready to start LHCONE point-to-point experiments, and other ATLAS sites have expressed interest in trying this.
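To make the “hot” cut-over concrete, below is a minimal, hypothetical sketch (Python driving the standard ovs-vsctl and ip utilities) of the basic steps: create an OVS bridge, attach the physical NIC, and move the public IP onto the bridge. The interface name, address, and gateway are placeholders; the actual, tested procedure is the one documented on the AGLT2 wiki pages linked above.

import subprocess

# Placeholder values; the real NIC, IP, and gateway come from the host being converted
NIC = "eth0"
BRIDGE = "br-ex"
PUBLIC_IP = "192.0.2.10/24"   # documentation-range example address
GATEWAY = "192.0.2.1"

def run(cmd):
    # Echo and execute a command, raising on failure
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the OVS bridge and enslave the physical NIC to it
run(["ovs-vsctl", "--may-exist", "add-br", BRIDGE])
run(["ovs-vsctl", "--may-exist", "add-port", BRIDGE, NIC])

# Move the public IP from the NIC to the bridge; this is the brief window
# (tens of seconds at most) during which transfers can stall
run(["ip", "addr", "flush", "dev", NIC])
run(["ip", "link", "set", BRIDGE, "up"])
run(["ip", "addr", "add", PUBLIC_IP, "dev", BRIDGE])
run(["ip", "route", "add", "default", "via", GATEWAY])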

Future Plans
- Participating in SC17:
  - Will demo use of the OSiRIS storage service: sign up on the floor and use the storage during SC17; possible use of the object store for ATLAS.
  - Demo with MWT2 and PRP: machine learning on ATLAS calorimetry data and on perfSONAR analytics; data will be sourced from AGLT2.
- Experimenting with SDN/OVS in our Tier-2 and as part of the LHCONE point-to-point testbed.
- Update to VMware soon: new version (5.5 -> 6.5), new configuration for HA, new ESXi host at MSU.
- Working on IPv6 dual-stack for all nodes in our Tier-2:
  - We have an IPv6 address block for AGLT2 (spans UM/MSU).
  - Our perfSONAR nodes are all dual-stacked and working fine.
  - Dual-stacking our dCache system is the next step, planned for sometime in the next month or two.
- Moving all worker nodes to SL7 in the next two months.
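As a small illustration of the kind of check that is handy while dual-stacking services, here is a hedged Python sketch that resolves both IPv4 and IPv6 addresses for a host and attempts a TCP connection over each address family. The hostname and port are placeholders, not actual AGLT2 endpoints.

import socket

HOST = "dcache-door.example.aglt2.org"  # placeholder; not a real AGLT2 hostname
PORT = 2811                             # illustrative port only

def check_family(family, label):
    # Resolve HOST for the given address family and try to connect to each result
    try:
        infos = socket.getaddrinfo(HOST, PORT, family, socket.SOCK_STREAM)
    except socket.gaierror:
        print(f"{label}: no address record")
        return
    for _, _, _, _, sockaddr in infos:
        addr = sockaddr[0]
        try:
            with socket.create_connection((addr, PORT), timeout=5):
                print(f"{label}: {addr} reachable on port {PORT}")
        except OSError as exc:
            print(f"{label}: {addr} NOT reachable ({exc})")

check_family(socket.AF_INET, "IPv4")
check_family(socket.AF_INET6, "IPv6")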

Summary / Conclusion
- Monitoring has stabilized and update procedures are working well.
- Tier-2 services and tools are evolving.
- The site continues to expand and operations are smooth.
- Future: IPv6 for storage, SL7 worker nodes, SDN testing, SC17.
Questions?