AGLT2 Site Report Shawn McKee/University of Michigan

Presentation transcript:

AGLT2 Site Report
Shawn McKee / University of Michigan
Bob Ball, Chip Brock, Philippe Laurens, Mike Nila
HEPiX Spring 2017 / Budapest

Site Summary
The ATLAS Great Lakes Tier-2 (AGLT2) is a distributed LHC Tier-2 for ATLAS spanning UM/Ann Arbor and MSU/East Lansing, with roughly 50% of storage and compute at each site.
- 9408 logical cores; MCORE slots: 950 (dynamic) + 10 (static); an additional 720 Tier-3 job slots are usable by the Tier-2
- Total of 96.1 kHS06, averaging 10.21 HS06/slot (see the check below)
- 6.85 petabytes of storage
- Tier-2 services virtualized in VMware 5.5 (soon upgrading to 6.5)
- 2x40 Gb inter-site connectivity; UM has 100G to the WAN, MSU has 10G to the WAN; many 10Gb internal ports plus 20x40Gb ports and 32x100G/40G (or 64x50G/25G) ports
- High-capacity storage systems have 2x50Gb bonded links
- 40Gb link between the Tier-2 and Tier-3 physical locations
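The quoted per-slot average follows from the totals above; a minimal arithmetic check, using only the numbers from this slide:

```python
# Sanity-check the quoted 10.21 HS06/slot average.
total_hs06 = 96.1e3     # 96.1 kHS06 total, from the site summary
logical_cores = 9408    # logical cores, from the site summary

print(f"{total_hs06 / logical_cores:.2f} HS06/slot")  # -> 10.21
```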

AGLT2 Monitoring
AGLT2 has a number of monitoring components in use. As shown before, we have:
- A customized "summary" page
- OMD (Open Monitoring Distribution) at both UM and MSU
- Ganglia
- Central syslogging via ELK: Elasticsearch, Logstash, Kibana (see the example query below)
- SRMwatch to track dCache SRM status
- GLPI to track tickets (with FusionInventory)
We find this set of tools indispensable for proactively finding problems and diagnosing issues.
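With syslog centralized in Elasticsearch, ad-hoc queries are easy to script. A minimal sketch using the elasticsearch Python client; the endpoint, index pattern, and field names ("severity", "@timestamp") are illustrative assumptions, not AGLT2's actual schema:

```python
# Pull the most recent error-level syslog entries from the ELK stack.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://elk.example.aglt2.org:9200")  # hypothetical endpoint

resp = es.search(index="syslog-*", body={
    "query": {"bool": {"must": [
        {"term": {"severity": "err"}},                 # assumed field name
        {"range": {"@timestamp": {"gte": "now-1h"}}},  # last hour only
    ]}},
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 20,
})

for hit in resp["hits"]["hits"]:
    src = hit["_source"]
    print(src.get("host"), src.get("message"))
```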

Personnel Reorganization
- Over the last year we lost two Tier-2 cluster administrators, one at UM and one at MSU.
- For the near term, we are not planning to hire a replacement.
- We have been very successful in getting good undergraduates to work for us on an hourly basis.
- MSU added Mike Nila (engineering tech) at 20% time.
- We are looking to augment our current two undergraduate hourly students with 1-2 more, to maintain "institutional" knowledge as students move on.

Hardware Additions
- Last summer we had a one-time infusion of NSF funds: AGLT2 received $398K in additional hardware funds.
- We negotiated good pricing with Dell and purchased:
  - 55 R630s: 2x E5-2640 v4 processors, 128GB RAM, 2x 800GB SSDs
  - 4 storage nodes (R730xd + MD3460 + MD3060e; 120x 8TB disks each)
  - 2 MD3060e shelves (each with 60x 8TB disks)
  - All with 5-year warranties
- The total addition was 24 kHS06 and 4.8 PB raw (3.84 PB RAID-6) of storage; see the capacity check below.
- All equipment was added and has been operational in 2017.
- We also spent the remaining funds from the last 5-year grant in January: 18 R630 nodes, 4 N2048 switches, and 1 N4032F switch, adding 7.8 kHS06.
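The quoted raw and usable capacities follow from the disk counts above; a minimal check (the 80% RAID-6 efficiency, e.g. an 8+2 layout, is an assumption consistent with the quoted numbers, not stated on the slide):

```python
# Raw capacity from the purchase list above, in TB.
storage_nodes = 4 * 120 * 8   # 4 nodes, 120 x 8 TB disks each
extra_shelves = 2 * 60 * 8    # 2 MD3060e shelves, 60 x 8 TB disks each

raw_tb = storage_nodes + extra_shelves
print(raw_tb / 1000, "PB raw")            # -> 4.8 PB
print(raw_tb * 0.8 / 1000, "PB RAID-6")   # -> 3.84 PB, assuming 8+2 RAID-6
```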

Software Updates Since Last Meeting
- Tier-2 VMs rebuilt to use SL7 (previously SL5)
- dCache updated to 3.0.11, with PostgreSQL 9.5
- HTCondor now running version 8.4.11
- OSG CE updated to 3.3.21
- Various switch firmware updates applied
- Monitoring updates: OMD/check_mk to 1.2.8p18, ELK stack upgraded, custom monitoring upgraded

Lustre at AGLT2
- We have updated our Lustre storage, using new hardware and incorporating the old servers.
- The new Lustre server and storage shelves were racked in the Tier-2 center last year.
- Lustre 2.7.0 was installed and a new file system created (using ZFS).
- Old files from /lustre/umt3 were copied to the new system, and the old Lustre servers were then recycled to increase the total storage in the new file system.
- The new Lustre file system is (re)mounted at the old location, /lustre/umt3 (see the capacity sketch below).
- The current AGLT2 Lustre size is 1.1 PB (we retired some *old* storage).
- We continue to have challenges with older hardware and this combination of Lustre 2.7 + ZFS 0.6.4.2.
- Next up: move to Lustre 2.10 + ZFS 0.7 when ready.
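Because the new file system is remounted at the old path, clients need no changes, and capacity can be verified with only the standard library (a minimal sketch, assuming /lustre/umt3 is mounted on the machine running it):

```python
# Report used/total capacity of the remounted Lustre file system.
import os

st = os.statvfs("/lustre/umt3")
total_tb = st.f_blocks * st.f_frsize / 1e12
free_tb = st.f_bavail * st.f_frsize / 1e12
print(f"/lustre/umt3: {total_tb - free_tb:.0f} / {total_tb:.0f} TB used")
```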

Possible Relocation at UM
- I reported last year that AGLT2 might need to move to a new physical location at the University of Michigan.
- Reminder: after exploring our requirements, the University had the architects involved with the building renovation work around our Tier-2 center space.
- Update: our college just purchased new batteries for the two APC Symmetra 80kW systems in that room, and we were informed that the room will not be altered.
- All good news; hopefully we are set for at least the next 5 years.

Future Plans
- Participating in SC17 (simple infrastructure for 100G+); likely will demo the use of OSiRIS object stores as part of the ATLAS distributed event service.
- Working on integration of OVS (Open vSwitch) on production storage and on Software Defined Networking (OpenFlow); we are experimenting with SDN in our Tier-2 and as part of the LHCONE point-to-point testbed.
- Working on IPv6 dual-stack for all nodes in our Tier-2:
  - We have an IPv6 address block for AGLT2 (it spans UM/MSU).
  - Our perfSONAR nodes are all dual-stacked and working fine.
  - Dual-stacking our dCache system is the next step, planned for sometime in the next month or two (see the reachability sketch below).
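A dual-stack rollout can be verified per service with a connect test over each address family. A minimal sketch; the host name is a placeholder, not our actual dCache endpoint:

```python
# Try a TCP connection to a service over both IPv4 and IPv6.
import socket

def check_dual_stack(host, port, timeout=5):
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            addr = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)[0][4]
            with socket.create_connection((addr[0], port), timeout=timeout):
                print(f"{label}: OK ({addr[0]})")
        except OSError as exc:
            print(f"{label}: FAILED ({exc})")

check_dual_stack("dcache.example.aglt2.org", 2811)  # hypothetical host; 2811 = GridFTP door
```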

Summary
- Tier-2 services and tools are evolving.
- The site continues to expand, and operations are smooth.
- Monitoring has stabilized, and update procedures are working well.
- FUTURE: IPv6 for storage, SDN, SC17
Questions?