AGLT2 Site Report
Shawn McKee / University of Michigan
Bob Ball, Chip Brock, Philippe Laurens, Ben Meekhof, Ryan Sylvester, Richard Drake
HEPiX Spring 2016 / Zeuthen

Site Summary

The ATLAS Great Lakes Tier-2 (AGLT2) is a distributed LHC Tier-2 for ATLAS spanning UM/Ann Arbor and MSU/East Lansing, with roughly 50% of storage and compute at each site.
– 7116 single-core job slots (added 1104 slots via 23 R630s: Xeon E5 v3, 192 GB RAM, 4x500 GB NL-SAS each)
– MCORE slots: 550 (dynamic) + 10 (static)
– 720 Tier-3 job slots usable by the Tier-2
– Average 9.65 HS06/slot; total of 68.6 kHS06 (quick consistency check below)
– 3.7 petabytes of storage
– Most Tier-2 services virtualized in VMware
– 2x40 Gb inter-site connectivity; UM has 100G to the WAN, MSU has 10G to the WAN; many 10 Gb internal ports and 16x40 Gb ports
– High-capacity storage systems have 2x10 Gb bonded links
– 40 Gb link between the Tier-2 and Tier-3 physical locations
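As a minimal sketch, the headline capacity numbers above can be cross-checked from the figures on this slide alone (no other inputs are assumed):

```python
# Rough consistency check of the Site Summary numbers (values copied from the slide).
slots_total = 7116        # single-core job slots
hs06_per_slot = 9.65      # average HS06 per slot
added_slots = 1104        # slots added on the new R630s
new_nodes = 23            # 13 at UM + 10 at MSU

print(f"total capacity ~ {slots_total * hs06_per_slot / 1000:.1f} kHS06")  # ~68.7, matches the quoted 68.6 kHS06
print(f"slots per new node ~ {added_slots / new_nodes:.0f}")               # 48 slots per R630
```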

AGLT2 Monitoring

AGLT2 has a number of monitoring components in use. As shown before, we have:
– Customized "summary" page
– OMD (Open Monitoring Distribution) at both UM and MSU
– Ganglia
– Central syslog'ing via ELK: Elasticsearch, Logstash, Kibana (example query sketched below)
– SRMwatch to track dCache SRM status
– GLPI to track tickets (with FusionInventory)
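To illustrate how a central ELK syslog store like this can be used, here is a minimal sketch that queries the Elasticsearch search API for recent dCache error messages; the hostname, index pattern and field names are assumptions for illustration, not the actual AGLT2 configuration:

```python
# Minimal sketch: query a central ELK syslog store for recent dCache errors.
# The Elasticsearch host, index pattern and field names below are hypothetical.
import json
import requests

query = {
    "size": 10,
    "sort": [{"@timestamp": {"order": "desc"}}],
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "error"}},
                {"match": {"program": "dcache"}},
            ],
            "filter": {"range": {"@timestamp": {"gte": "now-1h"}}},
        }
    },
}

resp = requests.get("http://elk.example.org:9200/syslog-*/_search",
                    data=json.dumps(query),
                    headers={"Content-Type": "application/json"})
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("@timestamp"), hit["_source"].get("message"))
```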

Personnel Changes

After many years with AGLT2, Ben Meekhof is moving on; fortunately for us, it is only across campus. Ben is the new OSiRIS Project Engineer (see the OSiRIS talk yesterday). Many, many thanks to Ben for all his hard work and innovations for AGLT2. He will be missed.

We interviewed 9 candidates as a replacement (out of about 18 applications) and selected Ryan Sylvester as the best candidate. Ryan started near the end of March and we hope he will be able to attend a future HEPiX meeting in person.

Hardware Changes Since Last Mtg

– Added 13 Dell R630s at UM in November and 10 at MSU in February (2x Xeon E5 v3, 24 cores / 48 threads, 192 GB, 2x10G, 4x500 GB SAS 7.2K each)
– Found memory was unbalanced via HS06 runs; redistributed DIMMs to create 128 GB and 256 GB hosts
– Working well now over 20 Gbps bonded links
– Retired ~10 PE1950s to make space for these
– Purchased a Mellanox SN2700 switch/router (32x100G ports) in November 2015 as part of a bundle including 12 ConnectX-4 NICs (4x dual-25G, 4x dual-50G, 4x dual-100G) and associated 1 m cables
– Have yet to integrate it because of issues getting working 40G active optical cables (Mellanox to Juniper)
– Will serve as core aggregation; many of our dCache storage nodes will be connected at 25G+ (QSFP28 cables still hard to find)

Software Updates Since Last Mtg

– Last Tier-2 SL5 VMs updated to SL7 (rebuilt)
– dCache updated in Nov 2015 and again in Feb 2016, with an associated update of PostgreSQL 9.3 -> 9.5
– Condor updated in December, again in March to 8.4.3, and then to a later bugfix release
– OSG CE install updated in October and again in March
– Various switch firmware updates applied in February; BIOS/firmware on Dell systems updated in March
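A small sketch of how service versions could be snapshotted before and after update campaigns like these; the command and RPM package names are assumptions and would need adapting to the actual hosts:

```python
# Minimal sketch: record the versions of the services mentioned above.
# Command and package names are assumptions; adapt to the actual installation.
import subprocess

def version(cmd):
    """Return the first line of a command's output, or a note if it fails."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        return out.splitlines()[0] if out else "(no output)"
    except (OSError, subprocess.CalledProcessError) as exc:
        return f"unavailable: {exc}"

for label, cmd in [("HTCondor",   ["condor_version"]),
                   ("dCache rpm", ["rpm", "-q", "dcache"]),
                   ("PostgreSQL", ["psql", "--version"]),
                   ("OSG CE rpm", ["rpm", "-q", "osg-ce"])]:
    print(f"{label:10s} {version(cmd)}")
```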

Lustre at AGLT2

We have updated our Lustre storage, using new hardware and incorporating the old servers:
– The new Lustre server and storage shelves were racked in the Tier-2 center last year
– A new Lustre version was installed and a new file system created (using ZFS)
– Old files from /lustre/umt3 were copied to the new system
– The old Lustre servers were then recycled to increase the total storage in the new file system
– The new Lustre file system is (re)mounted at the old location, /lustre/umt3

Current AGLT2 Lustre size is now 1.1 PB. Next up: move to the recently released 2.8.x version.
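As a small sketch, the aggregate size of the remounted file system (about 1.1 PB above) can be confirmed by summing the OST capacities reported by the standard Lustre client tool `lfs df`; the mount point comes from the slide, the output parsing is simplified:

```python
# Minimal sketch: sum OST capacities for the Lustre file system mounted at /lustre/umt3.
import subprocess

out = subprocess.run(["lfs", "df", "/lustre/umt3"],
                     capture_output=True, text=True, check=True).stdout

total_kib = 0
for line in out.splitlines():
    fields = line.split()
    # OST lines look like: "<fsname>-OST0000_UUID  <1K-blocks>  <used>  <avail> ..."
    if fields and "OST" in fields[0]:
        total_kib += int(fields[1])

print(f"aggregate OST capacity ~ {total_kib / 1024**4:.2f} PiB")
```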

Possible Relocation at UM

I was notified in February that, within 10 months, the University would be doing construction and renovation of the building which houses AGLT2 at the University of Michigan. Someone at the University determined that the space currently housing LS&A IT and AGLT2 would be required to host new HVAC equipment. However, they didn't check all the paperwork: we have a signed agreement from the University, as part of the Tier-2 grant process, to occupy that specific room for 5 more years.
– We are discussing options but are in a strong position if there are disagreements about how to proceed
– No single location meets our needs: equivalent power, space and cooling AND walkable from Physics (undergrad access)
– One option: split across two locations... under discussion
– No costs to the project; may involve some downtime(s)

Future Plans

– Participating in SC16 (simple infrastructure for 100G+)
– Still exploring OpenStack as an option for our site; testing Ceph as a back-end
– Our Tier-2 will be the first OSiRIS client (see yesterday's talk)
– New network components support Software Defined Networking (OpenFlow); we are experimenting with SDN in our Tier-2 and as part of the LHCONE point-to-point testbed
– Working on IPv6 dual-stack for all nodes in our Tier-2: have an IPv6 address block; waiting for UM/MSU network engineers to complete routing (ESnet peering underway; delayed because of personnel changes in networking at UM); a minimal dual-stack check is sketched after this list
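Once routing is in place, a simple dual-stack reachability test becomes useful. This is a minimal sketch only; the hostname and port are placeholders, not actual AGLT2 endpoints:

```python
# Minimal sketch: verify a service is reachable over both IPv4 and IPv6.
# Hostname and port are placeholders.
import socket

def reachable(host, port, family):
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return "no address record"
    addr = infos[0][4]
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(5)
            s.connect(addr)
        return f"OK ({addr[0]})"
    except OSError as exc:
        return f"unreachable ({exc})"

host, port = "gridftp.example.org", 2811
print("IPv4:", reachable(host, port, socket.AF_INET))
print("IPv6:", reachable(host, port, socket.AF_INET6))
```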

Conclusion

Tier-2 services and tools are evolving. The site continues to expand and operations are smooth. Monitoring has stabilized and update procedures are working well.

FUTURE: OpenStack, IPv6, SDN

Questions?

AGLT2 100G Network Details

100G to the WAN works. Last fall, a "normal" FTS transfer hit 2.73 GBytes/sec.
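For context, a quick unit conversion (not a measurement) shows what that transfer rate means relative to the 100G WAN link:

```python
# Quick conversion of the FTS transfer rate quoted above.
gbytes_per_sec = 2.73
gbits_per_sec = gbytes_per_sec * 8          # ~21.8 Gb/s
print(f"{gbits_per_sec:.1f} Gb/s, ~{gbits_per_sec / 100:.0%} of a 100G link")
```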

Madalert

TODO: report screenshot
TODO: OMD/check_mk integration
Report for this grid -> [screenshot]

Madalert

Report integrated in OMD/check_mk:
– Each site is a separate entry
– Site count for green/yellow/red/orange is tracked over time (one way to feed this is sketched below)
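As a minimal sketch of how such per-site colour counts could be pushed into OMD/check_mk, here is a hypothetical check_mk "local check" script; the JSON report layout is an assumption, while the output line follows the standard local-check format (status, service name, perfdata, text):

```python
#!/usr/bin/env python
# Minimal sketch of a check_mk local check publishing Madalert-style site counts.
# The input report structure below is hypothetical.
import json
import sys

# Example input on stdin: {"sites": {"AGLT2": "green", "MWT2": "yellow", ...}}
report = json.load(sys.stdin)

counts = {"green": 0, "yellow": 0, "orange": 0, "red": 0}
for colour in report.get("sites", {}).values():
    if colour in counts:
        counts[colour] += 1

# Local-check line: <status> <service_name> <perfdata> <info text>
status = 2 if counts["red"] else (1 if counts["yellow"] or counts["orange"] else 0)
perf = "|".join(f"{c}={n}" for c, n in counts.items())
print(f"{status} Madalert_mesh {perf} "
      f"{counts['green']} green / {counts['yellow']} yellow / "
      f"{counts['orange']} orange / {counts['red']} red sites")
```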