AGLT2 Site Report
AGLT2 is a co-production of the University of Michigan and Michigan State University
Bob Ball / University of Michigan
Shawn McKee, Chip Brock, Philippe Laurens, Mike Nila
OSG AHM, March 2017 / San Diego

Outline
- Site Summary
- 2016 Procurement Details
- 2017 Retirements
- HTCondor-CE Status
- Site Mover Controls and Pilots
- Networking
- IPv6 and SL7

Site Summary
- The ATLAS Great Lakes Tier-2 (AGLT2) is a distributed LHC Tier-2 for ATLAS spanning UM/Ann Arbor and MSU/East Lansing
- Roughly 50% of storage and compute at each site
- 9408 logical cores
- MCORE slots: 950 (dynamic) + 10 (static)
- 720 Tier-3 job slots usable by the Tier-2
- Average 10.21 HS06/slot; total of 96.1 kHS06
- 6.85 petabytes of storage
- Most Tier-2 services virtualized in VMware
- 2 x 40Gb inter-site connectivity; UM has 100G to the WAN, MSU has 10G to the WAN; lots of 10Gb internal ports, 16 x 40Gb ports, and a few 100Gb
- High-capacity storage systems have at least 2 x 10Gb bonded links (see the bonding sketch after this list); the newest have 2 x 25Gb or even 2 x 50Gb
- 40Gb link between the Tier-2 and Tier-3 physical locations
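A minimal sketch of how a 2 x 10Gb bonded link might be set up on a Scientific Linux storage server using LACP; the interface names (em1/em2), the documentation-range IP address, and the bonding options are illustrative assumptions, not AGLT2's actual configuration, and the attached switch ports would need a matching LACP port-channel:

    # /etc/sysconfig/network-scripts/ifcfg-bond0  (assumed example)
    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
    IPADDR=192.0.2.10
    PREFIX=24
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-em1  (repeat analogously for em2)
    DEVICE=em1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none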

Procurement Details
- All funds for 2016 and the Columbia CA purchases have been expended
  - Of order $100 will be returned to Columbia
  - Paperwork is on track
- Three N2048 switches yet to be brought online (March projected)
- All other CPU and storage is online and operational
- CA purchase summary:
  - CPU: 7841 HS06 (720 logical cores)
  - 4 x Dell N2048 1Gb switches
  - Dell S4048-ON 10Gb switch
- Compute and storage capacity increase, December 2015 to January 2017:

                  2015 Actual  2016 Pledge  2016 Actual  2017 Pledge  2018 Pledge
    CPU (HS06)          68506        24000        96080        57500        59289
    Storage (TB)         3712         3000         6850         4242         4651

  (Actual values as of year end, except the 2016 Actual column, which is as of Jan 31, 2017)

2017 Retirement Details
- Potential storage retirements in 2017
  - Up to 1150TB of older storage in 60 shelves across 13 servers
  - Compression to half a rack from ~4 racks if replaced by more MD3460 with currently procured disks
- Potential CPU retirements in 2017
  - Site total 7992 HS06 (824 logical cores): 32U at MSU, 30U at UM
- Total storage following retirements: 5.7PB (2017 pledge is 4.242PB); a rough cross-check follows this list
- Total CPU following retirements: 88.1 kHS06 (2017 pledge is 57.5 kHS06)
- Total logical cores following retirements: 8576
- Future FY2017 purchase schedule not yet determined
- LOCALGROUPDISK: capacity 445TB, usage 346TB
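As a rough cross-check of the post-retirement totals (assuming the retired capacity simply subtracts from the current totals on the Site Summary slide): 6.85 PB - 1.15 PB = 5.7 PB of storage, and 96.1 kHS06 - 7.99 kHS06 ≈ 88.1 kHS06 of CPU, consistent with the figures quoted above.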

2016 Performance
- Overall site availability and reliability metric was 96%-100% all year
- Stage-in rate limitations likely affected the CPU/wall-time ratio throughout the year; much improved for 2017 (a note on reading the CPU/Wall column follows the table)
- CPU and wall times below are in hours
- Single Core includes LargeMEMory jobs

    Task         Num Jobs    CPU-time    Wall-time   CPU/Wall
    Analysis      1435679     1644606      2345423      0.702
    All Jobs      6945552    38446849      7812634      4.921
    Score Prod    4488418    12163929      1433707      0.848
    Mcore Prod    1021455    24638314      4033504      6.108
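One way to read the CPU/Wall column (an interpretation, not stated on the slide): for single-core jobs the ratio is the CPU efficiency directly, while for multicore jobs it approaches the core count of the slot. Assuming the MCORE slots are the usual ATLAS 8-core slots (an assumption, not given here), the Mcore Prod ratio of 6.108 corresponds to roughly 6.108 / 8 ≈ 76% per-core efficiency, and the Analysis ratio of 0.702 means analysis jobs spent about 70% of their wall time on CPU.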

HTCondor-CE Status
- Running at AGLT2 since Fall 2015
- GRAM is still running, but will soon be totally disabled
- Just upgraded to OSG 3.3.21 on all gatekeepers, with HTCondor 8.4.11; worker nodes are still on 8.4.6
- Only a skeleton of the [Resource Entry CHANGEME] section in 30-gip.ini is ready; will configure and test OSG Collector reporting over the next few weeks (a minimal sketch of such an entry follows below)
- Question: will BNL provide a "template" JobRouter conf file for US sites?
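For illustration only, a minimal sketch of what a filled-in resource entry and a basic JobRouter route might look like; the entry name, core count, memory, VO list, and file names are assumptions, not AGLT2's actual values:

    # /etc/osg/config.d/30-gip.ini -- osg-configure resource entry (placeholder values)
    [Resource Entry AGLT2_SL6]
    name = AGLT2_SL6
    cpucount = 16
    maxmemory = 65536
    queue = default
    allowed_vos = atlas, osg

    # /etc/condor-ce/config.d/99-local-routes.conf -- HTCondor-CE JobRouter (assumed file name)
    # Route all incoming CE jobs to the local HTCondor pool as vanilla-universe jobs.
    JOB_ROUTER_ENTRIES @=jre
      [
        name = "Local_Condor";
        TargetUniverse = 5;
      ]
    @jre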

Site Mover Controls and Pilots
- New pilots have been running at AGLT2 for a few weeks without incident
- Using the MWT2 LSM (local site mover) as our site mover
- Some site-specific mods in place: reporting, locale, ...
- Logging to syslog and UC Kibana (a generic logging sketch follows below)
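Purely as an illustration of the syslog side (the actual MWT2 LSM scripts and their message format are not shown here and may differ), a transfer wrapper could tag its records so a log shipper can forward them to Kibana:

    # Hypothetical example: tag a stage-in result in syslog; fields are placeholders
    logger -t lsm-get -p user.info "guid=<guid> src=<surl> dest=<pfn> status=0 time_s=12.3"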

Networking
- No external network changes planned
- Internal network is evolving
  - Using a 100Gb Mellanox SN2700 as the local collector switch; 2 x 40Gb uplink to a Juniper EX8200, thence to the WAN
  - Replacing Dell PC6248 with N2048 on public NICs for increased useful bandwidth
  - Local 10Gb collector: Dell S4048-ON (48 x 10Gb and 6 x 40Gb ports)

Networking (continued)
- Plot: UC Kibana-collected stage-in rates
- Clear peak identified as the PC6248
- Replacement N2048 has a > 3x better sustained rate

IPv6 and SL7
- Both the UM and MSU sites are now routing IPv6
  - perfSONAR instances are all now dual-stacked
  - Storage will be next to be dual-stacked
- No plans for a wide SL7 rollout yet
  - Some servers have transitioned
  - Waiting for direction on an SL7 xrootd redirector
- Singularity is now installed for OSG experimentation (a usage sketch follows below)
  - Experience likely useful for the SL7 transition when we are ready for it
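As an illustration of the kind of experimentation mentioned, Singularity can run an EL7 user space on an SL6 worker node using the OSG-distributed image on CVMFS; whether AGLT2 uses exactly this image is an assumption:

    # Run a command inside the OSG EL7 image from CVMFS (illustrative)
    singularity exec /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest \
        cat /etc/redhat-release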

Software-Defined Storage Research
- NSF proposal OSiRIS funded, involving UM, MSU, WSU (and our Tier-2)
- Exploring Ceph + SDN for future software-defined storage
- Goal is centralized storage that supports in-place access from CPUs across multiple institutions
- See the NSF announcement: http://www.nsf.gov/awardsearch/showAward?AWD_ID=1541335

Summary
- No big issues holding us back
- All funds spent out and closeout paperwork on track
- WAN operating at 2 x 40Gb with transparent 10Gb fallback
- IPv6 namespace in place
- No wide rollout of SL7 any time soon; Singularity, now on our cluster, could help with that transition