AGLT2 Site Report Shawn McKee/University of Michigan Bob Ball, Chip Brock, Philippe Laurens, Ben Meekhof, Ryan Sylvester, Richard Drake HEPiX Spring 2016.

1 AGLT2 Site Report
Shawn McKee/University of Michigan
Bob Ball, Chip Brock, Philippe Laurens, Ben Meekhof, Ryan Sylvester, Richard Drake
HEPiX Spring 2016 / Zeuthen

2 AGLT2 Site Report: Site Summary
The ATLAS Great Lakes Tier-2 (AGLT2) is a distributed LHC Tier-2 for ATLAS spanning UM/Ann Arbor and MSU/East Lansing, with roughly 50% of the storage and compute at each site.
– 7116 single-core job slots (1104 slots added with 23 R630s: E5-2680 v3, 192 GB RAM, 4x500 GB NL-SAS each)
– MCORE slots: 550 (dynamic) + 10 (static)
– 720 Tier-3 job slots usable by the Tier-2
– Average of 9.65 HS06/slot; 68.6 kHS06 in total
– 3.7 PB of storage
– Most Tier-2 services virtualized in VMware
– Networking:
  – 2x40 Gb inter-site connectivity; UM has 100G to the WAN, MSU has 10G to the WAN
  – Many 10Gb internal ports and 16 x 40Gb ports
  – High-capacity storage systems have 2 x 10Gb bonded links
  – 40Gb link between the Tier-2 and Tier-3 physical locations
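
The slot and HS06 figures on this slide can be cross-checked with simple arithmetic. This is a back-of-envelope sketch, not an AGLT2 tool; it assumes every hardware thread on the new R630s (2 sockets x 12 cores x 2 threads) becomes one single-core slot, and all other inputs are copied from the slide.

```python
# Back-of-envelope check of the Site Summary numbers (inputs copied from the slide).
# ASSUMPTION: one single-core job slot per hardware thread on the new R630s.

SOCKETS_PER_R630 = 2
CORES_PER_SOCKET = 12      # Xeon E5-2680 v3
THREADS_PER_CORE = 2       # Hyper-Threading enabled (assumed)
NEW_R630S = 13 + 10        # 13 at UM (November) + 10 at MSU (February)

SINGLE_CORE_SLOTS = 7116
AVG_HS06_PER_SLOT = 9.65


def added_slots(hosts: int) -> int:
    """Slots contributed by `hosts` R630s if every hardware thread is a slot."""
    return hosts * SOCKETS_PER_R630 * CORES_PER_SOCKET * THREADS_PER_CORE


def total_khs06(slots: int, hs06_per_slot: float) -> float:
    """Aggregate capacity in kHS06 from a slot count and a per-slot average."""
    return slots * hs06_per_slot / 1000.0


print(added_slots(NEW_R630S))                                       # 1104
print(round(total_khs06(SINGLE_CORE_SLOTS, AVG_HS06_PER_SLOT), 1))  # 68.7
```

The 23 hosts account exactly for the 1104 added slots. The product 7116 x 9.65 gives 68.7 kHS06 against the quoted 68.6 kHS06; the gap is within the rounding of the 9.65 average (68.6 kHS06 / 7116 slots is about 9.64 HS06/slot).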

3 AGLT2 Site Report: AGLT2 Monitoring
AGLT2 has a number of monitoring components in use. As shown before, we have:
– Customized “summary” page
– OMD (Open Monitoring Distribution) at both UM and MSU
– Ganglia
– Central syslog’ing via ELK: Elasticsearch, Logstash, Kibana
– SRMwatch to track dCache SRM status
– GLPI to track tickets (with FusionInventory)

4 AGLT2 Site Report: Personnel Changes
– After many years with AGLT2, Ben Meekhof is moving on. Fortunately for us, it is only across campus: Ben is the new OSiRIS Project Engineer (see the OSiRIS talk yesterday).
– Many, many thanks to Ben for all his hard work and innovations for AGLT2. He will be missed.
– We interviewed 9 candidates for the replacement (out of about 18 applications) and selected Ryan Sylvester as the best candidate.
– Ryan started near the end of March, and we hope he will be able to attend a future HEPiX meeting in person.

5 AGLT2 Site Report: Hardware Changes Since Last Meeting
– We added 13 Dell R630s at UM in November and 10 at MSU in February (2x E5-2680 v3 @ 2.50GHz, 24 cores/48 threads, 192 GB RAM, 2x10G, 4x500 GB 7.2K SAS)
  – HS06 runs showed that the memory configuration was unbalanced
  – We redistributed the memory to create 128 GB and 256 GB hosts
  – Working well now over 20 Gbps bonded links
  – Retired ~10 PE1950s to make space for these
– Purchased a Mellanox SN2700 switch/router (32x100G ports) in November 2015 as part of a bundle including 12 ConnectX-4 NICs (4x dual-25G, 4x dual-50G and 4x dual-100G) and associated 1m cables
  – Not yet integrated because of problems getting working 40G active optical cables (Mellanox to Juniper)
  – Will be the core aggregation switch; many of our dCache storage nodes will be connected at 25G+ (QSFP28 cables are still hard to find)
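
The slide does not say what made the 192 GB configuration unbalanced. One plausible reading is uneven memory-channel population, sketched below under loudly stated assumptions: identical 16 GB DIMMs (hypothetical; the DIMM size is not given), 4 DDR4 channels per E5-2680 v3 socket, and DIMMs spread as evenly as possible. Under those assumptions 192 GB cannot fill every channel equally, while 128 GB and 256 GB can, and two 192 GB hosts hold exactly the memory of one 128 GB host plus one 256 GB host.

```python
# Why 192 GB can be "unbalanced" on a 2-socket Haswell-EP host while 128 GB and
# 256 GB are not. ASSUMPTIONS (not on the slide): identical 16 GB DIMMs,
# 4 DDR4 channels per socket, and DIMMs spread as evenly as possible.
from typing import List

SOCKETS = 2
CHANNELS_PER_SOCKET = 4    # E5-2680 v3 (Haswell-EP)
DIMM_GB = 16               # hypothetical DIMM size
SLOTS_PER_CHANNEL = 3      # Dell R630: 24 DIMM slots / 8 channels


def layout(total_gb: int) -> List[List[int]]:
    """DIMMs per channel, per socket, filling each socket's channels round-robin."""
    if total_gb % (DIMM_GB * SOCKETS):
        raise ValueError(f"{total_gb} GB cannot be split evenly into {DIMM_GB} GB DIMMs per socket")
    per_socket = total_gb // DIMM_GB // SOCKETS
    sockets = []
    for _ in range(SOCKETS):
        counts = [0] * CHANNELS_PER_SOCKET
        for i in range(per_socket):
            counts[i % CHANNELS_PER_SOCKET] += 1
        sockets.append(counts)
    return sockets


def is_balanced(total_gb: int) -> bool:
    """Balanced = every channel holds the same number of DIMMs, within slot limits."""
    counts = [c for socket in layout(total_gb) for c in socket]
    return len(set(counts)) == 1 and counts[0] <= SLOTS_PER_CHANNEL


for size in (192, 128, 256):
    print(size, layout(size), is_balanced(size))
# 192 [[2, 2, 1, 1], [2, 2, 1, 1]] False
# 128 [[1, 1, 1, 1], [1, 1, 1, 1]] True
# 256 [[2, 2, 2, 2], [2, 2, 2, 2]] True

# Regrouping DIMMs from two 192 GB hosts needs no extra memory: 2 x 192 = 128 + 256
print(2 * 192 == 128 + 256)  # True
```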

6 AGLT2 Site Report: Software Updates Since Last Meeting
– Last Tier-2 SL5 VMs updated to SL7 (rebuilt)
– dCache updated in Nov 2015 to 2.10.42 and then in Feb 2016 to 2.13.23
  – Associated update of PostgreSQL 9.3 -> 9.5
– Condor updated in December to 8.2.10, then in March to 8.4.3, then to 8.4.4 (bugfix)
– OSG CE install updated in October to 3.2.31 and then in March to 3.3.9
– Various switch firmware updates applied in February; BIOS/firmware updates on Dell systems in March

7 AGLT2 Site Report: Lustre at AGLT2
We have updated our Lustre storage, using new hardware and incorporating old servers.
– The new Lustre server and storage shelves were racked in the Tier-2 center last year
  – Lustre version 2.7.0 was installed and a new file system created (using ZFS)
  – Old files from /lustre/umt3 were copied to the new system
  – Old Lustre servers were then recycled to increase total storage in the new file system
  – The new Lustre file system is (re)mounted at the old location, /lustre/umt3
– Current AGLT2 Lustre size is now 1.1 PB
– Next up: move to the recently released 2.8.x version
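
The migration order on this slide (copy to the new file system, verify, then recycle the old servers) depends on confirming the copy before the old storage is reused. The slide does not say which tools were used, and a 1.1 PB migration would need parallel copy tooling; the sketch below only illustrates the copy-then-verify logic on a small throwaway tree, with hypothetical directory and file names.

```python
# Hypothetical copy-then-verify step for moving a directory tree to a new file system.
# Not AGLT2's procedure: it only illustrates "copy, verify, then retire the old servers".
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Checksum a file in fixed-size chunks so large files are not read into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> List[str]:
    """Copy the tree at src into dst; return relative paths that fail verification."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok requires Python 3.8+
    bad = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            rel = (Path(root) / name).relative_to(src)
            copied = dst / rel
            if not copied.is_file() or sha256_of(src / rel) != sha256_of(copied):
                bad.append(str(rel))
    return bad


def demo() -> List[str]:
    """Run the check on a throwaway tree standing in for the old /lustre/umt3."""
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "old_umt3"), Path(tmp, "new_umt3")
        (old / "user" / "data").mkdir(parents=True)
        (old / "user" / "data" / "file.root").write_bytes(os.urandom(4096))
        return copy_and_verify(old, new)


if __name__ == "__main__":
    print(demo())  # [] when every file copied intact
```

Only after the mismatch list comes back empty would it be safe to reuse the old servers and remount the new file system at the old path.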

8 AGLT2 Site Report: Possible Relocation at UM
– I was notified in February that the University would be doing construction and renovation of the building that houses AGLT2 at the University of Michigan in 10 months.
– Someone at the University determined that the space currently housing LS&A IT and AGLT2 would be required to host new HVAC equipment.
– However, they didn’t check all the paperwork: we have a signed agreement from the University, made as part of the Tier-2 grant process, to occupy that specific room for 5 more years.
– We are discussing options, but we are in a strong position if there are disagreements about how to proceed.
– No single location meets our needs: equivalent power, space and cooling AND walkable from physics (undergrad access).
  – One option: split into two locations (under discussion)
– No costs to the project. May involve some downtime(s).

9 AGLT2 Site Report: Future Plans
– Participating in SC16 (simple infrastructure for 100G+)
– Still exploring OpenStack as an option for our site; testing Ceph as a back-end
  – Our Tier-2 will be the first OSiRIS client (see yesterday’s talk)
– New network components support Software Defined Networking (OpenFlow). We are experimenting with SDN in our Tier-2 and as part of the LHCONE point-to-point testbed.
– Working on IPv6 dual-stack for all nodes in our Tier-2
  – Have an IPv6 address block
  – Waiting for UM/MSU network engineers to complete routing (ESnet peering underway; delayed because of personnel changes in networking at UM)
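
Dual-stack means each node carries both an IPv4 and an IPv6 address, with the IPv6 addresses drawn from the site's block. The actual AGLT2 block is not given on the slide, so this sketch uses a placeholder inside the 2001:db8::/32 documentation prefix and hypothetical VLAN names; it shows one common pattern (one /64 per network segment) using Python's standard `ipaddress` module, not the site's actual plan.

```python
# Hypothetical address plan for an IPv6 dual-stack rollout. The real AGLT2 block is
# not given on the slide; 2001:db8::/32 is the IPv6 documentation prefix (RFC 3849).
import ipaddress
from typing import Dict, Iterable

SITE_BLOCK = ipaddress.ip_network("2001:db8:1000::/48")  # placeholder, not AGLT2's block


def vlan_plan(block: ipaddress.IPv6Network, vlans: Iterable[str]) -> Dict[str, ipaddress.IPv6Network]:
    """Give each named VLAN the next /64 from the site block (subnets are generated lazily)."""
    return dict(zip(vlans, block.subnets(new_prefix=64)))


def is_dual_stack(addresses: Iterable[str]) -> bool:
    """True if a node's configured addresses include both IPv4 and IPv6."""
    return {ipaddress.ip_address(a).version for a in addresses} == {4, 6}


plan = vlan_plan(SITE_BLOCK, ["workers", "storage", "services"])  # hypothetical VLAN names
for name, net in plan.items():
    print(f"{name:10s} {net}")
print(is_dual_stack(["192.0.2.10", "2001:db8:1000::10"]))  # True
```

A /48 yields 65,536 /64 subnets, so handing out one /64 per segment leaves ample room; `zip` stops after the last VLAN name, so the generator is never exhausted.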

10 AGLT2 Site Report: Conclusion
Summary
– Tier-2 services and tools are evolving. The site continues to expand and operations are smooth.
– Monitoring stabilized; update procedures working well
– FUTURE: OpenStack, IPv6, SDN
Questions?

11 AGLT2 Site Report: AGLT2 100G Network Details
– 100G to the WAN works
– Last fall, a “normal” FTS transfer hit 2.73 GBytes/sec
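
The FTS peak is quoted in bytes per second while the links are rated in bits per second. This sketch converts between them, assuming decimal units (1 GByte = 10^9 bytes); the slide does not say which convention was used.

```python
# Convert the quoted FTS peak into link terms. ASSUMPTION: decimal units
# (1 GByte = 10**9 bytes); the slide does not say which convention was used.

def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """8 bits per byte."""
    return gbytes_per_s * 8


def utilisation(gbytes_per_s: float, link_gbps: float) -> float:
    """Fraction of a link's nominal capacity used by a transfer rate."""
    return gbytes_to_gbits(gbytes_per_s) / link_gbps


peak = 2.73  # GBytes/s, from the slide
print(f"{gbytes_to_gbits(peak):.2f} Gb/s")          # 21.84 Gb/s
print(f"{utilisation(peak, 100):.1%} of 100G")       # 21.8% of 100G
print(f"{utilisation(peak, 10):.0%} of a 10G link")  # 218% of a 10G link
```

So a single "normal" transfer used roughly a fifth of the 100G WAN link, a rate no 10G path could have carried.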

12 AGLT2 Site Report: Virtualization Status
Madalert
– TODO: report screenshot
– TODO: OMD/check_mk integration
Report for this grid

13 AGLT2 Site Report: Virtualization Status
Madalert report integrated in OMD/check_mk
– Each site is a separate entry
– Site count for green/yellow/red/orange is tracked over time
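
check_mk can ingest custom data through "local checks": executables in the agent's local-check directory that print lines of the form `<state> <service name> <metrics> <detail text>`, where metrics recorded as performance data can be graphed over time in OMD. The slide does not show how the Madalert integration was built; this is a minimal sketch of that pattern, with hypothetical site names and an assumed colour-to-state policy (red is CRIT, yellow or orange is WARN).

```python
# Hypothetical check_mk "local check" that turns Madalert per-site colours into
# one service line with per-colour counts as metrics. Site names and the
# colour-to-state mapping are assumptions; the slide only says the counts are tracked.
from collections import Counter
from typing import Dict

COLOURS = ("green", "yellow", "orange", "red")


def madalert_local_check(site_colours: Dict[str, str]) -> str:
    """Return '<state> <service> <metrics> <text>' in check_mk local-check format."""
    counts = Counter(site_colours.values())  # missing colours count as 0
    metrics = "|".join(f"{c}={counts[c]}" for c in COLOURS)
    if counts["red"]:
        state = 2  # CRIT if any site is red (assumed policy)
    elif counts["orange"] or counts["yellow"]:
        state = 1  # WARN if any site is orange or yellow (assumed policy)
    else:
        state = 0  # OK
    text = ", ".join(f"{counts[c]} {c}" for c in COLOURS)
    return f"{state} Madalert_sites {metrics} {text}"


if __name__ == "__main__":
    example = {"SITE_A": "green", "SITE_B": "green", "SITE_C": "yellow"}  # hypothetical
    print(madalert_local_check(example))
    # 1 Madalert_sites green=2|yellow=1|orange=0|red=0 2 green, 1 yellow, 0 orange, 0 red
```

Emitting the four counts as separate metrics is what lets the per-colour site totals be plotted over time, rather than only the current worst state.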
