
AGLT2 Site Report Shawn McKee/University of Michigan HEPiX Fall 2012 IHEP, Beijing, China October 14th, 2012

Overview of Presentation
- Site Summary Update (see last report at onId=1&resId=0&materialId=slides&confId= )
- Site Resiliency Progress using VMware
- New Storage Hardware
- SuperComputing 2012 Plans
- perfSONAR-PS Work
- Conclusion

Site Summary
- The ATLAS Great Lakes Tier-2 (AGLT2) is a large distributed LHC Tier-2 for ATLAS that spans two locations, UM/Ann Arbor and MSU/East Lansing, with roughly 50% of the storage and compute at each site
- 4200 single-core job slots
- 10 eight-core (multi-core) job slots
- 350 Tier-3 job slots (usable by the Tier-2)
- Average 8.5 HS06/job-slot
- 2.2 Petabytes of storage
- We run most of our Tier-2 services virtualized in VMware
- We have excellent networking in place (10 GE ring to Chicago for the WAN; multi-10GE servers in the LAN) with a total of ~2000 1G ports, ~300 10G ports and 4 40G ports
- Power, space and cooling are OK but are at about steady-state now

Summarizing Job State from PANDA
- AGLT2 site admins wanted a quick view of nodes that are failing jobs
- This PHP program gathers Panda data and caches it every hour so we can get an overview quickly
- Initially the data was gathered by parsing the Panda web page; it worked but wasn't ideal
- Thanks to Valeri Fine of BNL and his work on a new Panda monitor, we now have a JSON interface to Panda to use for our data retrieval
- This turns out to be faster than the usual Panda page and is a more logical/efficient way to implement alternative monitoring interfaces
- Example URL:
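A minimal sketch of the hourly "failed jobs per node" summary described above. The site's actual tool is PHP; this Python version assumes a hypothetical JSON endpoint and field names ("modificationhost", "jobstatus"), which would need to be adjusted to the real Panda monitor API.

```python
# Sketch only: hypothetical endpoint and field names, hourly on-disk cache.
import json
import time
import urllib.request
from collections import Counter

PANDA_URL = "https://example.org/panda/jobs?site=AGLT2&hours=1"  # hypothetical
CACHE_FILE = "/tmp/aglt2_panda_cache.json"
CACHE_TTL = 3600  # refresh at most once per hour

def fetch_jobs():
    """Return the job list, using the hourly cache when it is still fresh."""
    try:
        with open(CACHE_FILE) as fh:
            cached = json.load(fh)
        if time.time() - cached["fetched"] < CACHE_TTL:
            return cached["jobs"]
    except (OSError, ValueError, KeyError):
        pass  # no usable cache; fall through to a fresh fetch
    with urllib.request.urlopen(PANDA_URL, timeout=60) as resp:
        jobs = json.load(resp)
    with open(CACHE_FILE, "w") as fh:
        json.dump({"fetched": time.time(), "jobs": jobs}, fh)
    return jobs

def failures_by_node(jobs):
    """Count failed jobs per worker node so problem nodes stand out."""
    counts = Counter(j.get("modificationhost", "unknown")
                     for j in jobs if j.get("jobstatus") == "failed")
    return counts.most_common()

if __name__ == "__main__":
    for node, nfailed in failures_by_node(fetch_jobs()):
        print(f"{node:40s} {nfailed}")
```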

Update on Use of Virtualization
- We upgraded to vSphere 5.1 in September
- Lots of new/useful features we may leverage
- Increased our iSCSI storage in September
- Added an MD1200 shelf to the MD3600i; 6x900G 10K, 6x600G 15K disks
- We are still working on "site resiliency". VMware base-systems and backend storage are installed at MSU
- Testing/optimizing the storage configuration
- Exploring VMware capabilities like vSphere Replication
- Working on overall site configuration and options (how best to both leverage relevant features and integrate with the instance at UM?)
- Goal is to have MSU capable of bringing up VMs within 1 day after the loss of the UM site
- Services need specific planning and testing
- End-of-the-year target for demonstrating this capability

New Storage Hardware
- For the current refresh of our storage we needed to add at least 1 PB of usable space
- Dell had a "new" dense-storage option based upon 4U shelves with 60 disks. Two variants: the MD3260 (with RAID controllers) and the MD3060e (expansion shelf, without)
- The 11th-generation servers we normally use (R710) were no longer available and we needed more I/O capability, so the R720 was a good match
- We were concerned about having such a large amount of storage behind a single head-node: worries about IOPS and bandwidth per TB compared to our current R710 + 6x MD1200 configuration
- UM and MSU ended up with different variations

New Storage at UM
- R720, dual E5-2620 2.0 GHz, 256G 1333 MHz RAM
- PCIe Gen3 (6x8, 1x16) bus, 4 SAS HBAs
- MD3x60 shelves hold 60x3TB NLSAS 7.2K disks
- RAID controllers support "Dynamic Disk Pools" (10x faster rebuilds)
- We estimate 168 TB/shelf (84 TB/pool) usable for a total of 672 TB
- LAG 2x40G

New Storage at MSU
- 2xR720, dual E5-2620 2.0 GHz, 128G 1333 MHz RAM
- PCIe Gen3 (6x8, 1x16) bus, 4 SAS HBAs
- MD3x60 shelves hold 60x3TB NLSAS 7.2K disks
- RAID controllers support "Dynamic Disk Pools" (10x faster rebuilds)
- We estimate 168 TB/shelf (84 TB/pool) usable for a total of 672 TB
- MSU doesn't have 40G networking. Primary design change: 2 head-nodes and all MD3260s

New Storage Summary
- We are very interested in testing "Dynamic Disk Pools". They provide RAID-6-equivalent protection and 10x faster rebuilds
- Total I/O capability testing will start once we receive the hardware (this week?). We need to verify Dell's numbers: ~3.2 GB/sec writes, ~5.1 GB/sec reads
- The networking configuration is close to matching the unit capabilities. The bandwidth/TB is close to what we have now
- The timescale matches SuperComputing 2012, so our real stress testing will be as part of two demos for SC12
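A rough sequential-throughput sketch for comparing against the quoted ~3.2 GB/s write / ~5.1 GB/s read figures. The test path is hypothetical, and a real acceptance test would use a dedicated benchmarking tool with many parallel streams across all disk pools; this single-stream check only gives a ballpark.

```python
# Sketch only: TEST_PATH is a placeholder for a mount point on the new storage.
import os
import time

TEST_PATH = "/atlas/newstore/throughput.test"   # hypothetical mount point
BLOCK = 4 * 1024 * 1024                         # 4 MiB per write
TOTAL = 16 * 1024 * 1024 * 1024                 # 16 GiB per pass

def write_pass():
    buf = os.urandom(BLOCK)
    start = time.time()
    with open(TEST_PATH, "wb") as fh:
        for _ in range(TOTAL // BLOCK):
            fh.write(buf)
        fh.flush()
        os.fsync(fh.fileno())                   # include time to reach disk
    return TOTAL / (time.time() - start) / 1e9  # GB/s

def read_pass():
    # NOTE: drop the page cache first (as root: echo 3 > /proc/sys/vm/drop_caches)
    # or the read mostly measures RAM, not the array.
    start = time.time()
    with open(TEST_PATH, "rb") as fh:
        while fh.read(BLOCK):
            pass
    return TOTAL / (time.time() - start) / 1e9

if __name__ == "__main__":
    print(f"write: {write_pass():.2f} GB/s")
    print(f"read:  {read_pass():.2f} GB/s")
    os.remove(TEST_PATH)
```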

SuperComputing 2012
- Our Tier-2 is planning to participate in two demos at SC12:
- Working with Caltech and Victoria on using 100G for LHC data transfers
- Working with BNL on demonstrating RoCE technologies for high-performance data movement at 40G (or maybe 2x40G)
- We may additionally try to demonstrate some of the DYNES capabilities (see Ben Meekhof's talk on DYNES at this conference tomorrow for details)

AGLT2 Planned Networking for SC12
This setup should allow us to test the WAN capabilities of our new storage node as well as get some experience with 100G links

perfSONAR-PS at AGLT2
- We have also been working closely with the perfSONAR-PS toolkit and the associated modular dashboard at BNL
- Work areas:
- Deployment of the dashboard WLCG-wide
- Hardening the toolkit to reduce maintenance/hands-on effort
- Creating a new modular dashboard for visualizing metrics
- Testing and deploying the "mesh configuration"
- Current dashboard is visible at:
- Some information about the mesh config follows

perfSONAR-PS Mesh Configuration
- Since USATLAS deployed perfSONAR-PS at its Tier-2 centers starting in 2007, we have found it a challenge to maintain an up-to-date, consistent testing configuration
- As sites change DNS names or are added or deleted, all other sites participating in "mesh testing" need to update
- Aaron Brown/Internet2 developed a mesh configuration option for the perfSONAR-PS toolkit
- Sites install an agent via RPM and configure it to refer to a mesh URL
- The mesh URL contains the sites, tests and scheduled tests for the specific "mesh" (LHCOPN, LHCONE, USATLAS, etc.)
- If the configuration file (JSON format) stored at the URL changes, site agents rebuild the local site's mesh tests (see the sketch below)
- The dashboard can also refer to the mesh config to build its matrices
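A minimal sketch of what the mesh agent does conceptually: poll the mesh URL, detect changes, and trigger a rebuild of local tests. The URL and the JSON keys used here ("organizations", "sites", "hosts", "addresses") are illustrative assumptions, not the exact perfSONAR-PS mesh schema; the real work is done by the RPM-installed agent mentioned above.

```python
# Sketch only: placeholder URL and assumed JSON layout.
import hashlib
import json
import urllib.request

MESH_URL = "https://example.org/perfSONAR_PS_Mesh/usatlas.json"  # placeholder
STATE_FILE = "/var/tmp/mesh_config.sha256"

def fetch_mesh(url=MESH_URL):
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    return raw, json.loads(raw)

def mesh_changed(raw):
    """Compare a hash of the downloaded JSON with the last one we applied."""
    digest = hashlib.sha256(raw).hexdigest()
    try:
        old = open(STATE_FILE).read().strip()
    except OSError:
        old = None
    if digest != old:
        with open(STATE_FILE, "w") as fh:
            fh.write(digest)
        return True
    return False

def list_hosts(mesh):
    """Walk the (assumed) organization/site/host hierarchy of the mesh file."""
    for org in mesh.get("organizations", []):
        for site in org.get("sites", []):
            for host in site.get("hosts", []):
                for addr in host.get("addresses", []):
                    yield addr

if __name__ == "__main__":
    raw, mesh = fetch_mesh()
    if mesh_changed(raw):
        print("mesh changed; local tests would be rebuilt for:")
        for addr in list_hosts(mesh):
            print(" ", addr)
```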

Testing at AGLT2
- The AGLT2_UM perfSONAR-PS instances started testing the mesh "beta" configuration last week
- Both the latency (psum01.aglt2.org) and bandwidth (psum02.aglt2.org) instances subscribe to the mesh configuration at ONAR_PS_Mesh/usatlas.json
- Details at R_PS_Mesh
- Summary: after quick fixes by Aaron we are in good shape. We plan to convert USATLAS to using the mesh very soon
- Further tips and tuning notes about perfSONAR-PS at ONAR

Summary
- Our Tier-2 has been running well and making progress on our "Site Resiliency" plans
- We didn't mention our work in FAX (Federated ATLAS Xrootd), but we are a primary testing site for dCache-FAX integration
- We are exploring new dense storage options from Dell that should provide us with a significant amount of storage with very good performance. We need to determine how best to configure/optimize these new systems (Dynamic Disk Pools)
- We plan to participate in SuperComputing 2012 and take part in some demonstrations of high-performance data movement on the WAN
- perfSONAR use for the LHC is ramping up. Some new capabilities should make it easier to deploy and manage

Questions / Comments?

ADDITIONAL SLIDES

Virtualization Considerations
- Before deploying any VM technology you should plan out the underlying hardware layer to ensure a robust basis for whatever gets installed
- Multiple VM "servers" (to run VM images) are important for redundancy and load balancing
- Multiple, shared back-end storage is important to provide VM storage options that support advanced features
- iSCSI or clustered filesystems recommended
- Scale the hardware to the planned deployment:
- Sufficient CPU/memory to allow failover (N-1 servers; see the sketch below)
- Sufficient storage for the range of services (IOPS + space)
- Sufficient physical network interfaces for the number of VMs
- Design for no single point of failure to the extent possible
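A toy capacity check for the "N-1 servers" failover rule above: the combined CPU/RAM demand of all VMs must still fit on the cluster with one host removed. The host counts and sizes below are hypothetical, not AGLT2's actual numbers.

```python
# Sketch only: hypothetical host and VM sizing.
def fits_with_one_host_down(n_hosts, cores_per_host, ram_per_host_gb,
                            vm_cores_total, vm_ram_total_gb):
    """True if the VM load fits on (n_hosts - 1) servers."""
    usable_cores = (n_hosts - 1) * cores_per_host
    usable_ram = (n_hosts - 1) * ram_per_host_gb
    return vm_cores_total <= usable_cores and vm_ram_total_gb <= usable_ram

# Example: 3 hosts of 24 cores / 192 GB each can lose one host and still carry
# 40 VM cores and 300 GB of VM RAM (2 * 24 = 48 cores, 2 * 192 = 384 GB).
print(fits_with_one_host_down(3, 24, 192, 40, 300))  # True
```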

iSCSI for “Backend” Live Storage Migration
- This set of equipment plus VMware allows us to distribute the I/O load as needed to provide the level of service the VMs need
- Interim purchase, as MSU will only allow MSU nodes to replicate VM images from the UM site. This is OK for our primary goal as long as we maintain a timely replica
- We can also move VMs "live" between storage systems for maintenance or repair reasons

Storage Connectivity
- Increase robustness for storage by providing resiliency at various levels:
- Network: bonding (e.g. 802.3ad)
- RAID/SCSI redundant cabling, multipathing
- iSCSI (with redundant connections)
- Disk choices: SATA, SAS, SSD?
- Single-host resiliency: redundant power, mirrored memory, RAID OS disks, multipath controllers
- Clustered/failover storage servers
- Multiple copies, multiple write locations

Redundant Cabling Using Dell MD1200s
- Firmware for Dell RAID controllers allows redundant cabling of MD1200s
- An MD1200 can have two EMMs, each capable of accessing all disks
- An H800 has two SAS channels
- We can now cable each channel to an EMM on a shelf. The connection shows up as one logical link (similar to a "bond" in networking): performance and reliability

Storage Example: Inexpensive, Robust and Powerful
- Dell has special LHC pricing available
- US ATLAS Tier-2s have found the following configuration both powerful and inexpensive
- Individual storage nodes have exceeded 750 MB/sec on the WAN