US CMS Tier1 Facility Network
Andrey Bobyshev (FNAL), Phil DeMar (FNAL)
CHEP 2010, Academia Sinica, Taipei, Taiwan

Outline of the talk:
  - USCMS Tier1 Facility resources
  - Data model / requirements
  - Current status
  - Circuits / CHIMAN / USLHCNet
  - Network tools: graphs and snapshots

Summary of CMS resources

  Country       Site            Tape (TB)   Disk (TB)   CPU (kHS06)
  Switzerland   CH-CERN
  France        FR-CCIN2P3
  Germany       DE-KIT
  Italy         IT-INFN-CNAF
  Spain         ES-PIC
  Taiwan        TW-ASGC
  UK            UK-T1-RAL
  USA           US-FNAL-CMS

CMS resources (2011 pledges)

Model of USCMS-T1 Network Traffic (diagram): cmsstor/dCache nodes with a federated file system, EnStore tape robots, BlueArc NAS, ~1600 worker nodes for data processing (30-80 Gbps), CMS-LPC/SLB clusters and interactive users, the T0 path (2.2 Gbps / 3.2 Gbps), and Tier-2/Tier-1 traffic (10-20 Gbps); other labelled rates are 1 Gbps and 3-10 Gbps; QoS is applied.

Upgrade of Core Fabric Recently Completed: migration from the Cisco 6509-based core to a Cisco Nexus 7000-based core.

USCMS-T1 Network (diagram)

End-to-End Circuits / USCMS Tier1 Network (diagram): core routers site-core1 and site-core2, border router r-s-bdr, ESnet / 100GE; link labels include 10GbE, 100GbE, and N x 100GbE; vPC; Site Network (?).

QoS Classes of Traffic in USCMS-T1

  Best effort                        50%   40 Gbps
  Critical                           34%   27.2 Gbps
  NAS                                10%   8 Gbps
  Network use                         2%   1.6 Gbps
  Interactive                         2%   1.6 Gbps
  Real-time (database, monitoring)    2%   1.6 Gbps
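The per-class rates are consistent with fixed percentage shares of a common aggregate of roughly 80 Gbps; that total is an inference from the numbers above, not a figure stated on the slide. A minimal sketch that reproduces the mapping:

```python
# Reproduce the per-class rates from the QoS percentages.  The 80 Gbps
# aggregate is an assumption inferred from the slide's figures
# (50% -> 40 Gbps, 34% -> 27.2 Gbps, 10% -> 8 Gbps, 2% -> 1.6 Gbps).
AGGREGATE_GBPS = 80.0

QOS_CLASSES = {            # class name -> percentage share
    "best-effort": 50,
    "critical": 34,
    "nas": 10,
    "network-use": 2,
    "interactive": 2,
    "real-time": 2,
}

for name, percent in QOS_CLASSES.items():
    print(f"{name:12s} {percent:3d}%  {AGGREGATE_GBPS * percent / 100:5.1f} Gbps")
```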

Redundancy Within the Tier-1
Today:
  - FCC2 Nexus is the core switch fabric; GCC-CRB Nexus is the redundant core
  - Switches connected at 4 (or 8) x 10GE
  - Virtual Port Channel (vPC)
  - Gateway Load Balancing Protocol (GLBP) for failover to the redundant core
Near future:
  - Cores interconnected at … Gb/s, functioning as a distributed fabric
  - Switches still configured with vPC and GLBP
  - Most connections to the nearest core switch

Off-Site Traffic: End-to-End Circuits
  - USCMS-T1 has a long history of using ESnet SDN and Internet2 DCN circuits
  - IP SLA monitors and IOS track objects automatically fail traffic over if a circuit goes down

perfSONAR Monitoring
  - Monitoring the status of circuits
  - Alerting on a change of link status
  - Utilization
  - PingER RTT measurements
  - perfSONAR-BUOY: active measurements with BWCTL and OWAMP
  - Two NPToolkit boxes
  - Two LHCOPN/MDM monitoring boxes
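On the active-measurement side, the sketch below shows the kind of one-way latency probe that perfSONAR-BUOY schedules, here simply shelling out to the OWAMP owping client. The target hostname is a placeholder, and the summary-line parsing is an assumption about owping's output format rather than a guaranteed interface.

```python
# Minimal one-way latency probe in the spirit of perfSONAR-BUOY, driving the
# OWAMP command-line client.  The hostname is a placeholder and the parsing
# of the summary line is an assumption about owping's output format.
import re
import subprocess

def owamp_delay_ms(target):
    """Run owping against `target`; return min/median/max one-way delay in ms."""
    out = subprocess.run(["owping", target], capture_output=True, text=True,
                         check=True).stdout
    # Assumed summary line:  one-way delay min/median/max = 0.41/0.52/1.3 ms, ...
    m = re.search(r"one-way delay min/median/max = ([\d.]+)/([\d.]+)/([\d.]+) ms", out)
    if m is None:
        raise RuntimeError("could not parse owping output")
    lo, med, hi = (float(x) for x in m.groups())
    return {"min_ms": lo, "median_ms": med, "max_ms": hi}

if __name__ == "__main__":
    print(owamp_delay_ms("ps-latency.example.org"))   # placeholder host
```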

Circuit SLA Monitoring (Nagios)
Each circuit has an IP SLA monitor running an icmp-echo probe; the status of the SLA monitor is tracked via SNMP.
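A sketch of how such a check might be wired into Nagios: a small plugin reads the SLA/track status over SNMP with net-snmp's snmpget and exits with the standard Nagios codes (0=OK, 2=CRITICAL, 3=UNKNOWN). The hostname, community string, OID, and "up" value are placeholders, not the values used in the production setup.

```python
#!/usr/bin/env python3
# Sketch of a Nagios-style check for a circuit's IP SLA / track-object status.
# Exit codes follow the Nagios plugin convention: 0=OK, 2=CRITICAL, 3=UNKNOWN.
# The hostname, community string, OID, and "up" value are placeholders.
import subprocess
import sys

ROUTER = "r-s-bdr.example.org"              # placeholder hostname
COMMUNITY = "public"                        # placeholder community string
TRACK_STATUS_OID = "1.3.6.1.4.1.9.9.42"     # placeholder: CISCO-RTTMON-MIB subtree,
                                            # substitute the exact status object
UP_VALUE = "1"                              # assumed value meaning "up"

def main():
    try:
        out = subprocess.run(
            ["snmpget", "-v2c", "-c", COMMUNITY, "-Ovq", ROUTER, TRACK_STATUS_OID],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout.strip()
    except (subprocess.SubprocessError, OSError) as exc:
        print(f"UNKNOWN - SNMP query failed: {exc}")
        return 3
    if out == UP_VALUE:
        print("OK - circuit SLA probe reports up")
        return 0
    print(f"CRITICAL - circuit SLA probe reports status {out}")
    return 2

if __name__ == "__main__":
    sys.exit(main())
```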

Weather Map Tools (I): SNMP-based
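An SNMP-based weather map boils down to polling interface byte counters and scaling the resulting rate against the link speed. Below is a minimal sketch using the standard IF-MIB high-capacity counters via net-snmp's snmpget; the router name, community string, and interface index are placeholders.

```python
# Minimal sketch of the polling behind an SNMP-based weather map: sample the
# IF-MIB 64-bit octet counter twice and convert the delta into utilization.
# Router name, community string, and interface index are placeholders.
import subprocess
import time

ROUTER = "r-cms-n7k-fcc2.example.org"        # placeholder hostname
COMMUNITY = "public"                         # placeholder community string
IF_INDEX = 1                                 # placeholder interface index

IF_HC_IN_OCTETS = "1.3.6.1.2.1.31.1.1.1.6"   # IF-MIB::ifHCInOctets
IF_HIGH_SPEED   = "1.3.6.1.2.1.31.1.1.1.15"  # IF-MIB::ifHighSpeed (Mb/s)

def snmp_int(oid):
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Ovq", ROUTER, f"{oid}.{IF_INDEX}"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out)

def inbound_utilization(interval_s=30.0):
    """Return inbound utilization (0..1) of the interface over `interval_s`."""
    first = snmp_int(IF_HC_IN_OCTETS)
    time.sleep(interval_s)
    second = snmp_int(IF_HC_IN_OCTETS)
    bits_per_second = (second - first) * 8 / interval_s
    link_bps = snmp_int(IF_HIGH_SPEED) * 1_000_000
    return bits_per_second / link_bps

if __name__ == "__main__":
    print(f"inbound utilization: {inbound_utilization():.1%}")
```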

Weather Map Tools (II): flow-data-based, with drill-down capabilities
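The flow-based map supports drill-down because the underlying flow records carry per-host detail. The sketch below shows the aggregation step, assuming the collector has already exported flow records to CSV; the file name and column names are illustrative assumptions, not a specific collector's schema.

```python
# Sketch of the aggregation behind a flow-based weather map with drill-down:
# sum bytes per (source, destination) address pair from exported flow records.
# The CSV file name and the "srcaddr"/"dstaddr"/"bytes" column names are
# illustrative assumptions about the collector's export format.
import csv
from collections import defaultdict

def top_talkers(flow_csv, n=10):
    """Return the n largest (src, dst) pairs by total bytes."""
    totals = defaultdict(int)
    with open(flow_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            totals[(row["srcaddr"], row["dstaddr"])] += int(row["bytes"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    for (src, dst), nbytes in top_talkers("flows.csv"):   # placeholder file
        print(f"{src} -> {dst}: {nbytes / 1e9:.2f} GB")
```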

Work in progress:
  - New monitoring approaches:
    - Any-to-any / cloud-to-cloud performance
    - Any production host can become an element of the monitoring infrastructure
    - Combination of passive and active measurements (a toy sketch of the active piece follows below)
  - 10GBase-T (IEEE 802.3an) for end systems:
    - Intel E10G41AT2 PCI-Express NIC
    - Arista Networks DCS-7120T-4S (as a ToR solution)
    - Directly to the Nexus 7K (via fiber at the moment)
    - Regular Cat6E physical infrastructure deployed, ~100 m
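A toy sketch of the any-to-any active piece: every production host runs the same lightweight RTT probe against the other hosts, and the results would be combined with passive data (flows, SNMP) elsewhere. The target hostnames are placeholders, and the probe simply wraps the Linux ping command.

```python
# Toy sketch of an any-to-any active measurement mesh: each production host
# runs this same probe against every other host, so any node can act as a
# monitoring element.  Target hostnames are placeholders; flags are Linux ping.
import re
import subprocess

TARGETS = ["cmsstor101.example.org", "cmswn2001.example.org"]  # placeholder hosts

def rtt_ms(host):
    """One ICMP probe; return RTT in milliseconds, or None if no reply."""
    proc = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          capture_output=True, text=True)
    m = re.search(r"time=([\d.]+) ms", proc.stdout)
    return float(m.group(1)) if m else None

if __name__ == "__main__":
    for host in TARGETS:
        print(f"{host}: {rtt_ms(host)} ms")
```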

Summary
  - Two buildings, four computing rooms (a fifth one is coming)
  - Two Nexus 7000 switches for 10G aggregation, interconnected at … Gbps
  - 2 x 10 Gbps to the site network (read/write data to tapes)
  - 10 Gbps to the border router (non-US Tier-2s, other LHC-related traffic)
  - 20 Gbps toward ESnet CHIMAN and USLHCNet, SDN/DCN/E2E circuits

Summary (continued)
  - ~200 dCache nodes with 2 x 1 GE
  - ~1600 worker nodes with 1 GE
  - ~150 various servers
  - 2 x 20G for BlueArc NAS storage
  - C6509 access switches connected by 40-80 GE
  - Redundancy/load sharing at L2 (vPC) and L3 (GLBP)
  - IOS-based Server Load Balancing for interactive clusters
  - 19 SDN/DCN end-to-end circuits
  - Virtual port channeling (vPC)
  - QoS, with 5 major classes of traffic

Additional slides

USCMS Tier1: Data Analysis Traffic (diagram): link loads between the R-CMS-FCC2 and R-CMS-FCC2-2 routers (and a third R-CMS-FCC router), the S-CMS-GCC-1/3/4/5/6 switches, and R-CMS-N7K-GCC-B; labelled loads include 18 of 20G, 20 of 20G, 29.8 of 30G, 25 of 30G, 20 of 30G, 18 of 30G, and 65 of 80G, plus 30G and 40G links.