GridPP Tier1 Review Fabric


GridPP Tier1 Review Fabric 20 June 2012 Martin Bly – Fabric Team Leader

Rôle
- Fabric Team ‘runs the hardware’
- Look after and develop fabric of the Tier:
  - Networks and SANs
  - Servers (storage, CPU, virtualisation, core)
  - Configuration management
  - Tape robotics
  - Maintenance and spares
Fabric - Tier1 Review, June 2012 20/06/2012

Highlights
- CVMFS: solved the problem of serving software directories
- MagDB: integration of hardware information sources
- Solving disk issues: finding fixes for hardware problems, enabling continued use of older hardware
- Continued rollout of Quattor

Configuration Management
- Configuration management system is used to control all aspects of the state of a machine, from installation to production and eventually decommissioning
- Using Quattor with Puppet for two years
- Hybrid system:
  - Quattor for bare metal provisioning and, in most cases, full configuration control
  - Puppet for some service configuration files, notably for Castor
  - Moving towards integrating Puppet functions into Quattor
- Some services still raw kickstart:
  - Oracle database servers running RedHat
  - Migration to Quattor planned this year
- Overall works well

Current Tier1 Network
- Star configuration
- Single Force10 C300 ‘core’ switch with 32 x 10GbE ports
- Several Force10/Arista/Fujitsu switches providing access at 10Gb/s to storage servers and switch stacks
- Nortel/Avaya switches in stacks providing 1Gb/s access
  - Mostly dual 10Gb/s uplinks, some direct, some via Force10/Arista/Fujitsu switches
- Uplinks to UKLR (20Gb/s) and Router A (10Gb/s)
- Issues:
  - Core is a single point of failure, though in a resilient configuration
  - Spare switch components available
  - No path to higher bandwidths
  - Limited expansion capability
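
The single-point-of-failure issue can be illustrated with a small sketch. It assumes a much simplified, hypothetical star (the node names and link list below are invented for illustration and are not the real Tier1 inventory) and reports every device whose loss would split the remaining network. Parallel links, such as the dual uplinks, are not modelled; only whole-device failures are considered.

```python
from collections import deque


def is_connected(adj, removed):
    """Return True if all nodes except `removed` can still reach each other."""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def single_points_of_failure(links):
    """List the nodes whose removal disconnects the rest of the network."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return sorted(n for n in adj if not is_connected(adj, n))


# Hypothetical star: everything hangs off one core switch, and one
# 1Gb/s stack reaches the core via a 10Gb/s access switch.
star_links = [
    ("core", "access-10g"),
    ("core", "stack-1"),
    ("core", "stack-2"),
    ("core", "uplink-a"),
    ("core", "uplink-b"),
    ("access-10g", "stack-3"),
]

print(single_points_of_failure(star_links))
```

In this toy model the core and the intermediate access switch are both reported, which mirrors the concern on the slide: in a star, losing the centre isolates everything attached to it.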

Tier1 Network - Current (diagram slide)

Network Plan - Requirements
- Resilient configuration
  - Against unit and link/path failures
- Future-proof
  - Higher bandwidth on switch links and in ‘core’
  - Room for bandwidth expansion
  - Higher bandwidth to RAL Site network
- Affordable

Network Plan - Designs
- Core with ‘Star’ topology
  - Same design as now, simple
  - Duplicate ‘core’ switch to add resilience
  - Expensive to upgrade to higher bandwidth links
  - Higher load on individual elements
  - Element failures impair bandwidth
- Mesh topology
  - More complex
  - More but cheaper elements
  - Upgrade paths cheaper
  - Lower load on elements
  - Failures have less impact on overall bandwidth
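
The load and failure trade-off between the two designs comes down to simple arithmetic, sketched below. It assumes traffic is shared evenly across identical elements, and the element counts (2 for a duplicated core, 4 for a mesh layer) are hypothetical values chosen only to show the trend.

```python
def load_share(elements):
    """Fraction of total traffic carried by each of `elements` equal devices."""
    if elements <= 0:
        raise ValueError("need at least one element")
    return 1 / elements


def remaining_fraction(elements, failed=1):
    """Fraction of total bandwidth left after `failed` equal devices are lost."""
    if elements <= 0 or not 0 <= failed <= elements:
        raise ValueError("failed must be between 0 and elements")
    return (elements - failed) / elements


# Hypothetical element counts, for illustration only.
for name, count in [("duplicated core", 2), ("mesh layer", 4)]:
    print(name, load_share(count), remaining_fraction(count))
```

Under these assumptions, the duplicated core carries half the traffic per switch and drops to half capacity on a failure, while a four-element layer carries a quarter each and keeps three quarters.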

Network Plan - Mesh
- New design will use a mesh
- Routing layer between Tier1 and Site
- Aggregation layer: Force10 Z9000 - 32x40GbE switches
- 10Gb/s access layer: Force10 S4810 - 48x10GbE + 4x40GbE
- 1Gb/s access layer: Force10 S60, Avaya 56xx, Arista 7124
- Aggregation and 10Gb/s access layers linked at 4x40Gb/s
- Access layers linked at 2 or 4x10Gb/s
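
The port counts above allow a rough oversubscription estimate for a 10Gb/s access switch. The sketch assumes every 10GbE port on an S4810 faces servers and all four 40GbE ports are used as uplinks to the aggregation layer; the slides do not say how many ports are populated in practice, so the result is an upper bound on contention, not a measured figure.

```python
def capacity_gbps(ports, speed_gbps):
    """Aggregate nominal capacity of a group of identical ports, in Gb/s."""
    return ports * speed_gbps


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of downstream (server-facing) capacity to uplink capacity."""
    return capacity_gbps(down_ports, down_gbps) / capacity_gbps(up_ports, up_gbps)


# S4810 access switch: 48 x 10GbE down, 4 x 40GbE up (assumed fully used).
print(oversubscription(48, 10, 4, 40))

# Z9000 aggregation switch: nominal total across 32 x 40GbE ports.
print(capacity_gbps(32, 40))
```

With these assumptions a fully populated access switch offers 480Gb/s to servers over 160Gb/s of uplink, a 3:1 ratio.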

Future Network Topology (diagram slide)