Tier1 Hardware Review Martin Bly HEPSysMan - RAL, June 2013

Capacity Storage
Storage in a box:
- Variations on 4U chassis with 24 or 36 bays
- Single controller: Areca 1280, 3ware 9650SE, Adaptec 5405, 52445, LSI i
- 24 or 34 drives: 2 system RAID1, 22 or 32 data RAID6, no hot spare (*)
- Data drive sizes: 1TB, 2TB, 3TB (WD or Hitachi), mostly 7.2k SATA, one batch SAS
- Useable sizes: 20, 38, 40, 90TB (see the capacity sketch below)
- 2 CPUs, or more recently only 1 CPU
- RAM: 8, 12, 24 or 64GB (swap typically 8GB)
- NIC: 1GbE to 2010, 10GbE from 2011
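
The useable sizes quoted above follow from the drive layouts. A minimal arithmetic sketch in Python, assuming a single RAID6 set costs the equivalent of two drives for parity and using decimal TB with no allowance for filesystem overhead:

    def raid6_useable_tb(data_drives: int, drive_tb: float) -> float:
        # Useable capacity of one RAID6 set: two drives' worth goes to parity.
        return (data_drives - 2) * drive_tb

    # 24-bay chassis: 2 system drives (RAID1) + 22 data drives in RAID6
    print(raid6_useable_tb(22, 1))   # ~20 TB with 1TB drives
    print(raid6_useable_tb(22, 2))   # ~40 TB with 2TB drives

    # 36-bay chassis (34 drives fitted): 2 system drives + 32 data drives in RAID6
    print(raid6_useable_tb(32, 3))   # ~90 TB with 3TB drives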

Capacity Summary
- 2012: 4.0PB in 46 servers, Viglen and OCF 4U 36-bay chassis, LSI MegaRAID SAS i, i, 3TB HDDs
- 2011: 2.6PB in 68 servers, Viglen and Clustervision 4U 24-bay chassis, Adaptec 5404 and LSI MegaRAID SAS i, 2TB HDDs
- 2010: 1.3PB in 36 servers, Streamline and Viglen 4U 24-bay chassis, Adaptec 5404 and 3ware 9650, 2TB HDDs
- 2009: 3.64PB in 98 servers, Streamline and Viglen 4U 24-bay chassis, Adaptec 5405 and Adaptec 52445, 2TB HDDs
- 2008: 2.2PB in 110 servers, Streamline and Viglen 4U 24-bay chassis, Areca 1280 and dual 3ware 9650, 1TB HDDs
- 2007: 1.64PB in 182 servers, Viglen 3U 16-bay chassis, dual 3ware 9650, 750GB HDDs
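
For scale, a quick tally of the purchases listed above (figures copied directly from the slide):

    # (capacity in PB, server count) per purchase year
    deployments = {
        2012: (4.00, 46),
        2011: (2.60, 68),
        2010: (1.30, 36),
        2009: (3.64, 98),
        2008: (2.20, 110),
        2007: (1.64, 182),
    }

    total_pb = sum(pb for pb, _ in deployments.values())
    total_servers = sum(n for _, n in deployments.values())
    print(f"{total_pb:.2f} PB across {total_servers} servers")  # 15.38 PB across 540 servers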

Other Storage
Dell R510, PERC H7xx, X520 10GbE NIC:
- 12 x 1TB SATA
- 12 x 600GB 10k SAS
iSCSI:
- 3 x EqualLogic PS6510 arrays
- 2 x Infortrend FC
- 3 x MD3620f
- 4 x EMC Clariion
- Several Infortrend + other OEM

Capacity Compute
Multi-system chassis:
- SM Twin and Twin²
- Dell C6100
- HP ProLiant z6000
CPUs: E56xx, E5-26xx
RAM: 3 or 4GB/core (*from 2012: per thread)
NIC: 1GbE (2GbE pair in 2012)
Disk: at least two drives, ~100GB per core or thread, SW RAID0 (sizing sketch below)
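
A rough sizing sketch for one node under the rules above. The dual 8-core, hyper-threaded (32-thread) configuration is an assumption for illustration, not a quoted spec:

    cores = 2 * 8                          # assumed dual 8-core E5-26xx node
    threads = cores * 2                    # with hyper-threading enabled

    ram_gb = (3 * threads, 4 * threads)    # 3-4 GB per thread -> 96-128 GB
    scratch_gb = 100 * threads             # ~100 GB per thread -> 3200 GB

    print(ram_gb, scratch_gb)
    # The scratch space is striped across (at least) two drives with software RAID0.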

Non-capacity
Virtualise it!
- R410, R510, R710, R620, plus odds and sods
- R710: 96GB RAM, 2 x 6-core CPUs
- R620: 96GB RAM, 2 x 8-core CPUs
- R[5-7][12]0 with 2 or 4 x 10GbE NICs
- Shared storage on EqualLogic arrays

Switches
Extreme routers:
- 2 x x670V: 48 x 10Gb/s SFP+ + 4 x 40Gb/s QSFP
Force10 (port tally below):
- 2 x Z9000: 32 x 40Gb/s QSFP
- 13 x S4810P: 48 x 10Gb/s SFP+ + 4 x 40Gb/s QSFP
- 9 x S60: 44 x 1Gb/s + 2 x 10Gb/s SFP+
Arista: 2 x x 10Gb/s SFP+
Fujitsu: 1 x 2600G: 26 x 10Gb/s
Nortel/Avaya:
- x 56xx, xx=50, 98
- x 55xx, xx=10, 30
Netgear:
- 50+ x FS750T2, 2 x FS726T
- 4 x GS724T, 1 x GS748T
3Com: 25+ old 10/100Mb/s
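
As a rough tally of the Force10 layer above (only the entries with complete figures on the slide):

    # (units, 10Gb/s SFP+ ports each, 40Gb/s QSFP ports each)
    force10 = [
        (2, 0, 32),    # Z9000
        (13, 48, 4),   # S4810P
        (9, 2, 0),     # S60 (each also has 44 x 1Gb/s, not counted here)
    ]

    ports_10g = sum(n * p10 for n, p10, _ in force10)
    ports_40g = sum(n * p40 for n, _, p40 in force10)
    print(ports_10g, ports_40g)   # 642 x 10Gb/s SFP+ and 116 x 40Gb/s QSFP in total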

Thoughts…
No plan to change the type of capacity storage this year:
- 4TB drives will make the RAID6 lump size ~120TB useable in a 36-bay chassis with a single controller (see the sketch below)
- 1 CPU seems to be enough; more RAM good
- 10GbE NIC OK; may look to double up links for resilience rather than capacity
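
The ~120TB figure can be checked assuming the same layout as the current 36-bay boxes (2 system drives, one 32-drive RAID6 data set) with 4TB decimal-capacity drives:

    data_drives = 32                            # as in the current 36-bay layout
    useable_tb = (data_drives - 2) * 4          # 30 x 4TB = 120 TB (decimal)
    useable_tib = useable_tb * 1e12 / 2**40     # ~109 TiB as the OS would report it
    print(useable_tb, round(useable_tib))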

Thoughts (2)
- CPU requirements continue to be defined by HS06; unlikely to change any time soon
- Quad-system chassis fine with 2-way systems; have not tested 4-way systems
- Maybe go for 10GbE on-board
- Disk I/O bandwidth a known issue as job counts per server rise: more spindles or faster disks (rough numbers below)
- Power supply spreading: what happens if a rail drops? 3 or 4 x PDUs per rack for proper resilience
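
A back-of-the-envelope view of the per-job I/O concern; the spindle throughput and job count here are illustrative assumptions, not measurements:

    spindles = 2                    # two data drives per worker node (assumed)
    mb_s_per_spindle = 100          # optimistic sequential rate for a 7.2k SATA drive
    job_slots = 32                  # one job per thread on a 32-thread node (assumed)

    per_job_mb_s = spindles * mb_s_per_spindle / job_slots
    print(per_job_mb_s)             # ~6 MB/s per job if all jobs stream at once;
                                    # more spindles or faster disks raise this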

Switches
- Variety is the spice of life, until you have to make them work together
- Mesh with aggregation layer and access layer, all Force10
- About to test an Arista 7050 as a possible access-layer switch for our mesh
- Big layer 2 domains are a problem (ARP, spanning tree…)