Diamond Computing Status Update
Nick Rees et al.

Introduction
Two years ago (16 Oct 2008, at INFN) I gave a presentation about the Diamond computing system that was then about to be commissioned. This is what actually happened...

Outline of the computing requirements
The installation consisted of four systems:
–Resilient high-density computer room.
–Compute cluster with a minimum of 1500 SPECfp_rate2006 total.
–Resilient network with a dual 10 Gbit core.
–200 Tbytes of storage with 20 Mbytes/sec/Tbyte scalable aggregated throughput and 8000 metadata operations/sec.

Computer room
Overall a success.
–Has up to 350 kW of redundant power (A and B feeds) from two separate sub-stations; power from A is UPS and generator backed.
–Has up to 350 kW of cooling water. Primary cooling is from the site chilled water, with a 220 kW standby chiller in case of problems.
–Ran commissioning tests at a 160 kW power load satisfactorily.
–The standby system has proved its worth a number of times.

Layout (diagram: water pipes and cable tray routing in the computer room)

Testing with heat loads

Now

Temperature rise on flow failure (0.5 °C/sec)

Problems
One failure mode was not tested properly:
–If the flow stopped, a differential pressure sensor was meant to indicate a fault.
–In the event, it didn’t.

Result of flow failure
–The cold aisle temperature got to > 50 °C before we controlled it.
–The hot aisle got to ~60 °C.
–Luckily the power load is still only ~40 kW.
In future:
–Low-flow detection is improved.
–The PLC system will switch to standby cooling if T > 30 °C.
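
A minimal sketch of the intended failover behaviour as a polling loop. Only the 30 °C trip point comes from the slides; the sensor functions, low-flow threshold and poll interval are placeholders, and the real logic lives in the PLC:

```python
# Hypothetical sketch of the cooling failover logic described above.
# Only the 30 C trip point is from the slides; everything else is assumed.
import time

TRIP_TEMPERATURE_C = 30.0    # switch to standby cooling above this
LOW_FLOW_THRESHOLD = 0.2     # assumed normalised flow reading

def read_cold_aisle_temperature() -> float:
    raise NotImplementedError("read from the room temperature sensors")

def read_chilled_water_flow() -> float:
    raise NotImplementedError("read from the (improved) low-flow sensor")

def switch_to_standby_cooling() -> None:
    raise NotImplementedError("bring the 220 kW standby chiller online")

def monitor(poll_seconds: float = 5.0) -> None:
    """Fail over to standby cooling on high temperature or low flow."""
    while True:
        if (read_cold_aisle_temperature() > TRIP_TEMPERATURE_C
                or read_chilled_water_flow() < LOW_FLOW_THRESHOLD):
            switch_to_standby_cooling()
            return
        time.sleep(poll_seconds)
```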

Compute cluster
This was a success, and easy to specify.
–One supplier validated their system against the specification; they were the successful supplier.
A few teething problems:
–The network driver had problems acting as both the IPMI and the normal interface at the same time.
–Sometimes multiple restarts are needed to get the network interface configured correctly.

Computing – solution
1U twin-motherboard systems. Each motherboard has:
–Supermicro X7DWT board
–Intel 5400 (Seaburg) chipset
–Dual Intel Xeon E5420 (quad-core, 2.5 GHz) processors
–16 GB DDR2-667 ECC RAM, installed as 4x 4 GB FBDIMMs
–160 GB SATA hard disk
–1x16 PCI-Express slot (for GPU extension)
Total: 20 systems, 16 cores/system, 320 cores.
Have since added ~20 more systems:
–some with InfiniBand
–some with NVIDIA Tesla GPUs.

Network
Overall, another success.
–Two core switches, one in each computer room.
–Each beamline is connected to both switches, so either 2 Gbit or 20 Gbit peak bandwidth is available (halved if one core switch fails).
–The cluster switch is connected with 2 x 10 Gbit to each core switch.
–The storage system is connected directly into the core (possibly a mistake).
–Rack-top switches in both computer rooms are also connected to both cores.

Network Layout

Network issues
–At high data rates packets get lost.
–The standard TCP back-off time is 200 ms, which is a lot of data at 10 Gbit/s.
–We see the problem mostly when 10 Gbit servers write to 1 Gbit clients.
–Other sites see similar issues: use your favourite search engine to look up “Data Center Ethernet” or “Data Center Bridging”.
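
To put the 200 ms figure in context, a back-of-the-envelope estimate of how much line-rate data one back-off represents (ignores protocol overhead; purely illustrative):

```python
# Line-rate data that could have been sent during one 200 ms TCP back-off
# on a 10 Gbit/s link.
link_rate_bits_per_s = 10e9   # 10 Gbit/s server link
backoff_s = 0.2               # standard minimum retransmission timeout
stalled_bytes = link_rate_bits_per_s * backoff_s / 8
print(f"{stalled_bytes / 1e6:.0f} MB per back-off")  # prints "250 MB per back-off"
```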

Storage – requirements
–200 TBytes usable disk space.
–20 Mbytes/sec/Tbyte scalable aggregated throughput (4 GBytes/sec aggregated throughput for 200 Tbytes).
–100 MBytes/sec transfer rate for individual 1 Gbit clients.
–400 MBytes/sec for individual 10 Gbit clients.
–8000 metadata operations/sec.
–Highly resilient.
–Support for Linux clients (RHEL4U6, and RHEL5 or later).
–POSIX with extended attributes and ACLs.
–Group-based file access control (> 256 groups/user).
–Ethernet clients.
–Extendable by at least 200 TB for future growth.
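
A quick sketch of the scaling rule behind the throughput figures; the 20 Mbytes/sec/Tbyte factor is from the slide, and 1 Gbyte is taken as 1000 Mbytes:

```python
# Aggregate throughput required by the 20 Mbytes/sec/Tbyte scaling rule.
def required_throughput_gbytes_per_s(capacity_tbytes: float,
                                     mbytes_per_s_per_tbyte: float = 20.0) -> float:
    return capacity_tbytes * mbytes_per_s_per_tbyte / 1000.0

print(required_throughput_gbytes_per_s(200))  # 4.0 GB/s for the initial 200 TB
print(required_throughput_gbytes_per_s(400))  # 8.0 GB/s if extended by another 200 TB
```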

Storage – solution
–Based on the DataDirect Networks S2A9900 system.
–Up to 5.7 GB/s throughput.
–Fault-tolerant architecture.
–Runs the Lustre file system: an open-source file system acquired by Sun in 2008 and now part of Oracle, and the most popular file system among the top 500 supercomputers.

Storage – solution
–10 disk trays, 60 hot-swap disks/tray.
–2 redundant trays.

Storage – solution
–60 tiers of 10 disks, with 2 parity disks/tier.
–2 controllers; both controllers can access all data.
–FPGA based: no loss of performance when running in degraded mode.
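
As a quick check of the redundancy overhead this layout implies (tier counts are from the slide; no particular disk size is assumed):

```python
# Usable fraction implied by an 8 data + 2 parity disk tier layout.
tiers = 60
disks_per_tier, parity_per_tier = 10, 2
data_disks = tiers * (disks_per_tier - parity_per_tier)        # 480 data disks
usable_fraction = (disks_per_tier - parity_per_tier) / disks_per_tier
print(f"{usable_fraction:.0%} of raw capacity usable across {data_disks} data disks")
```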

Storage – problems
In initial tests:
–Sometimes got near 4 Gbytes/sec writing.
–Only ever got about 3 Gbytes/sec reading.
–Only got 8000 metadata operations/sec for the specific case of sequential creates of zero-length files.
–Other metadata operations were slower (~2000/sec).
In reality, with real users:
–We primarily seem to be limited by metadata rates.
–The 8000 operations/sec specification was based on 40 clients each creating 200 files/sec.
–In practice a detector takes 3-5 metadata operations to create a file, so the real demand is several times the specified rate.
–The total rates also conceal that the latency for some individual files can be up to 15 seconds under load.
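
A minimal sketch of the kind of file-create micro-benchmark that exposes these metadata limits; the mount point, file count and reporting are assumptions, not Diamond's actual test harness:

```python
# Hypothetical file-create micro-benchmark for metadata rate and latency.
import os
import time

def create_rate(directory: str, count: int = 1000) -> None:
    """Create `count` zero-length files; report the rate and worst latency."""
    latencies = []
    for i in range(count):
        start = time.perf_counter()
        with open(os.path.join(directory, f"bench_{i:06d}.dat"), "w"):
            pass                                  # zero-length file: create + close
        latencies.append(time.perf_counter() - start)
    print(f"{count / sum(latencies):.0f} creates/sec, "
          f"worst latency {max(latencies):.3f} s")

create_rate("/mnt/lustre/benchmark")  # placeholder mount point; directory must exist
```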

File writing latency

Storage – what next?
–It seems that our metadata rates are lower than other sites see with simpler hardware.
–We have bought some duplicate hardware and will be trying different configurations to optimise the system.
–We will be talking to expert consultants for advice.
–We hope that all this will enable us to do a simple upgrade in the near future.
–We may need to increase the number of file systems to get better metadata scaling.

Conclusions
–Don’t leave computing until last.
–Computer rooms need to focus on power and cooling, and not much else.
–Test the failure modes of the computer room.
–Don’t commission any detector without thinking about computing.
–The next generation of detectors will need the next generation of networks.
–Storage is not easy: prototype if you can to avoid surprises.
–Don’t forget about metadata rates when scaling.

Any Questions?