Experience of Lustre at a Tier-2 site Alex Martin + Christopher J. Walker Queen Mary, University of London
Why Lustre?
- POSIX compliant
- High performance
- Scalable: performance should scale with the number of OSTs; tested with 25,000 clients and 450 OSSs (1,000 OSTs); max file size 2^64 bytes
- Able to stripe files if needed
- Used on a large fraction of top supercomputers
- Free (GPL): source available (paid support available)
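The per-file bandwidth benefit of striping comes from spreading a file's blocks round-robin across several OSTs. A toy illustration of that placement logic (not Lustre code; OST names and stripe count are hypothetical):

```python
# Toy round-robin stripe placement: block i of a file striped over a
# list of OSTs lands on OST number (i mod stripe_count).
def ost_for_block(block_index, ost_list):
    return ost_list[block_index % len(ost_list)]

osts = ["OST0", "OST1", "OST2", "OST3"]   # hypothetical stripe count of 4
placement = [ost_for_block(i, osts) for i in range(6)]
# placement -> ["OST0", "OST1", "OST2", "OST3", "OST0", "OST1"]
```

With a stripe count of 4, sequential reads of one large file can draw on four OSTs at once, which is why per-file throughput scales with stripe count.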
QMUL 2008 Lustre Setup
- 12 OSS (290 TiB)
- MDS failover pair
- Rack switches with 10GigE uplinks
- Worker nodes: E5420 (2x GigE), Opteron (GigE), Xeon (GigE)
Throughput scaling (plot: transfer rate vs number of machines)
- 2 threads, 1 MB block size
- 3.5 GB/s max transfer
- Probably limited by the network to the racks used
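A back-of-envelope check is consistent with the network being the bottleneck (illustrative arithmetic only; the number of racks driven in the test is an assumption, not stated on the slide):

```python
# Each rack uplink is 10 GigE = 1.25 GB/s; a handful of rack uplinks
# saturates near the observed 3.5 GB/s aggregate.
uplink_gbs = 10e9 / 8 / 1e9   # one 10 GigE uplink in GB/s
racks = 3                     # hypothetical number of racks in the test
uplink_limit = racks * uplink_gbs
# uplink_limit -> 3.75 GB/s, close to the measured 3.5 GB/s ceiling
```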
StoRM Architecture
(Diagram: traditional SE vs StoRM)
HammerCloud 718 (WMS)
- 369 655 451 events (24 h)
- 155/4490 job failures (3.4%)
- Scales well to ~600 jobs
2011 Upgrade Design Criteria
- Maximise storage provided: need ~1 PB
- Sufficient performance: we also upgraded from ~1500 to ~3000 cores
  - Goal: run ~3000 ATLAS analysis jobs with high efficiency
  - Storage bandwidth matches compute bandwidth
- Cost!!!
Upgrade Design: Fat vs Thin Servers
- Considered both "Fat" servers (36 x 2 TB drives) and "Thin" servers (12 x 2 TB drives)
- Similar total cost (including networking)
- Chose the "Thin" solution:
  - more bandwidth
  - more flexibility
  - one OST per node (although currently there is a 16 TB ext4 limit)
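The bandwidth argument for "Thin" servers can be sketched numerically (a minimal sketch; the NIC count for the "Fat" configuration is an assumption, while the "Thin" R510s have 4 x 1 GbE per the hardware slide):

```python
# Network bandwidth per TB of storage for the two candidate server types.
def gbs_per_tb(nics_1gbe, drives, drive_tb):
    net_gbs = nics_1gbe * 1e9 / 8 / 1e9   # NICs converted to GB/s
    return net_gbs / (drives * drive_tb)

fat  = gbs_per_tb(nics_1gbe=4, drives=36, drive_tb=2)  # NIC count assumed
thin = gbs_per_tb(nics_1gbe=4, drives=12, drive_tb=2)
# Same NICs serving a third of the disk: 3x the bandwidth per TB.
```

With equal total cost and equal total capacity, the thin layout delivers roughly three times the network bandwidth per terabyte, which is the "more bandwidth" point above.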
New Hardware
- 60 x Dell R510: 12 x 2 TB SATA disks, H700 RAID controller, 12 GB RAM, 4 x 1 GbE (4 servers with 10 GbE)
- Total ~1.1 PB formatted (integrate with legacy kit to give ~1.4 PB)
Lustre "Brick" (half rack)
- HP 2900 switch (legacy): 48 ports (24 storage, 24 compute), 10 Gig uplink (could go to 2)
- 6 storage nodes: Dell R510, 4 x GigE, 12 x 2 TB disk (~19 TB RAID6)
- 12 compute nodes: 3 x Dell C6100 (each contains 4 motherboards), 2 x GigE
- Total of 144 (288) cores and ~110 TB per brick (storage is better coupled to local CNs)
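Within a brick the storage and compute bandwidths are deliberately balanced, which simple arithmetic on the figures above confirms:

```python
# Per-brick bandwidth balance: 6 storage nodes x 4 GbE vs
# 12 compute nodes x 2 GbE (figures from the brick slide).
gbe = 1e9 / 8 / 1e9                  # 1 GbE in GB/s = 0.125
storage_bw = 6 * 4 * gbe             # 3.0 GB/s into the storage nodes
compute_bw = 12 * 2 * gbe            # 3.0 GB/s into the compute nodes
# storage_bw == compute_bw: storage matches compute within the brick
```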
Old QMUL Network
New QMUL Network
- 48 x 1 Gig ports per switch: 24 storage, 24 CPU
The Real Thing
- HEPSPEC06 benchmarks
- RAL disk-thrashing scripts
- 2 machines low (power-saving mode)
- 1 backplane failure, 2 disk failures
- 10 Gig cards in x8 slots
RAID6 Storage Performance
- R510 disk throughput ~600 MB/s
- Performance well matched to the 4 x 1 Gb/s network
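The "well matched" claim is easy to check: four bonded GigE links carry slightly less than the measured array rate, so each R510 can keep its network full (illustrative arithmetic):

```python
# 4 x 1 GbE in MB/s vs the measured ~600 MB/s RAID6 sequential rate.
nic_mbs = 4 * 1e9 / 8 / 1e6   # 500 MB/s of network per R510
disk_mbs = 600                # measured RAID6 array throughput
# disk_mbs > nic_mbs: the network, not the array, is the bottleneck
```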
Lustre Brick (+ Rack) Performance
- Preliminary tests using iozone
- 1-24 clients, 8 threads/node
- Network limit: 6 GB/s
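The quoted 6 GB/s network limit matches a full rack of storage nodes (a sketch; the two-bricks-per-rack reading follows from the brick being defined as half a rack):

```python
# A rack holds two "bricks", i.e. 12 storage nodes with 4 x 1 GbE each.
gbe = 0.125                            # 1 GbE in GB/s
bricks_per_rack = 2                    # a brick is half a rack
storage_nodes = 6 * bricks_per_rack
rack_limit = storage_nodes * 4 * gbe   # 6.0 GB/s, the quoted network limit
```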
Ongoing and Future Work
- Need to tune performance
- Integrate legacy storage into the new Lustre filestore
- Starting to investigate other filesystems, particularly Hadoop
Conclusions
- Have successfully deployed a ~1 PB Lustre filesystem with the required performance using low-cost hardware
- Would scale further with more "Bricks", but would be better if Grid jobs could be localised to a specific "Brick"
- Would be better if the storage/CPU could be more closely integrated
Conclusions 2
- The storage nodes contain 18% of the CPU cores in the cluster, and we spend a lot of effort networking them to the compute nodes
- It would be better (and cheaper) if these cores could be used directly for processing the data
- This could be achieved using Lustre pools (or another filesystem such as Hadoop)
WMS Throughput (HC 582)
- Scales well to ~600 jobs
Overview
- Design
- Network
- Hardware
- Performance
- StoRM
- Conclusions