CERN Lustre Evaluation and Storage Outlook

CERN Lustre Evaluation and Storage Outlook. Tim Bell, Arne Wiebalck. HEPiX, Lisbon, 20th April 2010.

Agenda: Lustre Evaluation Summary; Storage Outlook (life cycle management, large disk archive); Conclusions.

Lustre Evaluation Scope
HSM system: CERN Advanced STORage Manager (CASTOR), 23 PB, 120 million files, 1'352 servers.
Analysis space: analysis of the experiments' data, 1 PB accessed with XRootD.
Project space: >150 projects, experiments' code (build infrastructure), CVS/SVN, Indico, Twiki, ...
User home directories: 20'000 users on AFS, 50'000 volumes, 25 TB, 400 million files, 1.5 billion accesses/day, 50 servers.

Evaluation Criteria
Mandatory is support for: life cycle management, backup, strong authentication, fault tolerance, acceptable performance for small files and random I/O, and an HSM interface.
Desirable is support for: replication, privilege delegation, WAN access, and strong administrative control.
Performance was explicitly excluded; see the results of the HEPiX FSWG.

Compliance (1/3)
Life cycle management: Not OK. No support for live data migration, Lustre or kernel upgrades, monitoring, or version compatibility.
Backup: OK. LVM snapshots for the MDS plus TSM for the files worked without problems.
Strong authentication: Almost OK. Incomplete code in v2.0; full implementation expected Q4/2010 or Q1/2011.

Compliance (2/3)
Fault tolerance: OK. MDS and OSS failover (we used a fully redundant multipath iSCSI setup).
Small files: Almost OK. Problems when mixing small and big files (striping).
HSM interface: Not OK. Not supported yet, but under active development.

Compliance (3/3)
Replication: Not OK. Not supported (would help with data migration and availability).
Privilege delegation: Not OK. Not supported.
WAN access: Not OK. May become possible once Kerberos is fully implemented (cross-realm setups).
Strong administrative control: Not OK. Pools are not mandatory and striping settings cannot be enforced.

Additional Thoughts
Lustre comes with (too) strong client/server coupling, notably in the recovery case.
The roadmap presents moving targets: some of the requested features have been on it for years, and some are simply dropped.
Lustre aims at extreme HPC rather than at a general-purpose file system; most of our requested features are not needed in its primary customers' environment.

Lustre Evaluation Conclusion
Operational deficiencies do not allow for a Lustre-based storage consolidation at CERN.
Lustre is still interesting for the analysis use case (but operational issues should be kept in mind here as well).
Many interesting and desired features are (still) on the roadmap, so it is worthwhile to keep an eye on it.
For details, see the write-up at https://twiki.cern.ch/twiki/pub/DSSGroup/LustreEvaluation/CERN_Lustre_Evaluation.pdf

Agenda: Lustre Evaluation Summary; Storage Outlook (life cycle management, large disk archive); Conclusions.

Life Cycle Management
Archive sizes continue to grow: 28 PB of tape are currently used at CERN, with 20 PB/year expected.
Media must be refreshed every 2-3 years: warranty expiry on disk servers, tape drive repacking to new densities.
The time taken is related to the TOTAL space, not to the new data volume recorded: it depends on the interconnect between source and target and on the metadata handling overheads per file.
The refresh must be performed during online periods, so it conflicts with user data serving.

Repack Campaign
The last repack campaign took 12 months to copy 15 PB of data.
When the next generation of drives is available, there will be around 35 PB of data to repack.
To complete a repack in one year, the data refresh will require as many resources as LHC data recording.
This I/O capacity needs to be reserved in the disk and tape planning for sites with large archives.
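A back-of-envelope check of this claim, as a minimal sketch: the 35 PB figure is from the slide above, and the 25 MB/s effective per-drive rate is the average quoted on the later "Storage in a Rack" slide; both are assumptions about the actual campaign parameters.

```python
# Illustrative repack throughput estimate (figures taken from the slides;
# the effective per-drive rate is an assumption, not a measured campaign value).
PB = 1e15
archive_bytes = 35 * PB                      # data volume to repack
seconds_per_year = 365 * 24 * 3600

sustained = archive_bytes / seconds_per_year          # bytes/s needed to finish in a year
print(f"sustained throughput: {sustained / 1e9:.2f} GB/s")   # ~1.11 GB/s

drive_rate = 25e6                            # assumed effective tape drive rate, bytes/s
drives = sustained / drive_rate
print(f"drives reading: {drives:.0f} (plus a similar number writing)")   # ~44
```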

Disk-Based Archive?
Can we build a disk-based archive at a reasonable cost compared to a tape-based solution?

Storage in a Rack
Tape storage at CERN: one drive corresponds to 374 TB of storage at an average rate of 25 MB/s.
Disk server equivalent: 2 head nodes, 2 x 4-port SAS cards, 8 JBOD expansion units with 45 x 2 TB disks each.
Capacities: 720 TB raw per rack, 540 TB with RAID-6 across 8 disks, i.e. 270 TB per head node.
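The capacity figures follow directly from the component counts on the slide; a minimal arithmetic sketch, with all inputs taken from the slide:

```python
# Capacity arithmetic for one "storage in a rack" building block
# (all figures from the slide above).
jbods = 8
disks_per_jbod = 45
disk_tb = 2

raw_tb = jbods * disks_per_jbod * disk_tb    # 720 TB raw per rack
usable_tb = raw_tb * 6 / 8                   # RAID-6 over 8 disks: 6 data + 2 parity
per_head_node_tb = usable_tb / 2             # two head nodes share the rack

print(raw_tb, usable_tb, per_head_node_tb)   # 720 540.0 270.0
```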

High Availability
(Diagram: two head nodes, each attached to four disk arrays.)

Simulation: 20 PB/year, 2011-2015
Costs are normalised to the tape HSM as 100.
Storage in a rack can be comparable with tape on cost/GB.

Simulation for Power
The disk-based option implies an additional power consumption of about 100 kW.
This cost is included in the simulation.
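For scale, a minimal sketch of what 100 kW of extra load means over a year; the electricity price used below is a hypothetical placeholder, not a figure from the presentation.

```python
# Annual energy (and rough cost) of the extra ~100 kW quoted on the slide.
extra_power_kw = 100
hours_per_year = 365 * 24

energy_mwh = extra_power_kw * hours_per_year / 1000
print(f"additional energy: {energy_mwh:.0f} MWh/year")        # ~876 MWh/year

price_per_kwh = 0.10    # hypothetical placeholder price, currency-agnostic
cost = extra_power_kw * hours_per_year * price_per_kwh
print(f"rough cost at the assumed price: {cost:,.0f} per year")
```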

Areas to Investigate
Reliability: corruptions, scrubbing.
Availability: fail-over testing.
Power conservation: disk spin-down/up.
Lifecycle management: 40 days to drain at gigabit Ethernet speeds (see the sketch after this list).
Manageability: monitoring, repair, install.
Operations cost: how much effort is it to run?
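A minimal sketch of where a figure of that order comes from, assuming the drain refers to one 270 TB head node over a single 1 Gbit/s link; the slide's 40 days presumably includes protocol and operational overheads on top of this line-rate estimate.

```python
# Drain-time estimate for one head node at gigabit Ethernet speeds
# (270 TB per head node is taken from the "Storage in a Rack" slide;
# the single-link assumption is ours).
capacity_bytes = 270e12
link_bytes_per_s = 1e9 / 8        # 1 Gbit/s = 125 MB/s

days = capacity_bytes / link_bytes_per_s / 86400
print(f"{days:.0f} days at full line rate")   # ~25 days; ~40 days once overheads are included
```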

Conclusions
Lustre should continue to be watched, but is currently not being considered for the Tier-0, analysis, or AFS replacement.
Lifecycle management is a major concern for the future as the size of the archive grows.
Disk-based archiving may be an option: it needs an in-depth reliability study before production, watching the trends for disk/tape capacities and pricing, and adapting the software for multiple hierarchies.

Backup Slides

Use Cases: HSM, Analysis Space, Home Directories
HSM: generic interface (CASTOR/TSM), scalable, support for random access.
Analysis space: low-latency access (disk-based), O(1000) opens/sec, several GB/s aggregate bandwidth, ~10% backup, mountable plus XRootD access, ACLs & quota.
Home directories: strong authentication, backup, wide-area access, small files, availability, ACLs & quota.