
CWG12: Filesystems for TDS U. FUCHS / CERN

O2 Data Flow Schema (diagram): FLPs → Data Network → 1500 EPNs → ~30 PB, 10⁹ files, ~150 GB/s → Data Management facilities, Tier-0 storage

Transient Data Storage, “The Can”
- High-capacity, high-throughput file system: ~100 PB, ~200 GB/s, 10⁹ files
- High number of clients: ~2000 (see the sizing check below)
- There is also good news: no mixed access (only read or write), operations mostly linear
- Few candidates on the market, three retained: Lustre (v2.6.32), GPFS (v4.1.1), CEPH/RADOS (Hammer)
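A back-of-envelope check of what these targets imply per file and per client, a rough sketch assuming decimal units and an even spread of the load over the ~2000 clients:

```python
# Back-of-envelope sizing check for the Transient Data Storage ("The Can"),
# using only the round numbers quoted on the slide (decimal units: 1 PB = 1e15 B).
capacity_bytes = 100e15      # ~100 PB
throughput_bps = 200e9       # ~200 GB/s aggregate
n_files        = 1e9         # ~10^9 files
n_clients      = 2000        # ~2000 concurrent clients

avg_file_size  = capacity_bytes / n_files              # ~100 MB per file
per_client_bw  = throughput_bps / n_clients            # ~100 MB/s per client
fill_time_days = capacity_bytes / throughput_bps / 86400   # ~5.8 days to fill

print(f"average file size : {avg_file_size / 1e6:.0f} MB")
print(f"per-client rate   : {per_client_bw / 1e6:.0f} MB/s")
print(f"time to fill      : {fill_time_days:.1f} days at full rate")
```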

File System Caveats
- Lustre: clustered file system with data servers and one metadata server. Beware of MDS bottlenecks; the whole metadata should fit in memory to avoid disk I/O.
- GPFS: all I/O striped over all servers/LUNs; distributed metadata or a separate MDS server possible.
- RADOS: object storage with a “get”/“put”/“list” interface; underlying storage pools made for redundancy and zero data loss (see the sketch below).
- CEPH: POSIX file system interface on top of RADOS data stores.
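For illustration, a minimal sketch of the RADOS “put”/“get”/“list” object interface via the python-rados binding; the pool name tds-test and the ceph.conf path are placeholders:

```python
# Minimal sketch of the RADOS object interface ("put" / "get" / "list")
# using the python-rados binding. Pool name and conf path are placeholders.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("tds-test")        # assumes the pool already exists

ioctx.write_full("run0001.chunk42", b"payload bytes")   # "put"
data = ioctx.read("run0001.chunk42")                    # "get"
names = [obj.key for obj in ioctx.list_objects()]       # "list"

print(len(data), "bytes read;", len(names), "objects in pool")
ioctx.close()
cluster.shutdown()
```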

Transient Data Storage, Tests
- Tests are still ongoing: out-of-the-box tests finished, now working with vendors to tune the systems.
- Test profile: linear workload (read/write), big files, big block sizes (see the sketch below).
- It’s all about throughput; we are not pushing for IOPS performance (not needed).
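A minimal sketch of the kind of linear-write measurement this describes: one big file written with large application blocks, reporting only throughput. The mount point and sizes are placeholders, and a production test would use a dedicated benchmark tool with direct I/O; page-cache effects are only partly excluded by the final fsync here.

```python
# Sketch of a linear (sequential) write-throughput measurement:
# one big file, large application block size, no random I/O.
# Target path and sizes are placeholders, not the actual test parameters.
import os, time

path       = "/mnt/tds/throughput.test"   # hypothetical mount point
block_size = 8 * 1024 * 1024              # 8 MiB application blocks
n_blocks   = 1024                         # 8 GiB total
block      = os.urandom(block_size)

t0 = time.time()
with open(path, "wb", buffering=0) as f:
    for _ in range(n_blocks):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())                  # include flush-to-storage in the timing
elapsed = time.time() - t0

mb = block_size * n_blocks / 1e6
print(f"wrote {mb:.0f} MB in {elapsed:.1f} s -> {mb / elapsed:.0f} MB/s")
```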

Transient Data Storage, Test Setup
Test environment:
- 6 LUNs, 500 MB/s each, per storage chassis (scaling estimate below)
- MD3660 chassis with 30 × 4 TB disks
- CentOS 7
- InfiniBand FDR only tested for Lustre
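A rough scaling estimate from these building blocks to the full TDS targets, raw numbers only, ignoring RAID overhead, hot spares and protocol inefficiencies:

```python
# Back-of-envelope: what one chassis provides and how many would be needed
# for the full TDS targets (raw figures, no RAID or protocol overhead).
lun_bw        = 500e6            # 500 MB/s per LUN
luns_per_box  = 6
disks_per_box = 30
disk_size     = 4e12             # 4 TB

chassis_bw       = luns_per_box * lun_bw          # ~3 GB/s per chassis
chassis_capacity = disks_per_box * disk_size      # ~120 TB raw per chassis

target_bw       = 200e9          # ~200 GB/s
target_capacity = 100e15         # ~100 PB

print(f"chassis: {chassis_bw / 1e9:.1f} GB/s, {chassis_capacity / 1e12:.0f} TB raw")
print(f"chassis needed for bandwidth: {target_bw / chassis_bw:.0f}")
print(f"chassis needed for capacity : {target_capacity / chassis_capacity:.0f}")
```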

Test Results: performance vs. (application) block size (plot). 1 client on 10GE/IB, 1 stream.

Test Results: performance vs. number of streams (plot). 1 client on 10GE/IB, x streams.

Test Results: performance vs. number of clients (plot). x clients on 10GE/IB, 1 stream.

Observations
- GPFS, Lustre: hitting network limitations.
- CEPH, RADOS: made for safety, not speed; future versions will improve performance, so they say.
- Linear writes cause 30-40% read I/O at disk level; found to be the journaling “feature”! (See the measurement sketch below.)
- Data integrity: Lustre relies on the underlying LUNs (e.g. RAID); GPFS gives good protection through “declustered RAID”.
- Rebuild times: Lustre depends on the underlying LUNs; GPFS: minutes.
- Disk types: all-SSDs will change the picture.
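One way such disk-level read amplification can be observed on a Linux storage server is by sampling /proc/diskstats while the linear write test runs; a minimal sketch, with the device name as a placeholder:

```python
# Sketch: sample /proc/diskstats twice and report how much of the disk-level
# traffic is reads vs. writes while a purely linear write workload is running.
# Device name "sdb" is a placeholder; diskstats sectors are 512 bytes.
import time

def sectors(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[5]), int(fields[9])   # sectors read, written
    raise ValueError(f"device {dev} not found")

dev = "sdb"
r0, w0 = sectors(dev)
time.sleep(10)                       # measure while the write test is running
r1, w1 = sectors(dev)

read_mb  = (r1 - r0) * 512 / 1e6
write_mb = (w1 - w0) * 512 / 1e6
print(f"{dev}: {read_mb:.0f} MB read, {write_mb:.0f} MB written "
      f"({100 * read_mb / max(write_mb, 1e-9):.0f}% read vs. write)")
```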

Conclusions
- There is a big storage challenge ahead of us.
- If you want to play the BIG game, there are only a few solutions; choose wisely.
- But it is possible, already today!

Thank you.