Experience with the Thumper
Wei Yang, Stanford Linear Accelerator Center
US ATLAS Tier 2/3 Workshop, University of Michigan, Ann Arbor, May 27-28, 2008

- Thumper hardware configuration
- ZFS
- Sequential IO performance
- Simulated "Random" IO
- Performance issue with ZFS and Xrootd

Thumper: SUN X4500, 4U box
- 2 x dual-core AMD Opteron 2.8 GHz, 16 GB memory, 4 x 1 GigE
- Service processor for remote control
- Solaris 5.10 x86_64 and ZFS
- 48 x 1 TB (cheap) SATA drives, no RAID controller
  - 2 mirrored drives for the OS (Solaris DiskSuite)
  - 2 hot-spare drives for the zpool
  - 44 drives for data, ~32 TB usable
  - 4 x RAID-Z2 vdevs in one ZFS storage pool; each RAID-Z2 has 11 drives, including the equivalent of two drives for (dual) parity
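For illustration, a pool with this layout could be created roughly as in the sketch below; the pool name datapool and the disk device names (c1t0d0, ...) are hypothetical placeholders, not the actual X4500 device numbering.

# Sketch only: one pool made of 4 x 11-drive RAID-Z2 vdevs plus 2 hot spares
# (44 data drives + 2 spares; the remaining 2 drives mirror the OS separately).
# Pool and device names are hypothetical.
zpool create datapool \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 \
  raidz2 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 \
  raidz2 c3t6d0 c3t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t0d0 \
  raidz2 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 c6t1d0 c6t2d0 c6t3d0 \
  spare  c6t4d0 c6t5d0
zpool status datapool   # should show 4 raidz2 vdevs and 2 spares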

ZFS
End-to-end data integrity in ZFS
- The ZFS pool is a Merkle tree, with each data block's checksum stored in its parent
- Detects silent data corruption
Copy-on-write
- Allocates a new block for each write, then updates the pointers
- Never overwrites live data
RAID-Z and RAID-Z2
- Similar to RAID 50/60 but with variable stripe width
- Every block is its own RAID-Z stripe, regardless of block size
- Eliminates the read-modify-write of RAID 5/6
- Eliminates the "write hole"
- With COW, provides fast reconstruction when the pool is not near full capacity
- No special hardware needed: ZFS loves cheap disks
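The end-to-end checksums are verified on every read and can also be checked pool-wide with a scrub; a minimal sketch, assuming the pool is named datapool:

# Walk every allocated block and verify its checksum against its parent;
# blocks that fail are repaired from RAID-Z2 parity when possible.
zpool scrub datapool
zpool status -v datapool   # shows scrub progress and any checksum errors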

Single Sequential IO
- Write: dd bs=32k if=/dev/zero of=/…
- Read:  dd bs=32k if=/… of=/dev/null
[Table: Size (GB) vs. Write (MB/s) and Read (MB/s); numbers not preserved in this transcript]
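A timed version of the same dd test might look like the sketch below; the path /datapool/ddtest and the 32 GB size are illustrative, not the values used for the measurements above.

# Hypothetical sequential IO test against a ZFS filesystem in the pool:
# write 32 GB of zeros in 32 KB blocks, then read the file back.
time dd bs=32k count=1048576 if=/dev/zero of=/datapool/ddtest
time dd bs=32k if=/datapool/ddtest of=/dev/null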

Random IO
- Real usage is the best place to observe the Thumpers' "random" IO
- None of the BaBar, GLAST and ATLAS Thumpers is busy at this time
Simulated "Random" IO
- Use 1384 files from …/dq2/panda, renamed to 0, 1, …, 1383
- Batch jobs read them back in random order, to mimic Panda production
- Each job copies whole files using the Xrootd native copying tool (xrdcp), as in the script on the next slide

#!/bin/bash
# Repeatedly pick a random file index in 0..1383 and copy that file to /dev/null with xrdcp
while true; do
    i=$(( RANDOM * 1384 / 32768 ))
    xrdcp -s root://wain003.slac.stanford.edu//path/$i /dev/null
done

[Figure: file size distribution of the test files (MB)]

Simulated Random IO: ~140 MB/s for reading

Solaris and ZFS issues
- Test: 900 xrootd clients reading and 450 xrootd clients writing many 28 MB files, all starting at the same time
- Solaris 10 x86_64 update 3: does not accept all connections; clients have to keep retrying
- Solaris 10 x86_64 update 5: accepts all connections, but Xrootd does not respond to many of them
- Limiting the kernel memory usage of ZFS (the ARC cache) to 11 GB solves the problem (see the sketch below)
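On Solaris 10 the ZFS cache (ARC) ceiling is normally set in /etc/system; a sketch of the 11 GB cap, with the byte value shown only as an example:

# /etc/system -- cap the ZFS ARC so xrootd and the rest of the kernel keep enough memory
# 0x2C0000000 bytes = 11 GiB; a reboot is required for the setting to take effect
set zfs:zfs_arc_max = 0x2C0000000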

Xrootd and ZFS complete all 900 reads and 450 writes after the memory usage of ZFS is limited to 11 GB

Summary
- Thumper and ZFS fit the needs of scientific data storage well, providing relatively inexpensive, high-density, entry-level storage
- ZFS is a combined filesystem and volume manager with many good features
  - Runs on Solaris 10; a Linux port is in progress, but it is based on FUSE
  - Sometimes uses too much memory for caching
- Simulated random reading is around 140 MB/s