Technology Overview: Mass Storage
D. Petravick, Fermilab, March 12, 2002

Mass Storage Is a Problem In:
–Computer Architecture.
–Distributed Computer Systems.
–Software and System Interface.
–Material Flow(!): Ingest, Re-copy and Compaction.
–Usability.

Implementation Technologies
Standard Interfaces
–which work across mass storage system instances.
Storage System Software
–A variety of storage system software.
Hardware:
–Tape and/or disk at large scale.

Interface Requirements
Protocols for the LHC era:
–Well known.
–Interoperable.
–At least either simple or stable.
Areas:
–File Access.
–Management.
–Discovery and “Monitoring”.
–(perhaps) Volume Exchange.

Storage Systems

Provide a network interface suitable for a very large virtual organization’s integrated data systems.
Provide cost-effective implementations.
Provide permanence as required.
Provide availability as required.

Network Access Protocols for Storage Systems
IP based, with “hacks” to cope with IP performance issues (e.g. //FTP).
Staging:
–Domain specific, with Grid protocols under early implementation.
File system extensions:
–RFIO in the European DataGrid.
Management – “SRM”:
–Prestage, pin, space reservations, etc.

File System Extensions to Storage Systems
Goal: reduce explicit staging.
The environment is almost surely distributed.
–Storage systems are often implemented on their own computers.
Libraries have to deal with:
–Performance (read ahead, write behind).
–Security.
Implementation techniques include:
–Libraries (impact: relink software).
–Overloading the POSIX calls in the system library.
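Where the POSIX calls are overloaded, the usual mechanism on Linux is library interposition, e.g. via LD_PRELOAD. The sketch below is a minimal illustration of that technique, not the code of any particular storage system; the "/pnfs/" prefix test merely stands in for whatever rule a real client library uses to recognize managed paths.

```c
/* Minimal LD_PRELOAD interposition sketch (compile with:
 * gcc -shared -fPIC interpose.c -o interpose.so -ldl). Illustrative only. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");   /* the C library's open() */

    mode_t mode = 0;
    if (flags & O_CREAT) {          /* a mode argument is present only with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    /* A storage system's client library would recognize paths that belong
     * to the managed store here and open them through its own protocol. */
    if (strncmp(path, "/pnfs/", 6) == 0) {
        /* ... contact the storage system and return its descriptor ... */
    }
    return real_open(path, flags, mode);
}
```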

File System Extensions to Storage Systems
Typically, only a subset of POSIX file access is consistent with access to a managed store.
Root cause: conflict with permanence.
–Modification (rm –rf /).
–Deletion.
–Naming.

Hardware

Tape Facility Implementation
Expensive and specialized to set up.
“Easy” to commission large quantities of media with great likelihood of success.
A good deal of permanence is achieved with one copy of a file.
(Formerly) clearly low system cost over the lifetime of an experiment.
Currently, all data written are typically read.
The trend is toward diverse storage system software, with custom lab software at many major labs.

Magnetic Disk Facilities
Current use: buffer and cache data residing on tape.
Requirements: affordable, with good usability.
Implementations:
–Exterior to the mass storage system.
»Local or network file system.
»Files are staged to and from tape.
–Internal to the storage system.
»Provides buffering and caching for the tape system.
»Transparent interfaces:
DMAPI and kernel-level interfaces are rare (unknown).
File system extensions to storage systems (rfio, dccp).

A Good Problem
Disk capacities have enjoyed better-than-Moore’s-law growth.
–Doubling each year.
–Subject to the superparamagnetic limit and the market.
Tape capacity doubles every two years.
–Right now: ~60 GB tapes, ~200 GB disks.
What sort of systems do these trends enable?
–An immense amount of disk comes for “free”.
»Storage systems co-resident with compute systems?
»Relax the constraint that staging areas are scarce.
–Explore disk-based permanent stores.
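A rough projection of those two doubling rates, starting from the ~200 GB disks and ~60 GB tapes quoted above (a sketch of the arithmetic only; actual capacity growth did not stay on these exact curves):

```c
/* Sketch: project the trend above -- disk capacity doubling every year,
 * tape every two years, from 2002 starting points. Illustrative only.
 * Compile with -lm for sqrt(). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double disk_gb = 200.0, tape_gb = 60.0;
    for (int year = 2002; year <= 2007; year++) {
        printf("%d: disk ~%5.0f GB, tape ~%5.0f GB\n", year, disk_gb, tape_gb);
        disk_gb *= 2.0;        /* doubles every year      */
        tape_gb *= sqrt(2.0);  /* doubles every two years */
    }
    return 0;
}
```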

Any Large Disk Facility - Usability
MTBF failures (failure of perfect items):
»Mechanical.
(perhaps) predictable by S.M.A.R.T.
»Electrical (failures due to thermal and current densities).
Mitigated by a good thermal environment.
(perhaps) mitigated by spin down.
Outside of MTBF failures:
–Freaks and infant mortals.
–Defective batches.
–Firmware.
BER failures, especially of file system (meta)data.
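For a sense of scale, here is the failure count implied by an MTBF figure at facility size (a sketch; the 500,000-hour MTBF and 2,000-drive farm are assumed round numbers, not figures from the talk):

```c
/* Sketch: expected annual disk failures for a large facility, assuming a
 * simple exponential failure model. MTBF and drive count are assumptions. */
#include <stdio.h>

int main(void)
{
    double mtbf_hours = 500000.0;  /* vendor MTBF, assumed           */
    double drives     = 2000.0;    /* size of the disk farm, assumed */
    double hours_year = 24.0 * 365.0;

    double afr      = hours_year / mtbf_hours;  /* annual failure rate per drive */
    double expected = drives * afr;             /* expected failures per year    */

    printf("AFR per drive: %.2f%%\n", 100.0 * afr);
    printf("Expected failures/year across %.0f drives: ~%.0f\n", drives, expected);
    return 0;
}
```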

Disk As Permanent Storage
Mental schema:
–Many replicas ensure the preservation of information.
»Allows the notion that no single copy of a file need be permanent.
–Permanent stores:
»Backup-to-tape model (i.e. read ~never).
»Only-on-disk model.
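A back-of-the-envelope illustration of why replication lets individual copies be impermanent (a sketch; the 2% annual per-copy loss probability is an assumed number, and independence of the losses is itself an assumption):

```c
/* Sketch: probability of losing a file within a year when it is held as N
 * independent replicas, each with an assumed 2% annual loss probability.
 * Assumes no correlated failures and no re-replication after a loss.
 * Compile with -lm for pow(). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p_loss_per_copy = 0.02;   /* assumed annual loss probability */
    for (int replicas = 1; replicas <= 4; replicas++) {
        double p_all_lost = pow(p_loss_per_copy, replicas);
        printf("%d replica(s): P(file lost) ~ %.2e per year\n",
               replicas, p_all_lost);
    }
    return 0;
}
```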

Disk: Backup-to-Tape Model
Is conventional.
Each byte on disk is supported by a byte on tape.
(Perhaps) the backup tape need not be kept in an ATL.
»For some technologies, slot costs ~= tape costs.
Tape plant < ½ the size of read-from-tape systems.

Disk: No Backing on Tape
Requires R&D.
–Who else is interested in this?
Conflicts with industry’s model that important data is backed up anyway.
–This is built into the quality assumptions for RAID controllers, file systems, busses, etc.
Market fundamentals (guess):
–Margin in highly integrated disk > margin in tape > margin in commodity disk.
Material flow problem – would you rather commission 200 tapes/week or 100 disks/week?
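Using the media capacities quoted earlier (~60 GB tapes, ~200 GB disks), the two weekly flows compare roughly as:

200 tapes/week × 60 GB ≈ 12 TB/week, with 200 physical items to handle;
100 disks/week × 200 GB ≈ 20 TB/week, with 100 physical items to handle.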

No Backing on Tape – Technical
At large scale, users see the MTBF of disk.
Need to consistently commission large lots of disk.
Useful features (spin down, S.M.A.R.T.) can be abstracted away by controllers.
Would like a better primitive than a RAID controller.

Imagine Disks Arranged in a Cube (say 16x16x16)

Add Three Parity Planes

Very High Level of Redundancy: 3/19 Overhead For 16x16x16 Data Array.
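The 3/19 figure follows from adding three 16×16 parity planes (e.g. one per axis) to the 16×16×16 data array:

parity disks = 3 × 16² = 768; data disks = 16³ = 4096;
overhead = 768 / (4096 + 768) = 768 / 4864 = 3/19 ≈ 15.8%.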

Handles on Cheap Disk in the Storage System
Program-oriented work in progress today…
TB-scale commodity disk servers (Castor, dCache…).
The “small file problem” – today’s tapes are poor vehicles for files < 1 GB.
–Some concrete plans in storage systems:
»HPSS: small files as metadata, backup to tape.
»FNAL Enstore: permanent disk mover.
Exploit excess disk associated with farms.
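One way to see the small-file problem (the 30 MB/s drive rate and ~60 s mount-and-seek time below are assumed round numbers, not figures from the talk): a 100 MB file transfers in about 3 s against roughly 60 s of positioning, so the drive spends under 5% of its time moving data; at 1 GB the transfer takes ~33 s and the useful fraction rises above a third.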

Summary
Quantity has a quality all of its own.
Low-cost disk:
–Is likely to provide significant optimizations.
–Is potentially disruptive to our current models.
At the high level (well, for a storage system…) we must implement interoperation with experiment middleware.
–Any complex protocols must be standard.
–Stage and file-system semantics are both prominent.