Building Advanced Storage Environment
Cheng Yaodong, Computing Center, IHEP
December 2002

Outline
◆ Current Environment
◆ Main Problems
◆ Solutions
◆ Related Techniques
◆ Introduction to CERN/CASTOR
◆ Test Environment

Current Storage Environment
◆ Isolated storage
  ■ Each server has its own storage
◆ Multi-platform
  ■ Red Hat Linux, HP-UX, Solaris, Windows
◆ Various media
  ■ Disk arrays, tapes including LTO, DLT, SDLT, etc.
◆ Obsolete management
◆ NFS

Isolated Storage
[Diagram: Sun, HP, and Dell storage islands, each with its own file system and volume manager]

Main Problems
◆ DAS (Directly Attached Storage) → data islands
◆ Bad scalability
◆ Low efficiency
◆ Inconvenient to use
◆ NFS
  ■ Overload on the system
  ■ Overhead on the network
◆ Small capacity

Solutions
◆ Building an Advanced Storage Environment
  ■ Provides
    ● Remote access to disk files
    ● Disk pool management
    ● Indirect access to tape
    ● Volume manager
    ● Hierarchical Storage Manager functionality
  ■ Main objectives
    ● Focused on HEP requirements
    ● Easy to use, deploy, and administer
    ● High performance
    ● Good scalability
    ● Available on most Unix systems and Windows/NT
    ● Integration and virtualization of storage resources

Related Techniques
◆ Hierarchical Storage Manager (HSM)
◆ Distributed file system
◆ Storage Area Network (SAN)
◆ Virtual storage

Hierarchical Storage Manager
◆ Characteristics of data in High Energy Physics
  ■ 20% active, 80% non-active
◆ Layers of storage devices
◆ Data migration
◆ Data recall
◆ 3-tier storage infrastructure
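
To make the migration/recall idea concrete, here is a minimal, purely illustrative C sketch. It is not CASTOR code; the names (file_entry, TIER_DISK, MIGRATE_AFTER) and the 30-day rule are invented for the example. It only shows the two movements named above: cold files migrate down to tape, requested files are recalled back to disk.

#include <stdio.h>
#include <time.h>

/* Hypothetical tiers of an HSM: fast disk in front, tape behind. */
enum tier { TIER_DISK, TIER_TAPE };

struct file_entry {
    const char *name;
    time_t      last_access;   /* last time the file was read */
    enum tier   where;         /* which layer currently holds the data */
};

/* Files untouched for 30 days are candidates for migration to tape. */
#define MIGRATE_AFTER (30 * 24 * 3600)

static void migrate_or_recall(struct file_entry *f, int wanted_now)
{
    time_t now = time(NULL);

    if (wanted_now && f->where == TIER_TAPE) {
        /* Recall: a user asks for a file that only exists on tape. */
        printf("recall %s from tape to disk\n", f->name);
        f->where = TIER_DISK;
    } else if (!wanted_now && f->where == TIER_DISK &&
               now - f->last_access > MIGRATE_AFTER) {
        /* Migration: cold file, free the disk copy after writing it to tape. */
        printf("migrate %s from disk to tape\n", f->name);
        f->where = TIER_TAPE;
    }
}

int main(void)
{
    struct file_entry f = { "/castor/ihep.ac.cn/user/c/cheng/run001.dat",
                            time(NULL) - 60 * 24 * 3600, TIER_DISK };
    migrate_or_recall(&f, 0);  /* cold file: gets migrated */
    migrate_or_recall(&f, 1);  /* then requested: gets recalled */
    return 0;
}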

Distributed file system
◆ Load balancing between storage devices
◆ Alleviates the overload on the OS and network
◆ A single, shared name space for all users, from all machines
◆ Location-independent file sharing
◆ Client caching
◆ Extended security through Kerberos authentication and Access Control Lists
◆ Replication techniques for file system reliability

Storage Area Network
◆ A private network specifically for storage
◆ Storage devices are connected to a switch through FCP, iSCSI, InfiniBand, and other protocols
◆ These protocols are designed specifically for transferring large amounts of data
◆ Servers are directly connected to the disks and share data
◆ Use of native file systems → much better performance than NFS
◆ Some HSM functionality is still needed

HSM SAN Model
[Diagram: servers connected to the LAN on one side and to shared storage devices over the Storage Area Network on the other]

Virtual Storage
◆ Map all storage resources to a virtual device or a single file space
◆ Integrating storage devices
  ■ Different storage connections: DAS, NAS, SAN
  ■ Different storage media: disk, tape
◆ Indirect access to physical storage devices
◆ Easy to use and administer
◆ Multi-platform support
◆ Data sharing

Our Implementation of Virtual Storage
[Diagram: physical storage devices virtualized by storage management software into a single virtual storage space, accessed transparently by Red Hat, HP, Solaris, and NT clients]

Introduction to CERN/CASTOR
◆ CERN Advanced STORage manager
  ■ In January 1999, CERN began to develop CASTOR
  ■ Hierarchical Storage Manager used to store user and physics files
  ■ It manages the secondary and tertiary storage
  ■ Currently holds more than 1800 TB of data
  ■ The servers are installed in the computer centre, while the clients are deployed on most of the computers, including the desktops
  ■ Automatic management of experiment data files
◆ Main access to data is through RFIO (Remote File I/O package)

Remote File I/O (RFIO)
◆ Provides transparent access to files: they can be local, remote, or HSM files
◆ There exist
  ■ a command line interface: rfcp, rfmkdir, rfdir
  ■ an Application Programming Interface (API)
◆ All calls handle standard file names and file descriptors (Unix or Windows)
◆ The routine names are obtained by prepending standard POSIX system calls with rfio_
◆ The function prototypes are unchanged
◆ The function name translation is done automatically by including the header file "rfio.h"
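
As a rough illustration of the API described above: a minimal sketch, assuming the rfio_ calls keep the standard POSIX signatures as this slide states, and that "rfio.h" is the header to include; the CASTOR path is only an example, not a real file. A client reading a remote file could look like this:

#include <stdio.h>
#include <fcntl.h>
#include "rfio.h"          /* maps the POSIX names onto rfio_ calls */

int main(void)
{
    char buf[4096];
    int  n;

    /* Open a file in the CASTOR name space exactly as if it were local;
       the path below is hypothetical. */
    int fd = rfio_open("/castor/ihep.ac.cn/user/c/cheng/run001.dat",
                       O_RDONLY, 0);
    if (fd < 0) {
        perror("rfio_open");
        return 1;
    }

    /* Same prototype as read(2): file descriptor, buffer, byte count. */
    while ((n = rfio_read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);

    rfio_close(fd);
    return 0;
}

Linking against the shift library (the later "User Interface" slide mentions adding the "lshift" library when compiling) would be needed to resolve the rfio_ symbols.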

RFIO Access to Data
[Diagram: an RFIO client reads the local disk directly, or reaches a remote disk through RFIOD (the disk mover)]

Disk Pool
◆ A series of disks on different machines forms a disk pool managed by the stager
◆ Disk virtualization
◆ Allocates space in the disk pool to store files
◆ Makes space in the pools to store new files (garbage collector)
◆ Keeps a catalog of all files residing in the pools
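
A minimal sketch of the allocation duty listed above; it is purely illustrative, not the stager's actual algorithm, and the pool_disk structure and "most free space wins" rule are assumptions made for the example:

#include <stdio.h>
#include <stddef.h>

/* One disk (file system) belonging to the pool. */
struct pool_disk {
    const char *mount_point;
    long long   free_bytes;
};

/* Pick the pool member with the most free space for a new file;
   return NULL if nothing can hold it and garbage collection is needed. */
static struct pool_disk *allocate(struct pool_disk *disks, size_t ndisks,
                                  long long file_size)
{
    struct pool_disk *best = NULL;
    for (size_t i = 0; i < ndisks; i++) {
        if (disks[i].free_bytes >= file_size &&
            (best == NULL || disks[i].free_bytes > best->free_bytes))
            best = &disks[i];
    }
    return best;   /* caller runs the garbage collector if this is NULL */
}

int main(void)
{
    struct pool_disk pool[] = { { "/pool/fs1",  40LL << 30 },
                                { "/pool/fs2", 120LL << 30 } };
    struct pool_disk *d = allocate(pool, 2, 10LL << 30);
    printf("new file goes to %s\n", d ? d->mount_point : "(no space: run GC)");
    return 0;
}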

File Access in a Disk Pool
[Diagram: the RFIO client contacts the stager, which consults its catalog and directs RFIOD (the disk mover) to serve the file from the disk pool]

CASTOR Name Server
◆ File names are of the form:
  ■ /castor/domain_name/experiment_name/… for example: /castor/ihep.ac.cn/ybj/
  ■ /castor/domain_name/user/… for example: /castor/ihep.ac.cn/user/c/cheng
◆ Role:
  ■ Implement a hierarchical view of the name space: files and directories
  ■ Remember the file residency on tertiary storage
  ■ Keep the file class definitions

CASTOR File Access
[Diagram: the RFIO client resolves the file through the name server; the stager and its catalog then direct RFIOD (the disk mover) to serve the file from the disk pool]

CASTOR Components
◆ The backend store consists of:
  ■ RFIOD (Disk Mover)
  ■ Name server
  ■ Volume Manager
  ■ Volume and Drive Queue Manager
  ■ RTCOPY daemon (Tape Mover)
  ■ Tpdaemon
◆ Main characteristics of the servers
  ■ Distributed
  ■ Critical servers are replicated
  ■ Use the CASTOR Database (Cdb) or Open Source databases (MySQL)

Main Components
◆ Distributed components
  ■ Remote File I/O (RFIO)
  ■ CASTOR Name Server (Cns)
  ■ Stager
  ■ Tape Mover (RTCOPY)
  ■ Physical Volume Repository (Ctape)
◆ Central components
  ■ Volume Manager (VMGR)
  ■ Volume and Drive Queue Manager (VDQM)
  ■ Message Daemon

Stager
◆ Role: Storage Resource Manager
  ■ Disk pool manager
    ● Allocates space on disk to store files
    ● Keeps a catalog of all files residing in the pools
    ● Makes space in the pools to store new files (garbage collector)
  ■ Hierarchical Resource Manager
    ● Migrates files according to file class and disk pool policies
    ● Recalls files
  ■ Tape Stager (deprecated)
    ● Caches tape files on disk

File Classes
◆ Associated with each file or directory
◆ Inherited from the parent directory but can be changed (at sub-directory level)
◆ Describe how the file is managed on disk, migrated, and purged
◆ File class attributes are:
  ■ Ownership
  ■ Migration time interval
  ■ Minimum time before migration
  ■ Number of copies
  ■ Retention period on disk
  ■ Number of parallel streams (number of drives)
  ■ Tape pools
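
To show how compact such a definition is, here is an illustrative C structure holding the attributes listed above. The field names, types, and sizes are assumptions for the example, not the actual CASTOR file class record:

#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory representation of a CASTOR-style file class. */
struct file_class {
    char name[32];              /* class name, e.g. "default"          */
    int  uid, gid;              /* ownership                           */
    int  migr_interval_sec;     /* migration time interval             */
    int  min_time_before_migr;  /* minimum time before migration       */
    int  nb_copies;             /* number of tape copies               */
    int  retention_sec;         /* retention period on disk            */
    int  nb_streams;            /* parallel streams (number of drives) */
    char tape_pools[4][16];     /* tape pools the class may write to   */
};

int main(void)
{
    struct file_class def;
    memset(&def, 0, sizeof(def));
    strcpy(def.name, "default");
    def.nb_copies     = 1;               /* one tape copy              */
    def.retention_sec = 7 * 24 * 3600;   /* keep the disk copy a week  */
    printf("class %s: %d copy, retention %d s\n",
           def.name, def.nb_copies, def.retention_sec);
    return 0;
}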

Migration Policies
◆ Migration policy depends on
  ■ File class
  ■ Disk pool
◆ Start migration when
  ■ The amount of data ready to be migrated exceeds a given threshold
  ■ The percentage of free space falls below a given threshold
  ■ A time interval has elapsed
  ■ Migration can also be forced
◆ Stop migration when
  ■ The data ready at start-migration time has been migrated
◆ Algorithm
  ■ Least recently accessed files are migrated first
  ■ A maximum number of tape drives (parallel streams) can be set
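
The start conditions above amount to a simple predicate. The sketch below is an assumed illustration; the threshold values and the disk_pool_state fields are invented for the example and do not come from the stager's real code:

#include <stdio.h>
#include <time.h>

struct disk_pool_state {
    long long bytes_ready_to_migrate;  /* data waiting for tape           */
    double    free_space_fraction;     /* 0.0 .. 1.0 of pool capacity     */
    time_t    last_migration;          /* when the last run started       */
};

/* Policy thresholds would normally come from the file class / pool config. */
#define READY_THRESHOLD   (50LL * 1024 * 1024 * 1024)   /* 50 GB   */
#define FREE_THRESHOLD    0.10                          /* 10 %    */
#define MIGR_INTERVAL_SEC (4 * 3600)                    /* 4 hours */

static int should_start_migration(const struct disk_pool_state *p, int forced)
{
    if (forced)
        return 1;                                    /* operator request  */
    if (p->bytes_ready_to_migrate > READY_THRESHOLD)
        return 1;                                    /* enough data ready */
    if (p->free_space_fraction < FREE_THRESHOLD)
        return 1;                                    /* pool nearly full  */
    if (time(NULL) - p->last_migration > MIGR_INTERVAL_SEC)
        return 1;                                    /* periodic run      */
    return 0;
}

int main(void)
{
    struct disk_pool_state p = { 60LL << 30, 0.30, time(NULL) };
    printf("start migration? %s\n",
           should_start_migration(&p, 0) ? "yes" : "no");
    return 0;
}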

Physical Volume Repository (Ctape)
◆ Dynamic configuration of tape drives
◆ Reservation of resources
◆ Drive allocation (when not using VDQM)
◆ Tape volume mount and positioning
◆ Automatic label checking
◆ User-callable routines to write labels
◆ Drive status display
◆ Operator interface
◆ VMGR and VDQM interfaces
◆ Hardware supported:
  ■ Drives: DLT, LTO, IBM 3590, STK 9840, STK 9940
  ■ Robots: ADIC Scalar, IBM 3494, IBM 3584, Odetics, Sony DMS24, STK

Volume Manager (VMGR)
◆ Handles pools of tapes
  ■ Private to an experiment
  ■ Public pool
  ■ Supply pool
◆ Features:
  ■ Determines the most appropriate tapes for storing files in a given tape pool according to file size
  ■ Minimizes the number of tape volumes for a given file
◆ Tape volumes are administered by the Computer Center; they are not owned nor managed by users
◆ There is one single Volume Manager
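
A rough, assumed illustration of the volume-selection idea above: pick a tape whose remaining capacity best fits the file so the file does not have to span several volumes. The structure and the best-fit rule are invented for the example, not VMGR's actual logic:

#include <stdio.h>
#include <stddef.h>

struct tape_volume {
    char      vid[7];        /* volume identifier, e.g. "I10001" */
    long long free_bytes;    /* estimated space left on the tape */
};

/* Best fit: the tape with the least free space that still holds the
   whole file, so large gaps are saved for large files. */
static struct tape_volume *pick_tape(struct tape_volume *pool, size_t n,
                                     long long file_size)
{
    struct tape_volume *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (pool[i].free_bytes < file_size)
            continue;
        if (best == NULL || pool[i].free_bytes < best->free_bytes)
            best = &pool[i];
    }
    return best;   /* NULL means a fresh tape must be taken from the pool */
}

int main(void)
{
    struct tape_volume pool[] = { { "I10001",  5LL << 30 },
                                  { "I10002", 80LL << 30 } };
    struct tape_volume *t = pick_tape(pool, 2, 2LL << 30);
    printf("write to tape %s\n", t ? t->vid : "(request a new volume)");
    return 0;
}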

Volume and Drive Queue Manager (VDQM)
◆ Maintains a global queue of tape requests per device group
◆ Maintains a global table of all tape drives
  ■ Provides tape-server load balancing
  ■ Optimizes the number of tape mounts
◆ Tape requests are assigned a priority:
  ■ Requests are queued in priority order
  ■ Requests with the same priority are queued in time order
◆ Drives may be dedicated
◆ Easy to add functionality like
  ■ Drive quotas
  ■ Fair-share scheduler (a prototype exists)
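
The queueing rule above (priority first, then arrival time) is just an ordering. An illustrative comparison function, using a hypothetical tape_request structure rather than VDQM's real data types, could look like this:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct tape_request {
    int    priority;      /* higher value = served first        */
    time_t submitted;     /* earlier submission = served first  */
};

/* qsort comparator: descending priority, then ascending submission time. */
static int cmp_requests(const void *a, const void *b)
{
    const struct tape_request *ra = a, *rb = b;
    if (ra->priority != rb->priority)
        return rb->priority - ra->priority;
    return (ra->submitted > rb->submitted) - (ra->submitted < rb->submitted);
}

int main(void)
{
    struct tape_request q[] = { { 1, 1000 }, { 5, 2000 }, { 5, 1500 } };
    qsort(q, 3, sizeof q[0], cmp_requests);
    for (int i = 0; i < 3; i++)
        printf("priority %d, submitted %ld\n",
               q[i].priority, (long)q[i].submitted);
    return 0;
}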

User Interface
◆ Command line
  ■ Name server commands: nsls, nsmkdir, nsrm, nstouch, nschmod, nsenterclass
  ■ RFIO commands: rfdir, rfcp, rfcat, rfchmod, rfrm, rfrename
◆ Application Programming Interface (API)
  ■ #include "rfio.h"
  ■ Add the library "lshift" when compiling
  ■ Two forms of routine names
    ● Obtained by prepending standard POSIX system calls with rfio_, such as rfio_open, rfio_read, rfio_write, rfio_seek, rfio_close, etc.
    ● The function prototypes are unchanged; the function name translation is done automatically by including the header file "rfio.h"
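
Putting the pieces of this slide together, a small sketch of writing into the CASTOR name space. The path is hypothetical, and the compile line in the comment assumes the "lshift" note above means linking with -lshift:

/* Build (assumed): cc -o putfile putfile.c -lshift */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include "rfio.h"   /* remaps the POSIX names onto rfio_ calls, per the slide */

int main(void)
{
    const char *msg = "hello castor\n";

    /* Same flags and mode arguments as open(2); the path is hypothetical. */
    int fd = rfio_open("/castor/ihep.ac.cn/user/c/cheng/hello.txt",
                       O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("rfio_open");
        return 1;
    }
    if (rfio_write(fd, (char *)msg, strlen(msg)) < 0)
        perror("rfio_write");
    rfio_close(fd);
    return 0;
}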

Test Environment
◆ Hardware
  ■ Servers: Dell 6400, Dell 4400, Dell 2400, Dell GX110
  ■ Disk array, DAS disk
  ■ Tape library: ADIC 100 (2 HP LTO devices, slots)
◆ Software
  ■ Operating system: Red Hat 7.2
  ■ Storage management software: CERN/CASTOR
  ■ Distributed file systems: NFS, AFS
  ■ Job scheduling system: PBS
  ■ Database: MySQL

Future Storage Environment

Conclusion
◆ Handle the large amount of data in a fully distributed environment
◆ Map all the storage resources to a single file space
◆ Users access files in the space through the command line or the API
◆ Users only remember the file name; they do not need to know where their files are placed or whether the storage capacity is enough

Thanks!!

Storage Hierarchy

3-Tier Storage Infrastructure
[Diagram: Tier 1 primary storage (servers; very fast, most expensive per MB), Tier 2 secondary storage (filers and "disk-to-disk" appliances reached over the LAN/WAN or storage network; fast), Tier 3 tertiary storage (tape and optical libraries for archival/HSM and backup/restore; slow, cheapest per MB), forming a heterogeneous storage hierarchy]
Parameters of Prevalent Tapes