Trends and Directions of Mass Storage in the Scientific Computing Arena
Gene Harano, Scientific Computing Division, National Center for Atmospheric Research
CAS 2001 – October 30, 2001. Copyright © 2001 University Corporation for Atmospheric Research.

Slide 2 – Vision
How do we accomplish that vision? To name a few areas:
- Handling large datasets – analysis and visualization
- Shared file systems and cache pools
- Middleware and layering
- Management tools
- Emerging technologies

Slide 3 – Large Datasets
- The NCAR MSS was originally a tape-based archive.
- The NCAR MSS average file size is 35 MB (11 million files); files are small due to historical restrictions (single-volume datasets, model history files) and a large number (25%) of files under 1 MB (user backups).
- Single TB-sized files are common for visualization and analysis.
- Currently these large files are sliced up prior to landing in the archive (a sketch of that slicing step follows).
- Access is generally sequential, but there is some random access.
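The slicing step mentioned above can be pictured as a simple chunking pass before ingest. The sketch below is a minimal illustration; the chunk size, slice naming, and the idea that each slice is archived separately are assumptions, not the actual NCAR MSS procedure.

```python
CHUNK_SIZE = 2 * 1024**3  # assumed 2 GB slice size; the real limit was set by archive volume constraints

def slice_for_archive(path, chunk_size=CHUNK_SIZE):
    """Split a very large file into fixed-size slices so each fits the archive's limits."""
    slices = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            slice_name = f"{path}.part{index:04d}"
            with open(slice_name, "wb") as dst:
                dst.write(data)
            slices.append(slice_name)
            index += 1
    return slices  # each slice would then be stored in the archive as its own file
```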

Slide 4 – Large Datasets
Are tape-based archives obsolete? No, but there is a need to reevaluate the entire storage structure at NCAR:
- Cache pools
- Data warehouses, data sub-setting
- The NCAR MSS is being treated as a shared file system rather than an archive.

Slide 5 – Shared File System
A heterogeneous, high-performance, high-capacity shared file system doesn't yet exist.
[Diagram: shared data reached through Web/Grid servers, programmatic interfaces, and the command line.]

Slide 6 – Cache Pools
External to the archive:
- Minimize archive activity; temporary data stays out of the archive.
- Customized for a smaller set of associated data.
Internal to the archive:
- Minimize tape activity and improve response time.
- Federate and distribute.
- Repackage small files for tape storage under system control (see the aggregation sketch below).
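The small-file repackaging idea can be sketched as an aggregation step that bundles many small members into one tape-sized container. This is a minimal illustration using Python's tarfile module; the size thresholds and naming are assumptions, not the MSS implementation.

```python
import tarfile
from pathlib import Path

SMALL_FILE_LIMIT = 1024**2     # files under 1 MB are candidates (matches the 25% figure in slide 3)
BUNDLE_TARGET = 512 * 1024**2  # assumed target container size before it is written to tape

def bundle_small_files(directory, bundle_name="bundle0000.tar"):
    """Pack small files into a single container so the tape system sees one large object."""
    members = [p for p in Path(directory).iterdir()
               if p.is_file() and p.stat().st_size < SMALL_FILE_LIMIT]
    total = 0
    with tarfile.open(bundle_name, "w") as bundle:
        for p in members:
            bundle.add(p, arcname=p.name)
            total += p.stat().st_size
            if total >= BUNDLE_TARGET:
                break  # a real system would start a new container and continue
    return bundle_name
```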

Slide 7 – Terascale Modeling & Analysis
[Diagram: the Advanced Research Computing System (IBM SP) attached to a GPFS shared file system, with an MSS proxy and data analysis systems.]

Slide 8 – Terascale Analysis & Visualization
[Diagram: the Vislab and data analysis systems on a storage area network with a shared file system and an MSS proxy.]

Slide 9 – Data Provisioning & Access
[Diagram: CDP/ESG data processor and DSS server on a storage area network with a shared file system and an MSS proxy, serving Unidata and DODS clients.]

Slide 10 – Internal Cache Pools
NCAR MSS event-log modeling (April 2000 – April 2001), looking at tape activity (a sketch of this kind of log replay follows):
- 20 TB cache pool – can be federated and distributed
- 30-day average cache residency
- 70% reduction in tape read-backs
- Greatly enhanced response time
- Reduce the amount of tape resources, or redefine their use.
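A figure like the 70% reduction comes from replaying the archive's read events against a simulated cache pool. The sketch below shows one way such a replay could look; the event format and the LRU eviction used as a stand-in for 30-day residency are assumptions for illustration.

```python
from collections import OrderedDict

CACHE_BYTES = 20 * 1000**4  # the 20 TB pool modeled on this slide

def replay(events, cache_bytes=CACHE_BYTES):
    """events: iterable of (file_id, size_bytes) read requests in time order.
    Returns the fraction of reads that would still have gone back to tape."""
    cache = OrderedDict()   # file_id -> size, oldest entries first (LRU as a residency stand-in)
    used = 0
    tape_readbacks = 0
    total_reads = 0
    for file_id, size in events:
        total_reads += 1
        if file_id in cache:
            cache.move_to_end(file_id)   # hit: served from the cache pool, no tape mount
            continue
        tape_readbacks += 1              # miss: the file must be staged back from tape
        while used + size > cache_bytes and cache:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[file_id] = size
        used += size
    return tape_readbacks / total_reads if total_reads else 0.0
```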

Slide 11 – Middleware and Layering
The role of an archive: an archive performs two basic functions:
- Reliably storing data
- Returning data on demand
Data analysis, data mining, data assimilation, distributed data servers, etc. are functions built on middleware that sits on top of an archive, and they should be implemented independently of the underlying archive.

Slide 12 – Middleware and Layering
Separate archive functionality from:
- Visualization
- Data servers
- Data warehousing, data mining, data subsetting
- Web and Grid access
- Etc.
This maximally enables the use of COTS, allows (transparent) replacement of components as needed, and fills the gaps with custom software (see the interface sketch below).
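One way to read the layering argument in code is as an interface boundary: middleware is written against a small archive API, never against a particular archive. The sketch below is purely illustrative; the class and method names are assumptions, not an NCAR MSS or COTS API.

```python
from abc import ABC, abstractmethod

class Archive(ABC):
    """The two basic archive functions from slide 11: store reliably, return on demand."""

    @abstractmethod
    def store(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, name: str) -> bytes: ...

class SubsettingServer:
    """A middleware layer (data subsetting) built only against the Archive interface."""

    def __init__(self, archive: Archive):
        self.archive = archive  # any implementation, COTS or custom, can be swapped in transparently

    def subset(self, name: str, start: int, end: int) -> bytes:
        return self.archive.retrieve(name)[start:end]
```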

Slide 13 – Future Data Services
[Diagram: data storage and the NCAR MSS archive at the base; file and cache services (pools) above them; and data analysis/mining/assimilation, data cataloging/searching, digital libraries, data servers, visualization, and Web access layered on top.]

Slide 14 – Management Tools
There is a need for better user and system management tools as MSS capacity scales:
- How does a single user manage 1 million files?
- How does an MSS administrator dynamically tune a system, predict workloads, and find and correct bottlenecks?

Slide 15 – Management Tools (NCAR MSS tools)
Defining new roles:
- Existing roles: single ordinary user, MSS superuser
- As users come and go, there is a need for a project superuser (new) and a division data administrator (new).
Web-based metadata user tools (a catalog sketch follows):
- List, search, and catalog holdings – metadata mining
- Remove unwanted files
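Tools like these ultimately sit on a searchable catalog of per-file metadata. The sketch below shows one way such a catalog could be queried; the SQLite schema and field names are assumptions for illustration, not the MSS metadata design.

```python
import sqlite3

def open_catalog(db_path=":memory:"):
    """Create a tiny holdings catalog: one row of metadata per archived file."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS holdings (
                       path       TEXT PRIMARY KEY,
                       owner      TEXT,
                       project    TEXT,
                       size_bytes INTEGER,
                       last_read  TEXT)""")
    return con

def find_unwanted(con, owner, cutoff="2000-01-01"):
    """List a user's files not read since the cutoff date -- candidates for removal."""
    return con.execute(
        "SELECT path, size_bytes FROM holdings WHERE owner = ? AND last_read < ?",
        (owner, cutoff)).fetchall()
```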

Slide 16 – Management Tools (NCAR MSS tools)
From the system perspective – utilize data warehousing and data mining techniques:
- System modeling using event logs
- Capacity planning
- Identifying bottlenecks
- Operational monitoring
- Tracking errors and identifying trends (media problems); a small log-mining example follows
- Intrusion detection
- Dynamic system tuning
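As a small illustration of the error-trend item above, one can scan the event log and flag media whose error counts are rising from one window to the next. The record layout below is an assumption, not the actual MSS event-log format.

```python
from collections import defaultdict

def media_error_trend(events, window_days=30):
    """events: iterable of (day_number, volume_id, is_error) records in time order.
    Flags volumes with more errors in the latest window than in the one before it."""
    events = list(events)
    if not events:
        return []
    last_day = max(day for day, _, _ in events)
    recent = defaultdict(int)
    earlier = defaultdict(int)
    for day, volume, is_error in events:
        if not is_error:
            continue
        if day > last_day - window_days:
            recent[volume] += 1
        elif day > last_day - 2 * window_days:
            earlier[volume] += 1
    return [v for v in recent if recent[v] > earlier.get(v, 0)]
```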

Slide 17 – Emerging Technologies
- Data path
- Tape
- Holographic storage
- Probe-based MEMS
- High-density Rosetta (analog)

Slide 18 – Data Path
- HIPPI is in use today in the NCAR archive.
- Fibre Channel will replace our HIPPI in the near term: an FC SAN for RAID cache pools and an FC SAN for tape sharing.
- Others: iSCSI, FC over IP, InfiniBand.

Slide 19 – Tape
[Slide chart, only partially recoverable: native cartridge capacities (GB) for linear-format tape (3480/90, 3490E, DLT, DLT-7000, SDLT, Ultrium, Accelis) versus helical-scan tape (Mammoth, Mammoth 2, AIT, DTF, SD-3, 3570C), with roadmap points including a 200 GB cartridge in 1Q02, 500 GB cartridges, and a 1 TB, 60 MB/s cartridge in 2004.]

Slide 20 – Tape
To be competitive with magnetic disk, magnetic tape must grow at 10x every 5 years, achieved by a combination of increased areal density and longer (and possibly wider) tape. (From a storage vendor.)
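To put the 10x-per-5-years figure in annual terms, a quick back-of-the-envelope calculation:

```python
annual_factor = 10 ** (1 / 5)   # per-year growth factor that compounds to 10x over 5 years
print(f"{annual_factor:.2f}x per year, roughly {100 * (annual_factor - 1):.0f}% compounded annually")
# -> 1.58x per year, roughly 58% compounded annually
```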

Slide 21 – Tape
RAIT (Redundant Array of Independent Tapes):
- Increased performance
- Higher reliability with the use of parity (see the parity sketch below)
- Higher single-"volume" capacity – large datasets on a single "volume"
RAIL (Redundant Array of Independent Libraries):
- Greater total system capacity
- Improved response time
These are resource-intensive solutions – dedicated libraries and drives.
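The reliability mechanism behind RAIT is the same bytewise XOR parity used in RAID: a stripe is spread across the data tapes and their XOR is written to a parity tape, so any single lost tape can be rebuilt from the survivors. The sketch below is a generic illustration; the stripe layout and tape count are arbitrary assumptions.

```python
from functools import reduce

def split_stripe(block, n_data=3):
    """Split one stripe of data across n_data tapes and compute a bytewise XOR parity block."""
    size = (len(block) + n_data - 1) // n_data
    chunks = [block[i * size:(i + 1) * size].ljust(size, b"\x00") for i in range(n_data)]
    parity = bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*chunks))
    return chunks, parity

def rebuild_missing(chunks, parity, missing):
    """Recover the chunk on a lost tape by XOR-ing the surviving chunks with the parity block."""
    survivors = [c for i, c in enumerate(chunks) if i != missing] + [parity]
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*survivors))
```

Losing any one of the n_data + 1 tapes is then recoverable, which is where the extra dedicated drives and libraries the slide calls resource intensive come from.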

Slide 22 – Holographic Storage
- Large capacity – 10 GB in a single cubic centimeter (versus 10 Gbits/in² for magnetic disk)
- High speed – 2 gigabits/sec
- Low power
- Billions of write cycles

Slide 23 – Probe-Based MEMS
MEMS – Micro-Electro-Mechanical Systems. Probe-based storage arrays:
- Dense
- Highly parallel, to achieve high bandwidth
- Rectilinear 2D positioning
- Commercial devices expected in the next several years

Slide 24 – HD Rosetta
- A product marketed by Norsam Technologies, developed at Los Alamos National Laboratory
- Analog
- Lifetime of thousands of years
- Can be read back with only a microscope
- Stores text and images