Super Computing 2000 DOE SCIENCE ON THE GRID
Storage Resource Management for the Earth Science Grid
Scientific Data Management Research Group, NERSC, LBNL
Alex Sim, Arie Shoshani

Super Computing 2000 What is the Earth System Grid?
Climate modeling: a mission-critical application area
High-resolution, long-duration climate modeling simulations produce tens of petabytes of data
Earth System Grid (ESG): a virtual collaborative environment connecting distributed centers, users, models, and data

Super Computing 2000 Earth System Grid
ESG provides scientists with
—Virtual proximity to the distributed data
—Resources comprising a collaborative environment
ESG supports
—Rapid transport of climate data between storage centers and users upon user request
—Integrated middleware and network mechanisms that broker and manage high-speed, secure, and reliable access to data and other resources in a wide-area system
—A persistent testbed that provides virtual proximity and demonstrates reliable, high-performance data transport across a heterogeneous environment
Data volume and transmittal problems
—High-speed data transfer in heterogeneous Grid environments

Super Computing 2000 In this Demo
Will show
—managing user requests
—accessing files from multiple sites in a secure manner
—selecting the “best” replica
Participating institutions for file replicas
—SDSC: all the files for the demo on HPSS
—about 15 disjoint files on disk in each of 5 locations: ISI, ANL, NCAR, LBNL, LLNL
—some files are only on tape
—size of files MBs
The entire dataset stored on HPSS at NERSC (LBNL)
—use HRM (via CORBA) to request staging of files to HRM’s disk
—use GSI-FTP (security-enhanced FTP) to transfer the file after it is staged

Super Computing 2000 Request Manager Coordination

Super Computing 2000 Request Manager
Request Manager: developed at LBNL
—accepts a request to cache a set of logical file names
—checks replica locations for each file
—gets NWS bandwidth/latency estimates for each replica location
—selects the “lowest cost” location
—initiates the transfer using GSI-FTP
—monitors file transfer progress and responds to status commands
Client: PCMDI software (LLNL)
—it has its own “metadata” catalog
—a lookup in the catalog generates the set of files needed to satisfy a user’s request
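To make the coordination concrete, below is a minimal Python sketch of the request flow just described. The helper names (lookup_replicas, estimate_cost, gsiftp_transfer) and the cache path are hypothetical stand-ins, not the LBNL implementation.

```python
# Minimal sketch of the Request Manager flow described on this slide.
# All names below are hypothetical placeholders, not the actual LBNL code.

def process_request(logical_files, lookup_replicas, estimate_cost, gsiftp_transfer):
    """Cache each logical file from its lowest-cost replica location."""
    status = {}
    for lfn in logical_files:
        replicas = lookup_replicas(lfn)          # replica catalog: lfn -> list of URLs
        if not replicas:
            status[lfn] = "no replica found"
            continue
        best = min(replicas, key=estimate_cost)  # NWS-based cost estimate per source
        status[lfn] = "transferring from " + best
        gsiftp_transfer(best, "/local/cache/" + lfn)   # GSI-FTP pull into local cache
        status[lfn] = "done"
    return status                                # reported on a "status" command


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    catalog = {"ta_1950.nc": ["gsiftp://host-a/ta_1950.nc", "gsiftp://host-b/ta_1950.nc"]}
    print(process_request(
        ["ta_1950.nc"],
        lookup_replicas=lambda lfn: catalog.get(lfn, []),
        estimate_cost=len,                       # pretend the shorter URL is cheaper
        gsiftp_transfer=lambda src, dst: None,   # no-op transfer for the demo
    ))
```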

Super Computing 2000 FTP Services for GRID
Secure FTP services used for the Grid:
—GridFTP (developed at ANL)
   support for both client and server
   secured with the Grid Security Infrastructure (GSI)
   parallel streaming capability
—gsi-wuftpd server (developed at WU)
   wu-ftpd server with Grid Security Infrastructure
—gsi-ncftp client (ncftp.com)
   NcFTP client with Grid Security Infrastructure
—gsi-pftpd (developed at SDSC)
   for access to HPSS
   parallel FTP server with Grid Security Infrastructure
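As a rough illustration of driving such a transfer, the sketch below shells out to globus-url-copy, the command-line client shipped with GridFTP. The flags, hosts, and paths are illustrative assumptions, and a valid GSI proxy is presumed to already exist.

```python
# Hedged sketch: driving a GridFTP transfer from Python via globus-url-copy.
# The exact flags and paths are illustrative; a valid GSI proxy certificate
# (e.g. created with grid-proxy-init) is assumed to exist before this runs.
import subprocess

def gridftp_copy(src_url, dst_url, streams=4):
    """Copy src_url to dst_url using parallel TCP streams."""
    cmd = ["globus-url-copy", "-p", str(streams), src_url, dst_url]
    subprocess.run(cmd, check=True)

# Example invocation (hypothetical host and file names):
# gridftp_copy("gsiftp://dm.lbl.gov/hpss/esg/ta_1950.nc", "file:///scratch/ta_1950.nc")
```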

Super Computing 2000 Replica Catalog Service
Globus Replica Catalog
—developed using LDAP
—has the concept of a logical file collection
—registers logical file names by collection
—uses a URL format for the location of each replica
   this includes host machine, (port), path, file_name
—may contain other parameters, e.g. file size
—provides hierarchical partitioning of a collection in the catalog (does not have to reflect the physical organization at any site)
—provides a C API
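The catalog's logical layout can be pictured with a small in-memory stand-in: collections of logical file names, each mapped to replica URLs (host, port, path, file name) plus optional attributes such as file size. This is only an analogy for the LDAP-backed Globus Replica Catalog, not its actual schema or C API; the host names are invented.

```python
# Toy, in-memory analogue of the Globus Replica Catalog's logical layout
# (the real catalog is LDAP-backed with a C API; all names here are invented).

catalog = {
    "esg-climate": {                        # a logical file collection
        "ta_1950.nc": {
            "size": 123_456_789,            # optional attribute, e.g. file size
            "replicas": [                   # URL format: host, (port), path, file name
                "gsiftp://hpss.nersc.gov:2811/esg/ta_1950.nc",
                "gsiftp://dc.ncar.edu:2811/cache/ta_1950.nc",
            ],
        },
    },
}

def replicas_of(collection, lfn):
    """Return every registered physical location of a logical file."""
    return catalog.get(collection, {}).get(lfn, {}).get("replicas", [])

print(replicas_of("esg-climate", "ta_1950.nc"))
```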

Super Computing 2000 Network Weather Service
Network Weather Service (NWS)
—developed by U of Tennessee
—requires installation at each participating host
—provides pair-wise bandwidth/latency estimates
—accessible through LDAP queries
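These pair-wise estimates feed a simple cost model. One plausible form, which a request manager could plug in as its estimate_cost function, is latency plus file size divided by bandwidth; the units and the sample numbers below are assumptions for illustration.

```python
# Simple transfer-cost model built from NWS-style pairwise estimates.
# Units are assumptions: latency in seconds, bandwidth in MB/s, size in MB.

def estimated_seconds(file_size_mb, latency_s, bandwidth_mb_per_s):
    """Rough time to move one file from a given source host."""
    return latency_s + file_size_mb / bandwidth_mb_per_s

# Pick the cheapest of several candidate sources for a 200 MB file.
candidates = {
    "host-a": (0.05, 10.0),   # (latency, bandwidth) pairs; values are made up
    "host-b": (0.20, 40.0),
}
best = min(candidates, key=lambda h: estimated_seconds(200, *candidates[h]))
print(best)   # host-b: higher latency, but much higher bandwidth wins
```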

Super Computing 2000 Hierarchical Resource Manager
Hierarchical Resource Manager (HRM)
—HRM: for managing access to tape resources (and staging to local disk)
   an HRM uses a disk cache for staging
   functionality is generic but needs to be specialized for specific mass storage systems, e.g. HRM-HPSS, HRM-Enstore, ...
—DRM: for managing disk resources
   under development
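The tape/disk split can be pictured as a common interface with system-specific specializations. The class and method names below are illustrative only, not the project's actual API.

```python
# Illustrative sketch of the HRM/DRM split (names are not the project's API).

class StorageResourceManager:
    """Common face of a storage resource manager: request a file, poll its state."""
    def request_file(self, name):            # queue the file for staging/transfer
        raise NotImplementedError
    def status(self, name):                  # e.g. "queued", "staging", "on disk"
        raise NotImplementedError

class HRM(StorageResourceManager):
    """Tape-backed manager: generic logic plus a system-specific backend
    (HRM-HPSS, HRM-Enstore, ...) that stages files into a disk cache."""
    def __init__(self, tape_backend, disk_cache):
        self.tape, self.cache = tape_backend, disk_cache
    def request_file(self, name):
        self.tape.stage(name, self.cache)            # hypothetical backend call
    def status(self, name):
        return "on disk" if name in self.cache else "staging"

class DRM(StorageResourceManager):
    """Disk-only manager: no tape layer, just disk space management."""
    def __init__(self, disk_cache):
        self.cache = disk_cache
    def request_file(self, name):
        self.cache.reserve(name)                     # hypothetical cache call
    def status(self, name):
        return "on disk" if name in self.cache else "allocating"
```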

Super Computing 2000 HRM Functionality
HRM functionality includes:
—queuing of file transfer requests
—reordering of requests to optimize Parallel FTP (ordered by files on the same tape)
—monitoring progress and error messages
—re-scheduling failed transfers
—enforcing local resource policy (illustrated in the sketch below)
   number of simultaneous file transfer requests
   number of total file transfer requests per user
   priority of users
   fair treatment of users
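One way the local-policy items might be enforced is an admission check plus a fairness rule, as in the sketch below; the limit values and the round-robin rule are illustrative assumptions rather than the actual HRM policy.

```python
# Illustrative admission check for HRM's local resource policy
# (limit values and the fairness rule are assumptions, not the real policy).

MAX_ACTIVE_TOTAL = 5        # simultaneous file transfers across all users
MAX_QUEUED_PER_USER = 20    # total outstanding requests per user

def may_admit(user, active, queued_by_user):
    """Decide whether a new request from `user` can be accepted right now."""
    if len(active) >= MAX_ACTIVE_TOTAL:
        return False                                  # system-wide concurrency cap
    if queued_by_user.get(user, 0) >= MAX_QUEUED_PER_USER:
        return False                                  # per-user quota
    return True

def next_user(queued_by_user, last_served):
    """Fair treatment: round-robin over users with pending requests."""
    users = sorted(u for u, n in queued_by_user.items() if n > 0)
    if not users:
        return None
    later = [u for u in users if u > last_served]
    return later[0] if later else users[0]
```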

Super Computing 2000 Current Implementation of an HRM System
Currently implemented for the HPSS system
All transfers go through the HRM disk
—reasons:
   flexibility of pre-staging
   disk is sufficiently cheap for a large cache
   opportunity to optimize for repeated requests for the same file
Functionality
—queuing file transfers
—file queue management
—file clustering parameter
—transfer rate estimation
—query estimation: total time
—error handling
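Because every transfer lands in the HRM disk cache first, a repeated request for a file that is already cached, or already being staged, need not touch tape again. A minimal sketch of that check follows; the names are hypothetical.

```python
# Sketch of the "same file request" optimization enabled by staging through
# the HRM disk cache (names are illustrative, not the HRM code).

cache = {}          # file name -> "staging" or "on disk"
waiters = {}        # file name -> list of request ids waiting for it

def request_file(req_id, name, start_staging):
    if name in cache:                                  # already cached or in flight:
        waiters.setdefault(name, []).append(req_id)    # piggyback, no new tape read
        return cache[name]
    cache[name] = "staging"                            # first request triggers the PFTP from HPSS
    waiters[name] = [req_id]
    start_staging(name)
    return "staging"
```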

Super Computing 2000 Queuing File Transfers
The number of Parallel FTPs to HPSS is limited
—limit set by a parameter
—parameter can be changed dynamically
HRM is multi-threaded
—issues and monitors multiple Parallel FTPs
All requests beyond the PFTP limit are queued
A File Catalog is used to provide, for each file:
—HPSS path/file_name
—disk cache path/file_name
—file size
—tape ID
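A compact way to picture this slide is a bounded pool of PFTP slots in front of a FIFO queue, where each queued entry carries the catalog fields listed above. The data structure and the limit value below are illustrative.

```python
# Sketch of HRM request queuing: a dynamically adjustable PFTP limit in front
# of a FIFO of catalog entries (field names mirror the slide; values are made up).
from collections import deque
from dataclasses import dataclass

@dataclass
class FileEntry:
    hpss_path: str      # HPSS path/file_name
    cache_path: str     # disk cache path/file_name
    size: int           # file size in bytes
    tape_id: str        # tape holding the file

pftp_limit = 2          # max concurrent PFTPs to HPSS; can be changed at run time
active = set()
queue = deque()

def submit(entry):
    """Start a PFTP if a slot is free, otherwise queue the request."""
    if len(active) < pftp_limit:
        active.add(entry.hpss_path)        # each active transfer runs in its own thread
    else:
        queue.append(entry)

def on_transfer_done(hpss_path):
    """Release the slot and promote the next queued request, if any."""
    active.discard(hpss_path)
    if queue and len(active) < pftp_limit:
        submit(queue.popleft())
```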

Super Computing 2000 File Queue Management
Goal
—minimize tape mounts
—still respect the order of requests
—do not postpone unpopular tapes forever
File clustering parameter (FCP)
—if the file at the top of the queue is on Tape i and FCP > 1 (e.g. 4), then up to 4 files from Tape i are selected to be transferred next (as sketched below)
—then, go back to the file at the top of the queue
—the parameter can be set dynamically
F1(Ti) F3(Ti) F2(Ti) F4(Ti): order of file service
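The clustering rule reads naturally as a small algorithm: serve the file at the top of the queue, pull up to FCP files that live on the same tape, then return to the new top of the queue. The sketch below is a hypothetical rendering that reproduces the behavior for FCP = 1 versus a larger FCP.

```python
# Sketch of the file-clustering rule: serve the head-of-queue file together with
# up to FCP-1 more files from the same tape, then return to the top of the queue.

def service_order(requests, fcp):
    """requests: list of (file_name, tape_id) in arrival order; returns service order."""
    queue, order = list(requests), []
    while queue:
        head_tape = queue[0][1]
        # Up to `fcp` files on the head's tape are served next (head included),
        # in queue order; everything else keeps its position in the queue.
        cluster, kept, taken = [], [], 0
        for item in queue:
            if item[1] == head_tape and taken < fcp:
                cluster.append(item)
                taken += 1
            else:
                kept.append(item)
        order.extend(cluster)
        queue = kept          # then go back to the file now at the top of the queue
    return order

# FCP = 1 preserves arrival order; FCP = 4 clusters files from the same tape.
reqs = [("f1", "T1"), ("f2", "T2"), ("f3", "T1"), ("f4", "T1"), ("f5", "T2")]
print([f for f, _ in service_order(reqs, fcp=1)])   # ['f1', 'f2', 'f3', 'f4', 'f5']
print([f for f, _ in service_order(reqs, fcp=4)])   # ['f1', 'f3', 'f4', 'f2', 'f5']
```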

Super Computing 2000 Reading Order from Tape for different File Clustering Parameters File Clustering Parameter = 1 File Clustering Parameter = 10

Super Computing 2000 Typical Processing Flow

Super Computing 2000 Typical Processing Flow with HRM

Super Computing 2000 Conclusion
Demo ran successfully at SC2000
Received the “hottest infrastructure” award
Proved the ability to put together multiple middleware components using common standards, interfaces, and protocols
Proved the usefulness of the Storage Resource Management (SRM) concept for grid applications
Most difficult problem for the future: robustness in the face of
—hardware failures
—network failures
—system failures
—client failures