The SMB Archive System: Data Backup Across the Web
Kenneth R. Sharp, Stanford Synchrotron Radiation Laboratory



Why a high-capacity, long-term data archive is needed

Need a replacement for tapes
- Tapes age and medium formats change rapidly.
- Storage capacity and reliability of tapes are limited.
- Much manual bookkeeping is needed to keep track of data stored on tapes.

Need to support large-area CCD detectors
- Three Q315 detectors will be generating MB files at a much increased rate when the SPEAR3 upgrade is complete.
- RAID data storage at SSRL will be 24 TB in all; that data must be backed up somehow!
- Need to archive data as rapidly as it is collected.

Need to support high-throughput structural biology
- Automated beam lines will generate huge amounts of data.
- Large numbers of samples and targets require that metadata be stored and tracked systematically.
- Data must be archived automatically and be easy to retrieve.

SMB Archive Uses NPACI Resources at SDSC

High Performance Storage System (HPSS)
- Centralized long-term data storage system at SDSC.
- Stores over 344 TB of data in 18 million files (Jan 2002).
- Capacity: 2000 GBytes disk; 6000 TBytes tape storage.

Storage Resource Broker (SRB)
- Client-server middleware that provides a uniform interface for accessing heterogeneous resources over the network.
- Presents data in hierarchical folders with data and access controls.
- May be used to store and retrieve data on the HPSS at SDSC.
- Powerful metadata querying system allows data sets to be accessed based on their attributes.
- Data sets can be replicated over multiple resources.
- Organizations may install and maintain their own SRB servers. We use the SRB installation at SDSC.

National Partnership for Advanced Computational Infrastructure (NPACI)
- Mission: advance science by creating a national computational infrastructure: the Grid.
- Maintains resources at the San Diego Supercomputer Center (SDSC), including HPSS and SRB.
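The attribute-based access mentioned for the SRB can be illustrated with a small in-memory model. This is only a sketch of the idea: it does not use the SRB, MCAT, or JARGON APIs, and the class name `MetadataCatalogSketch`, its methods, and the attribute names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a metadata catalog; not the SRB/MCAT API. */
final class MetadataCatalogSketch {

    // logical path of a data set -> its attribute name/value pairs
    private final Map<String, Map<String, String>> entries = new LinkedHashMap<>();

    /** Registers a data set under a logical path together with its attributes. */
    void register(String logicalPath, Map<String, String> attributes) {
        entries.put(logicalPath, new LinkedHashMap<>(attributes));
    }

    /** Returns the paths of all data sets that carry every requested attribute value. */
    List<String> findByAttributes(Map<String, String> wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : entries.entrySet()) {
            // A data set matches when all wanted name/value pairs are present.
            if (e.getValue().entrySet().containsAll(wanted.entrySet())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

With such a catalog a user could ask for "all data sets for a given sample" without knowing which folder holds them, which is the kind of access the slide attributes to the SRB metadata system.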

Organizations Using SRB

- Digital Libraries: UCB, Umich, UCSB, Stanford, CDL; NSF NSDL - UCAR / DLESE
- NASA Information Power Grid
- Astronomy: National Virtual Observatory; 2MASS Project (2 Micron All Sky Survey)
- Particle Physics: Particle Physics Data Grid (DOE); GriPhyN
- Medicine: Digital Embryo (NLM)
- Earth Systems Sciences: ESIPS; LTER
- Persistent Archives: NARA; LOC
- Neuro Science & Molecular Science: TeleScience/NCMIR, BIRN; SLAC, AfCS, …

InQ SRB client for Microsoft Windows

SRB client applications
- Users must be able to upload data, download data, and view the data in the archive.
- Users perform these functions via SRB client applications.
- Available clients: command-line programs (“S Commands”), InQ, MySRB.
- Tools for custom clients: SRB C library; Java API.

InQ for Microsoft Windows
- InQ is the easiest-to-use client provided by NPACI.
- Individual files or entire folders may be uploaded or downloaded.
- Files in the archive may be browsed either by directory structure or by data attributes.

Limitations of InQ
- Runs only on Microsoft Windows platforms. Windows is not the major platform used at synchrotron light sources or in crystallography research labs.
- No batch job capability for long archive jobs.
- Exposes confusing SRB features and terminology (resources, containers, collections, etc.).

MySRB web browser-based SRB client

MySRB
- MySRB is a powerful web-based SRB client which can be run from standard web browsers.
- Files in the archive may be browsed either by directory structure or by data attributes.

Limitations of MySRB
- No way to upload or download more than one file at a time.
- The otherwise rich functionality and powerful features are confusing to users.

The bottom line
- Capabilities of HPSS and SRB far exceed the perceived needs of our beam line users.
- Our users need a customized interface with simplified functionality.
- Additional infrastructure had to be designed and implemented in order to make the SRB a viable storage system for crystallographic data.
- A browser-based user interface is ideal.

The SMB Archive interface for using the SRB

Simple archive job definition
- Users may rapidly browse their /home and /data directories at SSRL.
- Directory contents are listed in the browser window.
- Directories may be navigated by clicking on directory names.
- Files to be uploaded may be filtered according to a list of wildcards.
- Subdirectories may be archived recursively.
- The only SRB-related information required is the name of the new data collection to create.

Convenient web browser interface
- Users may define archive jobs over the web from anywhere in the world using any common type of computer.
- Users need only log in with their SMB Unix account name and password.
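The wildcard filtering and optional recursion of an archive job definition could be sketched as follows. This is an illustration using the standard `java.nio.file` glob matcher, not the Archive System's actual servlet code; the class name `ArchiveJobSketch` and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of how an archive job might select files; not the SMB Archive code. */
final class ArchiveJobSketch {

    /** Returns true if the file name matches at least one glob wildcard such as "*.img". */
    static boolean matchesAny(String fileName, List<String> wildcards) {
        Path name = Paths.get(fileName);
        for (String wildcard : wildcards) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + wildcard);
            if (matcher.matches(name)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Lists the regular files under root whose names match one of the wildcards.
     * When recursive is false only the top-level directory is examined.
     */
    static List<Path> selectFiles(Path root, List<String> wildcards, boolean recursive)
            throws IOException {
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, depth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> matchesAny(p.getFileName().toString(), wildcards))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A job definition would then reduce to a root directory, a wildcard list, a recursion flag, and the name of the new collection, which matches the small amount of input the interface asks of the user.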

Monitoring archive jobs and downloading data

Batch operation
- Archive job runs in the background once its definition is confirmed.
- Browser does not hang during archival.
- New jobs may be started while previously defined jobs are in progress.
- Jobs are restarted automatically if HPSS is unavailable.
- A job status page indicates definitions and status of all running jobs.
- User may abort running jobs.
- The user is notified when a job is started and again when it is completed.

Similar interface for data download
- Users browse their archived data sets in exactly the same fashion.
- Data may be downloaded from the archive to a directory at SSRL (analogous to an upload job).
- Another option is to download selected files in one or more tar files directly to any computer on the Internet.
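The batch behavior described above (run in the background, restart when storage is unavailable) can be sketched with a simple retry loop. This is a minimal illustration under stated assumptions, not the Archive System's implementation: the class `RestartingJobSketch`, the fixed wait between attempts, and the use of a boolean "attempt succeeded" callback are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a background job that restarts after storage outages. */
final class RestartingJobSketch {

    /**
     * Calls attempt until it returns true (success) or maxAttempts is reached,
     * sleeping waitMillis between failed attempts.
     * Returns the number of attempts used, or -1 if the job gave up or was interrupted.
     */
    static int runWithRestart(BooleanSupplier attempt, int maxAttempts, long waitMillis) {
        for (int n = 1; n <= maxAttempts; n++) {
            if (attempt.getAsBoolean()) {
                return n;
            }
            if (n < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Treat interruption as an abort request and keep the flag set.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    /** Submits the job to a pool so the caller (for example a servlet) returns immediately. */
    static Future<Integer> startInBackground(ExecutorService pool, BooleanSupplier attempt,
                                             int maxAttempts, long waitMillis) {
        Callable<Integer> task = () -> runWithRestart(attempt, maxAttempts, waitMillis);
        return pool.submit(task);
    }
}
```

Cancelling the returned `Future` with `cancel(true)` interrupts the waiting job, which is one possible way to support the "abort running jobs" feature.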

Archive System Infrastructure

But first, a word about SRB accounts
- An SRB account (independent of the SSRL Unix account) is required to archive data.
- Your SRB account permits you to upload/download any data using SRB clients.
- A handy web page on our site can be used to create an SRB account.

The Archive System uses the following software elements
- Apache Web Server (v1.3.27)
- Apache Tomcat Servlet Container (v4.1.24)
- Java 2 Runtime (v1.4.1)
- SMB Authentication Gateway Server
- SMB Impersonation Server
- SRB JARGON Java API (v1.1)
- Archive System Servlets (for Upload, Download, and Job Maintenance)
- Archive System Background Applications

Implementation notes
- All Archive System applications and servlets are written in Java.
- The Archive System front end is made up of Java servlets.
- The Archive System back end is made up of Java applications.
- All infrastructure elements are either available for free or are home-grown.

Significant infrastructure is required to provide this “simple” interface, but the payoff is huge.

Authentication Gateway Server
- Java servlet that provides a common authentication protocol for all web-based and stand-alone applications.
- Used to authenticate archive system users.
- All web-based software developed at SSRL is being updated to use this single authentication server.
- Support for the authentication server has already been integrated into Blu-Ice/DCS.
- Allows users to navigate seamlessly between applications without authenticating multiple times.
- Will eventually allow access to beamline systems to be controlled automatically based on the beam schedule.
- Access to other resources (computing, data directories, etc.) available 24/7.

Impersonation Server
- Unix daemon that can run any non-interactive program on behalf of any Unix user.
- Enables web applications to run background jobs for a user with the actual rights of the Unix user account.
- Accepts commands via the HTTP protocol.
- Verifies authentication information with the Authentication Server.
- Used by the archive system to list directories in the web browser and run background archive jobs as the user.
- Will allow further analyses to be automatically initiated by the beam line control system.
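The interplay between the two servers (log in once, then let other services verify the session before acting for the user) might look roughly like the sketch below. Everything here is an assumption made for illustration: the class `SingleSignOnSketch`, its methods, the in-memory plain-text password table, and the session format are hypothetical and do not reflect the actual SSRL protocol. The sketch also only reports what would be run; it does not switch Unix users.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single sign-on model; not the SSRL Authentication or Impersonation Server. */
final class SingleSignOnSketch {

    // user -> password; plain text only to keep the sketch short
    private final Map<String, String> passwords;
    // session id -> authenticated user
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    SingleSignOnSketch(Map<String, String> passwords) {
        this.passwords = new HashMap<>(passwords);
    }

    /** Gateway side: checks the credentials and issues a session id on success. */
    Optional<String> login(String user, String password) {
        String expected = passwords.get(user);
        if (expected == null || !expected.equals(password)) {
            return Optional.empty();
        }
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, user);
        return Optional.of(sessionId);
    }

    /** Gateway side: tells any application which user a session id belongs to. */
    Optional<String> validate(String sessionId) {
        if (sessionId == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(sessions.get(sessionId));
    }

    /** Impersonation side: accepts a command only if the session is valid. */
    Optional<String> authorizeCommand(String sessionId, String command) {
        return validate(sessionId).map(user -> "run as " + user + ": " + command);
    }
}
```

Because every application checks the same session id against one gateway, a user who has logged in once can move between applications without authenticating again, which is the benefit the slide describes.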

Archive System Web Architecture

[Figure: architecture diagram. Labels: Web Browser; Internet; Apache; Archive Servlets (Tomcat): Define Upload, Define Download, View Job Status; Authentication; SMB Impersonation; Archive Jobs (background): Upload Jobs, Download Jobs, Job Maintenance; Internet (Backbone); SDSC: SRB, MCAT, HPSS, Disk Cache, Tape Storage.]

Archive Projects for the next year

- Optimize data transfer rates between SSRL and SDSC.
- Provide a stand-alone application for users wishing to download datasets directly from the SRB.
- Implement other functions available in InQ and MySRB for manipulating existing collections (replicate, delete, etc.).
- Provide an option for automatic data upload from Blu-Ice.
- Provide a link from Blu-Ice to automatically start the browser and load the Archive page without the user having to log in again. (The new Authentication Server makes this possible.)
- Provide additional options for using the SRB Metadata Catalog (MCAT) to describe, index, and retrieve data files.

The Collaboratory for Macromolecular Crystallography is supported by the NIH, NCRR as a supplement to the SSRL Synchrotron Radiation Structural Biology Resource (P41-RR-01209). The SSRL Structural Molecular Biology program is funded by DOE BER, NIH NCRR, and NIH NIGMS.