Data Grids
Darshan R. Kapadia, Gregor von Laszewski



Grids
We've seen computational grids: collections of computing clusters plus the protocols and software used to submit jobs, distribute work, schedule jobs, monitor status, and so on.
But how do we manage collections of data on a grid, not just the computations and programs themselves?

Data GRID
1. Lothar A. T. Bauerdick (2003). Grid Tools and the LHC Data Challenges. LHC Symposium, May 3, 2003.

Why data grids?
The immense computational demands of many scientific applications are often coupled with massive amounts of data.
These data sets must be shared by a virtual organization (or multiple VOs) for a variety of computations.
Distributing jobs to geographically diverse computing resources also requires distributing data collections for processing and storing the output.

Data Grid Challenges
Storage capacity for massive quantities of data
Distributing data sets to dispersed geographic locations to complete jobs in a grid
Maximizing the computation-to-communication ratio
Aggregation of results and data coherency
  – Who has "the" copy of the data set?
Doing all of this securely and robustly

Functions of Data GRID
Data Access
  – How do we access and manage data? Storage Resource Brokers, UNIX file systems, distributed file systems, HTTP servers, etc.
  – How do we transfer data?
Metadata Access
  – Data about data!
Replica Management
  – Create/delete copies of data
  – Replica "catalogs"
Replica Selection
  – Locating the best data replica to use for an application
  – Determine subset of data required for a job
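As an illustration of replica management and replica selection, here is a minimal Python sketch. It is not the Globus replica catalog API: the `Replica` fields, the `ReplicaCatalog` class, the example URLs, and the cost model (latency plus size divided by bandwidth) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (all fields are illustrative)."""
    url: str               # where the copy lives
    bandwidth_mbps: float  # estimated bandwidth from that site to the client
    latency_ms: float      # estimated round-trip latency to that site


class ReplicaCatalog:
    """Maps a logical file name to the physical replicas that hold it."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, replica):
        """Record a newly created copy."""
        self._entries.setdefault(logical_name, []).append(replica)

    def unregister(self, logical_name, url):
        """Forget a copy that has been deleted."""
        kept = [r for r in self._entries.get(logical_name, []) if r.url != url]
        self._entries[logical_name] = kept

    def lookup(self, logical_name):
        """Return all known replicas of a logical file."""
        return list(self._entries.get(logical_name, []))


def estimated_seconds(replica, size_mb):
    """Rough transfer-time estimate: latency plus size over bandwidth."""
    return replica.latency_ms / 1000.0 + (size_mb * 8.0) / replica.bandwidth_mbps


def select_replica(catalog, logical_name, size_mb):
    """Pick the replica with the lowest estimated transfer time."""
    candidates = catalog.lookup(logical_name)
    if not candidates:
        raise LookupError(f"no replica registered for {logical_name!r}")
    return min(candidates, key=lambda r: estimated_seconds(r, size_mb))


catalog = ReplicaCatalog()
catalog.register("climate/run42.nc",
                 Replica("gsiftp://site-a.example.org/run42.nc", 100.0, 80.0))
catalog.register("climate/run42.nc",
                 Replica("gsiftp://site-b.example.org/run42.nc", 400.0, 150.0))
best = select_replica(catalog, "climate/run42.nc", size_mb=1000.0)
print(best.url)
```

For a 1000 MB file the higher-bandwidth site wins despite its higher latency; a real selection service would likely also weigh load, storage cost, and access policy.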

Earth System GRID
The Earth System Grid (ESG) integrates supercomputers with large-scale data and analysis servers located at numerous national labs and research centers to create a powerful environment for next-generation climate research.
Participating organizations
  – Argonne National Laboratory
  – Lawrence Berkeley National Laboratory
  – Lawrence Livermore National Laboratory
  – Los Alamos National Laboratory
  – National Center for Atmospheric Research
  – Oak Ridge National Laboratory
  – University of Southern California/Information Sciences Institute

High Energy Physics Application
Allcock, B., Bester, J., Bresnahan, J., Chervenak, A. L., Foster, I., Kesselman, C., Meder, S., Nefedova, V., Quesnel, D., & Tuecke, S. (2002). Data Management and Transfer in High Performance Computational Grid Environments. Parallel Computing Journal, 28(5).

Data GRID Architecture
1. Chervenak, A., Deelman, E., Kesselman, C., Allcock, B., Foster, I., & Nefedova, V., et al. (2003). High-performance remote access to climate simulation data: a challenge problem for data grid technologies. Parallel Comput., 29(10).

Data Grid Design
Mechanism Neutrality
Policy Neutrality
Compatibility with Grid Infrastructure
Uniformity of Information Infrastructure

Core Data GRID services
Storage System and Data Access
  – Data Abstraction: Storage System
  – Data Access
Metadata Services

High Level Data Grid Components
Replica Management
Replica Selection and Data Filtering

GASS
Globus Access to Secondary Storage [5]
  – NOT a distributed file system
  – Unix (C-style) fopen/fclose
  – Default behavior is to transfer the entire file from the remote site into a local cache when the file is opened
  – GASS also provides finer-tuned control: pre-stage/post-stage file accesses and cache management
  – No cache coherency (changes made to the remote file are not propagated to caches)

GASS (contd.)
Commands
  – globus_gass_fopen
  – globus_gass_fclose
File names are URLs
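The open-time caching behavior described for GASS can be illustrated with a small Python sketch. It does not call the GASS library: the `WholeFileCache` class, its `fopen` method, the example URL, and the in-memory `REMOTE` dictionary standing in for remote storage are all hypothetical. It shows two properties from the slides: the whole file is copied into a local cache when a URL is first opened, and later changes to the remote file do not reach the cached copy.

```python
import os
import tempfile

# Stand-in for remote storage: URL -> file contents (hypothetical data).
REMOTE = {"https://storage.example.org/input.dat": b"version 1"}


class WholeFileCache:
    """Copies an entire remote file into a local directory on first open."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self._paths = {}  # URL -> path of the local cached copy

    def fopen(self, url):
        """Return a read handle on the local copy, fetching it if needed."""
        if url not in self._paths:
            # Cache miss: transfer the whole file before returning a handle.
            local_path = os.path.join(self.cache_dir, f"cached_{len(self._paths)}")
            with open(local_path, "wb") as f:
                f.write(REMOTE[url])
            self._paths[url] = local_path
        # All reads are served from the local copy only.
        return open(self._paths[url], "rb")


url = "https://storage.example.org/input.dat"
with tempfile.TemporaryDirectory() as cache_dir:
    cache = WholeFileCache(cache_dir)
    with cache.fopen(url) as f:
        first_read = f.read()
    REMOTE[url] = b"version 2"      # the remote copy changes...
    with cache.fopen(url) as f:
        second_read = f.read()      # ...but the cache still holds the old bytes

print(first_read, second_read)
```

Because nothing invalidates the cached copy, an application that needs fresh data would have to manage the cache itself, which is the trade-off the "no cache coherency" bullet points at.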

GridFTP
GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth, wide-area networks.
Based on FTP (RFC 959)
Extended for higher performance, flexibility, and robustness
  – Parallel data sources, parallel transfers
  – Partial file transfers
  – Transfer restart capabilities
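The ideas of partial transfers, parallel streams, and restart can be sketched in Python as follows. This is not the GridFTP protocol or any Globus API: the helper names are hypothetical, the "transfer" is an in-memory copy, and the ranges are copied sequentially where a real client would send them over concurrent streams. The point is only that a file split into independent byte ranges can be moved piecewise, and that an interrupted transfer needs to resend only the ranges that did not complete.

```python
def split_ranges(file_size, streams):
    """Divide [0, file_size) into contiguous (offset, length) ranges."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def remaining_ranges(planned, completed):
    """Ranges still to be sent after an interruption (restart support)."""
    done = set(completed)
    return [r for r in planned if r not in done]


def copy_ranges(source, dest, ranges):
    """Partial transfer: copy only the given byte ranges into dest."""
    for offset, length in ranges:
        dest[offset:offset + length] = source[offset:offset + length]


source = bytes(range(256)) * 10           # 2560 bytes of sample data
dest = bytearray(len(source))
planned = split_ranges(len(source), streams=4)

completed = planned[:2]                   # pretend the link dropped here
copy_ranges(source, dest, completed)

# Restart: send only what is missing instead of starting over.
copy_ranges(source, dest, remaining_ranges(planned, completed))
print(bytes(dest) == source)
```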

GridFTP
Can use GSI for security.
TeraGrid has three clients that use GridFTP
  – UberFTP (recommended)
  – globus-url-copy (preferred for scripting)
  – tgcp (deprecated)

Amazon Simple Storage Service (Amazon S3™)
Amazon S3 is storage for the Internet.
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

AWS S3 Functionalities
Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
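The bucket, key, and access-rights model listed above can be pictured with a toy in-memory Python sketch. It is not the S3 API or any AWS SDK: the `Bucket` class, its method names, and the user names are hypothetical, and the size limit simply mirrors the 1 byte to 5 gigabyte range quoted on the slide, assuming binary gigabytes.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # the slide's 5 GB limit, assuming binary gigabytes


class Bucket:
    """Toy in-memory bucket; an illustration, not the real S3 interface."""

    def __init__(self, owner, max_object_bytes=MAX_OBJECT_BYTES):
        self.owner = owner
        self.max_object_bytes = max_object_bytes
        self._objects = {}  # key -> {"data", "public", "grants"}

    def put(self, user, key, data, public=False):
        """Store an object under a developer-assigned key (owner only)."""
        if user != self.owner:
            raise PermissionError("only the bucket owner may write")
        if not 1 <= len(data) <= self.max_object_bytes:
            raise ValueError("object size outside the allowed range")
        self._objects[key] = {"data": bytes(data), "public": public, "grants": set()}

    def grant_read(self, user, key, grantee):
        """Give one specific user read access to one object (owner only)."""
        if user != self.owner:
            raise PermissionError("only the bucket owner may grant access")
        self._objects[key]["grants"].add(grantee)

    def get(self, user, key):
        """Return the object if the user is the owner, granted, or it is public."""
        obj = self._objects[key]
        allowed = user == self.owner or obj["public"] or user in obj["grants"]
        if not allowed:
            raise PermissionError(f"{user!r} may not read {key!r}")
        return obj["data"]

    def delete(self, user, key):
        """Remove an object (owner only)."""
        if user != self.owner:
            raise PermissionError("only the bucket owner may delete")
        del self._objects[key]


bucket = Bucket(owner="alice")
bucket.put("alice", "results/run1.csv", b"a,b\n1,2\n")
bucket.grant_read("alice", "results/run1.csv", "bob")
data_for_bob = bucket.get("bob", "results/run1.csv")
print(data_for_bob)
```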

Replica Management
Srikumar Venugopal, Rajkumar Buyya, and Kotagiri Ramamohanarao. A Taxonomy of Data Grids for Distributed Data Sharing, Management, and Processing.

Conclusion
A data grid must maintain large amounts of data, which makes its architecture distinct from that of a computational grid.
Data grids will be very important in the future, as future applications will require large amounts of data.

References
Chervenak, A., Deelman, E., Kesselman, C., Allcock, B., Foster, I., & Nefedova, V., et al. (2003). High-performance remote access to climate simulation data: a challenge problem for data grid technologies. Parallel Comput., 29(10).
Allcock, W., Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., & Tuecke, S. (2001). The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23.
Bester, J., Foster, I., Kesselman, C., Tedesco, J., & Tuecke, S. (1999). GASS: A Data Movement and Access Service for Wide Area Computing Systems. Paper presented at the Proceedings of IOPADS'99.
5. Allcock, B., Bester, J., Bresnahan, J., Chervenak, A. L., Foster, I., Kesselman, C., Meder, S., Nefedova, V., Quesnel, D., & Tuecke, S. (2002). Data Management and Transfer in High Performance Computational Grid Environments. Parallel Computing Journal, 28(5).