CEDPS Data Services Ann Chervenak USC Information Sciences Institute.

Presentation transcript:

CEDPS Data Services Ann Chervenak USC Information Sciences Institute

2 Goals of CEDPS Data Area
- Assist DOE applications with petascale data management requirements
- Includes assisting with evaluation and deployment of existing services
  - Globus GridFTP for secure, efficient data transfer
  - Replica Location Service for data registration and discovery
  - Data Replication Service
  - Condor NeST, etc.
- Development of new functionality
  - Improvements to GridFTP for better resource management
  - Policy-driven data placement services

3 New Data Services in CEDPS
- Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
- Managed Object Placement Service, an enhancement to today’s GridFTP that allows for management of:
  - Space
  - Bandwidth
  - Connections
  - Other resources needed at the endpoints of data transfers
- Data placement and distribution services that implement different data distribution and placement behaviors

4 Extending GridFTP: The Managed Object Placement Service (MOPS)
- Functionality that will be added:
  - Adding resource management to GridFTP
    - Memory usage limitation
    - Enforce appropriate storage usage
    - Enforce appropriate bandwidth usage
    - Eliminates the potential to consume too many system resources
  - Bandwidth and storage reservation
  - Transfer scheduling
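
The resource limits listed on this slide can be pictured as an admission check that runs before a transfer starts. The following is only a minimal sketch under stated assumptions: `Usage`, `AdmissionController`, `try_admit`, and `finish` are hypothetical names invented for illustration, not part of MOPS or GridFTP, and the units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical resource triple; field names and units are illustrative."""
    memory_bytes: int = 0
    bandwidth_bps: int = 0
    storage_bytes: int = 0


class AdmissionController:
    """Admit a transfer only if every resource stays within its cap."""

    def __init__(self, limits: Usage) -> None:
        self.limits = limits
        self.used = Usage()

    def try_admit(self, req: Usage) -> bool:
        fits = (
            self.used.memory_bytes + req.memory_bytes <= self.limits.memory_bytes
            and self.used.bandwidth_bps + req.bandwidth_bps <= self.limits.bandwidth_bps
            and self.used.storage_bytes + req.storage_bytes <= self.limits.storage_bytes
        )
        if fits:
            self.used.memory_bytes += req.memory_bytes
            self.used.bandwidth_bps += req.bandwidth_bps
            self.used.storage_bytes += req.storage_bytes
        return fits

    def finish(self, req: Usage) -> None:
        # Assumption: a completed transfer frees memory and bandwidth,
        # while the bytes it wrote keep occupying storage.
        self.used.memory_bytes -= req.memory_bytes
        self.used.bandwidth_bps -= req.bandwidth_bps
```

In this sketch a rejected request changes nothing, so a caller could queue it and retry later, which is roughly where transfer scheduling would come in.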

5 MOPS
- Released under the CEDPS project
- MOPS 1.0 is available at
- Includes:
  - Optimization for transfers of lots of small files
  - Globus fork (Gfork): an inetd-like service that allows state to be maintained across connections
  - Gfork plugin for GridFTP: allows dynamic addition and removal of data movers, and limits memory usage
  - Lotman: manages storage
  - GridFTP plugin that enforces storage usage policies using Lotman
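
One way to think about enforcing storage usage policies is a table of storage allocations ("lots") that every write is charged against. The sketch below is an assumption-laden illustration, not Lotman's actual interface: `Lot`, `LotTable`, `create`, and `charge` are hypothetical names, and the expiry time on a lot is an added assumption that the slide does not state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lot:
    """Hypothetical storage allocation for one owner."""
    owner: str
    capacity_bytes: int
    expires_at: float  # assumption: lots expire; the clock is the caller's
    used_bytes: int = 0


class LotTable:
    """Charge writes against lots and refuse anything over quota."""

    def __init__(self) -> None:
        self._lots: Dict[str, Lot] = {}

    def create(self, lot_id: str, owner: str,
               capacity_bytes: int, expires_at: float) -> None:
        if lot_id in self._lots:
            raise ValueError(f"lot {lot_id!r} already exists")
        self._lots[lot_id] = Lot(owner, capacity_bytes, expires_at)

    def charge(self, lot_id: str, nbytes: int, now: float) -> bool:
        """Return True and record the write if the lot can hold it."""
        lot = self._lots.get(lot_id)
        if lot is None or now >= lot.expires_at:
            return False
        if lot.used_bytes + nbytes > lot.capacity_bytes:
            return False
        lot.used_bytes += nbytes
        return True
```

A transfer server plugin could call something like `charge` before accepting each file and reject the transfer when it returns `False`.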

6 GridFTP - New Features
- GridFTP over UDT
  - Users can substitute UDT for TCP
  - UDT provides a reliable layer on top of UDP
  - 4-5 times performance improvement over TCP
- GridFTP over SSH
  - globus-url-copy (the GridFTP client) uses the standard ssh program to remotely start a GridFTP server as the user
  - stdin/stdout becomes the control channel
  - No data channel authentication
- GridFTP Where there’s FTP (GWFTP)
  - A proxy server that allows use of any FTP client to transfer data to/from a GridFTP server
- GFork
  - An inetd-like service that allows sharing of state between sessions
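
The idea of sharing state between sessions can be illustrated with a long-lived object that outlives any single connection. This is a simplified in-process sketch, not how GFork is implemented: `SharedState` and `Session` are hypothetical names, and a real inetd-like service would hand state to separately forked server processes.

```python
from typing import List, Set


class SharedState:
    """State owned by the long-lived master and visible to every session."""

    def __init__(self) -> None:
        self.data_movers: Set[str] = set()
        self.active_sessions = 0

    def add_mover(self, host: str) -> None:
        self.data_movers.add(host)

    def remove_mover(self, host: str) -> None:
        self.data_movers.discard(host)


class Session:
    """One client connection; it reads the master's state when it needs it."""

    def __init__(self, state: SharedState) -> None:
        self.state = state
        state.active_sessions += 1

    def movers(self) -> List[str]:
        # Looked up at call time, so movers added later are still seen.
        return sorted(self.state.data_movers)

    def close(self) -> None:
        self.state.active_sessions -= 1
```

Because each session looks the movers up on demand, a data mover registered after a session began is still available to it, which is the effect of dynamic addition and removal.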

7 Data Placement Services: Motivation
- Scientific applications often perform complex computational analyses that consume and produce large data sets
  - Computational and storage resources distributed in the wide area
- The placement of data onto storage systems can have a significant impact on:
  - Performance of applications
  - Reliability and availability of data sets
- We want to identify data placement policies that distribute data sets so that they can be:
  - Staged into or out of computations efficiently
  - Replicated to improve performance and reliability

8 Layered Data Placement Architecture
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on needs of application and the Virtual Organization
- Effectively creates a placement workflow that is passed to the Reliable Distribution Service Layer for execution
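
The split between a policy layer that decides placement and a lower layer that executes transfers can be sketched as a planner that emits a list of transfer tasks. This is a minimal illustration under assumptions: `TransferTask` and `plan_placement` are hypothetical names, the policy is reduced to "these sites should hold a copy", and the source site is simply the first known replica.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TransferTask:
    """One step of the placement workflow handed to the execution layer."""
    dataset: str
    source: str
    destination: str


def plan_placement(locations: Dict[str, List[str]],
                   policy: Dict[str, List[str]]) -> List[TransferTask]:
    """Compare where replicas are with where the policy wants them.

    locations: dataset -> sites that currently hold a copy
    policy:    dataset -> sites that should hold a copy
    """
    tasks: List[TransferTask] = []
    for dataset in sorted(policy):
        have = locations.get(dataset, [])
        if not have:
            continue  # no replica to copy from; a real service would report this
        for site in policy[dataset]:
            if site not in have:
                tasks.append(TransferTask(dataset, have[0], site))
    return tasks
```

The planner only decides what should move. Retries, ordering, and the transfers themselves would belong to the layer that receives the task list.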

9 Higher-Level Data Placement Services
- Recently released first generation of data placement service
- Seeking application input on requirements for placement services they need
- “Data Placement for Scientific Applications in Distributed Environments,” Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi, in Proceedings of Grid 2007 Conference, Austin, TX, September 2007.

10 Summary of CEDPS Data Services
- Goal is to assist DOE applications with petascale data management requirements
- Help applications evaluate and deploy existing services (GridFTP, RLS, etc.)
- New development to meet additional application requirements
  - Improvements to GridFTP for better resource management
  - Policy-driven data placement services
- Actively seeking DOE applications to use services and help define requirements