Globus Toolkit 4: Current Status and Futures. Stuart Martin, Argonne National Lab.

2 Globus is Service-Oriented Infrastructure Technology
- Software for service-oriented infrastructure
  - Service-enable new & existing resources
  - E.g., GRAM on a computer, GridFTP on a storage system, custom application services
  - Uniform abstractions & mechanisms
- Tools to build applications that exploit service-oriented infrastructure
  - Registries, security, data management, ...
- Open source & open standards
  - Each empowers the other
  - E.g., monitoring across different protocols is hard
- Enabler of a rich tool & service ecosystem

3 Globus Toolkit V4.0
- Major release on April 29th, 2005
- Fifteen months spent on design, development, and testing
  - 1.8M lines of code
  - Major contributions from five institutions
  - Hundreds of millions of service calls executed over weeks of continuous operation
- Significant improvements over the GT3 code base in all dimensions

4 Our Goals for GT4
- Usability, reliability, scalability, ...
  - Web service components with quality equal or superior to pre-WS components
  - Documentation at an acceptable quality level
- Consistency with the latest standards (WS-*, WSRF, WS-N, etc.) and the Apache platform
  - WS-I Basic (Security) Profile compliant
- New components, platforms, languages
  - And links to the larger Globus ecosystem

5 [graphic-only slide; no recoverable text]

6 GT4 Documentation is Much Improved!

7 GRAM
- Common WS interface to schedulers
  - Unix, Condor, LSF, PBS, SGE, ...
- More generally: an interface for process execution management (a sample job description is sketched below)
  - Lay down the execution environment
  - Stage data
  - Monitor & manage the lifecycle
  - Kill it, clean up
- A basis for application-driven provisioning
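To make those lifecycle steps concrete, here is a minimal WS GRAM job description of the kind GT4 consumes. Element names are recalled from the GT 4.0 JDD schema, and the host and paths are hypothetical, so treat this as a sketch rather than a definitive template:

    <job>
        <executable>/usr/bin/wc</executable>
        <directory>/tmp</directory>
        <argument>-l</argument>
        <argument>/tmp/input.dat</argument>
        <stdout>/tmp/job.out</stdout>
        <stderr>/tmp/job.err</stderr>
        <fileStageIn>
            <transfer>
                <sourceUrl>gsiftp://storage.example.org:2811/data/input.dat</sourceUrl>
                <destinationUrl>file:///tmp/input.dat</destinationUrl>
            </transfer>
        </fileStageIn>
        <fileCleanUp>
            <deletion>
                <file>file:///tmp/input.dat</file>
            </deletion>
        </fileCleanUp>
    </job>

Note that staging and cleanup are carried out by RFT and GridFTP on the job's behalf, which is why no transfer code appears in the job itself.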

8 GT4 WS GRAM
- 2nd-generation WS implementation: optimized for performance, stability, scalability
- Streamlined critical path
  - Use only what you need
- Flexible credential management
  - Credential cache & delegation service
- GridFTP & RFT used for data operations
  - Data staging & streaming output
  - Eliminates redundant GASS code
- Single- and multi-job support

9-12 GT4 GRAM Architecture
[Diagram, repeated across four slides: a client delegates a credential to the Delegation service in the GT4 Java container and invokes job functions on the GRAM services; GRAM uses sudo and a GRAM adapter for local job control of the user job on the compute element, with the Scheduler Event Generator (SEG) reporting job events from the local scheduler; RFT issues file transfer requests over the FTP control channel to GridFTP servers on the service host(s) and remote storage element(s), which move the data.]
The same delegated credential can be:
- made available to the user application
- used to authenticate with RFT
- used to authenticate with GridFTP

13 Our Performance Goals
- "GRAM should add little to no overhead compared to an underlying batch system"
  - Submit as many jobs to GRAM as is possible to the underlying scheduler
    - Goal: 10,000 jobs to a batch scheduler
    - Goal: efficiently fill the process table for the fork scheduler
  - Submit/process jobs through GRAM as fast as is possible with the underlying scheduler
    - Goal: 1 per second
- So far, so good

14 Design Decisions
- Efforts and features toward the goal
  - Allow job brokers the freedom to optimize
    - E.g., Condor-G is smarter than globusrun
    - Protocol steps made optional and shareable
  - Reduced cost for the GRAM service on the host
    - Single WSRF host environment
    - Better job status monitoring mechanism: the Scheduler Event Generator (SEG)
  - More scalable/reliable file handling
    - GridFTP and RFT instead of globus-url-copy
    - Removal of non-scalable GASS caching
- The plan is working: GT4 WS GRAM performs much better than GT3

15 Performance
- Throughput
  - Test: simple job to the fork scheduler (/bin/date); no staging, streaming, or cleanup
  - ~77 jobs/min sustained (about 1.3 jobs/sec, exceeding the 1-per-second goal)
  - ~60 jobs/min with delegation
- Long-running test
  - Ran 500,000+ sequential jobs over 23 days
  - These included staging, delegation, and the fork job manager
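A rough sketch of how such a throughput test can be driven from the client side; the factory URL is hypothetical and the flags are recalled from the GT 4.0 documentation, so verify against globusrun-ws -help:

    # Time 100 batch submissions of /bin/date to a WS GRAM factory
    time for i in $(seq 1 100); do
        globusrun-ws -submit -batch \
            -F https://grid.example.org:8443/wsrf/services/ManagedJobFactoryService \
            -c /bin/date
    done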

16 Performance (2)
- Concurrency
  - Job submits to the Condor scheduler (long-running sleep job); no staging, streaming, or cleanup; no delegation
  - Current limit is 32,000 jobs, due to the ~32,000-entries-per-directory limit of Linux ext2/ext3 filesystems
    - Using multiple sub-directories will resolve this; look for this in 4.2 (a sketch of the idea follows below)
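A minimal sketch of that sub-directory idea, assuming a hashed-bucket layout; the names and layout are illustrative, not the actual GT 4.2 implementation:

    import java.io.File;

    // Hypothetical sketch: hash each job ID into one of N bucket
    // directories so that no single directory approaches the
    // ~32,000-entry limit of ext2/ext3.
    public class JobStateDirs {
        private static final int BUCKETS = 1024; // each bucket stays far below the limit

        // Map a job ID to its state directory, creating it on first use.
        public static File dirForJob(File stateRoot, String jobId) {
            int bucket = (jobId.hashCode() & 0x7fffffff) % BUCKETS;
            File dir = new File(new File(stateRoot, String.format("%04d", bucket)), jobId);
            dir.mkdirs(); // creates the bucket and job directories as needed
            return dir;
        }

        public static void main(String[] args) {
            File root = new File("/var/lib/gram/state"); // hypothetical state root
            System.out.println(dirForJob(root, "job-0001"));
        }
    }

With 1,024 buckets, even the 32,000-job ceiling above leaves each directory with only a few dozen entries.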

17 Condor-G
- Job submission to WS GRAM
- Provides credential management via the GT delegation service
- Job workflow runs performed
  - Thanks to Jens Voeckler, Gaurang Mehta, and Jaime Frey!
- Still some kinks
  - Refreshing the delegated credential too often
  - Occasional client-side job delay of > 5 minutes

18 Command-line programs
- globusrun-ws
  - Submit a single job or a multi-job
  - Delegate, and stream stdout if requested
- globus-credential-delegate
  - Delegate a credential to a remote GT container
  - The same credential can be used for many GRAM or RFT jobs
- wsrf-destroy
  - Remove/destroy a delegated credential
- wsrf-query
  - Query for compute resource information
- globus-job-run-ws (coming soon)
  - Submit simple jobs without writing an XML JDD
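For concreteness, here is how these clients fit together in a typical session. Host names, ports, and file paths are hypothetical, and option spellings are recalled from the GT 4.0 documentation, so check each command's -help output:

    # Delegate a credential to a remote container once, saving its EPR
    globus-credential-delegate -h grid.example.org -p 8443 cred.epr

    # Submit the job described in job.xml (see the earlier sketch),
    # streaming its stdout back to the terminal
    globusrun-ws -submit -s -f job.xml

    # Destroy the delegated credential once no more jobs need it
    wsrf-destroy -e cred.epr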

19 Short-Term Priorities: WS GRAM
- Make WS GRAM a "reliable" service (4.0.x)
  - Additional controls to limit resource consumption
  - Out Of Memory (OOM) is not allowed!
- Continue to improve performance
- WS GRAM version of globus-job-run/submit (4.0.x)
- Improved information collection for jobs (4.2)
  - Nodes allocated by the scheduler
  - Scheduler job ID
  - rusage-type info
- Implement GGF JSDL once finalized

20 GridFTP in GT4
- 100% Globus code
  - No licensing issues
  - Stable, extensible
- IPv6 support
- XIO for different transports
- Striping: multi-Gb/sec wide-area transport
- Pluggable
  - Front-end: e.g., a future WS control channel
  - Back-end: e.g., HPSS, cluster file systems
  - Transfer: e.g., UDP, NetBLT transport
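By way of example, a third-party transfer between two GridFTP servers using parallel streams might be invoked as follows. The hosts and paths are hypothetical, and the flags (-p for parallelism, -tcp-bs for TCP buffer size) are as best recalled from GT4, so check globus-url-copy -help:

    # Third-party transfer, 8 parallel TCP streams, 2 MB TCP buffers
    globus-url-copy -vb -p 8 -tcp-bs 2097152 \
        gsiftp://source.example.org/data/file1 \
        gsiftp://dest.example.org/data/file1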

21 Reliable File Transfer: Third-Party Transfer
[Diagram: an RFT client sends SOAP messages to the RFT service and optionally receives notifications; the RFT service drives two GridFTP servers, each with a protocol interpreter, data channel, and master/slave DSIs connected over IPC links, so that data flows directly between the servers.]
- Fire-and-forget transfer
- Web services interface
- Many files & directories
- Integrated failure recovery

22 RFT Performance Stats
- Current maximum request size is approx. 20,000 entries with the default 64MB heap size
- "Infinite" transfer test, LAN
  - ~120,000 transfers (servers were killed by mistake)
  - A good test: found a corner case where postgres could not sustain ~3 update queries/sec and was using up CPU
- "Infinite" transfer test, WAN
  - ~67,000 transfers (killed for the same reason as above)
- Sloan Digital Sky Survey DR3 archive move
  - 900K+ files, 6 TB
  - Killed the transfer several times for recoverability testing
  - No human intervention has been required to date

23 Short-Term Priorities: Data Management
- Concurrency in globus-url-copy
- Priorities in RFT
- Data Replication Service
- Enhanced policy support in data services
- Physical file name creation service
- Scalable & distributed metadata manager
- OGSA-DAI will become a core component

24 GT4 Monitoring & Discovery
[Diagram: GRAM, RFT, and (via an adapter) GridFTP register automatically into the Index of their local GT4 container using WS-ServiceGroup; container Indexes register in turn into a higher-level Index; clients such as WebMDS access everything through WSRF/WS-N, with custom protocols used for non-WSRF entities.]

25 MDS4 Extensibility
- Aggregator framework provides
  - Registration management
  - Collection of information from Grid resources
  - Plug-in interface for data access, collection, query, ...
- WebMDS framework provides for customized display
  - XSLT transformations

26 With a standard deployment, a project can...
- Discover the data needed to make job submission or replica selection decisions, by querying the VO-wide Index
- Evaluate the status of Grid services, by looking at the VO-wide WebMDS setup
- Be notified when disks are full or other error conditions happen, by being on the list of administrators
- Examine the state of just the resources and services of interest to it
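As an illustration, querying a VO-wide Index with the wsrf-query client mentioned earlier might look like this; the Index URL is hypothetical, and the XPath argument selects all registered resource properties:

    # Dump every resource property registered in the VO-wide Index
    wsrf-query -s https://vo-index.example.org:8443/wsrf/services/DefaultIndexService '/*'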

27 Short-Term Priorities: Information Services
- Many more information sources, including gateways to other systems
- Automated configuration of monitoring
- Specialized monitoring displays
- Performance optimization of the registry
- Archiver service
- Helper tools to streamline integration of new information sources

28 GT4 and Beyond
- We have a solid Web services base
- We now want to build, on that base, an open-source service-oriented infrastructure
  - Virtualization
  - New services for provisioning, data management, security, VO management
  - End-user tools for application development
  - Etc., etc.

29 Next Step Plans
- Support!
  - Actively working with user groups to make sure their deployments are stable
- Move everyone from GT2 and GT3 to GT4
- Continue to improve documentation
  - Goal: every support question gets put into the docs

30 THANKS! Questions? Stuart Martin, Argonne National Lab