Published by Ferdinand Bailey. Modified over 8 years ago.
1
Globus Toolkit 4: Current Status and Futures
Stuart Martin, smartin@mcs.anl.gov
Argonne National Lab
2
Globus is Service-Oriented Infrastructure Technology
• Software for service-oriented infrastructure
  – Service-enable new & existing resources
  – E.g., GRAM on computer, GridFTP on storage system, custom application service
  – Uniform abstractions & mechanisms
• Tools to build applications that exploit service-oriented infrastructure
  – Registries, security, data management, …
• Open source & open standards
  – Each empowers the other
  – E.g., monitoring across different protocols is hard
• Enabler of a rich tool & service ecosystem
3
Globus Toolkit V4.0
• Major release on April 29, 2005
• Fifteen months spent on design, development, and testing
  – 1.8M lines of code
  – Major contributions from five institutions
  – Hundreds of millions of service calls executed over weeks of continuous operation
• Significant improvements over GT3 code base in all dimensions
4
Our Goals for GT4
• Usability, reliability, scalability, …
  – Web service components have quality equal or superior to pre-WS components
  – Documentation at acceptable quality level
• Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform
  – WS-I Basic (Security) Profile compliant
• New components, platforms, languages
  – And links to larger Globus ecosystem
5
6
GT4 Documentation is Much Improved!
7
GRAM
• Common WS interface to schedulers
  – Unix, Condor, LSF, PBS, SGE, …
• More generally: interface for process execution management
  – Lay down execution environment
  – Stage data
  – Monitor & manage lifecycle
  – Kill it, clean up
• A basis for application-driven provisioning
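The process-execution steps listed on this slide can be read as a small lifecycle. The sketch below is only an illustration of that ordering: the stage names and the `run_lifecycle` function are hypothetical and are not part of GRAM or any Globus API. The one property it tries to show is that cleanup runs exactly once, whether the job finishes normally or is killed partway through.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the bullets on the slide."""
    PREPARE_ENVIRONMENT = 1
    STAGE_DATA = 2
    RUN_AND_MONITOR = 3
    CLEAN_UP = 4


def run_lifecycle(cancel_during=None):
    """Return the stages a job visits, in order.

    If cancel_during names a stage, the job is treated as killed in that
    stage and the sketch skips ahead to CLEAN_UP.
    """
    visited = []
    for stage in Stage:
        if stage is Stage.CLEAN_UP:
            break  # cleanup is appended once, after the loop
        visited.append(stage)
        if stage is cancel_during:
            break  # job killed here; skip the remaining work stages
    visited.append(Stage.CLEAN_UP)
    return visited


if __name__ == "__main__":
    print([s.name for s in run_lifecycle()])
    print([s.name for s in run_lifecycle(cancel_during=Stage.STAGE_DATA)])
```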
8
GT4 WS GRAM
• 2nd-generation WS implementation, optimized for performance, stability, scalability
• Streamlined critical path
  – Use only what you need
• Flexible credential management
  – Credential cache & delegation service
• GridFTP & RFT used for data operations
  – Data staging & streaming output
  – Eliminates redundant GASS code
• Single and multi-job support
9
GT4 GRAM Architecture
[Figure: architecture diagram. Components shown: Client; Service host(s) and compute element(s); GT4 Java Container with GRAM services, Delegation, and RFT File Transfer; sudo; GRAM adapter; Local scheduler; User job; SEG; GridFTP on the compute element; GridFTP on the remote storage element(s). Labeled flows: Job functions, Delegate, Transfer request, FTP control, FTP data, Local job control, Job events.]
10
GT4 GRAM Architecture
[Figure: same architecture diagram as slide 9.]
Same delegated credential can be: made available to the user application
11
GT4 GRAM Architecture
[Figure: same architecture diagram as slide 9.]
Same delegated credential can be: used to authenticate with RFT
12
GT4 GRAM Architecture
[Figure: same architecture diagram as slide 9.]
Same delegated credential can be: used to authenticate with GridFTP
13
Our Performance Goals
“GRAM should add little to no overhead compared to an underlying batch system”
  – Submit as many jobs to GRAM as is possible to the underlying scheduler
    > Goal: 10,000 jobs to a batch scheduler
    > Goal: efficiently fill the process table for fork scheduler
  – Submit/process jobs as fast to GRAM as is possible to the underlying scheduler
    > Goal: 1 per second
• So far, so good
14
Design Decisions
• Efforts and features towards the goal
  – Allow job brokers the freedom to optimize
    > E.g., Condor-G is smarter than globusrun
    > Protocol steps made optional and shareable
  – Reduced cost for GRAM service on host
    > Single WSRF host environment
    > Better job status monitoring mechanisms: Scheduler Event Generator (SEG)
  – More scalable/reliable file handling
    > GridFTP and RFT instead of globus-url-copy
    > Removal of non-scalable GASS caching
• The plan is working
  – GT4 WS GRAM performs much better than GT3
15
4.0 Performance
• Throughput
  – Test: simple job to fork scheduler (/bin/date); no staging, streaming, or cleanup
  – ~77 jobs/minute sustained
  – ~60 jobs/minute with delegation
• Long-running test
  – Ran 500,000+ sequential jobs over 23 days
  – These included staging, delegation, fork job manager
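To relate these figures to the stated goal of 1 job per second, the short calculation below converts the throughput numbers into jobs per second and derives the average rate of the long-running test. It uses only the numbers on the slides; the function names are illustrative. Because the slide reports "500,000+" jobs, the long-run average is a lower bound, and that test included staging and delegation, so it is not directly comparable to the throughput test.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60


def jobs_per_second(jobs_per_minute):
    """Convert a jobs/minute rate into jobs/second."""
    return jobs_per_minute / SECONDS_PER_MINUTE


def average_jobs_per_minute(total_jobs, days):
    """Average rate of a run that completed total_jobs over the given days."""
    return total_jobs / (days * MINUTES_PER_DAY)


if __name__ == "__main__":
    print(f"no delegation:   {jobs_per_second(77):.2f} jobs/s")   # 1.28
    print(f"with delegation: {jobs_per_second(60):.2f} jobs/s")   # 1.00
    print(f"long run:        {average_jobs_per_minute(500_000, 23):.1f} jobs/min")  # 15.1
```

Both throughput figures meet or exceed the 1 job per second goal.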
16
4.0 Performance (2)
• Concurrency
  – Job submits to Condor scheduler (long-running sleep job); no staging, streaming, or cleanup; no delegation
  – Current limit is 32,000 jobs due to a Linux directory limit
    > Using multiple sub-directories will resolve this; look for this in 4.2
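The sub-directory fix mentioned above can be illustrated with a generic sharding scheme. This is a sketch of the general technique only, not the layout GT 4.2 adopted: the function name, the hash choice, and the fanout of 256 are all assumptions made for the example. Each job directory is placed under one of `fanout` buckets chosen from a hash of the job ID, so no single directory has to hold every job.

```python
import hashlib
from pathlib import Path


def sharded_job_dir(root, job_id, fanout=256):
    """Return a per-job directory path spread across `fanout` bucket directories.

    The bucket is derived from a hash of the job ID, so the same job ID
    always maps to the same path. (Hypothetical layout, for illustration.)
    """
    digest = hashlib.sha256(job_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % fanout
    return Path(root) / f"{bucket:03d}" / job_id


if __name__ == "__main__":
    for job in ("job-0001", "job-0002", "job-0003"):
        print(sharded_job_dir("/var/gram/jobs", job))
```

With a per-directory limit of about 32,000 entries, a fanout of 256 would raise the ceiling to roughly 256 × 32,000, or about 8 million job directories, assuming the hash spreads jobs evenly.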
17
Condor-G
• Job submission to WS GRAM
• Provides credential management to GT delegation service
• 1000+ job workflow runs performed
  – Thanks to Jens Voeckler, Gaurang Mehta, and Jaime Frey!
• Still some kinks
  – Refreshing delegated credential too often
  – Occasional client-side job delay of > 5 minutes
18
Command line programs
• globusrun-ws
  – Submit a single or multi job
  – Delegate, stream stdout if requested
• globus-credential-delegate
  – Delegate a credential to a remote GT container
  – Same credential can be used for many GRAM or RFT jobs
• wsrf-destroy
  – Remove/destroy a credential
• wsrf-query
  – Query for compute resource information
• globus-job-run-ws (coming soon)
  – Submit simple jobs without writing an XML JDD
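For readers unfamiliar with the XML job description document (JDD) that globus-job-run-ws is meant to spare users from writing, the sketch below generates a minimal one. The element names `job`, `executable`, `argument`, and `stdout` are an assumption about the GT4 WS GRAM job description schema and should be checked against the GT4 documentation; the `build_job_description` helper is hypothetical and is not a Globus tool.

```python
import xml.etree.ElementTree as ET


def build_job_description(executable, arguments=(), stdout=None):
    """Build a minimal job description as an XML string.

    Element names are assumed to match the GT4 WS GRAM schema; verify
    them against the official documentation before relying on this.
    """
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        ET.SubElement(job, "argument").text = arg
    if stdout is not None:
        ET.SubElement(job, "stdout").text = stdout
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Prints: <job><executable>/bin/date</executable></job>
    print(build_job_description("/bin/date"))
```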
19
Short-Term Priorities: WS GRAM
• Make WS GRAM a “Reliable” service (4.0.x)
  – Additional controls to limit resource consumption
  – Out Of Memory (OOM) is not allowed!
• Continue to improve performance
• WS GRAM version of globus-job-run/submit (4.0.x)
• Improved information collection for jobs (4.2)
  – Nodes allocated by scheduler
  – Scheduler job ID
  – Rusage type info
• Implement GGF JSDL once finalized
20
GridFTP in GT4
• 100% Globus code
  – No licensing issues
  – Stable, extensible
• IPv6 support
• XIO for different transports
• Striping for multi-Gb/sec wide area transport
• Pluggable
  – Front-end: e.g., future WS control channel
  – Back-end: e.g., HPSS, cluster file systems
  – Transfer: e.g., UDP, NetBLT transport
21
Reliable File Transfer: Third Party Transfer
[Figure: third-party transfer diagram. Labels: RFT Client, RFT Service, SOAP Messages, Notifications (Optional), GridFTP Server, Protocol Interpreter, Master DSI, Slave DSI, Data Channel, IPC Link, IPC Receiver.]
• Fire-and-forget transfer
• Web services interface
• Many files & directories
• Integrated failure recovery
22
RFT Performance Stats
• Current maximum request size is approximately 20,000 entries with a default 64 MB heap size
• Infinite transfer, LAN
  – ~120,000 transfers (servers were killed by mistake)
  – Was a good test: found a corner case where PostgreSQL was not able to perform ~3 update queries/sec and was using up CPU
• Infinite transfer, WAN
  – ~67,000 transfers (killed for the same reason as above)
• Sloan Digital Sky Survey DR3 archive move
  – 900+K files, 6 TB
  – Killed the transfer several times for recoverability testing
  – No human intervention has been required to date
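The recoverability behavior described above (a transfer can be killed and later resumes without human intervention) rests on persisting per-file progress. The sketch below shows that general idea with a JSON state file; it is not how RFT is implemented. The slides indicate RFT keeps its state in a database, and every name here (`run_transfers`, `copy_one`, the state file) is hypothetical.

```python
import json
import os


def load_done(state_path):
    """Return the set of entries recorded as already transferred."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(state_path, done):
    """Persist progress; write to a temp file, then rename over the old state."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp_path, state_path)


def run_transfers(entries, copy_one, state_path):
    """Transfer each entry once, skipping those completed in an earlier run.

    copy_one(entry) performs a single transfer and may raise. Progress is
    saved after every success, so a restart resumes where the run stopped.
    """
    done = load_done(state_path)
    for entry in entries:
        if entry in done:
            continue
        copy_one(entry)
        done.add(entry)
        save_done(state_path, done)
    return done
```

Calling `run_transfers` again with the same state file after a crash repeats only the entries that had not yet been recorded as complete.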
23
Short-Term Priorities: Data Management
• Concurrency in globus-url-copy
• Priorities in RFT
• Data replication service
• Enhance policy support in data services
• Physical file name creation service
• Scalable & distributed metadata manager
• OGSA-DAI will become a core component
24
GT4 Monitoring & Discovery
[Figure: monitoring and discovery diagram. Labels: GT4 Containers hosting GRAM, User, RFT, and Index services; GridFTP adapter; Registration & WSRF/WSN Access; Custom protocols for non-WSRF entities; Clients (e.g., WebMDS); Automated registration in container; WS-ServiceGroup.]
25
MDS4 Extensibility
• Aggregator framework provides
  – Registration management
  – Collection of information from Grid resources
  – Plug-in interface for data access, collection, query, …
• WebMDS framework provides for customized display
  – XSLT transformations
26
With a standard deployment, a project can…
• Discover needed data from services in order to make job submission or replica selection decisions by querying the VO-wide Index
• Evaluate the status of Grid services by looking at the VO-wide WebMDS setup
• Be notified when disks are full or other error conditions happen by being on the list of administrators
• Individual projects can examine the state of the resources and services of interest to them
27
Short-Term Priorities: Information Services
• Many more information sources, including gateways to other systems
• Automated configuration of monitoring
• Specialized monitoring displays
• Performance optimization of registry
• Archiver service
• Helper tools to streamline integration of new information sources
28
2005 and Beyond
• We have a solid Web services base
• We now want to build, on that base, an open source service-oriented infrastructure
  – Virtualization
  – New services for provisioning, data management, security, VO management
  – End-user tools for application development
  – Etc.
29
Next Step Plans
• Support!
• Actively working with user groups to make sure their deployments are stable
• Move everyone from GT2 and GT3 to GT4
• Continue to improve documentation
  – Goal: every support question gets put into the docs
30
THANKS! Questions?
Stuart Martin, smartin@mcs.anl.gov
Argonne National Lab