The Globus Toolkit™ and Its Application to GriPhyN. Carl Kesselman, Director of the Center for Grid Technologies, Information Sciences Institute, University of Southern California.

Presentation transcript:

Slide 1: The Globus Toolkit™ and Its Application to GriPhyN
Carl Kesselman
Director of the Center for Grid Technologies
Information Sciences Institute
University of Southern California

June 2, 2015, EO Grid Workshop
Slide 2: Outline
• Overview of the Globus Toolkit
• Application of Globus to the virtual data problem (GriPhyN)
• Open Grid Services Architecture

Slide 3: Partial Acknowledgements
• Open Grid Services Architecture design
  - Karl USC/ISI
  - Ian Foster, Steve
  - Jeff Nick, Steve Graham, Jeff IBM
• Grid services collaborators at ANL
  - Kate Keahey, Gregor von Laszewski
  - Thomas Sandholm, Jarek Gawor, John Bresnahan
• Globus Toolkit R&D also involves many fine scientists & engineers at ANL, USC/ISI, and elsewhere
• Strong links with many EU, UK, US Grid projects
• Support from DOE, NASA, NSF, Microsoft

Slide 4: The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

Slide 5: Grid Computing Concept
• New applications enabled by the coordinated use of geographically distributed resources
  - E.g., distributed collaboration, data access and analysis, distributed computing
• Persistent infrastructure for Grid computing
  - E.g., certificate authorities and policies, protocols for resource discovery/access
• Original motivation, and support, from high-end science and engineering; but has wide-ranging applicability

Slide 6: Grids: Why Now?
• Moore’s law ⇒ highly functional end-systems
• Ubiquitous Internet ⇒ universal connectivity
• Network exponentials produce dramatic changes in geometry and geography
  - 9-month doubling: double Moore’s law! : x340,000; : x4000?
• New modes of working and problem solving emphasize teamwork, computation
• New business models and technologies facilitate outsourcing
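The "double Moore's law" remark can be checked with a little arithmetic. The sketch below assumes an 18-month doubling period for Moore's law (a common reading, not stated on the slide) and an illustrative nine-year span; both numbers are assumptions chosen for the example, not values recovered from the slide.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years`, given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)

SPAN_YEARS = 9  # assumption: illustrative span, not taken from the slide

network = growth_factor(SPAN_YEARS, doubling_months=9)   # 12 doublings
moore = growth_factor(SPAN_YEARS, doubling_months=18)    # 6 doublings (assumed period)

print(f"network: x{network:,.0f}, Moore's law: x{moore:,.0f}")
```

Under these assumptions the network factor lands near the "x4000?" order of magnitude quoted on the slide, while the end-system factor stays two orders of magnitude smaller.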

Slide 7: The Grid World: Current Status
• Dozens of major Grid projects in scientific & technical computing/research & education
  - Deployment, application, technology
• Considerable consensus on key concepts and technologies
  - Open source Globus Toolkit™ a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, and large tool/user base
• Global Grid Forum a significant force
• Industrial interest emerging rapidly

Slide 8: Layered Grid Architecture (By Analogy to Internet Architecture)
Grid layers, top to bottom:
• Application
• Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture, top to bottom: Application, Transport, Internet, Link

Slide 9: Globus Toolkit
• Globus Toolkit is the source of many of the protocols described in “Grid architecture”
• Adopted by almost all major Grid projects worldwide as a source of infrastructure
• Open source, open architecture framework encourages community development
• Active R&D program continues to move technology forward
• Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions

Slide 10: Globus Toolkit Components Include …
• Core protocols and services
  - Grid Security Infrastructure
  - Grid Resource Access & Management
  - MDS information & monitoring
  - GridFTP data access & transfer
• Other services
  - Community Authorization Service
  - DUROC co-allocation service
• Other Data Grid technologies
  - Replica catalog, replica management service

Slide 11: Globus Toolkit Structure
[Figure: three stacks: Compute Resource (GRAM, MDS, GSI, job manager), Data Resource (GridFTP, MDS, GSI, job manager), and Other Service or Application (???, GSI). Cross-cutting labels: service naming, soft state management, reliable invocation, notification]

Slide 12: The Globus Toolkit in One Slide
• Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; toolkit provides reference implementation (the Globus Toolkit services in the figure)
• Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, …
[Figure labels: User process #1; Proxy; authenticate & create proxy credential; GSI (Grid Security Infrastructure); Gatekeeper (factory); reliable remote invocation; GRAM (Grid Resource Allocation & Management); create process; User process #2; Proxy #2; register; Reporter (registry + discovery); soft state registration; enquiry; MDS-2 (Meta Directory Service); GIIS: Grid Information Index Server (discovery); other GSI-authenticated remote service requests; other service (e.g. GridFTP)]

Slide 13: GriPhyN Project Goals
• Amplify science productivity through the Grid
  - Provide powerful abstractions for scientists: datasets and transformations, not files and programs
  - Using a grid is harder than using a workstation. GriPhyN seeks to reverse this situation!
• These goals challenge the boundaries of computer science in knowledge representation and distributed computing.
• Apply these advances to major experiments
  - Not just developing solutions, but proving them through deployment

Slide 14: GriPhyN Approach
• Virtual Data
  - Tracking the derivation of experiment data with high fidelity
  - Transparency with respect to location and materialization
• Automated grid request planning
  - Advanced, policy-driven scheduling
• Achieve this at peta-scale magnitude
• We present here a vision that is still 3 years away, but the foundation is starting to come together

Slide 15: Virtual Data
• Track all data assets
• Accurately record how they were derived
• Encapsulate the transformations that produce new data objects
• Interact with the grid in terms of requests for data derivations

Slide 16: Request Automation
• Request Planning and Execution
• High performance
  - Grid resources are used in efficient ways for high throughput and/or fast response
• Based on policy
  - Policy specifies how resources should be used and how workloads should be treated
• Fault tolerant
  - It’s a grid, so failures are normal
• Transparent to the user
  - Make the grid like a workstation

Slide 17: GriPhyN Challenge Problem: CMS Event Reconstruction
Components: Caltech workstation; master Condor job running at Caltech; secondary Condor job on WI pool; NCSA Linux cluster; NCSA UniTree (GridFTP-enabled FTP server)
Steps shown in the figure:
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
Work of: Scott Koranda, Miron Livny, Vladimir Litvin, & others

Slide 18: Why is this useful?
• Easier to FIND the data
  - A disciplined approach to tracking massive amounts of data
• Can PRODUCE and analyze data more easily
  - Automate details of data production
• Can VALIDATE scientific results accurately
• Can SHARE data more easily
• Can produce and analyze MORE data FASTER
  - Leverage huge storage and computing resources

Slide 19: Why is this hard?
• Data derivation tracking
  - Diversity of transformations
  - Achieving fidelity of reproduction
  - Many modes of data storage
• Automated request planning
  - Multiple levels of resource sharing and allocation policy
  - Faults are the norm in large grids
  - Resources are constantly in flux
  - An OS the size of the planet!
• Peta-scale performance level

Slide 20: The Virtual Data Model
• Data suppliers publish data to the Grid
• Users request raw or derived data from the Grid, without needing to know
  - Where data is located
  - Whether data is stored or computed on demand
• Users and applications can easily determine
  - What it will cost to obtain data
  - Quality of derived data
• Virtual Data Grid serves requests efficiently, subject to global and local policy constraints

Slide 21: GriPhyN: Virtual Data, Tracking Complex Dependencies
• Dependency graph is:
  - Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2
  - Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate
[Figure: dependency graph with programs "simulate -t 10 …", "reformat -f fz …", "conv -I esd -o aod", "summarize -t 10 …", "psearch -t 10 …" and files file1, file2, file3,4,5, file6, file7, file8 (requested file)]

Slide 22: Re-creating Virtual Data
• To re-create file 8: Step 1
  - simulate > file1, file2
[Figure: dependency graph as on slide 21]

Slide 23: Re-creating Virtual Data
• To re-create file 8: Step 2
  - Files 3, 4, 5, 6 derived from file 2
  - reformat > file3, file4, file5
  - conv > file6
[Figure: dependency graph as on slide 21]

Slide 24: Re-creating Virtual Data
• To re-create file 8: Step 3
  - File 7 depends on file 6
  - summarize > file7
[Figure: dependency graph as on slide 21]

Slide 25: Re-creating Virtual Data
• To re-create file 8: final step
  - File 8 depends on files 1, 3, 4, 5, 7
  - psearch > file8
[Figure: dependency graph as on slide 21]
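The four re-creation steps above amount to a topological ordering of the dependency graph. The following is a minimal sketch of that idea using Python's standard `graphlib` module (Python 3.9+); the dictionary names `PRODUCER` and `INPUTS` and the function `plan` are illustrative choices, not part of GriPhyN, and the edges are copied from the dependency lists on the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Which program produces each file (from the "Programs" dependency list).
PRODUCER = {
    "file1": "simulate", "file2": "simulate",
    "file3": "reformat", "file4": "reformat", "file5": "reformat",
    "file6": "conv", "file7": "summarize", "file8": "psearch",
}
# Which files each program reads (from the "Files" dependency list).
INPUTS = {
    "simulate": [],
    "reformat": ["file2"],
    "conv": ["file2"],
    "summarize": ["file6"],
    "psearch": ["file1", "file3", "file4", "file5", "file7"],
}

def plan(target: str) -> list[list[str]]:
    """Return batches of programs, in order, needed to re-create `target`."""
    graph: dict[str, set[str]] = {}
    stack = [PRODUCER[target]]
    while stack:  # walk backwards from the target to collect needed programs
        prog = stack.pop()
        if prog not in graph:
            graph[prog] = {PRODUCER[f] for f in INPUTS[prog]}
            stack.extend(graph[prog])
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # programs whose inputs all exist
        batches.append(ready)
        sorter.done(*ready)
    return batches

steps = plan("file8")
for number, batch in enumerate(steps, start=1):
    print(f"Step {number}: {', '.join(batch)}")
```

Programs in the same batch (here `conv` and `reformat`) have no dependency on each other, so a planner would be free to run them in parallel.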

Slide 26: GriPhyN/PPDG Data Grid Architecture
[Figure: Application, Planner, and Executor linked by an abstract DAG and a concrete DAG; supporting services: Catalog Services, Info Services, Policy/Security, Monitoring, Repl. Mgmt., Reliable Transfer Service; resources: Compute Resource, Storage Resource. Technology labels: DAGMan, Kangaroo; GRAM; GridFTP; GRAM; SRM; GSI, CAS; MDS; MCAT; GriPhyN catalogs; GDMP; MDS; Globus]

Slide 27: (Evolving) View of Data Grid Stack
[Figure labels: Data Transport (GridFTP); Storage Element; Local Repl Catalog (Flat or Hierarchical); Reliable File Transfer; Replica Location Service; Publish-Subscribe Service (GDMP); Storage Element Manager; Reliable Replication]

Slide 28: Initial GriPhyN Virtual Data Implementation
Production DAG of simulated CMS data: Simulate Physics; Simulate CMS Detector Response; Copy flat-file to OODBMS; Simulate Digitization of Electronic Signals
Architecture of the system:
  - Virtual Data Catalog (PostgreSQL); Virtual Data Language; VDL Interpreter (VDLI); local file storage; GSI
  - Job submission sites (ANL, SC, …): Condor-G agent, Globus client, GridFTP server
  - Job execution sites (U of Chicago, U of Wisconsin, U of Florida), each with GridFTP client, Globus GRAM, Condor pool
  - Grid testbed

Slide 29: Virtual Data Catalog Conceptual Data Structure
TRANSFORMATION
  /bin/physapp1, version 1.2.3b(2), created on 12 Oct 1998, owned by physbld.orca
DERIVATION
  ^paramlist
  ^transformation
FILE
  LFN=filename1
  PFN1=/store1/
  PFN2=/store9/
  PFN3=/store4/
  ^derivation
FILE
  LFN=filename2
  PFN1=/store1/
  PFN2=/store9/
  ^derivation
PARAMETER LIST
  PARAMETER i filename1
  PARAMETER O filename2
  PARAMETER E PTYPE=muon
  PARAMETER p -g
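The catalog structure above maps naturally onto a handful of linked records. The sketch below is one possible in-memory rendering using dataclasses; the class and helper names are hypothetical, and the reading of the parameter kinds (`i` as input file, `O` as output file) is an assumption based only on the slide. The truncated physical file names are kept exactly as they appear.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Transformation:
    executable: str
    version: str
    created: str
    owner: str

@dataclass
class Parameter:
    kind: str   # "i", "O", "E", "p" as on the slide; meanings are assumed
    value: str

@dataclass
class Derivation:
    transformation: Transformation
    params: List[Parameter]

@dataclass
class LogicalFile:
    lfn: str                      # logical file name
    pfns: List[str]               # physical replicas (paths truncated in the source)
    derivation: Optional[Derivation] = None

def outputs_of(d: Derivation) -> List[str]:
    """Logical names a derivation writes, assuming kind 'O' marks outputs."""
    return [p.value for p in d.params if p.kind == "O"]

def is_derived(f: LogicalFile) -> bool:
    """True if the file is an output of the derivation it points to."""
    return f.derivation is not None and f.lfn in outputs_of(f.derivation)

t = Transformation("/bin/physapp1", "1.2.3b(2)", "12 Oct 1998", "physbld.orca")
d = Derivation(t, [Parameter("i", "filename1"), Parameter("O", "filename2"),
                   Parameter("E", "PTYPE=muon"), Parameter("p", "-g")])
f1 = LogicalFile("filename1", ["/store1/", "/store9/", "/store4/"], d)
f2 = LogicalFile("filename2", ["/store1/", "/store9/"], d)

print(is_derived(f1), is_derived(f2))
```

With links like these, a request for `filename2` can be answered either by picking one of its replicas or by re-running the recorded transformation with the recorded parameters.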

Slide 30: Planner Decision Making
• Planner considers:
  - Policy (fairly static, from CAS/SAS)
  - Grid resource status: state, load
  - Job (user/group) resource consumption history
  - Job profiles (resources over time) from Prophesy
[Figure labels: planner; policy; status; accounting records; job usage info; job profile records; Prophesy (predictor); job profiling data]

Slide 31: Executor Example: Condor DAGMan
• Directed Acyclic Graph Manager
• Specify the dependencies between Condor jobs using DAG data structure
• Manage dependencies automatically
  - (e.g., “Don’t run job ‘B’ until job ‘A’ has completed successfully.”)
• Each job is a “node” in DAG
• Any number of parent or children nodes
• No loops
[Figure: example DAG with Job A, Job B, Job C, Job D]
Slide courtesy Miron Livny, U. Wisconsin

Slide 32: Executor Example: Condor DAGMan (Cont.)
• DAGMan acts as a “meta-scheduler”
  - Holds & submits jobs to the Condor queue at the appropriate times based on DAG dependencies
• If a job fails, DAGMan continues until it can no longer make progress and then creates a “rescue” file with the current state of the DAG
  - When failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
[Figure: DAGMan feeding jobs A, B, C, D into the Condor job queue]
Slide courtesy Miron Livny, U. Wisconsin
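The "continue until no progress, then rescue" behavior can be illustrated with a toy scheduler. This is only a simulation of the idea and says nothing about DAGMan's real implementation or file formats; the function `run_dag`, the diamond-shaped graph (A before B and C, both before D), and the choice of B as the failing job are all assumptions made for the example.

```python
def run_dag(parents, run_job, done=()):
    """Run every job whose parents have all succeeded.

    parents: job name -> set of parent job names.
    run_job: callable taking a job name, returning True on success.
    done:    jobs already completed (for example, restored from a rescue).
    Returns (completed jobs, failed jobs).
    """
    done, failed = set(done), set()
    progress = True
    while progress:                      # stop once a full pass runs nothing
        progress = False
        for job in sorted(parents):
            if job in done or job in failed:
                continue
            if parents[job] <= done:     # all parents finished successfully
                progress = True
                (done if run_job(job) else failed).add(job)
    return done, failed

# Assumed example graph: A -> (B, C) -> D.
PARENTS = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

first_log = []
def flaky(job):
    first_log.append(job)
    return job != "B"                    # pretend B fails on the first run

rescue, failed = run_dag(PARENTS, flaky)

second_log = []
def reliable(job):
    second_log.append(job)
    return True

finished, failed_again = run_dag(PARENTS, reliable, done=rescue)
print(first_log, sorted(rescue), second_log)
```

In the first run C still executes even though B failed, because C does not depend on B; only D is held back. The second run starts from the saved set of completed jobs, so A and C are not repeated.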

Slide 33: DAG Usage
• Abstract DAG
  - Represents user requests
  - Simplest case: request for one or more data products
  - Complex case: request execution of a chained set of applications
  - No file or execution locations need be present
• Concrete DAG
  - Specifies any application invocations needed to derive data
  - Specifies locations of all invocations (to the site level)
  - Includes explicit job steps to move data
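One way to picture the difference between the two DAG forms is a function that takes location-free jobs and emits located run steps plus explicit transfer steps. The sketch below is a deliberately simplified illustration; `AbstractJob`, `concretize`, and the site names `site-a` and `site-b` are hypothetical, and a real planner would also weigh policy, load, and existing replicas.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class AbstractJob:
    name: str
    inputs: Tuple[str, ...]    # logical file names only, no locations
    outputs: Tuple[str, ...]

def concretize(jobs: List[AbstractJob],
               site_of: Dict[str, str],
               replica_site: Dict[str, str]) -> List[tuple]:
    """Turn abstract jobs (in dependency order) into located steps.

    site_of:      job name -> execution site chosen by some planner.
    replica_site: logical file -> site that already holds a copy.
    """
    location = dict(replica_site)
    steps = []
    for job in jobs:
        site = site_of[job.name]
        for lfn in job.inputs:
            if location[lfn] != site:            # explicit data movement step
                steps.append(("transfer", lfn, location[lfn], site))
        steps.append(("run", job.name, site))
        for lfn in job.outputs:
            location[lfn] = site                 # outputs live where they were made
    return steps

abstract = [
    AbstractJob("simulate", (), ("file1", "file2")),
    AbstractJob("reformat", ("file2",), ("file3", "file4", "file5")),
]
concrete = concretize(abstract,
                      site_of={"simulate": "site-a", "reformat": "site-b"},
                      replica_site={})
for step in concrete:
    print(step)
```

The abstract list never mentions a site; every site name and every transfer appears only in the concrete output.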

Slide 34: Virtual Data in CMS
Virtual Data Long Term Vision of CMS: CMS Note 2001/047, GRIPHYN

Production Pipeline: GriPhyN-CMS Demo (SC2001 Demo Version)

Stage        CPU       Output       Data
pythia       2 min     truth.ntpl   0.5 MB
cmsim        8 hours   hits.fz      175 MB
writeHits    5 min     hits.DB      275 MB
writeDigis   45 min    digis.DB     105 MB

Granularity labels in the figure: 1 run = 500 events; 1 run; 1 event
Work of: Jens Voeckler, Rick Cavanaugh, & others

Slide 36: CMS Pipeline in VDL
Stages: pythia_input, pythia.exe, cmsim_input, cmsim.exe, writeHits, writeDigis

    begin v /usr/local/demo/scripts/cmkin_input.csh
      file i ntpl_file_path
      file i template_file
      file i num_events
      stdout cmkin_param_file
    end
    begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
      pre cms_env_var
      stdin cmkin_param_file
      stdout cmkin_log
      file o ntpl_file
    end
    begin v /usr/local/demo/scripts/cmsim_input.csh
      file i ntpl_file
      file i fz_file_path
      file i hbook_file_path
      file i num_trigs
      stdout cmsim_param_file
    end
    begin v /usr/local/demo/binaries/cms121.exe
      condor copy_to_spool=false
      condor getenv=true
      stdin cmsim_param_file
      stdout cmsim_log
      file o fz_file
      file o hbook_file
    end
    begin v /usr/local/demo/binaries/writeHits.sh
      condor getenv=true
      pre orca_hits
      file i fz_file
      file i detinput
      file i condor_writeHits_log
      file i oo_fd_boot
      file i datasetname
      stdout writeHits_log
      file o hits_db
    end
    begin v /usr/local/demo/binaries/writeDigis.sh
      pre orca_digis
      file i hits_db
      file i oo_fd_boot
      file i carf_input_dataset_name
      file i carf_output_dataset_name
      file i carf_input_owner
      file i carf_output_owner
      file i condor_writeDigis_log
      stdout writeDigis_log
      file o digis_db
    end
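The `begin … end` blocks on this slide are regular enough to read mechanically. The sketch below parses two of them into dictionaries and shows how an output of one stage reappears as an input of the next. The grammar is inferred from the slide alone (one directive per line is an assumption about the original layout), and `parse_vdl` is a hypothetical helper, not the project's VDL interpreter.

```python
def parse_vdl(text):
    """Parse 'begin v <path> ... end' blocks into a list of dictionaries."""
    blocks, current = [], None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "begin":
            current = {"executable": words[2], "inputs": [],
                       "outputs": [], "other": []}
        elif words[0] == "end":
            blocks.append(current)
            current = None
        elif words[0] == "file" and words[1] == "i":
            current["inputs"].append(words[2])
        elif words[0] == "file" and words[1] == "o":
            current["outputs"].append(words[2])
        else:  # pre, stdin, stdout, condor: kept as (keyword, rest) pairs
            current["other"].append((words[0], " ".join(words[1:])))
    return blocks

SAMPLE = """
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end
"""

blocks = parse_vdl(SAMPLE)
shared = set(blocks[0]["outputs"]) & set(blocks[1]["inputs"])
print(sorted(shared))
```

Matching outputs to inputs in this way is what lets the separate stage descriptions be chained into a pipeline.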

Slide 37: GriPhyN CMS SC2001 Demo
Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics
[Figure labels: full event database of ~100,000 large objects; full event database of ~40,000 large objects; “Tag” database of ~140,000 small objects; request; parallel tuned GSI FTP]
Work of: Koen Holtman, J.J. Bunn, H. Newman, & others

Slide 38: SDSS Galaxy Cluster Finding

Slide 39: Cluster-finding Data Pipeline
[Figure: data pipeline with nodes tsObj, field, brg, core, cluster, and catalog, and numbered steps 1 to 5]

Slide 40: Cluster-finding Grid
Work of: Yong Zhao, James Annis, & others

Slide 41: GriPhyN-LIGO SC2001 Demo
Work of: Ewa Deelman, Gaurang Mehta, Scott Koranda, & others

Slide 42: Globus Toolkit: Evaluation (+)
• Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
• This & good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on tools

Slide 43: Globus Toolkit: Evaluation (-)
• Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
• Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting envs.)
• Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties

Slide 44: Globus Toolkit Structure
Lots of good mechanisms, but (with the exception of GSI) not that easily incorporated into other systems
[Figure: same structure diagram as slide 11]

Slide 45: Open Grid Services Architecture
• Service orientation to virtualize resources
• Define fundamental Grid service behaviors
  - Core set required, others optional
  ⇒ A unifying framework for interoperability & establishment of total system properties
• Integration with Web services and hosting environment technologies
  ⇒ Leverage tremendous commercial base
  ⇒ Standard IDL accelerates community code
• Delivery via open source Globus Toolkit 3.0
  ⇒ Leverage GT experience, code, mindshare

Slide 46: “Web Services”
• Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
• WSDL: Web Services Description Language
  - Interface Definition Language for Web services
• SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
• WS-Inspection
  - Conventions for locating service descriptions
• UDDI: Universal Desc., Discovery, & Integration
  - Directory for Web services

Slide 47: Web Services Example: Database Service
• WSDL definition for “DBaccess” portType defines operations and bindings, e.g.:
  - Query(QueryLanguage, Query, Result)
  - SOAP protocol
• Client C, Java, Python, etc., APIs can then be generated
[Figure: DBaccess service]

Slide 48: Transient Service Instances
• “Web services” address discovery & invocation of persistent services
  - Interface to persistent state of entire enterprise
• In Grids, must also support transient service instances, created/destroyed dynamically
  - Interfaces to the states of distributed activities
  - E.g. workflow, video conf., dist. data analysis
• Significant implications for how services are managed, named, discovered, and used
  - In fact, much of our work is concerned with the management of service instances

Slide 49: The Grid Service = Interfaces + Service Data
[Figure: a Grid service exposing the GridService interface plus other interfaces, holding several service data elements, with an implementation running in a hosting environment/runtime (“C”, J2EE, .NET, …). Capability labels: service data access, explicit destruction, soft-state lifetime, notification, authorization, service creation, service registry, manageability, concurrency, reliable invocation, authentication]

Slide 50: Open Grid Services Architecture: Fundamental Structure
1) WSDL conventions and extensions for describing and structuring services
  - Useful independent of “Grid” computing
2) Standard WSDL interfaces & behaviors for core service activities
  - portTypes and operations => protocols

Slide 51: WSDL Conventions & Extensions
• portType (standard WSDL)
  - Define an interface: a set of related operations
• serviceType (extensibility element)
  - List of port types: enables aggregation
• serviceImplementation (extensibility element)
  - Represents actual code
• service (standard WSDL)
  - instanceOf extension: map descr. -> instance
• compatibilityAssertion (extensibility element)
  - portType, serviceType, serviceImplementation

Slide 52: Structure of a Grid Service
[Figure: service description (PortType elements, serviceType, serviceImplementation) and service instantiation (service elements, linked to the description by instanceOf); standard WSDL elements are distinguished from extensions; cA = compatibilityAssertion]

Slide 53: Standard Interfaces & Behaviors: Four Interrelated Concepts
• Naming and bindings
  - Every service instance has a unique name, from which a client can discover supported bindings
• Information model
  - Service data associated with Grid service instances, operations for accessing this info
• Lifecycle
  - Service instances created by factories
  - Destroyed explicitly or via soft state
• Notification
  - Interfaces for registering interest and delivering notifications

Slide 54: OGSA Interfaces and Operations Defined to Date
• GridService (required)
  - FindServiceData
  - Destroy
  - SetTerminationTime
• NotificationSource
  - SubscribeToNotificationTopic
  - UnsubscribeToNotificationTopic
• NotificationSink
  - DeliverNotification
• Factory
  - CreateService
• PrimaryKey
  - FindByPrimaryKey
  - DestroyByPrimaryKey
• Registry
  - RegisterService
  - UnregisterService
• HandleMap
  - FindByHandle
Authentication, reliability are binding properties
Manageability, concurrency, etc., to be defined

Slide 55: Service Data
• A Grid service instance maintains a set of service data elements
  - XML fragments encapsulated in standard containers
  - Includes basic introspection information, interface-specific data, and application data
• FindServiceData operation (GridService interface) queries this information
  - Extensible query language support
• See also notification interfaces
  - Allows notification of service existence and changes in service data

Slide 56: Grid Service Example: Database Service
• A DBaccess Grid service will support at least two portTypes
  - GridService
  - DBaccess
• Each has service data
  - GridService: basic introspection information, lifetime, …
  - DBaccess: database type, query languages supported, current load, …
[Figure: a service exposing the GridService and DBaccess portTypes, with service data “Name, lifetime, etc.” and “DB info”]

Slide 57: Naming and Bindings
• Every service instance has a unique and immutable name: Grid Service Handle (GSH)
  - Basically just a URL
• Handle must be converted to a Grid Service Reference (GSR) to use service
  - Includes binding information; may expire
  - Separation of name from implementation facilitates service evolution
• The HandleMap interface allows a client to map from a GSH to a GSR
  - Each service instance has home HandleMap
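The handle-to-reference split can be sketched as a small cache-and-refresh pattern. In the code below the handle stays fixed while the reference carries binding details and an expiry; the classes, the `endpoint` string, the example URL, and the explicit `now` argument (used instead of a real clock so the behavior is deterministic) are all assumptions made for illustration and are not OGSA-defined APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GSR:
    """Grid Service Reference: binding information that may expire."""
    handle: str
    endpoint: str        # hypothetical stand-in for binding information
    expires_at: float

    def valid(self, now: float) -> bool:
        return now < self.expires_at

class HandleMap:
    """Maps an immutable handle (GSH) to a fresh reference (GSR)."""
    def __init__(self, ttl: float):
        self._endpoints = {}
        self._ttl = ttl

    def bind(self, handle: str, endpoint: str) -> None:
        self._endpoints[handle] = endpoint

    def find_by_handle(self, handle: str, now: float) -> GSR:
        return GSR(handle, self._endpoints[handle], now + self._ttl)

class Client:
    """Caches references and re-resolves the handle once they expire."""
    def __init__(self, handle_map: HandleMap):
        self._map = handle_map
        self._cache = {}

    def resolve(self, handle: str, now: float) -> GSR:
        gsr = self._cache.get(handle)
        if gsr is None or not gsr.valid(now):
            gsr = self._map.find_by_handle(handle, now)
            self._cache[handle] = gsr
        return gsr

GSH = "http://example.org/handles/db-42"      # hypothetical handle URL
handle_map = HandleMap(ttl=60)
handle_map.bind(GSH, "host-a.example.org:8080")
client = Client(handle_map)

first = client.resolve(GSH, now=0)
handle_map.bind(GSH, "host-b.example.org:8080")   # the service moves
cached = client.resolve(GSH, now=30)              # old reference still valid
refreshed = client.resolve(GSH, now=61)           # expired, so re-resolved
print(first.endpoint, cached.endpoint, refreshed.endpoint)
```

The client never changes the name it holds; only the short-lived reference is replaced, which is what allows the implementation behind the name to evolve.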

Slide 58: Registry
• The Registry interface may be used to register Grid service instances with a registry
  - A set of Grid services can periodically register their GSHs into a registry service, to allow for discovery of services in that set
• Registrations maintained in a service data element associated with Registry interface
  - Standard discovery mechanisms can then be used to discover registered services
  - Returns a WS-Inspection document containing the GSHs of a set of Grid services
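Periodic registration can be read as a soft-state table: an entry survives only as long as its owner keeps refreshing it. The sketch below assumes a time-to-live per registration and returns a plain sorted list rather than a WS-Inspection document; the class, its method names, and the example handles are hypothetical.

```python
class Registry:
    """Toy registry: each registration lapses unless refreshed in time."""

    def __init__(self):
        self._expiry = {}                 # GSH -> time the entry lapses

    def register_service(self, gsh: str, now: float, ttl: float) -> None:
        self._expiry[gsh] = now + ttl     # re-registering simply extends it

    def unregister_service(self, gsh: str) -> None:
        self._expiry.pop(gsh, None)

    def discover(self, now: float) -> list:
        """Handles whose registrations are still live at time `now`."""
        return sorted(g for g, t in self._expiry.items() if t > now)

registry = Registry()
A = "http://example.org/handles/a"        # hypothetical handles
B = "http://example.org/handles/b"
registry.register_service(A, now=0, ttl=10)
registry.register_service(B, now=0, ttl=10)
registry.register_service(A, now=8, ttl=10)   # A refreshes; B goes silent

live = registry.discover(now=12)
registry.unregister_service(A)
after_removal = registry.discover(now=12)
print(live, after_removal)
```

A service that crashes without unregistering disappears from discovery results on its own once its last registration lapses.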

Slide 59: Lifetime Management
• GS instances created by factory or manually; destroyed explicitly or via soft state
  - Negotiation of initial lifetime with a factory (= service supporting Factory interface)
• GridService interface supports
  - Destroy operation for explicit destruction
  - SetTerminationTime operation for keepalive
• Soft state lifetime management avoids
  - Explicit client teardown of complex state
  - Resource “leaks” in hosting environments
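Soft-state lifetime management boils down to a termination time that clients push forward and a hosting environment that reclaims whatever has lapsed. The sketch below models that with two small classes; the names mirror the Destroy and SetTerminationTime operations on the slide, but the classes themselves, the `sweep` method, and the explicit `now` argument are assumptions for illustration.

```python
class GridServiceInstance:
    def __init__(self, name: str, termination_time: float):
        self.name = name
        self.termination_time = termination_time
        self.destroyed = False

    def set_termination_time(self, t: float) -> None:
        """Keepalive: a client that still needs the instance extends it."""
        self.termination_time = t

    def destroy(self) -> None:
        """Explicit destruction."""
        self.destroyed = True

class HostingEnvironment:
    def __init__(self):
        self.instances = []

    def sweep(self, now: float) -> list:
        """Destroy lapsed instances; return the names reclaimed this sweep."""
        reaped = []
        for inst in self.instances:
            if not inst.destroyed and inst.termination_time <= now:
                inst.destroy()
                reaped.append(inst.name)
        self.instances = [i for i in self.instances if not i.destroyed]
        return reaped

env = HostingEnvironment()
db = GridServiceInstance("db", termination_time=10)
miner = GridServiceInstance("miner", termination_time=10)
env.instances = [db, miner]

miner.set_termination_time(25)    # miner's client is alive; db's client vanished
first_sweep = env.sweep(now=12)
second_sweep = env.sweep(now=30)
print(first_sweep, second_sweep)
```

No client ever had to tear anything down: the abandoned instance is reclaimed at the first sweep, and the other one follows once its keepalives stop.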

Slide 60: Factory
• Factory interface’s CreateService operation creates a new Grid service instance
  - Reliable creation (once-and-only-once)
• CreateService operation can be extended to accept service-specific creation parameters
• Returns a Grid Service Handle (GSH)
  - A globally unique URL
  - Uniquely identifies the instance for all time
  - Based on name of a home handleMap service
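A factory with once-and-only-once creation can be approximated by making the create call idempotent. The slide does not say how reliability is achieved, so the client-supplied `request_id` used for deduplication below is purely an assumption, as are the class name, the handle naming scheme, and the example URL.

```python
import itertools

class Factory:
    """Toy factory: creates instances and returns their handles (GSHs)."""

    def __init__(self, handle_map_url: str):
        self._base = handle_map_url.rstrip("/")   # handles derive from the home HandleMap name
        self._counter = itertools.count(1)
        self._by_request = {}                     # request id -> handle already issued
        self.instances = {}                       # handle -> creation details

    def create_service(self, request_id: str, lifetime: float, **params) -> str:
        # A retried request returns the original handle instead of creating
        # a second instance (assumed mechanism for once-and-only-once).
        if request_id in self._by_request:
            return self._by_request[request_id]
        gsh = f"{self._base}/instance-{next(self._counter)}"
        self.instances[gsh] = {"lifetime": lifetime, "params": params}
        self._by_request[request_id] = gsh
        return gsh

factory = Factory("http://example.org/handlemap/")   # hypothetical URL
a = factory.create_service("req-1", lifetime=1000, database="BioDB")
b = factory.create_service("req-1", lifetime=1000, database="BioDB")  # retry
c = factory.create_service("req-2", lifetime=10)
print(a, b, c)
```

The keyword arguments stand in for the service-specific creation parameters mentioned on the slide; the returned handle is the only thing the client needs to keep.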

Slide 61: Transient Database Services
[Diagram: a Registry Grid service (registry info; instance name, etc.), a DBaccess Factory Grid service (factory info; instance name, etc.), and two DBaccess Grid service instances (DB info; name, lifetime, etc.). Client requests shown: “What services can you create?”, “What database services exist?”, “Create a database service”.]

Slides 62-71: Example: Data Mining for Bioinformatics
[Diagram repeated on every slide: User Application, Community Registry, Mining Factory, Database Factory, Compute Service Provider, Storage Service Provider, and Database Services in front of BioDB 1 … BioDB n. From slide 66 a Miner and a Database instance are added; on slide 71 only the Database instance remains.]
Message shown on each slide:
  62: “I want to create a personal database containing data on e.coli metabolism”
  63: “Find me a data mining service, and somewhere to store data”
  64: GSHs for Mining and Database factories
  65: “Create a data mining service with initial lifetime 10” and “Create a database with initial lifetime 1000”
  66: Same two requests; the Miner and Database instances now appear
  67: Query
  68: Query, Keepalive
  69: Keepalive, Results
  70: Keepalive
  71: Keepalive (Database only; the Miner is no longer shown)

Slide 72: Notification Interfaces
• NotificationSource for client subscription
  - One or more notification generators
    > Generates notification message of a specific type
    > Typed interest statements: e.g., filters, topics, …
    > Supports messaging services, 3rd party filter services, …
  - Soft state subscription to a generator
• NotificationSink for asynchronous delivery of notification messages
• A wide variety of uses are possible
  - E.g. dynamic discovery/registry services, monitoring, application error notification, …
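The source/sink pairing with soft-state subscriptions can be sketched as below. The interface names NotificationSource and NotificationSink come from the slide; the method names, the topic strings, and the use of simple topic equality as the interest statement are assumptions. Delivery here is a direct method call, whereas a real sink would receive messages asynchronously.

```python
class NotificationSink:
    """Toy sink: records every message delivered to it."""

    def __init__(self):
        self.received = []

    def deliver(self, topic, message):
        self.received.append((topic, message))


class NotificationSource:
    """Toy source: subscriptions lapse unless the subscriber keeps them alive."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, topic, sink, expires_at):
        subscription = {"topic": topic, "sink": sink, "expires_at": expires_at}
        self._subscriptions.append(subscription)
        return subscription

    def keepalive(self, subscription, new_expiry):
        subscription["expires_at"] = new_expiry

    def publish(self, topic, message, now):
        # Drop lapsed subscriptions, then deliver to matching ones.
        self._subscriptions = [
            s for s in self._subscriptions if s["expires_at"] > now
        ]
        for s in self._subscriptions:
            if s["topic"] == topic:
                s["sink"].deliver(topic, message)


source = NotificationSource()
sink = NotificationSink()
sub = source.subscribe("membrane-proteins", sink, expires_at=10)

source.publish("membrane-proteins", "new data #1", now=5)
source.publish("kinases", "unrelated", now=6)              # topic does not match
source.keepalive(sub, new_expiry=30)
source.publish("membrane-proteins", "new data #2", now=20)
source.publish("membrane-proteins", "new data #3", now=40)  # subscription lapsed
```

Only the first two matching messages reach the sink; once keepalives stop, the source forgets the subscriber without any explicit unsubscribe.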

Slides 73-76: Notification Example
• Notifications can be associated with any (authorized) service data elements
[Diagram repeated on every slide: a DBaccess Grid service (DB info; name, lifetime, etc.) acting as Notification Source with its Subscribers, and a second Grid service acting as Notification Sink.]
Message shown on each slide:
  73: (none)
  74: “Notify me of new data about membrane proteins”
  75: Keepalive
  76: New data

Slide 77: Open Grid Services Architecture: Summary
• Service orientation to virtualize resources
  - Everything is a service
• From Web services
  - Standard interface definition mechanisms: multiple protocol bindings, local/remote transparency
• From Grids
  - Service semantics, reliability and security models
  - Lifecycle management, discovery, other services
• Multiple “hosting environments”
  - C, J2EE, .NET, …

Slide 78: Recap: The Grid Service
[Diagram: a Grid service implementation exposing the GridService interface plus other interfaces, with several service data elements, running in a hosting environment/runtime (“C”, J2EE, .NET, …). Labels: service data access, explicit destruction, soft-state lifetime, notification, authorization, service creation, service registry, manageability, concurrency, reliable invocation, authentication.]

Slide 79: OGSA and the Globus Toolkit
• Technically, OGSA enables
  - Refactoring of protocols (GRAM, MDS-2, etc.), while preserving all GT concepts/features!
  - Integration with hosting environments: simplifying components, distribution, etc.
  - Greatly expanded standard service set
• Pragmatically, we are proceeding as follows
  - Develop open source OGSA implementation
    > Globus Toolkit 3.0; supports Globus Toolkit 2.0 APIs
  - Partnerships for service development
  - Also expect commercial value-adds

Slide 80: GT3: An Open Source OGSA-Compliant Globus Toolkit
• GT3 Core
  - Implements Grid service interfaces & behaviors
  - Reference implementation of evolving standard
  - Java first, C soon, C#?
• GT3 Base Services
  - Evolution of current Globus Toolkit capabilities
  - Backward compatible
• Many other Grid services
[Diagram labels: GT3 Core, GT3 Base Services, GT3 Data Services, Other Grid Services.]

Slide 81: Hmm, Isn’t This Just Another Object Model?
• Well, yes, in a sense
  - Strong encapsulation
  - We (can) profit greatly from experiences of previous object-based systems
• But
  - Focus on encapsulation not inheritance
  - Does not require OO implementations
  - Value lies in specific behaviors: lifetime, notification, authorization, …
  - Document-centric not type-centric

Slide 82: Grids and OGSA: Research Challenges
• Grids pose profound problems, e.g.
  - Management of virtual organizations
  - Delivery of multiple qualities of service
  - Autonomic management of infrastructure
  - Software and system evolution
• OGSA provides foundation for tackling these problems in a rigorous fashion?
  - Structured establishment/maintenance of global properties
  - Reasoning about total system properties

Slide 83: Summary
• The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
• Globus Toolkit a source of protocol and API definitions, and reference implementations
  - And many projects applying Grid concepts (& Globus technologies) to important problems
• Open Grid Services Architecture represents (we hope!) next step in evolution
• An enabling framework for investigations of Internet-scale computing systems

Slide 84: For More Information
• The Globus Project™
  -
• Grid architecture
  - ers/anatomy.pdf
• Open Grid Services Architecture
  -