CaGrid Overview and Core Services caGrid Knowledge Center February 2011.

1 caGrid Overview and Core Services caGrid Knowledge Center February 2011

2 caGrid
- A Grid software middleware infrastructure consisting of services, toolkits, APIs, and runtime environment
  - Standards Based, Open Source
  - Building blocks to create interoperable, Grid-enabled systems
  - Service Oriented Architecture: Web Services Resource Framework standards
  - Model Driven Architecture: Object oriented view, published information models, strongly-typed services
  - Rich metadata
- A production Grid deployment of the core services provided by that infrastructure
  - Security, Data Services Infrastructure, Service Development & Deployment, Metadata, Federated Query, Workflow, Advertisement & Discovery
- Provides the software foundation which underlies the tools and applications of caBIG

3 Application Scenario
- A clinician/researcher is involved in a multi-institutional clinical trial of a new targeted therapeutic
- Microarray, Proteomic, and Image data are collected from patients participating in the trial
- Researcher wants to carry out a correlative analysis to assess the treatment
  - Query and analyze microarray, image, and protein data from multiple patients to find interesting patterns
  - Look for similar patterns in other microarray, protein, and image databases
- Patients may have been seen at multiple institutions
- Datasets may have been collected at different institutions

4 Application Scenario
- Location A: Microarray, Protein, Image data
- Location B: Microarray, Protein, Image data
- Location C: Microarray, Protein, Image data
- Location C: Image Analysis
- Location D: Image Analysis
- Microarray and protein databases at other institutions
- Different database systems, different data representations, security
- Different invocations of programs, remote access, how to transfer data

5 caGrid Production Environment

6 Infrastructure Core Capabilities
- Model-Driven and Metadata
  - Enabling and supporting interoperable services
  - Providing service-oriented metadata
- Service development and deployment
  - Tooling for bringing applications and data to the grid
- Advertisement and Discovery
  - Publishing services to the Grid
  - Enabling search for services based on service metadata
- Security
  - Integrating existing systems and applications with Grid security
  - Lowering burden of implementation of grid-wide and local policy
- Facilitating Grid wide operations
  - Federated query, workflow execution
- Making services and core infrastructure more accessible
  - Graphical installation and configuration, higher-level object-oriented APIs, web portals, graphical administrative applications

7 Model Driven, Interoperable Services
- Client and service APIs are object oriented, and operate over well-defined and curated data types
- Objects are defined in UML and Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
- Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
- The XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)

8 Global Model Exchange and Metadata Model Services
- Global Model Exchange (GME)
  - Provides support to store and retrieve schemas for types used in Grid services.
  - Developers should register the schemas defining types used in Grid services with the GME.
- Metadata Model Service (MMS)
  - Provides support for developers to generate and add service metadata
  - Developers can augment standard caGrid service metadata with information from metadata registries, such as the caDSR
  - External registry provides the means to add, modify, delete, or otherwise manage the UML models and their correspondence to XML Schemas which the MMS leverages

9 Service Development and Deployment: Introduce
- A framework which enables fast and easy creation of Grid services.
- Provides easy to use graphical service authoring tool.
- Hides all “grid-ness” from the developer.
- Handles all core service architecture requirements for strongly typed and highly interoperable grid services.
- Integration with other core grid services and architecture components
  - GAARDS Security Infrastructure
  - Globus Index Service
  - Global Model Exchange
  - Metadata Model Service
  - Cancer Data Standards Repository
- Extension Framework for integrating with other architecture components

10 Introduce Features
- Supports modification of operations
  - Adding Operations
  - Removing Operations
  - Updating Operations
  - Importing Operations
- Graphical Configuration
  - Advertisement
  - Security
  - Service Metadata Specification
  - Service Metadata Editing
  - Service Configuration Properties
- Auto generates code for service
- Auto generates a client API for service
- Graphical Deployment of Service
  - Globus
  - Tomcat
  - JBoss

11 Advertisement and Discovery: Index Service
- All services register their service metadata information to the Index Service
- Clients can discover services using a discovery API which facilitates inspection of data types
- Leveraging semantic information in EVS (from which service metadata is drawn), services can be discovered by the semantics of their data types
- Examples:
  - “Find me all the services from Cancer Center X”
  - “Which Analytical services take Genes as input?”
  - “Find me all the services with some metadata mentioning the string ‘macromolecules’”
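The three example questions above can be sketched as simple filters over registered metadata. This is only an illustrative in-memory model, not the caGrid discovery API: the `ServiceRecord` fields, the `REGISTRY` contents, the URLs, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical, simplified stand-in for a service's registered metadata."""
    url: str
    hosting_center: str
    kind: str  # "analytical" or "data" (assumed categories)
    input_concepts: set = field(default_factory=set)
    description: str = ""


# Made-up registry contents, standing in for what an index would hold.
REGISTRY = [
    ServiceRecord("https://example.org/svc/cluster", "Cancer Center X",
                  "analytical", {"Gene"}, "Clusters expression profiles"),
    ServiceRecord("https://example.org/svc/images", "Cancer Center Y",
                  "data", set(), "Image archive of macromolecules"),
    ServiceRecord("https://example.org/svc/align", "Cancer Center Y",
                  "analytical", {"Protein"}, "Sequence alignment"),
]


def by_center(registry, center):
    """'Find me all the services from Cancer Center X'."""
    return [s for s in registry if s.hosting_center == center]


def analytical_taking(registry, concept):
    """'Which Analytical services take Genes as input?'."""
    return [s for s in registry
            if s.kind == "analytical" and concept in s.input_concepts]


def mentioning(registry, text):
    """'Find me all the services with some metadata mentioning the string ...'."""
    needle = text.lower()
    return [s for s in registry
            if needle in " ".join([s.url, s.hosting_center, s.kind,
                                   s.description, *sorted(s.input_concepts)]).lower()]
```

In this sketch the concept match is a plain string comparison; a semantic lookup of the kind the slide describes would instead compare concept identifiers drawn from a vocabulary service.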

12 Service Metadata: Data Service
- Data Service Metadata
  - Describes the Domain Model being exposed, in terms of a UML model linked to semantics
  - Data types defined in terms of structure and semantics extracted from caDSR and EVS
  - Auto-generated by caGrid service authoring toolkit (Introduce)

13 Security Services
- Authentication
  - How to identify a client (or a service)
  - Secure login
  - Integrate the Grid with existing institutional login systems!
- Enforce data sharing policies and access control
  - Local policies
  - Federated access
- Trust Fabric
  - How to trust a client, and at what level
  - Dynamically adapt trust if there is a security breach

14 caGrid Security Infrastructure (GAARDS)
- Dorian
  - Allows accounts managed in external domains to be federated and managed in the Grid.
  - Allows users to use their existing credentials (external to the Grid) to authenticate to the Grid
- Grid Grouper/CSM
  - Provides a group-based authorization solution for the Grid
- Grid Trust Service
  - Supports applications and services in deciding whether or not signers of digital credentials can be trusted.
  - Supports the provisioning of trusted certificate authorities and corresponding certificate revocation lists.
- Provides services and tools for the administration and enforcement of security policy in an enterprise Grid.
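The trust decision described for the Grid Trust Service (is the signer a trusted authority, and has the credential been revoked?) can be sketched roughly as below. This is a toy model under stated assumptions, not the GTS interface: the authority names, the trust levels, the serial numbers, and the `is_trusted` function are all hypothetical.

```python
# Assumed ordering of trust levels, lowest first.
LEVELS = ["low", "medium", "high"]

# Hypothetical trusted certificate authorities and the level assigned to each.
TRUSTED_AUTHORITIES = {
    "CN=Institution Y CA": "high",
    "CN=Institution X CA": "medium",
}

# Hypothetical revocation list: (issuer, certificate serial number) pairs.
REVOKED = {("CN=Institution X CA", 1002)}


def is_trusted(issuer, serial, required_level):
    """Accept a credential only if its issuer is trusted at or above the
    required level and the credential itself has not been revoked."""
    level = TRUSTED_AUTHORITIES.get(issuer)
    if level is None:
        return False  # unknown signer: not part of the trust fabric
    if (issuer, serial) in REVOKED:
        return False  # revoked credential
    return LEVELS.index(level) >= LEVELS.index(required_level)
```

Adapting trust after a breach corresponds, in this sketch, to adding entries to `REVOKED` or removing an authority from `TRUSTED_AUTHORITIES`.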

15 Secure Clinical Research Support with GAARDS
- Use Dorian for grid authentication
  - Integrate with my LDAP user database and authentication
- Use Grid Grouper (along with local mechanisms) for Grid authorization
  - I let reviewers from institution X access patient data in the “Watson” research trial for review only
  - Data Entry personnel for the research trial have permission to add new data, but not update existing data
  - I bar institution X from accessing any other data I’m sharing on the Grid
- Use GTS to update the grid trust fabric
  - I trust institution Y after finalizing data sharing agreements for the Watson research
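The authorization policies in this scenario amount to group-based rules with deny-by-default. A minimal sketch follows, assuming made-up group names, user identities, and resource names; it is not the Grid Grouper or CSM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Grants a set of actions on one resource to members of one group."""
    group: str
    resource: str
    actions: frozenset


# Hypothetical group memberships.
GROUPS = {
    "watson-reviewers": {"alice@institution-x"},
    "watson-data-entry": {"bob@local"},
}

# Hypothetical policy mirroring the slide: reviewers may only read,
# data entry personnel may only create (add) records.
RULES = [
    Rule("watson-reviewers", "watson-trial", frozenset({"read"})),
    Rule("watson-data-entry", "watson-trial", frozenset({"create"})),
]


def is_permitted(user, action, resource):
    """Deny by default; allow only when a rule grants the action on the
    resource to a group that contains the user."""
    return any(
        user in GROUPS.get(rule.group, set())
        and rule.resource == resource
        and action in rule.actions
        for rule in RULES
    )
```

Because nothing is allowed unless a rule grants it, the reviewer from institution X is automatically barred from every other shared resource without needing an explicit deny rule.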

16 caGrid Data Service Infrastructure
- caGrid Data Services provide capability to expose data resources to the Grid
  - Specialization of caGrid grid services to expose data through a common query interface
  - Introduce extensions to create data services from information models and using caCORE SDK
- Queries are made with caBIG Query Language (CQL) query objects
  - A query specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties
  - Ability to return full objects, a set of attributes, a count of results, or distinct attribute values
- Support for Bulk Data Transport for efficient transfer of large data volumes
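The query model described here (a target type, property and nested-property constraints, and four kinds of result) can be illustrated with a toy evaluator over in-memory records. This is a sketch of the idea only: it does not use real CQL syntax or any caGrid API, and the record layout, `run_query`, and its parameters are assumptions.

```python
def get_path(obj, path):
    """Follow a dotted path such as 'patient.site' through nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def run_query(objects, target, where, returns="objects", attributes=()):
    """Select instances of `target` whose (possibly nested) properties equal
    the values in `where`, then shape the result as requested."""
    matches = [
        o for o in objects
        if o["type"] == target
        and all(get_path(o, path) == value for path, value in where.items())
    ]
    if returns == "objects":
        return matches
    if returns == "count":
        return len(matches)
    if returns == "attributes":
        return [{a: get_path(o, a) for a in attributes} for o in matches]
    if returns == "distinct":
        (attr,) = attributes  # exactly one attribute is expected here
        return sorted({get_path(o, attr) for o in matches})
    raise ValueError(f"unknown result mode: {returns}")


# Made-up records standing in for a data service's domain objects.
DATA = [
    {"type": "Specimen", "id": 1, "tissue": "lung", "patient": {"site": "A"}},
    {"type": "Specimen", "id": 2, "tissue": "lung", "patient": {"site": "B"}},
    {"type": "Specimen", "id": 3, "tissue": "liver", "patient": {"site": "A"}},
    {"type": "Image", "id": 4, "tissue": "lung", "patient": {"site": "A"}},
]
```

For example, `run_query(DATA, "Specimen", {"patient.site": "A"}, "attributes", ("id",))` selects specimens by a nested property and returns only their `id` attribute.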

17 Federated Query Processor Service
- Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services
- Can be used to express queries against any combination of caGrid data services, since each service uses CQL
- Federated queries are expressed using DCQL, an extension to CQL
  - Express joins, aggregations, and target data services
- Client API provides a means of expressing DCQL queries
- Federated Query Processor service partitions a DCQL query into queries to respective data services, carries out joins and aggregations, and compiles the results
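The partition, join, and aggregate steps can be sketched with two in-memory stand-ins for data services. Everything here is hypothetical: the service names, the row layout, and `federated_join` are assumptions made for illustration, and the sketch does not reflect DCQL syntax or the Federated Query Processor's actual implementation.

```python
from collections import Counter

# Made-up rows held by two separate "data services".
MICROARRAY_ROWS = [
    {"patient_id": "p1", "gene": "TP53"},
    {"patient_id": "p2", "gene": "TP53"},
    {"patient_id": "p3", "gene": "EGFR"},
]
IMAGE_ROWS = [
    {"patient_id": "p1", "modality": "MRI"},
    {"patient_id": "p2", "modality": "CT"},
    {"patient_id": "p2", "modality": "MRI"},
    {"patient_id": "p4", "modality": "MRI"},
]


def select(rows, criteria):
    """Local query a single service would answer: equality filters only."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


# Each service is modeled as a callable that takes a sub-query.
SERVICES = {
    "microarray": lambda criteria: select(MICROARRAY_ROWS, criteria),
    "image": lambda criteria: select(IMAGE_ROWS, criteria),
}


def federated_join(left, right, key):
    """`left` and `right` are (service name, criteria) pairs.
    1. Partition: send each sub-query to its own service.
    2. Join: combine the returned rows on the shared `key`."""
    left_rows = SERVICES[left[0]](left[1])
    right_rows = SERVICES[right[0]](right[1])
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left_rows for r in index.get(l[key], [])]


# 3. Aggregate: count image modalities for patients with a TP53 record.
joined = federated_join(("microarray", {"gene": "TP53"}), ("image", {}), "patient_id")
modality_counts = Counter(row["modality"] for row in joined)
```

The join is done with a hash index on the right-hand rows, which is one simple choice; a real processor could equally push the join key values into the second service's query to reduce the data transferred.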

18 Workflow Service
- Provides capability to describe “orchestrations” of service invocations and data movement
- Supports two workflow execution engines
  - ActiveBPEL (Deprecated in caGrid 1.4)
  - Taverna
- Coupled with semantic discovery, service metadata, and registration of data type structures in caGrid, provides a powerful framework for analyzing data
- Services can be dynamically discovered and federated queries can be invoked as part of a workflow

19 Putting It Together for Example Scenario
- Location A: Microarray, Protein, Image data
- Location B: Microarray, Protein, Image data
- Location C: Microarray, Protein, Image data
- Location C: Image Analysis
- Location D: Image Analysis
- Microarray and protein databases at other institutions
- caGrid Service Interfaces
- caGrid Environment: Registered Object Definitions; Advertisement; Log on, Grid credentials; Query and Analysis Workflow; Discovery
