Cancer Biomedical Informatics Grid (caBIG) – An Approach towards Data Access and Integration. Avinash Shanbhag, Director, Core Infrastructure Engineering.

Presentation transcript:

0 Cancer Biomedical Informatics Grid (caBIG) – An Approach towards Data Access and Integration Avinash Shanbhag Director, Core Infrastructure Engineering National Cancer Institute Center for Bioinformatics

1 National Cancer Institute 2015 Goal: Relieve suffering and death due to cancer by the year 2015

2 Origins of caBIG
• Need: Enable investigators and research teams nationwide to combine and leverage their findings and expertise in order to meet the NCI 2015 Goal.
• Strategy: Create a scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network over which data can be seamlessly shared.

3 caBIG Challenges
• Handle diversity of data types
• Precise “Meaning” of data
• Provide local hosting of data
• Local access control
• Provide tools to “publish” and “access” data easily
• High Performance computing will be needed in future

4 Interoperability: ability of a system to access and use the parts or equipment of another system
• Syntactic interoperability
• Semantic interoperability

5 How to Achieve Interoperability for Data Systems?
• Well Documented public API access to data
• Based on object oriented abstraction of underlying data
  – No particular technology or tool specified
• Abstraction layer must be derived using widely accepted “standards”
  – Model Driven Architecture
• Information Model is the “Metadata” of the data and needs to be persisted and accessible via API
• Need to be able to “unambiguously” and programmatically determine the meaning of data

6 OMG Model Driven Architecture (MDA) Approach
• Analyze the problem space and develop the artifacts for each scenario
  – Use Cases
• Use Unified Modeling Language (UML) to standardize model representations and artifacts. Design the system by developing artifacts based on the use cases
  – Class Diagram: Information Model
  – Sequence Diagram: Temporal Behavior
• Use meta-model tools to generate the code

7 Limitations of MDA
• Limited expressivity for semantics
• No facility for runtime semantic metadata management

8 caCORE: Syntactic and Semantic Integration
MDA plus a whole lot more!

9 caCORE
[Diagram labels: Bioinformatics Objects, Enterprise Vocabulary, Common Data Elements, Security]

10 Use Cases
• Description
• Actors
• Basic Course
• Alternative Course

11 Bioinformatics Objects

12 Common Data Elements
• What do all those data classes and attributes actually mean, anyway?
• Data descriptors or “semantic metadata” required
• Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs.
• NCI uses the ISO/IEC standard for metadata structure and registration
• Semantics all drawn from Enterprise Vocabulary Service resources

13 Enterprise Vocabulary: Description Logic
• Preferred Name
• Synonyms
• Definition
• Relationships
• Concept Code

14 Semantic metadata example: Agent Taxol 007

15 Why do you need metadata?
Class/Attribute | Example Object Data | CIA Metadata | NCI Metadata
Agent | (none) | A sworn intelligence agent; a spy | Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition
Agent nSCNumber | 007 | Identifier given to an intelligence agent by the National Security Council | Identifier given to chemical compound by the US Food and Drug Administration Nomenclature Standards Committee
Agent name | Taxol | CIA code name given to intelligence agents | Common name of chemical compound used as an agent

16 Computable Interoperability
[Diagram: "My model" class Agent (name, nSCNumber, FDAIndID, CTEPName, IUPACName) and "Your model" class Drug (id, NDCCode, approver, approvalDate, fdaCode), linked through concept codes C1708 and C1708:C41243]
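
A minimal Java sketch of the idea on this slide: two independently designed models become comparable once their elements carry codes from a shared vocabulary. The class `ConceptMatch`, the `Annotated` type, the `X-LOCAL-*` codes, and the assignment of codes to particular elements are assumptions for illustration. The slide only shows that the codes C1708 and C1708:C41243 link the two models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConceptMatch {
    /** A model element (class or attribute) tagged with a vocabulary concept code. */
    static final class Annotated {
        final String model;
        final String element;
        final String conceptCode;

        Annotated(String model, String element, String conceptCode) {
            this.model = model;
            this.element = element;
            this.conceptCode = conceptCode;
        }

        @Override
        public String toString() {
            return model + ":" + element;
        }
    }

    /** Returns, per concept code, the elements that share it across at least two models. */
    static Map<String, List<Annotated>> sharedConcepts(List<Annotated> elements) {
        Map<String, List<Annotated>> byCode = new TreeMap<>();
        for (Annotated e : elements) {
            byCode.computeIfAbsent(e.conceptCode, k -> new ArrayList<>()).add(e);
        }
        // Drop codes that are used by a single model only.
        byCode.values().removeIf(group ->
                group.stream().map(a -> a.model).distinct().count() < 2);
        return byCode;
    }

    /** Hypothetical annotations; which element carries which code is assumed. */
    static List<Annotated> sample() {
        List<Annotated> all = new ArrayList<>();
        all.add(new Annotated("mine", "Agent", "C1708"));
        all.add(new Annotated("mine", "Agent.name", "C1708:C41243"));
        all.add(new Annotated("mine", "Agent.nSCNumber", "X-LOCAL-1")); // made-up code
        all.add(new Annotated("yours", "Drug", "C1708"));
        all.add(new Annotated("yours", "Drug.id", "C1708:C41243"));
        all.add(new Annotated("yours", "Drug.approver", "X-LOCAL-2")); // made-up code
        return all;
    }

    public static void main(String[] args) {
        sharedConcepts(sample()).forEach((code, group) ->
                System.out.println(code + " -> " + group));
    }
}
```

Matching on codes rather than on names is the point: `Agent` and `Drug` share no identifier text, yet a program can still pair them.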

17 Cancer Data Standards Repository
• ISO/IEC Registry for Common Data Elements – units of semantic metadata
• Client for Enterprise Vocabulary: metadata constructed from controlled terminology and annotated with concept codes
• Precise specification of Classes, Attributes, Data Types, Permissible Values: Strong typing of data objects.
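
As a rough illustration of what "strong typing of data objects" can mean in practice, the Java sketch below checks a value against a declared data type and a set of permissible values. `CdeCheck`, `DataElement`, the `route` attribute, and its values are all hypothetical. This is not the caDSR data model or API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class CdeCheck {
    /** Hypothetical, heavily simplified stand-in for a registered data element. */
    static final class DataElement {
        final String className;
        final String attribute;
        final Class<?> dataType;
        final Set<String> permissibleValues; // empty set means "not enumerated"

        DataElement(String className, String attribute, Class<?> dataType,
                    String... permissibleValues) {
            this.className = className;
            this.attribute = attribute;
            this.dataType = dataType;
            this.permissibleValues = new LinkedHashSet<>(Arrays.asList(permissibleValues));
        }

        /** True when the value has the declared type and, if enumerated, is permitted. */
        boolean accepts(Object value) {
            if (!dataType.isInstance(value)) {
                return false; // wrong type, or null
            }
            return permissibleValues.isEmpty()
                    || permissibleValues.contains(value.toString());
        }
    }

    /** Made-up element: an Agent.route attribute restricted to two values. */
    static DataElement example() {
        return new DataElement("Agent", "route", String.class, "ORAL", "INTRAVENOUS");
    }

    public static void main(String[] args) {
        DataElement route = example();
        System.out.println(route.accepts("ORAL"));     // permitted value
        System.out.println(route.accepts("by mouth")); // right type, value not permitted
        System.out.println(route.accepts(42));         // wrong data type
    }
}
```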

18 caCORE Tools
• UML Loader: automatically register UML models as metadata components
• CDE Curation: Fine tune metadata and constrain permissible values with data standards
• Form Builder: Create standards-based data collection forms
• CDE Browser: search and export metadata components
• Common Security Module: Provides role based security

19 caCORE Software Development Kit
• UML Modeling Tool (any with XMI export)
• Semantic Connector (concept binding utility)
• UML Loader (model registration in caDSR)
• Codegen (middleware code generator)
• Security Adaptor (Common Security Module)
The caCORE SDK generates a syntactically and semantically interoperable data service system.

20 caGrid: caCORE meets grid technology!

21 Use cases not satisfied by caCORE alone
• Advertisement
  – Service Provider composes service metadata describing the service and publishes it to the grid.
• Discovery
  – Researcher (or application developer) specifies search criteria describing a service of interest
  – The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria, and returns the list.
• Invocation
  – Researcher (or application developer) instantiates the grid service and accesses its resources
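
The three use cases can be sketched with an in-memory index standing in for the grid. Everything named here (`GridRegistrySketch`, `ServiceEntry`, the service names, the code C9999) is hypothetical. A real deployment would publish to a remote index service and invoke over the network, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class GridRegistrySketch {
    /** Service metadata a provider publishes; the fields are illustrative only. */
    static final class ServiceEntry {
        final String name;
        final Function<String, String> operation; // stand-in for a remote call
        final Set<String> concepts;               // concept codes the service exposes

        ServiceEntry(String name, Function<String, String> operation, String... concepts) {
            this.name = name;
            this.operation = operation;
            this.concepts = new HashSet<>(Arrays.asList(concepts));
        }
    }

    private final List<ServiceEntry> index = new ArrayList<>();

    /** Advertisement: the provider publishes its metadata to the index. */
    void advertise(ServiceEntry entry) {
        index.add(entry);
    }

    /** Discovery: return every advertised service that exposes the given concept. */
    List<ServiceEntry> discover(String conceptCode) {
        List<ServiceEntry> matches = new ArrayList<>();
        for (ServiceEntry e : index) {
            if (e.concepts.contains(conceptCode)) {
                matches.add(e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        GridRegistrySketch grid = new GridRegistrySketch();
        grid.advertise(new ServiceEntry("AgentData", q -> "AgentData handled " + q, "C1708"));
        grid.advertise(new ServiceEntry("OtherData", q -> "OtherData handled " + q, "C9999"));

        // Invocation: call each service that discovery returned.
        for (ServiceEntry s : grid.discover("C1708")) {
            System.out.println(s.operation.apply("query-1"));
        }
    }
}
```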

22 [Diagram labels: Gold, Silver; Cancer Center, Cancer Center, NCI, other caBIG service providers, other toolkits]

23 caGrid Components
• Leverage existing technologies:
  – caDSR, EVS, Mobius GME: Common data elements, controlled vocabularies, schema management
  – Globus Toolkit (currently version 4.0.1): Core grid services infrastructure; service deployment, service registry, invocation, base security infrastructure
• Additional Core Infrastructure
  – Higher-level security services (Dorian)
  – Grid service access to metadata components (caDSR, GME, etc)
  – Workflow, Identifier services
• Service Provider Tooling (Introduce)
  – Graphical service development and configuration environment
  – Abstractions from service infrastructure for Data and Analytical services
  – Deployment wizards
• Client Tooling
  – High-level APIs for interacting with core components and services
  – Graphical Tools

24 caGrid 0.5 Architecture (may be updated for 1.0)
[Architecture diagram labels: Grid Communication Protocol, Service Description, Service, Business Process, Service Registry, Security, Semantic service, Resource Management, Functions, Quality of Service, ID Resolution, Transport, GSI, GUMS, GT3, Analytical, OGSA-DAI, Globus Toolkit, caDSR, EVS, UI, Index, GME, CAMS]

25 Data Object Semantics, Metadata, and Schemas
• Object oriented, APIs, well-defined data types
• Classes defined in UML and converted into ISO/IEC 11179, registered in the caDSR
• Definitions drawn from Enterprise Vocabulary Services (EVS), relationships semantically described
• XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)

26 Introduce Toolkit
• A framework which enables fast and easy creation of caGrid compatible services whether they are data, analytical, custom, or core services.
• Provide easy to use graphical service authoring tools.
• Hide all “grid-ness” from the developer so that they can concentrate on the domain expert implementation.
• Utilize best practice layered grid service architectures.
• Handle all service architecture requirements of the caGrid.
  – Strong service interface data typing
  – Metadata and service registration
  – Grid security integration

27 Data Service Access on caGrid
• Specialization of caGrid grid services to expose data through a common query interface
• Present an object view of data sources
• Exposed objects are registered in caDSR and their XML representation in GME
• Queries made with caBIG Query Language (CQL) Query objects
• Results returned as objects (or identifiers) nested in a CQL Query Result Set

28 Data Service Query Language
• Specialization of caGrid grid services to expose data through a common query interface
• Present an object view of data sources
• Exposed objects are registered in caDSR and their XML representation in GME
• Queries made with CQL Query objects
• Results returned as objects (or identifiers) nested in a CQL Query Result Set

29 Data Service Interface
public CQLQueryResultsType processQuery(CQLQueryType query)
• Data Provider’s only responsibility is to implement CQL over their local data resource
  – A default implementation will be provided for caCORE SDK created systems
• caGrid provides grid service implementation to invoke provider’s CQL implementation
• Service provides all features necessary for compliance, such as advertisement of data service metadata, and security integration
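
To make the division of responsibility concrete, here is a hedged Java sketch of a provider implementing the single `processQuery` operation over a local resource. The nested `CQLQueryType` and `CQLQueryResultsType` classes are simplified stand-ins invented for this example and are not the real caGrid types. The query is reduced to one target class plus one attribute-equals-value filter, and the second sample record is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataServiceSketch {
    /** Stand-in query: a target class plus a single attribute filter. */
    static final class CQLQueryType {
        final String targetClass;
        final String attribute;
        final String value;

        CQLQueryType(String targetClass, String attribute, String value) {
            this.targetClass = targetClass;
            this.attribute = attribute;
            this.value = value;
        }
    }

    /** Stand-in result set: matching objects represented as attribute maps. */
    static final class CQLQueryResultsType {
        final List<Map<String, String>> objects = new ArrayList<>();
    }

    /** The one operation named on the slide. */
    interface DataService {
        CQLQueryResultsType processQuery(CQLQueryType query);
    }

    /** Provider-side implementation over a local, in-memory data resource. */
    static final class InMemoryDataService implements DataService {
        private final Map<String, List<Map<String, String>>> store = new HashMap<>();

        void add(String className, Map<String, String> object) {
            store.computeIfAbsent(className, k -> new ArrayList<>()).add(object);
        }

        @Override
        public CQLQueryResultsType processQuery(CQLQueryType query) {
            CQLQueryResultsType results = new CQLQueryResultsType();
            List<Map<String, String>> candidates =
                    store.getOrDefault(query.targetClass, Collections.emptyList());
            for (Map<String, String> obj : candidates) {
                if (query.value.equals(obj.get(query.attribute))) {
                    results.objects.add(obj);
                }
            }
            return results;
        }
    }

    /** Builds a map from alternating key/value arguments. */
    static Map<String, String> obj(String... keyValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    static InMemoryDataService sample() {
        InMemoryDataService svc = new InMemoryDataService();
        svc.add("Agent", obj("name", "Taxol", "nSCNumber", "007"));
        svc.add("Agent", obj("name", "ExampleDrug", "nSCNumber", "000")); // made-up record
        return svc;
    }

    public static void main(String[] args) {
        CQLQueryResultsType r =
                sample().processQuery(new CQLQueryType("Agent", "name", "Taxol"));
        System.out.println(r.objects);
    }
}
```

Because callers depend only on `DataService`, the in-memory store could be swapped for a database-backed one without changing client code, which is the point of the common query interface.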

30 Data Service Query Scenario
1. Client builds a CQL Query
2. CQL Query is serialized and submitted to the Grid Data Service
3. Grid Data Service deserializes the CQL Query Object and processes it
4. Data Source is queried by the Grid Data Service
5. Grid Data Service builds a CQL Result Set
6. Result Set is serialized and returned to the client
7. Client deserializes result set
8. Result set is iterated with client tools to retrieve objects

31 Federated and Aggregated Queries
• Componentized library being developed to facilitate limited federated and aggregated queries
• An extension language used to describe distributed queries
• Library creates and executes a Query Plan for the distributed query, using multiple CQL queries to targeted data services
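
A simplified Java sketch of the plan-then-execute pattern described on this slide. It assumes the plainest possible plan, in which the same query goes to every target service and the rows are concatenated. `FederatedQuerySketch`, `PlanStep`, and the `invoker` function (a stand-in for a remote data service call) are hypothetical names. The library described on the slide is not shown in the source, so none of this reflects its actual design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FederatedQuerySketch {
    /** One step of a query plan: a target service and the query sent to it. */
    static final class PlanStep {
        final String service;
        final String query;

        PlanStep(String service, String query) {
            this.service = service;
            this.query = query;
        }
    }

    /** Builds a plan that sends the same query to every target service. */
    static List<PlanStep> plan(String query, List<String> services) {
        List<PlanStep> steps = new ArrayList<>();
        for (String s : services) {
            steps.add(new PlanStep(s, query));
        }
        return steps;
    }

    /** Executes each step in order and aggregates all returned rows into one list. */
    static List<String> execute(List<PlanStep> plan,
                                Function<PlanStep, List<String>> invoker) {
        List<String> aggregated = new ArrayList<>();
        for (PlanStep step : plan) {
            aggregated.addAll(invoker.apply(step));
        }
        return aggregated;
    }

    public static void main(String[] args) {
        List<PlanStep> steps = plan("Agent where name = Taxol",
                Arrays.asList("siteA", "siteB"));
        // The invoker fakes a remote call by echoing the service name.
        List<String> rows = execute(steps,
                step -> Arrays.asList(step.service + " result for: " + step.query));
        rows.forEach(System.out::println);
    }
}
```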

32 Data Service Client Tooling
• APIs provided to discover available data services on the grid based on client-defined criteria (such as exposed data models and concepts)
• Object-Oriented API for building queries, querying a given data service, and processing the results
• Client tools available to iterate query result sets
  – Object iterator deserializes XML into registered objects
  – XML iterator simply returns XML documents

33 Acknowledgements (caGrid Team)
• Ohio State University - Department of BioMedical Informatics
  – Dave Ervin
  – Shannon Hastings
  – Tahsin Kurc
  – Stephen Langella
  – Scott Oster
  – Joel Saltz
• Argonne National Lab / University of Chicago
  – William Allcock
  – Jarek Gawor
  – Ravi Madduri
  – Frank Siebenlist
  – Michael Wilde
• Duke University
  – A. Jamie Cuticchia
  – Patrick McConnell
• Georgetown University
  – Colin Freas
  – Paul A. Kennedy
  – Chad La Joie
• SAIC
  – Manav Kher
• ScenPro/Semantic Bits
  – Vinay Kumar
  – David Wellborn
  – Valerie Bragg
• Booz | Allen | Hamilton
  – Arumani Manisundaram
  – Michael Keller
  – Reechik Chatterjee

34 Acknowledgements
• NCI: Andrew von Eschenbach, Anna Barker, Wendy Patterson, OC, DCTD, DCB, DCP, DCEG, DCCPS, CCR
• Industry Partners: SAIC, BAH, Oracle, ScenPro, Ekagra, Apelon, Terrapin Systems, Panther Informatics
• NCICB: Ken Buetow, Peter Covitz, George Komatsoulis, Denise Warzel, Frank Hartel, Sherri De Coronado, Dianne Reeves, Gilberto Fragoso, Jill Hadfield, Leslie Derr