The Cancer Biomedical Informatics Grid: From Village to City. Peter A. Covitz, Ph.D., Director, Core Infrastructure, National Cancer Institute Center for Bioinformatics

Presentation transcript:

0 The Cancer Biomedical Informatics Grid From Village to City Peter A. Covitz, Ph.D. Director, Core Infrastructure National Cancer Institute Center for Bioinformatics

1 National Cancer Institute 2015 Goal Relieve suffering and death due to cancer by the year 2015

2 Origins of caBIG
• Need: Enable investigators and research teams to broadly combine and leverage their findings and expertise in order to meet NCI 2015 Goal.
• Strategy: Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network.

3 Scenario from Strategic Plan
A researcher involved in a phase II clinical trial of a new molecularly targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected. The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might be different between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.

4 From Village to City

5 caBIG Principles
• Open Source
  – Publicly-funded development must yield openly distributable products.
• Open Development
  – Community-driven development aligns needs with development priorities.
• Open Access
  – Data has value beyond original purpose for collection. Scientific method demands verification by peers. Obligation to share publicly-funded data products.
• Federated
  – Local control of deployments. No central “Ministry of Information.” Scalable.

6 Community Priorities
[Bar chart: Number of Needs Reported, by category, for three workspaces: Clinical Trial Management Systems; Tissue Banks & Pathology; Integrative Cancer Research. Category labels: Clinical Data Management Tools & Databases; Staff Resources; Distributed General Data Sharing & Analysis Tools; Translational Research Tools; Access to Data; Tissue & Pathology Tools; Center Integration & Management; Common Data Elements (CDE) & Architecture; Meta-Project; Vocabulary & Ontology Tools & Databases; Statistical Data Analysis Tools; Visualization & Front-End Tools; Remote/Bandwidth; Proteomics; Microarray & Gene Expression Tools; Meeting; Laboratory Information Management Systems (LIMS); Licensing Issues; Pathways; High Performance Computing; Integration; Imaging Tools & Databases; Database & Datasets]

7 caBIG Organization Structure
[Organization chart labels: caBIG Oversight; General Contractor; Strategic Working Groups; Architecture; Vocabularies & Common Data Elements; Working Group; Clinical Trial Mgmt; Integrative Cancer Research; Tissue Banks & Pathology Tools; Legend: Working Group, = Project]

8 Interoperability
• in·ter·op·er·a·bil·i·ty
  – ability of a system...to use the parts or equipment of another system (Source: Merriam-Webster web site)
• interoperability
  – ability of two or more systems or components to exchange information and to use the information that has been exchanged (Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990)
[Figure labels: Syntactic interoperability; Semantic interoperability. Courtesy: Charlie Mead]

9 caBIG Compatibility Guidelines
[Figure labels: SYNTACTIC; SEMANTIC]

10 Model-Driven Architecture

11

12 MDA Approach
• Analyze the problem space and develop the artifacts for each scenario
  – Use Cases
• Use Unified Modeling Language (UML) to standardize model representations and artifacts. Design the system by developing artifacts based on the use cases
  – Class Diagram – Information Model
  – Sequence Diagram – Temporal Behavior
• Use meta-model tools to generate the code
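The last step of the MDA approach, generating code from a model, can be illustrated with a minimal Python sketch. It is not the caCORE tooling: the `ClassSpec` and `AttributeSpec` structures, the `generate_source` function, and the use of Python dataclasses as the output format are all assumptions made for illustration. The Agent class and its attributes are borrowed from the example used later in this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeSpec:
    """One attribute of a modeled class (hypothetical stand-in for a UML attribute)."""
    name: str
    type_name: str


@dataclass
class ClassSpec:
    """One modeled class (hypothetical stand-in for a UML class)."""
    name: str
    attributes: List[AttributeSpec] = field(default_factory=list)


def generate_source(spec: ClassSpec) -> str:
    """Emit Python source text for a dataclass that mirrors the model."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {spec.name}:",
    ]
    if not spec.attributes:
        lines.append("    pass")
    for attr in spec.attributes:
        lines.append(f"    {attr.name}: {attr.type_name}")
    return "\n".join(lines) + "\n"


def build_class(spec: ClassSpec) -> type:
    """Execute the generated source and return the resulting class object."""
    namespace = {"__name__": "generated_model"}
    exec(generate_source(spec), namespace)
    return namespace[spec.name]


# The information model: class and attribute names follow the Agent example.
agent_model = ClassSpec(
    "Agent",
    [AttributeSpec("name", "str"), AttributeSpec("nSCNumber", "str")],
)

Agent = build_class(agent_model)
taxol = Agent(name="Taxol", nSCNumber="007")
print(generate_source(agent_model))
print(taxol)
```

The point of the sketch is that the model is the single source of truth: changing `agent_model` changes the generated class without hand-editing any code.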

13 Limitations of MDA
• Limited expressivity for semantics
• No facility for runtime semantic metadata management

14 caCORE MDA plus a whole lot more!

15 caCORE
[Diagram labels: Bioinformatics Objects; Enterprise Vocabulary; Common Data Elements; Security]

16 Use Cases
• Description
• Actors
• Basic Course
• Alternative Course

17 Bioinformatics Objects

18 Common Data Elements
• What do all those data classes and attributes actually mean, anyway?
• Data descriptors or “semantic metadata” required
• Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs.
• NCI uses the ISO/IEC 11179 standard for metadata structure and registration

19 Semantic metadata example: Agent Taxol 007

20 Why do you need metadata?
Class/Attribute | NCI Metadata | CIA Metadata | Example Value
Agent | Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition | A sworn intelligence agent; a spy |
Agent nSCNumber | Identifier given to chemical compound by the US Food and Drug Administration (FDA) Nomenclature Standards Committee (NSC) | Identifier given to an intelligence agent by the National Security Council | 007
Agent name | Common name of chemical compound used as an agent | CIA code name given to intelligence agents | Taxol

21 Cancer Data Standards Repository
• ISO/IEC Registry for Common Data Elements – units of semantic metadata
• Precise definitions of Classes, Attributes, Data Types, Permissible Values: Strong typing of data objects.
• Tools:
  – UML Loader: automatically register UML models as metadata components
  – CDE Curation: Fine tune metadata and constrain permissible values with data standards
  – Form Builder: Create standards-based data collection forms
  – CDE Browser: search and export metadata components
• Client for Enterprise Vocabulary: metadata constructed from ontology terms and concepts.
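To make "strong typing of data objects" concrete, here is a minimal Python sketch of a common data element that carries a definition, a data type, and an optional set of permissible values, and that can check a value against them. This is an illustration only, not the caDSR data model or API: the `CommonDataElement` class, its field names, the `public_id` values, and the `Patient.gender` element with its value set are all hypothetical. The Agent name definition is taken from the metadata example earlier in the talk.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical, simplified unit of semantic metadata."""
    public_id: str                    # hypothetical registry identifier
    class_name: str
    attribute_name: str
    definition: str
    data_type: type
    permissible_values: Optional[FrozenSet[str]] = None  # None means unconstrained

    def validate(self, value) -> bool:
        """Return True if the value has the right type and is permitted."""
        if not isinstance(value, self.data_type):
            return False
        if self.permissible_values is not None:
            return value in self.permissible_values
        return True


# Definition text comes from the Agent metadata example in this talk.
agent_name = CommonDataElement(
    public_id="EXAMPLE-1",
    class_name="Agent",
    attribute_name="name",
    definition="Common name of chemical compound used as an agent",
    data_type=str,
)

# Entirely hypothetical element, used only to show a constrained value set.
patient_gender = CommonDataElement(
    public_id="EXAMPLE-2",
    class_name="Patient",
    attribute_name="gender",
    definition="Hypothetical definition for illustration",
    data_type=str,
    permissible_values=frozenset({"Female", "Male", "Unknown"}),
)

print(agent_name.validate("Taxol"))
print(patient_gender.validate("F"))
```

A free-text abbreviation such as "F" is rejected because it is not among the permissible values, which is the kind of constraint the CDE Curation tool is described as managing.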

22 Enterprise Vocabulary: Description Logic Ontologies
[Diagram labels: Preferred Name; Synonyms; Definition; Relationships; Concept Code]

23 Tying it all together: The caCORE semantic management framework
[Diagram labels: Ontology; Metadata; ID; Concept Codes (e.g., C1708:C42614); Enterprise Vocabulary; Common Data Elements; Bioinformatics Objects]

24 Computable Interoperability
[Diagram: “My model” with class Agent (name, nSCNumber, FDAIndID, CTEPName, IUPACName) and “Your model” with class Drug (id, NDCCode, approver, approvalDate, fdaCode), annotated with concept codes C1708:C41243 and C1708]
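The idea behind computable interoperability can be sketched in a few lines of Python: if two independently built models annotate their elements with codes from a shared vocabulary, software can find corresponding elements even when the names differ. The class names and the two concept codes come from the slide; which element carries which code is an assumption made for illustration, and the `shared_concepts` function is hypothetical.

```python
from typing import Dict, List, Tuple

# Element -> concept code annotations.
# Assumption: the pairing of elements with codes below is illustrative only.
my_model: Dict[str, str] = {
    "Agent": "C1708",
    "Agent.name": "C1708:C41243",
}
your_model: Dict[str, str] = {
    "Drug": "C1708",
}


def shared_concepts(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Group elements of both models by concept code and keep the codes
    that appear in both models."""
    by_code: Dict[str, Tuple[List[str], List[str]]] = {}
    for element, code in a.items():
        by_code.setdefault(code, ([], []))[0].append(element)
    for element, code in b.items():
        by_code.setdefault(code, ([], []))[1].append(element)
    return {code: pair for code, pair in by_code.items() if pair[0] and pair[1]}


matches = shared_concepts(my_model, your_model)
for code, (mine, yours) in matches.items():
    print(code, mine, yours)
```

Under these assumed annotations, Agent and Drug are reported as the same concept despite having different names, while an element whose code appears in only one model is left unmatched.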

25 caCORE Software Development Kit

26 caCORE SDK Components
• UML Modeling Tool (we use Enterprise Architect)
  – Information domain model defines data classes, attributes and relationships
• Semantic Connector (included in download)
  – Annotates UML model with ontology concepts: bridges the world of databases to that of structured semantics
• UML Loader (run by NCICB staff for now)
  – Loads model into the caDSR metadata registry
  – Model and associated semantics are available as metadata at runtime
• Code Generator (included in download)
  – UML model used as input into code generator
  – Produces object-oriented middleware that instantiates model
  – Object-relational mappings tie middleware to databases and other storage/retrieval systems.
  – Programming interfaces provide access to system for application developers (Java APIs currently implemented; Web Services in upcoming release)
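The role of an object-relational mapping, tying domain objects to a database behind a programming interface, can be shown with a small Python sketch. It is not the middleware the SDK generates: the `Mapper` class, the table name, and the use of an in-memory SQLite database are assumptions chosen to keep the example self-contained. The Agent class and its values follow the example used elsewhere in this talk.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import List


@dataclass
class Agent:
    """Domain object; attribute names follow the Agent example in this talk."""
    name: str
    nSCNumber: str


class Mapper:
    """Hypothetical minimal object-relational mapper for a flat dataclass."""

    def __init__(self, conn: sqlite3.Connection, cls: type, table: str):
        self.conn, self.cls, self.table = conn, cls, table
        self.columns = [f.name for f in fields(cls)]
        column_defs = ", ".join(f"{c} TEXT" for c in self.columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")

    def save(self, obj) -> None:
        """Insert one domain object as one row."""
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT INTO {self.table} VALUES ({placeholders})", astuple(obj)
        )

    def find_by(self, column: str, value: str) -> List:
        """Return domain objects whose column equals the given value."""
        if column not in self.columns:
            raise ValueError(f"unknown attribute: {column}")
        cursor = self.conn.execute(
            f"SELECT {', '.join(self.columns)} FROM {self.table} "
            f"WHERE {column} = ?",
            (value,),
        )
        return [self.cls(*row) for row in cursor]


conn = sqlite3.connect(":memory:")
agents = Mapper(conn, Agent, "agent")
agents.save(Agent(name="Taxol", nSCNumber="007"))
found = agents.find_by("name", "Taxol")
print(found)
```

Application code works only with `Agent` objects and the mapper's methods; the SQL stays behind the interface, which is the separation the generated middleware is described as providing.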

27 caCORE Architecture
[Diagram labels: Java Applications; HTTP Clients; SOAP Clients; Perl Clients; Data Clients; Web Application Server; Interfaces: Java, SOAP, XML; Middleware; API; Domain Objects [Gene, Disease, Agent, etc.]; Data Access Objects; Enterprise Vocabulary; Common Data Elements; Biomedical Data]

28 [Diagram labels: Cancer Center; Cancer Center; NCI; caGrid; Other caBIG Service Providers; Other Toolkits]

29 caGrid Service-Oriented Architecture
[Diagram labels: OGSA Compliant - Service Oriented Architecture; Functions; Service Description; Service Workflow; Service Registry; Security; Semantic Service; Resource Management; Quality of Service; ID Resolution; Grid Communication Protocol; Transport; GSI; CAS; myProxy; Globus; OGSA-DAI; Globus GRAM; Globus Toolkit; caCORE; Mobius]

30 caBIG Compatible Software and Data Resources
• caArray - Cancer microarray data management system
• C3D - Clinical Trials data capture application
• C3PR - Clinical trial participant registry tool
• caWorkbench - Microarray analysis suite
• caTIES - Automated free-text pathology data extraction tool
• caTISSUE - Biospecimen database and tracking system
• RProteomics - MALDI-TOF proteomics analysis tool
• Gene Ontology Miner (GOMiner) - Tool for aggregate analysis of gene sets
• HapMap - caBIG accessible map of haplotypes in human genome
• Promoter Database
• UniProt-PIR - Protein sequence and annotation database
• Curated Cancer Pathways Data - Data sets generated from NCI 60 cell lines
• Human-Mouse Anatomy Ontology
• Nutritional Compound Ontology
*Note: Examples of upcoming 2006 Products and Data Sets
• Distance Weighted Discrimination - Microarray data analysis integrator
• Cancer Molecular Pages Prototype - Cancer gene annotation with web-based visualization
• Magellan - Tool for the analysis of heterogeneous data types (e.g., microarray)
• Visual and Statistical Data Analyzer (VISDA) - Multivariate statistical visualization tool for the analysis of complex data
• FunctionExpress - Tool for integrated analysis and visualization of Microarray data
• Quantitative Pathway Analysis in Cancer (QPACA) - Pathway modeling and analysis tool
• TrAPSS - Disease gene mutation discovery and analysis tool
• Proteomics Laboratory Information Management System Prototype
• SEED - Peer-to-Peer genome annotation tool
• Pathways Tool Project - Pathway visualization tools
• LexGrid - Ontology hosting software

31
NCI: Andrew von Eschenbach, Anna Barker, Wendy Patterson, OC, DCTD, DCB, DCP, DCEG, DCCPS, CCR
Industry Partners: SAIC, BAH, Oracle, ScenPro, Ekagra, Apelon, Terrapin Systems, Panther Informatics
NCICB: Ken Buetow, Sue Dubman, Leslie Derr, Frank Hartel, George Komatsoulis, Avinash Shanbhag, Denise Warzel, Sherri De Coronado, Dianne Reeves, Gilberto Fragoso, Jill Hadfield

32 caBIG Participant Community
9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Northwestern University-Robert H. Lurie; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University