Vision and Infrastructure Behind the

Slides:



Advertisements
Similar presentations
Introduction The cancerGrid metadata registry (cgMDR) has proved effective as a lightweight, desktop solution, interoperable with caDSR, targeted at the.
Advertisements

27 June 2005caBIG an initiative of the National Cancer Institute, NIH, DHHS caBIG the cancer Biomedical Informatics Grid Arumani Manisundaram caBIG - Project.
Open Grid Forum 19 January 31, 2007 Chapel Hill, NC Stephen Langella Ohio State University Grid Authentication and Authorization with.
CVRG Presenter Disclosure Information Joel Saltz MD, PhD Director Comprehensive Informatics Center Emory University Translational Research Informatics.
CACORE TOOLS FEATURES. caCORE SDK Features caCORE Workbench Plugin EA/ArgoUML Plug-in development Integrated support of semantic integration in the plugin.
CVRG Presenter Disclosure Information Tahsin Kurc, PhD Center for Comprehensive Informatics Emory University CardioVascular Research Grid Core Infrastructure.
CaBIG™ Terminology Services Path to Grid Enablement Thomas Johnson 1, Scott Bauer 1, Kevin Peterson 1, Christopher Chute 1, Johnita Beasley 2, Frank Hartel.
Dorian Grid Identity Management and Federation Dialogue Workshop II Edinburgh, Scotland February 9-10, 2006 Stephen Langella Department.
NCCCP Informatics Ken Buetow, Ph.D. Leslie Derr, Ph.D. John Speakman NCI Center for Biomedical Informatics and Information Technology June 25, 2007.
Overview of Biomedical Informatics Rakesh Nagarajan.
CaGrid Service Metadata Scott Oster - Ohio State
SOA & BPM Business Architecture, SOA & BPM Learn about SOA and Business Process Management (BPM) Learn how to build process diagrams.
0 The Cancer Biomedical Informatics Grid From Village to City Peter A. Covitz, Ph.D. Director, Core Infrastructure National Cancer Institute Center for.
CaBIG: the cancer Biomedical Informatics Grid Ken Buetow NCICB/NCI/NIH/DHHS.
OpenMDR: Generating Semantically Annotated Grid Services Rakesh Dhaval Shannon Hastings.
Vanderbilt University NascaBIG ® TBPT Face-to-Face Meeting December 6-7, 2011 Nashville, TN caBIG ® TBPT Face-to-Face Meeting December 6-7, 2011 Vanderbilt.
Department of Biomedical Informatics Development of Ontology-anchored Grid-based Data Services to Facilitate Integrative Clinical and Translational Science.
CILogon and InCommon: Technical Update Jim Basney This material is based upon work supported by the National Science Foundation under grant numbers
HATHITRUST A Shared Digital Repository HathiTrust: Putting Research in Context HTRC UnCamp September 10, 2012 John Wilkin, Executive Director, HathiTrust.
OpenMDR: Alternative Methods for Generating Semantically Annotated Grid Services Rakesh Dhaval Shannon Hastings.
DMSO Technical Exchange 3 Oct 03 1 Web Services Supporting Simulation to Global Information Grid Mark Pullen George Mason University with support from.
1 1 caCORE: A Common Framework for Creating, Managing and Deploying Semantically Interoperable Systems SCIop April 27, 2006 Denise Warzel Associate Director,
Cancer Bioinformatics Grid (caBIG) CANS 2006 Chicago, Illinois Shannon Hastings Department of Biomedical Informatics Ohio State University.
Department of Biomedical Informatics Service Oriented Bioscience Cluster at OSC Umit V. Catalyurek Associate Professor Dept. of Biomedical Informatics.
LexEVS Overview Mayo Clinic Rochester, Minnesota June 2009.
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Page 1 Informatics Pilot Project EDRN Knowledge System Working Group San Antonio, Texas January 21, 2001 Steve Hughes Thuy Tran Dan Crichton Jet Propulsion.
Research: Intramural and Extramural Jacquelyn Goldberg, JD CIRB Review Board Administrator Clinical Investigations Branch, CTEP Division of Cancer Treatment.
1 A National Virtual Specimen Database for Early Cancer Detection June 26, 2003 Daniel Crichton NASA Jet Propulsion Laboratory Sean Kelly NASA Jet Propulsion.
Interoperability Framework Overview Health Information Technology (HIT) Standards Committee June 24, 2010 Presented by: Douglas Fridsma, MD, PhD Acting.
H Using the Open Metadata Registry (OpenMDR) to generate semantically annotated grid services Rakesh Dhaval, MS, Calixto Melean,
Middleware Support for Virtual Organizations Internet 2 Fall 2006 Member Meeting Chicago, Illinois Stephen Langella Department of.
CaBIG ® VCDE Workspace Tactics thru June 14, 2010: How working groups fit together, and other activities Brian Davis April 1, 2010 VCDE WS Teleconference.
Open Terminology Portal (TOP) Frank Hartel, Ph.D. Associate Director, Enterprise Vocabulary Services National Cancer Institute, Center for Biomedical Informatics.
Shannon Hastings Multiscale Computing Laboratory Department of Biomedical Informatics.
1 LS DAM Overview and the Specimen Core February 16, 2012 Core Team: Ian Fore, D.Phil., NCI CBIIT, Robert Freimuth, Ph.D., Mayo Clinic, Elaine Freund,
FEA DRM Management Strategy Presented by : Mary McCaffery, US EPA.
Cancer MetaData Standards Peter A. Covitz, Ph.D. HL7 RCRIM October 1, 2002.
CaCORE Software Development Kit George Komatsoulis 25-Feb-2005.
0 Cancer Biomedical Informatics Grid (caBIG) – An Approach towards Data Access and Integration Avinash Shanbhag Director, Core Infrastructure Engineering.
CaDSR Software Users Meeting 3.1 Requirements Review 9/19/2005 caDSR Software Team Host: Denise Warzel NCICB, Assistant Director, caDSR.
1 ECCF Training 2.0 Introduction ECCF Training Working Group January 2011.
CaGrid Overview and Core Services caGrid Knowledge Center February 2011.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
1 Cancer Models Database (caMOD). 2 History  January 2000 – Prototype is presented during the Mouse Models of Human Cancers (MMHCC) Steering Committee.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
Technical Support to SOA Governance E-Government Conference May 1-2, 2008 John Salasin, Ph.D. DARPA
Protégé 3.4 Plug-in for Editing and Maintaining the NCI Thesaurus Protégé Conference June 23, 2009 Amsterdam Sherri de Coronado, Gilberto Fragoso.
CaBIG™ Terminology Services Path to Grid Enablement Thomas Johnson 1, Scott Bauer 1, Kevin Peterson 1, Christopher Chute 1, Johnita Beasley 2, Frank Hartel.
CaGrid 1.0 Security Infrastructure Stephen Langella, Scott Oster, Shannon Hastings, David Ervin, Joshua Phillips, Vinay Kumar, Tahsin Kurc, Joel Saltz.
0 caCORE: A Common Framework for Cancer Data Management Denise Warzel Associate Director, Core Infrastructure National Cancer Institute Center for Bioinformatics.
Structured Protocol Representation for the Cancer Biomedical Informatics Grid: caSPR and caPRI.
Welcome to the caBIG Community! The cancer Biomedical Informatics Grid (caBIG ® ) offers more than 120 open source tools, technologies and infrastructure.
0 Vision and Infrastructure Behind the Cancer Biomedical Informatics Grid Peter A. Covitz, Ph.D. Director, Core Infrastructure National Cancer Institute.
1 LS DAM Overview August 7, 2012 Current Core Team: Ian Fore, D.Phil., NCI CBIIT, Robert Freimuth, Ph.D., Mayo Clinic, Mervi Heiskanen, NCI-CBIIT, Joyce.
1 HL7 SAIF Enterprise Conformance and Compliance Framework (ECCF) Overview Baris E. Suzek Bob Freimuth VCDE Monthly Meeting December, 2010.
National Cancer Institute caDSR Briefing for Small Scale Harmonication Project Denise Warzel Associate Director, Core Infrastructure caCORE Product Line.
Tony Pan, Stephen Langella, Shannon Hastings, Scott Oster, Ashish Sharma, Metin Gurcan, Tahsin Kurc, Joel Saltz Department of Biomedical Informatics The.
0 caBIG and caGrid: Interoperable Computing Infrastructure for the Nation’s [and World’s] Cancer Research Enterprise Peter A. Covitz, Ph.D. Chief Operating.
Semantic Interoperability: caCORE and the Cancer Data Standards Repository (caDSR)  Jennifer Brush.
VCDE WS in EY2 Where we are, where we’re going ICR WS Teleconference Brian Davis – VCDE WS Lead March 26, 2008.
Portlet Development Konrad Rokicki (SAIC) Manav Kher (SemanticBits) Joshua Phillips (SemanticBits) Arch/VCDE F2F November 28, 2008.
Cancer Bioinformatics Grid (caBIG) CANS 2006 Chicago, Illinois
NCI Center for Biomedical Informatics and Information Technology (CBIIT) The CBIIT is the NCI’s strategic and tactical arm for research information management.
Semantic Web - caBIG Abstract: 21st century biomedical research is driven by massive amounts of data: automated technologies generate hundreds of.
Networking and Health Information Exchange
XML Based Interoperability Components
Wsdl.
WEB SERVICES DAVIDE ZERBINO.
From Innovation to Commercialization Access to Data
Presentation transcript:

Vision and Infrastructure Behind the Cancer Biomedical Informatics Grid Peter A. Covitz, Ph.D. Director, Core Infrastructure National Cancer Institute Center for Bioinformatics

The Center for Bioinformatics is the NCI’s strategic and tactical arm for research information management We collaborate with both intramural and extramural groups Mission to integrate and harmonize disparate research data Production, service-oriented organization. Evaluated based upon customer and partner satisfaction.

NCICB Operations teams Systems and Hardware Support Database Administration Software Development Quality Assurance Technical Writing Application Support and Training caBIG Management

National Cancer Institute 2015 Goal Relieve suffering and death due to cancer by the year 2015

Origins of caBIG Need: Enable investigators and research teams nationwide to combine and leverage their findings and expertise in order to meet NCI 2015 Goal. Strategy: Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network

Scenario from caBIG Strategic Plan A researcher involved in a phase II clinical trial of a new targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected. The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might be different between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.

caBIG Governance and Organization

caBIG Governance Models Feudalism X Warlord culture offers little incentive to cooperate

X Forced Collectivization Centralized monolithic approach Governance Models Forced Collectivization X Centralized monolithic approach not flexible or scalable

Alexander Hamilton, James Madison, John Jay Governance Models Balance between central management and local control. Best fit for caBIG Principles. Federalist Papers Alexander Hamilton, James Madison, John Jay Federal Democracy

caBIG Organization Structure caBIG Oversight General Contractor Clinical Trial Mgmt Integrative Cancer Research Tissue Banks & Pathology Tools = Project Working Group Working Group Working Group Architecture Workspace/Project Level Management – Workspace Managers, Project Managers Each Workspace will be overseen by a Workspace Manager who will be responsible for overall budget, contracts and scheduling. Within a Workspace, individual project teams will be led by Project Managers, most of whom will be members of the Cancer Center community. Project Managers will be required to comply with management and reporting standards and processes according to agreed caBIG policy. Intermediate Level Management – Master Contractor A Master Contractor will be engaged to oversee the operational aspects of the caBIG pilot – including the activities of Workspaces and Working Groups. They will provide funding to Centers via subcontract mechanisms. Individual managers to oversee the operation of each Workspace will be appointed by the Master Contractor. The Master Contractor will be directly accountable to the caBIG oversight structure. Governance Level Management – caBIG Oversight Board The Oversight Board will be led by NCICB and NCI senior leadership and will provide strategic and programmatic guidance for the initiative as a whole, with the support and guidance of strategic level planning groups composed of members of the Cancer Center Community. The caBIG Oversight Board will be the final arbiter of all caBIG pilot activities and decisions and will follow processes and procedures consistent with government contract management standards. Vocabularies & Common Data Elements Working Group Working Group Strategic Working Groups

Interoperability ability of a system to access and use the parts or equipment of another system Our terms are Interoperability and Innovation. Lets start with some definitions. Interoperability is a relatively new concept, centered on the idea that one system can use parts of another. If you focus on computer interoperability, the IEEE definition includes two concepts. Functional interoperability – the ability to exchange information – and semantic interoperability – the ability to use the information that has been exchanged. As an aside, standards are frequently thought of as focusing primarily on functional interoperability, whereas the critical sine qua non for health care is semantic interoperability - if you don’t know exactly what the information you received means, having received it is useless, or worse, maybe even dangerous! Syntactic interoperability Semantic interoperability

caBIG Compatibility Guidelines SYNTACTIC SEMANTIC caBIG Compatibility Guidelines

Model-Driven Architecture

MDA Approach Analyze the problem space and develop the artifacts for each scenario Use Cases Use Unified Modeling Language (UML) to standardize model representations and artifacts. Design the system by developing artifacts based on the use cases Class Diagram – Information Model Sequence Diagram – Temporal Behavior Use meta-model tools to generate the code

Limitations of MDA Limited expressivity for semantics No facility for runtime semantic metadata management

MDA plus a whole lot more! caCORE MDA plus a whole lot more!

Bioinformatics Objects Enterprise Vocabulary caCORE S E C U R I T Y Bioinformatics Objects Common Data Elements Enterprise Vocabulary

Use Cases Description Actors Basic Course Alternative Course

Bioinformatics Objects

What do all those data classes and attributes actually mean, anyway? Common Data Elements What do all those data classes and attributes actually mean, anyway? Data descriptors or “semantic metadata” required Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs. NCI uses the ISO/IEC 11179 standard for metadata structure and registration Semantics all drawn from Enterprise Vocabulary Service resources

Enterprise Vocabulary Description Logic Enterprise Vocabulary Concept Code Relationships Preferred Name Definition Synonyms

Semantic metadata example: Agent <name>Taxol</name> <nSCNumber>007</nSCNumber> </Agent>

Why do you need metadata? Class/ Attribute Example Object Data CIA Metadata NCI Metadata Agent A sworn intelligence agent; a spy Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition nSCNumber 007 Identifier given to an intelligence agent by the National Security Council Identifier given to chemical compound by the US Food and Drug Administration Nomenclature Standards Committee name Taxol CIA code name given to intelligence agents Common name of chemical compound used as an agent The segue from here is so how do we utilize this to provide semantic interoperability?

Computable Interoperability Agent C1708:C41243 C1708 Drug name id nSCNumber NDCCode CTEPName approvalDate FDAIndID approver IUPACName fdaCode My model Your model

Tying it all together: The caCORE semantic management framework Desc. Logic CDEs Concept Codes 2223333 C1708 2223866 C1708:C41243 2223869 C1708:C25393 2223870 C1708:C25683 2223871 C1708:C42614 Enterprise Vocabulary Common Data Elements Bioinformatics Objects

Cancer Data Standards Repository ISO/IEC 11179 Registry for Common Data Elements – units of semantic metadata Client for Enterprise Vocabulary: metadata constructed from controlled terminology and annotated with concept codes Precise specification of Classes, Attributes, Data Types, Permissible Values: Strong typing of data objects. Tools: UML Loader: automatically register UML models as metadata components CDE Curation: Fine tune metadata and constrain permissible values with data standards Form Builder: Create standards-based data collection forms CDE Browser: search and export metadata components

Common Security Module Authorization Schema

Web Application Server caCORE Architecture Clients Middleware Data HTTP Clients A P I Web Application Server Biomedical Data Interfaces Java SOAP XML A P I SOAP Clients Common Data Elements Domain Objects [Gene, Disease, etc.] Domain Objects [Gene, Disease, Agent, etc.] Data Access Objects Perl Clients A P I Enterprise Vocabulary Data Access Objects A P I Java Applications Authorization

Development and Deployment DEV………..………………………………..|QA…..…....|STAGE...|PROD PRODUCTION Use Cases Design Test Plans Iterative Development Modeling Unit Testing User Guides System Testing Staging Packaging

caCORE Software Development Kit

caBIG Silver-Compliant System caCORE SDK Components UML Modeling Tool (any with XMI export) Semantic Connector (concept binding utility) UML Loader (model registration in caDSR) Codegen (middleware code generator) Security Adaptor (Common Security Module) caCORE SDK Generates a caBIG Silver-Compliant System

Professional Documentation

caBIG UML Models Completed and in the Works at Cancer Centers for Silver Systems mzXML mass spec proteomics data scanFeatures Proteomics AML Proteomics statml Statistical markup model CAP College of American Pathologists protocols for Breast, Lung, Prostate GoMiner Text mining tool for GO caTISSUE Tissue banking protLIMS Laboratory Information Management System for proteomics BRIDG Clinical Trials caBIO General bioinformatics caDSR ISO11179 metadata EVS Vocabulary caMOD Cancer Models MAGE 1.2 Microarray data CSM Security Common Provenance, DBxrefs caTIES Pathology reports. gridPIR Protein Information

From Silver to Gold: caGrid

caBIG Use Cases Advertisement Discovery Query and Invocation Security Service Provider composes service metadata describing the service and publishes it to grid. Discovery Researcher (or application developer) specifies search criteria describing a service of interest The research submits the discovery request to a discovery service, which identifies a list of services matching the criteria, and returns the list. Query and Invocation Researcher (or application developer) instantiates the grid service and access its resources Security Service Provider restricts access to service based upon authentication and authorization rules

Gold NCI Silver OTHER caBIG SERVICE PROVIDERS Cancer Center TOOLKITS Silver NCI OTHER caBIG SERVICE PROVIDERS Gold Cancer Center Cancer Center Cancer Center Cancer Center Cancer Center

caGrid Service-Oriented Architecture Functions Management GSI CAS myProxy Globus OGSA-DAI GRAM Globus Toolkit BPEL Mobius caCORE Schema Management Metadata Management ID Resolution Workflow Security Resource Management Service Registry Service Service Description Grid Communication Protocol Transport OGSA Compliant - Service Oriented Architecture

Two types of top-level grid services defined Service Data Elements Service Data Elements (SDEs) describe services so clients can discover what they do Two types of top-level grid services defined Data Services Analytical Services Three models for SDEs have been designed Data service-specific Analytical Service-specific Common (all services)

Silver to Gold: Data Services caBIG Gold data service EVS caGrid Infrastructure Query Adaptor Silver Data Service

Data Object Semantics, Metadata, and Schemas Client and service APIs are object oriented, and operate over well-defined and curated data types Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR) Object definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described XML serialization of objects adhere to XML schemas registered in the Global Model Exchange (GME)

Analytical Services Accept and emit strongly typed data objects that conform to Gold data service requirements Analytical method implementation is defined by service provider Toolkit to assist with creating a caGrid Analytical Service will come with caGrid 0.5 download

Analytical Service Creation Wizard

Method Implementation Insert method code here

Test bed Infrastructure caGrid 0.5 Test Bed

Acknowledgements NCI Andrew von Eschenbach Anna Barker Wendy Patterson OC DCTD DCB DCP DCEG DCCPS CCR Industry Partners SAIC BAH Oracle ScenPro Ekagra Apelon Terrapin Systems Panther Informatics NCICB Ken Buetow Avinash Shanbhag George Komatsoulis Denise Warzel Frank Hartel Sherri De Coronado Dianne Reeves Gilberto Fragoso Jill Hadfield Sue Dubman Leslie Derr

Acknowledgements – caGrid Ohio State Univ. Scott Oster Shannon Hastings Steve Langella Tahsin Kurc Joel Saltz SAIC William Sanchez Manav Kher Rouwei Wu Jijin Yan Tara Akhavan Panther Informatics Brian Gilman Nick Encina Oracle Ram Chilukuri BAH Arumani Manisundaram Acknowledgements – caGrid Georgetown Baris Suzek Scott Shung Colin Freas Nick Marcou Arnie Miles Cathy Wu Robert Clarke Duke Patrick McConnell UPMC Rebecca Crawley Kevin Mitchell NCICB Avinash Shanbhag George Komatsoulis Denise Warzel Frank Hartel TerpSys Gavin Brennan Troy Smith Wei Lu Doug Kanoza

caBIG Participant Community 9Star Research Albert Einstein Ardais Argonne National Laboratory Burnham Institute California Institute of Technology-JPL City of Hope Clinical Trial Information Service (CTIS) Cold Spring Harbor Columbia University-Herbert Irving Consumer Advocates in Research and Related Activities (CARRA) Dartmouth-Norris Cotton Data Works Development Department of Veterans Affairs Drexel University Duke University EMMES Corporation First Genetic Trust Food and Drug Administration Fox Chase Fred Hutchinson GE Global Research Center Georgetown University-Lombardi IBM Indiana University Internet 2 Jackson Laboratory Johns Hopkins-Sidney Kimmel Lawrence Berkeley National Laboratory Massachusetts Institute of Technology Mayo Clinic Memorial Sloan Kettering Meyer L. Prentis-Karmanos New York University Northwestern University-Robert H. Lurie Ohio State University-Arthur G. James/Richard Solove Oregon Health and Science University Roswell Park Cancer Institute St Jude Children's Research Hospital Thomas Jefferson University-Kimmel Translational Genomics Research Institute Tulane University School of Medicine University of Alabama at Birmingham University of Arizona University of California Irvine-Chao Family University of California, San Francisco University of California-Davis University of Chicago University of Colorado University of Hawaii University of Iowa-Holden University of Michigan University of Minnesota University of Nebraska University of North Carolina-Lineberger University of Pennsylvania-Abramson University of Pittsburgh University of South Florida-H. Lee Moffitt University of Southern California-Norris University of Vermont University of Wisconsin Vanderbilt University-Ingram Velos Virginia Commonwealth University-Massey Virginia Tech Wake Forest University Washington University-Siteman Wistar Yale University

From Village to City