Collaborative and Open Source Software Development NCI’s caBIG™ Collaborative Environment Sharon Gaheen, SAIC Program Manager Himanso Sahni, SAIC Chief.

Presentation transcript:

Collaborative and Open Source Software Development NCI’s caBIG™ Collaborative Environment Sharon Gaheen, SAIC Program Manager Himanso Sahni, SAIC Chief Architect SAIC Health Solutions

SAIC Proprietary
NCI caBIG™
- NCI caBIG™: What is it?
  - A virtual web of interconnected data, individuals, and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise
- caBIG™ Pilot to Enterprise
  - Launched as a Pilot Phase in [year missing in transcript]
  - Initiated Enterprise Phase in 2007 involving over 46 NCI-designated cancer centers and 16 community cancer centers
- caBIG™ Goals
  - Connect cancer research communities through shareable and interoperable infrastructure
  - Develop standards to facilitate information sharing
  - Build or adapt tools for collecting, integrating, and analyzing biomedical data
- caBIG™ Stakeholders
  - Cancer researchers, clinicians, patients
- caBIG™ Principles
  - Federated, open-development, open-access, open-standards
- caBIG™ Structure
  - Clinical Trials Mgmt Systems
  - Integrative Cancer Research
  - Tissue Bank & Pathology Tools
  - In Vivo Imaging
  - Vocabulary and CDEs
  - Architecture
  - Data Sharing and Intellectual Capital
  - Strategic Planning
  - Training

Pre-caBIG™ Environment
- Stovepipe applications operating as information "silos"
  - Heterogeneous distributed systems within and across centers
- Lack of systems interoperability
  - Limited API access to data
  - Insufficient standards facilitating semantic data exchange
- Duplication of efforts
  - Reduced productivity via non-re-use of software and services
  - Lack of knowledge of existing applications and repositories
- Resistance to data sharing
- Inefficient environment for novel scientific discoveries
- Diagram labels: Clinical Data; Images; Tissue & Pathology Data; Integrative Cancer Research Data

caBIG™ Collaborative Environment
- Encourages application and information sharing
- Enforces standards enabling semantic interoperability
- Provides governance activities for establishing software best practices
- Provides cost savings due to software re-use and assists in portfolio management
- Enables novel scientific discoveries through information sharing
- Provides hope for personalized cancer treatment and care
  - "What is the best therapy for a patient based on the patient's molecular signature?"
- Diagram labels: Clinical Data (HL7, CDISC, BRIDG); Images (DICOM); Tissue & Pathology Data (CAP); Integrative Cancer Research Data (MAGE, BioPAX, GO, mzXML); caBIG™

caBIG™ Enterprise
- Network Infrastructure
  - Grid Framework
  - Metadata Services
  - Workflow Engine
  - Security Services
- Data Services
- Analytical Services
- Best Practices
- Information Modeling
- Software Development
- App & Service Testing
- caBIG™ Repository
- caBIG™ Governance

caBIG™ Enterprise
- Network Infrastructure
  - Grid Framework
  - Metadata Services
  - Workflow Engine
  - Security Services

Grid Framework
- The caBIG architecture employs a federated model leveraging and extending open source grid technologies (Globus WS)
  - Federated technologies are ideal for collaborative environments of heterogeneous distributed systems
- caBIG leverages a data service grid (caGrid) to facilitate data and analytical tool sharing across the research enterprise
- Facilities are provided for programmatic and non-programmatic service advertisement, discovery, invocation, query, and security
- caBIG grid extensions
  - Toolkits for grid service generation and deployment (Introduce), workflow creation, and security administration
  - Portal for service discovery and query
  - APIs for programmatic access to the grid and grid services
  - Interfaces to NCI metadata services enabling semantic service discovery
  - Support for service metadata including information on contributing research center
  - High level services supporting data transfer, workflow, and federated query processing
  - Security interfaces to connect local and global security models
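
The advertisement and discovery facilities described on this slide can be pictured with a small in-memory sketch. It is not the caGrid API: the `ServiceAdvertisement` and `IndexService` classes, the service names, URLs, and concept labels are all hypothetical, chosen only to show how service metadata (hosting center, semantic annotations) can drive discovery.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceAdvertisement:
    """Metadata a service publishes about itself (all fields are hypothetical)."""
    name: str
    url: str
    hosting_center: str
    concepts: set = field(default_factory=set)  # semantic annotations


class IndexService:
    """Toy in-memory registry standing in for a grid-wide index."""

    def __init__(self):
        self._services = []

    def advertise(self, advertisement):
        # A service registers its metadata so that clients can find it later.
        self._services.append(advertisement)

    def discover_by_concept(self, concept):
        # Semantic discovery: match on the annotated concept, not the service name.
        return [ad for ad in self._services if concept in ad.concepts]

    def discover_by_center(self, center):
        # Discovery by contributing research center.
        return [ad for ad in self._services if ad.hosting_center == center]


index = IndexService()
index.advertise(ServiceAdvertisement(
    "NanoData", "https://example.org/nano", "Center A", {"Nanoparticle", "Sample"}))
index.advertise(ServiceAdvertisement(
    "TissueBank", "https://example.org/tissue", "Center B", {"Specimen", "Sample"}))

sample_services = [ad.name for ad in index.discover_by_concept("Sample")]
center_b_services = [ad.name for ad in index.discover_by_center("Center B")]
print(sample_services, center_b_services)
```

A real grid index would be a remote, federated service with its own query language; the sketch only shows why annotating services with shared concepts makes them discoverable across centers.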

caGrid Access
- caGrid Portal (Liferay Portal)
- Application Access (caNanoLab)
- caGrid Toolkits and Wiki

Metadata Services
- Metadata Development
  - The NCI provides services for developing and leveraging common data elements and vocabulary
  - NCI Metadata Services include:
    - Enterprise Vocabulary Services (EVS): NCI services and resources for controlled vocabulary
    - Cancer Data Standards Repository (caDSR): a database and tool set used to create, edit, and deploy common data elements
  - The NCI provides tools supporting terminology development (Stanford's Protégé; NCI BioMedGT Wiki, a new federated ontology development wiki)
- Semantic-Based Application Development
  - The NCI's Semantic Integration Workbench (SIW) assists in developing semantically interoperable applications
    - The SIW maps information model class names and attributes to the NCI's EVS, enabling concept re-use across applications
    - All new concepts are registered in the NCI's EVS
    - Information models are stored in the NCI's Cancer Data Standards Repository (caDSR)
  - Semantically interoperable applications are also developed leveraging standard data exchange formats (HL7, DICOM)
  - Applications leverage ontology browsers allowing users to utilize controlled vocabulary
- Diagram labels: UML; EVS; Annotated Model; Semantic Integration Workbench; UML Loader; NCI EVS Browser; NCI Semantic-based Application Development; Application Concept Browser and API
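
The mapping step attributed to the SIW above (class names and attributes matched to controlled-vocabulary concepts) can be sketched as follows. This is a minimal illustration, not the SIW itself: the `VOCABULARY` terms, the concept codes, the `MODEL` contents, and the `annotate` function are all made up for the example.

```python
# Hypothetical controlled vocabulary: term -> concept code (codes are invented).
VOCABULARY = {
    "patient": "X0001",
    "specimen": "X0002",
    "diagnosis": "X0003",
}

# A tiny information model: class name -> attribute names (also invented).
MODEL = {
    "Patient": ["diagnosis", "enrollmentDate"],
    "Specimen": ["patient"],
}


def annotate(model, vocabulary):
    """Map every class and attribute name to a concept code where one exists.

    Returns (annotations, unmapped); unmapped elements are the candidates
    for registration as new concepts.
    """
    annotations, unmapped = {}, []
    for class_name, attributes in model.items():
        elements = [class_name] + [f"{class_name}.{attr}" for attr in attributes]
        for element in elements:
            term = element.split(".")[-1].lower()
            if term in vocabulary:
                annotations[element] = vocabulary[term]
            else:
                unmapped.append(element)
    return annotations, unmapped


annotations, unmapped = annotate(MODEL, VOCABULARY)
print(annotations)
print(unmapped)
```

Note that `Patient` and `Specimen.patient` resolve to the same code, which is the concept re-use the slide refers to; `Patient.enrollmentDate` has no match and would need a new concept.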

Workflow Engine
- The Taverna Workbench allows users to construct complex workflows consisting of multiple types of components (processors)
  - Components may be located on different machines and are orchestrated by Taverna. The results are aggregated and displayed in the workbench.
- Taverna provides a set of Service Provider Interfaces (SPI), which are extension points for developers to provide additional functionality for a specific purpose
- A plug-in (taverna-gt4-processor) is available for caGrid users to add grid services to a Taverna workflow
  - The plug-in makes a Taverna workflow aware of caGrid services, so that it can orchestrate them
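
The idea of a workflow as an ordered set of processors whose results are aggregated can be sketched in a few lines. This is not Taverna or its SPI: the three processor functions and `run_workflow` are hypothetical stand-ins, and the "remote" data call simply returns fixed numbers.

```python
def fetch_expression_data(_):
    """Stand-in for a call to a remote data service (returns fixed values)."""
    return [2.0, 4.0, 8.0, 2.0]


def normalize(values):
    """Scale values so that the largest becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def summarize(values):
    """Reduce a list of values to a small summary record."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def run_workflow(processors, initial=None):
    """Run processors in order, feeding each output to the next.

    Every intermediate result is kept, keyed by processor name, so the
    caller can inspect the aggregated outcome of the whole run.
    """
    results = {}
    current = initial
    for processor in processors:
        current = processor(current)
        results[processor.__name__] = current
    return results


workflow_results = run_workflow([fetch_expression_data, normalize, summarize])
print(workflow_results["summarize"])
```

In a real engine each processor could live on a different machine and the graph need not be linear; the linear chain here is a simplification.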

Security Services
- Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
  - Provides services and tools for the administration and enforcement of security policy in an enterprise
  - Developed on top of the Globus Toolkit and extends the Grid Security Infrastructure (GSI)
- GAARDS core components:
  - Authentication Service: a local service for managing and enforcing access control policy authorization, such as the NCI's Common Security Module (CSM)
  - Dorian: a grid service for the provisioning and management of grid user accounts. Allows users to use existing credentials (external to the grid) to authenticate to the grid.
  - Credential Delegation Service (CDS): a WS-compliant grid service that enables users/services to delegate grid credentials to other users/services
  - Grid Trust Service (GTS): a grid-wide mechanism for maintaining and provisioning a federated trust fabric consisting of trusted certificate authorities, such that grid services may make authentication decisions against up-to-date information
  - Grid Grouper: provides a group-based authorization solution for the grid
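
Two of the ideas above, a trust fabric of accepted certificate authorities and group-based authorization, can be combined in a short sketch. It does not use GAARDS, GTS, or Grid Grouper interfaces: the issuer name, group name, policy table, and both functions are hypothetical, and a credential is reduced to a plain dictionary.

```python
# Hypothetical trust fabric: issuers whose credentials are accepted.
TRUSTED_AUTHORITIES = {"CN=Example Grid CA"}

# Hypothetical group store: group name -> member user names.
GROUPS = {"imaging-researchers": {"alice"}}

# Hypothetical policy: operation -> group required to perform it.
POLICY = {"query_images": "imaging-researchers"}


def is_authenticated(credential):
    """Accept a credential only if its issuer is in the trust fabric."""
    return credential["issuer"] in TRUSTED_AUTHORITIES


def is_authorized(credential, operation):
    """Authenticate first, then check group membership for the operation."""
    if not is_authenticated(credential):
        return False
    required_group = POLICY.get(operation)
    if required_group is None:
        return False  # operations without a policy entry are denied
    return credential["user"] in GROUPS.get(required_group, set())


alice = {"user": "alice", "issuer": "CN=Example Grid CA"}
bob = {"user": "bob", "issuer": "CN=Example Grid CA"}
mallory = {"user": "alice", "issuer": "CN=Unknown CA"}

print(is_authorized(alice, "query_images"))
print(is_authorized(bob, "query_images"))
print(is_authorized(mallory, "query_images"))
```

Real grid credentials are X.509 certificates validated cryptographically; the dictionary lookup here only illustrates the order of the checks.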

caBIG™ Enterprise
- Network Infrastructure
  - Grid Framework
  - Metadata Services
  - Workflow Engine
  - Security Services
- Data Services
- Analytical Services

Data Services
- Data services provide a means for sharing and integrating biomedical data
  - Applications provide API access to data services at the local and grid level
  - caBIG toolkits facilitate the generation of Java, REST, SOAP, and grid APIs
  - Data services leverage and extend metadata services in support of semantic interoperability
- Data services also include tools, services, and protocols for data extraction, transformation, and loading (ETL)
  - The caAdapter mapping tool provides services to map models to information standards and transform data via application of the mapping file
  - caAdapter is leveraged to transform clinical data-to-HL7, HL7 v2 data-to-HL7 v3, object models-to-data models, and data models-to-XML standards
  - Standard procedures are leveraged and re-used for data extraction, metadata mapping, data transformation, quality control, and data loading of diverse translational research data types
- Diagram labels: Clinical Data System; Source Data (CSV); Source (CSV) Specification; HL7 v3 Specification; HL7 Normative Edition; caAdapter Mapping Tool; Mapping File; Transformation Service; HL7 v3 XML
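
The mapping-file approach described for caAdapter (declare how source fields correspond to target elements, then apply that mapping to the data) can be sketched with the Python standard library. This is not caAdapter and the output is not valid HL7 v3: the `MAPPING` table, the CSV columns, and the XML element names are illustrative assumptions only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping "file": source CSV column -> target XML element name.
MAPPING = {
    "patient_id": "id",
    "birth_date": "birthTime",
    "sex": "administrativeGenderCode",
}

# Hypothetical source data as it might be exported from a clinical system.
SOURCE_CSV = (
    "patient_id,birth_date,sex\n"
    "P001,1960-01-15,F\n"
    "P002,1972-07-30,M\n"
)


def transform(csv_text, mapping):
    """Apply the column-to-element mapping to every CSV row."""
    root = ET.Element("patients")
    for row in csv.DictReader(io.StringIO(csv_text)):
        patient = ET.SubElement(root, "patient")
        for column, element_name in mapping.items():
            ET.SubElement(patient, element_name).text = row[column]
    return root


document = transform(SOURCE_CSV, MAPPING)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

Keeping the mapping as data rather than code is the design point: the same transformation routine can be re-used for a different source simply by supplying a different mapping.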

Analytical Services
- Analytical services allow for the application of analysis tools to complex and integrative data sets
- Analytical services take as input and return strongly typed and semantically harmonized data types
- Analytical services can participate as services within a workflow
- Example analytical services include:
  - High Order Analysis (HOA) Services
    - Scalable run time analysis service enabling high order analysis of large volume multi-dimensional data
    - Uses JMS/R-Server/R-Binary
  - GenePattern
    - A powerful scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data
- Diagram labels: Translational Research Portal; Analysis Server Client Manager; JBoss App Server; JBoss MQ (JMS); Analysis Request Queue; Analysis Result Queue; Analysis Node
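
The request-queue/result-queue arrangement named in the diagram labels can be sketched with Python's standard `queue` and `threading` modules in place of JMS. Everything here is a stand-in: the job dictionaries, the `analysis_node` function, and the trivial "analysis" (a mean) are assumptions made for illustration, not the HOA service.

```python
import queue
import threading

request_queue = queue.Queue()  # stands in for the analysis request queue
result_queue = queue.Queue()   # stands in for the analysis result queue


def analysis_node():
    """Consume requests until a None sentinel arrives, publishing one result each."""
    while True:
        request = request_queue.get()
        if request is None:
            break
        values = request["values"]
        result_queue.put({"id": request["id"], "mean": sum(values) / len(values)})


worker = threading.Thread(target=analysis_node)
worker.start()

# The client side submits jobs and does not wait on any particular node.
request_queue.put({"id": "job-1", "values": [1.0, 2.0, 3.0]})
request_queue.put({"id": "job-2", "values": [10.0, 20.0]})
request_queue.put(None)  # tell the node to shut down
worker.join()

# Collect results by job id once the node has finished.
job_results = {}
while not result_queue.empty():
    result = result_queue.get()
    job_results[result["id"]] = result["mean"]
print(job_results)
```

Decoupling submission from execution through queues is what lets more analysis nodes be added without changing the client.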

caBIG™ Enterprise
- Network Infrastructure
  - Grid Framework
  - Metadata Services
  - Workflow Engine
  - Security Services
- Data Services
- Analytical Services
- Best Practices
- Information Modeling
- Software Development
- App & Service Testing
- caBIG™ Repository
- caBIG™ Governance

Information Modeling
- caBIG™ best software practices include the development of scientific and functional use cases and information models (UML) to describe system behavior
- Concepts (class names, attributes) derived from information models are maintained under controlled terminology in the EVS and mapped to existing concepts for standardization
- Information models (UML) are annotated with EVS concepts and loaded into the caDSR metadata repository to facilitate re-use
- APIs (Java, SOAP, HTTP-XML) are generated from annotated information models using the caCORE SDK
  - The caCORE SDK facilitates the development of services following caBIG principles of open access, open architecture, and open source
- caCORE SDK generated services are connected to the grid via caGrid toolkits
- Applications are created leveraging APIs and grid services

Software Development
- Iterative Development Methodology (RUP)
- Requirements, Analysis & Design
  - Use Cases/Wireframes
  - Information Modeling (EA - UML)
  - Software Design Specification
- Implementation
  - Presentation-Tier (STRUTS, AJAX, etc.)
  - Middle-Tier (caCORE SDK)
  - Data-Tier (Hibernate, MySQL, caAdapter)
  - Grid Tier (Introduce)
  - Semantic Interoperability (SIW, EVS, caDSR)
  - Security (CSM, UPT)
- Testing
  - Unit Testing (JUnit, Cobertura)
  - Systems Testing (Selenium)
  - Performance Testing (JMeter)
  - UAT Testing
- Environment and Configuration
  - Software Dev Env (Eclipse IDE)
  - Web Containers (JBOSS, Tomcat, Apache-AXIS)
  - Build/Deployment (AnthillPro, CruiseControl)
  - Configuration & Portfolio Mgmt (SVN, GForge)

Presentation Tier & Re-use
- Standard Templates
- Charting Libraries (JFreeChart)
- List Management
- Spreadsheet Manipulation

Diagram labels: Utilities; Framework; Interfaces; caCORE SDK; UML; System Properties; Clients; EVS; Annotated Model; Application Server; Semantic Integration Workbench; UML Loader; Java API; Web Services; Domain Objects; Data Access Objects; Delegation Service; App Server; Log/Notify; Test Utility; DB

App & Service Testing
- Selenium
  - Open source test automation tool for executing scenarios against web applications to validate browser compatibility and system functionality
- CruiseControl
  - Open source framework for automating a continuous build process
- JMeter
  - An Apache Java desktop application designed to load test functional behavior and measure performance
- AppScan
  - Commercial vulnerability scanner that can detect many common server misconfigurations as well as vulnerabilities
- Cobertura
  - Open source Java tool that calculates the percentage of code accessed by tests. Used to identify which parts of a Java program are lacking test coverage

caBIG™ Repository
- Subversion (SVN)
  - Subversion is a version control system used to maintain current and historical versions of files such as source code, web pages, and documentation
  - The goal of SVN is to be a mostly-compatible successor to the widely used Concurrent Versions System (CVS)
- GForge
  - GForge has tools to assist teams in collaboration, including message forums and mailing lists, and tools to create and control access to Source Code Management repositories like CVS and SVN
  - GForge automatically creates a repository and controls access to it depending on the role settings of the project
- Ivy
  - Ivy is a popular dependency manager focusing on flexibility and simplicity

caBIG™ Governance
- caBIG Compatibility guidelines are provided for guidance on achieving various levels of semantic interoperability
- Tools are engineered to enable the development of services adhering to standards and best practices
- Best practices will facilitate use of applications in a regulatory environment (21 CFR Part 11 compliance)
- Figure: caBIG™ Compatibility Guidelines

Lessons Learned
- Semantic interoperability requires an investment with great payoffs
- Technology choices should be use case driven
- Re-use requires investment in coordination, training, and technical support
  - Coordinated deployment schedules and technology stack
  - Product training required
  - Technical support required
- Project sites (GForge, wikis) assist in organizing project artifacts, in project tracking, and in cross-project collaboration
- Automate where possible and early on
  - Automated build/deployment scripts, test scripts
- Invest in hardware for performance efficiency
- An initial startup investment in training the workforce in enterprise-level technologies is required
  - Economies of scale achieved in training investment

caBIG™ Today
- Over 45 available tools advertised on the caBIG™ web site
  - 17 Silver Level Compatible
  - 11 CTMS Related Tools
  - 3 Tissue Bank and Pathology Tools
  - 27 Integrative Cancer Research Tools
  - 2 In Vivo Imaging Tools
- Over 45 grid services available
  - 27 Data Services
  - 18 Analytical Services
- Over 122 participant centers and organizations