Published by Abraham Burke. Modified over 8 years ago.
1
Collaborative and Open Source Software Development
NCI’s caBIG™ Collaborative Environment
Sharon Gaheen, SAIC Program Manager
Himanso Sahni, SAIC Chief Architect
SAIC Health Solutions
2
SAIC Proprietary
NCI caBIG™
NCI caBIG™: What is it?
–A virtual web of interconnected data, individuals, and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise
caBIG™ Pilot to Enterprise
–Launched as a Pilot Phase in 2004
–Initiated Enterprise Phase in 2007 involving over 46 NCI-designated cancer centers and 16 community cancer centers
caBIG™ Goals
–Connect cancer research communities through shareable and interoperable infrastructure
–Develop standards to facilitate information sharing
–Build or adapt tools for collecting, integrating, and analyzing biomedical data
caBIG™ Stakeholders
–Cancer researchers, clinicians, patients
caBIG™ Principles
–Federated, open-development, open-access, open-standards
caBIG™ Structure
–Clinical Trials Mgmt Systems
–Integrative Cancer Research
–Tissue Bank & Pathology Tools
–In Vivo Imaging
–Vocabulary and CDEs
–Architecture
–Data Sharing and Intellectual Capital
–Strategic Planning
–Training
3
Pre-caBIG™ Environment
Stovepipe applications operating as information “silos”
Heterogeneous distributed systems within and across centers
Lack of systems interoperability
Limited API access to data
Insufficient standards facilitating semantic data exchange
Duplication of efforts
Reduced productivity via non-re-use of software and services
Lack of knowledge of existing applications and repositories
Resistance to data sharing
Inefficient environment for novel scientific discoveries
[Diagram labels: Clinical Data; Images; Tissue & Pathology Data; Integrative Cancer Research Data]
4
caBIG™ Collaborative Environment
Encourages application and information sharing
Enforces standards enabling semantic interoperability
Provides governance activities for establishing software best practices
Provides cost savings due to software re-use and assists in portfolio management
Enables novel scientific discoveries through information sharing
Provides hope for personalized cancer treatment and care
–“What is the best therapy for a patient based on the patient’s molecular signature?”
[Diagram labels: Clinical Data (HL7, CDISC, BRIDG); Images (DICOM); Tissue & Pathology Data (CAP); Integrative Cancer Research Data (MAGE, BioPAX, GO, mzXML); caBIG™]
5
caBIG™ Enterprise
Network Infrastructure: Grid Framework, Metadata Services, Workflow Engine, Security Services
Data Services
Analytical Services
Best Practices: Information Modeling, Software Development, App & Service Testing
caBIG™ Repository
caBIG™ Governance
6
caBIG™ Enterprise
Network Infrastructure: Grid Framework, Metadata Services, Workflow Engine, Security Services
7
Grid Framework
The caBIG architecture employs a federated model leveraging and extending open source grid technologies (Globus WS)
Federated technologies are ideal for collaborative environments of heterogeneous distributed systems
caBIG leverages a data service grid (caGrid) to facilitate data and analytical tool sharing across the research enterprise
Facilities are provided for programmatic and non-programmatic service advertisement, discovery, invocation, query, and security
caBIG grid extensions
–Toolkits for grid service generation and deployment (Introduce), workflow creation, and security administration
–Portal for service discovery and query
–APIs for programmatic access to the grid and grid services
–Interfaces to NCI metadata services enabling semantic service discovery
–Support for service metadata including information on contributing research center
–High level services supporting data transfer, workflow, and federated query processing
–Security interfaces to connect local and global security models
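The advertisement and discovery facilities on this slide can be pictured with a toy in-memory registry. This is a minimal sketch and not caGrid's API: the `ServiceAd` and `ServiceRegistry` names, the URLs, the center names, and the concept labels are all hypothetical, chosen only to show how service metadata (semantic concepts, contributing center) can drive discovery.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceAd:
    """Metadata a service advertises (all fields are illustrative)."""
    url: str
    center: str                      # contributing research center
    concepts: frozenset = field(default_factory=frozenset)


class ServiceRegistry:
    """Toy index: services advertise themselves, clients discover them."""

    def __init__(self):
        self._ads = {}

    def advertise(self, ad: ServiceAd) -> None:
        self._ads[ad.url] = ad

    def discover(self, concept=None, center=None):
        """Return advertised URLs matching every filter that is given."""
        hits = []
        for ad in self._ads.values():
            if concept is not None and concept not in ad.concepts:
                continue
            if center is not None and ad.center != center:
                continue
            hits.append(ad.url)
        return sorted(hits)


registry = ServiceRegistry()
registry.advertise(ServiceAd("https://a.example/tissue", "Center A", frozenset({"Specimen"})))
registry.advertise(ServiceAd("https://b.example/array", "Center B", frozenset({"Gene", "Specimen"})))
registry.advertise(ServiceAd("https://b.example/image", "Center B", frozenset({"Image"})))

by_concept = registry.discover(concept="Specimen")
by_both = registry.discover(concept="Specimen", center="Center B")
```

Filtering on a shared concept label rather than on a service name is what makes the discovery "semantic" in this toy version: two centers that annotate their services with the same concept become discoverable together.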
8
caGrid Access
caGrid Portal (Liferay Portal)
Application Access (caNanoLab)
caGrid Toolkits and Wiki
9
Metadata Services
Metadata Development
The NCI provides services for developing and leveraging common data elements and vocabulary
NCI Metadata Services include:
–Enterprise Vocabulary Services (EVS) – NCI services and resources for controlled vocabulary
–Cancer Data Standards Repository (caDSR) – A database and tool set used to create, edit, and deploy common data elements
The NCI provides tools supporting terminology development (Stanford’s Protégé; NCI BioMedGT Wiki, a new federated ontology development wiki)
Semantic-Based Application Development
The NCI’s Semantic Integration Workbench (SIW) assists in developing semantically interoperable applications
–The SIW maps information model class names and attributes to the NCI’s EVS, enabling concept re-use across applications
–All new concepts are registered in the NCI’s EVS
–Information models are stored in the NCI’s Cancer Data Standards Repository (caDSR)
Semantically interoperable applications are also developed leveraging standard data exchange formats (HL7, DICOM)
Applications leverage ontology browsers allowing users to utilize controlled vocabulary
[Diagram labels: UML; EVS; Annotated Model; Semantic Integration Workbench; UML Loader; NCI EVS Browser; NCI Semantic-based Application Development; Application Concept Browser and API]
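The mapping step described for the SIW can be sketched as a lookup from model names to concept codes. This is an illustration under stated assumptions only: the `VOCABULARY` table, its codes (`X0001` and so on), and the two small models are made up, and nothing here reflects the actual EVS content or the SIW's interface.

```python
# Hypothetical controlled vocabulary: term -> concept code (codes are made up).
VOCABULARY = {
    "specimen": "X0001",
    "gene": "X0002",
    "patient": "X0003",
}


def annotate(model):
    """Map each class/attribute name in a model to a concept code.

    `model` is {class_name: [attribute_name, ...]}. Returns a pair
    (annotations, unmatched); unmatched names would need a new concept
    to be registered before the model is fully annotated.
    """
    annotations, unmatched = {}, []
    for class_name, attributes in model.items():
        for name in [class_name, *attributes]:
            code = VOCABULARY.get(name.lower())
            if code is None:
                unmatched.append(name)
            else:
                annotations[name] = code
    return annotations, unmatched


def shared_concepts(annotations_a, annotations_b):
    """Concept codes that two annotated models have in common."""
    return set(annotations_a.values()) & set(annotations_b.values())


# Two hypothetical information models from different applications.
tissue_model = {"Specimen": ["patient", "storageSite"]}
array_model = {"Gene": ["specimen"]}

tissue_ann, tissue_new = annotate(tissue_model)
array_ann, array_new = annotate(array_model)
common = shared_concepts(tissue_ann, array_ann)
```

Here both models resolve a name to the same code, which is the kind of overlap that lets independently built applications exchange data about the same concept.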
10
Workflow Engine
The Taverna Workbench allows users to construct complex workflows consisting of multiple types of components (processors)
Components may be located on different machines and are orchestrated by Taverna. The results are aggregated and displayed in the workbench.
Taverna provides a set of Service Provider Interfaces (SPI), which are extension points that let developers add functionality for a specific purpose
A plug-in (taverna-gt4-processor) is available for caGrid users to add grid services to a Taverna workflow
The plug-in makes a Taverna workflow aware of caGrid services and allows it to orchestrate them
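The idea of orchestrating processors and aggregating their results can be sketched with a small dependency-ordered runner. This is not Taverna's API or workflow format: the `run_workflow` function, the processor names, and the three-step example are hypothetical, and each processor runs in-process here rather than on a remote machine.

```python
from graphlib import TopologicalSorter


def run_workflow(processors, depends_on):
    """Run each processor after the processors it depends on.

    processors: {name: callable(upstream_results_dict) -> result}
    depends_on: {name: set of upstream processor names}
    Returns {name: result} for every processor.
    """
    graph = {name: depends_on.get(name, set()) for name in processors}
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = processors[name](upstream)
    return results


# Hypothetical three-step workflow: fetch -> normalize -> summarize.
processors = {
    "fetch": lambda up: [4.0, 8.0, 12.0],
    "normalize": lambda up: [v / max(up["fetch"]) for v in up["fetch"]],
    "summarize": lambda up: round(sum(up["normalize"]) / len(up["normalize"]), 3),
}
depends_on = {"normalize": {"fetch"}, "summarize": {"normalize"}}

results = run_workflow(processors, depends_on)
```

Declaring dependencies separately from the processors keeps each component unaware of the others, which is what allows an orchestrator to swap a local step for a remote service.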
11
Security Services
Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
Provides services and tools for the administration and enforcement of security policy in an enterprise
Developed on top of the Globus Toolkit and extends the Grid Security Infrastructure (GSI)
GAARDS core components:
–Authentication Service – A local service for managing and enforcing access control policy authorization such as the NCI’s Common Security Module (CSM)
–Dorian – A grid service for the provisioning and management of grid user accounts. Allows users to use existing credentials (external to the grid) to authenticate to the grid.
–Credential Delegation Service (CDS) – A WS-compliant grid service that enables users/services to delegate grid credentials to other users/services
–Grid Trust Service (GTS) – A grid-wide mechanism for maintaining and provisioning a federated trust fabric consisting of trusted certificate authorities, so that grid services can make authentication decisions against up-to-date information
–Grid Grouper – Provides a group-based authorization solution for the grid
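Two of the ideas above, a trust fabric of accepted certificate authorities and group-based authorization, combine into a simple two-step policy check. The sketch below is a toy model and not the GAARDS API: `Credential`, `Authorizer`, the issuer names, and the group names are hypothetical, and no real certificates are validated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    identity: str   # who the caller claims to be
    issuer: str     # certificate authority that vouches for the identity


class Authorizer:
    """Toy policy check: trusted issuer first, then group membership."""

    def __init__(self, trusted_issuers, groups):
        self.trusted_issuers = set(trusted_issuers)
        self.groups = {name: set(members) for name, members in groups.items()}

    def is_authorized(self, credential, required_group):
        if credential.issuer not in self.trusted_issuers:
            return False  # issuer is not part of the trust fabric
        return credential.identity in self.groups.get(required_group, set())


authorizer = Authorizer(
    trusted_issuers={"Center A CA", "Center B CA"},
    groups={"tissue-readers": {"alice", "bob"}},
)

alice = Credential("alice", "Center A CA")
carol = Credential("carol", "Center B CA")          # trusted CA, not in group
mallory = Credential("alice", "Unknown CA")         # in group by name, untrusted CA

decisions = [
    authorizer.is_authorized(alice, "tissue-readers"),
    authorizer.is_authorized(carol, "tissue-readers"),
    authorizer.is_authorized(mallory, "tissue-readers"),
]
```

Checking the issuer before the group matters: a matching identity string means nothing unless an authority in the trust fabric vouches for it.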
12
caBIG™ Enterprise
Network Infrastructure: Grid Framework, Metadata Services, Workflow Engine, Security Services
Data Services
Analytical Services
13
Data Services
Data services provide a means for sharing and integrating biomedical data
Applications provide API access to data services at the local and grid level
–caBIG toolkits facilitate the generation of Java, REST, SOAP, and grid APIs
Data services leverage and extend metadata services in support of semantic interoperability
Data services also include tools, services, and protocols for data extraction, transformation, and loading (ETL)
–The caAdapter mapping tool provides services to map models to information standards and transform data via application of the mapping file
–caAdapter is leveraged to transform clinical data-to-HL7, HL7 v2 data-to-HL7 v3, object models-to-data models, and data models-to-XML standards
–Standard procedures are leveraged and re-used for data extraction, metadata mapping, data transformation, quality control, and data loading of diverse translational research data types
[Diagram: caAdapter transformation flow, steps numbered 1 to 4. Labels: Clinical Data System; Source Data (CSV); Source (CSV) Specification; HL7 v3 Specification; HL7 Normative Edition 2006; caAdapter Mapping Tool; Mapping File; Transformation Service; HL7 v3 XML]
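The mapping-file approach can be illustrated with a small transformation that reads CSV rows and emits XML according to a column-to-element mapping. This is a sketch only and not caAdapter: the `MAPPING` table, the `transform` function, the column names, and the element names are hypothetical, and the output is not valid HL7 v3.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping: source CSV column -> target element path.
# The element names are placeholders, not real HL7 v3 structures.
MAPPING = {
    "patient_id": "subject/id",
    "birth_date": "subject/birthTime",
    "diagnosis": "observation/value",
}


def transform(csv_text, mapping, root_tag="record"):
    """Apply the mapping to each CSV row and return a list of XML strings."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        root = ET.Element(root_tag)
        for column, path in mapping.items():
            value = row.get(column)
            if not value:
                continue  # leave out elements with no source data
            node = root
            for tag in path.split("/"):
                child = node.find(tag)
                if child is None:  # create each path step only once
                    child = ET.SubElement(node, tag)
                node = child
            node.text = value
        documents.append(ET.tostring(root, encoding="unicode"))
    return documents


source = "patient_id,birth_date,diagnosis\nP001,1960-01-31,melanoma\nP002,,glioma\n"
docs = transform(source, MAPPING)
```

Keeping the mapping as data rather than code is the point of the design: the same transformation routine can be re-used for a different source system by supplying a different mapping.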
14
Analytical Services
Analytical services allow for the application of analysis tools to complex and integrative data sets
Analytical services take as input and return strongly typed and semantically harmonized data types
Analytical services can participate as services within a workflow
Example analytical services include:
High Order Analysis (HOA) Services
–Scalable run time analysis service enabling high order analysis of large volume multi-dimensional data
–Uses JMS/R-Server/R-Binary
GenePattern
–A powerful scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data
[Diagram labels: Translational Research Portal; Analysis Server Client Manager; JBoss App Server; JBoss MQ (JMS); Analysis Request Queue; Analysis Result Queue; Analysis Node]
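The request queue and result queue pattern in the diagram can be sketched with in-process queues and a worker thread. This swaps JMS and the R server for Python's standard `queue` and `statistics` modules purely for illustration: the `analysis_node` function, the `ANALYSES` table, and the request tuple layout are hypothetical.

```python
import queue
import statistics
import threading

request_queue = queue.Queue()   # stands in for the analysis request queue
result_queue = queue.Queue()    # stands in for the analysis result queue

# Hypothetical analyses; a real node would hand work to an analysis engine.
ANALYSES = {"mean": statistics.mean, "median": statistics.median}


def analysis_node():
    """Take requests until a None sentinel arrives, posting one result each."""
    while True:
        request = request_queue.get()
        if request is None:
            break
        request_id, analysis, values = request
        try:
            outcome = ("ok", ANALYSES[analysis](values))
        except Exception as exc:  # report failures instead of killing the node
            outcome = ("error", repr(exc))
        result_queue.put((request_id, *outcome))


worker = threading.Thread(target=analysis_node)
worker.start()

request_queue.put((1, "mean", [2, 4, 9]))
request_queue.put((2, "median", [2, 4, 9]))
request_queue.put((3, "variance", [2, 4, 9]))   # not registered -> error result
request_queue.put(None)                         # sentinel: stop the node
worker.join()

results = {}
while not result_queue.empty():
    request_id, status, payload = result_queue.get()
    results[request_id] = (status, payload)
```

Because clients only see the two queues, more analysis nodes can be added behind them without changing the client, which is how a queue-based design supports scaling out.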
15
caBIG™ Enterprise
Network Infrastructure: Grid Framework, Metadata Services, Workflow Engine, Security Services
Data Services
Analytical Services
Best Practices: Information Modeling, Software Development, App & Service Testing
caBIG™ Repository
caBIG™ Governance
16
Information Modeling
caBIG™ best software practices include the development of scientific and functional use cases and information models (UML) to describe system behavior
Concepts (class names, attributes) derived from information models are maintained under controlled terminology in the EVS and mapped to existing concepts for standardization
Information models (UML) are annotated with EVS concepts and loaded into the caDSR metadata repository to facilitate re-use
APIs (Java, SOAP, HTTP-XML) are generated from annotated information models using the caCORE SDK
The caCORE SDK facilitates the development of services following caBIG principles of open access, open architecture, and open source
caCORE SDK generated services are connected to the grid via caGrid toolkits
Applications are created leveraging APIs and grid services
17
Software Development
Iterative Development Methodology (RUP)
Requirements, Analysis & Design
–Use Cases/Wireframes
–Information Modeling (EA - UML)
–Software Design Specification
Implementation
–Presentation Tier (STRUTS, AJAX, etc.)
–Middle Tier (caCORE SDK)
–Data Tier (Hibernate, MySQL, caAdapter)
–Grid Tier (Introduce)
–Semantic Interoperability (SIW, EVS, caDSR)
–Security (CSM, UPT)
Testing
–Unit Testing (JUnit, Cobertura)
–Systems Testing (Selenium)
–Performance Testing (JMeter)
–UAT Testing
Environment and Configuration
–Software Dev Env (Eclipse IDE)
–Web Containers (JBOSS, Tomcat, Apache-AXIS)
–Build/Deployment (AnthillPro, CruiseControl)
–Configuration & Portfolio Mgmt (SVN, GForge)
18
Presentation Tier & Re-use
Standard Templates
Charting Libraries (JFreeChart)
List Management
Spreadsheet Manipulation
19
[Diagram labels: Utilities; Framework; Interfaces; caCORE SDK; UML; System Properties; Clients; EVS; Annotated Model; Application Server; Semantic Integration Workbench; UML Loader; Java API; Web Services; Domain Objects; Data Access Objects; Delegation Service; App Server; Log/Notify; Test Utility; DB]
20
App & Service Testing
Selenium
–Open source test automation tool for executing scenarios against web applications to validate browser compatibility and system functionality
CruiseControl
–Open source framework for automating a continuous build process
JMeter
–An Apache Java desktop application designed to load test functional behavior and measure performance
AppScan
–Commercial vulnerability scanner that can detect many common server misconfigurations as well as vulnerabilities
Cobertura
–Open source Java tool that calculates the percentage of code accessed by tests. Used to identify which parts of a Java program are lacking test coverage
21
caBIG™ Repository
Subversion (SVN)
–Subversion is a version control system used to maintain current and historical versions of files such as source code, web pages, and documentation
–The goal of SVN is to be a mostly-compatible successor to the widely used Concurrent Versions System (CVS)
GForge
–GForge has tools to assist teams in collaboration, including message forums and mailing lists, and tools to create and control access to Source Code Management repositories like CVS and SVN
–GForge automatically creates a repository and controls access to it depending on the role settings of the project
Ivy
–Ivy is a popular dependency manager focusing on flexibility and simplicity
22
caBIG™ Governance
caBIG Compatibility guidelines are provided for guidance on achieving various levels of semantic interoperability
Tools are engineered to enable the development of services adhering to standards and best practices
Best practices will facilitate use of applications in a regulatory environment (21 CFR Part 11 compliance)
[Figure: caBIG™ Compatibility Guidelines]
23
Lessons Learned
Semantic interoperability requires an investment with great payoffs
Technology choices should be use case driven
Re-use requires investment in coordination, training, and technical support
–Coordinated deployment schedules and technology stack
–Product training required
–Technical support required
Project sites (GForge, wikis) assist in organizing project artifacts, in project tracking, and in cross-project collaboration
Automate where possible and early on
–Automated build/deployment scripts, test scripts
Invest in hardware for performance efficiency
Initial startup investment in training the workforce in enterprise level technologies is required
–Economies of scale achieved in training investment
24
caBIG™ Today
Over 45 available tools advertised on the caBIG™ web site
–17 Silver Level Compatible
–11 CTMS Related Tools
–3 Tissue Bank and Pathology Tools
–27 Integrative Cancer Research Tools
–2 In Vivo Imaging Tools
Over 45 grid services available
–27 Data Services
–18 Analytical Services
Over 122 participant centers and organizations