Cancer Bioinformatics Infrastructure Objects (caBIO) Overview caBIG® 2010 World’s Fair September 2010.

Agenda
- General Overview
  - Overview
  - Goal
  - Architecture
  - Product Components
  - Data Sources
  - Integrated Tools
- caBIO Enterprise Conformance and Compliance Framework (ECCF) Pilot Effort
  - Molecular Annotation (MA) Service

What is caBIO?
- The cancer Bioinformatics Infrastructure Objects (caBIO) is a repository of molecular biology data for disease research.
- caBIO objects and data types represent entities found in biomedical research, such as Gene, Protein, Chromosome, Sequence, SNP, Pathways, Array Annotations, Trials, and Agents.
- caBIO provides access to data obtained from 10+ sources, updated every other month through Extract, Transform, and Load (ETL) processes.
Figure: caBIO Domain Model

caBIO Goal
To provide a robust resource for accessing biomedical annotations from curated data sources in an integrated view, in support of knowledge discovery.

caBIO Architecture
- Model Driven Architecture (MDA)
- N-tier architecture with APIs
- Controlled Vocabularies
- Registered Metadata
- Open APIs free the user from needing to understand the underlying system
- Canonical caCORE SDK-generated data service
- Semantic interoperability via metadata registration

Product Components
- caBIO Home Page
- caBIO Portlet
- caBIO iPhone App
- caBIO APIs

caBIO Home Page
- The caBIO Home Page provides an entry point for browsing caBIO using a Query-by-Example (QBE)-style interface.
- The home page is auto-generated from the caCORE SDK and customized to support unique features such as FreestyleLM (Freestyle Lexical Mine), a full-text, “Google-like” search facility.
- Results are displayed in tabular form.
Screenshots: QBE Search Interface; FreestyleLM Search; Query Results
http://cabioapi.nci.nih.gov/cabio43/Home.action

caBIO Portlet
- The caBIO Portlet is a UI component that operates within the caGrid web portal as a community service.
- A simple search leveraging the FreestyleLM search capability is provided for keyword searches.
- The simple search leverages the caBIO REST API with AJAX.
- A templated search feature is provided for pre-defined searches categorized by functionality:
  - Genome Range Queries
  - Microarray Annotations
  - Genomic Annotations
  - Pathways
  - Cancer Gene Index
- Results are displayed in tabular form.
Screenshots: Simple Search; Templated Search
http://cagrid-portal.nci.nih.gov/web/guest/community

caBIO iPhone App
- The caBIO iPhone App provides mobile access to caBIO data in support of “tele-research”.
- The iPhone App queries the caBIO REST API and parses the XML output.
- Native iPhone libraries (written in Objective-C) are leveraged for caching to improve performance.
- The iPhone App UI is based on the caBIO Portlet simple search query and follows the Apple Human Interface Guidelines and other interface conventions.
- Options are provided to allow users to choose the caBIO version and the maximum number of results.
- The caBIO iPhone App is available for download on the Apple App Store.
Screenshots: Pathway Search and Visualization; Simple Search; User Options
http://gforge.nci.nih.gov/frs/download.php/6593/caBIO_iPhone_App_Demo.wmv
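The query-then-parse pattern described on this slide can be sketched in a few lines of Python. This is only an illustration: the URL path, the parameter names (`query`, `max`), and the XML layout in `SAMPLE_XML` are assumptions made for the example and are not taken from the caBIO REST API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical response body; the real caBIO XML output may look different.
SAMPLE_XML = """<results>
  <gene symbol="NAT2" name="N-acetyltransferase 2" taxon="Human"/>
  <gene symbol="BRCA1" name="breast cancer 1" taxon="Human"/>
</results>"""


def build_search_url(base_url, keyword, max_results):
    """Compose a keyword-search URL (path and parameter names are assumptions)."""
    query = urlencode({"query": keyword, "max": max_results})
    return f"{base_url}/search?{query}"


def parse_genes(xml_text, max_results):
    """Turn each <gene> element into a dict, honoring the user's result limit."""
    root = ET.fromstring(xml_text)
    genes = [dict(element.attrib) for element in root.iter("gene")]
    return genes[:max_results]


url = build_search_url("http://example.org/cabio", "NAT2", 5)
first_hit = parse_genes(SAMPLE_XML, 1)
```

The `max_results` argument mirrors the user option for the maximum number of results; a real client would fetch the body from `url` instead of using the canned string.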

caBIO APIs
- The caBIO API is built on top of the caCORE SDK service layer, which provides a single, common access paradigm to clients using any of the provided interfaces.
- caBIO provided APIs include:
  - Java API
  - REST API
  - Web Services (SOAP) API
  - Grid API (caBIO 4.0 Only)
  - .NET API (Prototype Only)
  - Python API (Prototype Only)
- Service methods are provided for programmatic access to the caBIO server for custom query needs. Service methods include:
  - Simple Searches
  - Nested Searches
  - Detached Criteria Searches
  - HQL Searches
  - SDK Query Object Criteria Searches
caBIO Usage and Access
1. Obtain knowledge of the objects in the domain space
2. Formulate the query criteria using the domain objects
3. Establish a connection to the server
4. Submit the query objects and specify the objects returned
5. Use and manipulate the result set
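The five usage steps above can be walked through with a small query-by-example sketch. It runs against an in-memory stand-in, not the caBIO server: `InMemoryService`, its `search` method, and the sample records are hypothetical names invented for illustration, and the real caBIO client classes are not shown here.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Step 1: know the domain objects. A simplified Gene with three attributes.
@dataclass
class Gene:
    symbol: Optional[str] = None
    name: Optional[str] = None
    taxon: Optional[str] = None


class InMemoryService:
    """Hypothetical stand-in for a remote application service."""

    def __init__(self, records):
        self._records = list(records)

    def search(self, example):
        """Return records that match every non-None attribute of the example."""
        criteria = {
            f.name: getattr(example, f.name)
            for f in fields(example)
            if getattr(example, f.name) is not None
        }
        return [
            record
            for record in self._records
            if type(record) is type(example)
            and all(getattr(record, key) == value for key, value in criteria.items())
        ]


# Step 2: formulate the criteria as a partially filled domain object.
example = Gene(symbol="NAT2")
# Step 3: establish a "connection" (here, just build the stand-in service).
service = InMemoryService([
    Gene("NAT2", "N-acetyltransferase 2", "Human"),
    Gene("BRCA1", "breast cancer 1", "Human"),
])
# Step 4: submit the query object; the example's type selects what is returned.
results = service.search(example)
# Step 5: use the result set.
names = [gene.name for gene in results]
```

The design choice worth noting is that the query is itself a domain object, so a client only needs to know the object model, not the underlying storage.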

Primary Data Sources
Biological Entity | caBIO Object | Data Source
 | Gene, NucleicAcidSequence, Clone | NCBI’s Entrez Gene, Unigene
 | SNP | NCBI’s dbSNP
 | Protein, ProteinSequence | Uniprot-Swissprot
 | Cytoband | UCSC’s Genome Sequencing Center
Bio-Marker | Marker | NCBI’s UniSTS
cDNA libraries from MGC, ORESTES and dbEST collections, GO Ontology Associations and HomoloGene | Library, GeneOntology, HomologousAssociation | CGAP
Chromosomal Locations of ESTs and mRNAs | NucleicAcidSequencePhysicalLocation | UCSC’s Genome Viewer
Chromosomal Locations of Genes and Markers | GenePhysicalLocation, MarkerPhysicalLocation | NCBI’s MapView
 | Clinical Trial Protocol | NCI CTEP
Location of SNPs and Genes in terms of Markers/Cytobands | SNP/Gene-RelativeLocation, SNP/Gene-CytogeneticLocation | Data Transformation of Primary Objects
Genes-Diseases-Agents | Gene, GeneAgentAssociation, Agent, DiseaseOntology, Evidence | Cancer Gene Index; Canada Drug Bank
Pathway | Pathway Interactions, Pathway Participant, Pathway Physical Participant, Pathway Physical Entity, Pathway Protein Entity | NCI Pathway Interaction Database (PID)

Sample Data Transformation

Data Exemplar (Unigene) -> caBIO
ID Hs.2 -> Gene’s Unigene clusterId; Taxon’s name (Human)
TITLE N-acetyltransferase 2 (arylamine N-acetyltransferase) -> Gene’s name
GENE NAT2 (Gene) -> Gene’s symbol
CYTOBAND 8p22 -> Start and End Cytoband (GeneCytogeneticLocation)
LOCUSLINK 10 -> DatabaseCrossReference’s CrossReferenceId (CrossReferenceDatabase = “Locus Link”)
CHROMOSOME 8 -> Chromosome’s chromosomeNumber
STS ACC=PMC310725P3 UNISTS=272646 -> Associate a Gene with a UniSTS Marker
SEQUENCE ACC=BC067218.1; NID=g45501306; PID=g45501307; SEQTYPE=mRNA
SEQUENCE ACC=BG569293.1; NID=g13576946; CLONE=IMAGE:4722596; END=5'; LID=6989; SEQTYPE=EST
SEQUENCE ACC=AJ581147.1; NID=g73759744; PID=g73759745; SEQTYPE=mRNA
  -> NucleicAcidSequence’s accessionNumber, accessionNumberVersion (mRNA or EST)
  -> Clone’s name, Library-Id and Type (IMAGE)
  -> CloneRelativeLocation’s Type (5’) and associated sequence accession number

Data Exemplar (dbSNP) -> caBIO
rs242 | human | 9606 | in-del | genotype=NO | submitterlink=YES updated 2007-07-10 12:5
  -> SNP’s dbSNPId
SNP | alleles='-/T' | het=0.18 | se(het)=0.24
VAL | validated=YES | min_prob=? | max_prob=? | notwithdrawn | byFrequency
CTG | assembly=reference | chr=1 | chr-pos=20742048 | NT_004610.18 | ctg-start=3693803 | ctg-end=3693803 | loctype=2 | orient=+
  -> SNP’s alleleA, SNP’s alleleB, SNP’s validationStatus
  -> SNPPhysicalLocation’s start, SNPPhysicalLocation’s stop
  -> Associated Chromosome (chr 1)
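The Unigene mapping on this slide can be illustrated with a short parser. This is a sketch under stated assumptions, not the caBIO ETL code: the whitespace-separated record layout, the output key names, the `TAXON_BY_PREFIX` table, and the choice to keep only the numeric part of the cluster ID are all made up for the example.

```python
# Assumed flat-file layout: a tag, whitespace, then the value.
UNIGENE_RECORD = """ID          Hs.2
TITLE       N-acetyltransferase 2 (arylamine N-acetyltransferase)
GENE        NAT2
CYTOBAND    8p22
LOCUSLINK   10
CHROMOSOME  8
SEQUENCE    ACC=BC067218.1; NID=g45501306; PID=g45501307; SEQTYPE=mRNA
SEQUENCE    ACC=BG569293.1; NID=g13576946; CLONE=IMAGE:4722596; END=5'; LID=6989; SEQTYPE=EST
"""

# Illustrative lookup from the cluster prefix to a taxon name.
TAXON_BY_PREFIX = {"Hs": "Human"}


def parse_unigene(record):
    """Map one Unigene record onto a gene-like dictionary."""
    gene = {"sequences": []}
    for line in record.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "ID":
            prefix, _, number = value.partition(".")
            gene["taxon"] = TAXON_BY_PREFIX.get(prefix)
            gene["cluster_id"] = int(number)
        elif tag == "TITLE":
            gene["name"] = value
        elif tag == "GENE":
            gene["symbol"] = value
        elif tag == "CYTOBAND":
            gene["cytoband"] = value
        elif tag == "LOCUSLINK":
            gene["cross_reference"] = {"database": "Locus Link", "id": value}
        elif tag == "CHROMOSOME":
            gene["chromosome"] = value
        elif tag == "SEQUENCE":
            # Split "KEY=VALUE; KEY=VALUE" pairs into a dictionary.
            pairs = dict(
                part.strip().split("=", 1)
                for part in value.split(";")
                if part.strip()
            )
            # "BC067218.1" becomes accession "BC067218" and version "1".
            accession, _, version = pairs["ACC"].rpartition(".")
            gene["sequences"].append({
                "accession": accession,
                "version": version,
                "type": pairs["SEQTYPE"],
                "clone": pairs.get("CLONE"),
            })
    return gene


gene = parse_unigene(UNIGENE_RECORD)
```

Each `elif` branch corresponds to one row of the Unigene exemplar, so the sketch can be read side by side with the mapping above.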

Integrated Tools: caIntegrator2
- Allows users to use the caBIO FreestyleLM search API to find genes of interest to use in gene expression queries.

Integrated Tools: Rembrandt
- Allows users to search for patient samples that have genes expressed in cellular pathway(s)
- Leverages caBIO to obtain a list of cellular pathways and pathway details
Screenshots: Rembrandt Gene Expression Query; caBIO Pathways

Integrated Tools: geWorkbench
- The Marker Annotations component enables the retrieval of biological annotation information from caBIO for a collection of genes.
- For every gene, the following data can be retrieved:
  - A set of gene-disease and gene-compound associations derived from literature articles (the Cancer Gene Index)
  - A set of pathways containing the gene
- A BioCarta pathway image is displayed after selecting the "View Diagram" option from the "Annotations" tab.

Integrated Tools: caMOD
- Leverages caBIO to obtain gene information for targeted modifications
- Obtains additional drug information from caBIO in support of therapeutic approaches

caBIO Enterprise Conformance and Compliance Framework (ECCF) Pilot
Molecular Annotation (MA) Service

Goals for the caBIO ECCF Pilot
- Leverage caBIO as a reference implementation of the NCI CBIIT ECCF
- Develop a set of ECCF-based Molecular Annotation Service specifications
- Implement and deploy a service based on service specifications
- Provide guidelines to assist other NCI CBIIT products in leveraging ECCF processes and developing ECCF artifacts
- Provide input on the ECCF Implementation Guide
- Develop guidelines that are pragmatic and useful
- Identify list of tools and infrastructure that will assist in the development of services and specifications

ECCF Artifact Matrix
Viewpoints:
- Enterprise/Business Viewpoint
- Information Viewpoint
- Computational Viewpoint
- Engineering Viewpoint
Models:
- Computation Independent Model (CIM)
- Platform Independent Model (PIM)
- Platform Specific Model (PSM)

RM-ODP Viewpoints
- Enterprise/Business Viewpoint
  - Purpose / Scope
  - Business cases / Storyboards
  - Industry standards
- Information Viewpoint
  - Information Models (DAM, PIM, PSM)
  - Semantic Profiles
- Computational Viewpoint
  - Capabilities / Operations
  - Functional Profiles
- Engineering Viewpoint
  - Non-functional Requirements
  - Deployment model

Levels of Abstraction
- Computation Independent Model (CIM)
  - Service Scope and Description Document
  - CIM Service Specification Document (CIMSS)
- Platform Independent Model (PIM)
  - PIM Service Specification Document (PIMSS)
- Platform Specific Model (PSM)
  - PSM Service Specification Document (PSMSS)
- Implementation
  - Service Integration Guide
  - Deployable System

Enterprise Service Specification Process

Scope and Service Description
- “The Molecular Annotation Service provides a set of interfaces for the annotation of experimental or other types of data with molecular information.”
- “The purpose of Molecular Annotations service is to provide specifications for a set of molecular annotations that may be integrated with user-facing applications.”
- “The development of a common, reusable set of interfaces provided by this service will facilitate standardization, integration, and interoperability between various systems that provide and consume molecular annotations.”

Mapping to the LSBAM
LS BAM Use Case | Service Mapping Description
Characterize/Organize the Data | The molecular annotations service supports the Characterize/Organize the Data use cases by providing annotations for molecular entities associated with data. For example, in characterizing experimental data, a researcher may look up reference annotations with the service to find which genes are mapped to the microarray used in the experiment.
Integrate Data Sets | The molecular annotations service supports the Integrate Data Sets use case as it will provide the capability of retrieving annotations from the service to use as join points, or to display as an additional reference.
Annotate Findings/Results | The molecular annotations service supports the Annotate Findings/Results use case as the service provides direct support for obtaining information associated with molecular entities to assist in annotating findings/results.
Identify and Review Knowledge Bases and/or Databases | The molecular annotations service supports the Identify and Review Knowledge Bases and/or Databases use case as the service provides support for knowledge discovery via the integration of annotations across disparate data sources.

MA Service CIM: Business Storyboards
Outline: Bioinformatics developer wants to retrieve all diseases and agents associated with a target gene.
Detail: John Smith is developing a web site that allows researchers to find all of the diseases associated with a specific gene. The site will also allow researchers to select a gene and obtain a list of agents (drugs) used to target that gene. By querying the molecular annotations service, John’s web application can retrieve a list of diseases and agents associated with a gene.

MA Service CIM: Scope
Items | Scope / Out of Scope | Source
Provide the ability to retrieve molecular annotations | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve functional associations, cellular locations, and biological processes associated with a gene | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve disease and agents associated with a gene | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve variations associated with a gene | Scope | Molecular Annotation Service Scope and Description
… | … | …

MA Service CIM: Semantic Profiles
Semantic Profile No.: MA-SP1
Semantic Profile Name: Molecular Annotation Domain Analysis Model
Constrained Information Model: LSDAM v1.1
Semantic Profile Description: The molecular annotation service will use semantics from the Life Science DAM. The following classes are included in the project-specific DAM (grouped by sub-domain):
- Gene
- NucleicAcidSequenceFeature
- MolecularSequenceAnnotation
- GeneticVariation
- SingleNucleotidePolymorphism
- NucleicAcidPhysicalLocation
- …

MA Service CIM: Project Analysis Model

MA Service CIM: Capabilities
Name | Description
Get Genes By Symbol or Alias | Returns the genes named by the specified gene symbol or gene alias
Get Genes By Microarray Reporter | Returns the genes associated with the specified microarray reporter
Get Functional Associations | Returns annotations describing a gene's molecular function
Get Cellular Locations | Returns annotations describing a gene's location within a cell
Get Biological Processes | Returns annotations describing a gene's role in biological processes
Get Disease Associations | Returns findings about a gene's role in diseases
Get Agent Associations | Returns findings about agents which target a given gene
Get Structural Variations | Returns variations which are located on a given gene
Get Homologous Genes | Returns a gene’s homologous genes in a specified organism

MA Service CIM: Capability Details
Name [M]: Get Genes By Symbol or Alias
Description [M]: Returns the gene named by the specified gene symbol or gene alias and the gene’s organism
Pre-Conditions [M]: None
Security Pre-Conditions [M]: None
Inputs [M]: Gene Symbol or Alias; Organism Identifier
Outputs [M]: A collection of Gene objects
Post-Conditions [O]: None
Exception Conditions [M]: No matching genes found
Aspects left for Technical Bindings [O]: Format and data type for the Organism Identifier
Notes [O]: NA

MA Service CIM: Functional Profiles
Functional Profile No.: MA-FP1
Functional Profile Name: Gene Annotation Query Profile
Functional Profile Description: Contains all the capabilities for retrieving gene annotations
Capability Names:
- Get Genes By Symbol or Alias
- Get Genes By Microarray Reporter
- Get Functional Associations
- Get Cellular Locations
- Get Biological Processes
- Get Disease Associations
- Get Agent Associations
- Get Structural Variations
- Get Homologous Genes

MA Service CIM: Conformance Profiles
Conformance No: MA-CP1
Conformance Name: LSDAM-based Gene Annotation Conformance Profile
Description: This conformance profile defines the functionality for the Gene Annotation Service using LSDAM semantics
Usage Context: This profile would be used by a researcher wishing to access gene annotations
Mandatory: No
Functional Profile(s): MA-FP1 : Gene Annotation Query Profile
Semantic Profile(s): MA-SP1 : LSDAM v1.1

MA Service CIM: Activity Diagrams

MA Service CIM: Conformance Statements
Name | Type | Viewpoint | Description | Test method
Query Performance | Obligation | Engineering | The MA service should provide a response within 0.5 seconds to support a synchronous UI-based client. | Test cases to include performance testing.
Additional Functionality | Permission | Computational | The MA service can provide additional functionality other than specified in these specifications. | Design Review
Semantic Model | Obligation | Informational | The MA service must provide traceability to classes in the LSDAM where applicable. | Design Review
Data Types | Obligation | Informational | The MA service must conform to NCI’s constrained list of ISO 21090 data types. | Design Review
Functional Profiles | Obligation | Computational | Functional Profiles shall be deployed as functional wholes. Ignoring or omitting functional behavior defined within a functional profile is not permitted, nor is diverging from the detailed functional specifications provided in Section 4. | 1. Design Review 2. Test cases
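The Query Performance statement says test cases should include performance testing against a 0.5-second response target. A minimal sketch of such a check follows; the helper names `timed_call` and `within_budget` are hypothetical, and the timed operation is a trivial placeholder where a real test would call the deployed service.

```python
import time

# Response target taken from the Query Performance conformance statement.
RESPONSE_BUDGET_SECONDS = 0.5


def timed_call(operation, *args):
    """Run the operation once and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = operation(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_budget(elapsed, budget=RESPONSE_BUDGET_SECONDS):
    """Report whether a measured response time meets the target."""
    return elapsed <= budget


# Placeholder operation; a real test would invoke a service query here.
result, elapsed = timed_call(str.upper, "nat2")
```

A single measurement is noisy, so a real performance test would likely repeat the call and examine a percentile, not one sample.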

MA Service PIM: Relation to CIM
Conceptual Functional Service Specification Name: Molecular Annotation Computation Independent Service Specification
Conceptual Functional Service Specification Version: 0.0.6
Description & Link to the Conceptual Functional Service Specification: https://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/conceptual/CIMSS_Molecular_Annotation_Service.doc
Deviation from the Conceptual Functional Service Specification: None
Reason for Deviation: None

MA Service PIM: Relationship to Standards
Standards | Description
LSBAM v1.0 | Service conforms to NCI’s Life Science Business Architecture Model
LSDAM v1.1 | Service conforms to the Life Sciences DAM version 1.1
LSPIM v0.1 | Service conforms to the Life Sciences PIM version 0.1
BRIDG v3.0.1 | Service conforms to the NCI’s version of the Biomedical Research Integrated Domain Group v3.0.1
ISO 21090 | Service conforms to NCI’s version of ISO 21090 data types
HUGO Gene Symbols | Service leverages gene symbols from the Human Genome Organization
MGI Gene Symbols | Service leverages gene symbols from the International Committee on Standardized Genetic Nomenclature for Mice

MA Service PIM: Information Model
The PIM is based on the LSPIM, but it may be constrained and localized:
- Add any attributes that are needed
- Remove attributes which are unnecessary
- Add associations
- Add new classes
Diagram: LSPIM, MA PIM

PIM Example: NucleicAcidPhysicalLocation
Trace | Attribute Name | Type | Description
LSDAM | startCoordinate | INT | The start coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
LSDAM | endCoordinate | INT | The end coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
New | featureType | CD | The type of gene feature located, e.g. GENE, CDS, UTR, RNA, PSEUDO.
New | assembly | ST | The genome assembly which this location is defined in reference to.
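Because both coordinates are inclusive, length and overlap calculations need a little care. The sketch below models the four attributes as a Python dataclass to show this. It is a simplification under stated assumptions: the ISO 21090 types INT, CD, and ST are flattened to `int` and `str`, and the `length` and `overlaps` methods plus the start-not-after-end check are additions for illustration that do not appear in the PIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleicAcidPhysicalLocation:
    start_coordinate: int  # inclusive offset from the start of the sequence
    end_coordinate: int    # inclusive offset from the start of the sequence
    feature_type: str      # e.g. GENE, CDS, UTR, RNA, PSEUDO
    assembly: str          # genome assembly the coordinates refer to

    def __post_init__(self):
        # Assumed constraint: a range cannot end before it starts.
        if self.start_coordinate > self.end_coordinate:
            raise ValueError("start_coordinate must not exceed end_coordinate")

    def length(self):
        """Number of positions covered; both ends count, hence the +1."""
        return self.end_coordinate - self.start_coordinate + 1

    def overlaps(self, other):
        """True when both ranges share a position on the same assembly."""
        return (
            self.assembly == other.assembly
            and self.start_coordinate <= other.end_coordinate
            and other.start_coordinate <= self.end_coordinate
        )


# Hypothetical coordinates and assembly names, chosen only for the example.
gene_location = NucleicAcidPhysicalLocation(100, 199, "GENE", "assembly-A")
cds_location = NucleicAcidPhysicalLocation(199, 250, "CDS", "assembly-A")
other_assembly = NucleicAcidPhysicalLocation(100, 199, "GENE", "assembly-B")
```

Here `gene_location` and `cds_location` share exactly one position (199), which an exclusive-end comparison would miss.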

Traceability for Information Models

MA Service PIM: Operations
Operation No. | Operation Name | Interface Name | Operation Description
MA-INF1-OP1 | getGenesBySymbol | MAGeneAnnotationQuery | Returns the genes named by the specified gene symbol or gene alias
MA-INF1-OP2 | getGenesByMicroarrayReporter | MAGeneAnnotationQuery | Returns the genes associated with the specified microarray reporter
MA-INF1-OP3 | getFunctionalAssociations | MAGeneAnnotationQuery | Returns annotations describing a gene’s molecular function
MA-INF1-OP4 | getCellularLocations | MAGeneAnnotationQuery | Returns annotations describing a gene’s location within a cell
… | … | … | …

MA Service PIM: Operation Behavior Description
Operation: getGenesBySymbol
Description: Returns the genes named by the specified gene symbol or gene alias and the gene’s organism
Behavior Description:
- Client supplies a GeneSearchCriteria instance with a gene symbol or alias and an Organism to search within
- The case of the symbol or alias is ignored
- If the Organism is null then all Organisms are searched
- The system returns the matching Gene object(s), if any
Pre-Conditions: None
Security Pre-Conditions: None
Inputs: GeneSearchCriteria
Outputs: Return: Fully-populated instance(s) of the Gene class
Post-Conditions: None
Exception Conditions: None
Additional Details: None
Notes: None
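The behavior rules for getGenesBySymbol (case-insensitive matching, a null Organism meaning "search all", and an empty result instead of an exception) can be sketched as follows. This is an illustration over an in-memory list, not the service implementation: the field names, the `get_genes_by_symbol` function, and the sample genes are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gene:
    symbol: str
    organism: str
    aliases: List[str] = field(default_factory=list)


@dataclass
class GeneSearchCriteria:
    symbol_or_alias: str
    organism: Optional[str] = None  # None means all organisms are searched


def get_genes_by_symbol(genes, criteria):
    """Return genes whose symbol or alias matches, ignoring case."""
    wanted = criteria.symbol_or_alias.lower()
    matches = []
    for gene in genes:
        # Apply the organism filter only when an organism was supplied.
        if criteria.organism is not None and gene.organism != criteria.organism:
            continue
        candidate_names = [gene.symbol] + gene.aliases
        if any(name.lower() == wanted for name in candidate_names):
            matches.append(gene)
    return matches  # an empty list when nothing matches


# Illustrative sample data only.
GENES = [
    Gene("NAT2", "Homo sapiens", aliases=["AAC2"]),
    Gene("Nat2", "Mus musculus"),
]

all_organisms = get_genes_by_symbol(GENES, GeneSearchCriteria("nat2"))
human_only = get_genes_by_symbol(GENES, GeneSearchCriteria("nat2", "Homo sapiens"))
by_alias = get_genes_by_symbol(GENES, GeneSearchCriteria("aac2"))
```

Returning an empty list follows the PIM, which lists no exception conditions, whereas the CIM capability lists "No matching genes found" as an exception condition.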

MA Service PIM: Search Criteria (Inputs)

MA Service PSM: Relation to PIM
Platform Independent Model Name and Service Specification: Molecular Annotation Service Platform Independent Model and Service Specification
Platform Independent Model and Service Specification Version: 0.1.2
Description & Link to the Platform Independent Model and Service Specification: http://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/logical/PIMSS_Molecular_Annotation_Service.doc

MA Service PSM: Information Model Example

MA Service PSM: Service Interface
Implemented Interface No. | Supported Interface Name | Interface Description | Link
MA-INF1 | MAGeneAnnotationQuery | Includes all operations for retrieving gene annotations. | N/A
DS-INF1 | Data Service Query | Contains the CQL query operation | https://ncisvn.nci.nih.gov/svn/cagrid/branches/caGrid-1_3_release/cagrid-1-0/caGrid/projects/data/schema/Data/DataService.wsdl

MA Service: Implementation
1. Leverage new releases of:
   - NCI localized ISO 21090
   - caCORE SDK
   - caGrid / Introduce
2. Create new MA database and map to MA PSM
3. Populate MA database with data from caBIO database
4. Generate caCORE-like system from the MA PSM
5. Generate grid data service with Introduce

MA Service Database
- The MA Service database was populated using the Pentaho Data Integration Extract, Transform, Load (ETL) tool.
- Data was transformed from the caBIO database and external data sources and loaded into the MA database.
Figure: Pentaho ETL Tool and MA Data Workflow

MA Service: Deployment Plan

caBIO Resources
- caBIO Home Page: http://cabioapi.nci.nih.gov/cabio43/
- caBIO Portlet: http://cagrid-portal.nci.nih.gov/web/guest/community
- Wiki: https://wiki.nci.nih.gov/display/caBIO/caBIO+Wiki+Home+Page
- caBIO ECCF Pilot Wiki: https://wiki.nci.nih.gov/display/caBIO/caBIO+ECCF
- GForge Site: https://gforge.nci.nih.gov/projects/cabiodb/
- Download Site: http://ncicb.nci.nih.gov/download/cabiolicenseagreement.jsp
- Technical Guide: http://gforge.nci.nih.gov/docman/view.php/51/18313/caBIO_4.3_TechnicalGuide.pdf
- Subscribe to the caBIO Users Listserv for data refresh and software release announcements: https://list.nih.gov/archives/cabio_users.html

Acknowledgements
- Development: Jim Sun - SAIC; Konrad Rokicki - SAIC; Liqun Qi - SAIC
- Testing: Matt Tiller - ESAC, Inc.; Quy Phung - ESAC, Inc.; David Li - ESAC, Inc.
- Documentation: Carolyn Klinger
- Management: Juli Klemm - NCI CBIIT; Sharon Gaheen - SAIC; Avinash Shanbhag - NCICB
- ECCF Team: Baris Suzek - Georgetown; Brian Davis - 3rd Millennium; Elain Freund - 3rd Millennium
- Systems Support: Norval Johnson - TerpSys; Sriram Kalyanasundaram - TerpSys
