Cancer Bioinformatics Infrastructure Objects (caBIO) Overview caBIG® 2010 World’s Fair September 2010.

Presentation transcript:

cancer Bioinformatics Infrastructure Objects (caBIO) Overview caBIG® 2010 World’s Fair September 2010

Agenda
- General Overview: Overview, Goal, Architecture, Product Components, Data Sources, Integrated Tools
- caBIO Enterprise Conformance and Compliance Framework (ECCF) Pilot Effort: Molecular Annotation (MA) Service

What is caBIO?
- The cancer Bioinformatics Infrastructure Objects (caBIO) is a repository of molecular biology data for disease research.
- caBIO objects and data types represent entities found in biomedical research, such as Gene, Protein, Chromosome, Sequence, SNP, Pathways, Array Annotations, Trials, and Agents.
- caBIO provides access to data obtained from 10+ sources, updated every other month through Extract, Transform, and Load (ETL) processes.
[Figure: caBIO Domain Model]

caBIO Goal: To provide a robust resource for accessing biomedical annotations from curated data sources in an integrated view, in support of knowledge discovery.

caBIO Architecture
- Model Driven Architecture (MDA)
- N-tier architecture with APIs
- Controlled Vocabularies
- Registered Metadata
- Open APIs free the user from needing to understand the underlying system
- Canonical caCORE SDK-generated data service
- Semantic interoperability via metadata registration

Product Components: caBIO Home Page, caBIO Portlet, caBIO iPhone App, caBIO APIs

caBIO Home Page
- The caBIO Home Page provides an entry point for browsing caBIO using a Query-by-Example (QBE)-style interface.
- The home page is auto-generated from the caCORE SDK and customized to support unique features such as FreestyleLM (Freestyle Lexical Mine), a full-text, “Google-like” search facility.
- Results are displayed in tabular form.
[Figures: QBE Search Interface; FreestyleLM Search; Query Results]

caBIO Portlet
- The caBIO Portlet is a UI component that operates within the caGrid web portal as a community service.
- A simple search leveraging the FreestyleLM search capability is provided for keyword searches.
- The simple search leverages the caBIO REST API with AJAX.
- A templated search feature is provided for pre-defined searches categorized by functionality: Genome Range Queries, Microarray Annotations, Genomic Annotations, Pathways, Cancer Gene Index.
- Results are displayed in tabular form.
[Figures: Simple Search; Templated Search]

caBIO iPhone App
- The caBIO iPhone App provides mobile access to caBIO data in support of “tele-research”.
- The iPhone App queries the caBIO REST API and parses the XML output.
- Native iPhone libraries (written in Objective-C) are used for caching to improve performance.
- The iPhone App UI is based on the caBIO Portlet simple search query and follows the Apple Human Interface Guidelines and other interface conventions.
- Options are provided to allow users to choose the caBIO version and the maximum number of results.
- The caBIO iPhone App is available for download on the Apple App Store.
[Figures: Pathway Search and Visualization; Simple Search; User Options]
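The query-then-parse pattern described for the iPhone App can be sketched in a few lines. This is a minimal illustration in Python, not the app's Objective-C code. The XML layout, the element and attribute names, and the sample values are all assumptions made for the example, since the slides do not show the actual caBIO REST response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: the real caBIO REST XML schema is not shown
# in the slides, so this layout is an assumption for illustration only.
SAMPLE_XML = """
<results>
  <gene symbol="NAT2" name="N-acetyltransferase 2"/>
  <gene symbol="NAT1" name="N-acetyltransferase 1"/>
  <gene symbol="TP53" name="tumor protein p53"/>
</results>
"""


def parse_genes(xml_text, max_results=None):
    """Return (symbol, name) pairs, truncated to max_results if given."""
    root = ET.fromstring(xml_text.strip())
    genes = [(g.get("symbol"), g.get("name")) for g in root.iter("gene")]
    return genes if max_results is None else genes[:max_results]


if __name__ == "__main__":
    # Mirrors the user option for a maximum number of results.
    for symbol, name in parse_genes(SAMPLE_XML, max_results=2):
        print(symbol, name)
```

The `max_results` parameter stands in for the user option mentioned on the slide; a real client would more likely pass such a limit to the server than trim the list locally.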

caBIO APIs
- The caBIO API is built on top of the caCORE SDK service layer, which provides a single, common access paradigm to clients using any of the provided interfaces.
- caBIO-provided APIs include: Java API; REST API; Web Services (SOAP) API; Grid API (caBIO 4.0 only); .NET API (prototype only); Python API (prototype only).
- Service methods are provided for programmatic access to the caBIO server for custom query needs. Service methods include: Simple Searches; Nested Searches; Detached Criteria Searches; HQL Searches; SDK Query Object Criteria Searches.
caBIO Usage and Access
1. Obtain knowledge of the objects in the domain space.
2. Formulate the query criteria using the domain objects.
3. Establish a connection to the server.
4. Submit the query objects and specify the objects returned.
5. Use and manipulate the result set.
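The usage sequence (know the domain objects, build criteria from them, connect, submit, use the results) can be illustrated with a small query-by-example sketch. It is written in Python against an in-memory stand-in and does not use the real caBIO client libraries. The `Gene` fields, the `InMemoryService` class, and the sample records are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Gene:
    # A domain object; fields left as None act as wildcards in an example.
    symbol: Optional[str] = None
    taxon: Optional[str] = None


class InMemoryService:
    """Hypothetical stand-in for a remote application service."""

    def __init__(self, records: List[Gene]):
        self._records = records

    def search(self, example: Gene) -> List[Gene]:
        # Return every record whose values equal the example's set fields.
        def matches(record: Gene) -> bool:
            return all(
                getattr(example, f.name) is None
                or getattr(example, f.name) == getattr(record, f.name)
                for f in fields(example)
            )

        return [r for r in self._records if matches(r)]


if __name__ == "__main__":
    # "Connect" to the service (here: just construct the stand-in).
    service = InMemoryService(
        [Gene("NAT2", "Human"), Gene("Nat2", "Mouse"), Gene("TP53", "Human")]
    )
    # Formulate the criteria using a domain object, then submit it.
    results = service.search(Gene(taxon="Human"))
    # Use the result set.
    print([g.symbol for g in results])
```

Using a partially filled domain object as the query keeps client code in terms of the domain model, which is the point of the "single, common access paradigm" the slide describes.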

Primary Data Sources
Biological Entity | caBIO Object | Data Source
Gene, NucleicAcidSequence, Clone | NCBI’s Entrez Gene, Unigene
SNP | NCBI’s dbSNP
Protein, ProteinSequence | Uniprot-Swissprot
Cytoband | UCSC’s Genome Sequencing Center
Bio-Marker | Marker | NCBI’s UniSTS
cDNA libraries from MGC, ORESTES and dbEST collections, GO Ontology Associations and HomoloGene | Library, GeneOntology, HomologousAssociation | CGAP
Chromosomal Locations of ESTs and mRNAs | NucleicAcidSequencePhysicalLocation | UCSC’s Genome Viewer
Chromosomal Locations of Genes and Markers | GenePhysicalLocation, MarkerPhysicalLocation | NCBI’s MapView
Clinical Trial Protocol | NCI CTEP
Location of SNPs and Genes in terms of Markers/Cytobands | SNP/Gene-RelativeLocation, SNP/Gene-CytogeneticLocation | Data Transformation of Primary Objects
Genes-Diseases-Agents | Gene, GeneAgentAssociation, Agent, DiseaseOntology, Evidence | Cancer Gene Index; Canada Drug Bank
Pathway | Pathway Interactions, Pathway Participant, Pathway Physical Participant, Pathway Physical Entity, Pathway Protein Entity | NCI Pathway Interaction Database (PID)

Sample Data Transformation
Data Exemplar (Unigene) -> caBIO
ID Hs.2 -> Gene’s Unigene clusterId; Taxon’s name (Human)
TITLE N-acetyltransferase 2 (arylamine N-acetyltransferase) -> Gene’s name
GENE NAT2 (Gene) -> Gene’s symbol
CYTOBAND 8p22 -> Start and End Cytoband (GeneCytogeneticLocation)
LOCUSLINK 10 -> DatabaseCrossReference’s CrossReferenceId (CrossReferenceDatabase = “Locus Link”)
CHROMOSOME 8 -> Chromosome’s chromosomeNumber
STS ACC=PMC310725P3 UNISTS=272646 -> Associate a Gene with a UniSTS Marker
SEQUENCE ACC=BC ; NID=g ; PID=g ; SEQTYPE=mRNA
SEQUENCE ACC=BG ; NID=g ; CLONE=IMAGE: ; END=5'; LID=6989; SEQTYPE=EST
SEQUENCE ACC=AJ ; NID=g ; PID=g ; SEQTYPE=mRNA
-> NucleicAcidSequence’s accessionNumber, accessionNumberVersion (mRNA or EST); Clone’s name, Library-Id and Type (IMAGE); CloneRelativeLocation’s Type (5’) and associated sequence accession number
Data Exemplar (dbSNP) -> caBIO
rs242 | human | 9606 | in-del | genotype=NO | submitterlink=YES updated :5 -> SNP’s dbSNPId
SNP | alleles='-/T' | het=0.18 | se(het)=0.24 -> SNP’s alleleA, SNP’s alleleB
VAL | validated=YES | min_prob=? | max_prob=? | notwithdrawn | byFrequency -> SNP’s validationStatus
CTG | assembly=reference | chr=1 | chr-pos= | NT_ | ctg-start= | ctg-end= | loctype=2 | orient=+ -> SNPPhysicalLocation’s start, SNPPhysicalLocation’s stop; Associated Chromosome (chr 1)
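The Unigene half of this transformation can be sketched as a small parser that reads `KEY value` lines and maps them onto caBIO-style fields. This is an illustrative Python sketch, not the caBIO ETL code. The dotted target key names and the choice to split `Hs.2` into an organism prefix and a numeric cluster id are assumptions. Only the single-valued lines from the exemplar are handled; the repeated `SEQUENCE` and `STS` lines are left out.

```python
# Single-valued lines copied from the Unigene exemplar on the slide.
SAMPLE_RECORD = """
ID          Hs.2
TITLE       N-acetyltransferase 2 (arylamine N-acetyltransferase)
GENE        NAT2
CYTOBAND    8p22
LOCUSLINK   10
CHROMOSOME  8
"""

# Assumed lookup from the cluster-id prefix to a taxon name.
TAXON_BY_PREFIX = {"Hs": "Human"}


def parse_unigene_record(text):
    """Split 'KEY value' lines into a dict (repeated keys are not handled)."""
    record = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        record[key] = value.strip()
    return record


def to_cabio_fields(record):
    """Map parsed Unigene keys to illustrative caBIO-style field names."""
    prefix, _, cluster_number = record["ID"].partition(".")
    return {
        "Gene.clusterId": cluster_number,
        "Taxon.name": TAXON_BY_PREFIX.get(prefix),
        "Gene.name": record["TITLE"],
        "Gene.symbol": record["GENE"],
        "GeneCytogeneticLocation.cytoband": record["CYTOBAND"],
        "DatabaseCrossReference.crossReferenceId": record["LOCUSLINK"],
        "Chromosome.chromosomeNumber": record["CHROMOSOME"],
    }


if __name__ == "__main__":
    for name, value in to_cabio_fields(parse_unigene_record(SAMPLE_RECORD)).items():
        print(f"{name} = {value}")
```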

Integrated Tools: caIntegrator2
- Allows users to use the caBIO FreestyleLM search API to find genes of interest to use in gene expression queries.

Integrated Tools: Rembrandt
- Allows users to search for patient samples that have genes expressed in cellular pathway(s).
- Leverages caBIO to obtain a list of cellular pathways and pathway details.
[Figures: Rembrandt Gene Expression Query; caBIO Pathways]

Integrated Tools: geWorkbench
- The Marker Annotations component enables the retrieval of biological annotation information from caBIO for a collection of genes.
- For every gene, the following data can be retrieved: a set of gene-disease and gene-compound associations derived from literature articles (the Cancer Gene Index), and a set of pathways containing the gene.
- A BioCarta pathway image is displayed after selecting the "View Diagram" option from the "Annotations" tab.

Integrated Tools: caMOD
- Leverages caBIO to obtain gene information for targeted modifications.
- Obtains additional drug information from caBIO in support of therapeutic approaches.

caBIO Enterprise Conformance and Compliance Framework (ECCF) Pilot Molecular Annotation (MA) Service

Goals for the caBIO ECCF Pilot
- Leverage caBIO as a reference implementation of the NCI CBIIT ECCF
- Develop a set of ECCF-based Molecular Annotation Service specifications
- Implement and deploy a service based on service specifications
- Provide guidelines to assist other NCI CBIIT products in leveraging ECCF processes and developing ECCF artifacts
- Provide input on the ECCF Implementation Guide
- Develop guidelines that are pragmatic and useful
- Identify list of tools and infrastructure that will assist in the development of services and specifications

ECCF Artifact Matrix
Viewpoints: Enterprise/Business Viewpoint, Information Viewpoint, Computational Viewpoint, Engineering Viewpoint
Levels of abstraction: Computation Independent Model (CIM), Platform Independent Model (PIM), Platform Specific Model (PSM)

RM-ODP Viewpoints
- Enterprise/Business Viewpoint: Purpose / Scope; Business cases / Storyboards; Industry standards
- Information Viewpoint: Information Models (DAM, PIM, PSM); Semantic Profiles
- Computational Viewpoint: Capabilities / Operations; Functional Profiles
- Engineering Viewpoint: Non-functional Requirements; Deployment model

Levels of Abstraction
- Computation Independent Model (CIM): Service Scope and Description Document; CIM Service Specification Document (CIMSS)
- Platform Independent Model (PIM): PIM Service Specification Document (PIMSS)
- Platform Specific Model (PSM): PSM Service Specification Document (PSMSS)
- Implementation: Service Integration Guide; Deployable System

Enterprise Service Specification Process

Scope and Service Description
- “The Molecular Annotation Service provides a set of interfaces for the annotation of experimental or other types of data with molecular information.”
- “The purpose of Molecular Annotations service is to provide specifications for a set of molecular annotations that may be integrated with user-facing applications.”
- “The development of a common, reusable set of interfaces provided by this service will facilitate standardization, integration, and interoperability between various systems that provide and consume molecular annotations.”

Mapping to the LS BAM
LS BAM Use Case | Service Mapping Description
Characterize/Organize the Data | The molecular annotations service supports the Characterize/Organize the Data use cases by providing annotations for molecular entities associated with data. For example, in characterizing experimental data, a researcher may look up reference annotations with the service to find which genes are mapped to the microarray used in the experiment.
Integrate Data Sets | The molecular annotations service supports the Integrate Data Sets use case as it will provide the capability of retrieving annotations from the service to use as join points, or to display as an additional reference.
Annotate Findings/Results | The molecular annotations service supports the Annotate Findings/Results use case as the service provides direct support for obtaining information associated with molecular entities to assist in annotating findings/results.
Identify and Review Knowledge Bases and/or Databases | The molecular annotations service supports the Identify and Review Knowledge Bases and/or Databases use case as the service provides support for knowledge discovery via the integration of annotations across disparate data sources.

MA Service CIM: Business Storyboards
Outline: Bioinformatics developer wants to retrieve all diseases and agents associated with a target gene.
Detail: John Smith is developing a web site that allows researchers to find all of the diseases associated with a specific gene. The site will also allow researchers to select a gene and obtain a list of agents (drugs) used to target that gene. By querying the molecular annotations service, John’s web application can retrieve a list of diseases and agents associated with a gene.

MA Service CIM: Scope
Items | Scope / Out of Scope | Source
Provide the ability to retrieve molecular annotations | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve functional associations, cellular locations, and biological processes associated with a gene | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve disease and agents associated with a gene | Scope | Molecular Annotation Service Scope and Description
Provide the ability to retrieve variations associated with a gene | Scope | Molecular Annotation Service Scope and Description
… | … | …

MA Service CIM: Semantic Profiles
Semantic Profile No.: MA-SP1
Semantic Profile Name: Molecular Annotation Domain Analysis Model
Constrained Information Model: LSDAM v1.1
Semantic Profile Description: The molecular annotation service will use semantics from the Life Science DAM. The following classes are included in the project-specific DAM (grouped by sub-domain):
- Gene
- NucleicAcidSequenceFeature
- MolecularSequenceAnnotation
- GeneticVariation
- SingleNucleotidePolymorphism
- NucleicAcidPhysicalLocation
- …

MA Service CIM: Project Analysis Model

MA Service CIM: Capabilities
Name | Description
Get Genes By Symbol or Alias | Returns the genes named by the specified gene symbol or gene alias
Get Genes By Microarray Reporter | Returns the genes associated with the specified microarray reporter
Get Functional Associations | Returns annotations describing a gene's molecular function
Get Cellular Locations | Returns annotations describing a gene's location within a cell
Get Biological Processes | Returns annotations describing a gene's role in biological processes
Get Disease Associations | Returns findings about a gene's role in diseases
Get Agent Associations | Returns findings about agents which target a given gene
Get Structural Variations | Returns variations which are located on a given gene
Get Homologous Genes | Returns a gene’s homologous genes in a specified organism

MA Service CIM: Capability Details
Name [M]: Get Genes By Symbol or Alias
Description [M]: Returns the gene named by the specified gene symbol or gene alias and the gene’s organism
Pre-Conditions [M]: None
Security Pre-Conditions [M]: None
Inputs [M]: Gene Symbol or Alias; Organism Identifier
Outputs [M]: A collection of Gene objects
Post-Conditions [O]: None
Exception Conditions [M]: No matching genes found
Aspects left for Technical Bindings [O]: Format and data type for the Organism Identifier
Notes [O]: NA

MA Service CIM: Functional Profiles
Functional Profile No.: MA-FP1
Functional Profile Name: Gene Annotation Query Profile
Functional Profile Description: Contains all the capabilities for retrieving gene annotations
Capability Names:
- Get Genes By Symbol or Alias
- Get Genes By Microarray Reporter
- Get Functional Associations
- Get Cellular Locations
- Get Biological Processes
- Get Disease Associations
- Get Agent Associations
- Get Structural Variations
- Get Homologous Genes

MA Service CIM: Conformance Profiles
Conformance No: MA-CP1
Conformance Name: LSDAM-based Gene Annotation Conformance Profile
Description: This conformance profile defines the functionality for the Gene Annotation Service using LSDAM semantics
Usage Context: This profile would be used by a researcher wishing to access gene annotations
Mandatory: No
Functional Profile(s): MA-FP1 : Gene Annotation Query Profile
Semantic Profile(s): MA-SP1 : LSDAM v1.1

MA Service CIM: Activity Diagrams

MA Service CIM: Conformance Statements
Name | Type | Viewpoint | Description | Test method
Query Performance | Obligation | Engineering | The MA service should provide a response within 0.5 seconds to support a synchronous UI based client | Test cases to include performance testing.
Additional Functionality | Permission | Computational | The MA service can provide additional functionality other than specified in these specifications | Design Review
Semantic Model | Obligation | Informational | The MA service must provide traceability to classes in the LSDAM where applicable. | Design Review
Data Types | Obligation | Informational | The MA service must conform to NCI’s constrained list of ISO data types. | Design Review
Functional Profiles | Obligation | Computational | Functional Profiles shall be deployed as functional wholes. Ignoring or omitting functional behavior defined within a functional profile is not permitted, nor is diverging from the detailed functional specifications provided in Section 4. | 1. Design Review 2. Test cases
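The Query Performance obligation names performance testing as its test method. A minimal sketch of such a check is shown here in Python. The 0.5-second threshold comes from the conformance statement; the helper names and the `fake_query` stand-in are hypothetical, and a real test would call the deployed service and take repeated measurements rather than a single one.

```python
import time

# Threshold stated in the Query Performance conformance statement.
MAX_RESPONSE_SECONDS = 0.5


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def meets_performance_obligation(func, *args, **kwargs):
    """True if a single call completes within the stated threshold."""
    _, elapsed = timed_call(func, *args, **kwargs)
    return elapsed <= MAX_RESPONSE_SECONDS


def fake_query(symbol):
    """Hypothetical stand-in for a call to the MA service."""
    return [symbol.upper()]


if __name__ == "__main__":
    print(meets_performance_obligation(fake_query, "nat2"))
```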

MA Service PIM: Relation to CIM
Conceptual Functional Service Specification Name | Conceptual Functional Service Specification Version | Description & Link to the Conceptual Functional Service Specification
Molecular Annotation Computation Independent Service Specification | ceptual/CIMSS_Molecular_Annotation_Service.doc
Deviation from the Conceptual Functional Service Specification | Reason for Deviation
None

MA Service PIM: Relationship to Standards
Standards | Description
LSBAM v1.0 | Service conforms to NCI’s Life Science Business Architecture Model
LSDAM v1.1 | Service conforms to the Life Sciences DAM version 1.1
LSPIM v0.1 | Service conforms to the Life Sciences PIM version 0.1
BRIDG v3.0.1 | Service conforms to the NCI’s version of the Biomedical Research Integrated Domain Group v3.0.1
ISO 21090 | Service conforms to NCI’s version of ISO data types
HUGO Gene Symbols | Service leverages gene symbols from the Human Genome Organization
MGI Gene Symbols | Service leverages gene symbols from the International Committee on Standardized Genetic Nomenclature for Mice

MA Service PIM: Information Model
The PIM is based on the LSPIM, but it may be constrained and localized:
- Add any attributes that are needed
- Remove attributes which are unnecessary
- Add associations
- Add new classes
[Figure: LSPIM and MAPIM]

PIM Example: NucleicAcidPhysicalLocation
Trace | Attribute Name | Type | Description
LSDAM | startCoordinate | INT | The start coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
LSDAM | endCoordinate | INT | The end coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
New | featureType | CD | The type of gene feature located, e.g. GENE, CDS, UTR, RNA, PSEUDO.
New | assembly | ST | The genome assembly which this location is defined in reference to.
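The attribute list for NucleicAcidPhysicalLocation can be written down as a small data class, which makes the inclusive-coordinate convention concrete. This is an illustrative Python sketch and not generated caCORE SDK code. The `CD` and `ST` types are simplified to plain strings, and the `length` and `overlaps` helpers are additions for the example, not operations defined by the PIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleicAcidPhysicalLocation:
    start_coordinate: int  # inclusive offset from the start of the sequence
    end_coordinate: int    # inclusive offset from the start of the sequence
    feature_type: str      # e.g. GENE, CDS, UTR, RNA, PSEUDO (CD simplified to str)
    assembly: str          # genome assembly the location refers to (ST simplified to str)

    def __post_init__(self):
        if self.end_coordinate < self.start_coordinate:
            raise ValueError("end_coordinate must not precede start_coordinate")

    def length(self) -> int:
        # Both ends are inclusive, hence the +1.
        return self.end_coordinate - self.start_coordinate + 1

    def overlaps(self, other: "NucleicAcidPhysicalLocation") -> bool:
        # Locations are only comparable within the same assembly.
        return (
            self.assembly == other.assembly
            and self.start_coordinate <= other.end_coordinate
            and other.start_coordinate <= self.end_coordinate
        )


if __name__ == "__main__":
    # "example-assembly" is a placeholder name, not a real assembly id.
    gene = NucleicAcidPhysicalLocation(100, 199, "GENE", "example-assembly")
    cds = NucleicAcidPhysicalLocation(150, 180, "CDS", "example-assembly")
    print(gene.length(), gene.overlaps(cds))
```

Carrying `assembly` on the location, as the PIM does with its new attribute, is what lets the overlap check refuse to compare coordinates from different reference builds.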

Traceability for Information Models

MA Service PIM: Operations
Operation No. | Operation Name | Interface Name | Operation Description
MA-INF1-OP1 | getGenesBySymbol | MAGeneAnnotationQuery | Returns the genes named by the specified gene symbol or gene alias
MA-INF1-OP2 | getGenesByMicroarrayReporter | MAGeneAnnotationQuery | Returns the genes associated with the specified microarray reporter
MA-INF1-OP3 | getFunctionalAssociations | MAGeneAnnotationQuery | Returns annotations describing a gene’s molecular function
MA-INF1-OP4 | getCellularLocations | MAGeneAnnotationQuery | Returns annotations describing a gene’s location within a cell
… | … | … | …

MA Service PIM: Operation Behavior Description
Operation: getGenesBySymbol. Returns the genes named by the specified gene symbol or gene alias and the gene’s organism.
Behavior Description:
- Client supplies a GeneSearchCriteria instance with a gene symbol or alias and an Organism to search within
- The case of the symbol or alias is ignored
- If the Organism is null then all Organisms are searched
- The system returns the matching Gene object(s), if any
Pre-Conditions: None
Security Pre-Conditions: None
Inputs: GeneSearchCriteria
Outputs: Return: Fully-populated instance(s) of the Gene class
Post-Conditions: None
Exception Conditions: None
Additional Details: None
Notes: None
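The behavior rules for getGenesBySymbol (case-insensitive match on symbol or alias, a null Organism meaning "search all organisms", and an empty result rather than an exception when nothing matches) can be sketched against an in-memory list. This is a Python illustration of the specified behavior, not the service implementation. The field names on `Gene` and `GeneSearchCriteria` and the sample genes are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gene:
    symbol: str
    organism: str
    aliases: List[str] = field(default_factory=list)


@dataclass
class GeneSearchCriteria:
    symbol_or_alias: str
    organism: Optional[str] = None  # None means "search all organisms"


def get_genes_by_symbol(genes: List[Gene], criteria: GeneSearchCriteria) -> List[Gene]:
    """Return genes whose symbol or any alias matches, ignoring case."""
    wanted = criteria.symbol_or_alias.lower()
    return [
        g
        for g in genes
        if (criteria.organism is None or g.organism == criteria.organism)
        and wanted in {g.symbol.lower(), *(a.lower() for a in g.aliases)}
    ]


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    genes = [
        Gene("NAT2", "Human", aliases=["AAC2"]),
        Gene("Nat2", "Mouse"),
        Gene("TP53", "Human"),
    ]
    all_hits = get_genes_by_symbol(genes, GeneSearchCriteria("nat2"))
    human_hits = get_genes_by_symbol(genes, GeneSearchCriteria("aac2", "Human"))
    print([g.organism for g in all_hits], [g.symbol for g in human_hits])
```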

MA Service PIM: Search Criteria (Inputs)

MA Service PSM: Relation to PIM
Platform Independent Model Name and Service Specification | Platform Independent Model and Service Specification Version | Description & Link to the Platform Independent Model and Service Specification
Molecular Annotation Service Platform Independent Model and Service Specification | 0.1.2 | http://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/logical/PIMSS_Molecular_Annotation_Service.doc

MA Service PSM: Information Model Example

MA Service PSM: Service Interface
Implemented Interface No. | Supported Interface Name | Interface Description | Link
MA-INF1 | MAGeneAnnotationQuery | Includes all operations for retrieving gene annotations. | N/A
DS-INF1 | Data Service Query | Contains the CQL query operation | n/cagrid/branches/caGrid-1_3_release/cagrid-1-0/caGrid/projects/data/schema/Data/DataService.wsdl

MA Service: Implementation
1. Leverage new releases of: NCI localized ISO; caCORE SDK; caGrid / Introduce
2. Create new MA database and map to MA PSM
3. Populate MA database with data from caBIO database
4. Generate caCORE-like system from the MA PSM
5. Generate grid data service with Introduce

MA Service Database
- The MA Service database was populated using the Pentaho Data Integration Extract, Transform, Load (ETL) tool.
- Data was transformed from the caBIO database and external data sources and loaded into the MA database.
[Figure: Pentaho ETL Tool and MA Data Workflow]
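The extract, transform, and load steps can be illustrated with a toy pipeline. The MA database was populated with Pentaho Data Integration; the sketch below swaps in Python's standard `sqlite3` module purely to show the shape of an ETL step. The table names, column names, and sample rows are hypothetical and do not reflect the caBIO or MA schemas.

```python
import sqlite3


def make_source():
    """Build a hypothetical in-memory source table standing in for caBIO."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT, taxon TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [(" NAT2 ", "Human"), ("", "Human"), ("TP53", "Human")],
    )
    return conn


def run_etl(source, target):
    """Copy cleaned gene rows from source to target; return rows loaded."""
    # Extract
    rows = source.execute("SELECT symbol, taxon FROM gene").fetchall()
    # Transform: trim whitespace and drop rows without a symbol
    cleaned = [(s.strip(), t.strip()) for s, t in rows if s and s.strip()]
    # Load
    target.execute("CREATE TABLE IF NOT EXISTS ma_gene (symbol TEXT, organism TEXT)")
    target.executemany("INSERT INTO ma_gene VALUES (?, ?)", cleaned)
    target.commit()
    return len(cleaned)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(run_etl(make_source(), target))
```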

MA Service: Deployment Plan

caBIO Resources
- caBIO Home Page
- caBIO Portlet Wiki
- caBIO ECCF Pilot Wiki
- GForge Site
- Download Site
- Technical Guide: http://gforge.nci.nih.gov/docman/view.php/51/18313/caBIO_4.3_TechnicalGuide.pdf
- Subscribe to the caBIO Users Listserv for data refresh and software release announcements

Acknowledgements
- Development: Jim Sun (SAIC), Konrad Rokicki (SAIC), Liqun Qi (SAIC)
- Testing: Matt Tiller (ESAC, Inc.), Quy Phung (ESAC, Inc.), David Li (ESAC, Inc.)
- Documentation: Carolyn Klinger
- Management: Juli Klemm (NCI CBIIT), Sharon Gaheen (SAIC), Avinash Shanbhag (NCICB)
- ECCF Team: Baris Suzek (Georgetown), Brian Davis (3rd Millennium), Elaine Freund (3rd Millennium)
- Systems Support: Norval Johnson (TerpSys), Sriram Kalyanasundaram (TerpSys)