Download presentation
Presentation is loading. Please wait.
Published byRatna Hartanti Darmali Modified over 6 years ago
1
Cancer Bioinformatics Infrastructure Objects (caBIO)
Architecting the Future of Genomics Himanso Sahni & Scott Gustafson December 7, 2001 Science Applications International Corporation (SAIC)
2
caBIO Team caBIO was developed as “open source” technology by the National Cancer Institute Center for Bioinformatics (NCICB) Software developers teamed with scientists from the NCICB Cancer Genome Anatomy Project (CGAP) to define caBIO requirements and use cases, and model the genomics domain
3
Purpose “To provide an overview of caBIO, its current role in genomics, and future transitions towards intelligent genomics”
4
The Genomic Past Little or no data integration Lack of standards
Little or no software code re-use Duplication of efforts Duplication of data Lack of common vocabulary Maintenance nightmares
5
caBIO ! The Genomic Challenge INTELLIGENCE! Data Integration Standards
Ease of Maintenance caBIO ! Normalized Data Code Re-Use Common Vocabulary Increased Productivity INTELLIGENCE!
6
caBIO Agenda caBIO Overview Development Process Architecture
Data Sources Usage Benefits Future
7
caBIO Overview caBIO is a standards based set of genomic components
caBIO objects simulate the behavior of actual genomic components such as genes, chromosomes, sequences, libraries, clones, ontologies, etc. caBIO objects provide access to a variety of genomic data sources including GenBank, Unigene, LocusLink, Homologene, Ensemble, GoldenPath, and NCICB’s CGAP (Cancer Genome Anatomy Project) data repositories caBIO is “open source”
8
Development Process Iterative Approach
An iterative software design and development approach leveraging concepts from RUP and XP was used Initial Vision and High Level Analysis form the overall goals of the project Iterative functional design and implementation processes allows for rapid and segmented development of the application
9
Development Process High Level Use Cases
Initial high level Use Cases were developed for gene components Additional high level Use Cases were added as additional functional areas are mapped (e.g. pathways, therapies, microarray objects, mouse models) Detailed Use Cases are derived from working with domain experts in requirements gathering.
10
Development Process Detailed Use Cases
Detailed Use Cases were derived from teaming with domain experts during requirements gathering Detailed Use Cases include Characteristic Information Main Success Scenario Extensions Related Information
11
Development Process Class Diagrams
Class Diagrams were developed using the Unified Modeling Language (UML) The class diagram is continually updated as new objects and relationships are defined
12
Development Process Objects
Each object consists of many attributes, operations, and associations For example a gene object has many attributes such as symbols and keywords. A gene also has operations including getName and getSequnces, and associations which indicate that a gene “has” sequences and “has a ” location
13
Development Process Sequence Diagrams
Additional methods are discovered as the objects are further explored modeling the use cases
14
Development Process Object Development
Java Code is written to implement the Object Model Documentation (API) is built from the Java Code
15
Development Process Application Development
Applications leveraging caBIO can be developed using multiple programming languages
16
Development Process Example Application
caBIO was used to add intelligence to BioCarta pathways Gene, pathway, and expression objects were used Users can select a pathway, and view pathway components that are mutated or expressed Users can drill down to detailed gene information
17
Architecture Clients Presentation Layer Object Layer Data Sources
Web Server Servlet Container JSPs HTML/HTTP Databases Browsers Servlets Object Managers Data Access Objects SOAP Engine JDBC XML/HTTP External Java Apps RMI UI Bean HTTP Domain Objects URLs Agents RDF XML Builder Genes Chromosomes XSLT Engine FTP Other Apps Tissues Clusters SOAP Sequences Libraries Flat Files DTDs XML Docs XSL Style Sheet Diseases Internal Java Apps Other
18
Architecture The core infrastructure exhibits an n-tiered architecture with client interfaces, server components, back-end objects and data sources Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP Client applications can also communicate with back-end objects via Java RMI (Java applications) Non-Java based applications will communicate via SOAP Server components communicate with back-end objects via Java RMI Back-end objects communicate directly with data sources (database, URLs, flat files) A UDDI registry will be configured to advertise services RDF is currently used to advertise services to crawlers and agents
19
Architecture Client Technology
Industry Standard Web Browsers - Netscape 4+ and IE 4+ Java Applications – Applications that implement the Java programming language, an object-oriented language which provides portability and many other features Non-Java Applications - Applications usually written in Perl, C, etc. Non-Java applications use SOAP clients to interface with caBIO SOAP::Lite for Perl – A collection of Perl modules which provides a simple and lightweight interface to the Simple Object Access Protocol (SOAP) Agents - Software programs that perform a function for a user in a trusted fashion, can learn or be taught its function, and can perform actions for the user without permission RDF – The Resource Description Framework is a foundation that advertises Web services via the Web. RDF is used to describe the content and services available at a particular Web site. (passive) UDDI – Universal Description, Discovery and Integration is a foundation to enable businesses to discover and transact business with each other using preferred applications (active)
20
Architecture Presentation Layer Technology
Jakarta Tomcat - Servlet+JSP Engine which is a subproject of the Jakarta Project JSPs - Java Server Pages are web pages with Java embedded in the HTML to incorporate dynamic content in the page Java Servlets – Server-side Java programs that web servers can run to generate content in response to client requests Java Beans – Reusable software components that work with Java XML – The Extensible Markup Language is a universal format for structured data on the Web XSL/XSLT – The Extensible Stylesheet Language is a language for expressing stylesheets. XSL Transformations (XSLT) is a language for transforming XML documents XLink – The XML Linking Language allows elements to be inserted into XML documents in order to create and describe links between resources DOM – The Document Object Model is a platform and language independent interface that allows programs to dynamically access content, structure, and document style
21
Architecture Presentation Layer Technology (cont.)
SOAP – The Simple Object Access Protocol is a lightweight XML based protocol for the exchange of information in a decentralized, distributed environment. It consists of an envelope that describes the message and a framework for message transport Apache SOAP – An implementation of the W3C SOAP specification. The Apache SOAP provides a server-side infrastructure for deploying, managing, and running SOAP enabled services.
22
Architecture Presentation Layer
The Mediator Servlet manages the user session, forwards requests to the JSP, and returns HTML JSPs forward requests to the UI Bean, and returns XML or HTML to the client The UI Bean receives the a Domain Object and converts the object to XML or HTML The DOM Writer performs the Conversion to XML while the Conversion Servlet represents the Domain Object as a Document Object (DOM) The conversion is performed by calling the Domain Objects toXML() method HTML/HTTP Mediator Servlet Domain Objects XML/HTTP JSPs RMI RDF UI Bean Manager Proxies SOAP DOM Writer ConversionServlet XML Builder
23
Architecture Object Layer Technology
Java Applications/Objects – Applications/Objects that implement the Java programming language, an object-oriented language which provides portability and many other features Java RMI – Remote Method Invocation is distributed computing in Java. It is a simple technology for object communication that allows remote objects to act as local Java objects JDBC – Java Database Connectivity is a Java API for executing SQL statements and connecting to databases SOAP – Simple Object Access Protocol extends object interoperability to other programming languages (Perl, Python, C++) DAS – The Distributed Annotation System is an emergent system for retrieving genomic annotations from a variety of data sources (GoldenPath, Ensembl)
24
Remote Interface Manager
Architecture Object Layer Object Managers are created to implement complex scientific concepts The Role Model object verifies user permissions to the data if applicable The Remote Manager interface abstracts the RMI layer Allows RMI to be easily replaced by EJB or other communication venues Domain Objects are serialized and XML enabled Object Managers Role Model GeneManager Data Access Objects SequenceManager RMI LibraryManager Remote Interface Manager Other Managers Domain Objects
25
Architecture Data Access Objects Data Access Objects Data Sources
Provides the object relational mapping Database independent persistence Allows for the introduction of new data sources Abstracts persistence details from the domain objects Allows for federated database topology Data Access Objects Data Sources LibraryPersistence ClonePersistence Databases TissuePersistence JDBC SequencePersistence HTTP GenePersistence URLs ProteinPersistence FTP Other Flat Files
26
Data Sources CGAP (NCI Cancer Genome Anatomy Project) GenBank (NCBI)
Provides gene expression profiles of normal, pre-cancer, and cancer cells GenBank (NCBI) Sequence submission software Unigene (NCBI) Partitions sequences into unique gene-oriented clusters LocusLink (NCBI) Provides an interface to curated sequence and descriptive info about genetic loci
27
Data Sources Homologene (NCBI) GoldenPath (UCSC)
Provides curated orthologs and homologs for UniGene and LocusLink genes GoldenPath (UCSC) Contains a working draft of the human genome
28
Usage Java developers use the caBIO jar file for immediate access to genomic object attributes and methods caBIO objects can easily be extended to support customized objects Non-Java and Java developers can use SOAP to access genomic object attributes and methods over HTTP Developers can obtain detailed descriptions of available services via the RDF site URIs of desired services can be obtained via RDF Developers can host a caBIO server or leverage the existing NCICB caBIO server
29
Benefits Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details Permits access to allow developers to obtain the information they need from a variety of data sources without data management Manages the display of large volumes of data to assist in load balancing Provides an effective mechanism for comparing similar objects that rely on diverse data sources Facilitates information sharing without managing linkages between multiple data sources
30
Future The Semantic Web Natural Language Processing (NLP) Concept
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila (W3C) Implementation Agent Based Technology may be used to create programs that implement semantic web concepts Natural Language Processing (NLP) NLP involves understanding natural language and extracting information NLP can be used as a tool to help automate tasks, refine searches and improve genomic algorithms
31
Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.