Architectural Constraints on Current Bioinformatics Integration Systems Norman Paton Department of Computer Science University of Manchester Manchester,



Structure of Presentation
Current integration proposals: what they support; what they don't support, and why.
Requirements for integration: what could be useful, and why.
Grid opportunities: relevant Grid technologies; absent Grid technologies.

Current Integration Proposals

Classification
Feature: Values
Data Location: In-situ, Replicated, Reorganised
Integration Model: None, Relational, Semi-Structured, Object-Oriented
Architecture: Thin Client, Client-Server, Multi-Tier
Analysis Support: Function Call, Query, Workflow
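The classification scheme above can be written down as a small data structure. The following is only a sketch: the type and field names are illustrative assumptions, not part of the presentation, and the example instance uses the values the deck itself gives for SRS.

```python
from dataclasses import dataclass
from enum import Enum


class DataLocation(Enum):
    IN_SITU = "In-situ"
    REPLICATED = "Replicated"
    REORGANISED = "Reorganised"


class IntegrationModel(Enum):
    NONE = "None"
    RELATIONAL = "Relational"
    SEMI_STRUCTURED = "Semi-Structured"
    OBJECT_ORIENTED = "Object-Oriented"


class Architecture(Enum):
    THIN_CLIENT = "Thin Client"
    CLIENT_SERVER = "Client-Server"
    MULTI_TIER = "Multi-Tier"


class AnalysisSupport(Enum):
    FUNCTION_CALL = "Function Call"
    QUERY = "Query"
    WORKFLOW = "Workflow"


@dataclass(frozen=True)
class Classification:
    """One system's position along the four features of the scheme."""
    system: str
    data_location: frozenset      # set of DataLocation values
    integration_model: IntegrationModel
    architecture: Architecture
    analysis_support: frozenset   # set of AnalysisSupport values

    def describe(self) -> str:
        locations = ", ".join(sorted(v.value for v in self.data_location))
        analyses = ", ".join(sorted(v.value for v in self.analysis_support))
        return (f"{self.system}: location={locations}; "
                f"model={self.integration_model.value}; "
                f"architecture={self.architecture.value}; "
                f"analysis={analyses}")


# Values taken from the deck's classification of SRS.
srs = Classification(
    system="SRS",
    data_location=frozenset({DataLocation.REPLICATED}),
    integration_model=IntegrationModel.NONE,
    architecture=Architecture.THIN_CLIENT,
    analysis_support=frozenset({AnalysisSupport.FUNCTION_CALL,
                                AnalysisSupport.QUERY}),
)
print(srs.describe())
```

Data location and analysis support are modelled as sets because the deck lists more than one value for some systems; the other two features take a single value in every classification shown.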

SRS Sequence Retrieval System

SRS In Use
Screenshot callouts: List of Databases; Search Interfaces; Selected Databases.

SRS Results
Screenshot callout: Links to Result Records.

Classification of SRS
Feature: Values
Data Location: Replicated
Integration Model: None
Architecture: Thin Client
Analysis Support: Function Call, Query

BioNavigator BioNavigator combines data sources and the tools that act over them. As tools act on specific kinds of data, the interface makes available only tools that are applicable to the data in hand. Online trial from:

Initiating Navigation
Screenshot callouts: Select database; Enter accession number.

Viewing Selected Data
Screenshot callouts: Relevant display options; Navigate to related programs.

Chaining Analyses in Macros
Chained collections of navigations can be saved as macros and restored for later use.
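The idea of saving a chain of navigations as a macro can be sketched as follows. This is not BioNavigator's actual mechanism: the tool names, the placeholder accession number and the JSON storage format are all assumptions made for illustration.

```python
import json


# Hypothetical tools: each takes a record (a dict) and returns a new record.
def fetch_sequence(record):
    # Placeholder data standing in for a real database lookup.
    return {**record, "sequence": "ATGGCC"}


def translate(record):
    # Toy codon table covering only the placeholder sequence above.
    codons = {"ATG": "M", "GCC": "A"}
    seq = record["sequence"]
    protein = "".join(codons[seq[i:i + 3]] for i in range(0, len(seq), 3))
    return {**record, "protein": protein}


# Registry mapping stored step names back to callable tools.
TOOLS = {"fetch_sequence": fetch_sequence, "translate": translate}


def save_macro(step_names):
    """Serialise a chain of tool names so it can be stored."""
    return json.dumps(step_names)


def run_macro(saved, record):
    """Restore a saved chain and apply each tool in order."""
    for name in json.loads(saved):
        record = TOOLS[name](record)
    return record


saved = save_macro(["fetch_sequence", "translate"])
result = run_macro(saved, {"accession": "X00000"})  # hypothetical accession
print(result["protein"])
```

Storing step names rather than the functions themselves keeps the saved macro independent of the running session, which is what makes it restorable later.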

Classification of BioNavigator
Feature: Values
Data Location: Replicated
Integration Model: None
Architecture: Thin Client
Analysis Support: Function Call, Workflow

Current Public Integration Systems
Location: data is replicated, under control.
Integration model: often minimal.
Architecture: often two-tier.
Analysis support: query and analysis access is carefully contained.
Only very careful instantiation of the classification yields sufficiently predictable performance.

GIMS

GIMS: recent experience
Feature: Values
Data Location: Reorganised
Integration Model: Object-Oriented
Architecture: Multi-Tier
Analysis Support: Function Call

Example Analysis
Data:
  Yeast genome sequence.
  Protein-protein interaction data.
  350 transcriptome experiments.
  Overall database ~350Mb.
Analysis:
  Correlate transcription of interacting proteins.
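One way to read "correlate transcription of interacting proteins" is to compute, for each interacting pair, a correlation coefficient between the two genes' expression profiles across experiments. The sketch below assumes Pearson correlation, which the slides do not specify, and uses made-up gene names and expression values rather than the GIMS data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length expression profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical expression levels, one value per transcriptome experiment.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

# Hypothetical protein-protein interaction pairs.
interactions = [("geneA", "geneB"), ("geneA", "geneC")]

correlations = {
    (a, b): pearson(expression[a], expression[b]) for a, b in interactions
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this toy data the first pair rises together and the second pair moves in opposite directions, so the coefficients sit near the two extremes of the range.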

Features of Experience
Challenging to conduct single runs of analyses; must break into bits.
These are modest data sets compared with what is coming.
Environment has been designed with analysis in mind.
These analyses will never make it into the public release!

Requirements for Integration

Location: replication is transparent.
Integration model: standards.
Architecture: flexible, multiple tier.
Analysis support: arbitrary analyses over diverse data sets.
True integration in bioinformatics should not just be data oriented, but involve integration of analyses.

Three Tier Architecture
Clients handle user interaction and presentation.
Application servers perform computation and analysis.
Data servers manage and query databases.
Diagram: Client, Application Server, Data Server.

Three Tier Architecture
Scalability:
  Replace/upgrade components as needed.
  Replace/upgrade layers independently.
Flexibility:
  Application server layer protects clients from changes in database layer.
Classical three tier architectures are configured statically, and are adapted slowly as needs evolve.
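The division of labour between the three tiers, and the way the application server shields clients from changes in the data layer, can be sketched in a few lines. All class names, the `sequences_for` method and the sample records are hypothetical; the two data servers stand for a data tier whose storage is replaced without touching the client.

```python
class ListDataServer:
    """Data tier storing (organism, sequence) pairs in a list."""

    def __init__(self, records):
        self._records = list(records)

    def sequences_for(self, organism):
        return [seq for org, seq in self._records if org == organism]


class DictDataServer:
    """Replacement data tier with a different internal layout."""

    def __init__(self, by_organism):
        self._by_organism = dict(by_organism)

    def sequences_for(self, organism):
        return list(self._by_organism.get(organism, []))


class ApplicationServer:
    """Middle tier: performs the analysis over whatever data tier it is given."""

    def __init__(self, data_server):
        self._data = data_server

    def mean_length(self, organism):
        seqs = self._data.sequences_for(organism)
        return sum(len(s) for s in seqs) / len(seqs) if seqs else 0.0


class Client:
    """Presentation tier: formats results and knows nothing about storage."""

    def __init__(self, app_server):
        self._app = app_server

    def show(self, organism):
        mean = self._app.mean_length(organism)
        return f"{organism}: mean sequence length {mean:.1f}"


# Hypothetical sample data in the two storage layouts.
old_tier = ListDataServer([("yeast", "ATGC"), ("yeast", "ATGCAT"), ("fly", "AT")])
new_tier = DictDataServer({"yeast": ["ATGC", "ATGCAT"], "fly": ["AT"]})

before = Client(ApplicationServer(old_tier)).show("yeast")
after = Client(ApplicationServer(new_tier)).show("yeast")
print(before)
print(after)
```

Note that the wiring here is fixed when the objects are constructed, which mirrors the slide's point that classical three tier architectures are configured statically.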

Grid Opportunities

Necessary and Missing
Necessary: directory services; discovery services; co-allocation; data replication; workload management; accounting and payment.
Missing: databases; data models; heterogeneity resolution; personalisation; Web services; standards.

Dynamic Multi-Tier
Diagram: one Client, several Application Servers, several Data Servers.
Resources need to be identified, selected and scheduled dynamically.
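Identifying, selecting and scheduling resources dynamically can be illustrated with a toy directory. This is a sketch under stated assumptions, not a Grid API: the `Directory` class, the least-loaded selection rule and the fixed job cost are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str             # "application" or "data"
    load: float           # 0.0 idle, 1.0 saturated
    available: bool = True


class Directory:
    """Hypothetical directory service listing known resources."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def discover(self, kind):
        """Identify: return available resources of the requested kind."""
        return [r for r in self._resources if r.kind == kind and r.available]


def select(candidates):
    """Select: pick the least loaded candidate (first one wins ties)."""
    if not candidates:
        raise LookupError("no available resource")
    return min(candidates, key=lambda r: r.load)


def schedule(directory, job_cost=0.25):
    """Schedule: bind a job to one application server and one data server."""
    app = select(directory.discover("application"))
    data = select(directory.discover("data"))
    app.load += job_cost
    data.load += job_cost
    return app.name, data.name


directory = Directory()
directory.register(Resource("app1", "application", 0.5))
directory.register(Resource("app2", "application", 0.25))
directory.register(Resource("app3", "application", 0.0, available=False))
directory.register(Resource("data1", "data", 0.5))
directory.register(Resource("data2", "data", 0.75))

first = schedule(directory)
second = schedule(directory)
print(first, second)
```

Because each scheduling decision updates the recorded load, successive jobs can land on different servers, unlike the statically wired three tier case.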

Grid Classification
Feature: Values
Data Location: In-situ, Replicated
Integration Model: None
Architecture: Multi-Tier
Analysis Support: Function Call, Workflow
The current Grid is not the answer, but the answer subsumes the current facilities of the Grid.

Summary
Current integration facilities in biology:
  Are cunningly restrictive.
  Make the most of limited distributed computational architectures.
The Grid is bringing to the table:
  Resource description facilities.
  Resource scheduling and workflow management facilities.
The Grid does not directly address current needs in biology, but its descendants may.