A Database Platform for Bioinformatics


A Database Platform for Bioinformatics
Sandeepan Banerjee, Oracle Corporation
The author describes what a bioinformatics database needs and presents four areas of Oracle technology aimed at those needs: (1) an extensibility architecture to store gene sequence data natively and perform high-dimensional structure searches in the database; (2) warehousing technologies and data mining on genetic patterns; (3) data integration technologies to enable heterogeneous queries across distributed biological sources; (4) internet portal technologies that allow life sciences information to be published and managed across intranets and the internet.

Background
Massive storage is needed for the ever-growing volume of genomic and proteomic data. A high-performance computing platform is needed to search that data, identify similarities and patterns within genomic data, and unify the separately developed slices of knowledge. The atlas of the human genome promises to revolutionize medical practice and biological research all over the world for the next millennium.

Steps in genomic projects
(1) Divide the chromosomes into smaller fragments that can be isolated. (2) Order these fragments to correspond to their respective locations on the chromosomes. (3) Determine the sequence of bases (A, T, C and G) in each fragment. (4) Annotate the regions of the sequenced chromosomes with their function. (5) Catalogue the differences in sequences. The goal is a series of descriptive diagrams (maps) of each human chromosome at increasingly fine resolution.

Computing for cataloguing
Any two individuals differ in about 1/1000 of their genetic material, i.e. about 3 million base pairs. The global population is now about 6 billion, so a full catalogue of all sequence differences would run to 18 × 10^15 entries. The computing requirement is correspondingly huge.
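
The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the genome size of 3 billion base pairs is an assumption implied by the "1/1000 is about 3 million" figure rather than stated on the slide, and the function name is made up for illustration.

```python
# Back-of-the-envelope size of a full catalogue of sequence differences,
# using the round figures quoted on the slide.

GENOME_BASE_PAIRS = 3_000_000_000  # assumed genome size (implied, not stated)
DIFFERENCE_ONE_IN = 1000           # two individuals differ in about 1 base per 1000
POPULATION = 6_000_000_000         # global population quoted on the slide


def catalogue_entries(genome_bp: int, one_in: int, population: int) -> int:
    """Differences per individual multiplied by the number of individuals."""
    differences_per_person = genome_bp // one_in
    return differences_per_person * population


if __name__ == "__main__":
    total = catalogue_entries(GENOME_BASE_PAIRS, DIFFERENCE_ONE_IN, POPULATION)
    print(f"{total:.1e} entries")
```

The product is 3 × 10^6 differences per person times 6 × 10^9 people, which matches the 18 × 10^15 quoted above.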

Traditional Database
Traditional databases mainly manage simple business data such as numbers, characters or dates. Few have had a native ability to deal with complex data, for example performing similarity queries on gene sequences, spatial queries on locations, or looks-like queries on images. High-dimensional data is hard to handle. Example of a query on structural similarity: given a particular sequence, what other sequences resembling this sequence exist in the database?

BLAST
BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs that detect relationships between sequences and rank them statistically. (Not the "blast" virus we got a couple of weeks ago.) Drawbacks: BLAST servers are hard to handle because they are too complex, too large and far too custom-built. Performance degrades as interactions with the database increase. Query optimization is not easy, because BLAST servers do not know about textual annotations. BLAST and the database are hard to manage together as one system.
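
To make "local alignment" concrete, here is a minimal sketch of the Smith-Waterman scoring recurrence, the exact dynamic-programming method that BLAST approximates with faster heuristics. It is not BLAST itself: the scoring values and the function names are assumptions chosen for illustration, and results are ranked by raw score only, without the statistical significance estimates BLAST reports.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)          # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query: str, database: dict) -> list:
    """Score every stored sequence against the query, best score first."""
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    db = {"s1": "TTGCCATAGG", "s2": "AAAAAAAAAA", "s3": "TTGCCATTGG"}
    for score, name in rank_by_similarity("GCCATA", db):
        print(name, score)
```

Every query scans every stored sequence here, which is the kind of cost that motivates indexing and optimizer support inside the database.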

Four technologies needed
The author argues that it is better to add these functions within the database, so four technologies are needed. Extensible database architecture: to store gene sequence data natively and perform high-dimensional structure searches within the database. Data mining and data warehousing: to warehouse and mine genetic patterns, given the huge size of the data. Data integration technologies: to enable heterogeneous queries across distributed biological sources. Internet portal technologies: to allow life sciences information to be published and managed over the internet.

Extending Databases
User-defined types; user-defined operators; domain-specific indexing; optimizer extensibility.

User-defined Types
The Oracle Type System (OTS) provides a high-level SQL-based interface for defining types. The two central constructs in OTS are object types, whose structure is fully known to the database, and opaque types, whose structure is not known to the database. For example, a type can combine a number and a date and give them semantic behaviour. Type methods or functions are written in a 3GL such as C. Learning curve? The benefit is that an external data model and its behaviour can be used to store and manipulate the sequences.

User-defined operators
Define domain-specific operators, such as a resembles() operator for comparing sequences. They can be invoked anywhere built-in operators can be used, for example in a SELECT command: SELECT ID FROM DNATABLE WHERE Contains(fragment, 'GCCATA');
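
The idea of calling a domain-specific operator from ordinary SQL can be tried out without an Oracle installation. The sketch below is an analogy only: it uses SQLite's user-defined functions through Python's standard sqlite3 module, not Oracle's operator framework. The table contents, the positional-identity rule behind resembles(), and its third threshold argument are all assumptions made for illustration.

```python
import sqlite3


def contains(fragment: str, pattern: str) -> int:
    """1 if the pattern occurs anywhere in the fragment, else 0."""
    return 1 if pattern in fragment else 0


def resembles(a: str, b: str, min_identity: float) -> int:
    """1 if the share of identical positions reaches min_identity (toy rule)."""
    if not a or not b:
        return 0
    matches = sum(x == y for x, y in zip(a, b))
    return 1 if matches / max(len(a), len(b)) >= min_identity else 0


def open_demo_db() -> sqlite3.Connection:
    """In-memory table with both functions registered for use in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("contains", 2, contains)
    conn.create_function("resembles", 3, resembles)
    conn.execute("CREATE TABLE dnatable (id INTEGER PRIMARY KEY, fragment TEXT)")
    conn.executemany(
        "INSERT INTO dnatable (id, fragment) VALUES (?, ?)",
        [(1, "TTGCCATAGG"), (2, "AAAAAAAAAA"), (3, "TTGCCATTGG")],
    )
    return conn


if __name__ == "__main__":
    conn = open_demo_db()
    rows = conn.execute(
        "SELECT id FROM dnatable WHERE contains(fragment, 'GCCATA') ORDER BY id"
    ).fetchall()
    print("contains:", [r[0] for r in rows])
    rows = conn.execute(
        "SELECT id FROM dnatable "
        "WHERE resembles(fragment, 'TTGCCATAGG', 0.8) ORDER BY id"
    ).fetchall()
    print("resembles:", [r[0] for r in rows])
```

Once registered, the functions appear in a WHERE clause exactly like built-in ones, which is the property the slide highlights.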

Extensible Indexing
The framework for developing new index types is based on the concept of cooperative indexing: a user-supplied implementation and the Oracle server cooperate to build and maintain indexes for complex types such as genetic, text or spatial data. The user implements an indextype. Is it efficient to build the index? The implementer must be a professional programmer first.
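
As a rough picture of what a domain-specific index for sequence data might do, here is a small k-mer inverted index. It is a sketch under stated assumptions, not Oracle's indextype interface: the class name, its methods and the choice of k are hypothetical. The add method stands in for the maintenance routine a user-supplied implementation would provide, and search for the scan routine the server would call.

```python
from collections import defaultdict


class KmerIndex:
    """Toy inverted index from k-mers to the ids of sequences containing them."""

    def __init__(self, k: int = 3):
        self.k = k
        self._postings = defaultdict(set)   # k-mer -> set of sequence ids
        self._sequences = {}                # id -> full sequence

    def add(self, seq_id: int, sequence: str) -> None:
        """Index maintenance: record every k-mer of a new sequence."""
        self._sequences[seq_id] = sequence
        for i in range(len(sequence) - self.k + 1):
            self._postings[sequence[i:i + self.k]].add(seq_id)

    def candidates(self, pattern: str) -> set:
        """Ids of sequences that contain every k-mer of the pattern."""
        if len(pattern) < self.k:
            return set(self._sequences)     # index cannot help; consider all
        result = None
        for i in range(len(pattern) - self.k + 1):
            ids = self._postings.get(pattern[i:i + self.k], set())
            result = set(ids) if result is None else result & ids
            if not result:
                break
        return result

    def search(self, pattern: str) -> list:
        """Index scan: filter candidates, then verify against the real data."""
        return sorted(i for i in self.candidates(pattern)
                      if pattern in self._sequences[i])


if __name__ == "__main__":
    index = KmerIndex(k=3)
    index.add(1, "TTGCCATAGG")
    index.add(2, "AAAAAAAAAA")
    index.add(3, "TTGCCATTGG")
    print(index.search("GCCATA"))
```

The candidate step narrows the set cheaply and the verification step removes false positives, so only a few sequences need a full comparison.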

Extensible Optimizer
Gives developers control over the three main inputs used by the optimizer: statistics, selectivity, and cost. The user also has to be a database expert to implement the optimizer functions for statistics, selectivity and cost efficiently.
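
How the three inputs fit together can be sketched with a toy cost model for a "fragment contains pattern" predicate. Everything here is hypothetical: the statistics kept, the uniform-random-bases assumption behind the selectivity estimate, and the cost constants are invented for illustration and do not describe Oracle's optimizer.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Statistics a user-supplied routine might collect for a sequence column."""
    row_count: int
    avg_fragment_length: int


def estimate_selectivity(pattern: str, stats: TableStats,
                         alphabet_size: int = 4) -> float:
    """Estimated fraction of rows containing the pattern.

    Assumes uniformly random bases and uses a simple upper bound:
    (number of start positions) * (probability of a match at one position).
    """
    positions = max(stats.avg_fragment_length - len(pattern) + 1, 0)
    p_at_position = alphabet_size ** -len(pattern)
    return min(1.0, positions * p_at_position)


def full_scan_cost(stats: TableStats, per_row_cost: float = 1.0) -> float:
    """Cost of comparing the pattern against every row."""
    return stats.row_count * per_row_cost


def index_scan_cost(stats: TableStats, selectivity: float,
                    lookup_cost: float = 5.0,
                    per_match_cost: float = 2.0) -> float:
    """Cost of one index lookup plus fetching the expected matching rows."""
    return lookup_cost + stats.row_count * selectivity * per_match_cost


def choose_plan(pattern: str, stats: TableStats) -> str:
    """Pick the cheaper access path for the given pattern."""
    selectivity = estimate_selectivity(pattern, stats)
    if index_scan_cost(stats, selectivity) < full_scan_cost(stats):
        return "index"
    return "full_scan"


if __name__ == "__main__":
    stats = TableStats(row_count=1_000_000, avg_fragment_length=100)
    print(choose_plan("GCCATAGCCATA", stats))
    print(choose_plan("A", stats))
```

A long, rare pattern favours the index, while a pattern that matches almost every row makes the full scan cheaper. This is why the optimizer needs domain knowledge about selectivity.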

Mining Sequence Data
Oracle Darwin for bioinformatics. Darwin was built to address the terabyte-scale databases found in genomics. It provides classification and regression trees, neural networks, k-nearest neighbours, k-means, Naïve Bayes and enhanced clustering algorithms. Darwin is based on a distributed-memory SPMD (single program, multiple data) paradigm, in which the nodes communicate with each other through a message-passing library. StarData is a unified data access and manipulation library that provides most of the data access and transformation infrastructure supporting the machine learning modules StarTree, … The author also mentioned that mining of sequence data is still in its infancy.
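
Of the algorithms listed, k-nearest neighbours is the simplest to illustrate. The sketch below is a single-process toy, not Darwin and not distributed: sequences are turned into 2-mer count vectors and a query is labelled by majority vote among its nearest labelled neighbours. The feature choice, the distance measure and the example labels are assumptions for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(sequence: str, k: int = 2) -> list:
    """Count vector over all possible k-mers of the DNA alphabet."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def euclidean(u: list, v: list) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def knn_classify(query: str, labelled: list,
                 k_neighbours: int = 3, k: int = 2) -> str:
    """Label the query by majority vote among its nearest labelled sequences."""
    query_vec = kmer_vector(query, k)
    nearest = sorted(
        labelled,
        key=lambda item: euclidean(query_vec, kmer_vector(item[0], k)),
    )[:k_neighbours]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ("ATATATATAT", "AT-rich"), ("TATATATATA", "AT-rich"),
        ("AATTAATTAA", "AT-rich"), ("GCGCGCGCGC", "GC-rich"),
        ("CGCGCGCGCG", "GC-rich"), ("GGCCGGCCGG", "GC-rich"),
    ]
    print(knn_classify("ATATTATATA", training))
    print(knn_classify("GCGCCGCGCG", training))
```

In an SPMD setting each node would run this same program on its own slice of the labelled data and exchange partial results by message passing. That part is not shown here.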

Integrating Heterogeneous Data
Sequence data will be distributed across institutes, and annotations to this data make it change and grow all the time. Oracle uses ODBC and OLE DB to connect to non-Oracle database systems for query, search, insert and delete operations.

Portal Technologies
'Soft goods' sales; visualization; security and access control. Oracle provides dynamic HTML and XML through Java servlets. Portals related to genomics and bioinformatics have some further requirements. Soft goods: portals use measures such as the number and complexity of queries against a sequence database to meter usage, and Oracle provides a wide array of server-based features to enable soft goods transactions over the internet. How? Visualization: Oracle enables the publishing of graphical data in formats such as the Vector Markup Language (VML), which can be used to display sequences. VML is generated from XML through XSL. Security: Oracle provides comprehensive PKI-based (Public Key Infrastructure) security to protect not only the results of queries but the queries themselves.

Questions?
What is the benefit of Oracle compared with BLAST? Are there any other technologies required for a bioinformatics database platform? Is there anything Darwin cannot do for data mining in a bioinformatics database?