Bio68: Bioinformatics Databases

Bio68: Bioinformatics Databases, 05/15/01
Bioinformatics Databases: Fundamentals of Database Technology & Data Organization
Kristen Chambers, Director of Bioinformatics, Dartmouth Medical School
BioInformatics @ Dartmouth Medical School

How can data be organized?
- Paper (e.g., in notebooks)
- Flat files
  - Collection of data records
  - Minimal structure, no metadata
  - Application program must contain relationship information
- Database
  - Hierarchical
  - Network
  - Relational
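
The flat-file case can be illustrated with a minimal Python sketch. The file contents, the `|` delimiter, and the field meanings are assumptions made up for illustration; the point is only that the file itself says nothing about its structure, so the program has to.

```python
# A flat file carries no metadata: nothing in the text says what each field means.
# The contents and the "|" delimiter are hypothetical.
flat_file = "seq1|900\nseq2|1500\n"

# The application must hard-code the structure:
# field 0 is a sequence name, field 1 is a length in bases.
records = [line.split("|") for line in flat_file.splitlines()]
parsed = [(name, int(length)) for name, length in records]

print(parsed)
```

If the file layout changes, every program that reads it has to change too, which is the weakness the database approaches address.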

What is a relational database?
- A database composed of relations and conforming to a set of principles governing how such relations are supposed to behave (“Codd’s 12 Rules”).
- There are many database systems that use tables but don’t conform to all of the principles. These are often called “semirelational” systems.
(from Understanding SQL, Martin Gruber)

Practically speaking...
- A database is a body of information stored in two dimensions (rows and columns)
- Rows are records
- Columns are attributes of those record entities
- The groups of rows and columns, or tables, are largely independent of each other
- The power of the database lies in the relationships that you construct among the tables
- A database is self-describing: it contains metadata, which is a description of its own structure
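
As a rough illustration of rows, columns, and self-description, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is not one of the systems named in these slides, and the table contents are invented; `PRAGMA table_info` is SQLite's own way of exposing column metadata.

```python
import sqlite3

# In-memory database; the table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (sname TEXT, length INTEGER)")
conn.execute("INSERT INTO Sequence VALUES ('seqA', 1200)")

# Rows are records; columns are attributes of each record.
rows = conn.execute("SELECT sname, length FROM Sequence").fetchall()

# The database describes its own structure: table_info yields
# (cid, name, type, notnull, default, pk) for every column.
columns = [(name, ctype)
           for _cid, name, ctype, *_rest in conn.execute("PRAGMA table_info(Sequence)")]

print(rows)
print(columns)
```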

What is a Database Management System (DBMS)?
- A set of programs that define, administer and process databases and their associated applications
- A scalable DBMS can run on multiple platforms (varying sizes)
- A DBMS that supports interoperability uses an industry-standard language and standard ways of exchanging data
- Examples: Oracle, Sybase, 4D, MS Access …

Features of a Relational Database
- Rows (records) are in no particular order
- Columns (fields) are ordered, numbered and named; names should indicate the content of the field
- Primary key uniquely identifies each row: it ensures that no row is empty and that every row is different from every other row
- Two-step commit process
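
The primary-key guarantee can be sketched with Python's `sqlite3` module. This is an illustrative setup, not the course's system: the `KnownGenes` rows are invented, and `NOT NULL` is spelled out because SQLite, unlike most systems, otherwise permits NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# gname is the primary key; NOT NULL is explicit because of SQLite's quirk.
conn.execute("CREATE TABLE KnownGenes (gname TEXT PRIMARY KEY NOT NULL, length INTEGER)")
conn.execute("INSERT INTO KnownGenes VALUES ('geneA', 900)")

def try_insert(gname, length):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO KnownGenes VALUES (?, ?)", (gname, length))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

duplicate_result = try_insert("geneA", 450)   # same key as an existing row
missing_key_result = try_insert(None, 450)    # row with no key at all
new_row_result = try_insert("geneB", 450)     # distinct key

row_count = conn.execute("SELECT COUNT(*) FROM KnownGenes").fetchone()[0]
print(duplicate_result, missing_key_result, new_row_result, row_count)
```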

Features of a Relational Database
- A view is a subset of the database that an application (or user) can process
- The database schema is the structure of the entire database
- A constraint is a condition you apply to an attribute of a table
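
A minimal sketch of these three terms, again using Python's `sqlite3` module as a stand-in system. The `LongGenes` view, the `length > 0` constraint, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Constraint: a condition applied to an attribute (length must be positive).
conn.execute(
    "CREATE TABLE KnownGenes (gname TEXT PRIMARY KEY, length INTEGER CHECK (length > 0))"
)
conn.executemany("INSERT INTO KnownGenes VALUES (?, ?)",
                 [("geneA", 900), ("geneB", 4500)])

# View: a named subset of the database that a user or application can query.
conn.execute(
    "CREATE VIEW LongGenes AS SELECT gname, length FROM KnownGenes WHERE length > 1000"
)
long_genes = conn.execute("SELECT gname FROM LongGenes").fetchall()

# The constraint rejects a row that violates the condition.
try:
    conn.execute("INSERT INTO KnownGenes VALUES ('geneC', -5)")
    constraint_held = False
except sqlite3.IntegrityError:
    constraint_held = True

# Schema: the structure of the whole database, listed from SQLite's catalog.
schema_objects = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

print(long_genes, constraint_held, schema_objects)
```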

Relationships between tables
- One-to-One, Many-to-One, Many-to-Many
- A “join” is an operation that combines data from multiple tables into a single result table
- An E-R (entity-relationship) diagram is the basic graphic used to describe the structure of a database
- Example join:
    SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
    FROM Sequence, KnownGenes
    WHERE KnownGenes.length = Sequence.length
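
The join from this slide can be run end to end with Python's `sqlite3` module. The table and column names come from the slide's query; the sample rows are invented, and the `ORDER BY` clause is an addition so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (sname TEXT, length INTEGER)")
conn.execute("CREATE TABLE KnownGenes (gname TEXT, length INTEGER)")

# Hypothetical sample data.
conn.executemany("INSERT INTO Sequence VALUES (?, ?)",
                 [("seq1", 900), ("seq2", 1500), ("seq3", 2100)])
conn.executemany("INSERT INTO KnownGenes VALUES (?, ?)",
                 [("geneA", 900), ("geneB", 2100), ("geneC", 3000)])

# The slide's join, plus ORDER BY (not in the original) for a stable result.
query = """
    SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
    FROM Sequence, KnownGenes
    WHERE KnownGenes.length = Sequence.length
    ORDER BY Sequence.sname
"""
matches = conn.execute(query).fetchall()
print(matches)
```

Only the sequences whose length equals that of a known gene appear in the result table; `seq2` and `geneC` have no partner and are dropped.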

E-R Diagram

The tool for communicating with relational databases: SQL
- Structured Query Language (SQL)
- A query is a question you ask the database, and SQL retrieves the appropriate answer set
- Interactive SQL (command line) vs. RAD (rapid application development) tool
- Standardization issue: ANSI (American National Standards Institute)

Data Types
- Types of data indicate functions that are possible between related fields
- Each field is assigned one data type (imposes structure on data)
- Examples: text (CHAR, VARCHAR), number (INT, DEC), date, time, money, binary
- Standardization issue: ANSI (American National Standards Institute)

A word about database design:
- Designing a database is not trivial
- The value is not in the data, but in the structure
- Design to facilitate the retrieval and interpretation of the data

Example: BioInformatics Core Technology
- Reusable ‘core’ modules, with customizable components
- Standard business logic framework controls transactions (middle layer)
- Metadata-based back-end data storage (facilitates data sharing)

BioInformatics Core Technology

Life science has become a field that generates an enormous amount of un-integrated data.
How can methods for data organization help to solve this problem?

What is Data Integration?
- Creating a system that allows the extraction of a piece or set of information (query result) across multiple domains (possibly disparate data sources: flat files, databases, spreadsheets, URLs...)

Sample integration problem: Cancer Biomarker Discovery
- Clinical center collects blood samples from 1000 individuals with colon cancer
- Expression analysis reveals that protein ‘x’ is over-expressed in these samples, relative to controls
- Could this be a colon cancer biomarker?

Understanding transcription factors for protein ‘x’ production
- Show me all genes in the public literature that are putatively related to protein ‘x’, have more than 4-fold expression differential between affected and normal tissue, and are homologous to known transcription factors.
- Q1: Find homologs (SEQUENCE)
- Q2: Find genes with 4-fold differential (EXPRESSION)
- Q3: Show me genes in public literature (LITERATURE)
- Result: (Q1 ∩ Q2 ∩ Q3)
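
One way to picture the combined query is as an intersection of three result sets, assuming (as the "and" wording of the request suggests) that a gene must satisfy all three sub-queries. The gene identifiers below are hypothetical placeholders, not real results.

```python
# Hypothetical result sets, one per data domain.
q1_homologs = {"geneA", "geneB", "geneC"}        # SEQUENCE: homologous to known transcription factors
q2_differential = {"geneB", "geneC", "geneD"}    # EXPRESSION: more than 4-fold differential
q3_literature = {"geneB", "geneC", "geneE"}      # LITERATURE: putatively related to protein 'x'

# A gene is a candidate only if it appears in all three answers.
candidates = q1_homologs & q2_differential & q3_literature
print(sorted(candidates))
```

The hard part in practice is not the intersection itself but getting three separate sources to agree on what identifies a gene.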

Key components to integration
- Accessing without modifying original data sources
- Handling redundant, conflicting, missing and changing (versioned) data
- Normalizing analytical data from different data sources
- Conforming terminology to industry standards
- Accessing the integrated data as a single repository
- Including metadata in repository

Approaches to Integration: where are the key issues addressed?
- Federated database (poses constraints on original data sources; fragility in reliance on source systems)
- Data warehousing (ETL layer, original data sources untouched, requires understanding of domain, sophisticated update/archive processes)
- Integrating data source profiles
- Indexed Flat Files
- Others…
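
The data-warehousing approach can be sketched as a tiny extract-transform-load step in Python. Everything here is an assumption for illustration: the two "labs", their field names, the log2 convention, and the warehouse table layout. The sketch shows the sources being read but never modified, and their differing terminology and units being normalized before loading.

```python
import sqlite3

# Extract: two hypothetical sources with different field names and units.
source_lab_a = [{"gene": "geneA", "fold_change": 4.5}]
source_lab_b = [{"GeneSymbol": "geneB", "log2_ratio": 3.0}]

# Transform: map each source onto one common layout (gene, fold_change, source).
def transform_a(rec):
    return (rec["gene"], rec["fold_change"], "lab_a")

def transform_b(rec):
    # Lab B reports log2 ratios; convert back to a plain fold change.
    return (rec["GeneSymbol"], 2.0 ** rec["log2_ratio"], "lab_b")

# Load: write the normalized rows into a single repository.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE expression (gene TEXT, fold_change REAL, source TEXT)")
warehouse.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [transform_a(r) for r in source_lab_a] + [transform_b(r) for r in source_lab_b],
)

# One query now spans both sources.
over_4_fold = warehouse.execute(
    "SELECT gene, fold_change, source FROM expression WHERE fold_change > 4 ORDER BY gene"
).fetchall()
print(over_4_fold)
```

The transform functions are where the "required understanding of domain" shows up: someone has to know that lab B's numbers are logarithms.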

Data Warehousing

Metadata: one key to success
- Describes data types, relationships, histories, etc.
- Back-end (supports developers), front-end (supports users and application)
- Data value: 55
- Metadata values:
  - Data element name: vehicle speed
  - Unit: miles per hour
  - Description: the average velocity of a vehicle
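
The slide's example can be written as a small Python sketch in which the value travels together with its metadata. The `DataElement` class and its `describe` method are hypothetical names introduced here; the field contents are the ones on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """A data value bundled with the metadata needed to interpret it."""
    value: float
    name: str
    unit: str
    description: str

    def describe(self) -> str:
        return f"{self.name}: {self.value} {self.unit} ({self.description})"

# On its own, 55 means nothing; the metadata turns it into information.
speed = DataElement(
    value=55,
    name="vehicle speed",
    unit="miles per hour",
    description="the average velocity of a vehicle",
)
print(speed.describe())
```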

Standards: the final frontier
- Naming conventions
- Standard coordinate systems
- Unify interpretations of single object types
- Unify software solutions to the same problem (also data formats)
- Standards for metadata (incompatible or missing metadata)

Developing Standards for Life Sciences Research
- Discovery science does not lend itself well to constraints (especially system constraints)
- Decentralized data management infrastructure, competition
- Wildly varying skill levels for data and information management
- Several groups (Bio-Ontologies, HGNC, OMG, etc.) and national research initiatives (EDRN, caBIG, etc.) are taking the lead in the effort to create ‘workable’ standards.

New approach to integration: Cancer Biomarker Discovery
- Network of distributed data ‘silos’ (does not perturb data sources)
- Centralized query and ‘business logic’ servers, accessed through web interface
- CORBA framework ‘manages’ XML profile definitions across the web
- A profile is a set of resource definitions implemented in XML for data sources residing in one or more distributed systems
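
To make the idea of a profile concrete, here is a minimal Python sketch that reads resource definitions from an XML document. The slides do not show the actual profile format, so the element names, attributes, and site labels below are entirely hypothetical, and the CORBA layer is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: element and attribute names are invented for illustration.
PROFILE_XML = """
<profile id="example-profile">
  <resource name="expression_db" type="database" location="site-a"/>
  <resource name="sequence_files" type="flat_file" location="site-b"/>
</profile>
"""

root = ET.fromstring(PROFILE_XML.strip())

# Each resource definition describes one data source in some distributed system.
resources = [
    (r.get("name"), r.get("type"), r.get("location"))
    for r in root.findall("resource")
]
print(root.get("id"), resources)
```

A central query server could use such definitions to decide which silo to ask for which piece of data, without the silos themselves being changed.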