Bioinformatics Databases: Fundamental Concepts of Database Technology & Data Organization
Kristen Anton, Director of BioInformatics, Dartmouth Medical School

How can data be organized?
- Paper (e.g., in notebooks)
- Flat files
  - Collection of data records
  - Minimal structure, no metadata
  - Application program must contain relationship information
- Database
  - Hierarchical
  - Network
  - Relational
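
To make the last flat-file point concrete, here is a minimal Python sketch with two hypothetical tab-delimited files. The file contents, gene names and the function `genes_with_expression` are invented for illustration. Nothing in either file says which field is which or how the two files relate, so the program has to hard-code that knowledge.

```python
import csv
import io

# Hypothetical flat files: tab-delimited records with no schema or metadata.
GENES_FILE = "geneA\tchr1\ngeneB\tchr2\ngeneC\tchr5\n"
EXPRESSION_FILE = "geneA\t2.5\ngeneC\t4.1\n"

def read_records(text):
    """Split tab-delimited text into lists of strings (the file carries no types)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

def genes_with_expression(genes_text, expr_text):
    # Relationship knowledge lives in the program, not in the files:
    # field 0 is a gene name in both files; field 1 is a chromosome in one
    # and a fold change (to be parsed as a number) in the other.
    fold_by_gene = {name: float(fold) for name, fold in read_records(expr_text)}
    return [
        (name, chrom, fold_by_gene[name])
        for name, chrom in read_records(genes_text)
        if name in fold_by_gene
    ]

print(genes_with_expression(GENES_FILE, EXPRESSION_FILE))
# [('geneA', 'chr1', 2.5), ('geneC', 'chr5', 4.1)]
```

If either file's layout changes, every program that reads it must change too. A database moves this knowledge into the stored schema.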

What is a relational database?
A database composed of relations and conforming to a set of principles governing how such relations are supposed to behave (“Codd’s 12 Rules”). Many database systems use tables but don’t conform to all of these principles; these are often called “semirelational” systems.
(from Understanding SQL, Martin Gruber)

Practically speaking...
- A database is a body of information stored in two dimensions (rows and columns)
  - Rows are records
  - Columns are attributes of those record entities (usually!)
- The groups of rows and columns, or tables, are largely independent of each other
- The power of the database lies in the relationships that you construct among the tables
- A database is self-describing: it contains metadata, which is a description of its own structure
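
A minimal sketch of these ideas, using SQLite through Python's built-in `sqlite3` module as a stand-in DBMS. The table name `KnownGenes` and its columns come from the join example later in these slides. The rows are hypothetical. `sqlite_master` and `PRAGMA table_info` are SQLite's own ways of exposing the stored structure; other systems typically offer `INFORMATION_SCHEMA` views instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table is two-dimensional: each row is a record, each column an attribute.
conn.execute("""
    CREATE TABLE KnownGenes (
        gname  TEXT NOT NULL PRIMARY KEY,
        length INTEGER
    )
""")
conn.executemany(
    "INSERT INTO KnownGenes (gname, length) VALUES (?, ?)",
    [("geneA", 1200), ("geneB", 850)],  # hypothetical rows
)

# Self-describing: the structure itself can be queried like data.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(KnownGenes)")]

print(tables)   # ['KnownGenes']
print(columns)  # [('gname', 'TEXT'), ('length', 'INTEGER')]
```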

What is a Database Management System (DBMS)?
- A set of programs that define, administer and process databases and their associated applications
- A scalable DBMS can run on multiple platforms of varying sizes
- A DBMS that supports interoperability uses an industry-standard language and standard ways of exchanging data
- Examples: Oracle, Sybase, 4D, MS Access …

Features of a Relational Database
- Rows (records) are in no particular order
- Columns (fields) are ordered, numbered and named; names should indicate the content of the field
- A primary key uniquely identifies each row: it ensures that no row lacks a key value and that every row is different from every other row
- Two-step commit process
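
The sketch below, again using SQLite via Python's `sqlite3`, shows a primary key rejecting a duplicate row. It assumes that the “two-step commit process” means that changes are first made inside a transaction and become permanent only when committed (or are discarded by a rollback). The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike most DBMSs, tolerates NULL
# in a non-INTEGER PRIMARY KEY column of an ordinary table.
conn.execute("CREATE TABLE Sequence (sname TEXT NOT NULL PRIMARY KEY, length INTEGER)")

conn.execute("INSERT INTO Sequence VALUES ('seq1', 500)")  # step one: make the change
conn.commit()                                               # step two: make it permanent

try:
    conn.execute("INSERT INTO Sequence VALUES ('seq1', 700)")  # duplicate key
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# A change that is never committed can be rolled back.
conn.execute("INSERT INTO Sequence VALUES ('seq2', 300)")
conn.rollback()

rows = conn.execute("SELECT sname, length FROM Sequence").fetchall()
print(rows)  # [('seq1', 500)]
```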

Features of a Relational Database
- A view is a subset of the database that an application (or user) can process
- The database schema is the structure of the entire database
- A constraint is a condition you apply to an attribute of a table
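
A short SQLite sketch of all three terms. The `CREATE TABLE` statement is the (one-table) schema, the `CHECK` clause is a constraint on the `length` attribute, and `LongGenes` is a view. The view name, the 1000 threshold and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the structure of the whole database (here a single table).
# Constraint: a CHECK condition on the length attribute.
conn.execute("""
    CREATE TABLE KnownGenes (
        gname  TEXT NOT NULL PRIMARY KEY,
        length INTEGER CHECK (length > 0)
    )
""")
conn.executemany("INSERT INTO KnownGenes VALUES (?, ?)",
                 [("geneA", 1200), ("geneB", 850), ("geneC", 4300)])

# View: a named subset of the database that can be queried like a table.
conn.execute("CREATE VIEW LongGenes AS SELECT gname FROM KnownGenes WHERE length > 1000")
long_genes = [r[0] for r in conn.execute("SELECT gname FROM LongGenes ORDER BY gname")]
print(long_genes)  # ['geneA', 'geneC']

try:
    conn.execute("INSERT INTO KnownGenes VALUES ('geneD', -5)")
except sqlite3.IntegrityError:
    print("constraint violated: length must be positive")
```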

Relationships between tables
- One-to-One, Many-to-One, Many-to-Many
- A “join” is an operation that combines data from multiple tables into a single result table
- An E-R (entity-relationship) diagram is the basic graphic used to describe the structure of a database

SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
FROM Sequence, KnownGenes
WHERE KnownGenes.length = Sequence.length
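
The slide's query runs as-is in SQLite. In the sketch below, the table and column names are the slide's. The rows are hypothetical, and an `ORDER BY` has been added so the output is deterministic. The second query expresses the same join with explicit `JOIN ... ON` syntax, which most SQL dialects accept. Matching on `length` simply pairs any rows whose lengths happen to be equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (sname TEXT, length INTEGER)")
conn.execute("CREATE TABLE KnownGenes (gname TEXT, length INTEGER)")
conn.executemany("INSERT INTO Sequence VALUES (?, ?)",
                 [("seq1", 1200), ("seq2", 640), ("seq3", 999)])
conn.executemany("INSERT INTO KnownGenes VALUES (?, ?)",
                 [("geneA", 1200), ("geneB", 640), ("geneC", 5000)])

# The slide's query: the join condition sits in the WHERE clause.
implicit = conn.execute("""
    SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
    FROM Sequence, KnownGenes
    WHERE KnownGenes.length = Sequence.length
    ORDER BY Sequence.sname
""").fetchall()

# The same join with explicit JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT s.sname, k.gname, k.length
    FROM Sequence AS s
    JOIN KnownGenes AS k ON k.length = s.length
    ORDER BY s.sname
""").fetchall()

print(implicit)  # [('seq1', 'geneA', 1200), ('seq2', 'geneB', 640)]
assert implicit == explicit
```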

[Figure: E-R Diagram]

The tool for communicating with relational databases: SQL
- Structured Query Language (SQL)
- A query is a question you ask the database; SQL retrieves the appropriate answer set
- Interactive SQL (command line) vs. RAD tool/GUI
- Standardization issue: ANSI (American National Standards Institute)

Data Types
- The data type of a field determines which operations are possible on it and between related fields
- Each field is assigned one data type (this imposes structure on the data)
- Examples: text (CHAR, VARCHAR), number (INT, DEC), date, time, money, binary
- Standardization issue: ANSI (American National Standards Institute)
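
A small SQLite illustration of how the declared type governs what operations mean. The same three lengths are stored once as text and once as integers, and sorting and summing behave differently. SQLite is more loosely typed than many DBMSs (it uses type affinity rather than strict column types), so treat this as an illustration of the idea rather than of any particular system's type rules. The table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The same values stored twice: once as text, once as a number.
conn.execute("CREATE TABLE Lengths (as_text TEXT, as_int INTEGER)")
conn.executemany("INSERT INTO Lengths VALUES (?, ?)",
                 [(str(n), n) for n in (9, 10, 100)])

# Text sorts character by character; integers sort numerically.
text_order = [r[0] for r in conn.execute("SELECT as_text FROM Lengths ORDER BY as_text")]
int_order = [r[0] for r in conn.execute("SELECT as_int FROM Lengths ORDER BY as_int")]
# Arithmetic such as SUM is meaningful for the numeric column.
total = conn.execute("SELECT SUM(as_int) FROM Lengths").fetchone()[0]

print(text_order)  # ['10', '100', '9']
print(int_order)   # [9, 10, 100]
print(total)       # 119
```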

A word about database design:
- Designing a database is not trivial
- The value is not in the data, but in the structure
- Design to facilitate the retrieval and interpretation of the data

Design database for data extraction: think it through
- Relationships ease extraction and/or reporting of data from the system
- Redundancy
- Concept of attributes in rows instead of columns
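
One way to read “attributes in rows instead of columns” is an attribute-value layout. Each row holds one (sample, attribute, value) triple, so adding a new kind of annotation means adding rows rather than altering the table. The sketch below assumes that reading. The `SampleAttribute` table and its contents are hypothetical. The pivot query shows the cost of the design: extraction has to reassemble the attributes into columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (sample, attribute) pair instead of one column per attribute.
conn.execute("""
    CREATE TABLE SampleAttribute (
        sample_id TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT,
        PRIMARY KEY (sample_id, attribute)
    )
""")
conn.executemany("INSERT INTO SampleAttribute VALUES (?, ?, ?)", [
    ("S1", "tissue", "colon"),
    ("S1", "diagnosis", "tumor"),
    ("S2", "tissue", "colon"),
    ("S2", "diagnosis", "normal"),
])

# Extraction: pivot the attribute rows back into one row per sample.
# MAX() ignores the NULLs produced by non-matching CASE branches.
rows = conn.execute("""
    SELECT sample_id,
           MAX(CASE WHEN attribute = 'tissue'    THEN value END) AS tissue,
           MAX(CASE WHEN attribute = 'diagnosis' THEN value END) AS diagnosis
    FROM SampleAttribute
    GROUP BY sample_id
    ORDER BY sample_id
""").fetchall()
print(rows)  # [('S1', 'colon', 'tumor'), ('S2', 'colon', 'normal')]
```

All values share one TEXT column here, so this layout also gives up per-attribute data types.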

Example: BioInformatics Core Technology
- Reusable ‘core’ modules, with customizable components
- Standard business logic framework controls transactions (middle layer)
- Metadata-based back-end data storage (facilitates data sharing)

[Figure: BioInformatics Core Technology]

Data Security: High Priority
- HIPAA, FIPS (VA), IRB requirements …

Life science has become a field that generates an enormous amount of unintegrated data.
How can methods for data organization help to solve this problem?

What is Data Integration?
Creating a system that allows the extraction of a piece or set of information (a query result) across multiple domains, possibly from disparate data sources (flat files, databases, spreadsheets, URLs...)
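
A minimal sketch of one integrated query over two disparate sources, a CSV flat file and a relational table. The query reads both without modifying either. Both sources, their contents and the function `integrated_query` are hypothetical.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a flat file of gene annotations.
ANNOTATION_CSV = "gene,function\ngeneA,kinase\ngeneB,transcription factor\n"

# Source 2 (hypothetical): a relational table of expression results.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Expression (gene TEXT, fold_change REAL)")
db.executemany("INSERT INTO Expression VALUES (?, ?)",
               [("geneA", 1.2), ("geneB", 5.0), ("geneC", 4.4)])

def integrated_query(min_fold):
    """Answer one question using both sources, leaving both unchanged."""
    function_by_gene = {
        row["gene"]: row["function"]
        for row in csv.DictReader(io.StringIO(ANNOTATION_CSV))
    }
    result = []
    for gene, fold in db.execute(
            "SELECT gene, fold_change FROM Expression "
            "WHERE fold_change >= ? ORDER BY gene", (min_fold,)):
        # A gene may be missing from the flat file; report that explicitly.
        result.append((gene, fold, function_by_gene.get(gene, "unknown")))
    return result

print(integrated_query(4.0))
# [('geneB', 5.0, 'transcription factor'), ('geneC', 4.4, 'unknown')]
```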

Sample integration problem: Cancer Biomarker Discovery
- A clinical center collects blood samples from 1000 individuals with colon cancer
- Expression analysis reveals that protein ‘x’ is over-expressed in these samples, relative to controls
- Could this be a colon cancer biomarker?

Understanding transcription factors for protein ‘x’ production
“Show me all genes in the public literature that are putatively related to protein ‘x’, have more than a 4-fold expression differential between affected and normal tissue, and are homologous to known transcription factors.”
- Q1: Find homologs (SEQUENCE)
- Q2: Find genes with a 4-fold differential (EXPRESSION)
- Q3: Show me genes in the public literature (LITERATURE)
- Answer: Q1 ∩ Q2 ∩ Q3
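
Once each source-specific query returns a set of gene identifiers, the combined answer is the intersection of the three sets. This Python sketch assumes all three sources use one shared naming scheme for genes. The identifiers are hypothetical placeholders for the results of Q1, Q2 and Q3.

```python
# Hypothetical results of the three source-specific queries.
q1_homologs = {"geneB", "geneD", "geneF"}           # SEQUENCE: homologous to known TFs
q2_fourfold = {"geneA", "geneB", "geneD", "geneE"}  # EXPRESSION: > 4-fold differential
q3_literature = {"geneB", "geneC", "geneD"}         # LITERATURE: related to protein 'x'

# Genes satisfying all three conditions: Q1 ∩ Q2 ∩ Q3.
answer = q1_homologs & q2_fourfold & q3_literature
print(sorted(answer))  # ['geneB', 'geneD']
```

If the sources name the same gene differently, the intersection silently misses it. This is one reason the following slides stress conforming terminology to standards.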

Key components of integration
- Accessing, without modifying, the original data sources
- Handling redundant, conflicting, missing and changing (versioned) data
- Normalizing analytical data from different data sources
- Conforming terminology to industry standards
- Accessing the integrated data as a single repository
- Including metadata in the repository
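
The sketch below illustrates several of these components on toy data. It conforms names through a synonym table, collapses redundant records, skips missing values, and flags conflicting records rather than silently resolving them. The synonym table, the source dictionaries and `merge_sources` are hypothetical. A real project would draw synonyms from a naming authority such as HGNC.

```python
# Hypothetical synonym table mapping local names to one standard symbol.
SYNONYMS = {"p53": "TP53", "tp53": "TP53"}

def merge_sources(*sources):
    """Merge {name: value} records from several sources without altering them.

    Returns (merged, conflicts): merged maps standard symbol -> value;
    conflicts lists symbols whose sources disagree.
    """
    merged, conflicts = {}, []
    for source in sources:
        for name, value in source.items():
            symbol = SYNONYMS.get(name, name)  # conform terminology
            if value is None:                  # missing data: skip, don't overwrite
                continue
            if symbol in merged and merged[symbol] != value:
                if symbol not in conflicts:
                    conflicts.append(symbol)   # conflicting data: flag, don't guess
                continue
            merged[symbol] = value             # redundant copies collapse to one entry
    return merged, conflicts

# Hypothetical sources reporting a chromosome per gene.
source_a = {"p53": "17", "APC": "5"}
source_b = {"TP53": "17", "APC": "5", "KRAS": None}
source_c = {"tp53": "7"}  # disagrees with the others

merged, conflicts = merge_sources(source_a, source_b, source_c)
print(merged)     # {'TP53': '17', 'APC': '5'}
print(conflicts)  # ['TP53']
```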

Approaches to Integration: where are the key issues addressed?
- Federated database (imposes constraints on the original data sources; fragile because it relies on the source systems)
- Data warehousing (ETL layer; original data sources untouched; requires understanding of the domain; sophisticated update/archive processes)
- Integrating data source profiles
- Indexed flat files
- Others...
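
A minimal extract-transform-load (ETL) sketch of the warehousing approach. Two hypothetical lab sources report protein ‘x’ under different field names and units. The transform step maps both onto one schema in ng/mL (1 ng = 1000 pg). The load step writes copies into a SQLite table while the sources stay untouched. All names and values are invented for illustration.

```python
import sqlite3

# Hypothetical source records; the sources themselves are never modified.
lab_a = [{"sample": "S1", "protein_x_ng_per_ml": 12.0}]
lab_b = [{"id": "S2", "protein_x_pg_per_ml": 8500.0}]

def extract():
    """Extract: read records from each source, tagged with their origin."""
    return [("lab_a", r) for r in lab_a] + [("lab_b", r) for r in lab_b]

def transform(source, record):
    """Transform: map each source's field names and units onto one schema (ng/mL)."""
    if source == "lab_a":
        return record["sample"], record["protein_x_ng_per_ml"], source
    if source == "lab_b":
        return record["id"], record["protein_x_pg_per_ml"] / 1000.0, source
    raise ValueError(f"unknown source: {source}")

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE ProteinX (sample TEXT PRIMARY KEY, ng_per_ml REAL, source TEXT)")
# Load: write the transformed copies into the warehouse.
warehouse.executemany("INSERT INTO ProteinX VALUES (?, ?, ?)",
                      [transform(src, rec) for src, rec in extract()])
warehouse.commit()

loaded = warehouse.execute("SELECT * FROM ProteinX ORDER BY sample").fetchall()
print(loaded)  # [('S1', 12.0, 'lab_a'), ('S2', 8.5, 'lab_b')]
```

Keeping the `source` column in the warehouse preserves where each value came from, which helps with the update and archive processes the slide mentions.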

[Figure: Data Warehousing]

Metadata: one key to success
- Describes data types, relationships, histories, etc.
- Back-end (supports developers), front-end (supports users and applications)
- Example
  - Data value: 55
  - Metadata values:
    - Data element name: vehicle speed
    - Unit: miles per hour
    - Description: the average velocity of a vehicle
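
The slide's example translates directly into a record that carries its metadata alongside the value. In this Python sketch, `DataElement`, `describe` and `in_km_per_hour` are hypothetical names. The conversion factor (1 mile = 1.609344 km) is exact by definition. The conversion is only safe because the unit is recorded.

```python
from dataclasses import dataclass

MILES_TO_KM = 1.609344  # exact by definition

@dataclass(frozen=True)
class DataElement:
    """A data value bundled with the metadata needed to interpret it."""
    value: float
    name: str
    unit: str
    description: str

def describe(element):
    return f"{element.name}: {element.value:g} {element.unit}"

def in_km_per_hour(element):
    # The unit metadata tells us whether this conversion is valid at all.
    if element.unit != "miles per hour":
        raise ValueError(f"cannot convert from {element.unit!r}")
    return element.value * MILES_TO_KM

speed = DataElement(
    value=55.0,
    name="vehicle speed",
    unit="miles per hour",
    description="the average velocity of a vehicle",
)

print(describe(speed))                  # vehicle speed: 55 miles per hour
print(round(in_km_per_hour(speed), 1))  # 88.5
```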

Standards: the final frontier
- Naming conventions
- Standard coordinate systems
- Unify interpretations of single object types
- Unify software solutions to the same problem (also data formats)
- Standards for metadata (incompatible or missing metadata)
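
Coordinate systems are a concrete example from genomics. BED files use 0-based, half-open intervals, whereas GFF uses 1-based, closed intervals, so the same stretch of sequence has different start values in the two formats. Here is a minimal conversion sketch; the function names are hypothetical.

```python
def bed_to_gff(start, end):
    """0-based half-open [start, end) -> 1-based closed [start, end]."""
    return start + 1, end

def gff_to_bed(start, end):
    """1-based closed [start, end] -> 0-based half-open [start, end)."""
    return start - 1, end

def length_bed(start, end):
    return end - start

def length_gff(start, end):
    return end - start + 1

# The first 100 bases of a sequence, written in each convention.
bed = (0, 100)
gff = bed_to_gff(*bed)
print(gff)                                   # (1, 100)
print(gff_to_bed(*gff) == bed)               # True
print(length_bed(*bed) == length_gff(*gff))  # True
```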

Developing Standards for Life Sciences Research
- Discovery science does not lend itself well to constraints (especially system constraints)
- Decentralized data management infrastructure, competition
- Wildly varying skill levels for data and information management
- Several groups (Bio-Ontologies, HGNC, OMG, etc.) and national research initiatives (EDRN, caBIG, etc.) are taking the lead in the effort to create ‘workable’ standards.

New approach to integration: Cancer Biomarker Discovery
- Network of distributed data ‘silos’ (does not perturb data sources)
- Centralized query and ‘business logic’ servers, accessed through a web interface
- CORBA framework ‘manages’ XML profile definitions across the web
- A profile is a set of resource definitions, implemented in XML, for data sources residing in one or more distributed systems
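
The slides do not show the profile format, so the XML below is entirely hypothetical, with invented element and attribute names. The sketch only illustrates reading resource definitions from a profile with Python's standard `xml.etree.ElementTree`. The CORBA layer that manages such profiles is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: element and attribute names are invented for illustration.
PROFILE_XML = """
<profile name="biomarker">
  <resource id="specimens" type="database" location="site-a"/>
  <resource id="expression" type="flatfile" location="site-b"/>
</profile>
"""

def read_profile(xml_text):
    """Return {resource id: (type, location)} from a profile document."""
    root = ET.fromstring(xml_text.strip())
    return {
        r.get("id"): (r.get("type"), r.get("location"))
        for r in root.findall("resource")
    }

resources = read_profile(PROFILE_XML)
print(resources)
# {'specimens': ('database', 'site-a'), 'expression': ('flatfile', 'site-b')}
```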