Database Design Presented to Molecular Biology Bioinformatics Meeting by Norbert Kremer April 29, 2002
Database Features Efficient storage and retrieval of data relevant to a specific purpose –Data integrity –Security –Performance, indexing Best design depends on application
Database Technologies Manual Systems –Shoebox full of index cards –Yellow stickies Mechanical Systems –Hollerith Cards –Indexing holes
Database Technologies Automated Systems –Flat file, fields in fixed or delimited columns –Early database systems: network, hierarchical –Client-server paradigm: relational database management systems –Object databases, e.g. AceDB –Object-relational, e.g. Oracle 9 –XML, eg Neocore and Tamino
RDBMS Features 1/2 Data stored as tables with keys Sound mathematical foundations in set theory, T.Codd, IBM Research Labs Relational algebra – closure Best for highly structured data
RDBMS Features 2/2 SQL is most common relational language SQL is non-procedural, relationally complete Robust, mature technology, performance and scalability, security, data integrity
Relational Table Structure Examples showing –One big table –Multiple tables with keys Normalization Process –Data elements depend on the “key, the whole key, and nothing but the key”
Data Modeling Identify and define data elements Discover logical relationships between data elements
Software Engineering Methodology ANALYSIS Identify and Define data elements Requirements Doc, Functional Spec Data dictionary DESIGN Logical Relationships Technical Spec Database schema BUILD Physical layout Code, Release cycle DB creation scripts, seed data DEPLOY Maintenance SOPs Backup, DB Tuning
Data Integrity Constraints enforced by DBMS Transaction processing
Data Integrity - Constraints Constraints enforced by DBMS –Central location eases maintenance –Uniform application in all code –Cannot be subverted by malicious or lazy programming Foreign keys
Data Integrity - Transactions ACID Properties –Atomic –Consistent –Isolated –Durable
Data Integrity - Transactions ACID Properties –Atomic Logical unit of work, complete all or roll back –ConsistentAll constraints observed –IsolatedTransactions are independent –Durable“Commit” is irreversible
Software Engineering Methodology ANALYSIS Identify and Define data elements Requirements Doc, Functional Spec Data dictionary DESIGN Logical Relationships Technical Spec Database schema BUILD Physical layout Code, Release cycle DB creation scripts, seed data DEPLOY Maintenance SOPs Backup, DB Tuning
PowerDesigner Demo Graphical communication among analysts, architects, developers, end users Centralized maintenance of database schema and data dictionary Forward engineering (SQL script from model) Reverse engineering (model from existing database) Medline example, showing modeling process for “associative tables”
References WinSQL graphical database client for any ODBC- compliant database (just about all) Get the free Lite version from ODBC driver for MySQL (also need the usual MySQL client installed) Powerdesigner 9 Trial The trial is fully functional for 45 days. See