Physical Data Modeling – Implementation

BCHB697

Outline
- DBMS infrastructure & performance
- DBMS data-types, sizes, collation
- Create, read, update, delete (CRUD)
- Logical data-access architectures
- Denormalization, summary tables
BCHB697 - Edwards

The Data Modeling Process
- Conceptual Data Modeling
  - Define the entities and their relationships
- Logical Data Modeling
  - Define each entity’s attributes, incl. types
  - Choose primary key(s) for each entity
  - Attribute details (cardinality, optional)
  - Normal forms
- Physical Data Modeling
  - Implementation

Database Management Systems (Oracle, …)
- Monolithic servers, expensive software
- Many databases, each with many tables
- Users, permissions, etc.
- Single point of failure! The focal point of all projects, etc.
- High priests (DBAs) manage as infrastructure
- Big memory, fast disk, fast network
- Clients connect from other computers via the network
- Clients may be web-applications, scripts, or ad-hoc query tools

Database Management Systems (MySQL, MariaDB, …)
- Monolithic servers (free software)
- Many databases, each with many tables
- Users, permissions, etc.
- Anyone can install (Linux, PC, etc.)
- One per project, not one per machine
- Commodity hardware, free OS, ad-hoc
- Clients connect from the same or other computers via the network
- Clients may be web-applications, scripts, or ad-hoc query tools

LAMP Stack

Database Management Systems (SQLite)
- No servers, free
- One database per file, with many tables
- No users, just filesystem permissions
- Nothing to install (Linux, PC, etc.)
- One/many per project, sometimes temporary
- Personal computers, often directly integrated with client software
- Clients open file directly
- Web-applications, scripts, or ad-hoc query tools
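Because SQLite clients open the database file directly, a script needs only a library binding. A minimal sketch using Python's standard `sqlite3` module; the file name `demo.sqlite`, the `gene` table, and its rows are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a temporary directory; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

# Opening the file is the whole "connection": no server, no user accounts.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.executemany("INSERT INTO gene (symbol) VALUES (?)", [("TP53",), ("BRCA1",)])
conn.commit()
conn.close()

# A second client simply opens the same file again.
conn = sqlite3.connect(db_path)
symbols = [row[0] for row in conn.execute("SELECT symbol FROM gene ORDER BY id")]
conn.close()

print(symbols)
```

Access control here is whatever the filesystem grants on `db_path`, which is the trade-off the slide points at.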

Data access hierarchy
- Memory (RAM)
- Disk (Hard Drives, SSD)
- Intranet / Local Network (DBMS, File Server)
- Internet / Wide Area Network (Web, Cloud Storage)
(Figure axes: Capacity; Cost & Speed)

Database Management Systems (Common)
- Use the filesystem (disk) to store the data
  - Database size >> RAM
  - Lots of data-structures and low-level tricks to retrieve some data from disk to memory quickly
- Access data using fixed-size disk “blocks”
  - Many rows at a time
  - # of rows depends on # of bytes/row; keep small!
- Keep frequently accessed data in memory
- Avoid sending large results over the network!
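The "keep rows small" advice is simple arithmetic. A rough sketch, assuming a 16 KiB block and ignoring per-block and per-row overhead; the block size and the two row widths are illustrative assumptions, not measurements.

```python
def rows_per_block(block_bytes: int, row_bytes: int) -> int:
    """Whole rows that fit in one fixed-size block (overhead ignored)."""
    return block_bytes // row_bytes


BLOCK_BYTES = 16 * 1024  # assumed block size; real systems vary

narrow = rows_per_block(BLOCK_BYTES, 64)    # e.g. a few integers and a short CHAR
wide = rows_per_block(BLOCK_BYTES, 2048)    # e.g. a row carrying a long description

print(narrow, wide)  # 256 8
```

Under these assumptions, one disk read brings in 32 times as many narrow rows as wide ones.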

DBMS data-types: Numeric
- Also, FLOAT (4 bytes), DOUBLE (8 bytes)
  - Floating point numbers
- DECIMAL (digits, decimals)
  - Exact decimal representation, expensive
MySQL Documentation, Table 11.1
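The difference between floating point and exact decimal storage can be seen without a database. The sketch below uses Python's `struct` and `decimal` modules as stand-ins for FLOAT, DOUBLE, and DECIMAL columns; it illustrates the representations, not MySQL's own implementation.

```python
import struct
from decimal import Decimal

# Sizes of IEEE single and double precision values, in bytes.
float_bytes = struct.calcsize("<f")
double_bytes = struct.calcsize("<d")

# Round-trip 0.1 through 4 bytes: the stored value is only close to 0.1.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

# Doubles are still binary approximations.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Decimal arithmetic is exact for decimal fractions, at a higher cost.
decimal_sum_is_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

print(float_bytes, double_bytes)   # 4 8
print(single == 0.1)               # False
print(float_sum_is_exact)          # False
print(decimal_sum_is_exact)        # True
```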

DBMS data-types: String
- CHAR(n): fixed length string, characters
- VARCHAR(n): variable length string, characters
- BLOB(n): variable length bytes, binary
- ENUM: fixed set of possible string values, internally represented as integers
- Characters come from a “character set” with a “collation” – often case insensitive
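Collation decides whether two strings compare as equal. The sketch below uses SQLite through Python's `sqlite3` module rather than MySQL, so the defaults differ: SQLite compares case-sensitively unless a column is declared `COLLATE NOCASE`. The `organism` table and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organism ("
    " name_cs TEXT,"                 # default (binary) collation: case sensitive
    " name_ci TEXT COLLATE NOCASE"   # case-insensitive collation
    ")"
)
conn.execute("INSERT INTO organism VALUES ('Homo sapiens', 'Homo sapiens')")

case_sensitive_hits = conn.execute(
    "SELECT COUNT(*) FROM organism WHERE name_cs = 'homo sapiens'"
).fetchone()[0]
case_insensitive_hits = conn.execute(
    "SELECT COUNT(*) FROM organism WHERE name_ci = 'homo sapiens'"
).fetchone()[0]

print(case_sensitive_hits, case_insensitive_hits)  # 0 1
```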

Semantic considerations
- ID columns are typically integer, unsigned
- Primary key id columns are auto increment
- Consider the number of instances for size
- Use CHAR for accessions, VARCHAR for “human” strings, descriptions etc.
- Use BLOBs for data
- Use ENUM when possible
- Could represent DECIMAL using CHAR or INTEGER…
- Usually DATE and DATETIME data-types are available too
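"Consider the number of instances for size" can be made concrete with a small helper that picks the narrowest unsigned MySQL integer type able to hold a given maximum id. The helper name `smallest_unsigned_type` is made up for this sketch; the byte widths are those of MySQL's integer types.

```python
# (type name, storage in bytes) for MySQL's integer types, narrowest first.
INTEGER_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_unsigned_type(max_instances: int) -> str:
    """Return the narrowest unsigned type whose range covers max_instances."""
    for name, size in INTEGER_TYPES:
        if max_instances <= 2 ** (8 * size) - 1:
            return name + " UNSIGNED"
    raise ValueError("too many instances for an 8-byte integer")


print(smallest_unsigned_type(200))            # TINYINT UNSIGNED
print(smallest_unsigned_type(50_000))         # SMALLINT UNSIGNED
print(smallest_unsigned_type(3_000_000))      # MEDIUMINT UNSIGNED
print(smallest_unsigned_type(5_000_000_000))  # BIGINT UNSIGNED
```

In practice it is worth leaving headroom for growth rather than choosing the exact minimum.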

Use-cases drive performance!
- Consider the lifecycle of the data stored
  - Create, read, update, delete
- Are updates, additions, deletions interleaved with read-only data-access?
  - Changing the underlying data can invalidate in-memory caches, pre-computes, and summaries
  - Writes (to disk) are typically slower than reads
  - Consistency issues abound…
- Which data is accessed most frequently, needed with least latency?
- What is the data-access time-scale?
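For reference, the four CRUD operations map onto four SQL statements. A minimal sketch with an in-memory SQLite database; the `student` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Create
cur = conn.execute("INSERT INTO student (name) VALUES (?)", ("Ada",))
student_id = cur.lastrowid

# Read
name_before = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()[0]

# Update
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Ada L.", student_id))
name_after = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()[0]

# Delete
conn.execute("DELETE FROM student WHERE id = ?", (student_id,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(name_before, name_after, remaining)  # Ada Ada L. 0
```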

Use-cases drive performance!
- Data-access hierarchy considerations:
  - Is the relevant disk-block already in memory?
  - Is the result in a single disk-block?
  - Can we determine the disk-block to retrieve without reading all of a table’s disk blocks?
  - Does the query require scanning and/or retrieving an entire table?
- Understanding the use-cases for data CRUD will help identify performance bottlenecks
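One way to check whether a query must read the whole table is to ask the DBMS for its query plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` before and after adding an index; the `protein` table, its contents, and the index name are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT, descr TEXT)"
)
conn.executemany(
    "INSERT INTO protein (accession, descr) VALUES (?, ?)",
    [(f"P{i:05d}", "hypothetical protein") for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return the human-readable plan; the detail text is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


query = "SELECT descr FROM protein WHERE accession = 'P00042'"

before = plan(query)  # no index on accession: every row must be examined
conn.execute("CREATE INDEX protein_accession ON protein (accession)")
after = plan(query)   # the index locates the matching row directly

print(before)
print(after)
```

The first plan reports a scan of `protein`; the second reports a search using `protein_accession`.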

Database Application Architectures
- Presentation/View Layer
  - User-facing, user-interface, display
  - Generates use-cases to support view
  - e.g. web-browser, client software
- Application/Controller Layer
  - Translates the Presentation/View layer requests into queries of the DBMS/Data/Model layer
  - Handles business logic
  - Based on logical data-model
- DBMS/Data Layer
  - Executes the requested query on the physical data-model implementation

- Two-tier applications:
  - Presentation and application together
  - Direct connection to DBMS
  - Change in the physical data-model forces change on the client
- Three-tier applications:
  - Presentation communicates with “middle” (application) tier
  - Middle tier interprets logical data model requests
  - Middle tier queries DBMS
  - Change in the physical data-model forces changes only in the middle tier
  - Changes in presentation might require change in the middle tier only

- Middle tiers provide rich, aggregate, de-normalized data for presentation layer
  - Application Programming Interface (API)
  - Often return their results as XML or JSON
  - Can be implemented using one, or many, queries to the DBMS
- DBMS also provide “views” which behave like tables, but which are constructed on the fly
- DBMS optimization can be carried out without affecting the client application
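A middle-tier function and a view can be sketched together. In the example below, the view `student_course` hides the three-table join and the function `student_courses_json` returns de-normalized JSON to the presentation layer. All table, view, and function names are hypothetical, and SQLite stands in for the DBMS.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ada');
    INSERT INTO course VALUES (10, 'Databases');
    INSERT INTO course VALUES (11, 'Statistics');
    INSERT INTO enrollment VALUES (1, 10);
    INSERT INTO enrollment VALUES (1, 11);

    -- The view behaves like a table but is computed when queried.
    CREATE VIEW student_course AS
        SELECT s.id AS student_id, s.name AS student, c.title AS course
        FROM enrollment e
        JOIN student s ON s.id = e.student_id
        JOIN course c ON c.id = e.course_id;
    """
)


def student_courses_json(student_id: int) -> str:
    """Middle-tier call: the caller never sees table names or joins."""
    rows = conn.execute(
        "SELECT student, course FROM student_course"
        " WHERE student_id = ? ORDER BY course",
        (student_id,),
    ).fetchall()
    student = rows[0][0] if rows else None
    return json.dumps({"student": student, "courses": [r[1] for r in rows]})


result = student_courses_json(1)
print(result)  # {"student": "Ada", "courses": ["Databases", "Statistics"]}
```

If the physical tables are later reorganized, only the view definition or the SQL inside the function has to change; the JSON seen by the client stays the same.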

Denormalization
- Derived values and/or summary tables violate the normalized forms (1NF, especially)…
  - …but some are expensive to compute
  - e.g. total drug expenditure across all operations
- Pre-computing derived or summary values shifts the expense away from the query
  - …but must be updated whenever source changes
  - …leading to slow updates/deletes/additions
  - …or must be allowed to be out-of-date (how much?)
- Multi-table updates can lead to inconsistencies
  - Transactions guarantee atomic changes…
  - …but can block read access to many tables.
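A summary table stays consistent only if it is updated in the same transaction as its source. The sketch below keeps a running total of drug expenditure alongside the detail rows. The table names, the function `record_drug_use`, and the figures are hypothetical, and costs are stored as integer cents to avoid floating point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE drug_use (operation_id INTEGER, drug TEXT, cost_cents INTEGER);
    CREATE TABLE expenditure_summary (
        id INTEGER PRIMARY KEY CHECK (id = 1),  -- single-row summary table
        total_cents INTEGER
    );
    INSERT INTO expenditure_summary VALUES (1, 0);
    """
)


def record_drug_use(operation_id: int, drug: str, cost_cents: int) -> None:
    # "with conn" commits both statements together, or rolls both back on error,
    # so the summary can never disagree with the detail rows.
    with conn:
        conn.execute(
            "INSERT INTO drug_use VALUES (?, ?, ?)", (operation_id, drug, cost_cents)
        )
        conn.execute(
            "UPDATE expenditure_summary SET total_cents = total_cents + ? WHERE id = 1",
            (cost_cents,),
        )


record_drug_use(1, "propofol", 1250)
record_drug_use(2, "heparin", 750)

# Cheap read of the pre-computed value versus the full aggregate query.
precomputed = conn.execute(
    "SELECT total_cents FROM expenditure_summary"
).fetchone()[0]
recomputed = conn.execute("SELECT SUM(cost_cents) FROM drug_use").fetchone()[0]

print(precomputed, recomputed)  # 2000 2000
```

Every write now costs two statements instead of one, which is the slow-update trade-off listed above.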

Denormalization
- Large or infrequently accessed columns
  - Split problem columns to secondary table
  - Reduces size of primary table’s rows in memory
  - BLOBs are rarely needed to determine queries
  - Put large BLOBs on the filesystem and store the filenames…
- Redundant values:
  - Avoid accessing multiple tables.
  - e.g. scientific name in Taxonomy database
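Splitting a large column into a secondary table might look like the sketch below. The frequent query reads only the narrow `taxon` rows, and the BLOB is fetched through a join only when needed. The schema and the placeholder image bytes are hypothetical, not the course's taxonomy database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Narrow primary table: small rows, many per disk block.
    CREATE TABLE taxon (id INTEGER PRIMARY KEY, scientific_name TEXT);
    -- Secondary table holds the large, rarely needed column.
    CREATE TABLE taxon_image (
        taxon_id INTEGER PRIMARY KEY REFERENCES taxon(id),
        image BLOB
    );
    """
)
conn.execute("INSERT INTO taxon VALUES (9606, 'Homo sapiens')")
conn.execute("INSERT INTO taxon_image VALUES (9606, ?)", (b"\x00" * 100_000,))

# Frequent query: never touches the BLOB.
name = conn.execute(
    "SELECT scientific_name FROM taxon WHERE id = 9606"
).fetchone()[0]

# Rare query: join to the secondary table only when the image is wanted.
image = conn.execute(
    "SELECT i.image FROM taxon t JOIN taxon_image i ON i.taxon_id = t.id"
    " WHERE t.id = 9606"
).fetchone()[0]

print(name, len(image))  # Homo sapiens 100000
```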

Exercise
- Start your BCHB524 virtual machine
- Start a web-browser on the host, and connect to
- Click on phpMyAdmin
- Create New Databases: taxonomy, sakila
- Use the Import tab to populate
  - SQL dumps are in the course data-directory
- Poke around… what is good, what is bad?

Homework
- Create a physical data model for MySQL using phpMyAdmin for your Class Registration logical-data model
- Populate the rows for each table, using Import from CSV format
- Export as SQL, including the data.
- Submit exported SQL dump.
- Due Feb 8th, 10am
