IS 325 Database Management Systems Notes for Tuesday, September 5, 2017
What is a Database? In general, a database is any organized collection of data. Examples: grocery list, audio CD catalog, phone book, airline ticketing software, tax preparation software, an address book, Google, MapQuest, Amazon, eBay.
Data vs. Information For the user of a database, the end goal is to view meaningful and useful information. Raw data, the values we store in a database, are essentially useless by themselves. For instance, do we know what the value 59 means? Is it my IQ? Is it my age? Is it the number of jelly beans in a jar? We don't know. Information is data with context.
Data Processing When we process data, we connect sets of data to make meaningful information. For instance, if we connect the value 13502 to the label “Zip Code”, we can reasonably infer that 13502 is a zip code that represents a specific location. The end result of data processing is meaningful and useful information. Data is stored; information is retrieved.
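As a minimal sketch of this idea, the snippet below pairs raw values with labels to give them context. The label "Age" for the value 59 is an assumption made only for illustration; the notes deliberately leave that value's meaning open.

```python
# Raw values carry no meaning on their own.
raw_values = [59, 13502]

# Pairing each value with a label (its context) turns data into information.
# The labels are illustrative assumptions, not part of any real schema.
labels = ["Age", "Zip Code"]
labeled = dict(zip(labels, raw_values))

for label, value in labeled.items():
    print(f"{label}: {value}")
```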
Types of Modern Databases Operational databases: used for online transaction processing (OLTP); dynamic in nature ("just in time" information); used heavily by commercial entities. Analytical databases: used for online analytical processing (OLAP); static in nature; often populated with data from OLTP databases; used heavily by research entities.
Historical Database Models A database model describes how we create a database. Throughout the years, people have used these models for creating databases: the hierarchical model, the network model, and the relational model (most commonly used today).
The Hierarchical Model The hierarchical model connects tables of data via parent/child relationships. In such relations, a parent table can have 1 or more children, but a child table must have 1 and only 1 parent. Tables connect using the physical arrangement of records. The hierarchical model requires that a user know the structure of the database. Access always starts at the root table.
Hierarchical Model Example - Figure 1.1 from Hernandez
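The parent/child rules of the hierarchical model can be sketched with a small tree structure. This is only an illustration: the `Table` class, the `path_from_root` helper, and the particular arrangement of tables are hypothetical and are not taken from the figure. The sketch shows that every table except the root has exactly one parent, and that reaching any table means walking down from the root.

```python
class Table:
    """A node in a hierarchical database: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None only for the root table
        self.children = []
        if parent is not None:
            parent.children.append(self)


def path_from_root(root, target):
    """Depth-first search that always starts at the root table.

    Returns the list of table names on the way to `target`,
    or None if no table with that name exists.
    """
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = path_from_root(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


# Hypothetical arrangement of tables, for illustration only.
agents = Table("Agents")                        # root table
clients = Table("Clients", parent=agents)
entertainers = Table("Entertainers", parent=agents)
payments = Table("Payments", parent=clients)

route = path_from_root(agents, "Payments")
print(route)
```

Because a user must supply the starting point and the search must follow the stored parent/child links, the code has to know the shape of the tree, which mirrors the requirement that users know the database structure.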
Network Database Model Introduces nodes and set structures. Nodes are collections of records, and set structures are the relationships in the database. Each set structure has 1 node as the owner node and 1 or more member nodes. A record in a member node can be related to only 1 record in the owner node. Records in a member node cannot exist without being related to a record in the owner node.
Network Model Example - Figure 1.3 from Hernandez (nodes shown: Agents, Clients, Entertainers, Payments, Engagements, Musical Styles; set structures shown: Represent, Manage, Make, Schedule, Perform, Play)
Relational Model Derived from two branches of mathematics: set theory and first-order predicate logic. Stores data in relations (tables). Each table is composed of tuples (records) and attributes (fields). Two features of this model allow us to access data without knowing the database structure: (1) the physical order of the records and fields in a table doesn’t matter, and (2) we identify each individual record in a table by a unique value.
Table Relationships We categorize table relationships in the Relational Model as follows: One-to-One (1:1), One-to-Many (1:N), and Many-to-Many (N:N). To establish a relationship between tables, we match the values of a shared field.
Relationship Example - Figure 1.5 from Hernandez

Agents
Agent ID | Agent First Name | Agent Last Name | Hire Date
---------|------------------|-----------------|----------
100      | Mike             | Hernandez       | 05/16/95
101      | Greg             | Piercy          | 10/15/95
102      | Katherine        | Ehrlich         | 03/01/96

Clients
Client ID | Agent ID | Client First Name | Client Last Name
----------|----------|-------------------|-----------------
9001      | 100      | Stewart           | Jameson
9002      | 101      | Shannon           | McLain
9003      | 102      | Estella           | Pundt
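To see how matching values of a shared field relates two tables, here is a minimal sketch using Python's built-in `sqlite3` module with the Agents and Clients data from the figure. The table and column names (`agents`, `clients`, `agent_id`, and so on) are naming choices made for this example, and storing the hire date as text is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE agents (
        agent_id   INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        hire_date  TEXT
    );
    CREATE TABLE clients (
        client_id  INTEGER PRIMARY KEY,
        agent_id   INTEGER,          -- the shared field
        first_name TEXT,
        last_name  TEXT
    );
    INSERT INTO agents VALUES
        (100, 'Mike', 'Hernandez', '05/16/95'),
        (101, 'Greg', 'Piercy', '10/15/95'),
        (102, 'Katherine', 'Ehrlich', '03/01/96');
    INSERT INTO clients VALUES
        (9001, 100, 'Stewart', 'Jameson'),
        (9002, 101, 'Shannon', 'McLain'),
        (9003, 102, 'Estella', 'Pundt');
""")

# Match each client to its agent through the shared agent_id values.
rows = conn.execute("""
    SELECT c.first_name || ' ' || c.last_name AS client,
           a.first_name || ' ' || a.last_name AS agent
    FROM clients AS c
    JOIN agents AS a ON a.agent_id = c.agent_id
    ORDER BY c.client_id
""").fetchall()

for client, agent in rows:
    print(f"{client} is represented by {agent}")
```

Note that the query never refers to where a row sits inside a table; the only link between the two tables is the value stored in the shared field.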
Advantages of Relational Databases Layers of data integrity: table-level integrity ensures that records aren’t duplicated and that key values are present; relationship-level integrity ensures that the relationship between two tables is valid; business-level integrity ensures that data is accurate in terms of business rules. Data consistency and accuracy, as a result of built-in data integrity. Independence from physical structure. Easy data retrieval.
Database Management Software Relational database management systems (RDBMS) are applications used to “create, maintain, modify and manipulate” a database. Typically, RDBMSs include tools to build tables and establish table relationships, tools for creating forms for user input/output, tools for querying a database (asking the database a question), and tools for creating reports for output.
Phases of Database Design Requirements Analysis – Understanding the information needs of a business client through interviews to understand their current (and future) business environment. Data Modeling – Modeling the database structure using one of the established data-modeling methods, like entity-relationship diagrams; end goal is to visually represent the database structure.
Phases of Database Design (cont.) Data Normalization – Breaking large tables into smaller ones to eliminate redundant data and avoid problems when manipulating data.
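As a rough illustration of breaking a large table into smaller ones, the sketch below splits a flat client list, in which agent names repeat, into separate agent and client collections. It shows a single decomposition step rather than a full normalization procedure. The first three rows reuse the Agents and Clients data from these notes; client 9004 is a hypothetical row added so that one agent appears twice.

```python
# One wide table: every client row repeats its agent's last name.
# Columns: client_id, client_last_name, agent_id, agent_last_name
flat_rows = [
    (9001, "Jameson", 100, "Hernandez"),
    (9002, "McLain", 101, "Piercy"),
    (9003, "Pundt", 102, "Ehrlich"),
    (9004, "Doe", 100, "Hernandez"),  # hypothetical extra client
]

agents = {}    # agent_id -> agent_last_name, stored exactly once
clients = []   # (client_id, client_last_name, agent_id)

for client_id, client_last, agent_id, agent_last in flat_rows:
    agents[agent_id] = agent_last
    clients.append((client_id, client_last, agent_id))

# Rejoining the two smaller tables reproduces the original rows,
# so no information was lost by the split.
rejoined = [
    (client_id, client_last, agent_id, agents[agent_id])
    for client_id, client_last, agent_id in clients
]

print(agents)
print(rejoined == flat_rows)
```

After the split, correcting an agent's name means changing one entry in `agents` instead of every client row that mentions that agent, which is the kind of manipulation problem normalization is meant to avoid.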
Database Tables A database stores data in relations, perceived by the user as tables. Tables are the chief structures in a database and are composed of tuples (records) and attributes (fields). The logical and physical order of fields and records doesn’t matter. Every table must contain a primary key field, which uniquely identifies each of the table’s records. Tables can represent objects or events.
Types of Tables Data table: the most common type of table in a relational database; stores data that supplies information; dynamic in nature. Validation table (lookup table): stores data used when enforcing data integrity; usually static in nature; examples include job codes, city names, and billing categories.
Fields A field, or attribute, is the smallest structure in a database. It represents a characteristic of the subject of the table to which it belongs. The quality of information retrieved from the database depends heavily on the time invested in ensuring the structural and data integrity of fields (more on that later …). A field should contain 1 and only 1 distinct value (for instance, separate FirstName and LastName fields rather than a single FullName field).
Records A record, or tuple, is a specific instance of the subject of a table. A record is made up of all fields in a table. Some fields may not have specific values populated. The value stored in the primary key field uniquely identifies the record throughout the database.
Record & Field Example Table name: Students. Each column is a field; each row is a record.

Student ID | Student First Name | Student Last Name | Student Major
-----------|--------------------|-------------------|------------------
40853      | William            | Harden            | Political Science
98364      | Maria              | Garcia-Grande     | Nursing
15792      | Michael            | Bobersky          | Psychology
Views A view, also called a virtual table or saved query, is made up of fields from other tables in the database. The contributing tables are called base tables. Since the data is stored in the base tables, databases do not store the data associated with a view (thus eliminating redundancy). Databases store only the structure of the view.
Advantages of Views You can work with data from multiple base tables simultaneously. Security – views prevent restricted users from manipulating data stored in base tables. Views are useful for implementing data integrity (a validation view).
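The following sketch, using Python's built-in `sqlite3` module, illustrates a view that draws fields from two base tables. The view name `client_agents`, the column names, and client 9002's assignment to agent 100 are assumptions made for this example. It shows that the database keeps only the view's definition, so the view reflects new base-table data without storing a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents  (agent_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY,
                          agent_id INTEGER, last_name TEXT);
    INSERT INTO agents  VALUES (100, 'Hernandez');
    INSERT INTO clients VALUES (9001, 100, 'Jameson');

    -- The view is a saved query over the two base tables.
    CREATE VIEW client_agents AS
        SELECT c.last_name AS client, a.last_name AS agent
        FROM clients AS c
        JOIN agents AS a ON a.agent_id = c.agent_id;
""")

query = "SELECT client, agent FROM client_agents ORDER BY client"
before = conn.execute(query).fetchall()

# Add a row to a base table; the view itself is never written to.
conn.execute("INSERT INTO clients VALUES (9002, 100, 'McLain')")
after = conn.execute(query).fetchall()

# SQLite's catalog holds the text of the view definition, not its rows.
view_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'client_agents'"
).fetchone()[0]

print(before)
print(after)
```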
Primary Keys A primary key is a field or group of fields that uniquely identifies a record. A primary key composed of two or more fields is called a composite primary key. Every table must have a primary key! It is the most important key in a table: it uniquely identifies a specific record throughout the database, identifies a specific table throughout the database, enforces table-level integrity, and helps to establish relationships between tables.
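A minimal sketch of how a primary key enforces table-level integrity, again using Python's built-in `sqlite3` module. The `enrollments` table, the course codes, and the `try_insert` helper are hypothetical and exist only to illustrate a composite primary key; the student ID comes from the Students example in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,         -- single-field primary key
        last_name  TEXT NOT NULL
    );
    CREATE TABLE enrollments (
        student_id  INTEGER NOT NULL,
        course_code TEXT NOT NULL,
        PRIMARY KEY (student_id, course_code)   -- composite primary key
    );
    INSERT INTO students VALUES (40853, 'Harden');
    INSERT INTO enrollments VALUES (40853, 'IS325');
""")


def try_insert(sql):
    """Return True if the insert is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same student_id again: rejected, the key must be unique.
dup_single = try_insert("INSERT INTO students VALUES (40853, 'Garcia-Grande')")
# Same (student_id, course_code) pair again: rejected.
dup_composite = try_insert("INSERT INTO enrollments VALUES (40853, 'IS325')")
# Same student, different course: accepted, the pair as a whole is new.
new_pair = try_insert("INSERT INTO enrollments VALUES (40853, 'IS101')")

print(dup_single, dup_composite, new_pair)
```

With a composite key, neither field has to be unique by itself; only the combination does.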
Foreign Keys A foreign key is important when establishing relationships between tables. To create a foreign key, you would take a primary key from one table and incorporate it in a second table. In the second table, the key becomes a foreign key. Foreign keys enforce relationship-level integrity – values in one table's foreign key field must match exactly with the corresponding values of a second table's primary key field.
Example of Primary & Foreign Keys - Adapted from Figure 3.11 from Hernandez

Agents Table
Agent ID | Agent First Name | Agent Last Name | Hire Date
---------|------------------|-----------------|----------
100      | Mike             | Hernandez       | 05/16/95
101      | Greg             | Piercy          | 10/15/95
102      | Katherine        | Ehrlich         | 03/01/96

Clients Table
Client ID | Agent ID | Client First Name | Client Last Name
----------|----------|-------------------|-----------------
9001      | 100      | Stewart           | Jameson
9002      | 101      | Shannon           | McLain
9003      | 102      | Estella           | Pundt

Agent ID is the Primary Key in the Agents Table and a Foreign Key in the Clients Table.
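Relationship-level integrity can be sketched with a foreign key constraint in Python's built-in `sqlite3` module. The table and column names are choices made for this example, and agent ID 999 is a deliberately nonexistent value. One SQLite-specific detail: foreign key enforcement is off by default and must be switched on for each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE agents (
        agent_id  INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL
    );
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        agent_id  INTEGER NOT NULL REFERENCES agents(agent_id),  -- foreign key
        last_name TEXT NOT NULL
    );
    INSERT INTO agents  VALUES (100, 'Hernandez');
    INSERT INTO clients VALUES (9001, 100, 'Jameson');
""")

# Agent 999 does not exist, so this client row has nothing to match.
try:
    conn.execute("INSERT INTO clients VALUES (9002, 999, 'McLain')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

client_count = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
print(rejected, client_count)
```

The database refuses the row because its foreign key value has no matching primary key value in the agents table, so every client that is stored is guaranteed to point at a real agent.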
Design Objectives Why should you be concerned with database design? The importance of theory. The advantages of learning good design methodology. Objectives and advantages of good design. Database design methods.
Why Should We Be Concerned With Database Design? A database must ensure the consistency, integrity, and accuracy of its data. Inaccurate information is probably the most detrimental result of improper database design. It can adversely affect the bottom line of a business.
Importance of Theory Most major disciplines have a theoretical base. Structural engineers: theories of physics. Composers: music theory. Automobile designers: aerodynamic theory.
Relational Database Models Built on mathematical theory, which guarantees accurate information and provides guidelines for good design methodologies. It is not necessary to know all about the mathematical theories to use a relational database! The mathematical theories provide the foundation for the relational database model and thus make the model predictable, reliable, and sound.
Advantages/Objectives of Good Design A well-designed database supports required and ad hoc information retrieval; contains efficiently constructed table structures; imposes data integrity at the field, table, and relationship levels; supports appropriate business rules; and lends itself to future growth.
Database Design Methods The traditional three-phase method. Requirements analysis phase: examine the business being modeled. Data modeling phase: model the database structure. Normalization phase: eliminate redundant data.
Let's Discuss Normalization Functional dependencies; 1st Normal Form (1NF); 2nd Normal Form (2NF); 3rd Normal Form (3NF).