Introduction to Databases (1) COP 6726: New Directions in Database Systems
Welcome Course website: http://faculty.eng.fau.edu/yangk/COP6726/schedule.html
A database is a collection of related data. A database represents some aspect of the real world. A database is a logically coherent collection of data with some inherent meaning. A database management system (DBMS) is a collection of programs that enables users to create and maintain a database.
Database Management System (DBMS) Databases are an essential component of life in modern society. Examples include banking systems, airline reservation systems, mobile devices, and car navigation systems. A DBMS is a powerful tool for creating and managing large amounts of data efficiently.
Database Management System (DBMS) Allows users to create new databases and specify their schemas (i.e., the logical structure of the data) using a data-definition language (DDL). Gives users the ability to query the data. Supports the storage of very large amounts of data. Provides durability, i.e., the recovery of the database in the face of failures. Controls access to data from many users at once.
A Timeline of Database History 1960s: Computerized databases appeared. There were two popular data models: a network model called CODASYL and a hierarchical model called IMS. 1970s: The relational database model was proposed by E.F. Codd (IBM). Two major relational database system prototypes were created: Ingres (UC Berkeley) and System R (IBM). The Entity-Relationship (ER) model was proposed by P. Chen. 1980s: Structured Query Language (SQL) became the standard query language (ANSI and ISO).
Data Model Hierarchical or tree-based model (IMS): late 1960’s and 1970’s Graph-based Network model (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Extended Relational: 1980’s Semantic: late 1970’s and 1980’s Object-oriented: late 1980’s and early 1990’s Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present
Evolution of databases
Relational Database Separates the physical model from the conceptual model, so the programmer does not need to be concerned with the storage structure. Provides a mathematical foundation for data organization and querying (e.g., relational algebra, normal forms). Provides high-level query languages (e.g., SQL).
Levels of Abstraction in DBMS: Real World, Conceptual Model, Logical Model, Physical Model
Real world Conceptual model Banking System
Conceptual model Logical model
Logical model Physical model File structure Indexing Concurrency Fault tolerance Performance
Overview of DBMS Data Definition Language (DDL): creates the database schema. Query Processing (DML): answers queries; queries and other DML actions are grouped into transactions. Storage and Buffer Management: data, metadata, log records, statistics, indexes.
Overview of DBMS Transaction Processing Logging, Concurrency control, deadlock resolution Query Processor Query Parser, processor, and optimizer Execution Engine
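The idea that DML actions are grouped into transactions can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `Accounts` table, its `CHECK` constraint, and the `transfer` helper are hypothetical and not part of the course material. The point is that a failed transfer leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; both updates are undone.
        return False

conn = sqlite3.connect(":memory:")
# Hypothetical schema: balances may never go negative.
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO Accounts VALUES (1, 100), (2, 50)")
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM Accounts"))
print(ok, failed, balances)
```

The credit is deliberately applied before the debit so that the second transfer fails halfway through; the rollback is what keeps account 2 from holding the credited amount.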
Oracle Architecture
IBM DB2 Architecture
MS SQL Server Architecture
Basics of the Relational Model A relational database consists of a collection of relations (or two-dimensional tables). Example: the relation Movies. A schema is the name of a relation together with its set of attributes, e.g., Movies (title, year, length, genre).
Basics of the Relational Model Tuples are the rows of a relation. Relations are sets of tuples, not lists of tuples. A set of attributes forms a key for a relation if we do not allow two tuples in a relation instance to have the same values in all the attributes of the key.
Basics of SQL Structured Query Language (SQL) is the standard language for relational database management systems (DBMS). Standards include SQL-92 and SQL-99. SQL is used to describe and manipulate relational databases. Most commercial DBMSs implement something similar, but not identical, to the standard. There are two aspects to SQL: Data-Definition Language (DDL) and Data-Manipulation Language (DML).
Basics of SQL Data Types Character strings of fixed or varying length Bit strings of fixed or varying length Boolean Integer Float or Real Double Precision Date or Time
Basics of SQL CREATE TABLE: CREATE TABLE Movies ( title CHAR(100), year INT, length INT, genre CHAR(10) ); DROP TABLE: DROP TABLE Movies;
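Basics of SQL Declaring Keys A key can be declared together with the attribute: CREATE TABLE Movies ( title CHAR(100) PRIMARY KEY, year INT, length INT, genre CHAR(10) ); or, equivalently, as a separate element of the table definition: CREATE TABLE Movies ( title CHAR(100), year INT, length INT, genre CHAR(10), PRIMARY KEY (title) );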
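The effect of a key declaration can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming SQLite as the DBMS and using a made-up row; other systems report key violations with different error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        title  CHAR(100),
        year   INT,
        length INT,
        genre  CHAR(10),
        PRIMARY KEY (title)
    )
""")

# Hypothetical sample row, for illustration only.
conn.execute("INSERT INTO Movies VALUES ('Movie A', 1990, 100, 'drama')")

try:
    # Same key value again: the DBMS must reject the second tuple.
    conn.execute("INSERT INTO Movies VALUES ('Movie A', 2001, 95, 'comedy')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(duplicate_rejected, row_count)
```

Because `title` is the key, the second insert fails even though its other attributes differ, and the relation still holds a single tuple.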
Basics of SQL Retrieval Query SELECT <attribute list> FROM <table list> WHERE <condition> GROUP BY <attribute list> ORDER BY <attribute list>;
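A retrieval query using each of these clauses might look as follows. This is a sketch run through Python's `sqlite3` module; the four rows are invented for illustration. Note that in standard SQL the GROUP BY clause is written before ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (title CHAR(100) PRIMARY KEY, year INT, length INT, genre CHAR(10))"
)
# Hypothetical sample data, for illustration only.
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", 1990, 100, "drama"),
        ("Movie B", 1995, 120, "drama"),
        ("Movie C", 2001, 90, "comedy"),
        ("Movie D", 1975, 150, "sciFi"),
    ],
)

# Per genre, count the movies made in 1980 or later and average their length.
rows = conn.execute("""
    SELECT genre, COUNT(*) AS n, AVG(length) AS avg_length
    FROM Movies
    WHERE year >= 1980
    GROUP BY genre
    ORDER BY n DESC, genre
""").fetchall()

for genre, n, avg_length in rows:
    print(genre, n, avg_length)
```

WHERE filters individual tuples before grouping, GROUP BY forms one output row per genre, and ORDER BY sorts the grouped result.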
Query Optimization
Physical database file structure Databases are stored physically as files of records. Storage device hierarchy: cache memory, main memory, flash memory, magnetic disks, optical drives, magnetic tapes.
Physical database file structure Data must be in main memory for the DBMS to operate on it. The buffer pool holds pages (or blocks) brought from disk into main memory; each buffered page carries a pin count and a dirty bit.
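How the pin count and dirty bit are used can be sketched with a toy buffer pool. Everything here is an assumption made for illustration: the `BufferPool` class, the dictionary standing in for the disk, and the least-recently-used replacement policy are not taken from the slides, and real systems use more elaborate policies.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: a fixed number of frames, LRU replacement of unpinned pages."""

    def __init__(self, disk, capacity):
        self.disk = disk             # hypothetical "disk": dict mapping page_id -> page data
        self.capacity = capacity     # number of frames in main memory
        self.frames = OrderedDict()  # page_id -> frame, least recently used first

    def pin(self, page_id):
        """Bring the page into memory if needed and mark it as in use."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id], "pin_count": 0, "dirty": False}
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release the page; `dirty` records that it was modified in memory."""
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        """Drop the least recently used unpinned page, writing it back if dirty."""
        for victim_id, frame in self.frames.items():
            if frame["pin_count"] == 0:
                if frame["dirty"]:
                    self.disk[victim_id] = frame["data"]
                del self.frames[victim_id]
                return
        raise RuntimeError("all frames are pinned")

disk = {1: "page-1", 2: "page-2", 3: "page-3"}
pool = BufferPool(disk, capacity=2)

frame = pool.pin(1)
frame["data"] = "page-1-modified"
pool.unpin(1, dirty=True)  # page 1 is now unpinned and dirty
pool.pin(2)                # stays pinned
pool.pin(3)                # pool is full: page 1 is written back and evicted
print(disk[1], sorted(pool.frames))
```

A page with a nonzero pin count is never chosen as a victim, and a dirty page is written to disk before its frame is reused.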
Blocking Factor Records are stored on disk in disk blocks. Given block size B and record size R (fixed-length records that do not span blocks, with B >= R), the blocking factor is bfr = floor(B / R) records per block. Therefore, the unused space in each block is B - (bfr * R).
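The blocking-factor arithmetic can be checked with a few lines of Python. The function names and the sample sizes (512-byte blocks, 100-byte records, 1000 records) are illustrative assumptions; `blocks_needed` adds the usual follow-on calculation, ceil(r / bfr), which the slide does not state.

```python
def blocking_factor(block_size, record_size):
    """bfr = floor(B / R): whole fixed-length records that fit in one block."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record size must be positive and no larger than the block size")
    return block_size // record_size

def unused_space(block_size, record_size):
    """Space left over in each block: B - (bfr * R)."""
    return block_size - blocking_factor(block_size, record_size) * record_size

def blocks_needed(num_records, block_size, record_size):
    """Blocks required to store the file: ceil(num_records / bfr)."""
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)  # ceiling division on integers

# Hypothetical sizes in bytes, for illustration only.
B, R = 512, 100
print(blocking_factor(B, R))      # 5 records per block
print(unused_space(B, R))         # 12 bytes unused per block
print(blocks_needed(1000, B, R))  # 200 blocks for 1000 records
```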
Index Structure An index takes a value for some field(s) and finds the records with the matching value, e.g., to answer SELECT * FROM r WHERE a = 10;
Index Structure Files of Ordered Records (Sorted File), Hash Index, B+ tree Index. B+ tree example (figure): index keys 13, 17, 24, 30; leaf entries 2*, 3*, 5*, 7*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*.
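A B+ tree lookup over the keys in the figure can be sketched as follows. The two-level layout is an assumption: the figure itself is not recoverable from the text, so the sketch supposes that 13, 17, 24, and 30 are the keys of a single root node and groups the leaf entries accordingly. A real B+ tree also handles insertion, deletion, and node splits, none of which is shown here.

```python
from bisect import bisect_left, bisect_right

# Assumed layout: one root node over five leaves.
ROOT_KEYS = [13, 17, 24, 30]
LEAVES = [
    [2, 3, 5, 7],        # keys < 13
    [14, 16],            # 13 <= keys < 17
    [19, 20, 22],        # 17 <= keys < 24
    [24, 27, 29],        # 24 <= keys < 30
    [33, 34, 38, 39],    # keys >= 30
]

def find_leaf(key):
    """Index of the leaf that would contain `key`."""
    return bisect_right(ROOT_KEYS, key)

def search(key):
    """Equality search: descend from the root, then binary-search the leaf."""
    leaf = LEAVES[find_leaf(key)]
    i = bisect_left(leaf, key)
    return i < len(leaf) and leaf[i] == key

def range_search(lo, hi):
    """All keys k with lo <= k <= hi, found by scanning leaves left to right."""
    result = []
    for leaf in LEAVES[find_leaf(lo):]:
        for k in leaf:
            if k > hi:
                return result
            if k >= lo:
                result.append(k)
    return result

print(search(24), search(13))
print(range_search(15, 25))
```

Key 13 appears only as a separator in the root, so an equality search for it fails; the range search relies on the leaves being in sorted order, which is what makes B+ trees suitable for range queries.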
What is next?
What is next? Temporal Database It represents events that occur in time. Spatial Database It stores and queries data that represents objects defined in a geometric space. Graph Database
Temporal Database Find the employees who worked at the same company during all the time that James worked at FAU. Find all vehicles that are in service in Boca throughout the first hour of the afternoon.
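The first temporal query amounts to an interval-containment test. The sketch below runs it in a conventional relational system through Python's `sqlite3` module, under several assumptions: the `Employment` table, its year-granularity columns, and all rows are hypothetical, and each person is assumed to have one contiguous employment interval per company. A temporal database would offer built-in support for such predicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one closed interval [start_year, end_year] per person and company.
conn.execute("CREATE TABLE Employment (name TEXT, company TEXT, start_year INT, end_year INT)")
conn.executemany(
    "INSERT INTO Employment VALUES (?, ?, ?, ?)",
    [
        ("James", "FAU", 2005, 2010),
        ("Ann", "FAU", 2000, 2015),       # covers James's whole interval
        ("Bob", "FAU", 2007, 2012),       # started after James did
        ("Cara", "FAU", 2005, 2010),      # exactly the same interval
        ("Dan", "Other Co", 2000, 2020),  # different company
    ],
)

# An employee qualifies if their interval at FAU contains James's interval at FAU.
rows = conn.execute("""
    SELECT e.name
    FROM Employment AS e
    JOIN Employment AS j ON j.company = e.company
    WHERE j.name = 'James' AND j.company = 'FAU'
      AND e.name <> 'James'
      AND e.start_year <= j.start_year
      AND e.end_year >= j.end_year
    ORDER BY e.name
""").fetchall()

names = [name for (name,) in rows]
print(names)
```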
Spatial Database Find the names of all countries which are neighbors of the United States. List the length of the rivers in each of the countries they pass through. List all countries, ordered by number of neighboring countries.
Spatial Database Florida Flood Zone Map
Spatial Database Homeland Defense & Evacuation Planning Preparation of response to a chem-bio attack Plan evacuation routes and schedules Help public officials to make important decisions Guide affected population to safety Base Map Weather Data Plume Dispersion Demographics Information Transportation Networks ( Images from www.fortune.com )
Graph Database Find the shortest route from FAU to PBI. Find the nearest gas station. Find the evacuation routes in a building.
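A shortest-route query such as "FAU to PBI" is commonly answered with Dijkstra's algorithm, sketched below. The road network is invented for illustration: the intermediate nodes `A` and `B` and all edge weights are hypothetical, and the slides do not say which algorithm a given graph database uses.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative edge weights.

    Returns (distance, path), or (None, []) if the target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter route to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, []
    # Walk the predecessor links back from the target to recover the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Hypothetical road network: node names A and B and all weights are made up.
ROADS = {
    "FAU": {"A": 4, "B": 2},
    "B": {"A": 1, "PBI": 10},
    "A": {"PBI": 5},
    "PBI": {},
}

distance, route = shortest_path(ROADS, "FAU", "PBI")
print(distance, route)
```

The same routine can serve a "nearest gas station" query by stopping at the first station popped from the queue, since nodes leave the queue in order of increasing distance.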
Graph Database Applications (figure): dynamic social networks, airline flight schedules, power distribution, spatio-temporal road networks. Topics shown: emerging trends, evolving networks, propagation models, dynamic migration, price, routes, schedules, peak load, cascading blackout, routing planning, shortest path.
Graph Database Connected Vehicles Gartner reported that about one in five vehicles on the road worldwide will have some form of wireless network connection by 2020, amounting to more than 250 million connected vehicles [1]. Connected vehicles have the potential to transform the way Americans travel through the creation of data within a network that includes cars, buses, trucks, trains, traffic signals, cell phones, and other devices. Examples: Apple CarPlay; Waze (crowd-sourced data); Connected Vehicle, Next Generation ITS (US DOT); connected taxi service.
Take Home Message Introduction to Databases History of Database Relational Databases Future Generation of Databases