Persistence Database Management Systems

Persistence Database Management Systems
Lecture 15 Persistence Database Management Systems

Resources so far We studied:
How memory is managed within the virtual memory space of a process. How messages can be exchanged between processes, using inter-process communication

Storage resource Storage remains unchanged beyond the end of the process execution. RTE provides services to processes to support persistent data storage. Persistent: the data survives after the process with which it was created has ended.

Persistent storage Can be shared among several processes We will discuss two types of persistent storage services: The file system Database services.

File System (FS) Services
The file system provides the following interface to processes: Map names to storage locations: Storage location=storage on disk. FS maps names ("/usr/local/file.ext") to concrete locations on disk. Interface through system calls: open(name), create(name), delete(name); queries getfiles(name-pattern). FS organizes space of file names in hierarchical structure of embedded directories.

File System Services Access through streams:
Stream interface to read and write a file as a stream of bytes. FS Allows to seek within the file, read its content and write new content. Allocation of space / automatic increase storage space (in writes).

File System (FS) Services (interface)
Locking files: Allows multiple processes to synchronize access on shared files. One can lock a whole file or just a segment of bytes within the content of a file.

File Systems FS provides abstract interface of system calls (open, read, write, seek etc) implemented by specific drivers which know how to interact with specific hardware devices. Distributed FS: process can request access to files stored on a remote machine: Local FS in RTE passes the system calls (open, read, etc) to FS-server on server Distributed file systems (for example, NFS and SAMBA) have complex locking mechanisms.

Database Management System (DBMS)
Another way to store persistent data. DBMS is present as a service. Processes connect through inter-process communication protocol (over TCP). DBMS manages storage, often by accessing FS on its side.

File systems vs. DBMS - differences
Data Model FS is very simple: a stream of bytes with the open/close, read/write and lock operation. DBMS defines a rich data model, manages serialization (encoding/decoding). Programmer must define the specific data model for the data he is interested in storing.

File systems vs. DBMS Data Independence –
DBMSs provide an abstract interface for data storage/access. Programmer does not need to define in which file, at which offset data is to be stored. One can query the DBMS by content (e.g., "retrieve all content that satisfies certain criteria").

File systems vs. DBMS Efficient concurrency DBMSs are built to support thousands of concurrent users. Ensure data is kept consistent Provide efficient management of such high concurrency.

Reduced application development time
File systems vs. DBMS Reduced application development time DBMSs allow data manipulation using a simple API. DBMS manages storage, query optimization, concurrency and integrity management.

DBMS Services Security management:
Verify that users are authorized to access data. Specific rights can be granted for each part of the data and each group of users.

DBMS Services Session and Transaction management:
Users can request the execution of data transactions. Transaction = complex data operation - read and modify many different objects Viewed from the outside as a single atomic operation- completely succeeds or not performed.

DBMS Services Query optimization and execution:
clients interact with DBMS by complex queries DBMS execute queries most efficiently Powerful query compilation: relational database servers. “Relational” == “tabular”.

Back-end services File and Access methods. Buffer management.
not directly visible to user: File and Access methods. Buffer management. Disk space management.

Transaction: Unit of work performed against a database management system Treated in a coherent and reliable way independently of other transactions. A transaction is atomic, consistent, isolated, durable (ACID). Composed of independent units of work, each reading and/or writing information to DB

Transaction Provide "all-or-nothing" - work units must complete or take no effect whatsoever. Must be isolated from other transactions Results must agree with existing constraints in database. If completed successfully transactions must be committed to the storage.

Transaction pattern A transaction is usually issued to the DB in a language like SQL, using a pattern similar to the following: Begin transaction. Execute data manipulations and queries. If no errors occur then commit transaction. If errors occur rollback the transaction.

Transactions resources
To implement transactions, the RTE employs the following logic: Identify resources that will be accessed during the transaction. SQL: full tables or parts of tables (rows). Acquire locks on all resources before transaction starts.

Transactions safety Perform transaction.
Modifications performed on snapshot in memory. isolation: data in transaction is not affected by other threads. intermediate modifications not committed are not visible to other threads.

Transactions safety Transaction committed: modifications are merged to persistent storage If transaction is rollbacked: modifications are deleted and the storage is not modified.

Data Models Data model = which data values can be sent for storage
Example: Object Oriented method: Primitive data types (int, bool, char etc) Complex data types References, methods etc. Dominates the programming world. DBMS - relational model – dominates DB world.

Relational Model "Relation" = "table", "relational" = "based on tables“… Logical representation of information. Database normalization - is the process of organizing the columns (attributes) and tables (relations) of a DB to reduce data redundancy and improve data integrity.

Case Study: A Data Model for Colleges
Data model of an academic college. Different departments, each specializing in teaching a certain domain. Computer Science / Physics department Department has a department head and a geographical location. Students may be enrolled in one department and registered to courses given by it.

building blocks Relation - set of records of same fields. header of a table Our example: students, departments, etc. Attribute (columns) - field within a relation (data type, name) Our example: student name, department head,

building blocks Domain - restriction of data types
our example: student ID is 9 digits 0-9 each. Records - (or tuple), single row within a relation. Integrity constraints - constraint on attribute with regard to all records our example: Student ID is unique.

Relational Algebra Selection - select specific rows from a relation.
Projection - projects specific columns Set operations - union, intersection, cross- product, set-difference on relations that have the same structure (same attributes) Join - cross product followed by selection and projection. Renaming operation - change name of columns.

Example: select Projection: Selection:

Join example: consider the tables

Cross join

Inner join

Take existing relations as arguments and return new relations as value
Properties Take existing relations as arguments and return new relations as value Columns of tables in DB are restricted to being simple values Columns must be primitive data types No embedded tables or array. Complex data - primary and foreign keys.

SQL (Structured Query Language)
Query language - interface with relational DBMS DDL: Data Definition Language define schemas, relations and domains. DML: Data Manipulation Language - queries, insertions, updates and deletions

Client interacts with SQL server by sending SQL queries (TCP protocol)
Receiving as answers a result-set. The client requests the rows of the result as needed (result may be large). Communication protocol between an SQL server and its clients is stateful and session oriented.

Keys subset of attributes of relation which uniquely identify rows.
Example: relation for students with attributes name, birthday, address and Social Security Number (SSN). Name is not a key (two students could have the same name). SSN is a key. Several attributes as a key: courses (course name, department, semester, lecturer) course name is not a key / but the pair (course name, semester) could be.

Database Normalization and Foreign Keys
minimize duplication of information (also known as "Don't Repeat Yourself" (DRY) Duplication = data appear twice in the domain of the problem modeled. Example: relation student (ID, Name, Course) a student is enrolled to 3 courses = 3 records duplicating the ID and Name

Solution: student data will be stored only in a single relation.
Use keys to remove data repetition use a single attribute SSN to refer to a student.

Join Retrieve the information on the student given a tuple from the courses relation, we need to operate a join operation on the SSN attribute (which is shared by the 2 relations) The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

In student relation, SSN is a key.
Foreign key In student relation, SSN is a key. In course relation, StudentSSN is not a key Several courses taken by a single student StudentSSN refers to the key of another table. It is foreign key in the course relation - refers to the student relation.

Data Integrity Constraints
Cannot create a tuple in the course relation that refers to a student that does not exist. New row in course relation: check foreign key exists in student relation. Update SSN field of course relation: check new StudentSSN exists in student relation. Delete row in student relation: verify no rows in course that refer to it Update row in student relation: verify no rows in course that refer to old value.

When defining a data model on a server, user declares which constraints are part of the model:
keys and foreign keys. DBMS server enforces constraints and verifies that insert, update, delete operations – have no conflict with constraints.

indicates lack of knowledge - we do not know what its value is
NULL belongs to all domains indicates lack of knowledge - we do not know what its value is example (1 = NULL) is false and (1 != NULL) is also false! (NULL = NULL) is also false. compare NULL with value, result is always false.

Indexing Representing the records in a relation in a data structure to allow efficient queries with respect to certain attributes. B-Tree / hash-tables query efficient cost time when inserting or deleting data. Implicit indexing - employed by DBMS for attributes defined as unique (keys).

Design Steps

Step A – Identify simple correlation among attributes
A Student has a name and ID. ignore Courses and Department for now - not simple - hold correlation to other attributes as well: Department has a name, location and head. Course has a name.

Step B – Identify unique attributes
Unique attribute – the most important feature. We wish to have at least one attribute in a relation that is unique Primary Key - a unique attribute that must have a value (NULL is not allowed). Student relation: ID = primary key. Courses relation = artificial course number. department ID.

in rare cases, no need to define primary key.
Any relation that should be later queried efficiently must have a primary key. in rare cases, no need to define primary key. example: logs stored in DB, only insertions are performed occasionally based on a time range. no requirement to uniquely identify a single row of a log - so there is no need to define a key on such a table.

Step C – Associate simple relations
associations between tables. adding in a table an attribute that refers to rows in another relation. example, a student is registered to a department (assume only one). add the primary key of the department relation as an attribute in the students relation foreign key - many-to-one relationship

step D – Represent Collections
describe an attribute as a collection. our example: courses to which a student is registered is a collection of values. model association - create table that correlates courses and students with foreign keys. Student2course: a relation with courseId as foreign key to courses.ID / studentId as a foreign key to Students.ID. model many-to-many associations - called "cross tables”

The College model Tables: departments, students, courses, student2course departments: ID integer as a PRIMARY KEY. Name short string that must be UNIQUE, Building integer students: ID integer as a PRIMARY KEY, Name short string, DepID integer as a FOREIGN KEY to departments.ID

courses: ID integer as a PRIMARY KEY, Name short string that must not be NULL
student2course: SID integer as a FOREIGN KEY to students.ID , CID integer as a FOREIGN KEY to courses.ID and Grade integer.

We add an explicit index on Students. Name and on Courses
We add an explicit index on Students.Name and on Courses.Name as we know many queries will be performed on those attributes

Persistence Database Management Systems

Similar presentations

Presentation on theme: "Persistence Database Management Systems"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Persistence Database Management Systems

Similar presentations

Presentation on theme: "Persistence Database Management Systems"— Presentation transcript:

Similar presentations

About project

Feedback