Introduction to Relational Databases

Slides:



Advertisements
Similar presentations
IMPLEMENTATION OF INFORMATION RETRIEVAL SYSTEMS VIA RDBMS.
Advertisements

Relational Algebra Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Relational Database. Relational database: a set of relations Relation: made up of 2 parts: − Schema : specifies the name of relations, plus name and type.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
The Relational Model. Introduction Introduced by Ted Codd at IBM Research in 1970 The relational model represents data in the form of table. Main concept.
Relational Algebra Ch. 7.4 – 7.6 John Ortiz. Lecture 4Relational Algebra2 Relational Query Languages  Query languages: allow manipulation and retrieval.
1 Lecture 11: Basic SQL, Integrity constraints
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
SQL Review.
Introduction to Relational Databases Obtained from Portland State University.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 28 Database Systems I The Relational Data Model.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
The Relational Model CS 186, Fall 2006, Lecture 2 R & G, Chap. 3.
SPRING 2004CENG 3521 The Relational Model Chapter 3.
1 Query Languages: How to build or interrogate a relational database Structured Query Language (SQL)
The Relational Model CS 186, Spring 2007, Lecture 2 Cow book Section 1.5, Chapter 3 Mary Roth.
1 Relational Model. 2 Relational Database: Definitions  Relational database: a set of relations  Relation: made up of 2 parts: – Instance : a table,
IST Databases and DBMSs Todd S. Bacastow January 2005.
1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…
The Relational Model These slides are based on the slides of your text book.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3 Modified by Donghui Zhang.
 Relational database: a set of relations.  Relation: made up of 2 parts: › Instance : a table, with rows and columns. #rows = cardinality, #fields =
Database Organization and Design
Instructor: Jinze Liu Fall Basic Components (2) Relational Database Web-Interface Done before mid-term Must-Have Components (2) Security: access.
1 The Relational Model. 2 Why Study the Relational Model? v Most widely used model. – Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. v “Legacy.
FALL 2004CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
1.1 CAS CS 460/660 Relational Model. 1.2 Review E/R Model: Entities, relationships, attributes Cardinalities: 1:1, 1:n, m:1, m:n Keys: superkeys, candidate.
1 Database Systems ( 資料庫系統 ) October 4, 2005 Lecture #3.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
ICS 321 Fall 2011 The Relational Model of Data (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 8/29/20111Lipyeow.
Example: Taxonomy of Organisms Hierarchy of categories: –Kingdom - phylum – class – order – family – genus - species –How would you design a relational.
The University of Akron Dept of Business Technology Computer Information Systems The Relational Model: Concepts 2440: 180 Database Concepts Instructor:
1 Database Systems ( 資料庫系統 ) October 4/5, 2006 Lecture #3.
The Relational Model Content based on Chapter 3 Database Management Systems, (Third Edition), by Raghu Ramakrishnan and Johannes Gehrke. McGraw Hill, 2003.
12/2/2015CPSC , CPSC , Lecture 41 Relational Model.
 CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CMPT 258 Database Systems The Relationship Model (Chapter 3)
1 Databases II (Fall 2009) Professor: Iluju Kiringa SITE 5072.
The Relational Model Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
ICS 421 Spring 2010 Relational Model & Normal Forms Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/19/20101Lipyeow.
CS34311 The Relational Model. cs34312 Why Relational Model? Currently the most widely used Vendors: Oracle, Microsoft, IBM Older models still used IBM’s.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
Database Management Systems.  Instructor: Yrd. Doç. Dr. Cengiz Örencik   Course material.
Faeez, Franz & Syamim.   Database – collection of persistent data  Database Management System (DBMS) – software system that supports creation, population,
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
Advanced Computing Data Bases: Basic concepts Based in part on open access material from: Database Management Systems by Raghu Ramakrishnan and Johannes.
Chapter 3 The Relational Model. Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. “Legacy.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
SQL Basics Review Reviewing what we’ve learned so far…….
Introduction to Databases & SQL Ahmet Sacan. What you’ll need Firefox, SQLite plugin Mirdb and Targetscan databases.
1 CS122A: Introduction to Data Management Lecture #4 (E-R  Relational Translation) Instructor: Chen Li.
CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
 Database – collection of persistent data  Database Management System (DBMS) – software system that supports creation, population, and querying of a.
Chapter 2: Relational Model
CS 186, Fall 2006, Lecture 2 R & G, Chap. 3
Chapter 2: Intro to Relational Model
Relational Algebra Chapter 4, Part A
Introduction to Relational Databases
Translation of ER-diagram into Relational Schema
Lecture#7: Fun with SQL (Part 2)
The Relational Model Relational Data Model
CS 405G: Introduction to Database Systems
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Chapter 2: Intro to Relational Model
Chapter 2: Intro to Relational Model
Presentation transcript:

Introduction to Relational Databases

Introduction Database – collection of persistent data Database Management System (DBMS) – software system that supports creation, population, and querying of a database

Relational Database Relational Database Management System (RDBMS) Consists of a number of tables and single schema (definition of tables and attributes) Students (sid, name, login, age, gpa) Students identifies the table sid, name, login, age, gpa identify attributes sid is primary key

An Example Table Students (sid: string, name: string, login: string, age: integer, gpa: real) sid name login age gpa 50000 Dave dave@cs 19 3.3 53666 Jones jones@cs 18 3.4 53688 Smith smith@ee 3.2 53650 smith@math 3.8 53831 Madayan madayan@music 11 1.8 53832 Guldu guldu@music 12 2.0

Another example: Courses Courses (cid, instructor, quarter, dept) cid instructor quarter dept Carnatic101 Jane Fall 06 Music Reggae203 Bob Summer 06 Topology101 Mary Spring 06 Math History105 Alice History

Keys Primary key – minimal subset of fields that is unique identifier for a tuple sid is primary key for Students cid is primary key for Courses Foreign key –connections between tables Courses (cid, instructor, quarter, dept) Students (sid, name, login, age, gpa) How do we express which students take each course?

Many to many relationships In general, need a new table Enrolled(cid, grade, studid) Studid is foreign key that references sid in Student table Student Foreign key Enrolled sid name login 50000 Dave dave@cs 53666 Jones jones@cs 53688 Smith smith@ee 53650 smith@math 53831 Madayan madayan@music 53832 Guldu guldu@music cid grade studid Carnatic101 C 53831 Reggae203 B 53832 Topology112 A 53650 History 105 53666

Relational Algebra Collection of operators for specifying queries Query describes step-by-step procedure for computing answer (i.e., operational) Each operator accepts one or two relations as input and returns a relation as output Relational algebra expression composed of multiple operators

Basic operators Selection – return rows that meet some condition Projection – return column values Union Cross product Difference Other operators can be defined in terms of basic operators

Example Schema (simplified) Courses (cid, instructor, quarter, dept) Students (sid, name, gpa) Enrolled (cid, grade, studid)

Selection σgpa>3.3(S1) Select students with gpa higher than 3.3 from S1: σgpa>3.3(S1) S1 sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 sid name gpa 53666 Jones 3.4 53650 Smith 3.8

Projection Project name and gpa of all students in S1: name, gpa(S1) Sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 name gpa Dave 3.3 Jones 3.4 Smith 3.2 3.8 Madayan 1.8 Guldu 2.0

Combine Selection and Projection Project name and gpa of students in S1 with gpa higher than 3.3: name,gpa(σgpa>3.3(S1)) Sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 name gpa Jones 3.4 Smith 3.8

Set Operations Union (R U S) Intersection (R ∩ S) All tuples in R or S (or both) R and S must have same number of fields Corresponding fields must have same domains Intersection (R ∩ S) All tuples in both R and S Set difference (R – S) Tuples in R and not S

Set Operations (continued) Cross product or Cartesian product (R x S) All fields in R followed by all fields in S One tuple (r,s) for each pair of tuples r  R, s  S

Example: Intersection sid name gpa 53666 Jones 3.4 53688 Smith 3.2 53700 Tom 3.5 53777 Jerry 2.8 53832 Guldu 2.0 sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 sid name gpa 53666 Jones 3.4 53688 Smith 3.2 53832 Guldu 2.0 S1  S2 =

Joins Combine information from two or more tables Example: students enrolled in courses: S1 S1.sid=E.studidE S1 E Sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 cid grade studid Carnatic101 C 53831 Reggae203 B 53832 Topology112 A 53650 History 105 53666

Joins S1 E Sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53688 Smith 3.2 53650 3.8 53831 Madayan 1.8 53832 Guldu 2.0 cid grade studid Carnatic101 C 53831 Reggae203 B 53832 Topology112 A 53650 History 105 53666 Note that join can also be thought of as cross product and select sid name gpa cid grade studid 53666 Jones 3.4 History105 B 53650 Smith 3.8 Topology112 A 53831 Madayan 1.8 Carnatic101 C 53832 Guldu 2.0 Reggae203

Relational Algebra Summary Algebras are useful to manipulate data types (relations in this case) Set-oriented Brings some clarity to what needs to be done Opportunities for optimization May have different expressions that do same thing We will see examples of algebras for other types of data in this course

Intro to SQL CREATE TABLE SELECT-FROM-WHERE INSERT Create a new table, e.g., students, courses, enrolled SELECT-FROM-WHERE List all CS courses INSERT Add a new student, course, or enroll a student in a course

Create Table CREATE TABLE Enrolled (studid CHAR(20), cid CHAR(20), grade CHAR(20), PRIMARY KEY (studid, cid), FOREIGN KEY (studid) references Students)

Select-From-Where query “Find all students who are under 18” SELECT * FROM Students S WHERE S.age < 18

Queries across multiple tables (joins) “Print the student name and course ID where the student received an ‘A’ in the course” SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid = E.studid AND E.grade = ‘A’

Other SQL features MIN, MAX, AVG COUNT, DISTINCT ORDER BY, GROUP BY Find highest grade in fall database course COUNT, DISTINCT How many students enrolled in CS courses in the fall? ORDER BY, GROUP BY Rank students by their grade in fall database course

Views Virtual table defined on base tables defined by a query Single or multiple tables Security – “hide” certain attributes from users Show students in each course but hide their grades Ease of use – expression that is more intuitively obvious to user Views can be materialized to improve query performance

Views Suppose we often need names of students who got a ‘B’ in some course: CREATE VIEW B_Students(name, sid, course) AS SELECT S.sname, S.sid, E.cid FROM Students S, Enrolled E WHERE S.sid=E.studid and E.grade = ‘B’ name sid course Jones 53666 History105 Guldu 53832 Reggae203

Indexes Idea: speed up access to desired data “Find all students with gpa > 3.3 May need to scan entire table Index consists of a set of entries pointing to locations of each search key

Types of Indexes Clustered vs. Unclustered Primary vs. Secondary Clustered- ordering of data records same as ordering of data entries in the index Unclustered- data records in different order from index Primary vs. Secondary Primary – index on fields that include primary key Secondary – other indexes

Example: Clustered Index Sorted by sid sid name gpa 50000 Dave 3.3 53650 Smith 3.8 53666 Jones 3.4 53688 3.2 53831 Madayan 1.8 53832 Guldu 2.0 50000 53600 53800

Example: Unclustered Index Sorted by sid Index on gpa sid name gpa 50000 Dave 3.3 53650 Smith 3.8 53666 Jones 3.4 53688 3.2 53831 Madayan 1.8 53832 Guldu 2.0 1.8 2.0 3.2 3.3 3.4 3.8

Comments on Indexes Indexes can significantly speed up query execution But inserts more costly May have high storage overhead Need to choose attributes to index wisely! What queries are run most frequently? What queries could benefit most from an index? Preview of things to come: SDSS

Summary: Why are RDBMS useful? Data independence – provides abstract view of the data, without details of storage Efficient data access – uses techniques to store and retrieve data efficiently Reduced application development time – many important functions already supported Centralized data administration Data Integrity and Security Concurrency control and recovery

So, why don’t scientists use them? “I tried to use databases in my project, but they were just too [slow | hard-to-use | expensive | complex] . So I use files”. Gray and Szalay, Where Rubber Meets the Sky: Bridging the Gap Between Databases and Science

Some other limitations of RDBMS Arrays Hierarchical data

Example: Taxonomy of Organisms Hierarchy of categories: Kingdom - phylum – class – order – family – genus - species How would you design a relational schema for this? Animals Chordates Arthropods Vertebrates insects spiders crustaceans birds reptiles mammals