Presentation is loading. Please wait.

Presentation is loading. Please wait.

CAS CS 460 Introduction to Database Systems Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture.

Similar presentations


Presentation on theme: "CAS CS 460 Introduction to Database Systems Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture."— Presentation transcript:

1 CAS CS 460 Introduction to Database Systems Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture materials

2 1.2 About the course – Administrivia Instructor:  Ravi Kothuri, rkothuri@cs.bu.edurkothuri@cs.bu.edu Office, Hours: MCS 147, Mon/Wed 5-6PM and 7:30-8PM Teaching Fellow:  Panagiotis Papapetrou, cs460tf@cs.bu.edu cs460tf@cs.bu.edu MCS 147, Tue/Thu 11 - 12:30 AM Home Page:  http://www.cs.bu.edu/rkothuri Check frequently! Syllabus, schedule, assignments, announcements…

3 1.3 Grading Homeworks: 20%  4-5 assignments Midterm 20% Final 30% Projects 30%  5-6 parts

4 1.4 My Background Oracle Corporation PhD from University of California, Santa Barbara Research:  Multi-dimensional indexing  Mobile Databases  Spatial, GIS systems and CAD/CAM databases  Google Maps type of technologies for Enterprise  Geometric algorithms for terrain management, city modeling,…  Data Mining (spatial, financial, …)  RFID technologies  Semantic –web (RDF) technologies Book: “Pro Oracle Spatial”, Nov 2004 Teaching on invitation from Prof. George Kollios

5 1.5 Who uses Databases? Universities (records for students, faculty, courses,… Airlines (passengers, flights, luggage, …) Banking (customers, loans, …) Utilities (customers, usage history, bills); e.g. telecom, electric,.. Any Company: human resources  Employees, depts, facilities,… “Data is the primary and integral part of information industry. Proper management of the data using database technology is essential for any large-scale company, organization.’’

6 1.6 What is a Database System? Database: A very large collection of related data Models a real world enterprise:  Entities (e.g., teams, games / students, courses)  Relationships (e.g., The Patriots are playing in the Superbowl)  Even active components (e.g. “business logic”) DBMS: A software package/system that can be used to store, manage and retrieve data form databases Database System: DBMS+data (+ applications)

7 1.7 Why Study Databases?? Shift from computation to information  Always true for corporate computing  More and more true in the scientific world  and of course, Web DBMS encompasses much of CS in a practical discipline  OS, languages, theory, AI, logic

8 1.8 Managing data: A naïve approach Why not store everything on flat files: use the file system of the OS, cheap/simple… Name, Course, Grade John Smith, CS112, B Mike Stonebraker, CS234, A Jim Gray, CS560, A John Smith, CS560, B+ ………………… Yes, but not scalable…  Filesize limitations, access/update performance is slow,..

9 1.9 Problem 1 Data redundancy and inconsistency  Multiple file formats, duplication of information in different files (say, in different departments) John Smith, js@cs.bu.edu, CS112, Bjs@cs.bu.edu John Smith, Arts560, js@cs.bu.edu, B+js@cs.bu.edu Smith J, js@cs.bu.edu, Math212, Ajs@cs.bu.edu Why is this a problem?  Wasted space  Potential inconsistencies (multiple formats, John Smith vs Smith J.)

10 1.10 Problem 2 Data retrieval:  Find the students who took CS560  Find the students with GPA > 3.5 For every query we need to write a program! Need a Query/Retrieval engine that can support different ways to access data  Easy to write  Execute efficiently

11 1.11 Problem 3 Data Integrity  No support for sharing:  Prevent simultaneous modifications  No coping mechanisms for system crashes  No means of Preventing Data Entry Errors (checks must be hard-coded in the programs)  Security problems

12 1.12 Database Systems Database systems offer solutions to all the mentioned problems Database systems:  Support Modeling of the data  Provide Levels of Abstraction of the data  Provide programs to allow you to Retrieve/modify the data  SQL For easy, standard specification of queries  Query Optimizer To process your queries efficiently  Ensure Integrity Maintenance  Transaction Manager/Recovery Manager to ensure atomicity/integrity in concurrent transactions to ensure integrity after system crashes)  …………….

13 1.13 Database Systems Data Modeling Levels of Abstraction Data Retrieval Data Modification/Integrity Maintenance

14 1.14 Data Model A framework for describing  data  data relationships  data semantics  data constraints Entity-Relationship model (Ch. 6)  A set of entities to model real-world objects  Relationships among entities Relational model  Data as a set (or sets) of “records” or “tuples”  Each tuple in the set has the same set of attributes Other models:  object-oriented model: inheritance, abstraction,…  semi-structured data models, XML: tuples in a set can have different attributes

15 1.15 Entity-Relationship Model Example of schema in the entity-relationship model

16 1.16 Entity Relationship Model (Cont.) E-R model of real world  Entities (objects)  E.g. customers, accounts, bank branch  Relationships between entities  E.g. Account A-101 is held by customer Johnson  Relationship set depositor associates customers with accounts Widely used for database design  Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing

17 1.17 Relational Model Example of tabular data in the relational model customer- name Customer-id customer- street customer- city account- number Johnson Smith Johnson Jones Smith 192-83-7465 019-28-3746 192-83-7465 321-12-3123 019-28-3746 Alma North Alma Main North Palo Alto Rye Palo Alto Harrison Rye A-101 A-215 A-201 A-217 A-201 Attributes

18 1.18 Database Systems Data Modeling Levels of Abstraction Data Retrieval Data Modification/Integrity Maintenance

19 1.19 Levels of Abstraction Data storage Involves Complex data structures Hide complexity from users  Abstract views of the data (e.g., for storing a customer record)  Physical level: how a customer record is stored as bytes/words on disk Mostly hidden from database users/programmers  Logical level: describes “types” inside the database type customer = record name : string; street : string; city : integer; ssn; integer; end;  View level: application programs hide details of data types. Views can also hide information (e.g., ssn) for security purposes.

20 1.20 View of Data A logical architecture for a database system

21 1.21 Physical Level: Data Organization Data Storage (Ch 11) Where can data be stored?  Main memory  Secondary memory (hard disks)  Optical store  Tertiary store (tapes) Move data? Determined by buffer manager Mapping data to files? Determined by file manager

22 1.22 Database Architecture (physical level data organization) DBA DDL Interpreter Buffer Manager File Manager Data Schema DDL Commands Metadata Storage Manager Secondary Storage

23 1.23 Database Systems Data Modeling Levels of Abstraction Data Retrieval Data Modification/Integrity Maintenance

24 1.24 Data retrieval Queries (Ch 3, 4) Query = Declarative data retrieval describes what data, not how to retrieve it Ex. Give me the students with GPA > 3.5 vs Scan the student file and retrieve the records with gpa>3.5 Why? 1.Easier to write 2.Efficient to execute

25 1.25 Data retrieval Query Optimizer “compiler” for queries (aka “DML Compiler”) Plan ~ Assembly Language Program Optimizer Does Better With Declarative Queries: 1. Algorithmic Query (e.g., in C)  1 Plan to choose from 2. Declarative Query (e.g., in SQL)  n Plans to choose from Query Optimizer Query Evaluator Query Plan Data Query Processor

26 1.26 Specifying the Query using SQL SQL: widely used (declarative) non-procedural language  E.g. find the name of the customer with customer-id 192-83-7465 select customer.customer-name from customer where customer.customer-id = ‘192-83-7465’  E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465 select account.balance from depositor, account where depositor.customer-id = ‘192-83-7465’ and depositor.account-number = account.account-number Procedural languages: C++, Java, relational algebra

27 1.27 Data retrieval: Indexing (Ch 12) How to answer fast the query: “Find the student with SID = 101”? One approach is to scan the student table, check every student, retrurn the one with id=101… very slow for large databases Any better idea? 1 st keep student record over the SID. Do a binary search…. Updates… 2 nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the same time keep the records sorted! In databases we use the B+-tree (multiway search tree) 3 rd Use a hash table. Much faster for exact match queries… but cannot support Range queries. (Also, special hashing schemes are needed for dynamic data)

28 1.28 Root B+Tree ExampleB=4 120 150 180 30 100 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200

29 1.29 Database Architecture (data retrieval) DBA Query Optimizer DDL Interpreter Query Evaluator Buffer Manager File Manager Data Schema DDL Commands User Query DB Programmer DML Precompiler Code w/ embedded queries Statistics Indices Metadata Query Processor Storage Manager Secondary Storage

30 1.30 Database Systems Data Modeling Levels of Abstraction Data Retrieval Data Modification/Integrity Maintenance

31 1.31 Data Integrity Transaction processing (Ch 15, 16) Why Concurrent Access to Data must be Managed? John and Jane withdraw $50 and $100 from a common account… Initial balance $300. Final balance=? It depends… John: 1. get balance 2. if balance > $50 3. balance = balance - $50 4. update balance Jane: 1. get balance 2. if balance > $100 3. balance = balance - $100 4. update balance

32 1.32 Data Integrity Recovery (Ch 17) Transfer $50 from account A ($100) to account B ($200) 1. get balance for A 2. If balanceA > $50 3. balance A = balance A – 50 4.Update balance A in database 5. Get balance for B 6. balance B = balance B + 50 7. Update balance B in database System crashes…. Recovery management

33 1.33 Database Architecture DB Programmer User DBA DML Precompiler Query Optimizer DDL Interpreter Query Evaluator Buffer Manager File Manager Data Statistics Indices Schema DDL Commands Query Code w/ embedded queries Transaction Manager Recovery Manager Metadata Integrity Constraints Secondary Storage Storage Manager Query Processor

34 1.34 Outline 1 st half of the course: application-oriented  How to develop database applications: User + DBA 2 nd part of the course: system-oriented  Learn the internals of a relational DBMS (Oracle..)  Last few lectures on Oracle-specific features such as XDB….


Download ppt "CAS CS 460 Introduction to Database Systems Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture."

Similar presentations


Ads by Google