Fall 2002 CSE330/CIS550: Introduction to Database Management Systems. Prof. Susan Davidson. Office: 278 Moore. Office hours: TTh 10-11.


2 Administrative Stuff
- What you should know to take this class.
- Handouts: Syllabus and Homework 1.
- Resources: text, TAs, Web site, bulletin board, and office hours.
- Coursework: homeworks, exams, project.
- Computer accounts.

3 What the subject is about
- Modeling and organization of data
- Efficient (expressive?) retrieval of data
- Reliable and consistent storage of data
Not surprisingly, all these topics are interrelated.

4 What is a DBMS?
A database (DB) is a large, integrated collection of data. A DB models a real-world enterprise. A database management system (DBMS) is a software package designed to store and manage databases.

5 Why study databases?
Everybody needs them, i.e. $$$. There are lots of interesting problems, both in database research and in implementation. Good design is always a challenge.

6 Connection to other areas of CS…
- Programming languages and software engineering (obviously)
- Algorithms (obviously)
- Logic, discrete math, and theory of computation
- “Systems” issues: concurrency, operating systems, file organization, and networks

7 But 80% of the world’s data is not in a DB!
Examples:
- scientific data (large images, complex programs that analyze the data)
- personal data
- WWW

8 Why don't we “program up” databases when we need them?
For simple and small databases this is often the best solution. Flat files and grep get us a long way. We run into problems when:
- The structure is complicated (more than a simple table)
- The database gets large
- Many people want to use it simultaneously

9 Example: Personal Calendar
We might start by building a file with the following structure:

What    Day    When  Who          Where
Lunch   10/24  1pm   Rick         Joe’s Diner
CS123   10/25  9am   Dr. Egghead  Morris234
Biking  10/26  9am   Jane         Jane’s house
Dinner  10/26  6PM   Jane         Café Le Boeuf

This text file is easy to deal with. So there's no need for a DBMS!
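As a rough illustration of how far a flat file gets us, the calendar can be read and searched in a few lines of Python. This is only a sketch: the tab-separated layout and the names `CALENDAR_TEXT`, `load_calendar`, and `find_by_day` are assumptions made for the example, not something the slides specify.

```python
import csv
import io

# Assumed layout: one appointment per line, tab-separated, with a header row
# matching the column names on the slide.
CALENDAR_TEXT = (
    "What\tDay\tWhen\tWho\tWhere\n"
    "Lunch\t10/24\t1pm\tRick\tJoe's Diner\n"
    "CS123\t10/25\t9am\tDr. Egghead\tMorris234\n"
    "Biking\t10/26\t9am\tJane\tJane's house\n"
    "Dinner\t10/26\t6PM\tJane\tCafé Le Boeuf\n"
)


def load_calendar(text):
    """Parse the flat file into a list of dicts, one per appointment."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_by_day(entries, day):
    """Linear scan over every entry, much like grepping the file for a date."""
    return [e for e in entries if e["Day"] == day]


entries = load_calendar(CALENDAR_TEXT)
for e in find_by_day(entries, "10/26"):
    print(e["What"], e["When"], e["Who"], e["Where"])
```

Every lookup here reads the whole file, which is fine for a handful of rows and is exactly what stops scaling in the problems that follow.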

10 Problem 1: Data Organization
Consider the all-important “who” field. Do we also want to keep e-mail addresses, telephone numbers, etc.? Expand our file to look like:

What  When  Who-name  Who-email  Who-tel  …  Where  …

Now we are keeping our address book in our calendar and doing so redundantly.

11 “Link” Calendar with Address Book?
Two conceptual “entities” -- contact information and calendar -- with a relationship between them, linking people in the calendar to their contact information. This link could be based on something as simple as the person's name.
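One way to picture the two entities and the name-based link is as two tables joined on the person's name. The sketch below uses Python's built-in `sqlite3` module; the table names `Contact` and `Appointment` and the e-mail addresses and telephone numbers are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each person's contact details are stored once, keyed by name.
conn.execute("CREATE TABLE Contact (Name TEXT PRIMARY KEY, Email TEXT, Tel TEXT)")
# "When" and "Where" are SQL keywords, so they are quoted as identifiers.
conn.execute(
    'CREATE TABLE Appointment (What TEXT, Day TEXT, "When" TEXT, Who TEXT, "Where" TEXT)'
)

# Hypothetical contact data, invented for this example.
conn.executemany(
    "INSERT INTO Contact VALUES (?, ?, ?)",
    [
        ("Jane", "jane@example.com", "555-0100"),
        ("Rick", "rick@example.com", "555-0101"),
    ],
)
conn.executemany(
    "INSERT INTO Appointment VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
        ("Biking", "10/26", "9am", "Jane", "Jane's house"),
        ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
    ],
)

# The relationship: the Who column of an appointment matches a contact's Name.
rows = conn.execute(
    """
    SELECT a.What, a.Day, c.Email
    FROM Appointment AS a
    JOIN Contact AS c ON a.Who = c.Name
    WHERE a.Day = '10/26'
    ORDER BY a.What
    """
).fetchall()
print(rows)
```

Jane's e-mail address now lives in one place even though she appears in two appointments, which removes the redundancy of the expanded flat file.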

12 Problem 2: Efficiency
A personal address book probably has fewer than one hundred entries, but there are still things we'd like to do quickly and efficiently:
- “Give me all appointments on 10/28”
- “When am I next meeting Jim?”
We want to “program” these as quickly as possible, and we want these programs to execute efficiently. What would happen if you were using a corporate calendar with hundreds of thousands of entries?
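The two requests above can be answered without scanning every entry if we maintain lookup structures keyed by day and by person. The following is a minimal sketch in plain Python; the tuple layout, the `(month, day)` date encoding, and the function names are assumptions for the example. A DBMS builds and maintains comparable structures (indexes) on our behalf.

```python
from collections import defaultdict

# Appointments as (what, (month, day), when, who); dates are tuples so they sort.
appointments = [
    ("Lunch", (10, 24), "1pm", "Rick"),
    ("CS123", (10, 25), "9am", "Dr. Egghead"),
    ("Biking", (10, 26), "9am", "Jane"),
    ("Dinner", (10, 26), "6PM", "Jane"),
]

# Build the two "indexes" once, up front.
by_day = defaultdict(list)
by_who = defaultdict(list)
for appt in appointments:
    what, day, when, who = appt
    by_day[day].append(appt)
    by_who[who].append(appt)


def lookup_day(day):
    """All appointments on a given day, found by key rather than by scanning."""
    return by_day.get(day, [])


def next_meeting_with(who, today):
    """Earliest appointment with `who` on or after `today`, or None."""
    upcoming = [a for a in by_who.get(who, []) if a[1] >= today]
    return min(upcoming, key=lambda a: a[1]) if upcoming else None


print(lookup_day((10, 26)))
print(next_meeting_with("Jane", (10, 25)))
```

The cost is that every insert, update, or delete must now keep both dictionaries consistent with the list, which is bookkeeping we would rather not write by hand for a large shared calendar.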

13 Problem 3: Concurrency and Reliability
Suppose other people are allowed to access your calendar and to modify it. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess? Suppose the system crashes while we are changing the calendar. How do we recover our work?

14 Transactions
Key concept for concurrency is that of a transaction: an atomic sequence of database actions (read/write) on data items (e.g. calendar entry). Key concept for recoverability is that of a log: keeping track of all actions carried out by the db. Sounds like operating systems all over again!
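Atomicity can be seen in a small experiment: rescheduling an appointment as a delete followed by an insert, with a simulated failure in between. This sketch uses Python's `sqlite3` module, whose connection object can be used as a context manager that commits on success and rolls back if an exception escapes. The `reschedule` function and its `fail` flag are hypothetical, added only to stage the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (What TEXT, Day TEXT, Who TEXT)")
conn.execute("INSERT INTO Calendar VALUES ('Dinner', '10/26', 'Jane')")
conn.commit()


def reschedule(conn, what, new_day, fail=False):
    """Move an appointment as one atomic unit: delete the old row, insert the new."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            row = conn.execute(
                "SELECT Who FROM Calendar WHERE What = ?", (what,)
            ).fetchone()
            conn.execute("DELETE FROM Calendar WHERE What = ?", (what,))
            if fail:
                # Stand-in for a crash between the two writes.
                raise RuntimeError("simulated failure between delete and insert")
            conn.execute(
                "INSERT INTO Calendar VALUES (?, ?, ?)", (what, new_day, row[0])
            )
        return True
    except RuntimeError:
        return False


failed_ok = reschedule(conn, "Dinner", "10/27", fail=True)
after_failure = conn.execute("SELECT * FROM Calendar").fetchall()

succeeded_ok = reschedule(conn, "Dinner", "10/27")
after_success = conn.execute("SELECT * FROM Calendar").fetchall()

print(failed_ok, after_failure)
print(succeeded_ok, after_success)
```

After the failed attempt the delete has been undone, so the appointment is neither lost nor duplicated; only the successful attempt changes the stored day.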

15 Database architecture -- the traditional view
It is common to describe databases in two ways:
- The logical structure. What users see. The program or query language interface.
- The physical structure. How files are organized. What indexing mechanisms are used.
Further, it is traditional to split the logical level into two components: overall database design (conceptual) and the views that various users get to see.

16 Three-level architecture
[Diagram, top to bottom:]
- View 1, View 2, …, View N
- Conceptual Level (Schema)
- Physical Level (file organization, indexing)
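The three levels map loosely onto ordinary SQL objects: a table definition for the conceptual schema, `CREATE VIEW` for what individual users see, and an index as a physical-level choice. The sketch below, using Python's `sqlite3`, is one possible illustration; the view names `BusyTimes` and `JaneSchedule` and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the schema of the whole calendar.
conn.execute(
    'CREATE TABLE Calendar (What TEXT, Day TEXT, "When" TEXT, Who TEXT, "Where" TEXT)'
)
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
        ("CS123", "10/25", "9am", "Dr. Egghead", "Morris234"),
        ("Biking", "10/26", "9am", "Jane", "Jane's house"),
        ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
    ],
)

# View level: different users see different slices of the same data.
conn.execute('CREATE VIEW BusyTimes AS SELECT Day, "When" FROM Calendar')
conn.execute(
    'CREATE VIEW JaneSchedule AS '
    'SELECT What, Day, "When", "Where" FROM Calendar WHERE Who = \'Jane\''
)

# Physical level: an index changes how rows are found, not what a query returns.
conn.execute("CREATE INDEX idx_calendar_day ON Calendar (Day)")

busy = conn.execute("SELECT * FROM BusyTimes").fetchall()
jane = conn.execute("SELECT What FROM JaneSchedule ORDER BY What").fetchall()
print(busy)
print(jane)
```

A colleague given only `BusyTimes` can see when you are occupied without learning with whom or where.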

17 Data independence
A user of a relational database system should be able to use SQL to query the database without knowing precisely how the data is stored, e.g.

SELECT When, Where FROM Calendar WHERE Who = 'Bill'

After all, you don't worry much about how numbers are stored when you program some arithmetic or use a computer-based calculator.

18 More on data independence
Logical data independence protects the user from changes in the logical structure of the data -- could completely reorganize the calendar “schema” without changing how I query it. Physical data independence protects the user from changes in the physical structure of data: could add an index on Who without changing how the user would write the query, but the query would execute faster (query optimization).
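Physical data independence can be observed directly: run the same query text before and after adding an index on Who, and compare both the answers and the access plan the system reports. This is a sketch using Python's `sqlite3` and SQLite's `EXPLAIN QUERY PLAN`; the appointment with Bill and the index name `idx_calendar_who` are hypothetical, and When and Where are quoted because they are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Calendar (What TEXT, Day TEXT, "When" TEXT, Who TEXT, "Where" TEXT)'
)
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
        ("Biking", "10/26", "9am", "Jane", "Jane's house"),
        # Hypothetical row so that the query for Bill returns something.
        ("Coffee", "10/27", "3pm", "Bill", "Campus cafe"),
    ],
)

# The user's query; its text never changes in this example.
QUERY = 'SELECT "When", "Where" FROM Calendar WHERE Who = ?'


def run_and_explain(conn):
    """Return the query's rows plus the plan text SQLite reports for it."""
    rows = conn.execute(QUERY, ("Bill",)).fetchall()
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Bill",)).fetchall()
    # The last column of each plan row is the human-readable detail string.
    plan = " ".join(r[-1] for r in plan_rows)
    return rows, plan


rows_before, plan_before = run_and_explain(conn)

# Physical change only: add an index on Who.
conn.execute("CREATE INDEX idx_calendar_who ON Calendar (Who)")

rows_after, plan_after = run_and_explain(conn)

print(rows_before, "|", plan_before)
print(rows_after, "|", plan_after)
```

The rows are identical in both runs; only the reported plan changes, from scanning the table to searching it through the new index.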

19 That's the traditional view, but...
- Three-level architecture is not always achievable for database programmers. When databases get big, queries must be carefully written to achieve efficiency.
- There are databases over which we have no control. The Web is a giant, disorganized database.
- There are also well-organized databases on the web (e.g., the Movie database) for which the terminology does not quite apply.

20 In this course...
- Study relational databases, their design, how to query, what forms of indices to use.
- Beyond relational algebra: a logical model of data (Datalog), recursion.
- Beyond “first-normal form”: object-oriented databases, how to query, using OO design techniques.
- XML and semi-structured data models

21 What we won’t cover in any depth...
The “technology” of databases:
- details of physical design
- concurrency control
- transaction management
- query optimization
(although a few of these issues will be briefly discussed)

