CS186 Class Wrap-Up R&G Chapters 1-28 Lecture 28.

Administrivia
Final Exam
  – Friday 12/12, 5pm – 8pm, Room 4 LeConte
  – You may have 2 pages of notes, both sides
  – The exam is cumulative
Final Exam Review
  – Tuesday 12/9, 1pm-3pm, 306 Soda Hall
Homework 5
  – Due Monday, 12/8

News
Winter Consulting’s 2003 survey of Largest DBs
  – The largest single database is 29,232 GB!
  – That’s a single database at France Telecom
  – Many companies have TBs of data, but usually spread out among multiple databases, file systems, etc.
In 2001, largest DB was ~10TB

News (cont.): Top Transaction Processing DBs
  1. Land Registry, 18.3 terabytes
  2. BT plc, 11.7 terabytes
  3. United Parcel Service, 9.0 terabytes
  4. Caixa Econômica Federal, 6.9 terabytes
  5. US Patent and Trademark Office, 5.4 terabytes
  6. Verizon Communications, 5.3 terabytes
  7. Bureau of Customs and Border Protection, 4.1 terabytes
  8. Hewlett Packard, 3.2 terabytes
  9. Boeing, 3.1 terabytes
  10. CheckFree Corp, 2.9 terabytes

News (cont.): Top Decision Support DBs
  1. France Telecom, 29.2 terabytes
  2. AT&T, 26.3 terabytes
  3. SBC, 24.8 terabytes
  4. Anonymous, 16.2 terabytes
  5. Amazon.com, 13.0 terabytes
  6. Kmart, 12.6 terabytes
  7. Claria Corp., 12.1 terabytes
  8. HIRA, 11.9 terabytes
  9. FedEx Services, 10.0 terabytes
  10. Vodafone, 9.1 terabytes

Lessons? (from the survey and this course)
DBs are a huge part of business today
Companies have *lots* of data
  – (imagine tuning UPS’s database with 41 billion rows!)
DBs are based on the theory of data modelling, with lots of practical data management on top
  – nice mix of theoretical and practical
In most jobs, it is useful to understand how DBs work

Today
What topics did we cover?
What topics did we *not* cover?

First, what topics did we not cover?
In the book:
  – Chapter 21 – Security and Authorization
  – Chapter 22 – Parallel and Distributed DBs
  – Chapter 23 – Object-Database Systems
  – Chapter 24 – Deductive Databases
  – Chapter 25 – Data Warehousing and Decision Support
  – Chapter 27 – XML Data
  – Chapter 28 – Spatial Data Management
Not in the book:
  – Federated Databases
  ...

And what topics did we cover?
Chapters 1-20, and 26
  Database and Data Model basics (1-3)            4   16%
  Query Languages (4-5)                           4   16%
  Integrating DBs with other systems (6-7)        2    8%
  Storing data in memory and disk (8-9)           2    8%
  Tree and Hash Indexes (10-11)                   2    8%
  Join/Sort cost, Query Optimization (12-15)      3   12%
  Concurrency Control & Recovery (16-18)          5   20%
  Normal Forms, Database Design (19)              2    8%
  Database Tuning (20)                            1    4%
  Data Mining (26)                                1    4%

1. Overview of Database Systems
What is a Database? A Database System?
What are the useful characteristics of DBs?
When should you use a database?
When is the file system better?

2. Database Design/ER Models
Databases support many levels of abstraction
  – possible to design at abstract level in one form, store data in very different form
The E-R Model
  – Useful for design, easier for humans to understand
  – Specify entities, attributes, relationships
  – Possible to convert ER schemas to Relational Schemas

3. The Relational Model
Most common data model for databases
Based on tables: rows and columns
Tables connected using key/foreign keys
Integrity Constraints
  – Domain constraints for field values
  – Referential integrity for keys/foreign keys
  – Other constraints specified by the real world, e.g. 0.0 <= gpa <= 4.0
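
The integrity constraints above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the Students and Enrolled tables, their columns, and the sample rows are assumptions made up for illustration, not taken from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the CHECK clause encodes 0.0 <= gpa <= 4.0.
conn.execute("""
    CREATE TABLE Students (
        sid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        gpa  REAL CHECK (gpa >= 0.0 AND gpa <= 4.0)
    )""")
# Enrolled.sid is a foreign key referencing Students.sid.
conn.execute("""
    CREATE TABLE Enrolled (
        sid INTEGER REFERENCES Students(sid),
        cid TEXT,
        PRIMARY KEY (sid, cid)
    )""")

def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok_student = try_insert("INSERT INTO Students VALUES (?, ?, ?)", (1, "Ann", 3.7))
bad_gpa = try_insert("INSERT INTO Students VALUES (?, ?, ?)", (2, "Bob", 4.5))
ok_enroll = try_insert("INSERT INTO Enrolled VALUES (?, ?)", (1, "CS186"))
bad_fk = try_insert("INSERT INTO Enrolled VALUES (?, ?)", (99, "CS186"))
```

The out-of-range gpa and the enrollment for a nonexistent student are both rejected by the database itself, so no application code has to re-check them.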

4. Relational Algebra and Calculus
Relational algebra
  – Operators that act on sets of tuples
  – σ, π, ×, −, ∪, etc.
  – “procedural”
Relational Calculus
  – Uses first-order logic to describe query result
  – does not describe how to get result, i.e. declarative
  – studied Tuple Relational Calculus, where variables are tuples
    {S | S ∈ Sailors ∧ S.rating > 7}
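
As a rough illustration of operators acting on sets of tuples, the sketch below implements selection and projection over a Python set. The Sailor fields and the three sample rows are assumptions chosen for the example; the helper names select and project are likewise hypothetical.

```python
from collections import namedtuple

# Hypothetical relation schema and instance, for illustration only.
Sailor = namedtuple("Sailor", ["sid", "sname", "rating"])

sailors = {
    Sailor(22, "Dustin", 7),
    Sailor(31, "Lubber", 8),
    Sailor(58, "Rusty", 10),
}

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, fields):
    """Projection: keep only the named fields; the set removes duplicates."""
    return {tuple(getattr(t, f) for f in fields) for t in relation}

# Algebra form of the calculus query {S | S in Sailors and S.rating > 7},
# followed by a projection onto sname.
high_rated = select(sailors, lambda s: s.rating > 7)
names = project(high_rated, ["sname"])
```

The algebra version spells out an order of operations (select, then project), which is the sense in which it is procedural; the calculus expression only states what the result must satisfy.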

5. SQL: Queries, Constraints, Triggers
Data Definition Language (DDL)
  – Create Table
  – Constraints & Triggers
Data Manipulation Language (DML)
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
Set Operations, subqueries, etc.
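
To make the query template concrete, here is a minimal sketch that runs one query with every clause filled in, using Python's built-in sqlite3 module. The Sailors table and its rows are invented for the example, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", 7, 20.0),
        (2, "Bob", 7, 30.0),
        (3, "Cy", 8, 40.0),
        (4, "Di", 9, 25.0),
        (5, "Ed", 9, 35.0),
        (6, "Flo", 3, 50.0),
    ],
)

# WHERE filters rows, GROUP BY forms groups, HAVING filters groups.
rows = conn.execute("""
    SELECT rating, AVG(age) AS avg_age
    FROM Sailors
    WHERE rating > 5
    GROUP BY rating
    HAVING COUNT(*) >= 2
    ORDER BY rating
""").fetchall()
```

Rating 3 is dropped by WHERE before grouping, and rating 8 is dropped by HAVING because its group has only one sailor.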

6. Database Applications
How to access DBs from programs
  – embedded SQL, SQLJ
  – Dynamic APIs: ODBC, JDBC
  – Cursors: a way to iterate over relations
  – Stored procedures in database language
Accessing other programs from databases
  – Extending Postgres with C code

7. Internet Applications
Internet basics: URIs, HTTP stateless protocol
Web data formats: XML, HTML, DTD
Different architectures
  – Single-tier
  – Client-server (thick or thin client)
  – Three-tier architecture
      Web browser/thin client
      App server running business logic
      Database maintaining data

8. Storage and Indexing
Different file organizations
  – Heap Files (unordered)
  – Sorted Files
  – Clustered Files
  – Unclustered Tree
  – Unclustered Hash
Tradeoffs in I/O costs for various operations

9. Storing Data: Disks and Files
Hierarchy of storage
Keeping data in files on disk
  – How to arrange fields into records
  – How to arrange records into pages
  – How to arrange pages into files
Managing disk and memory
  – Buffer management
  – LRU, MRU, Clock, etc.
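
One of the replacement policies named above, LRU, can be sketched in a few lines. This is a simplified model, not a real buffer manager: it ignores pinning and dirty pages, and the class name LRUBufferPool and the read_page callback are assumptions introduced for the example.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy buffer pool that evicts the least recently used page."""

    def __init__(self, num_frames, read_page):
        self.num_frames = num_frames
        self.read_page = read_page    # callable: page_id -> page contents
        self.frames = OrderedDict()   # page_id -> contents, LRU page first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: mark the page as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            # Pool is full: evict the least recently used page.
            self.frames.popitem(last=False)
        self.disk_reads += 1
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]

# Stand-in for a disk read; a real system would fetch the page from a file.
pool = LRUBufferPool(num_frames=3, read_page=lambda pid: f"page-{pid}")
for pid in [1, 2, 3, 1, 4, 2]:
    pool.get_page(pid)
resident = list(pool.frames)
```

With three frames, the second request for page 1 is a hit, page 4 evicts page 2, and the final request for page 2 evicts page 3.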

10. Tree-Structured Indexes
Trees best for range queries, o.k. for equality
ISAM
  – less common, usually best for data that doesn’t change
  – index doesn’t adjust, instead uses overflow pages if leaves fill
B-Trees
  – present in virtually all databases
  – tree adjusts index to stay balanced
  – you should understand these pretty well after Hw4

11. Hash-Based Indexes
Hash indexes best for equality, useless for range queries
Static hashing
  – only good when data doesn’t change
  – uses overflow buckets
Extendible hashing
  – uses directory of buckets; on overflow, double the directory size
  – never needs overflow buckets
Linear hashing
  – no directory, just a number indicating which buckets have split
  – may need overflow buckets, but doesn’t need directory
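
The directory-doubling idea in extendible hashing can be sketched as follows. This is a simplified in-memory model under stated assumptions: it indexes the directory by the low-order bits of Python's hash, stores bare keys with no duplicates, and the class and method names are hypothetical. More than bucket_capacity keys with identical hash values would make it split forever, which a real implementation has to handle.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Use the low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # full bucket: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; each new entry aliases an old bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly significant bit is 1 move to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHash(bucket_capacity=2)
for k in range(16):
    table.insert(k)
distinct_buckets = len({id(b) for b in table.directory})
```

Only the bucket that overflows is split; the directory doubles just when that bucket's local depth already equals the global depth, which is why no overflow buckets are needed in this sketch.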

12. Overview of Query Evaluation
System catalogs – info about all tables
  – includes statistics about field values
Access paths – how to get at tuples
  – file scan, indexes
Query plan – tree of relational operators

13. External Sorting
Database can sort any amount of info, even if it doesn’t fit in memory
Sort runs that fit in memory, then merge sorted runs together
Used in Hw5
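
The two phases described above, sorting memory-sized runs and then merging them, can be sketched like this. It is a minimal model: the runs are Python lists standing in for files on disk, the merge is done in a single pass (which assumes enough buffers for every run), and the function name and the memory_capacity parameter are assumptions for the example.

```python
import heapq

def external_sort(records, memory_capacity):
    """Sort records using runs of at most memory_capacity items.

    Returns the sorted list and the number of runs produced in pass 0.
    """
    runs = []
    run = []
    for rec in records:
        run.append(rec)
        if len(run) == memory_capacity:
            runs.append(sorted(run))  # pass 0: sort one memory-sized run
            run = []
    if run:
        runs.append(sorted(run))      # final, possibly shorter run
    # Merge phase: heapq.merge consumes the sorted runs lazily,
    # holding only one item per run at a time.
    return list(heapq.merge(*runs)), len(runs)

result, num_runs = external_sort([5, 2, 9, 1, 7, 3, 8], memory_capacity=3)
```

With a capacity of three records, the seven inputs form three runs, and the merge only ever compares the current head of each run.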

14. Evaluating Relational Operators
How to implement:
  – Selection
  – Projection
  – Join
Join algorithms:
  – Nested Loops
  – Indexed Nested Loops
  – Sort-Merge Join
  – Hash-Join
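
Two of the join algorithms listed above can be compared in a short sketch. Both work on in-memory lists of tuples rather than pages on disk, so this shows the logic but not the I/O behavior; the function names, the positional key arguments, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, outer_key, inner_key):
    """Compare every outer tuple with every inner tuple."""
    return [(o, i) for o in outer for i in inner if o[outer_key] == i[inner_key]]

def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    return [(b, p) for p in probe for b in table.get(p[probe_key], [])]

# Hypothetical data: (sid, sname) and (sid, bid).
sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]
reserves = [(22, 101), (58, 103), (22, 102)]

nl_result = nested_loops_join(sailors, reserves, 0, 0)
hj_result = hash_join(sailors, reserves, 0, 0)
```

Both produce the same set of matching pairs, in a different order. Nested loops does a comparison for every pair of tuples, while the hash join touches each input once, which is the cost difference the algorithms trade on.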

15. A Typical Relational Query Optimizer
Break query into query blocks
Enumerate possible query plans
Evaluate cost for each, choose cheapest

16. Overview of Transactions
Transactions, unit of atomicity
ACID properties
Anomalies with concurrent execution
Introduction to logging

17. Concurrency Control
Anomalies
Precedence Graphs
Schedule Characteristics
  – Serializable, View Serializable, Conflict Serializable, Recoverable, Avoids Cascading Abort, Strict
Locking approaches: 2PL, strict 2PL
  – dealing with deadlock
  – Hierarchical locking
  – Locking in B-Trees
Non-locking approaches
  – Optimistic CC
  – Timestamp CC
  – Multiversion CC
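
The link between precedence graphs and conflict serializability can be sketched as a small check: build an edge for every pair of conflicting operations, then test the graph for cycles. The schedule encoding as (transaction, action, item) triples and the function names are assumptions made for this example.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, action, item) with action 'R' or 'W'.

    Adds an edge Ti -> Tj when an operation of Ti precedes a conflicting
    operation of Tj (same item, different transactions, at least one write).
    """
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True when the precedence graph has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining transaction.
        sources = {
            n for n in nodes
            if not any(dst == n and src in nodes for src, dst in edges)
        }
        if not sources:
            return False  # every remaining node lies on or behind a cycle
        nodes -= sources
    return True

good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
good_ok = is_conflict_serializable(good)
bad_ok = is_conflict_serializable(bad)
```

In the second schedule T1 reads A before T2 writes it, and T2 writes A before T1 does, so the graph has edges in both directions and no equivalent serial order exists.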

18. Crash Recovery
Effects of Buffer Management on recovery
Write-ahead log
Transaction abort
Checkpointing
ARIES algorithm
  – Analysis phase
  – Redo phase
  – Undo phase

19. Schema Refinement & Normal Forms
Functional dependencies
  – A → B: whenever A is the same, B must be the same
FDs allow us to determine candidate keys, normal forms, qualities of decomposition
Tradeoffs between data replication, dependency preservation
Always must have lossless join decompositions
BCNF has little replication, may need to join to check FDs
3NF may have replication, but can preserve FDs
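
One way FDs help determine keys is through attribute closure: a set of attributes is a superkey when its closure covers the whole relation. The sketch below assumes a hypothetical relation R(A, B, C, D) with the FDs A → B and B → C; the function names are likewise made up for the example.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)

# Hypothetical relation and dependencies, for illustration only.
R = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

closure_a = closure({"A"}, fds)
a_is_key = is_superkey({"A"}, R, fds)
ad_is_key = is_superkey({"A", "D"}, R, fds)
```

A alone determines B and C but not D, so it is not a superkey; adding D gives a set whose closure is all of R.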

20. Physical Database Design and Tuning
Once a DB is running, many changes may improve performance
First need to understand workload
  – What are typical queries? Which queries are most important?
Indexes – what will improve queries
Schema Changes
  – denormalize to reduce joins
  – supernormalize to reduce table size, contention
Rewriting Queries
  – avoid queries that the optimizer will do poorly on

26. Data Mining
What is Data Mining?
Process of Data Mining
Different classes of DM Algorithms
  – Supervised
  – Unsupervised

Summary
Databases are highly important today
DB Design is based on a theoretical foundation
Numerous practical/implementation issues are addressed to make them run efficiently
This course covered enough practical and theoretical material that you can use and understand DBs