CS 317 - Data Management and Information Processing.


CS 317: Data Management and Information Processing

Logistics
Instructor: Yan Chen
  – Office hours: Th. 2-4pm or by appointment, Rm 330, 1890 Maple Ave.
TA: Yi Qiao
  – Office hours: Fri. 2-4pm, Rm 246, 1890 Maple

Prerequisites
Required: CS110, CS 111 or programming experience
Course Materials
Required: A First Course in Database Systems (2nd Edition), Jeffrey Ullman and Jennifer Widom, Prentice Hall
Recommended: Database Management Systems, Third Edition, Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill, 2002

Grading
Homework (4-5 sets): 20%
Projects: 30%
  – Use Microsoft Access to design a database in two projects.
  – The first project is on the entity-relationship (ER) model.
  – The second project is on relational algebra (RA) and relational calculus (RC).
Final: 25%
  – Exams are in-class, closed-book, non-cumulative
Late policy: 10% each day after the due date
No cheating

Communication
Web page:
Recitation: Tu, Th or Fri? 5-6pm, Room 381, 1890 Maple.
  – TA lectures on the homework and projects, and helps prepare for the exams.
Newsgroup is available
  – cs.317 (course announcements, and posting Q & A)
Send emails to the instructor and TA for questions inappropriate for the newsgroup
Course outline (see it online)

What Is a Database System?
Database: a very large, integrated collection of data.
Models a real-world enterprise
  – Entities (e.g., teams, games)
  – Relationships (e.g., The Forty-Niners are playing in The Superbowl)
  – More recently, also includes active components, often called “business logic” (e.g., the BCS ranking system)
A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases.

Database Systems: Then

Database Systems: Today From Friendster.com on-line tour

Other Ways Databases Make Life Better?
“Players could finally sign up for the Star Wars Galaxies game last week as Sony opened up registration to the public.”
“Once players got in to the game they found that the game servers were offline because of database problems.”
“Some players spent hours tuning their in-game characters only to find that crashes deleted all their hard work.”
Source: BBC News Online, July 1, 2003.

Other databases you may use

Is the WWW a DBMS?
Fairly sophisticated search available
  – crawler indexes pages on the web
  – keyword-based search for pages
But, currently
  – data is mostly unstructured and untyped
  – search only: can’t modify the data; can’t get summaries or complex combinations of data
  – few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
  – Web sites typically have a DBMS in the background to provide these functions.
The picture is changing
  – New standards (e.g., XML, Semantic Web) can help data modeling
  – Research groups (e.g., at Berkeley) are working on providing some of this functionality across multiple web sites.

“Search” vs. Query What if you wanted to find out which actors donated to John Kerry’s presidential campaign? Try “actors donated to john kerry” in your favorite search engine.

A “Database Query” Approach

Is a File System a DBMS?
Thought Experiment 1:
  – You and your project partner are editing the same file.
  – You both save it at the same time.
  – Whose changes survive? A) Yours B) Partner’s C) Both D) Neither E) ???
Thought Experiment 2:
  – You’re updating a file.
  – The power goes out.
  – Which of your changes survive? A) All B) None C) All Since Last Save D) ???
Q: How do you write programs over a subsystem when it promises you only “???”?
A: Very, very carefully!!

Current Commercial Outlook
A major part of the software industry:
  – Oracle, IBM, Microsoft, Sybase
  – also Informix (now IBM), Teradata
  – smaller players: Java-based DBMS, devices, OO, …
Well-known benchmarks (esp. TPC)
Lots of related industries
  – data warehouse, document management, storage, backup, reporting, business intelligence, app integration
Relational products dominant and evolving
  – adapting for extensibility (user-defined types), adding native XML support.
Open Source coming on strong
  – MySQL, PostgreSQL, BerkeleyDB

Why Study Databases??
Shift from computation to information
  – always true for corporate computing
  – Web made this point for personal computing
  – more and more true for scientific computing
Need for DBMS has exploded in the last years
  – Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
  – Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network
DBMS encompasses much of CS in a practical discipline
  – OS, languages, theory, AI, multimedia, logic
  – Yet traditional focus on real-world apps

What’s the intellectual content?
representing information
  – data modeling
languages and systems for querying data
  – complex queries with real semantics*
  – over massive data sets
concurrency control for data manipulation
  – controlling concurrent access
  – ensuring transactional semantics
reliable data storage
  – maintain data semantics even if you pull the plug
* semantics: the meaning or relationship of meanings of a sign or set of signs

Describing Data: Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
  – Main concept: relation, basically a table with rows and columns.
  – Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction
Views describe how users see the data.
Conceptual schema defines logical structure.
Physical schema describes the files and indexes used.
(sometimes called the ANSI/SPARC model)
[Figure: Users / View 1, View 2, View 3 / Conceptual Schema / Physical Schema / DB]

Example: University Database
Conceptual schema:
  – Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Courses(cid: string, cname: string, credits: integer)
  – Enrolled(sid: string, cid: string, grade: string)
External Schema (View):
  – Course_info(cid: string, enrollment: integer)
Physical schema:
  – Relations stored as unordered files.
  – Index on first column of Students.
[Figure: View 1, View 2, View 3 / Conceptual Schema / Physical Schema / DB]
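
The three levels on this slide can be written down concretely. The following is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module (the slides do not name a DBMS). The helper names, the index name `Students_sid_idx`, and the sample rows are hypothetical, and the definition of `Course_info` as a per-course enrollment count is an assumption based on its column names.

```python
import sqlite3


def build_university_db():
    """Create the slide's three schema levels in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conceptual schema: the three relations from the slide.
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- External schema: Course_info(cid, enrollment), assumed to count
        -- the students enrolled in each course.
        CREATE VIEW Course_info AS
            SELECT C.cid AS cid, COUNT(E.sid) AS enrollment
            FROM Courses C LEFT JOIN Enrolled E ON E.cid = C.cid
            GROUP BY C.cid;

        -- Physical schema: an index on the first column of Students.
        CREATE INDEX Students_sid_idx ON Students (sid);
    """)
    return conn


def load_sample_rows(conn):
    """Insert a few made-up rows so the view has something to summarize."""
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                     [("s1", "Ann", "ann@cs", 19, 3.8),
                      ("s2", "Bob", "bob@cs", 20, 3.1)])
    conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                     [("CS317", "Data Management", 3),
                      ("CS340", "Networking", 3)])
    conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                     [("s1", "CS317", "A"), ("s2", "CS317", "B")])
    conn.commit()


def course_info(conn):
    """Read through the external view, never touching the base tables directly."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
```

A user of `course_info` sees only the view; the base relations and the index could change underneath without that user noticing.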

Data Independence
Applications insulated from how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
Q: Why are these particularly important for DBMS?
[Figure: View 1, View 2, View 3 / Conceptual Schema / Physical Schema / DB]
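
One way to see both kinds of independence is to keep an application query fixed while the schema changes beneath it. This is a sketch under assumptions: SQLite via Python's `sqlite3`, a hypothetical view `Student_view`, and a hypothetical split of `Students` into `Student_names` and `Student_grades`. None of these names come from the slides.

```python
import sqlite3

# The "application": it only ever reads the view.
APP_QUERY = "SELECT name, gpa FROM Student_view ORDER BY name"


def run_app(conn):
    return conn.execute(APP_QUERY).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        INSERT INTO Students VALUES ('s1', 'Ann', 'ann@cs', 19, 3.8),
                                    ('s2', 'Bob', 'bob@cs', 20, 3.1);
        CREATE VIEW Student_view AS SELECT sid, name, gpa FROM Students;
    """)
    before = run_app(conn)

    # Physical change: add an index. The query text and its answer are unchanged.
    conn.execute("CREATE INDEX Students_name_idx ON Students (name)")
    after_physical = run_app(conn)

    # Logical change: split Students into two relations, then redefine the
    # view so that it still presents the same columns to the application.
    conn.executescript("""
        CREATE TABLE Student_names  (sid TEXT, name TEXT, login TEXT, age INTEGER);
        CREATE TABLE Student_grades (sid TEXT, gpa REAL);
        INSERT INTO Student_names  SELECT sid, name, login, age FROM Students;
        INSERT INTO Student_grades SELECT sid, gpa FROM Students;
        DROP VIEW Student_view;
        DROP TABLE Students;
        CREATE VIEW Student_view AS
            SELECT N.sid AS sid, N.name AS name, G.gpa AS gpa
            FROM Student_names N JOIN Student_grades G ON G.sid = N.sid;
    """)
    after_logical = run_app(conn)
    return before, after_physical, after_logical


if __name__ == "__main__":
    print(demo())
```

All three results are the same list of rows, even though `APP_QUERY` was never edited.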

Queries, Query Plans, and Operators
System handles query plan generation & optimization; ensures correct execution.
  SELECT eid, ename, title FROM Emp E WHERE E.sal > 50000
  SELECT E.loc, AVG(E.sal) FROM Emp E GROUP BY E.loc HAVING COUNT(*) > 5
  SELECT COUNT(DISTINCT E.eid) FROM Emp E, Proj P, Asgn A WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc
Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …
[Figure: query plan trees for the three queries (Select over Emp; Group(agg) and Having over Emp; Count distinct over joins of Emp, Asgn, and Proj), with table labels Employees, Projects, Assignments]
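
The three queries can be tried out directly. The sketch below assumes SQLite via Python's `sqlite3`. The table columns are inferred from the queries themselves, `$50K` is written as `50000`, the count is written as `COUNT(DISTINCT E.eid)`, and all rows are made up. `query_plan` uses SQLite's `EXPLAIN QUERY PLAN`, whose output wording varies between SQLite versions.

```python
import sqlite3

Q_HIGH_PAID = "SELECT eid, ename, title FROM Emp E WHERE E.sal > 50000"
Q_AVG_BY_LOC = ("SELECT E.loc, AVG(E.sal) FROM Emp E "
                "GROUP BY E.loc HAVING COUNT(*) > 5")
Q_REMOTE = ("SELECT COUNT(DISTINCT E.eid) FROM Emp E, Proj P, Asgn A "
            "WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc")


def make_company_db():
    """Build hypothetical Emp, Proj, and Asgn tables with a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Emp  (eid INTEGER, ename TEXT, title TEXT, sal INTEGER, loc TEXT);
        CREATE TABLE Proj (pid INTEGER, loc TEXT);
        CREATE TABLE Asgn (eid INTEGER, pid INTEGER);
    """)
    # Employees 1-6 work in Evanston, 7-8 in Chicago; salaries 45000..80000.
    emps = [(i, f"emp{i}", "engineer", 40000 + 5000 * i,
             "Evanston" if i <= 6 else "Chicago") for i in range(1, 9)]
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?, ?)", emps)
    conn.executemany("INSERT INTO Proj VALUES (?, ?)",
                     [(1, "Evanston"), (2, "Chicago")])
    # Employees 2 and 7 are each assigned to a project in the other city.
    conn.executemany("INSERT INTO Asgn VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2), (7, 1), (7, 2)])
    conn.commit()
    return conn


def query_plan(conn, sql):
    """Return the plan steps the system chose; the query text says nothing about them."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    db = make_company_db()
    for q in (Q_HIGH_PAID, Q_AVG_BY_LOC, Q_REMOTE):
        print(db.execute(q).fetchall())
        print(query_plan(db, q))
```

The queries state what is wanted; operator ordering and access paths are left to the system, which is what `query_plan` exposes.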

Concurrency Control
Concurrent execution of user programs: key to good DBMS performance.
  – Disk accesses frequent, pretty slow
  – Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble!
  – e.g., account-transfer & print statement at same time
DBMS ensures such problems don’t arise.
  – Users/programmers can pretend they are using a single-user system. (called “Isolation”)
  – Thank goodness! Don’t have to program “very, very carefully”.
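
The account-transfer and print-statement example can be replayed as a toy simulation. This is not how a DBMS schedules work. It is a hand-built interleaving in plain Python, with hypothetical account names and amounts, meant only to show what a statement could observe if it ran between the two halves of a transfer.

```python
def transfer_steps(accounts, src, dst, amount):
    """A transfer as two actions; each yield is a point where another
    program could be scheduled."""
    accounts[src] -= amount   # action 1: debit
    yield
    accounts[dst] += amount   # action 2: credit
    yield


def statement_total(accounts):
    """The 'print statement' program: report the total across all accounts."""
    return sum(accounts.values())


def interleaved():
    """The statement runs between the debit and the credit."""
    accounts = {"checking": 100, "savings": 100}
    transfer = transfer_steps(accounts, "checking", "savings", 30)
    next(transfer)                         # debit applied, credit still pending
    seen_midway = statement_total(accounts)
    next(transfer)                         # credit applied
    return seen_midway, statement_total(accounts)


def isolated():
    """The transfer runs to completion before the statement runs."""
    accounts = {"checking": 100, "savings": 100}
    for _ in transfer_steps(accounts, "checking", "savings", 30):
        pass
    return statement_total(accounts)


if __name__ == "__main__":
    print(interleaved())
    print(isolated())
```

In the interleaved run the statement sees a total that is 30 short, although no money ever left the bank; in the isolated run the total is intact. A DBMS gives the isolated behavior without forcing programs to run one at a time.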

Transactions: ACID Properties
Key concept is a transaction: a sequence of database actions (reads/writes).
DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact.
Each transaction, executed completely, must take the DB between consistent states or must not run at all.
DBMS ensures that concurrent transactions appear to run in isolation.
DBMS ensures durability of committed Xacts even if system crashes.
Note: can specify simple integrity constraints on the data. The DBMS enforces these.
  – Beyond this, the DBMS does not understand the semantics of the data.
  – Ensuring that a single transaction (run alone) preserves consistency is largely the user’s responsibility!
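
Atomicity and a simple integrity constraint can be seen together in a small sketch. It assumes SQLite via Python's `sqlite3`. The `Account` table, its `CHECK (balance >= 0)` constraint, and the helper names are hypothetical. The transfer deliberately credits before it debits, so a failed debit leaves a half-finished transaction for the system to roll back.

```python
import sqlite3


def open_bank():
    """Create a tiny bank with one integrity constraint: no negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account ("
                 "name TEXT PRIMARY KEY, "
                 "balance INTEGER CHECK (balance >= 0))")
    conn.executemany("INSERT INTO Account VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money as one transaction. Returns True if it committed,
    False if it was rolled back."""
    try:
        # 'with conn' commits if the block finishes and rolls back on an exception.
        with conn:
            conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Account"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The DBMS enforces the constraint it was told about and undoes the partial work. Whether a transfer of a given amount makes business sense is still up to the program that issues it.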

Ensuring Transaction Properties
DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact.
DBMS ensures durability of committed Xacts even if system crashes.
Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:
  – Before a change is made to the database, the corresponding log entry is forced to a safe location.
  – After a crash, the effects of partially executed transactions are undone using the log. Effects of committed transactions are redone using the log.
  – trickier than it sounds!

The Log
The following actions are recorded in the log:
  – Ti writes an object: the old value and the new value. Log record must go to disk before the changed page!
  – Ti commits/aborts: a log record indicating this action.
Log is often duplexed and archived on “stable” storage.
All log related activities (and in fact, all concurrency control related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
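
The log records described above are enough to sketch recovery in a few lines. This is a toy model, not a real recovery algorithm. The log is a Python list, the "disk" is a dict, and the record layout and function names are hypothetical. It also assumes that an object is never overwritten by a committed transaction after an unfinished one has written it, which the locking handled by the DBMS would normally guarantee.

```python
def logged_write(log, pages, txn, key, new_value):
    """Write-ahead rule: append the log record before changing the page."""
    log.append(("write", txn, key, pages[key], new_value))  # old and new value
    pages[key] = new_value


def logged_commit(log, txn):
    log.append(("commit", txn))


def recover(log, disk):
    """Bring 'disk' to a state with all committed writes and no others."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo: replay the writes of committed transactions in log order.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, key, _old, new = rec
            disk[key] = new

    # Undo: walk backwards, restoring old values written by unfinished ones.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            disk[key] = old
    return disk


if __name__ == "__main__":
    log = []
    pages = {"A": 100, "B": 50}          # in-memory copies of the pages
    logged_write(log, pages, "T1", "A", 70)
    logged_write(log, pages, "T1", "B", 80)
    logged_commit(log, "T1")
    logged_write(log, pages, "T2", "A", 0)   # T2 never commits

    # Crash: suppose T2's change to A reached disk but T1's change to B did not.
    disk_at_crash = {"A": 0, "B": 50}
    print(recover(log, disk_at_crash))
```

After recovery, T1's transfer is fully present and T2's write has vanished, regardless of which pages happened to reach disk before the crash.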

Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
Each database system has its own variations.
[Figure: layers, top to bottom: Query Optimization and Execution / Relational Operators / Files and Access Methods / Buffer Management / Disk Space Management / DB]
These layers must consider concurrency control and recovery

Advantages of a DBMS
Data independence
Efficient data access
Data integrity & security
Data administration
Concurrent access, crash recovery
Reduced application development time
So why not use them always?
  – Expensive/complicated to set up & maintain
  – This cost & complexity must be offset by need
  – General-purpose, not suited for special-purpose tasks (e.g. text search!)

Databases make these folks happy...
DBMS vendors, programmers
  – Oracle, IBM, MS, Sybase, …
End users in many fields
  – Business, education, science, …
DB application programmers
  – Build enterprise applications on top of DBMSs
  – Build web services that run off DBMSs
Database administrators (DBAs)
  – Design logical/physical schemas
  – Handle security and authorization
  – Data availability, crash recovery
  – Database tuning as needs evolve
…must understand how a DBMS works

Summary (part 1)
DBMS used to maintain, query large datasets.
  – can manipulate data and exploit semantics
Other benefits include:
  – recovery from system crashes,
  – concurrent access,
  – quick application development,
  – data integrity and security.
Levels of abstraction provide data independence
  – Key when d(app)/dt << d(platform)/dt

Summary, cont.
DBAs and DB developers are the bedrock of the information economy
DBMS R&D represents a broad, fundamental branch of the science of computation