Rutgers University CS541: Database Systems Spring 2008 Computer Science Department Rutgers University.

Slides:



Advertisements
Similar presentations
1 Introduction to Database Systems CSE444 Instructor: Scott Vandenberg University of Washington Winter 2000.
Advertisements

Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chapter 1 Instructor: Mirsad Hadzikadic.
Chapter 1 Instructor: Murali Mani Database Management Systems.
1 541: Database Systems S. Muthu Muthukrishnan. 2 Some Data Collections I Have Played With….  Wireless call detail records.  U. S. Patents.  AskJeeves.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Database Systems Chapter 1 Instructor: Wang-Chien Lee
Database: A collection of related data [Elmasri]. A database represents some aspect of real world called “miniworld” [Elmasri] or “enterprise” [Ramakrishnan].
Introduction to Database Systems Ch. 1, Ch. 2 Mr. John Ortiz Dept. of Computer Science University of Texas at San Antonio.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1 Instructor: Deborah Strahman
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Database Systems Chapter 1 Instructor: Johannes Gehrke
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 1 Database Systems I Introduction.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1.
1 Introduction to Database Systems Ref. Ramakrishnan & Gehrke Chapter 1.
1 CENG 302 Introduction to Database Management Systems Nihan Kesim Çiçekli URL:
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1 Instructor: Ethan Jackson
CSCD34 - Data Management Systems,- A. Vaisman1 CSC D34 - Data Management Systems Instructor: Alejandro Vaisman University of Toronto.
Akram Jalal Ahlia University MIS College of Business and Finance 1.
CSC343H – Introduction to Databases
CS462: Introduction to Database Systems. ©Silberschatz, Korth and Sudarshan1.2Database System Concepts Course Information Instructor  Kyoung-Don (KD)
CSC 411/511: DBMS Design 1 1 Dr. Nan WangCSC411_L1_Overview 1 Overview of Database Systems (Chapter 1)
Introduction. 
Database Management Systems 1 Introduction to Database Systems Instructor: Xintao Wu Ramakrishnan & Gehrke.
CS6530 Graduate-level Database Systems Prof. Feifei Li.
1 CS862 - Advanced Database Systems Sang H. Son
Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chpt 1 Instructor: Xintao Wu.
Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chpt 1 Instructor: Weichao Wang.
Database Management Systems
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1 Overview of Database Systems.
Database Organization and Design
DATABASE MANAGEMENT SYSTEMS TERM B. Tech II/IT II Semester UNIT-I PPT SLIDES Text Books: (1) DBMS by Raghu Ramakrishnan (2) DBMS by Sudarshan and.
INFS614, Dr. Brodsky, GMU1 Database Management Systems INFS 614 Instructor: Professor Alex Brodsky
ICS 321 Fall 2009 Introduction to Database Systems Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa.
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Associate.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
1 CS3431 – Database Systems I Introduction Instructor: Mohamed Eltabakh
1 What Is a DBMS?  A very large, integrated collection of data.  Models real-world enterprise.  Entities (e.g., students, courses)  Relationships (e.g.,
CS 541 Lecture Slides Sunil Prabhakar CS541 Database Systems.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Database Systems Chapter 1 Instructor: Johannes Gehrke
-ebru a.s ATTRIBUTE: Description of entities For employee entity number, name, deptno, age, adr, salary..etc are attributes. RECORD: Stores whole.
CpSc 8620: DBMS Design Introduction. 2 Attribution Materials and lecture notes in this course are adapted from various sources, including the authors.
1 Geog 357: Data models and DBMS. Geographic Decision Making.
1 CS462- Database Systems Sang H. Son
Database Management Systems.  Instructor: Yrd. Doç. Dr. Cengiz Örencik   Course material.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Why do you learn database?? Chapter 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1.
BBM 371 – Data Management Lecture 3: Basic Concepts of DBMS Prepared by: Ebru Akçapınar Sezer, Gönenç Ercan.
1 CENG 351 CENG 351 Introduction to Data Management and File Structures Department of Computer Engineering METU.
Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chpt 1 Instructor: Xin Zhang.
Lecture 1: Overview of CSCI 485 Shahram Ghandeharizadeh Associate Professor Computer Science Department University of Southern California Presented by:
1 Database Management Systems Introduction Instructor: Oliver Schulte
CS522 Advanced database Systems Huiping Guo Department of Computer Science California State University, Los Angeles Course administration.
COP Introduction to Database Structures
Introduction to Database Systems Chapter 1
CS4222 Principles of Database System
Introduction to Database Systems
Instructor: Elke Rundensteiner
Database Management Systems Chapter 1
Introduction to Database Systems
Overview of Database Systems Chpt 1
Instructor: Murali Mani
Database Management Systems Chapter 1
Database Management Systems
Database Management Systems CSE594
Sang H. Son CS6750: Database Systems The slides for this text are organized into chapters. This lecture covers Chapter 1. Chapter 1: Introduction.
Introduction to Database Systems
Database Management Systems Chapter 1
Introduction to Database Systems CSE 444
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Director.
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Associate.
Is the WWW a DBMS? = Fairly sophisticated search available
Introduction to Database Systems Chpt 1
Presentation transcript:

Rutgers University CS541: Database Systems Spring 2008 Computer Science Department Rutgers University

Administration  Instructor: Amélie Marian CoRE 324 (732) x0636 Office Hours: Mondays 3-4pm or by appointment  TA: Minji Wu Office Hours: TBA

Rutgers University Class Information  Web page:  Meets Thursday 3:20-6:20pm in CoRE A  Prerequisites: CS513 and working knowledge of C or Java or instructor’s permission

Rutgers University Grading (subject to small changes)  15% Homework (3-4) Due at beginning of class on due date  30% Programming Project In teams of two (same project) In three parts  Find data source and scenario  Implementation of standard index structures for query processing  Extend project to non-standard query processing (e.g., IR-style text retrieval, nearest-neighbor, top-k) In class project presentation and demonstration More details later  25% Midterm Exam Tentatively scheduled for March 13  30% Final Exam

Rutgers University Collaboration Policy  Check DCS Academic Integrity Policy  Homework and exams are to be done individually  Project is done only with your team partner

Rutgers University Supporting Material  Textbook: Raghu Ramakrishnan, Johannes Gehrke: Database Management Systems, 3rd edition, McGraw-Hill, 2002  Class website: Lecture Notes Research Papers (for advanced topics)

Rutgers University Communication  Please send me , come to my office hour, or contact Minij if you have questions on the material, complaints, or feedback on how to improve the course

Rutgers University Class Organization  Basics of Database Systems What is a DBMS? Why do we need one? How do we design one? What are the common problems in DBMS?  Information Management Text documents Structure and content Approximate querying  Advanced Topics in Data Management What is new and exciting in DB Research? How do we deal with huge amounts of data? What are the new challenges brought by the internet? How should DBMS evolve?

Rutgers University Short History of Data Management  Evolved from file systems (1960’s) Airline reservation systems Banking systems Corporate data  Relational DB systems (1970’s) Data organized in tables and relations that model real- world Storage structure transparent to user High-level query language Widely used today  New challenges Distributed Data (e.g., internet) Parallel Computing Bigger systems Multimedia Data Data Analysis Information Integration

Rutgers University What is a DBMS?  Powerful tool to efficiently manage large amounts of data Persistent storage (more flexible than a file system) Data manipulation (complex query language) Transaction management (simultaneous access to data)

Rutgers University Why Use a DBMS?  Data independence and efficient access.  Reduced application development time.  Data integrity and security.  Uniform data administration.  Concurrent access, recovery from crashes.

Rutgers University Why Study Databases?  Shift from computation to information at the “low end”: user-input information (a mess!) at the “high end”: scientific applications  Datasets increasing in diversity and volume. Digital libraries, interactive video, Human Genome project, EOS project... need for DBMS exploding  DBMS encompasses most of CS OS, languages, theory, AI, multimedia, logic

Rutgers University Basics of Database Systems: The ER Model  Conceptual design of database  Models real-world: Entities (Students, Professor, and Classes) Relationships (Amélie Marian teaches 541) Attributes are associated with entities (the room for 541 is CoRE A) Constraints of the data  Logical schema of the data

Rutgers University Basics of Database Systems: The Relational Model  A data model is a collection of concepts for describing data.  A schema is a description of a particular collection of data, using the a given data model.  The relational model of data is the most widely used model today. Main concept: relation, basically a table with rows and columns. Every relation has a schema, which describes the columns, or fields.  Two formal query languages Relational algebra Relational calculus  Powerful and widely used query language: SQL

Rutgers University Levels of Abstraction  Many views, single conceptual (logical) schema and physical schema. Views describe how users see the data. Conceptual schema defines logical structure Physical schema describes the files and indexes used. * Schemas are defined using DDL; data is modified/queried using DML. Physical Schema Conceptual Schema View 1View 2View 3

Rutgers University Example: University Database  Conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string)  Physical schema: Relations stored as unordered files. Index on first column of Students.  External Schema (View): Course_info(cid:string,enrollment:integer)

Rutgers University Data Independence *  Applications insulated from how data is structured and stored.  Logical data independence: Protection from changes in logical structure of data.  Physical data independence: Protection from changes in physical structure of data. * One of the most important benefits of using a DBMS!

Rutgers University Basics of Database Systems: Physical Storage and Index Structures Many alternatives exist, each ideal for some situations, and not so good in others: Heap (random order) files: Suitable when typical access is a file scan retrieving all records. Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed. Indexes: Data structures to organize records via trees or hashing.  Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields  Updates are much faster than in sorted files.

Rutgers University Basics of Database Systems: Query Processing  What are the best algorithms to evaluate queries on data Performance issues: space/time  Algorithms for evaluating relational operators use some simple ideas extensively: Indexing: to retrieve small set of data Iteration: Sometimes, faster to scan all tuples even if there is an index. (And sometimes, we can scan the data entries in an index instead of the table itself.) Partitioning: By using sorting or hashing, we can partition the data and replace an expensive operation by similar operations on smaller inputs.  Ideally: Want to find best plan. Practically: Avoid worst plans!

Rutgers University Basics of Database Systems: Transaction Processing  Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user programs concurrently.  Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.  DBMS ensures such problems don’t arise: users can pretend they are using a single- user system.

Rutgers University Basics of Database Systems: Logical Data Management  Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update anomalies  Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.  Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).  Decomposition should be used judiciously: Is there reason to decompose a relation? What problems (if any) does the decomposition cause?

Rutgers University Advanced Topics in Data Management: Information Retrieval and Web Search  Keyword search over text (unstructured) data  User Expectations: Many say “The first item shown should be what I want to see!” This works if the user has the most popular/common notion in mind, not otherwise.  Widely used today  Top-k query model

Rutgers University Advanced Topics in Data Management: Advanced Query Processing  New challenges: Proactive (and reactive) optimization Smart statistics collection to cope with fast changes Approximate query answering Online query processing Important answers first (top-k queries, skyline queries)

Rutgers University Advanced Topics in Data Management: XML and Web Data  No application interoperability in the web today: HTML not understood by applications  screen scraping brittle Database technology: client-server  still vendor specific  New Universal Data Exchange Format: XML XML = semi-structured data XML generated by applications XML consumed by applications Easy access: across platforms, organizations

Rutgers University Advanced Topics in Data Management: Data Mining Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data. Valid: The patterns hold in general. Novel: We did not know the pattern beforehand. Useful: We can devise actions from the patterns. Understandable: We can interpret and comprehend the patterns.

Rutgers University Advanced Topics in Data Management: and more…  Distributed Databases  Parallel Databases  ORDBMS  Data Cleaning  Data Warehousing  Data Streams  …

Rutgers University Contact me for more information! If you are interested in advanced DB topics…  For-credit research projects available Top-k query processing Scoring XML data Web data management