Introduction Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 4, 2003 Some slide content courtesy of Susan.

Introduction Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 4, 2003 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Welcome! To CIS 550, officially “Database & Information Systems” in the course catalog… … A tour of the data management field… … A question for you: what does this really mean? What is this course (and the field) about?

3 What The Course Is Not… (A Few Warnings)
This is not a course on Oracle or SQL
- It may not directly impact your marketability
- It’s an investigation into the principles of data management, which will improve your understanding
This course will not be a cakewalk!
- The data management field is broad, and we’ll touch on many subjects at a rapid pace
- 8 homeworks, paper summaries, term project, midterm, final, …
This course is not suitable for people with a limited programming background; you need skills in:
- Algorithms & data structures
- Logic
- Programming languages
- Perhaps even a little complexity theory!

4 What The Course Will Do
Most CS courses concentrate on code; now you’ll understand data management and representation
- In the end, it’s all about the data!
Background in most of the important areas:
- Data design, modeling
- Understanding of DB system internals, performance
- Understanding of data-driven systems (e.g., web sites)
- An understanding of the complexities of integrating data, perhaps the biggest CS problem today
- An understanding of current research topics in data management

5 Administrivia
Instructor: Zachary Ives
- Levine 611 (until end of Sept.; then 5th floor GRW)
- Office hours: Tuesday, 3:00-4:00PM (after class)
TA: Dinkar Gupta
- Office hours on Monday, 3:00-4:00PM (office TBA)
Newsgroup: upenn.cis.cis550
Home page:
Text(s):
- Ramakrishnan & Gehrke, Database Management Systems, 3rd ed.
- Supplementary papers (to be handed out in class)
- Other books may be useful (see web page)

6 Course Format and Grading
- We’ll cover roughly one major topic area every one to two weeks
- Readings in the text & research papers
- Occasionally, summaries/commentary on papers (5%)
- Homework assignment for each topic area (30%)
- One midterm (10%), one final exam (20%)
- Project (30%), groups of 3-4:
  - Build a blogging system on top of a database, or
  - Build a P2P data sharing system for XML data
  - (Or propose your own idea)
- General participation, discussion, intangibles (5%)

7 Diving In…
- What is a database and a DBMS?
- Why do we need a DBMS?
- Database and data management architectures
- Process of building a DB
- DBMS components

8 What’s Data Management?
In the 1960s and early 70s:
- File formats, traversals, indexes
In the 1980s (mid- to late-70s in research) and 90s:
- Separation of logical + physical data representations
- Well-defined, general-purpose, declarative data manipulation language (DML) and data definition language (DDL)
- Reliable, consistent storage + concurrency control
- Sophisticated system that takes DML statements and knowledge of physical data representations and produces answers in an “optimized” way
Today:
- All that plus managing and manipulating data in many models and representations

9 What is a DBMS?
- A database (DB) is a large, integrated collection of data
  - Generally cohesive in “some” way
  - A DB models a real-world organization or unit
- A database management system (DBMS) is a software package designed to store and manage databases
  - Reliable storage & recovery of 100s of GB
  - Querying/updating interface and API
  - Support for many concurrent users

10 Connection to Other Areas of CS…
- Programming languages and software engineering (obviously)
- Algorithms (obviously)
- Logic, discrete math, and theory of computation
- Systems: concurrency, operating systems, file organization and networks, peer-to-peer, …
- Web (and Semantic Web), information retrieval, digital libraries, software agents, …
- AI planning and machine learning

11 But 80% of the World’s Data is Not in a DB!
Examples:
- Scientific data (large images, complex programs that analyze the data)
- Personal data
- WWW
Data management is expanding to tackle these problems
- Flexibility: traditional data management imposes many constraints to make problems solvable
- Must deal with entities outside our control
In this course, we’ll start by focusing on databases, but eventually look “outside the box”

12 Why Not “Program up” Databases As Needed?
For simple (single-concept) and small databases, this is often the best solution
- Flat files and grep get us a long way
But there are limits:
- The structure is complicated (more than a simple table)
- The database gets large (e.g., bigger than RAM)
- Many people want to use it simultaneously
- Need for reliable recovery from crashes
- Updates generally require a complete rewrite of the file

13 Example: Palm-Style Calendar
We might start by building a file with the following structure:

Event  | Day   | When | Who         | Where
Lunch  | 10/24 | 1pm  | Rick        | Joe’s Diner
CS123  | 10/25 | 9am  | Dr. Egghead | Morris234
Biking | 10/26 | 9am  | Jane        | Jane’s house
Dinner | 10/26 | 6PM  | Jane        | Café Le Boeuf

This text file is easy to deal with. So there's no need for a DBMS! Right…?
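A minimal sketch of the “program it up” approach, assuming the calendar is kept as a tab-separated text file with the columns on this slide (the tab format and the function names `load`, `events_with`, and `reschedule` are hypothetical). A lookup is a linear scan, much like grep, and changing a single field means parsing and rewriting the whole file.

```python
import csv
import io

# Hypothetical tab-separated encoding of the calendar on this slide.
CALENDAR_TXT = (
    "Event\tDay\tWhen\tWho\tWhere\n"
    "Lunch\t10/24\t1pm\tRick\tJoe's Diner\n"
    "CS123\t10/25\t9am\tDr. Egghead\tMorris234\n"
    "Biking\t10/26\t9am\tJane\tJane's house\n"
    "Dinner\t10/26\t6pm\tJane\tCafé Le Boeuf\n"
)


def load(text):
    """Parse the whole file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def events_with(rows, who):
    """The 'grep' step: a linear scan over every row."""
    return [row["Event"] for row in rows if row["Who"] == who]


def reschedule(text, event, new_day):
    """An update: parse everything, change one field, serialize everything again."""
    rows = load(text)
    for row in rows:
        if row["Event"] == event:
            row["Day"] = new_day
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, `events_with(load(CALENDAR_TXT), "Jane")` returns `['Biking', 'Dinner']`. This is fine at this size; the following slides show where it breaks down.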

14 Problem 1: Data Organization
- Consider the all-important who field. Do we also want to keep addresses, telephone numbers, etc.?
- Expand our file to look like:
  Event | When | Who-name | Who-… | Who-tel | … | Where | …
- Now we are keeping our address book in our calendar, and doing so redundantly

15 “Link” Calendar with Address Book?
- Two conceptual “entities”, contact information and calendar, with a relationship between them linking people in the calendar to their contact information
- This link could be based on something as simple as the person's name
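One way to express the link, sketched with Python's built-in `sqlite3` module and an in-memory database: contact information and calendar entries live in two hypothetical tables, `contacts` and `calendar`, joined on the person's name. The phone numbers are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (
    name TEXT PRIMARY KEY,  -- the link is just the name
    tel  TEXT
);
CREATE TABLE calendar (
    event TEXT,
    day   TEXT,
    who   TEXT REFERENCES contacts(name),
    place TEXT
);
""")
# Hypothetical phone numbers; each person's details are stored exactly once.
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [("Rick", "555-0101"), ("Jane", "555-0102")])
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)",
                 [("Lunch", "10/24", "Rick", "Joe's Diner"),
                  ("Biking", "10/26", "Jane", "Jane's house"),
                  ("Dinner", "10/26", "Jane", "Café Le Boeuf")])


def events_with_contacts():
    """Follow the link: reattach each person's phone number to their events."""
    return conn.execute("""
        SELECT c.event, c.who, p.tel
        FROM calendar AS c JOIN contacts AS p ON c.who = p.name
        ORDER BY c.day, c.event
    """).fetchall()
```

Because each number is stored once, a single `UPDATE contacts ...` for Jane changes what the join reports for both of her events; the redundant single-file layout would need every copy edited.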

16 Problem 2: Efficiency
- A personal address book probably has fewer than one hundred entries, but there are things we'd like to do quickly and efficiently:
  - “Give me all appointments on 10/28”
  - “When am I next meeting Jim?”
- “Program” these as quickly as possible (and make them resilient to data format changes)
- Have these programs executed efficiently
- What would happen if you were using a corporate calendar with hundreds of thousands of entries?
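A sketch of the two requests as declarative queries, again using `sqlite3` with hypothetical data. It assumes dates and times are stored as ISO-style strings (`2003-10-28`, `13:00`) so that string order matches time order; the two indices are a physical design choice the DBMS can use to avoid scanning every row, and the query text does not mention them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (event TEXT, day TEXT, time TEXT, who TEXT)")
# Hypothetical entries; ISO dates and 24-hour times sort correctly as strings.
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)", [
    ("Lunch", "2003-10-24", "13:00", "Rick"),
    ("Meeting", "2003-10-28", "10:00", "Jim"),
    ("Seminar", "2003-10-28", "15:00", "Jane"),
    ("Review", "2003-11-03", "09:00", "Jim"),
])
# Physical design: indices let the DBMS avoid scanning the whole table.
conn.execute("CREATE INDEX cal_by_day ON calendar(day, time)")
conn.execute("CREATE INDEX cal_by_who ON calendar(who, day, time)")


def appointments_on(day):
    """'Give me all appointments on 10/28.'"""
    return conn.execute(
        "SELECT event, time FROM calendar WHERE day = ? ORDER BY time", (day,)
    ).fetchall()


def next_meeting_with(who, now_day, now_time):
    """'When am I next meeting Jim?': the first entry strictly after now."""
    return conn.execute(
        """SELECT day, time FROM calendar
           WHERE who = ? AND (day > ? OR (day = ? AND time > ?))
           ORDER BY day, time LIMIT 1""",
        (who, now_day, now_day, now_time),
    ).fetchone()
```

With this data, `appointments_on("2003-10-28")` returns `[('Meeting', '10:00'), ('Seminar', '15:00')]`, and `next_meeting_with("Jim", "2003-10-28", "12:00")` returns `('2003-11-03', '09:00')`.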

17 Problem 3: Concurrency and Reliability
- Suppose other people are allowed access to your calendar and are allowed to modify it. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess?
- Suppose the system crashes while we are changing the calendar. How do we recover our work?

18 Transactions
- The key concept for concurrency is the transaction: an atomic sequence of database actions (reads/writes) on data items (e.g., a calendar entry)
- The key concept for recoverability is the log: a record of all actions carried out by the DBMS
- Sounds like operating systems all over again!
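A small illustration of atomicity using `sqlite3`: moving an event is done as a delete followed by an insert, and a hypothetical `fail_midway` flag raises an error between the two writes. Using the connection as a context manager commits the transaction if the block completes and rolls it back if an exception escapes, so the half-finished move disappears. This only simulates an error inside one process, not a real crash; surviving real crashes is the job of SQLite's rollback journal (or its write-ahead log in WAL mode).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (event TEXT PRIMARY KEY, day TEXT, who TEXT)")
conn.execute("INSERT INTO calendar VALUES ('Dinner', '10/26', 'Jane')")
conn.commit()


def move_event(event, new_day, fail_midway=False):
    """Move an event as delete + re-insert inside one transaction."""
    with conn:  # commit if the block completes, roll back if an exception escapes
        (who,) = conn.execute(
            "SELECT who FROM calendar WHERE event = ?", (event,)).fetchone()
        conn.execute("DELETE FROM calendar WHERE event = ?", (event,))
        if fail_midway:  # hypothetical flag simulating a failure between the writes
            raise RuntimeError("simulated failure between the two writes")
        conn.execute("INSERT INTO calendar VALUES (?, ?, ?)", (event, new_day, who))


def day_of(event):
    """Current day of an event, or None if it no longer exists."""
    row = conn.execute("SELECT day FROM calendar WHERE event = ?", (event,)).fetchone()
    return row[0] if row else None
```

After a failed `move_event("Dinner", "10/27", fail_midway=True)`, `day_of("Dinner")` is still `'10/26'`: the delete was undone rather than leaving the entry lost.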

19 Database Architecture: The Traditional View
It is common to describe databases in two ways:
- Logical structure:
  - What users see: the program or query language interface
- Physical structure:
  - How files are organized; what indexing mechanisms are used
The logical level is further split into two components:
- Overall database design (conceptual; seen by the DB designer)
- Views that various users get to see

20 Three-level Architecture
[Figure: View 1, View 2, …, View N sit on top of the Conceptual Level (Schema), which sits on top of the Physical Level (file organization, indexing).]
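A sketch of the external level with SQL views, assuming a single hypothetical `calendar` table as the conceptual schema. Each hypothetical view shows one group of users only the columns and rows meant for them; the physical level (file layout, indexing) is left entirely to SQLite here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the single schema the DB designer works with.
conn.execute("CREATE TABLE calendar (event TEXT, day TEXT, time TEXT, who TEXT, place TEXT)")
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?, ?)", [
    ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
    ("Biking", "10/26", "9am", "Jane", "Jane's house"),
])
# External level: hypothetical views, one per group of users.
conn.execute("CREATE VIEW busy_times AS SELECT day, time FROM calendar")
conn.execute("""CREATE VIEW janes_plans AS
                SELECT event, day, time FROM calendar WHERE who = 'Jane'""")


def columns(view):
    """Names of the columns a user of `view` gets to see (view names are trusted here)."""
    return [d[0] for d in conn.execute(f"SELECT * FROM {view}").description]
```

Here `columns("busy_times")` is `['day', 'time']`: someone checking your availability never sees who you are meeting or where.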

21 Data Independence
A user of a relational database system should be able to query the database without knowing precisely how the data is stored, e.g. (When and Where are SQL keywords, so they are quoted as identifiers):
  SELECT "When", "Where" FROM Calendar WHERE Who = 'Bill'
After all, you don't worry much about how numbers are stored when you program some arithmetic or use a computer-based calculator.

22 More on Data Independence
Logical data independence
- Protects the user from changes in the logical structure of the data: we could reorganize the calendar “schema” without changing how I query it
Physical data independence
- Protects the user from changes in the physical structure of the data: we could add an index on Who (or sort by When) without changing how the user writes the query, but the query would execute faster (query optimization)
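A sketch of physical data independence in SQLite, using the calendar query from slide 21 (with `When` and `Where` quoted, since they are SQL keywords) on a hypothetical two-row table. The index is added without touching the query text, the answer stays the same, and `EXPLAIN QUERY PLAN` shows the optimizer switching from a full scan to an index search. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "When" and "Where" are SQL keywords, so they must be quoted as identifiers.
conn.execute('CREATE TABLE Calendar ("When" TEXT, "Where" TEXT, Who TEXT)')
conn.executemany("INSERT INTO Calendar VALUES (?, ?, ?)", [
    ("10/24 1pm", "Joe's Diner", "Rick"),      # hypothetical rows
    ("10/27 2pm", "Café Le Boeuf", "Bill"),
])

QUERY = """SELECT "When", "Where" FROM Calendar WHERE Who = 'Bill'"""


def answer():
    return conn.execute(QUERY).fetchall()


def plan():
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


before = (answer(), plan())
conn.execute("CREATE INDEX idx_who ON Calendar(Who)")  # purely physical change
after = (answer(), plan())
```

The user-visible result in `before[0]` and `after[0]` is identical; only the plan text changes to mention `idx_who`.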

23 That's the Traditional View, But...
- The three-level architecture is not always achievable: when databases get big, queries must be carefully written to achieve efficiency
- We may also need a 4th tier… Sometimes this is called middleware

24 Combining Databases with Mediators (a kind of middleware)
We may need to add further layers to combine multiple databases/sources on the Web
- Some of these are databases over which we have no control
- Some must be accessed in special ways
[Figure: multiple sources combined under an XML “Mediated Schema”.]

25 Data-Driven Web Sites: Consumers of Database Output
- “Data-driven web sites” also add an HTML “presentation” layer on top of what we’ve seen
[Figure: HTML processing layered on top of a view.]
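A minimal sketch of such a presentation layer, assuming a hypothetical `render_table` helper: it runs a query and wraps the result in an HTML table, escaping every value with the standard library's `html.escape` so that data such as an apostrophe cannot break the markup.

```python
import html
import sqlite3


def render_table(conn, sql, params=()):
    """Run a query and wrap its result in an HTML table (the presentation layer)."""
    cur = conn.execute(sql, params)
    header = "".join(f"<th>{html.escape(d[0])}</th>" for d in cur.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in cur
    )
    return f"<table><tr>{header}</tr>{body}</table>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (event TEXT, place TEXT)")  # hypothetical table
conn.execute("INSERT INTO calendar VALUES (?, ?)", ("Lunch", "Joe's Diner"))
page = render_table(conn, "SELECT event, place FROM calendar")
```

For this one row, `page` is `<table><tr><th>event</th><th>place</th></tr><tr><td>Lunch</td><td>Joe&#x27;s Diner</td></tr></table>`. The database layer is unchanged; only the last step formats its output.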

26 The Process of Building a Database
- Start with a conceptual model
- Design & implement schema
- Write applications using DBMS and other tools
  - Many ways of doing this, where the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)

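27 Conceptual Design
[Figure: entity-relationship diagram with entities STUDENT (sid, name), COURSE (cid, name), and PROFESSOR (fid, name); relationship Takes links STUDENT and COURSE, relationship Teaches links PROFESSOR and COURSE, and a semester attribute also appears.]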

28 Designing a Schema (Set of Relations)
- Convert to tables + constraints
- Then need to do “physical” design: the layout on disk, indices, etc.
STUDENT(sid, name): (1, Jill), (2, Qun), (3, Nitin)
Takes(sid, cid)
COURSE(cid, name, sem): courses named DB, AI, and Arch (F03), …
PROFESSOR(fid, name): (1, Ives), (2, Saul), (8, Roth)
Teaches(fid, cid)
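A sketch of these relations as SQLite DDL, with primary keys and foreign keys as the constraints. The student and professor rows come from this slide; the course id `C1` and its semester are hypothetical, since those values are not given. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; `try_insert` is a hypothetical helper for showing which rows the constraints accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE STUDENT   (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE PROFESSOR (fid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE COURSE    (cid TEXT PRIMARY KEY, name TEXT NOT NULL, sem TEXT);
CREATE TABLE Takes   (sid INTEGER REFERENCES STUDENT(sid),
                      cid TEXT REFERENCES COURSE(cid),
                      PRIMARY KEY (sid, cid));
CREATE TABLE Teaches (fid INTEGER REFERENCES PROFESSOR(fid),
                      cid TEXT REFERENCES COURSE(cid),
                      PRIMARY KEY (fid, cid));
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Jill"), (2, "Qun"), (3, "Nitin")])
conn.executemany("INSERT INTO PROFESSOR VALUES (?, ?)", [(1, "Ives"), (2, "Saul"), (8, "Roth")])
conn.execute("INSERT INTO COURSE VALUES ('C1', 'DB', 'F03')")  # hypothetical cid and semester


def try_insert(sql, params):
    """Attempt one insert and report whether the constraints accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"
```

Enrolling student 1 in `C1` returns `"ok"`; enrolling a nonexistent student 99, or enrolling student 1 in `C1` a second time, is rejected by the DBMS rather than by application code.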

29 Applications Use Queries in SQL
  SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
- Structured Query Language
- Based on restricted first-order logic expressions over relations
- Not procedural: defines constraints on the output
- The query optimizer converts it into a query plan that exploits properties of the data (e.g., available indices), and the query execution engine runs that plan over the data
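To see what “defines constraints on the output” means, here is the same query written as a Python list comprehension over hypothetical in-memory relations (the student rows come from slide 28; the `TAKES` and `COURSE` contents are made up). The comprehension states which combinations belong in the answer; evaluated literally it is just a triple nested loop, which is only one of many plans an optimizer could choose.

```python
from typing import NamedTuple


class Student(NamedTuple):
    sid: int
    name: str


class Takes(NamedTuple):
    sid: int
    cid: str


class Course(NamedTuple):
    cid: str
    name: str
    sem: str


STUDENT = [Student(1, "Jill"), Student(2, "Qun"), Student(3, "Nitin")]
TAKES = [Takes(1, "C1"), Takes(2, "C1"), Takes(2, "C2")]          # hypothetical
COURSE = [Course("C1", "DB", "F03"), Course("C2", "AI", "F03")]   # hypothetical

# Read as logic: all (s, t, c) with s in STUDENT, t in TAKES, c in COURSE
# such that s.sid = t.sid and t.cid = c.cid. Nothing here says how to find them.
result = [
    (s, t, c)
    for s in STUDENT
    for t in TAKES
    for c in COURSE
    if s.sid == t.sid and t.cid == c.cid
]
```

Here `[(s.name, c.name) for s, t, c in result]` is `[('Jill', 'DB'), ('Qun', 'DB'), ('Qun', 'AI')]`; Nitin takes no course and so does not appear.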

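30 Processing the Query
  SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
[Figure: the Optimizer turns the query into a plan over STUDENT, Takes, and COURSE using Merge and Hash (by cid) join operators; the Execution Engine runs the plan against the Storage Subsystem, and results go to a Web Server / UI / etc.]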
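A sketch of the two join operators named in this plan (Merge, and Hash by cid), over hypothetical Python dicts. It assumes the merge join combines STUDENT and Takes on `sid` and the hash join then matches courses by `cid`; the renamed columns `sname` and `cname` are hypothetical, chosen so the two `name` attributes do not collide when rows are merged.

```python
from collections import defaultdict

# Hypothetical relations as lists of dicts; columns renamed to avoid a 'name' clash.
STUDENT = [{"sid": 1, "sname": "Jill"}, {"sid": 2, "sname": "Qun"}, {"sid": 3, "sname": "Nitin"}]
TAKES = [{"sid": 2, "cid": "C1"}, {"sid": 1, "cid": "C1"}, {"sid": 2, "cid": "C2"}]
COURSE = [{"cid": "C1", "cname": "DB"}, {"cid": "C2", "cname": "AI"}]


def merge_join(left, right, key):
    """Sort-merge join on equality of `key`, handling duplicate keys on both sides."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for lrow in left[i:i_end]:  # cross product of the two equal-key runs
                for rrow in right[j:j_end]:
                    yield {**lrow, **rrow}
            i, j = i_end, j_end


def hash_join(build, probe, key):
    """In-memory hash join: hash the (smaller) build input, then stream the probe input."""
    buckets = defaultdict(list)
    for row in build:
        buckets[row[key]].append(row)
    for row in probe:
        for match in buckets.get(row[key], []):
            yield {**row, **match}


# Plan: (STUDENT merge-join TAKES on sid) hash-join COURSE on cid.
result = list(hash_join(COURSE, merge_join(STUDENT, TAKES, "sid"), "cid"))
```

The answer matches the declarative reading of the query; the optimizer's job is choosing operators like these (and their order) based on statistics and available indices.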

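31 DBMS in a Bit More Detail
[Figure (a simplification!): a Query enters through the API/GUI and goes to the Optimizer, which uses Schemas and Stats from the Catalog to produce a Physical plan for the Exec. Engine; the engine sends Requests to the Index/file/rec Mgr, which gets Data Pages from the Buffer Mgr, which in turn requests Pages from the Storage Mgr and Storage; Logging and recovery run alongside.]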
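A toy version of the Buffer Mgr component, assuming a least-recently-used replacement policy. Real systems often use other policies (such as clock), and also pin pages and coordinate with logging before writing dirty pages; all of that is omitted here. The `read_page`/`write_page` callbacks stand in for the Storage Mgr and are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """A toy buffer pool: caches up to `capacity` pages, evicting the least recently used."""

    def __init__(self, capacity, read_page, write_page):
        self.capacity = capacity
        self.read_page = read_page    # hypothetical storage callback: page_id -> bytes
        self.write_page = write_page  # hypothetical storage callback: (page_id, bytes) -> None
        self.frames = OrderedDict()   # page_id -> [data, dirty], least recently used first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: now the most recently used
            return self.frames[page_id][0]
        if len(self.frames) >= self.capacity:
            victim, (data, dirty) = self.frames.popitem(last=False)  # evict the LRU page
            if dirty:
                self.write_page(victim, data)  # flush changes before reusing the frame
        data = self.read_page(page_id)
        self.disk_reads += 1
        self.frames[page_id] = [data, False]
        return data

    def put(self, page_id, data):
        self.get(page_id)  # make sure the page is resident (and most recently used)
        self.frames[page_id] = [data, True]
```

Repeated accesses to hot pages are served from memory, and a modified page reaches storage only when it is evicted.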

32 DBMSs in the Wild
A huge industry for 20% of the world’s data!
- Big, mature relational databases
  - IBM, Oracle, Microsoft
- “Middleware” above these
  - SAP, PeopleSoft, dozens of special-purpose apps
  - “Application servers”
- Integration and warehousing systems
- Trends:
  - More integration; web services; XML everywhere
  - Smarter, self-tuning systems

33 The Research World
- Conventional databases aren’t interesting!
- Understanding what’s possible to do with XML
- Better query processing
- Better languages for meta-info (e.g., constraints)
- Data streams
- Peer-to-peer
- Integrating data from different formats
- Lots of theory and systems-building

34 In this Course...
- Study relational databases, their design, how to query them, what forms of indices to use
- Beyond relational algebra: a logical model of data (Datalog), recursion
- XML and semi-structured data models
- Understanding DB internals
  - How DBs are built
  - Performance implications
- Integrating and mediating between databases (a huge problem today)

35 Questions?
[Comic: Dilbert, 8/9/2003 (via online archive)]