Introduction Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 9, 2004 Some slide content courtesy of Susan.


Introduction Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 9, 2004 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Welcome to CIS 550, Database and Information Systems!
Instructor: Zachary Ives, 576 Levine Hall North
    • Office hours: Tuesday, 3:00-4:00PM (after class)
TA: T.J. Green
    • Office hours: Thursday, 3:00-4:00PM
Newsgroup: upenn.cis.cis550
Home page:
Texts and readings:
    • Ramakrishnan & Gehrke, Database Management Systems, 3rd ed.
    • Supplementary papers (to be handed out in class)
    • Other books may be useful, esp. Brundage’s Using XQuery

3 Course Format and Grading
• Roughly one major topic area every one to two weeks
• Readings in the text & research papers
• Occasionally, summaries/commentary on papers (5%)
• Homework assignment for each topic area (30%)
• One midterm (10%), one final exam (20%)
• Project (30%) – groups of 3-4:
    • Build a “GMail”/Hotmail clone on top of a database, or
    • Build a P2P system for synchronizing tables
    • (Or propose your own idea)
• General participation, discussion, intangibles (5%)

4 Why This Course?
Most CS courses concentrate on code – our interest is managing and representing data
Warning: this course doesn’t focus on teaching SQL or how to be an Oracle DBA (though it will get you started)
… So what in the world are we studying for 14 weeks???

5 What Do We Do with Data?

6 Some Ways to Represent Information

7 Example: An Encyclopedia Entry
• A database is an information set with a regular structure. Its front-end allows data access, searching and sorting routines. Its back-end affords data inputting and updating. A database is usually but not necessarily stored in some machine-readable format accessed by a computer. There are a wide variety of databases, from simple tables stored in a single file to very large databases with many millions of records, stored in rooms full of disk drives or other peripheral electronic storage devices.
• Databases resembling modern versions were first developed in the 1960s. A pioneer in the field was Charles Bachman.
• The most useful way of classifying databases is by the programming model associated with the database. Several models have been in wide use for some time. Historically, the hierarchical model was implemented first, then the network model, then the relational model overcame with the so-called flat model accompanying it for low-end usage…

8 Example: To-Do List
• Buy school supplies     due 9/7
• Go to orientation       on 9/7
• Exercise                every M/W/F
• Buy Philly postcards
How does this differ from the plain text model? What might you do with it that you couldn’t do with plain text?

9 Example: Your PDA/Cell Phone
Calendar
Event   Day    When  Who        Where
Lunch   10/24  1pm   Zack       Cavanaugh’s
Advice  10/25  9am   Dr. Smith  599 Levine
Biking  10/26  9am   Jane       Pottruck
Dinner  10/26  6PM   Jane       Food Court
Contacts
Who        Phone            Office
Zack       6-2789  zives    576 Levine N
Dr. Smith  6-1234  drsmith  599 Levine
Jane               jane     2220 Walnut St.

10 What If We Want to Include Contact Info on Our Calendar?
• Do we also want to keep addresses, telephone numbers etc.?
• Should we expand the number of columns in our table:
Event  When  Who-name  Who-   Who-tel  ….  Where
Lunch  1pm   Zack      zives  6-2789   ….  Cav…
…
What is the trade-off in terms of entering data?

11 “Link” Calendar with Contacts?
• Why can’t we “link” calendar entries with contact info, and show the results of the two?
• The link could be based on something as simple as the person's name
• (What’s the danger here? What else might work better?)
• This brings up an issue – how to “follow links”
• If we were to do this in Java, how might it be done?
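One way to picture "following links" by name is a lookup table keyed on the person's name. The sketch below uses Python rather than the Java the slide mentions, and the dictionary layout and the function name link_by_name are illustrative assumptions. Jane is left out of the contacts on purpose, to show one danger of name-based links: an entry whose link leads nowhere.

```python
# Hypothetical in-memory versions of the Calendar and Contacts tables.
calendar = [
    {"event": "Lunch", "day": "10/24", "who": "Zack"},
    {"event": "Advice", "day": "10/25", "who": "Dr. Smith"},
    {"event": "Biking", "day": "10/26", "who": "Jane"},
]
contacts = [
    {"who": "Zack", "phone": "6-2789", "office": "576 Levine N"},
    {"who": "Dr. Smith", "phone": "6-1234", "office": "599 Levine"},
]


def link_by_name(calendar, contacts):
    """Follow the name 'link' from each calendar entry to a contact record."""
    # Index contacts by name; if two contacts shared a name, the later one
    # would silently overwrite the earlier one (another danger of this key).
    by_name = {c["who"]: c for c in contacts}
    linked = []
    for entry in calendar:
        contact = by_name.get(entry["who"])  # None when the link dangles
        phone = contact["phone"] if contact else None
        linked.append((entry["event"], entry["who"], phone))
    return linked


linked = link_by_name(calendar, contacts)
```

A unique identifier per person, rather than the name, would avoid both the collision and the ambiguity problems.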

12 Another Kind of Link: Classes and Subclasses
• Person has attributes:
    • ssn
    • PennID
    • set of user IDs
    • given name
    • family name
    • …
• Student IS A person who:
    • takes courses
    • is given grades
    • is taught
    • listens to lectures in class, OR over the Web, OR on videotape
• This is yet another kind of information
• How have you previously seen such relationships encoded?
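One familiar encoding of an IS-A relationship is class inheritance in a programming language. The sketch below is a minimal Python illustration; the field names, the take method, and the sample values are assumptions made for the example, not part of the course material.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    ssn: str
    penn_id: str
    given_name: str
    family_name: str
    user_ids: set = field(default_factory=set)


@dataclass
class Student(Person):
    """A Student IS A Person who also takes courses and is given grades."""
    courses: list = field(default_factory=list)
    grades: dict = field(default_factory=dict)

    def take(self, course):
        self.courses.append(course)


# Hypothetical sample values.
s = Student(ssn="000-00-0000", penn_id="12345678",
            given_name="Jill", family_name="Example")
s.take("CIS 550")
```

Every Student can be used wherever a Person is expected, which is the same "link" the slide describes between the two kinds of information.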

13 Data Representation and Modeling
• All of the data we’ve seen have an implicit data model
    • The data model includes some basic assumptions about what’s an “item” of data, how to interpret it, and so on
• The relational data model was the first model for data that is independent of its data structures and implementation
• A theory of normalization guides you in designing relations
• Concepts from the relational data model have been adapted to form object-oriented data models (with classes and subclasses), XML models, etc.
• There are “sibling” fields to databases that consider:
    • natural language models (how to understand words)
    • document models (how to match words and documents)
    • ontologies (how to define relationships between classes)

14 The DBMS Provides an Interface over the Database
• A database (DB) is a large, integrated collection of data
    • Generally is cohesive in “some” way
    • A DB models a real-world organization or unit
• A database management system (DBMS) is a software package designed to store and manage databases
    • Reliable storage & recovery of 100s of GB
    • Querying/updating interface and API (for applications and Web pages)
    • Support for many concurrent users
• Why do we need a DBMS, instead of coding in Java?

15 DBMS Benefit #1: Generality and Declarativity
• Don’t require the programmer or user to know details like indices, sort orders, machine speeds, disk speeds, concurrent users, etc.
• Instead, the programmer/user programs with a logical model in mind
• The DBMS “makes it happen” based on an understanding of relative costs of different methods

16 Benefit #2: Efficiency and Scale
• Size of personal address book is probably less than 100 entries, but there are things we'd like to do quickly and efficiently:
    • “Give me all appointments on 10/28”
    • “When am I next meeting Jim?”
• “Program” these as quickly as possible (and make them resilient to data format changes)
• Scale to a corporate calendar with hundreds of thousands of entries
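The two requests above can each be "programmed" as a single declarative query. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, the ISO-style dates, and the sample rows (including Jim's appointments and the "today" value) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (event TEXT, day TEXT, who TEXT)")
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?)",
    [
        ("Lunch", "2004-10-24", "Zack"),
        ("Review", "2004-10-28", "Jim"),
        ("Seminar", "2004-10-28", "Jane"),
        ("Coffee", "2004-11-02", "Jim"),
    ],
)

# "Give me all appointments on 10/28"
on_28 = conn.execute(
    "SELECT event FROM Calendar WHERE day = ? ORDER BY event",
    ("2004-10-28",),
).fetchall()

# "When am I next meeting Jim?" (assuming today is 2004-10-25);
# ISO dates sort correctly as plain strings.
next_jim = conn.execute(
    "SELECT MIN(day) FROM Calendar WHERE who = ? AND day >= ?",
    ("Jim", "2004-10-25"),
).fetchone()[0]
```

Neither query says how to scan or sort the data, so the same text would still work if the table grew to a corporate-sized calendar.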

17 Benefit #3: Management of Concurrency and Reliability
• Suppose other people are allowed access to your calendar and are allowed to modify it? How do we stop two people changing the file at the same time and leaving it in a physical (or logical) mess?
• Suppose the system crashes while we are changing the calendar. How do we recover our work?
• This requires a basic concept…

18 Transactions
• Key concept for concurrency is that of a transaction: an atomic sequence of database actions (read/write) on data items (e.g. calendar entry).
• Key concept for recoverability is that of a log: keeping track of all actions carried out by the DB.
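Atomicity can be seen in a small experiment: if any action in a transaction fails, none of its actions take effect. The sketch below uses Python's sqlite3 module; the table, the rows, and the function name reschedule_and_add are hypothetical, and the primary-key violation is only a convenient way to force a failure mid-transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (event TEXT PRIMARY KEY, day TEXT)")
conn.execute("INSERT INTO Calendar VALUES ('Lunch', '10/24')")
conn.commit()


def reschedule_and_add(conn):
    # "with conn" treats the statements as one transaction:
    # commit if the block succeeds, roll back if it raises.
    with conn:
        conn.execute("UPDATE Calendar SET day = '10/25' WHERE event = 'Lunch'")
        # Fails: 'Lunch' already exists, so the PRIMARY KEY is violated.
        conn.execute("INSERT INTO Calendar VALUES ('Lunch', '10/26')")


try:
    reschedule_and_add(conn)
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the UPDATE, was rolled back

rows = conn.execute("SELECT event, day FROM Calendar").fetchall()
```

The UPDATE succeeded on its own, yet the calendar still shows the original day, because the two actions were one atomic unit.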

19 The Layers of the DBMS
[Diagram (simplification!). Components: API/GUI; Optimizer; Exec. Engine; Catalog (stats, schemas); Storage Mgr (Buffer Mgr; Index/file/rec Mgr; logging, recovery); Storage. Flows between components: query, physical plan, requests, data, pages. Red = logical, blue = physical]

20 The Database Abstraction Provided by the DBMS
We think of databases at two levels:
• Logical structure:
    • What users/programmers see – program or query interface
• Physical structure:
    • Organization on disk, indices, etc.
The logical level is further split into:
• Overall database design (conceptual; seen by the DB designer)
• Views that various users get to see

21 The Three-level Architecture for Databases
[Diagram, top to bottom: View 1, View 2, …, View N; Schema (Logical, Conceptual Level); Physical Level (file organization, indexing)]
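The top level of this architecture can be sketched with an SQL view defined over the conceptual schema. The example below runs in Python's sqlite3 module; the view name JaneEvents, the column name Place, and the choice of columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the schema the DB designer sees
CREATE TABLE Calendar (Event TEXT, Day TEXT, Who TEXT, Place TEXT);
INSERT INTO Calendar VALUES ('Lunch', '10/24', 'Zack', 'Cavanaugh''s');
INSERT INTO Calendar VALUES ('Biking', '10/26', 'Jane', 'Pottruck');
INSERT INTO Calendar VALUES ('Dinner', '10/26', 'Jane', 'Food Court');

-- View level: one user's window onto the schema
CREATE VIEW JaneEvents AS
    SELECT Event, Day FROM Calendar WHERE Who = 'Jane';
""")

# The view is queried exactly like a table.
jane = conn.execute(
    "SELECT Event, Day FROM JaneEvents ORDER BY Event"
).fetchall()
```

How the rows are laid out on disk (the physical level) never appears in either the schema or the view definition.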

22 Data Independence
A user of a relational database system should be able to use the database without knowing precisely how the data is stored, e.g.:
SELECT When, Where FROM Calendar WHERE Who = 'Jane'
After all, you don't worry about IEEE floating-point representation when you do division in a Java program or with a calculator

23 More on Data Independence
Logical data independence
• Protects the user from changes in the logical structure of the data: could reorganize the calendar “schema” without changing how we query it
Physical data independence
• Protects the user from changes in the physical structure of data: could add an index on who (or sort by when) without changing how the user would write the query, but the query would execute faster (query optimization)
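Physical data independence can be tried out directly: run a query, add an index, and run the identical query text again. The sketch below uses Python's sqlite3 module; the index name idx_who and the ORDER BY clause are assumptions, and When and Where are double-quoted because they are reserved words in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Calendar (Event TEXT, Day TEXT, "When" TEXT, Who TEXT, "Where" TEXT)'
)
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "10/24", "1pm", "Zack", "Cavanaugh's"),
        ("Advice", "10/25", "9am", "Dr. Smith", "599 Levine"),
        ("Biking", "10/26", "9am", "Jane", "Pottruck"),
        ("Dinner", "10/26", "6PM", "Jane", "Food Court"),
    ],
)

QUERY = 'SELECT "When", "Where" FROM Calendar WHERE Who = ? ORDER BY Day, Event'

before = conn.execute(QUERY, ("Jane",)).fetchall()

# Physical change only: the query text stays exactly the same.
conn.execute("CREATE INDEX idx_who ON Calendar (Who)")

after = conn.execute(QUERY, ("Jane",)).fetchall()

# Ask the optimizer how it now plans to run the unchanged query.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Jane",)).fetchall()
uses_index = any("idx_who" in row[-1] for row in plan)
```

The answers are identical before and after; only the plan chosen by the optimizer changes.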

24 Presentation Layer (4th Tier): Data-Driven Web Sites
• “Data driven web sites” also add an HTML “presentation” layer on top of what we’ve seen
• Or they use XML plus “style sheets” to get the same effect
[Diagram labels: view; HTML; Processing]

25 An Issue: 80% of the World’s Data is Not in a DB!
Examples:
• scientific data (large images, complex programs that analyze the data)
• personal data
• WWW and (some of it is stored in something resembling a DBMS)
Data management is expanding to tackle these problems
• Flexibility – data management imposes many constraints to make problems solvable
• Must deal with entities outside our control
In this course, we’ll start by focusing on databases, but eventually look “outside the box” at the Web and at gluing together data from many places

26 Combining Databases with Mediators (a kind of middleware)
A layer above the three-tiered architecture, to combine multiple databases/sources on the Web
• Some of these are databases over which we have no control
• Some must be accessed in special ways
• We generally need to think about how to translate between different database formats
[Diagram labels: XML; “Mediated Schema”]

27 How Does One Build a Database?
• Start with a conceptual model
• Design & implement schema
• Write applications using DBMS and other tools
    • Many ways of doing this where the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)
    • Common applications include PHP/JSP/servlet-driven web sites
• The DBMS takes care of query optimization and execution

28 Conceptual Design
[ER diagram: entities STUDENT (sid, name), COURSE (cid, name, semester), PROFESSOR (fid, name); relationships Takes (STUDENT, COURSE) and Teaches (PROFESSOR, COURSE)]

29 Designing a Schema (Set of Relations)
• Convert to tables + constraints
• Then need to do “physical” design: the layout on disk, indices, etc.
STUDENT
sid  name
1    Jill
2    Bo
3    Maya
PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth
Takes
sid  cid
COURSE
cid  name  sem
     DB    F
     AI    S
     Arch  F03
Teaches
fid  cid
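"Tables + constraints" might look like the following when written as SQL DDL and run through Python's sqlite3 module. This is a sketch under assumptions: the column types, the composite primary keys, the foreign-key declarations, and the course ID 'c1' are all hypothetical choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE STUDENT   (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE COURSE    (cid TEXT PRIMARY KEY, name TEXT NOT NULL, sem TEXT);
CREATE TABLE PROFESSOR (fid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Takes (
    sid INTEGER REFERENCES STUDENT(sid),
    cid TEXT REFERENCES COURSE(cid),
    PRIMARY KEY (sid, cid)
);
CREATE TABLE Teaches (
    fid INTEGER REFERENCES PROFESSOR(fid),
    cid TEXT REFERENCES COURSE(cid),
    PRIMARY KEY (fid, cid)
);
INSERT INTO STUDENT VALUES (1, 'Jill'), (2, 'Bo'), (3, 'Maya');
INSERT INTO COURSE VALUES ('c1', 'DB', 'F03');  -- hypothetical course ID
INSERT INTO Takes VALUES (1, 'c1');
""")

# The constraint rejects enrollment of a student who does not exist.
try:
    conn.execute("INSERT INTO Takes VALUES (99, 'c1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrolled = conn.execute("SELECT sid, cid FROM Takes").fetchall()
```

Nothing here says anything about layout on disk or indices; that is the separate "physical" design step.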

30 Applications Use Queries in SQL
<!-- hypotheticalEmbeddedSQL: SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sID AND Takes.cID = COURSE.cid -->
• Structured Query Language
    • Based on restricted first-order logic expressions over relations
    • Not procedural – defines constraints on the output
• Converted into a query plan that exploits properties; the plan is produced by the query optimizer and run over the data by the query execution engine
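As a rough illustration of an application issuing this query, the sketch below runs it from Python through the sqlite3 module. The course IDs, semesters, and Takes rows are hypothetical sample data, and the query selects two named columns instead of * to keep the output small. Note that cid must be qualified (COURSE.cid), since both Takes and COURSE have a column of that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE Takes   (sid INTEGER, cid TEXT);
CREATE TABLE COURSE  (cid TEXT, name TEXT, sem TEXT);
INSERT INTO STUDENT VALUES (1, 'Jill'), (2, 'Bo'), (3, 'Maya');
INSERT INTO COURSE  VALUES ('c1', 'DB', 'F03'), ('c2', 'AI', 'S03');
INSERT INTO Takes   VALUES (1, 'c1'), (3, 'c1'), (1, 'c2');
""")

# The query states what the output must satisfy, not how to compute it.
rows = conn.execute("""
    SELECT STUDENT.name, COURSE.name
    FROM STUDENT, Takes, COURSE
    WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
    ORDER BY STUDENT.name, COURSE.name
""").fetchall()
```

Bo appears in no output row because no Takes row satisfies the constraints for him.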

31 Processing the Query
SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sID AND Takes.cID = COURSE.cid
[Diagram: Web Server / UI / etc; Optimizer; Execution Engine; Storage Subsystem. Query plan over STUDENT, Takes, and COURSE with operators labeled “Merge” and “Hash by cid”]
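To give a feel for what an execution engine does with a "hash by cid" step, here is a minimal in-memory hash join in Python. It is a teaching sketch, not how any particular DBMS implements the operator; the function name hash_join, the dictionary rows, the renamed columns sname and cname, and the Takes data are all assumptions. For simplicity both joins use hashing, whereas the slide's plan also contains a merge step.

```python
def hash_join(left, right, left_key, right_key):
    """Join two lists of dict rows on left[left_key] == right[right_key]."""
    # Build phase: hash the right input on its join key.
    buckets = {}
    for row in right:
        buckets.setdefault(row[right_key], []).append(row)
    # Probe phase: look up each left row's key in the hash table.
    out = []
    for l_row in left:
        for r_row in buckets.get(l_row[left_key], []):
            out.append({**l_row, **r_row})
    return out


# Hypothetical sample data; name columns are renamed to avoid collisions.
students = [{"sid": 1, "sname": "Jill"}, {"sid": 2, "sname": "Bo"},
            {"sid": 3, "sname": "Maya"}]
takes = [{"sid": 1, "cid": "c1"}, {"sid": 1, "cid": "c2"},
         {"sid": 3, "cid": "c1"}]
courses = [{"cid": "c1", "cname": "DB"}, {"cid": "c2", "cname": "AI"}]

student_takes = hash_join(students, takes, "sid", "sid")
result = hash_join(student_takes, courses, "cid", "cid")
pairs = [(r["sname"], r["cname"]) for r in result]
```

Choosing which input to hash, and whether to hash or merge at all, is exactly the kind of cost-based decision the optimizer makes on the user's behalf.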

32 DBMSs in the Real World
A huge industry for 20% of the world’s data!
• Big, mature relational databases
    • IBM, Oracle, Microsoft
• “Middleware” above these
    • SAP, PeopleSoft, dozens of special-purpose apps
    • “Application servers”
    • Integration and warehousing systems
• Current trends:
    • Web services; XML everywhere
    • Smarter, self-tuning systems

33 So What about Database Research?
• Not focusing on the problems of Oracle…
    • Understanding what’s possible to do with XML
    • Better query processing
    • Better languages for meta-info (e.g., constraints)
    • Data streams
    • Peer-to-peer architectures
    • Integrating data from different formats
• Lots of theory and systems-building
    • You’ll see familiar concepts in this course from operating systems and from complexity theory/logic
    • … And from programming languages, AI planning, …

34 In this Course...
• Study relational databases, their design, how to query, what forms of indices to use.
• Beyond relational algebra: a logical model of data (Datalog), recursion
• XML and semi-structured data models
• Understanding DB internals
    • How DBs are built
    • Performance implications
• Integrating and mediating between databases (a huge problem today)