Introduction Zachary G. Ives University of Pennsylvania CIS 650 – Implementing Data Management Systems January 10, 2005.

Presentation transcript:

2 Welcome to CIS 650, Database and Information Systems!
Instructor: Zachary Ives
  • 576 Levine Hall North
  • Office hours: Tuesday, 2:30-3:30 PM (before colloquium)
Home page:
Texts and readings:
  • Hellerstein and Stonebraker: Readings in Database Systems, 4th ed. (should be available soon)
  • Supplementary papers (will be linked via schedule)

3 Course Format and Grading
Very discussion-oriented; about one topic area per week or two
  • Readings in the text and other research papers: summaries/commentary on papers (20%)
“Midterm report” (25%)
  • You’ll take one of the topics we’ve discussed and write a summary and synthesis paper
  • Graded for organization, clarity, grammar, etc. as well as content
Project (50%); you may choose to work in teams:
  • Implementation
  • Experimentation / validation
  • Project report (should be in the style of a research paper)
  • Brief (~15-minute) presentation
Participation, discussion, intangibles (5%)
At the end, you should be equipped to do research in this field, or to take ideas from databases and apply them to your field

4 So What Is This Course About?
Not how to build an Oracle-driven Web site…
… nor even how to build Oracle…

5 What Is Unique about Data Management?
  • It’s been said that databases and data management focus on scalability to huge volumes of data
  • What is it that makes this possible, and what makes the work interesting if NOT at huge scale?
  • Why are data management techniques useful in situations where scale isn’t the bottleneck?

6 The Key Principle: Data Independence
  • Most methods of programming don’t separate the logical and physical representations of data
  • The data structures, access methods, etc. are all given via interfaces!
  • The relational data model was the first model for data that is independent of its data structures and implementation

7 What Is Data Independence?
  • Codd points out that previous methods had:
  • Order dependence
  • Index dependence
  • Access path dependence
  • Still true in today’s Java/C#: what is the drawback?
  • What might you be able to do in removing those?
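
The three dependences above can be made concrete with a small sketch. Everything here (the `employees` data, the `Relation` class, and its method names) is hypothetical and only meant to contrast code that knows the storage layout with code that names attributes and leaves the access path to the system.

```python
# Hypothetical example data; names are illustrative, not from the course.
employees = [
    ("alice", "db", 120),
    ("bob", "os", 100),
    ("carol", "db", 110),
]

# Physical-style access: the caller must know that rows are tuples in a
# list, that the department is field 1, and that the salary is field 2.
def db_salaries_physical(rows):
    result = []
    for row in rows:
        if row[1] == "db":
            result.append(row[2])
    return result

# Logical-style access: the caller names attributes and states a condition.
class Relation:
    def __init__(self, attributes, rows):
        self.attributes = tuple(attributes)
        self._rows = {tuple(r) for r in rows}  # a set: no order to depend on
        self._index = {}                       # auxiliary structure, hidden from callers

    def build_index(self, attribute):
        pos = self.attributes.index(attribute)
        idx = {}
        for row in self._rows:
            idx.setdefault(row[pos], set()).add(row)
        self._index[attribute] = idx

    def select_eq(self, attribute, value):
        # Use an index if one exists, otherwise scan; the answer is the same.
        if attribute in self._index:
            rows = self._index[attribute].get(value, set())
        else:
            pos = self.attributes.index(attribute)
            rows = {r for r in self._rows if r[pos] == value}
        return Relation(self.attributes, rows)

    def project(self, *attributes):
        positions = [self.attributes.index(a) for a in attributes]
        return {tuple(r[p] for p in positions) for r in self._rows}

emp = Relation(("name", "dept", "salary"), employees)
before = emp.select_eq("dept", "db").project("salary")
emp.build_index("dept")  # a physical change; the caller's query is unchanged
after = emp.select_eq("dept", "db").project("salary")
```

Adding the index changes how `select_eq` finds rows but not what the caller writes or receives, which is the property the slide asks about.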

8 The Relational Data Model
More than just tables!
  • True relations: sets of tuples
  • The only data representation a user/programmer “sees”
  • Explicit encoding of everything in values
Additional integrity constraints
  • Key constraints, functional dependencies, …
General and universal means of encoding everything!
  • (Semantics are pushed to queries)
A secondary concept: views
  • Define virtual, derived relations that are always “live”
  • A way of encapsulating, abstracting data
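
As a rough illustration of a view being “live”, the sketch below uses Python's built-in `sqlite3` module. The table names, the view name `enrollment_count`, and the rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE takes (sid INTEGER REFERENCES student(sid), cid TEXT,
                        PRIMARY KEY (sid, cid));
    -- A view is a named query: a virtual, derived relation.
    CREATE VIEW enrollment_count AS
        SELECT s.name AS name, COUNT(t.cid) AS n_courses
        FROM student s LEFT JOIN takes t ON s.sid = t.sid
        GROUP BY s.sid, s.name;
""")
conn.execute("INSERT INTO student VALUES (1, 'Ann'), (2, 'Raj')")
conn.execute("INSERT INTO takes VALUES (1, 'CIS650')")

def counts():
    # The view is re-evaluated over the current base tables on every read.
    return dict(conn.execute("SELECT name, n_courses FROM enrollment_count"))

first = counts()
conn.execute("INSERT INTO takes VALUES (2, 'CIS650'), (2, 'CIS550')")
second = counts()  # reflects the new rows without redefining anything
```

The caller only ever reads `enrollment_count`; how the counts are derived stays encapsulated in the view definition.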

9 Constraints and Normalization
  • Fundamental idea: we don’t want to build semantics into the data model, but we want to be able to encode certain constraints
  • Functional dependencies, key constraints, foreign-key constraints, multivalued dependencies, join dependencies, etc.
  • Allows limited data validation, plus opportunities for optimization
  • The theory of normalization (see CSE 330, CIS 550) makes use of known constraints
  • Idea: eliminate redundancy, in order to maintain consistency in the presence of updates
  • (Note that there’s no reason for normalization of data in views!)
  • Ergo, XML???
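
A minimal sketch of how a functional dependency ties redundancy to update anomalies. The relation, its attribute names, and the instructor values are invented for illustration, and `holds_fd` and `project` are hypothetical helpers.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # disagreement violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Set-valued projection: duplicates collapse, as in a true relation."""
    return {tuple(row[a] for a in attrs) for row in rows}

# One wide table: the instructor is repeated for every enrolled student.
enrollment = [
    {"sid": 1, "cid": "CIS650", "instructor": "prof_a"},
    {"sid": 2, "cid": "CIS650", "instructor": "prof_a"},
    {"sid": 1, "cid": "CIS550", "instructor": "prof_b"},
]
fd_ok = holds_fd(enrollment, ["cid"], ["instructor"])

# Update anomaly: changing one copy of the repeated fact breaks the FD.
broken = [dict(r) for r in enrollment]
broken[0]["instructor"] = "prof_c"
fd_after_update = holds_fd(broken, ["cid"], ["instructor"])

# Decomposing along the FD stores each (cid, instructor) fact exactly once.
course = project(enrollment, ["cid", "instructor"])
takes = project(enrollment, ["sid", "cid"])
```

After the decomposition there is only one place to update a course's instructor, so the inconsistent state in `broken` cannot be represented.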

10 Relational Completeness (Plus Extensions): Declarativity
What is special about relational query languages that makes them amenable to scalability?
  • Limited expressiveness, particularly when we consider conjunctive queries (even with recursion)
  • Guaranteed polytime execution in size of data
  • Can reason about containment, invert them, etc.
  • “Magic sets”
  • (What about XQuery’s Turing-completeness???)
  • Equivalence between relational calculus and algebra
  • Calculus → fully declarative, basis of query languages
  • Algebra → imperative but polytime, basis of runtime systems
  • Predictability of operations → cost models
  • Ability to supplement data with auxiliary structures for performance
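
The calculus/algebra equivalence can be sketched on toy data. The relations and the `select`, `project`, and `join` helpers below are assumptions for illustration; the set comprehension stands in for a calculus-style description of the answer, while the two algebra expressions are different plans that produce it.

```python
student = {(1, "Ann"), (2, "Raj")}            # (sid, name)
takes = {(1, "CIS650"), (2, "CIS550")}        # (sid, cid)

# Calculus style: describe which tuples belong to the answer.
calculus_answer = {
    (name,)
    for (sid, name) in student
    for (tsid, cid) in takes
    if sid == tsid and cid == "CIS650"
}

# Algebra style: a small set of operators composed into a plan.
def select(rel, pred):
    return {t for t in rel if pred(t)}

def project(rel, *positions):
    return {tuple(t[p] for p in positions) for t in rel}

def join(left, right, lpos, rpos):
    return {l + r for l in left for r in right if l[lpos] == r[rpos]}

# Plan A: join first, then filter on cid (position 3 of the joined tuple).
plan_a = project(
    select(join(student, takes, 0, 0), lambda t: t[3] == "CIS650"), 1)

# Plan B: push the selection below the join, so the join sees fewer tuples.
plan_b = project(
    join(student, select(takes, lambda t: t[1] == "CIS650"), 0, 0), 1)
```

Because both plans are provably equivalent to the declarative form, an optimizer is free to pick whichever its cost model predicts is cheaper.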

11 Concurrency and Reliability
(Generally requires full control)
  • Another key element of databases: ACID properties
  • Atomicity, Consistency, Isolation, Durability
  • Transaction: an atomic sequence of database actions (read/write) on data items (e.g. calendar entry)
  • Recoverability via a log: keeping track of all actions carried out by the database
  • How do distributed systems, Web services, service-oriented architectures, and the like affect these properties?
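
A small sketch of atomicity, using Python's built-in `sqlite3` module. The `account` table, the `transfer` function, and the balances are hypothetical; the point is only that a failed transaction leaves no partial effects behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit; return success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # If this debit violates the CHECK constraint, the credit
            # above is undone as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 30)       # succeeds
failed = transfer(conn, "a", "b", 500)  # would overdraw 'a', so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits `b` before the debit fails, yet the final balances show no trace of it: either both updates happen or neither does.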

12 Other Data Models
  • Concepts from the relational data model have been adapted to form object-oriented data models (with classes and subclasses), XML models, etc.
  • But doesn’t this result in some loss of logical-physical independence?
  • GMAP and answering queries using views?

13 What Is a Data Management System?
  • Of course, there are traditional databases
  • The focus of most work in the past 25 years
  • “Tight loops” due to locally controlled data
  • Indexing, transactions, concurrency, recovery, optimization
  • But…

14 80% of the World’s Data is Not in Databases!
Examples:
  • Scientific data (large images, complex programs that analyze the data)
  • Personal data
  • WWW and (some of it is stored in something resembling a DBMS)
  • Network traffic logs
  • Sensor data
  • Are there benefits to declarative techniques and data independence in tackling these issues?
  • XML is a great way to make this data available
  • Also need to deal with data we don’t control and can’t guarantee consistency over

15 An Example of Data Management with Heterogeneity: Data Integration
A layer above heterogeneous sources, to combine them under a unified logical abstraction
  • Some of these are databases over which we have no control
  • Some must be accessed in special ways
  • Data integration system translates queries over mediated schema to the languages of the sources; converts answers to mediated schema
[Figure labels: XML, “Mediated Schema”]
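
A deliberately simplified sketch of the mediator idea. The two sources, their field names, and the wrapper functions are all hypothetical. A real data integration system would translate the query into each source's own language and push work to the source; this sketch instead pulls every tuple through a wrapper and filters in the mediator.

```python
# Two hypothetical sources with different native formats and field names.
source_a = [("Paper A", 1976), ("Paper B", 1981)]       # rows: (title, year)
source_b = [{"name": "Paper C", "published": "1970"}]   # records, year as text

# Wrappers convert each source's answers into the mediated schema,
# here a dict with the keys "title" and "year".
def wrap_a():
    for title, year in source_a:
        yield {"title": title, "year": year}

def wrap_b():
    for rec in source_b:
        yield {"title": rec["name"], "year": int(rec["published"])}

def query_mediated(predicate, wrappers=(wrap_a, wrap_b)):
    """Answer a query posed over the mediated schema across all sources."""
    answers = []
    for wrapper in wrappers:
        answers.extend(t for t in wrapper() if predicate(t))
    return sorted(answers, key=lambda t: (t["year"], t["title"]))

# The caller sees only the mediated schema, never the source formats.
before_1980 = query_mediated(lambda t: t["year"] < 1980)
```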

16 Other Interesting Points
Data streams and sensor data
  • How do we process infinite amounts of data?
Peer-to-peer architectures
  • What’s the best way of finding data here?
Personal information management
  • Can we use integration-style concepts and a bit of AI to manage associations between our data?
Web search
  • What’s the back-end behind Google?
Semantic Web
  • How do we semantically interrelate data to build a better Web?

17 Layers of a Typical Data Management System
[Figure (a simplification): layered architecture. Components: API/GUI, Optimizer, Exec. Engine, Physical retrieval, Access Methods, Buffer Mgr, Source, Catalog, and Logging/recovery. Labeled flows: Query, Physical plan, Requests, Data, Pages, Stats, Schemas. Red = logical, blue = physical.]

18 Query Answering in a Data Management System
  • Based on declarative query languages
  • Based on restricted first-order logic expressions over relations
  • Not procedural: defines constraints on the output
  • Converted into a query plan that exploits properties; run over the data by the query optimizer and query execution engine
  • Data may be local or remote
  • Data may be heterogeneous or homogeneous
  • Data sources may have different interfaces, access methods, etc.
  • Most common query languages:
  • SQL (based on tuple relational calculus)
  • Datalog (based on domain relational calculus, plus fixpoint)
  • XQuery (functional language; has an XML calculus core)

19 Processing the Query
SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sID
  AND Takes.cID = COURSE.cid
[Figure: a query plan over STUDENT, Takes, and COURSE with the operators Merge and Hash by cid, shown against the layers Web Server / UI / etc., Optimizer, Execution Engine, and Storage Subsystem.]
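
One plausible reading of the plan in this slide is a merge join of STUDENT and Takes on sid, followed by a hash join with COURSE on cid. The sketch below follows that reading as an assumption; the rows and the function names `merge_join` and `hash_join` are made up for illustration.

```python
from collections import defaultdict

student = [(1, "Ann"), (2, "Raj"), (3, "Lee")]                     # (sid, name)
takes = [(2, "CIS550"), (1, "CIS650"), (1, "CIS550")]              # (sid, cid)
course = [("CIS550", "Databases"), ("CIS650", "Data Management")]  # (cid, title)

def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then advance two cursors in step."""
    left = sorted(left, key=lambda t: t[lkey])
    right = sorted(right, key=lambda t: t[rkey])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][lkey], right[j][rkey]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            # Find the run of right tuples sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lv:
                j_end += 1
            # Pair every left tuple with that key against the whole run.
            while i < len(left) and left[i][lkey] == lv:
                out.extend(left[i] + right[k] for k in range(j, j_end))
                i += 1
            j = j_end
    return out

def hash_join(left, right, lkey, rkey):
    """Build a hash table on the right input, then probe it with the left."""
    table = defaultdict(list)
    for r in right:
        table[r[rkey]].append(r)
    return [l + r for l in left for r in table.get(l[lkey], [])]

# Output layout: (sid, name, sid, cid) and then (sid, name, sid, cid, cid, title).
student_takes = merge_join(student, takes, 0, 0)
result = hash_join(student_takes, course, 3, 0)
```

The query text fixes only the answer. Swapping either operator for the other, or joining Takes with COURSE first, would be an equally valid plan with a different cost.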

20 DBMSs in the Real World
  • Big, mature relational databases
  • IBM, Oracle, Microsoft
  • “Middleware” above these
  • SAP, PeopleSoft, dozens of special-purpose apps
  • “Application servers”
  • Integration and warehousing systems
  • Current trends:
  • Web services; XML everywhere
  • Smarter, self-tuning systems
  • Stream systems

21 Our Agenda this Semester
  • Reading the canonical papers in the data management literature
  • Some are very systems-y
  • Some are very experimental
  • Some are highly algorithmic, complexity-oriented
  • Gaining an understanding of the principles of building systems to handle declarative queries over large volumes of data

22 For Next Time
  • Skim Codd if you haven’t already
  • Read the overview papers of the first two database systems:
  • Astrahan et al., pp
  • Wong et al. (skip Section 2; focus on pp. 200-)
  • Write a summary of your assigned paper and email it to me
  • Key question: how well did this system mesh with Codd’s relational model? (You may need to skim through other aspects of your assigned paper to help answer that question)