EECS 647: Introduction to Database Systems

Slides:



Advertisements
Similar presentations
Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chapter 1 Instructor: Mirsad Hadzikadic.
Advertisements

Database: A collection of related data [Elmasri]. A database represents some aspect of real world called “miniworld” [Elmasri] or “enterprise” [Ramakrishnan].
Ch1: File Systems and Databases Hachim Haddouti
CS186 - Introduction to Database Systems Spring Semester 2006 Prof. Michael Franklin “Knowledge is of two kinds: we know a subject ourselves, or we know.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 1- 1.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 1- 1 Outline Types of Databases and Database Applications Basic Definitions Typical DBMS Functionality.
Chapter 1 Databases Introduction The content 1).Database Application 2).Database concept 3).An example of database 4).Function of DBMS 5).DBMS product.
Course Introduction Introduction to Databases Instructor: Joe Bockhorst University of Wisconsin - Milwaukee.
Web-Enabled Decision Support Systems
COP Introduction to Database Systems Prof. Feifei Li.
1 INTRODUCTION TO DATABASE MANAGEMENT SYSTEM L E C T U R E
CS6530 Graduate-level Database Systems Prof. Feifei Li.
 DATABASE DATABASE  DATABASE ENVIRONMENT DATABASE ENVIRONMENT  WHY STUDY DATABASE WHY STUDY DATABASE  DBMS & ITS FUNCTIONS DBMS & ITS FUNCTIONS 
Introduction: Databases and Database Users
Database Management Systems 1 Ramakrishnan & Gehrke Introduction to Database Systems Chpt 1 Instructor: Xintao Wu.
Organizing Data and Information AD660 – Databases, Security, and Web Technologies Marcus Goncalves Spring 2013.
1Mr.Mohammed Abu Roqyah. Introduction and Conceptual Modeling 2Mr.Mohammed Abu Roqyah.
INFS614, Dr. Brodsky, GMU1 Database Management Systems INFS 614 Instructor: Professor Alex Brodsky
CS 505: Intermediate Topics in Database Systems Instructor: Jinze Liu Fall 2008.
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Associate.
1 CS3431 – Database Systems I Introduction Instructor: Mohamed Eltabakh
1-1 Chapter 1 Databases and Database Users 1.1 Introduction 1.2 An Example 1.3 Characteristics of the Database Approach 1.4 Actors on the Scene 1.5 Workers.
CS 405G: Introduction to Database Systems Lecture 1: Introduction.
1 Geog 357: Data models and DBMS. Geographic Decision Making.
Database Management Systems.  Instructor: Yrd. Doç. Dr. Cengiz Örencik   Course material.
CS 405G: Introduction to Database Systems Lecture 1: Introduction.
CS 405G: Introduction to Database Systems Lecture 1: Introduction.
CS3431: C-Term CS3431 – Database Systems I Introduction Instructor: Mohamed Eltabakh
Copyright © 2004 Pearson Education, Inc. Chapter 1 Introduction and Conceptual Modeling.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1.
1 CENG 351 CENG 351 Introduction to Data Management and File Structures Department of Computer Engineering METU.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
© 2017 by McGraw-Hill Education. This proprietary material solely for authorized instructor use. Not authorized for sale or distribution in any manner.
Database Principles: Fundamentals of Design, Implementation, and Management Chapter 1 The Database Approach.
What is a database? (a supplement, not a substitute for Chapter 1…) some slides copied/modified from text Collection of Data? Data vs. information Example:
10/3/2017.
Chapter 1 Database and Database Users
Introduction to Database Systems Chapter 1
CS4222 Principles of Database System
Introduction To DBMS.
Fundamentals of Information Systems, Sixth Edition
Database Management Systems
Introduction Instructor: Elke A. Rundensteiner
Outline Types of Databases and Database Applications Basic Definitions
Chapter 1: Introduction
Chapter 1: Introduction
Database and Database Users
Introduction Instructor: Mohamed Eltabakh
Introduction: Databases and Database Users
Fundamentals of Information Systems, Sixth Edition
Fundamentals & Ethics of Information Systems IS 201
7/4/2018.
Tools for Memory: Database Management Systems
9/22/2018.
Database Management System (DBMS)
11/14/2018.
1/2/2019.
Database Management Systems CSE594
CS186: Introduction to Database Systems
CS 405G: Introduction to Database Systems
Introduction Instructor: Mohamed Eltabakh
CS 405G Introduction to Database Systems
Terms: Data: Database: Database Management System: INTRODUCTION
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Director.
Data Independence Applications insulated from how data is structured and stored. Logical data independence: Protection from changes in logical structure.
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Associate.
Is the WWW a DBMS? = Fairly sophisticated search available
Database management systems
Presentation transcript:

EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2007

Luke Huan Univ. of Kansas Queries for Today What is a database? What is a database management system? Why take a database course? Who will teach? How to take the class? Preview of class contents 12/31/2018 Luke Huan Univ. of Kansas

What: Database Systems Then 12/31/2018 Luke Huan Univ. of Kansas

What: Database Systems Today 12/31/2018 Luke Huan Univ. of Kansas

What: Database Systems Today 12/31/2018 Luke Huan Univ. of Kansas

What: Database Systems Today 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas So… What is a Database? A database is a very large, integrated collection of data. Data is a group of facts that can be recorded and have an implicit meaning. Typically a database is used to model a real-world “enterprise” (or a miniworld) Entities (e.g., basketball teams, games) Relationships (e.g. KU’s basketball team won <you name it> last week) Might surprise you how flexible this is Web search: Entities: words, documents Relationships: word in document, document links to document. P2P filesharing: Entities: words, filenames, hosts Relationships: word in filename, file available at host 12/31/2018 Luke Huan Univ. of Kansas

Main Characteristics of Databases Self-describing nature of a database system A DBMS catalog stores the description of the database. The description is called meta-data. Insulation between programs and data Allows changing data storage structures and operations without having to change the DBMS access Data Abstraction Use data model to hide storage details and present the users with a conceptual view of the database Support of multiple views of the data Each user may see a different view of the database, which describes only the data of interest to that user. Sharing of data and multi-user transaction processing 12/31/2018 Luke Huan Univ. of Kansas

Databases make these folks happy ... End users in many fields Business, education, science, … DB application programmers Build data entry & analysis tools on top of DBMSs Build web services that run off DBMSs Database administrators (DBAs) Design logical/physical schemas Handle security and authorization Data availability, crash recovery Database tuning as needs evolve DBMS vendors, programmers Oracle, IBM, MS … …must understand how a DBMS works 12/31/2018 Luke Huan Univ. of Kansas 21

What is a Database Management System? A Database Management System (DBMS) is a collection of programs that enable uses to create and maintain databases store, manage, and access data in a databases. Typically this term is used narrowly Relational databases with transactions E.g. Oracle, DB2, SQL Server Mostly because they predate other large repositories Also because of technical richness When we say DBMS in this class we will usually follow this convention But keep an open mind about applying the ideas! 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas What: Is the WWW a DBMS? Fairly sophisticated search available Crawler indexes pages on the web Keyword-based search for pages But, currently data is mostly unstructured and untyped search only: can’t modify the data can’t get summaries, complex combinations of data few guarantees provided for freshness of data, consistency across data items, fault tolerance, … web sites typically have a (relational) DBMS in the background to provide these functions. 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas What: Is the WWW a DBMS? The picture is changing quickly Information Extraction to get structure from unstructured New standards e.g., XML, Semantic Web can help data modeling 12/31/2018 Luke Huan Univ. of Kansas

What: “Search” vs. Query What if you wanted to find out which actors donated to John Kerry’s presidential campaign? Try “actors donated to john kerry” in your favorite search engine. If it isn’t “published”, it can’t be searched! 12/31/2018 Luke Huan Univ. of Kansas

What: A “Database Query” Approach 12/31/2018 Luke Huan Univ. of Kansas

What: Is a File System a DBMS? Thought Experiment 1: You and your project partner are editing the same file. You both save it at the same time. Whose changes survive? Q: How do you write programs over a subsystem when it promises you only “???” ? A: Very, very carefully!! A) Yours B) Partner’s C) Both D) Neither E) ??? Thought Experiment 2: You’re updating a file. The power goes out. Which changes survive? A) All B) None C) All Since Last Save D) ??? 12/31/2018 Luke Huan Univ. of Kansas

OS Support for Data Management Data can be stored in RAM this is what every programming language offers! RAM is fast, and random access Isn’t this heaven? Every OS includes a File System manages files on a magnetic disk allows open, read, seek, close on a file allows protections to be set on a file drawbacks relative to RAM? 12/31/2018 Luke Huan Univ. of Kansas

Database Management Systems What more could we want than a file system? Simple, efficient ad hoc1 queries concurrency control recovery benefits of good data modeling S.M.O.P.2? Not really… as we’ll see this semester in fact, the OS often gets in the way! 1ad hoc: formed or used for specific or immediate problems or needs 2SMOP: Small Matter Of Programming 12/31/2018 Luke Huan Univ. of Kansas

Current Commercial Outlook A major part of the software industry: Oracle, IBM, Microsoft also Sybase, Informix (now IBM), Teradata smaller players: java-based dbms, devices, OO, … Lots of related industries data warehouse, document management, storage, backup, reporting, business intelligence, ERP, CRM, app integration Traditional Relational DBMS products dominant and evolving adapted for extensibility (user-defined types), native XML support. Microsoft merger of file system/DB…? Open Source coming on strong MySQL, PostgreSQL, Apache Derby, BerkeleyDB, Ingres, EigenBase And of course, the other “database” technologies Search engines, P2P, etc. 12/31/2018 Luke Huan Univ. of Kansas

Advantages of a DBMS: a short list Controlling redundancy Restrict unauthorized access Providing persistent storage for program objects Providing storage structure for efficient query processing Providing backup and crash recovery …. And many many others that are going to be explored in this class 12/31/2018 Luke Huan Univ. of Kansas

What database systems will we cover? We will be try to be broad and touch upon Relational DBMS (e.g. Oracle, SQL Server, DB2, Postgres) “Semi-structured” DB systems (e.g. XML repositories like Xindice) Data mining: transfer data into knowledge! Starting point We assume you have used web search engines We assume you don’t know relational databases Yet they pioneered many of the key ideas So focus will be on relational DBMSs With frequent side-notes on search engines, XML issues 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? Database systems are at the core of CS They are incredibly important to society The topic is intellectually rich It isn’t that much work Looks good on your resume Let’s spend a little time on each of these 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? A. Database systems are the core of CS Shift from computation to information True in corporate computing for years Web, p2p made this clear for personal computing Increasingly true of scientific computing Need for DB technology has exploded in the last years Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc. Web:not just “documents”. Search engines, e-commerce, blogs, wikis, other “web services”. Scientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data Personal: Music, photo, & video libraries. Email archives. File contents (“desktop search”). 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? B. DBs are incredibly important to society “Knowledge is power.” -- Sir Francis Bacon “With great power comes great responsibility.” -- SpiderMan’s Uncle Ben Policy-makers should understand technological possibilities. Informed Technologists needed in public discourse on usage. 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? C. The topic is intellectually rich. representing information data modeling languages and systems for querying data complex queries & query semantics* over massive data sets concurrency control for data manipulation controlling concurrent access ensuring transactional semantics reliable data storage maintain data semantics even if you pull the plug data mining Let your data speak * semantics: the meaning or relationship of meanings of a sign or set of signs 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? D. It isn’t that much work. Bad news: It is a lot of work. Good news: the course is front loaded Most of the hard work is in the first half of the semester Load balanced with most other classes 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Why take this class? E. Looks good on my resume. Yes, but why? This is not a course for: Oracle administrators IBM DB2 engine developers Though it’s useful for both! It is a course for well-educated computer scientists Database system concepts and techniques increasingly used “outside the box” Ask your friends at Microsoft, Yahoo!, Google, Apple, etc. Actually, they may or may not realize it! A rich understanding of these issues is a basic and (un?)fortunately unusual skill. 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Who? Instructors Prof. Luke Huan, EECS jhuan@eecs.ku.edu Prof. Office Hours: Eaton Hall 2034, M, W 4:15-5:15pm 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas How? Workload Projects with a “real world” focus: Build a web-based application w/PostgreSQL and your favorite internet programming languge (PHP, JAVA, PERL, just naming a few) Homework assignments and quizzes Exams – 1 Midterm & 1 Final Projects to be done in groups of 2 Pick your partner ASAP The course is “front-loaded” most of the hard work is in the first half 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas How? Administrative Text book: Elmasri & Navatgh 5th edition Class website - people.eecs.ku.edu/~jhuan/EECS647 read it regularly Please include eecs 647 in your mail to the instructor (for quick response) Grading, hand-in policies, etc. is on syllabus * The textbook will be available at the KU bookstore today or tomorrow 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Academic Misconduct We take academic misconduct very seriously. We encourage student to work together for doing homework and projects, but each student should write down their own solution. You are absolutely encouraged to discuss with your classmates about your homework assignments, programming exercises, and final projects You are responsible for all your works For homework assignments and projects resulted from discussion, please always acknowledge other people’s contribution by including a sentence at the beginning of the hand-in saying “I discussed the homework (project) with XXX (include more names if necessary)”. There is absolutely NO penalty for doing so. 12/31/2018 Luke Huan Univ. of Kansas

Agenda for the rest of today Today: a preview of the class contents: ER model Logical database design File system Transaction Data mining Database in the future This Wednesday the Entity-Relationship model Today’s lecture is from Chapter 1 and 2 in E & N Read Chapter 3 for next class. Relational model SQL Query processing XML 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Agenda … Design a Database ER model Relational model Database logical design Query with SQL 12/31/2018 Luke Huan Univ. of Kansas

Data Models Describe Data A data model is a collection of concepts for describing data. Different levels of data models Conceptual (semantic) data model uses data (facts) to describe a physical world Representational model describes data in database management systems Physical data model describe how data is stored in files in computers. 12/31/2018 Luke Huan Univ. of Kansas 5

An ER Model for a mini-World Salary name Employees ssn name manager number Departments Works_For Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes. Relationship: Association among two or more entities. E.g., John works in Pharmacy department. 4

Relational Model Describes Data Relational model is the most popular representational data model that are used by current commercial DBMS. Relational model describes data using tables SSN Name Salary 123456789 John Smith 30000 333444555 Franklin Wong 40000 453453453 Joyce English 25000 Salary name Employees ssn Cardinality = 3, degree = 3, all rows distinct Do all columns in a relation instance have to be distinct? 12/31/2018 Luke Huan Univ. of Kansas 5

Logical Database Design Salary name Employees ssn name manager number Departments Works_For SSN Name Salary 123456789 John Smith 30000 333444555 Franklin Wong 40000 453453453 Joyce English 25000 DNumber DName Manager 5 research 123456789 SSN Dnum 123456789 5 333444555 453453453 12/31/2018 Luke Huan Univ. of Kansas

Logical Database Design (Cont.) Salary name Employees ssn name manager number Departments Works_For SSN Name Salary DNum 123456789 John Smith 30000 5 333444555 Franklin Wong 40000 453453453 Joyce English 25000 DNumber DName Manager 5 research 123456789 12/31/2018 Luke Huan Univ. of Kansas

Logical Database Design (Cont.) Salary name Employees ssn name manager number Departments Works_For SSN Name Salary DNumber DName Manager 123456789 John Smith 30000 5 research 333444555 Franklin Wong 40000 453453453 Joyce English 25000 12/31/2018 Luke Huan Univ. of Kansas

SQL: Query Information from a Database Query 1: Retrieve the salary of Franklin Wong Q1: SELECT SALARY FROM EMPLOYEE WHERE NAME=‘Franklin Wong‘ Answer: 40000 SSN Name Salary 123456789 John Smith 30000 333444555 Franklin Wong 40000 453453453 Joyce English 25000 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Agenda … Design a Database Management System File systems Disk storage Indexing Query processing Transactions Concurrency Recovery 12/31/2018 Luke Huan Univ. of Kansas

Components of a Disk The platters spin (say, 90rps). Spindle Tracks Disk head The platters spin (say, 90rps). The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!). Sector Arm movement Platters Arm assembly 21

Indexing is the Key to Speed up Disk Operations ``Find all employees whose salary > $25000’’ If data is in sorted file, do binary search to find first such employee, then scan to find others. Cost of binary search can be quite high. Simple idea: Create an `index’ file. Index File k1 k2 kN Data File Page 1 Page 2 Page 3 Page N Can do binary search on (smaller) index file! 3

Concurrent execution of user programs Why? Utilize CPU while waiting for disk I/O (database programs make heavy use of disk) Avoid short programs waiting behind long ones e.g. ATM withdrawal while bank manager sums balance across all accounts 12/31/2018 Luke Huan Univ. of Kansas

Key concept: Transaction an atomic sequence of database actions (reads/writes) takes DB from one consistent state to another transaction consistent state 1 consistent state 2 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Example Transaction “transfer $100 from Saving to checking” checking: $300 savings: $900 checking: $200 savings: $1000 Here, consistency is based on our knowledge of banking “semantics” In general, up to writer of transaction to ensure transaction preserves consistency DBMS provides (limited) automatic enforcement, via integrity constraints e.g., balances must be >= 0 12/31/2018 Luke Huan Univ. of Kansas

Luke Huan Univ. of Kansas Agenda … Advanced topics XML Data mining Association Classification Clustering Database in the future 12/31/2018 Luke Huan Univ. of Kansas