CENG 351 Introduction to Data Management and File Structures

CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU CENG 351

CENG 351
Instructor: Nihan Kesim Çiçekli
Office: A308
Email: nihan@ceng.metu.edu.tr
Lecture Hours:
  Section 1: Mon. 13:40, 14:40 (BMB4); Thu. 10:40 (BMB4)
  Section 2: Wed. 9:40, 10:40 (BMB5); Thu. 11:40 (BMB1)
Course Web page: http://cow.ceng.metu.edu.tr
Teaching Assistants:
  Emre Işıklıgil, Office: A402, isikligil@ceng.metu.edu.tr
  Alev Mutlu, Office: A302, mutlu@ceng.metu.edu.tr
  Abdullah Doğan, Office: A206, adogan@ceng.metu.edu.tr

References
  R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd ed., McGraw-Hill, 2003.
  R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 4th ed., Addison-Wesley, 2004.

Course Outline
  Introduction to relational database systems
  Relational Model and E/R Modeling
  Relational Algebra, Relational Calculus
  Structured Query Language (SQL)
  Secondary Storage Media
  Fundamental File Structure Concepts
  Sequential File Processing
  External Sorting of Large Files
  Indexing: Multilevel Indexing and B+ trees
  Hashing (static, linear, extendible hashing)
  SQL Query Evaluation and Optimization issues

Grading
  Assignments: 25%
  Attendance/Quiz: 5%
  Midterm Exam 1: 20%
  Final: 30%
Tentative Exam Dates:
  Midterm Exam 1: Nov. 15, 2012
  Midterm Exam 2: Dec. 20, 2012

Grading Policies
Policy on missed midterm: no make-up exam.
Lateness policy: Every student has a total of 5 days of late submission for assignments. This credit can be spent on any single assignment or distributed across all of them. If the total lateness exceeds the limit, a penalty of 10*day*day is applied.
All assignments and programs are to be your own work. No group projects or assignments are allowed.

Introduction to File Management

Motivation
Most computers are used for data processing, a big growth area in the “information age”.
This course covers data processing from a computer science perspective:
  Storage of data
  Organization of data
  Access to data
  Processing of data

Data Structures vs. File Structures
Both involve: representation of data + operations for accessing data.
Difference:
  Data structures deal with data in main memory.
  File structures deal with data in secondary storage.

Where do File Structures fit in Computer Science?
Layers, from top to bottom:
  Application
  DBMS
  File system
  Operating System
  Hardware

Computer Architecture
Main Memory (RAM): data is manipulated here.
  Semiconductors; fast, expensive, volatile, small.
Data is transferred between main memory and secondary storage.
Secondary Storage: data is stored here.
  Disks, tape; slow, cheap, stable, large.

Advantages and Disadvantages
Advantages:
  Main memory is fast.
  Secondary storage is big (because it is cheap).
  Secondary storage is stable (non-volatile), i.e., data is not lost during power failures.
Disadvantages:
  Main memory is small. Many databases are too large to fit in main memory (MM).
  Main memory is volatile, i.e., data is lost during power failures.
  Secondary storage is slow: roughly 250,000 times slower than MM, using the typical access times on the next slide.

How fast is main memory?
Typical time for getting information from:
  Main memory: ~120 nanosec = $120 \times 10^{-9}$ sec
  Magnetic disks: ~30 millisec = $30 \times 10^{-3}$ sec
An analogy keeping the same time proportion as above:
  Looking at the index of a book: 20 sec
  versus
  Going to the library: 58 days
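The 20-seconds-versus-58-days analogy follows directly from the ratio of the two access times. A minimal C++ sketch of that arithmetic, using the $120 \times 10^{-9}$ s and $30 \times 10^{-3}$ s figures quoted above (the constant and function names are illustrative only):

```cpp
#include <cassert>
#include <cmath>

// Typical access times quoted on the slide, in seconds.
constexpr double kMainMemorySec = 120e-9;  // ~120 nanoseconds
constexpr double kDiskSec = 30e-3;         // ~30 milliseconds

// How many main-memory accesses take as long as one disk access.
double diskToMemoryRatio() {
    return kDiskSec / kMainMemorySec;
}

// Stretches a quick task (e.g. checking a book's index) by the same ratio
// and returns the resulting duration in days.
double scaledDays(double quickTaskSec) {
    assert(quickTaskSec >= 0.0);
    const double secondsPerDay = 24.0 * 60.0 * 60.0;
    return quickTaskSec * diskToMemoryRatio() / secondsPerDay;
}
```

Here `diskToMemoryRatio()` is about 250,000, and `scaledDays(20.0)` is about 57.9, which rounds to the 58 days on the slide.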

Normal Arrangement
Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
At any given time, we are usually interested in only a small portion of the data.
This data is loaded temporarily into main memory (MM), where it can be rapidly manipulated and processed.
As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.

Goal of File Structures
Minimize the number of trips to the disk needed to get the desired information.
Group related information so that we are likely to get everything we need with only one trip to the disk.
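Counting trips makes the goal concrete. The sketch below assumes related records are stored contiguously in fixed-size blocks, so that one trip to the disk transfers one whole block; `diskTrips` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Number of disk trips needed to read `numRecords` related records when each
// trip (one block transfer) brings `recordsPerBlock` of them into memory.
std::size_t diskTrips(std::size_t numRecords, std::size_t recordsPerBlock) {
    assert(recordsPerBlock > 0);
    // Ceiling division: a partially filled last block still costs a full trip.
    return (numRecords + recordsPerBlock - 1) / recordsPerBlock;
}
```

Reading 1,000 records one per trip costs `diskTrips(1000, 1)` = 1,000 trips; grouping 50 related records per block cuts that to `diskTrips(1000, 50)` = 20.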

Physical Files and Logical Files
Physical file: a collection of bytes stored on a disk or tape.
Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
The program (application) sends bytes to, or receives bytes from, a file through the logical file. The program knows nothing about where the bytes go to or come from.
The operating system is responsible for associating a logical file in a program with a physical file on disk or tape. Writing to or reading from a file in a program is done through the operating system.

Files
The physical file has a name, for instance myfile.txt.
The logical file has a logical name (a variable) inside the program.
  In C: FILE *outfile;
  In C++: fstream outfile;
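A minimal C++ sketch of the logical/physical distinction, assuming the standard `<fstream>` library: the variables `outfile` and `infile` are logical files, while the string `path` names the physical file that the operating system associates with them. The helper name `writeThenRead` is illustrative.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes `text` to the physical file named `path` through one logical file,
// then reads the first line back through a second logical file.
std::string writeThenRead(const std::string& path, const std::string& text) {
    assert(!path.empty());

    std::fstream outfile;                                 // logical file: a program variable
    outfile.open(path, std::ios::out | std::ios::trunc);  // OS binds it to the physical file
    outfile << text;                                      // bytes flow through the channel
    outfile.close();                                      // flush and release the channel

    std::fstream infile(path, std::ios::in);              // new logical file, same physical file
    std::string line;
    std::getline(infile, line);
    return line;
}
```

`writeThenRead("myfile.txt", "hello")` returns `"hello"`; at no point does the program learn where on the disk the bytes were placed.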

Basic File Processing Operations
  Opening
  Closing
  Reading
  Writing
  Seeking
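The five operations can be seen together in one short C++ sketch using `std::fstream`; the function `readAt` and the file names in the usage line are hypothetical. Seeking is what lets a program jump straight to a byte offset instead of reading everything before it.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Runs the five basic operations on one file and returns up to `len` bytes
// found at byte `offset` (fewer if the file ends first).
std::string readAt(const std::string& path, const std::string& data,
                   std::streamoff offset, std::streamsize len) {
    assert(offset >= 0 && len >= 0);

    // Opening: create (or truncate) the file for both reading and writing.
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc);
    if (!f) return "";

    // Writing: the bytes go to the current position, the start of the file.
    f.write(data.data(), static_cast<std::streamsize>(data.size()));

    // Seeking: move the read position straight to `offset`, skipping earlier bytes.
    f.seekg(offset, std::ios::beg);

    // Reading: copy up to `len` bytes from that position into memory.
    std::string buf(static_cast<std::size_t>(len), '\0');
    f.read(&buf[0], len);
    buf.resize(static_cast<std::size_t>(f.gcount()));

    // Closing: release the logical file (the destructor would also do this).
    f.close();
    return buf;
}
```

For example, `readAt("ops.dat", "HEADERrecord", 6, 6)` skips the six header bytes and returns `"record"`.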

File Systems
Data is not scattered hither and thither on disk. Instead, it is organized into files. Files are organized into records, and records are organized into fields.

Example
A student file may be a collection of student records, one record for each student.
Each student record may have several fields, such as:
  Name
  Address
  Student number
  Gender
  Age
  GPA
Typically, each record in a file has the same fields.
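If every record has the same fields and each field has a fixed size, record number i starts at byte `i * sizeof(Student)`, so a single seek reaches any record. The layout below (field widths, the `Student` struct, and the helper names) is an assumption for illustration; writing raw struct bytes like this is only portable between programs built with the same compiler and platform, because padding and byte order may differ.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical fixed-length student record: one fixed-size slot per field.
struct Student {
    char   name[20];
    char   address[40];
    int    studentNumber;
    char   gender;
    int    age;
    double gpa;
};

// Builds a record; a name longer than its field is truncated to fit.
Student makeStudent(const std::string& name, int number, int age, double gpa) {
    Student s{};                                          // zero-fill every field
    std::strncpy(s.name, name.c_str(), sizeof(s.name) - 1);
    s.studentNumber = number;
    s.age = age;
    s.gpa = gpa;
    return s;
}

// Byte offset of record `i`: all records have the same size.
std::streamoff offsetOf(long i) {
    assert(i >= 0);
    return static_cast<std::streamoff>(i) * static_cast<std::streamoff>(sizeof(Student));
}

void writeRecord(std::fstream& f, long i, const Student& s) {
    f.seekp(offsetOf(i), std::ios::beg);                  // seek to the record's slot
    f.write(reinterpret_cast<const char*>(&s), sizeof(Student));
}

Student readRecord(std::fstream& f, long i) {
    Student s{};
    f.seekg(offsetOf(i), std::ios::beg);                  // jump directly to record i
    f.read(reinterpret_cast<char*>(&s), sizeof(Student));
    return s;
}
```

Because every record has the same length, reading record 1 never touches the bytes of record 0.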

Properties of Files
Persistence: Data written into a file persists after the program stops, so the data can be used later.
Sharability: Data stored in files can be shared by many programs and users simultaneously.
Size: Data files can be very large. Typically, they cannot fit into main memory.

Introduction to Database Systems
Ref. Ramakrishnan & Gehrke, Chapter 1
The slides for this text are organized into chapters. This lecture covers Chapter 1.
  Chapter 1: Introduction to Database Systems
  Chapter 2: The Entity-Relationship Model
  Chapter 3: The Relational Model
  Chapter 4 (Part A): Relational Algebra
  Chapter 4 (Part B): Relational Calculus
  Chapter 5: SQL: Queries, Programming, Triggers
  Chapter 6: Query-by-Example (QBE)
  Chapter 7: Storing Data: Disks and Files
  Chapter 8: File Organizations and Indexing
  Chapter 9: Tree-Structured Indexing
  Chapter 10: Hash-Based Indexing
  Chapter 11: External Sorting
  Chapter 12 (Part A): Evaluation of Relational Operators
  Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques
  Chapter 13: Introduction to Query Optimization
  Chapter 14: A Typical Relational Optimizer
  Chapter 15: Schema Refinement and Normal Forms
  Chapter 16 (Part A): Physical Database Design
  Chapter 16 (Part B): Database Tuning
  Chapter 17: Security
  Chapter 18: Transaction Management Overview
  Chapter 19: Concurrency Control
  Chapter 20: Crash Recovery
  Chapter 21: Parallel and Distributed Databases
  Chapter 22: Internet Databases
  Chapter 23: Decision Support
  Chapter 24: Data Mining
  Chapter 25: Object-Database Systems
  Chapter 26: Spatial Data Management
  Chapter 27: Deductive Databases
  Chapter 28: Additional Topics

Basic Definitions
  Data
  Database
  Database Management System (DBMS)
  Database System

Basic Definitions
Data: Known facts that can be recorded and have an implicit meaning.
Database: A collection of related data.
Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database.
Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.

Files vs. DBMS
When data is kept in plain files, the application itself must handle:
  Staging large datasets between main memory and secondary storage (e.g., buffering, page-oriented access)
  Special code for different queries
  Protecting data from inconsistency due to multiple concurrent users
  Crash recovery
  Security and access control
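Staging data between the two levels is usually done by a buffer manager that keeps a fixed number of disk pages in main memory. The sketch below assumes a least-recently-used (LRU) replacement policy, which is one common choice rather than something the slide prescribes; `BufferPool` is a hypothetical class that only counts disk reads and omits writing modified pages back.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy buffer manager: holds at most `capacity` pages in main memory and
// evicts the least recently used page when a new one must be loaded.
class BufferPool {
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Requests page `pageId`; returns true on a hit (already in memory).
    bool request(int pageId) {
        auto it = where_.find(pageId);
        if (it != where_.end()) {
            // Hit: move the page to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        // Miss: one trip to the disk.
        ++diskReads_;
        if (lru_.size() == capacity_) {
            where_.erase(lru_.back());  // evict the least recently used page
            lru_.pop_back();
        }
        lru_.push_front(pageId);
        where_[pageId] = lru_.begin();
        return false;
    }

    std::size_t diskReads() const { return diskReads_; }

private:
    std::size_t capacity_;
    std::size_t diskReads_ = 0;
    std::list<int> lru_;                                        // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> where_;  // page -> position in lru_
};
```

With room for two pages, the request sequence 1, 2, 1, 3, 1, 2 causes four disk reads: loading page 3 evicts page 2, so the final request for page 2 must go back to the disk.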

Typical DBMS Functionality
  Define a database in terms of data types, structures, and constraints.
  Construct or load the database on a secondary storage medium.
  Manipulate the database: querying, generating reports, and insertions, deletions, and modifications to its content.
  Support concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent.

Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using the given data model.
The relational model of data is the most widely used model today.
  Main concept: relation, basically a table with rows and columns.
  Every relation has a schema, which describes the columns, or fields.

Example: University Database
Conceptual schema:
  Students(sid: string, name: string, login: string, age: integer, gpa: real)
  Courses(cid: string, cname: string, credits: integer)
  Enrolled(sid: string, cid: string, grade: string)
Physical schema:
  Relations stored as unordered files.
  Index on first column of Students.
External schema (view):
  Course_info(cid: string, enrollment: integer)
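An external schema such as Course_info is computed from the conceptual schema rather than stored separately. A C++ sketch of how the enrollment column could be derived by counting Enrolled tuples per course; the struct and function names are illustrative, and courses with no enrolled students simply do not appear in the result.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One tuple of Enrolled(sid: string, cid: string, grade: string).
struct Enrolled {
    std::string sid;
    std::string cid;
    std::string grade;
};

// Derives the view Course_info(cid, enrollment): for each course id,
// count the Enrolled tuples that mention it.
std::map<std::string, int> courseInfo(const std::vector<Enrolled>& enrolled) {
    std::map<std::string, int> view;
    for (const auto& e : enrolled) {
        ++view[e.cid];  // a new cid starts at 0, then is incremented
    }
    return view;
}
```

Users of the view see only `(cid, enrollment)` pairs; they never see individual sids or grades.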

Instance of Students Relation
Students(sid: string, name: string, login: string, age: integer, gpa: real)

  sid    name   login        age  gpa
  53666  Jones  jones@cs     18   3.4
  53688  Smith  smith@ee     18   3.2
  53650  Smith  smith@math   19   3.8
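A relation instance is a set of rows that all share the relation's schema. A minimal C++ sketch, assuming one struct per tuple (the names `StudentRow`, `kStudents`, and `selectWhere` are hypothetical); it also shows that the two Smiths are told apart only by sid.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One tuple of Students(sid, name, login, age, gpa).
struct StudentRow {
    std::string sid, name, login;
    int age;
    double gpa;
};

// The instance shown above: every row has exactly the schema's five fields.
const std::vector<StudentRow> kStudents = {
    {"53666", "Jones", "jones@cs", 18, 3.4},
    {"53688", "Smith", "smith@ee", 18, 3.2},
    {"53650", "Smith", "smith@math", 19, 3.8},
};

// Returns the rows that satisfy `pred`, keeping their original order.
std::vector<StudentRow> selectWhere(const std::vector<StudentRow>& rows,
                                    const std::function<bool(const StudentRow&)>& pred) {
    assert(pred);  // the predicate must be callable
    std::vector<StudentRow> out;
    for (const auto& row : rows) {
        if (pred(row)) out.push_back(row);
    }
    return out;
}
```

Selecting on `name == "Smith"` returns two rows with different sids, which is why sid rather than name identifies a student.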

Levels of Abstraction
Many external schemata, a single conceptual (logical) schema, and a single physical schema.
  External schemata describe how users see the data.
  The conceptual schema defines the logical structure.
  The physical schema describes the files and indexes used.
[Figure: External Schema 1, External Schema 2, and External Schema 3 above the Conceptual Schema, which sits above the Physical Schema]
Schemas are defined using a Data Definition Language (DDL); data is modified/queried using a Data Manipulation Language (DML).

Data Independence
Applications are insulated from how data is structured and stored.
  Logical data independence: protection from changes in the logical structure of data.
  Physical data independence: protection from changes in the physical structure of data.
One of the most important benefits of using a DBMS!
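Physical data independence can be illustrated with an interface: application code is written once against the logical operations, and the storage structure underneath can change without touching it. In this C++ sketch (all class and function names are hypothetical), `HeapStore` plays the role of an unordered file searched sequentially and `IndexedStore` the role of an index on sid, echoing the physical schema of the university example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Logical view seen by applications: students identified by sid.
class StudentStore {
public:
    virtual ~StudentStore() = default;
    // Inserts a student, replacing the name if sid already exists.
    virtual void put(const std::string& sid, const std::string& name) = 0;
    // Returns true and sets `name` if sid is present.
    virtual bool find(const std::string& sid, std::string& name) const = 0;
};

// Physical choice 1: unordered rows, searched sequentially.
class HeapStore : public StudentStore {
public:
    void put(const std::string& sid, const std::string& name) override {
        for (auto& row : rows_) {
            if (row.first == sid) { row.second = name; return; }
        }
        rows_.emplace_back(sid, name);
    }
    bool find(const std::string& sid, std::string& name) const override {
        for (const auto& row : rows_) {
            if (row.first == sid) { name = row.second; return true; }
        }
        return false;
    }
private:
    std::vector<std::pair<std::string, std::string>> rows_;
};

// Physical choice 2: an ordered index keyed on sid.
class IndexedStore : public StudentStore {
public:
    void put(const std::string& sid, const std::string& name) override {
        index_[sid] = name;
    }
    bool find(const std::string& sid, std::string& name) const override {
        auto it = index_.find(sid);
        if (it == index_.end()) return false;
        name = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> index_;
};

// Application code: unchanged whichever physical structure is plugged in.
std::string nameOrUnknown(const StudentStore& store, const std::string& sid) {
    std::string name;
    return store.find(sid, name) ? name : "unknown";
}
```

Swapping `HeapStore` for `IndexedStore` changes the lookup cost (sequential scan versus index search) but not the answers `nameOrUnknown` returns.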

Structure of a DBMS
A typical DBMS has a layered architecture, from top to bottom:
  Query Optimization and Execution
  Relational Operators
  Files and Access Methods
  Buffer Management
  Disk Space Management
  DB
These layers must consider concurrency control and recovery.
This is one of several possible architectures; each system has its own variations.