CS222/CS122c: Principles of Data Management Lecture #1

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Storing Data: Disks and Files
5. Disk, Pages and Buffers Why Not Store Everything in Main Memory
Storing Data: Disks and Files: Chapter 9
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7.
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Murali Mani Overview of Storage and Indexing (based on slides from Wisconsin)
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
Secondary Storage Management Hank Levy. 8/7/20152 Secondary Storage • Secondary Storage is usually: –anything outside of “primary memory” –storage that.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 9.
Lecture 11: DMBS Internals
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7.
1 IT420: Database Management and Organization Storage and Indexing 14 April 2006 Adina Crăiniceanu
Physical Storage Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems November 20, 2007.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7 “ Yea, from the table of my memory I ’ ll wipe away.
1 Storing Data: Disks and Files Chapter 9. 2 Disks and Files  DBMS stores information on (“hard”) disks.  This has major implications for DBMS design!
R. Ramakrishnan and J. Gehrke: Storing Data on Disks 1 Storing Data: Disks and Files Chapter 9 “Yea, from the table of my memory I’ll wipe away all trivial.
“Yea, from the table of my memory I’ll wipe away all trivial fond records.” -- Shakespeare, Hamlet.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Content based on Chapter 9 Database Management Systems, (3.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Storing Data: Disks and Files Chapter 7 Jianping Fan Dept of Computer Science UNC-Charlotte.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li.
1 Storing Data: Disks and Files Chapter 9. 2 Objectives  Memory hierarchy in computer systems  Characteristics of disks and tapes  RAID storage systems.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Lec 5 part1 Disk Storage, Basic File Structures, and Hashing.
The very Essentials of Disk and Buffer Management.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
CS522 Advanced database Systems
CS 540 Database Management Systems
Database Applications (15-415) DBMS Internals- Part I Lecture 11, February 16, 2016 Mohammad Hammoud.
Storing Data: Disks and Files
Storing Data: Disks and Files
Database Applications (15-415) DBMS Internals: Part II Lecture 11, October 2, 2016 Mohammad Hammoud.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
Disks and Files DBMS stores information on (“hard”) disks.
Lecture 11: DMBS Internals
Storing Data: Disks and Files
Lecture 9: Data Storage and IO Models
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
Disk Storage, Basic File Structures, and Buffer Management
Introduction to Database Systems
Database Systems November 2, 2011 Lecture #7.
5. Disk, Pages and Buffers Why Not Store Everything in Main Memory
Overview Continuation from Monday (File system implementation)
Storing Data: Disks and Files
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Secondary Storage Management Brian Bershad
File Storage and Indexing
Basics Storing Data on Disks and Files
Secondary Storage Management Hank Levy
CSE451 File System Introduction and Disk Drivers Autumn 2002
Storing Data: Disks and Files
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #14 Open Topics and Wrap up Instructor: Chen Li.
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #03 Row/Column Stores, Heap Files, Buffer Manager, Catalogs Instructor: Chen Li.
CS222P: Principles of Data Management Lecture #1 Fall 2018
Presentation transcript:

CS222/CS122c: Principles of Data Management Lecture #1 Instructor: Chen Li UC Irvine

Overview Welcome grads and undergrads alike! Course wiki page https://grape.ics.uci.edu/wiki/public/wiki/cs222-2017-fall Piazza page https://piazza.com/uci/fall2017/cs222cs122c Sign up ASAP

DB courses at ICS CS122A CS122B CS122C/222 CS223 CS224

Pre-requisite CS122A or equivalent Data structures, algorithms, OS C++ programming skills Willingness to write code to make it work!

Grading Midterm Exam: 25% Final Exam: 25% Four-Part Programming Project: 50% “2-week window” to do a rebuttal

Textbooks Required: Database Management Systems, 3rd edition, by R. Ramakrishnan and J. Gehrke, McGraw Hill, 2003. Recommended: Readings in Database Systems, 4th edition, by J. Hellerstein and M. Stonebraker, MIT Press, 2005.

Projects 1: file and record management (solo project) 2: relation manager (team project) 3: index manager (team project) 4: query engine (team project) C++ 48-hour grace period with a 10% penalty

Next: Overview of DBMS

Files and Access Methods Structure of a DBMS These layers must consider concurrency control and recovery A typical DBMS has a layered architecture. The figure does not show the concurrency control and recovery components (CS 223). This is one of several possible architectures; each system has its own variations. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB 22

DBMS Structure In More Detail SQL Query Parser Query Optimizer Query plans Plan Executor Relational Operators (+ Utilities) (CS223) API calls Files of Records Buffer Manager Access Methods (Indices) Lock Manager Transaction Log Manager Disk Space and I/O Manager WAL Data Files Index Files Catalog Files 22

Components’ Roles Query Parser Query optimizer (often w/2 steps) Parse and analyze SQL query Produce data structure capturing SQL statement and the “objects” that it refers to in the system catalogs Query optimizer (often w/2 steps) Rewrite query logically Perform cost-based optimization Goal is a “good” query plan considering Physical table structures Available access paths (indexes) Data statistics (if known) Cost model (for relational operations) (Cost differences can be orders of magnitude!)

Components’ Roles (continued) Plan Executor + Relational Operators Runtime side of query processing Usually based on “tree of iterators” model, e.g.: Nodes are relational operators (actually they are physical implementations of the various operators)

Components’ Roles (continued) Files of Records OSs usually have byte-stream based APIs DBMSs instead provide record-based APIs Record = set of fields Fields are typed Records reside on pages of files Access Methods Index structures for access based on field values We’ll look at tree-based, hash-based, and spatial structures (including the time-tested B+ tree) Peer layer to record-based files (to map from field values to lists of RIDs or lists of primary keys)

Components’ Roles (continued) Buffer Manager DBMS answer to main memory management Cache of pages from files and indices “DB-oriented” page replacement scheme(s) All disk page accesses go via the buffer pool Also interacts with logging/recovery management (to support undo/redo and thus data consistency) Disk Space and I/O Managers Manage space on disk (pages), including extents Also manage I/O (sync, async, prefetch, …)

Components’ Roles (continued) System Catalog Info about physical data (volumes, table spaces, …) Info about tables (name, columns, types, … ; also constraints, keys, etc., etc.) Data statistics (e.g., value distributions, counts, …) Info about indexes (types, target tables, …) And so on! Views, triggers, security, … Transaction Management (CS 223) ACID: Atomicity, Consistency, Isolation, Durability Lock Manager for C+I Log Manager for A+D

A Brief History of Databases Pre-relational era: 1960’s, early 1970’s Codd’s seminal paper: 1970 Basic RDBMS R&D: 1970-80 (System R, Ingres) RDBMS improvements: 1980-85 Relational goes mainstream: 1985-90 Distributed DBMS research: 1980-90 Parallel DBMS research: 1985-95 Extensible DBMS research: 1985-95 OLAP and warehouse research: 1990-2000 Stream DB and XML DB research: 2000-2010 Big data R&D: 2005-present

So What’s the Plan? We’ll start working our way up the architectural stack next time You should also start on the 4-part course project right away Immediate to-do’s for you are: Read the materials indicated on the wiki Get yourself signed up on Piazza Review SQL and chapters 1-8 if need be Start on part 1 of the project (solo) today!

Next Disks and files Project 1 Overview Paged File Manager Record-Based File Manager

Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! READ: transfer data from disk to main memory (RAM). WRITE: transfer data from RAM to disk. Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

Why Not Store Everything in Main Memory? Costs too much. Dell wants (in early 2014) $65 for 500GB of disk, $600 for 256GB of SSD, and $57 for 4GB of RAM ($0.13, $2.34, $14.25 per GB) Main memory is volatile. We want data to be saved between runs. (Obviously!) Your typical (basic) storage hierarchy: Main memory (RAM) for currently used data Disk for the main database (secondary storage) Tapes for archiving older versions of the data (tertiary storage) And we also have L1 & L2 caches, SSD, … 19

Storage Hierarchy & Latency (Jim Gray): How Far Away is the Data? Registers On Chip Cache On Board Cache Memory Disk 1 2 10 100 Tape /Optical Robot 9 6 This Building This Room My Head 10 min 1.5 hr 2 years 1 min Pluto 2,000 years Andromeda Irvine (UCI) 19

Disks Secondary storage device of choice. Main advantage over tapes: random access vs. sequential. Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, time to retrieve a disk page varies depending upon location on disk. Therefore, relative placement of pages on disk has a major impact on DBMS performance! (SSDs simplify things a bit in this respect) 20

Components of a Disk The platters spin (5400 rpm) Spindle Tracks Disk head The platters spin (5400 rpm) The arm assembly is moved in or out to position a head on a desired track Tracks under heads form a cylinder (imaginary!) Sector Arm movement Platters Only one head reads/writes at any one time. Arm assembly Block size is a multiple of sector size (which is fixed) 21

Accessing a Disk Page Time to access (read/write) a disk block: seek time (moving arms to position disk head on track) rotational delay (waiting for block to rotate under head) transfer time (actually moving data to/from disk surface) Seek time and rotational delay dominate. Seek time varies from about 1 to 20msec Rotational delay varies from 0 to 10msec Transfer rate is about 1 msec per 4KB page (old) Key to lower I/O cost: Reduce seek/rotation delays! Hardware vs. software solutions? 22

Arranging Pages on Disk `Next’ block concept: blocks on same track, followed by blocks on same cylinder, followed by blocks on adjacent cylinder Blocks in a file should be arranged sequentially on disk (by `next’) in order to minimize seek and rotational delay For a sequential scan, prefetching several pages at a time is a big win! 23

RAID (Redundant Array of Inexpensive Disks) Disk Array: Arrangement of several disks that gives abstraction of a single, large disk. Goals: Increase performance and reliability. Two main techniques: Data striping: Data is partitioned; size of a partition is called the striping unit. Partitions are distributed over several disks. Redundancy: More disks => more failures. Redundant information allows reconstruction of data if a disk fails.

RAID 0: No redundancy (just striping)

RAID 1: Mirrored (two identical copies)

Disk Space Management Lowest layer of DBMS software manages the space on disk. Higher levels call upon this layer to: allocate/de-allocate a page read/write a page A request for a sequence of pages must be satisfied by allocating the pages sequentially on disk! Higher levels don’t need to know how this is done or how free space is managed. 24

Project 1 Overview 4