File Storage and Indexing

Slides:



Advertisements
Similar presentations
Disk Storage, Basic File Structures, and Hashing
Advertisements

Chapter 12: File System Implementation
File Systems.
Advance Database System
IELM 230: File Storage and Indexes Agenda: - Physical storage of data in Relational DB’s - Indexes and other means to speed Data access - Defining indexes.
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
DISK STORAGE INDEX STRUCTURES FOR FILES Lecture 12.
Secondary Storage Management Hank Levy. 8/7/20152 Secondary Storage • Secondary Storage is usually: –anything outside of “primary memory” –storage that.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
1 Lecture 7: Data structures for databases I Jose M. Peña
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How are data stored? –physical level –logical level.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
1 Physical Data Organization and Indexing Lecture 14.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
File System Implementation Chapter 12. File system Organization Application programs Application programs Logical file system Logical file system manages.
OSes: 11. FS Impl. 1 Operating Systems v Objectives –discuss file storage and access on secondary storage (a hard disk) Certificate Program in Software.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
CS4432: Database Systems II Record Representation 1.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright © 2004 Pearson Education, Inc.
Module 4.0: File Systems File is a contiguous logical address space.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Chapter 5 Record Storage and Primary File Organizations
1 CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li.
1 Query Processing Part 1: Managing Disks. 2 Main Topics on Query Processing Running-time analysis Indexes (e.g., search trees, hashing) Efficient algorithms.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
Lecture : chapter 9 and 10 file system 1. File Concept A file is a collection of related information defined by its creator. Contiguous logical address.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Lec 5 part1 Disk Storage, Basic File Structures, and Hashing.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
File Organization Record Storage and Primary File Organization
Chapter 3 Data Representation
Query Processing Part 1: Managing Disks 1.
Chapter 2 Memory and process management
Module 11: File Structure
File-System Implementation
CHP - 9 File Structures.
Chapter 11: File System Implementation
Indexing Goals: Store large files Support multiple search keys
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
FileSystems.
Database Management Systems (CS 564)
Computer Science 210 Computer Organization
Chapter 11: File System Implementation
Oracle SQL*Loader
Operating System I/O System Monday, August 11, 2008.
Disk Storage, Basic File Structures, and Hashing
Mass-Storage Structure
9/12/2018.
Chapter 8: Main Memory.
Chapter 11: File System Implementation
Chapters 17 & 18 6e, 13 & 14 5e: Design/Storage/Index
Disk Storage, Basic File Structures, and Hashing
Disk Storage, Basic File Structures, and Buffer Management
Disk storage Index structures for files
Chapter 11: File System Implementation
Lecture 3: Main Memory.
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
Secondary Storage Management Brian Bershad
RDBMS Chapter 4.
File Storage and Indexing
Indexing 4/11/2019.
Secondary Storage Management Hank Levy
File Organization.
Chapter 11: File System Implementation
Presentation transcript:

File Storage and Indexing Chapters 3-6 12/1/97

DBs Reside on Disks Main reasons: (main) memory is too small and too volatile Understanding disk operation and performance is important Data structures and algorithms for disk are very different from those appropriate for memory The single most important fact about disks, relative to memory, is… (we shall see) 12/1/97

Review: Disk Organization Spinning magnetic surfaces, stacked R/W head per surface Track: circular path Cylinder: vertical stack of tracks Sectors: fixed length physical blocks along track separated by gaps overhead info (ID, error-correction, etc.) each sector has a hardware address 12/1/97

Disks and OS OS manages disks Files are allocated in blocks (groups of sectors) units may be scattered on disk no simple map between logical records (program/file view) and physical placement OS makes file look contiguous 12/1/97

Beyond Character Streams C/C++ view Textfile: stream of characters Stream libraries automatically convert characters into internal data types Eg: "12 15.9 a" on the file can be automatically converted and stored into int, double, char variables Broader view File as a sequence of binary data Might correspond to a struct No automatic conversion Also supported by C/C++ 12/1/97

Program Viewpoint Files are collections of logical "records" records are commonly (but not always) fixed in size and format fields in records are commonly (but not always) fixed in size and format Sequential access: program gets data one record at a time Random access: program can ask for a record by its relative block # (not disk address!) 12/1/97

Common Modes of Record Processing Single record (random) …WHERE SSN=345667899 Multiple records (no special order) SELECT * FROM DEPT... Multiple records (in order) …ORDER BY SSN An efficient storage scheme should support all of these modes 12/1/97

Naïve Relational Implementation Each table is a separate file All rows in a table are of fixed size and format could correspond to a C struct Rows are stored in order by primary key We can revise this view as we go SQL types won't match program types Sequence of records is not efficiently accessed or updated 12/1/97

What's the Big Deal? (actual numbers will vary) CPU speed: 100-500MHz Memory access speed: (<100ns) how many CPU cycles is this? Disk speed has three components Seek time: to position the W/R head (10ms) how many memory cycles is this? Latency: rotational speed (3600+rpm) Transfer time: movement of data to/from memory (5Mb+/sec) 12/2/97

Application Dependency "Streaming" applications video, audio, file transfers can take advantage of high rotational speeds, high bus bandwidth Traditional DB applications one record at a time, randomly selected, small amount of data high degree of normalization makes it worse! 12/1/97

Fact of Life Seek time outweighs all other time factors Implication: Martin's three rules for efficient file access: 1. Avoid seeks 2. Avoid seeks 3. Avoid seeks Corollary: Any technique that takes more than one seek to locate a record is unsatisfactory. 12/1/97

Memory vs Disk Data Structures Example: Binary Search of a sorted array Needs O(Log2N) operations How many accesses if 1,000,000 integers in memory? The answer (20) is acceptable How many seeks if 1,000,000 records on disk? The answer (20) in unacceptable Worse problem:And how did they get sorted (NlogN at best)? 12/1/97

Worse and Worse 1,000,000 sorted records on disk... Worse problem:And how did they get sorted? Sort is NlogN at best... do the math And worse When you add or delete records, how do you keep the array sorted? 12/1/97

Avoiding Seeks Overlapping disk operations Disk cache double buffering Disk cache locality principle: recently used pages are likely to be needed again maintained by OS or DBMS Sequential access Seek needed only for first record Smart file organizations 12/1/97

A Smart File Organization... One which needs very few seeks to locate any record Hashing: go directly to the desired record, based on a simple calculation Indexing: go directly to the desired record, based on a simple look-up B-tree: a widely used index organization Also various schemes which combine indexing and hashing 12/1/97

Review: Hashing Main idea: address of record is calculated from some value in the record usually based on primary key Zillions of possible hash functions Hard to find a good hash function 12/1/97

Pitfalls of Hashing Conflicts: more than one record hashing to the same address. schemes to overcome conflicts: overflow areas; chained records, etc. all involve more disk I/O Wasted space No efficient access for other than primary key No efficient way for sequential (especially sorted) traversal 12/1/97