
Tuples vs. Records
CREATE TABLE MovieStar ( Name CHAR(30), Address VARCHAR(255), Gender CHAR(1), DateOfBirth DATE );
Tuples are similar to records or "structs" in C/C++. The record will occupy (part of) some disk block, and within the record there will be one field for every attribute of the relation. While this idea appears simple, the "devil is in the details."
In relational terms:
- Field = sequence of bytes representing the value of an attribute in a tuple.
- Record = sequence of bytes divided into fields, representing a tuple.
- File = collection of blocks used to hold a relation = set of tuples/records.
In object-oriented terms (ODL):
- Field represents an attribute or relationship.
- Record represents an object.
- File represents the extent of a class.

Representing Data Elements
1. How do we represent fields?
2. How do we represent records?
3. How do we represent collections of records or tuples in blocks of memory?
4. How do we cope with variable record length?
5. How do we cope with the growth of record sizes?
6. How do we represent BLOBs (Binary Large OBjects)?

How do we represent fields?
A typical attribute can be represented by a fixed-length field:
- Integers, reals → 2 to 8 byte fields.
- CHAR(n) → n bytes.
- VARCHAR(n) → n+1 bytes (n characters plus one byte for the length or an end-of-string marker).
- SQL2 DATE → 10 bytes (a character string of the form YYYY-MM-DD).
More difficult:
1. Truly varying character strings.
2. Many-to-many relationship = set of pointers of arbitrary size.
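
The field sizes above can be tried out with a minimal sketch of a fixed-length encoding for one MovieStar tuple. The layout (no record header, no padding, a one-byte length prefix for the VARCHAR field, ASCII text) and the names `MOVIE_STAR`, `pack_movie_star`, and `unpack_movie_star` are assumptions made for illustration, not something the slides prescribe.

```python
import struct

# Hypothetical layout, no header and no padding:
#   name CHAR(30) | 1 length byte + address VARCHAR(255) | gender CHAR(1) | birthdate DATE (10 bytes)
MOVIE_STAR = struct.Struct("<30sB255s1s10s")

def pack_movie_star(name, address, gender, birthdate):
    """Encode one tuple as a fixed-length byte string."""
    name_b = name.encode("ascii")
    addr_b = address.encode("ascii")
    if len(name_b) > 30 or len(addr_b) > 255:
        raise ValueError("value too long for its field")
    return MOVIE_STAR.pack(
        name_b.ljust(30),           # CHAR(n): pad with blanks to exactly n bytes
        len(addr_b), addr_b,        # VARCHAR(n): length byte, then the bytes (struct zero-fills the rest)
        gender.encode("ascii"),
        birthdate.encode("ascii"),  # 'YYYY-MM-DD'
    )

def unpack_movie_star(raw):
    """Decode a byte string produced by pack_movie_star."""
    name_b, addr_len, addr_b, gender_b, birth_b = MOVIE_STAR.unpack(raw)
    return (
        name_b.rstrip(b" ").decode("ascii"),
        addr_b[:addr_len].decode("ascii"),
        gender_b.decode("ascii"),
        birth_b.decode("ascii"),
    )
```

With this layout every record occupies 30 + 256 + 1 + 10 = 297 bytes, regardless of how long the actual name and address are.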

Importance of data representation: the Y2K Problem
Many DBMSs represented dates as YYMMDD. The problem was that applications took advantage of the fact that if date d1 is earlier than d2, then d1 < d2 lexicographically.
SELECT name FROM MovieStar WHERE birthdate < '980601'
Children born in the new millennium have birthdates that are lexicographically less than '980601', so the query wrongly treats them as born earlier.
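
The ordering problem is easy to reproduce. The sketch below compares a two-digit-year encoding with a four-digit one; the helper names `yymmdd` and `yyyymmdd` are made up for this illustration.

```python
def yymmdd(year, month, day):
    """Two-digit-year encoding, as in the YYMMDD representation."""
    return f"{year % 100:02d}{month:02d}{day:02d}"

def yyyymmdd(year, month, day):
    """Four-digit-year encoding, which keeps string order equal to date order."""
    return f"{year:04d}{month:02d}{day:02d}"

cutoff = (1998, 6, 1)
newborn = (2000, 1, 15)   # later than the cutoff

# With YYMMDD the later date compares as smaller, so the WHERE clause matches it.
wrongly_matched = yymmdd(*newborn) < yymmdd(*cutoff)
# With YYYYMMDD the comparison agrees with the calendar.
correctly_rejected = not (yyyymmdd(*newborn) < yyyymmdd(*cutoff))
```

Both flags come out true: the string '000115' sorts before '980601', while '20000115' sorts after '19980601'.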

Fixed-Length Records
Header = space for information about the record, e.g., record format (pointer to schema), record length, timestamp.
[Figure: a record laid out as a header (pointer to schema, length, timestamp) followed by the fields name, address, birth, gender.]
We need to know how to access the schema from the record, in case records in the same block belong to different relations.
After the header comes space for each field of the record. Sometimes it is more efficient to align fields so that each starts at a multiple of 4.
[Figure: the same record shown without and with aligned fields.]
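
To see what alignment costs, here is a small sketch that computes field offsets with and without rounding each field start up to a multiple of 4. The 12-byte header (three 4-byte values) and the field sizes are assumptions chosen to match the MovieStar example; `layout` and `FIELDS` are hypothetical names.

```python
def layout(field_sizes, header_size=12, align=4):
    """Return ({field: offset}, total_record_size) with every field start
    rounded up to a multiple of `align` (align=1 means no padding)."""
    def round_up(n):
        return (n + align - 1) // align * align

    offsets = {}
    pos = round_up(header_size)
    for name, size in field_sizes:
        offsets[name] = pos
        pos = round_up(pos + size)   # pad so the next field is aligned
    return offsets, pos

# Assumed sizes: CHAR(30), VARCHAR(255) stored as 256 bytes, CHAR(1), DATE as 10 bytes.
FIELDS = [("name", 30), ("address", 256), ("gender", 1), ("birth", 10)]

aligned_offsets, aligned_size = layout(FIELDS)           # fields at 12, 44, 300, 304
packed_offsets, packed_size = layout(FIELDS, align=1)    # fields at 12, 42, 298, 299
```

Under these assumptions the aligned record takes 316 bytes and the unaligned one 309, so alignment trades 7 bytes per record for field addresses that are multiples of 4.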

Packing Fixed-length Records into Blocks
The simplest form of packing is when the block holds tuples from one relation, and the tuples have a fixed format.
| Blk. Header | Rec1 | Rec2 | … | Recn |
The block header can hold, e.g., a directory giving the offset of each record in the block.

Inside the block
Notice: The offset table grows from the front of the block, while the records are placed starting at the end of the block.
Record address = physical block address + offset of the record's entry in the block's offset table.
We can move records around within the block, and all we have to do is change the record's entry in the offset table.
When a record is deleted, we have the option of leaving a tombstone in its offset-table entry.
- After a record deletion, following a pointer to this record leads to the tombstone.
- Had we not left the tombstone, the pointer might lead to some new record, with surprising results.
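
The block organization described above can be sketched as follows. This is a simplified illustration, not a real page format: the offset table is kept as a Python list instead of being serialized at the front of the block, each entry is charged an assumed 4 bytes, and the space of deleted records is not reclaimed. `SlottedBlock` and `TOMBSTONE` are hypothetical names.

```python
TOMBSTONE = object()   # marker left in the offset table when a record is deleted

class SlottedBlock:
    def __init__(self, size=4096, slot_bytes=4):
        self.data = bytearray(size)
        self.slots = []               # offset table: slot number -> (offset, length) or TOMBSTONE
        self.slot_bytes = slot_bytes  # space charged for each offset-table entry
        self.free_end = size          # records fill the block from the end toward the front

    def free_space(self):
        return self.free_end - len(self.slots) * self.slot_bytes

    def insert(self, record):
        """Store the record and return its slot number (the stable part of its address)."""
        if len(record) + self.slot_bytes > self.free_space():
            raise ValueError("block full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def get(self, slot):
        entry = self.slots[slot]
        if entry is TOMBSTONE:
            raise LookupError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Keep the slot occupied so that old pointers cannot reach a newer record.
        self.slots[slot] = TOMBSTONE
```

Because slots are never reused in this sketch, a stale pointer to a deleted record always hits the tombstone instead of whatever was inserted later.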

Representing addresses
We need pointers, especially in object-oriented databases.
Two kinds of addresses:
- Physical (e.g., host, drive ID, cylinder, surface, sector (block), offset).
- Logical (unique ID).
Physical addresses are very long; 8 bytes is the minimum, and some systems use up to 16 bytes.
Example: A database of objects that is designed to last 100 years.
- If the database grows to encompass 1 million machines and each machine creates 1 object every nanosecond, then we could have about 2^77 objects.
- 10 bytes are needed to represent addresses for that many objects.
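
The estimate in this example can be sanity-checked with a few lines of arithmetic. The sketch assumes 365.25-day years and that every machine really does create one object per nanosecond for the whole period; `address_bytes` is a hypothetical helper. Under those assumptions the count lands between 2^81 and 2^82, somewhat above the figure quoted on the slide, so the 10 bytes are best read as a lower bound (the check suggests 11).

```python
SECONDS_PER_YEAR = 31_557_600   # assumption: 365.25 days of 86,400 seconds

def address_bytes(machines, objects_per_second, years):
    """Return (object_count, bits, bytes) needed to give each object a distinct address."""
    total = machines * objects_per_second * years * SECONDS_PER_YEAR
    bits = (total - 1).bit_length()      # smallest b with 2**b >= total
    return total, bits, (bits + 7) // 8

total, bits, nbytes = address_bytes(
    machines=10**6,
    objects_per_second=10**9,   # one object per nanosecond
    years=100,
)
```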

Representing addresses (Cont.)
We need a map table for flexibility; the level of indirection is what provides it.
For example, we often move records around, either within a block or from block to block. What about the programs that point to these records?
- If they work with physical addresses, they are left with dangling pointers.
- If they work with logical addresses, we only need to update the map table.
[Figure: map table with two columns, logical address and physical address.]
Efficiency? It is an issue because of the indirection.
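
A minimal sketch of the map-table idea, assuming physical addresses are (block, offset) pairs and logical addresses are small integers handed out in order. `MapTable` and its methods are illustrative names only.

```python
class MapTable:
    """Maps logical addresses (stable IDs) to physical addresses (block, offset)."""

    def __init__(self):
        self._next_id = 0
        self._physical = {}

    def register(self, physical):
        """Assign a new logical address to a record stored at `physical`."""
        logical = self._next_id
        self._next_id += 1
        self._physical[logical] = physical
        return logical

    def resolve(self, logical):
        """The extra lookup paid on every access: logical -> physical."""
        return self._physical[logical]

    def move(self, logical, new_physical):
        """Relocating a record touches this one entry; holders of `logical` are unaffected."""
        self._physical[logical] = new_physical

table = MapTable()
rec = table.register((7, 128))   # record lives in block 7 at offset 128
table.move(rec, (9, 0))          # record is moved to another block
current = table.resolve(rec)     # programs holding `rec` still find it
```

The cost is visible in `resolve`: every pointer following goes through the table, which is the efficiency concern raised above.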

Pointer swizzling I
Typical DB structure:
- Data is maintained by a server process, using physical or logical addresses of perhaps 8 bytes.
- Application programs are clients with their own (conventional memory) address spaces.
When blocks and records are copied to the client's memory, DB addresses must be swizzled = translated to virtual-memory addresses.
- Allows conventional pointer following.
- Especially important in OODBMS, where pointers-as-data are common.
The DBMS uses a translation table from DB addresses to memory addresses.

Pointer swizzling II
The DBMS uses a translation table with two columns: DBaddr (database address) and Mem-addr (memory address).
Logical/Physical map vs. DBaddr/Mem-addr translation table:
- Logical and physical addresses are both representations of the database address.
- In contrast, memory addresses in the translation table are for copies of the corresponding object in memory.
- All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

Swizzling Example
[Figure: Block 1 and Block 2 on disk; a block read into memory, with pointers marked as swizzled or unswizzled.]

Pointer swizzling III
Swizzling options:
1. Never swizzle. Keep a translation table of DB pointers → local pointers; consult the table to follow any DB pointer.
   Problem: time to follow pointers.
2. Automatic swizzling. When a block is copied to memory, replace all its DB pointers by local pointers.
   Problem: requires knowing where every pointer is (use block and record headers for schema info).
   Problem: large investment if few pointer-followings occur.
3. Swizzle on demand. When a block is copied to memory, enter its own address and those of member records into the translation table, but do not translate pointers within the block. If we follow a pointer, translate it the first time.
   Problem: requires a bit in pointer fields for DB/local.
   Problem: extra decision at each pointer following.
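
Option 3 can be sketched as below. The `Pointer` and `Client` classes are hypothetical, the translation table is a plain dictionary, and the sketch assumes the target is already in memory; a real system would fetch the missing block instead of failing.

```python
from dataclasses import dataclass

@dataclass
class Pointer:
    target: object           # a DB address while unswizzled, the in-memory object afterwards
    swizzled: bool = False   # the DB/local bit mentioned above

class Client:
    def __init__(self, translation):
        self.translation = translation   # DB address -> in-memory object, for loaded items
        self.lookups = 0                 # how often the translation table was consulted

    def follow(self, ptr):
        if not ptr.swizzled:             # the extra decision made on every pointer following
            self.lookups += 1
            ptr.target = self.translation[ptr.target]   # translate on first use only
            ptr.swizzled = True
        return ptr.target

record = {"title": "Star Wars"}
client = Client({("block 2", 40): record})
p = Pointer(("block 2", 40))
first = client.follow(p)    # consults the translation table and swizzles p
second = client.follow(p)   # follows the local pointer directly
```

After the first call the table is no longer consulted for `p`, which is the saving over never swizzling; pointers that are never followed cost nothing, which is the saving over automatic swizzling.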

Pinned records
- Pinned record = some swizzled pointer points to it.
- Pointers to a pinned record have to be unswizzled before the pinned record is returned to disk.
- We need to know where the pointers to it are.
- Implementation: keep a linked list of all swizzled pointers that point to the record.
[Figure: swizzled pointers referring to a pinned record.]

Exercise
Suppose that if we swizzle all pointers automatically, we can perform the swizzling in half the time it would take to swizzle each one separately. If the probability that a pointer in main memory will be followed at least once is p, for what values of p is it more efficient to swizzle automatically than on demand?
Solution: Suppose c is the cost of swizzling an individual pointer on demand. Swizzling on demand costs pc per pointer on average, since only a fraction p of the pointers are ever swizzled. Automatic swizzling costs c/2 per pointer, since all of them are swizzled at half the cost. Automatic swizzling is more efficient when pc > c/2, that is, when p > 1/2.

Variable-Length Data
So far, we assumed fixed format, fixed length, and a schema that is a list of fixed-length fields. Real life is more complicated:
- varying-size data items (e.g., address)
- repeating fields (star-to-movie relationship)
- varying-format records (SSD, semi-structured data)

Sliding Records
Use an offset table in a block, pointing to the current records. If a record grows, slide records around the block. Not enough space? Create an overflow block; the offset table must indicate "record moved."

Split Records Into Fixed/Variable Parts
The fixed part has a pointer to the space where the current values of the variable fields can be found.
Example: Studio records = name and address (fixed), plus a set of pointers to movies by that studio.
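
A rough sketch of the studio example, assuming the variable part lives in a separate byte area (here called `heap`) and each movie pointer is an 8-byte integer. The field widths and the names `FIXED`, `PTR`, `store_studio`, and `movie_pointers` are invented for the illustration.

```python
import struct

# Fixed part: name (30 bytes), address (60 bytes), offset of the pointer list, number of pointers.
FIXED = struct.Struct("<30s60sIH")
PTR = struct.Struct("<Q")   # one movie address; 8 bytes is an assumption

def store_studio(heap, name, address, movie_addrs):
    """Append the movie pointers to `heap` and return the fixed-length part of the record."""
    offset = len(heap)
    for addr in movie_addrs:
        heap.extend(PTR.pack(addr))
    return FIXED.pack(name.encode("ascii").ljust(30),
                      address.encode("ascii").ljust(60),
                      offset, len(movie_addrs))

def movie_pointers(heap, fixed):
    """Follow the pointer in the fixed part to read the variable part."""
    _name, _address, offset, count = FIXED.unpack(fixed)
    return [PTR.unpack_from(heap, offset + i * PTR.size)[0] for i in range(count)]

heap = bytearray()
disney = store_studio(heap, "Disney", "500 S. Buena Vista St.", [1001, 1002, 1003])
fox = store_studio(heap, "Fox", "10201 W. Pico Blvd.", [2001])
```

Every fixed part has the same size (96 bytes here), so the fixed parts can be packed like ordinary fixed-length records while the pointer lists grow independently.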

Varying schema
- A varying schema arises with, e.g., XML data, information integration, semi-structured data.
- Records have a flexible schema (lots of null values).
- Store the "schema with the record."
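
One way to store the schema with the record is to tag every value with its field name and length, and to leave null fields out entirely. The encoding below (a 1-byte name length and a 2-byte value length before each field, all values as UTF-8 strings) is only an assumed format; `encode_record` and `decode_record` are hypothetical names.

```python
import struct

TAG = struct.Struct("<BH")   # name length (1 byte), value length (2 bytes)

def encode_record(fields):
    """Encode a {field name: string value} mapping; absent (null) fields take no space."""
    out = bytearray()
    for name, value in fields.items():
        name_b = name.encode("utf-8")
        value_b = value.encode("utf-8")
        out += TAG.pack(len(name_b), len(value_b)) + name_b + value_b
    return bytes(out)

def decode_record(raw):
    """Rebuild the mapping by walking the self-describing fields."""
    fields = {}
    pos = 0
    while pos < len(raw):
        name_len, value_len = TAG.unpack_from(raw, pos)
        pos += TAG.size
        name = raw[pos:pos + name_len].decode("utf-8")
        pos += name_len
        fields[name] = raw[pos:pos + value_len].decode("utf-8")
        pos += value_len
    return fields

star = encode_record({"name": "Clint Eastwood", "directs": "Unforgiven"})
other = encode_record({"name": "Carrie Fisher"})   # no "directs" field at all
```

The price of the flexibility is that the field names are repeated in every record.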

BLOBs
- Binary Large OBject = field whose value doesn't fit in a block, e.g., a video clip.
- Hold it in a collection of blocks, e.g., several cylinders for fast retrieval.
- Allow the client to access only part of the value, e.g., get the first 10 seconds of a video, and supply each subsequent 10-second segment within the next 10 seconds.

Clustering relations together
Suppose there is a many-one relationship from relation R to S. It may make sense to keep tuples of R with their "owner" in S.
Example:
- Studios(name, addr)
- Movies(title, year, studio)
Why cluster? It supports queries such as:
SELECT title FROM Movies WHERE studio = 'Disney';
- Few disk I/Os are needed to get all Disney movies.
- It even supports a query in which only the address of the studio is given and a join is needed.
Notice the importance of type info in record headers.
Problem: When does clustering lose?

Files
File = collection of blocks. How are the blocks located?
1. Linked list of blocks.
2. Tree-structured access, as for UNIX files.
3. Other index structures (to be covered in detail).

Sequential Files
Records are ordered by search key (which may not be a "key" in the DB sense). Blocks containing records are therefore ordered too.
On insert: put the record in the appropriate block if there is room.
- Good idea: initialize blocks to be less than full; reorganize periodically if the file grows.
If there is no room in the proper block:
1. Create a new block; insert it in the proper order if possible (what if blocks are consecutive around a track for efficiency?).
2. If not possible, create an overflow block, linked from the original block.
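
The insertion policy can be sketched as follows. The sketch stores only keys, loads blocks partly full, and always takes the second option (an overflow block) when the proper block is full; it never creates a new main block. `Block` and `SequentialFile` are illustrative names, and the capacities are arbitrary.

```python
import bisect

class Block:
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []         # kept sorted in main blocks
        self.overflow = None   # link to an overflow block, if any

class SequentialFile:
    def __init__(self, capacity=4, fill=2):
        self.capacity = capacity
        self.fill = fill       # records per block at load time (less than full)
        self.blocks = []       # main blocks, in search-key order

    def load(self, sorted_keys):
        for i in range(0, len(sorted_keys), self.fill):
            block = Block(self.capacity)
            block.keys = list(sorted_keys[i:i + self.fill])
            self.blocks.append(block)

    def insert(self, key):
        """Insert `key`; return 'main' or 'overflow' to say where it ended up."""
        if not self.blocks:
            first = Block(self.capacity)
            first.keys.append(key)
            self.blocks.append(first)
            return "main"
        # Proper block: the last one whose first key is <= key (or the first block).
        firsts = [b.keys[0] for b in self.blocks]
        block = self.blocks[max(bisect.bisect_right(firsts, key) - 1, 0)]
        if len(block.keys) < block.capacity:
            bisect.insort(block.keys, key)   # room in the proper block
            return "main"
        # No room: walk the overflow chain, adding a block at its end if needed.
        while block.overflow is not None:
            block = block.overflow
            if len(block.keys) < block.capacity:
                block.keys.append(key)
                return "overflow"
        block.overflow = Block(self.capacity)
        block.overflow.keys.append(key)
        return "overflow"
```

Loading with `fill` below `capacity` is what lets the first few inserts land in the main blocks; once overflow chains get long, the file is due for the periodic reorganization mentioned above.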

Deletion/Tombstones
On delete: can we remove the record, possibly consolidate blocks, and delete overflow blocks?
Danger: pointers to the record become dangling.
Solution: a tombstone in the record header = a bit saying deleted/not. The rest of the record's space can be reused.