ICS 624 Spring 2011: Multi-Dimensional Clustering
Asst. Prof. Lipyeow Lim, Information & Computer Science Department, University of Hawaii at Manoa
1/24/2011



Clustered vs Unclustered Index
Suppose data records are stored in a heap file.
– To build a clustered index, first sort the heap file (leaving some free space on each page for future inserts).
– Overflow pages may be needed for inserts. (Thus, the order of data records is "close to", but not identical to, the sort order.)
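The practical difference shows up on range scans. The following is a minimal Python sketch, not taken from the slides, that counts how many distinct data pages a range query touches when the file is sorted on the key versus stored in arbitrary order. The page capacity, key range, and helper names (`layout`, `pages_touched`) are assumptions made for illustration, and the sketch ignores free space and overflow pages.

```python
import random

def layout(keys_in_file_order, records_per_page):
    """Map each key to the page it lands on, given the order of the file."""
    return {key: pos // records_per_page
            for pos, key in enumerate(keys_in_file_order)}

def pages_touched(page_of_record, matching_keys):
    """Count the distinct data pages fetched to retrieve the matching records."""
    return len({page_of_record[k] for k in matching_keys})

RECORDS_PER_PAGE = 10            # assumed page capacity
keys = list(range(1000))         # 1000 records with unique keys 0..999

# Clustered: the data file is sorted on the index key.
clustered = layout(sorted(keys), RECORDS_PER_PAGE)

# Unclustered: the data file is in arbitrary (here, shuffled) order.
shuffled = keys[:]
random.Random(0).shuffle(shuffled)   # fixed seed for repeatability
unclustered = layout(shuffled, RECORDS_PER_PAGE)

query = range(200, 300)          # range scan: 200 <= key < 300
print("clustered:  ", pages_touched(clustered, query))
print("unclustered:", pages_touched(unclustered, query))
```

With the file sorted on the key, the 100 matching records sit on 10 consecutive pages; in the shuffled file they are scattered over many more pages, each of which costs a separate I/O.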

Accessing Data on Disk
Seek time: time to move the disk heads over the required track.
Rotational delay: time for the desired sector to rotate under the disk head.
– Assuming a uniform distribution, this is on average the time for half a rotation.
Transfer time: time to actually read or write the data.
[Figure: disk diagram labeling the sector, tracks, spindle (rotates), and arms (move over tracks).]

Example: Barracuda 1TB HDD (ST AS)

  Cylinders                 [value missing]
  Bytes/cylinder            16065*512
  Blocks/cylinder           8029
  Sectors/track             63
  Heads                     255
  Spindle speed             7200 rpm
  Average latency           4.16 msec
  Random read seek time     < 8.5 msec
  Random write seek time    < 9.5 msec

What is the average time to read 2048 bytes of data?
  = seek time + rotational latency + transfer time
  = 8.5 msec + 4.16 msec + (2048 / 512) / 63 * (60000 msec / 7200 rpm)
  ≈ 13.19 msec
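The calculation on this slide can be reproduced with a few lines of Python. This is a sketch under the slide's own simplifications: the data is assumed to lie on a single track, the seek time is taken at its 8.5 msec upper bound, and a sector is assumed to hold 512 bytes. The function and parameter names are made up for illustration.

```python
def avg_read_time_ms(num_bytes, seek_ms, latency_ms, rpm,
                     sectors_per_track, bytes_per_sector=512):
    """Average read time = seek time + rotational latency + transfer time."""
    rotation_ms = 60_000 / rpm                    # time for one full rotation
    sectors = num_bytes / bytes_per_sector        # sectors to transfer
    # Fraction of a track that must pass under the head, times rotation time.
    transfer_ms = sectors / sectors_per_track * rotation_ms
    return seek_ms + latency_ms + transfer_ms

t = avg_read_time_ms(2048, seek_ms=8.5, latency_ms=4.16,
                     rpm=7200, sectors_per_track=63)
print(f"{t:.2f} msec")   # prints "13.19 msec"
```

The transfer term contributes only about 0.53 msec, so seek time and rotational latency dominate a small random read.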

Motivating Examples
Suppose
– Sales table is sorted by year
– Index on region
– Index on itemID
– Index on yearOd
– Index entries are [format missing in transcript]

Query 1:
  SELECT SUM(S.sales)
  FROM Sales S
  WHERE S.region = 'Mexico' AND S.itemID = 2 AND S.yearOd = 1997

Query 2:
  SELECT SUM(S.sales)
  FROM Sales S
  WHERE S.yearOd = 1997

Query 3:
  SELECT SUM(S.sales)
  FROM Sales S
  WHERE S.region = 'Canada'

Multi-Dimensional Clustering (MDC)
Available in DB2 v8 and above.
The physical layout mimics a multi-dimensional cube.
Associates a physical region, called a block, with each unique combination of dimension attribute values.
These blocks are the units of addressability for our clusters.
A block index addresses these blocks.

  CREATE TABLE Sales (
    storeId   INT,
    orderDate DATE,
    region    INT,
    itemId    INT,
    price     FLOAT,
    yearOd    INT GENERATED ALWAYS AS (YEAR(orderDate))
  )
  ORGANIZE BY DIMENSIONS (region, yearOd, itemId)

See "Multi-Dimensional Clustering: A New Data Layout Scheme in DB2", SIGMOD 2003.
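To make the idea of "one physical region per combination of dimension values" concrete, here is a small Python sketch. It is an illustration only and not DB2's implementation: the block capacity, the `build_mdc` helper, and the sample rows are all invented, and region values are written as strings to match the motivating queries.

```python
from collections import defaultdict

BLOCK_CAPACITY = 4   # assumed number of rows per block, tiny for illustration
DIMENSIONS = ("region", "yearOd", "itemId")

def build_mdc(rows, dimensions):
    """Place rows into blocks so that every block holds rows of a single cell."""
    blocks = []                       # block id -> list of rows
    open_block = {}                   # cell -> id of its block that has free space
    cell_blocks = defaultdict(list)   # cell -> ids of all blocks for that cell
    for row in rows:
        cell = tuple(row[d] for d in dimensions)
        bid = open_block.get(cell)
        if bid is None or len(blocks[bid]) >= BLOCK_CAPACITY:
            bid = len(blocks)         # allocate a fresh block for this cell
            blocks.append([])
            open_block[cell] = bid
            cell_blocks[cell].append(bid)
        blocks[bid].append(row)
    return blocks, dict(cell_blocks)

# Made-up sample data: six rows in one cell, one row in another.
rows = [{"region": "Canada", "yearOd": 1997, "itemId": 1, "price": float(p)}
        for p in range(6)]
rows.append({"region": "Mexico", "yearOd": 1997, "itemId": 2, "price": 9.5})

blocks, cell_blocks = build_mdc(rows, DIMENSIONS)
for cell, bids in cell_blocks.items():
    print(cell, "->", bids)
```

The Canada cell overflows one block and gets a second, while the single Mexico row still occupies a block of its own, which is the source of the space overhead that comes with finer-grained dimensions.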

MDC Storage Layout
[Figure: a conventional heap file, with tuples and pages sorted by e.g. time, compared with the MDC storage layout, in which pages are grouped into blocks by multi-dimensional values and blocks are sorted by multi-dimensional values.]

Dimension Block Index
Regular B-tree indexes, except that data entries address blocks rather than individual records.
One index for each dimension.
[Figure: MDC storage layout]

  SELECT SUM(S.sales)
  FROM Sales S
  WHERE S.region = 'Canada'
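A query with predicates on several dimensions can be answered by looking up each dimension block index and intersecting the resulting block sets. The Python sketch below illustrates this under simplifying assumptions: a dictionary stands in for each B-tree, and the block ids, cell values, and helper names (`build_block_indexes`, `blocks_for`) are invented for the example.

```python
from collections import defaultdict

DIMENSIONS = ("region", "yearOd", "itemId")

# Hypothetical blocks: block id -> (region, yearOd, itemId) of the cell it holds.
block_cells = {
    0: ("Canada", 1997, 1),
    1: ("Canada", 1997, 2),
    2: ("Canada", 1998, 1),
    3: ("Mexico", 1997, 2),
    4: ("Mexico", 1997, 2),   # a second block for the same cell
    5: ("Mexico", 1998, 2),
}

def build_block_indexes(block_cells):
    """Build one index per dimension: dimension value -> set of block ids."""
    indexes = {dim: defaultdict(set) for dim in DIMENSIONS}
    for bid, cell in block_cells.items():
        for dim, value in zip(DIMENSIONS, cell):
            indexes[dim][value].add(bid)
    return indexes

def blocks_for(indexes, **predicates):
    """Return ids of blocks satisfying all equality predicates (AND)."""
    sets = [indexes[dim].get(value, set()) for dim, value in predicates.items()]
    return set.intersection(*sets) if sets else set()

indexes = build_block_indexes(block_cells)
print(blocks_for(indexes, region="Canada"))                            # {0, 1, 2}
print(blocks_for(indexes, region="Mexico", itemId=2, yearOd=1997))     # {3, 4}
```

Because each entry refers to a whole block, the index has far fewer entries than a record-level index, and only the blocks in the final set need to be read.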

Issues & Questions
Choosing the MDC dimensions
Overhead in maintaining the additional level of indirection
Maintaining the clustering with updates
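One way to reason about the choice of dimensions is to count how many blocks a candidate set of dimensions would need, since every distinct cell occupies at least one block even if it holds very few rows. The Python sketch below is a rough illustration under assumed values: the block capacity, the synthetic rows, and the `blocks_needed` helper are all hypothetical.

```python
from collections import Counter
from math import ceil

def blocks_needed(rows, dimensions, block_capacity):
    """Blocks required when each cell gets its own (possibly half-empty) blocks."""
    cell_sizes = Counter(tuple(row[d] for d in dimensions) for row in rows)
    return sum(ceil(n / block_capacity) for n in cell_sizes.values())

# 60 made-up rows: one per (region, yearOd, itemId) combination.
rows = [{"region": r, "yearOd": y, "itemId": i}
        for r in ("Canada", "Mexico")
        for y in (1996, 1997, 1998)
        for i in range(1, 11)]

BLOCK_CAPACITY = 8   # assumed rows per block
for dims in [("region",), ("region", "yearOd"), ("region", "yearOd", "itemId")]:
    print(dims, blocks_needed(rows, dims, BLOCK_CAPACITY))
```

On this synthetic data the block count grows from 8 to 12 to 60 as dimensions are added, while a densely packed file would need only 8 blocks, so finer-grained clustering trades storage for more selective block access.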