Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin




1 Secondary Storage Management (Sections 13.1 – 13.3)
Sanuja Dabade & Eilbroun Benjamin, CS 257 – Dr. TY Lin

2 Presentation Outline
13.1 The Memory Hierarchy – Transfer of Data Between Levels – Volatile and Nonvolatile Storage – Virtual Memory
13.2 Disks – Mechanics of Disks – The Disk Controller – Disk Access Characteristics

3 Presentation Outline (cont'd)
13.3 Accelerating Access to Secondary Storage – The I/O Model of Computation – Organizing Data by Cylinders – Using Multiple Disks – Mirroring Disks – Disk Scheduling and the Elevator Algorithm – Prefetching and Large-Scale Buffering

4 Memory Hierarchy Several components for data storage are available, with different capacities and different costs per byte. The devices with the smallest capacity offer the fastest access at the highest cost per bit.

5 Memory Hierarchy Diagram
(Diagram of the hierarchy, fastest/smallest to slowest/largest: Cache → Main Memory (programs, main-memory DBMSs) → Disk (as virtual memory; file system) → Tertiary Storage.)

6 13.1.1 Memory Hierarchy – Cache
The cache is the lowest level of the hierarchy and holds a limited amount of data. Data items in the cache are copies of certain main-memory locations. Sometimes values in the cache are changed and the corresponding updates to main memory are delayed. The machine looks in the cache both for instructions and for the data those instructions use.

7 13.1.1 Memory Hierarchy (cont'd)
On a single-processor computer there is no need to update data in main memory immediately. With multiple processors, data is written to main memory immediately; this policy is known as write-through.
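The two update policies above can be sketched in a few lines. This is a minimal illustration, not a real cache: all class and variable names are made up, and the backing "memory" is just a dictionary.

```python
# Write-through: every write goes to main memory immediately.
# Write-back: the memory copy is updated only when the cache is flushed.
class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory   # backing store: dict of address -> value
        self.lines = {}        # cached copies

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value   # memory updated at once (write-through)

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()          # addresses whose memory copy is stale

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)        # memory update is delayed

    def flush(self):
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()
```

With a write-back cache, main memory holds a stale value until `flush()` runs, which is exactly why multiprocessor systems prefer write-through.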

8 Main Memory Everything happens in main memory: instruction execution and data manipulation work on information that is resident there. Main memory is random access, meaning any byte can be obtained in the same amount of time.

9 Secondary Storage Used to store data and programs when they are not being processed. More permanent than main memory: data and programs are retained when the power is turned off. Examples: magnetic disks (hard disks).

10 Tertiary Storage
Holds data volumes measured in terabytes; used for databases much larger than what can be stored on disk.

11 13.1.2 Transfer of Data Between Levels
Data moves between adjacent levels of the hierarchy. At the secondary and tertiary levels, accessing the desired data, or finding the desired place to store it, takes considerable time. The disk is organized into blocks; entire blocks are moved to and from a main-memory region called a buffer.

12 13.1.2 Transfer of Data Between Levels (cont'd)
A key technique for speeding up database operations is to arrange the data so that when one item on a block is needed, the other data on the same block is likely to be needed at the same time. The same idea applies at the other levels of the hierarchy.

13 13.1.3 Volatile and Nonvolatile Storage
A volatile device forgets the data stored on it when the power is turned off. A nonvolatile device holds its data even when turned off. All secondary and tertiary storage devices are nonvolatile, while main memory is volatile.

14 13.1.4 Virtual Memory
Typical software executes in virtual memory. The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB. Transfers between memory and disk are in units of blocks.
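The 32-bit figure can be checked directly:

```python
# A 32-bit address space covers 2^32 distinct byte addresses,
# which is exactly 4 GiB.
address_bits = 32
space_bytes = 2 ** address_bits
print(space_bytes)             # 4294967296
print(space_bytes // 2 ** 30)  # 4  (GiB)
```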

15 13.2.1 Mechanics of Disks
The use of secondary storage is one of the important characteristics of a DBMS. A disk consists of two moving pieces: (1) the disk assembly and (2) the head assembly. The disk assembly consists of one or more platters that rotate around a central spindle. Bits are stored on the upper and lower surfaces of the platters.

16 13.2.1 Mechanics of Disks (cont'd)
The disk is organized into tracks. The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder. Tracks are organized into sectors, which are segments of the circle separated by gaps.
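The geometry above determines a disk's capacity: surfaces × tracks per surface × sectors per track × bytes per sector. The numbers below are invented for illustration and do not describe any real drive.

```python
# Capacity of a hypothetical disk from its geometry (all figures assumed).
platters = 4
surfaces = platters * 2            # bits are stored on both sides
tracks_per_surface = 8192
sectors_per_track = 256
bytes_per_sector = 4096

capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity // 2 ** 30, "GiB")  # 64 GiB
```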


18 13.2.2 The Disk Controller One or more disks are controlled by a disk controller. Disk controllers are capable of: controlling the mechanical actuator that moves the head assembly; selecting a sector from among all those in the cylinder at which the heads are positioned; transferring bits between the desired sector and main memory; and possibly buffering an entire track.

19 13.2.3 Disk Access Characteristics
Accessing (reading or writing) a block requires three steps: (1) The disk controller positions the head assembly at the cylinder containing the track on which the block is located; this is the seek time. (2) The disk controller waits while the first sector of the block moves under the head; this is the rotational latency. (3) All the sectors and the gaps between them pass under the head while the disk controller reads or writes the data; this is the transfer time.
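The three steps sum to the block access time. A minimal sketch, using the Megatron 747 figures that appear later in these slides (6.46 ms average seek, 8.33 ms per rotation); the fraction of a track occupied by one block is an assumption here.

```python
# Block access time = seek + rotational latency + transfer.
def block_access_ms(seek_ms, rotation_ms, block_fraction_of_track):
    rotational_latency = rotation_ms / 2                 # half a rotation on average
    transfer = rotation_ms * block_fraction_of_track     # time for the block to pass
    return seek_ms + rotational_latency + transfer

# e.g. a block occupying 1/32 of a track: about 10.9 ms,
# consistent with the ~11 ms per block quoted on slide 21.
print(round(block_access_ms(6.46, 8.33, 1 / 32), 2))
```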

20 13.3 Accelerating Access to Secondary Storage
Several approaches make access to data in secondary storage more efficient: place blocks that are accessed together in the same cylinder; divide the data among multiple disks; mirror disks; use disk-scheduling algorithms; prefetch blocks into main memory. Scheduling latency – added delay in accessing data caused by a disk-scheduling algorithm. Throughput – the number of disk accesses per second that the system can accommodate.

21 13.3.1 The I/O Model of Computation
The number of block accesses (disk I/Os) is a good approximation of an algorithm's running time and should be minimized. Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides. For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block. A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.

22 13.3.2 Organizing Data by Cylinders
If we read all blocks on a single track or cylinder consecutively, we can neglect all but the first seek time and the first rotational latency. Ex 13.4: We request 1024 blocks of the M747. If the data is randomly distributed, the average latency is about 11 ms per block (Ex 13.2), making the total latency about 11 s. If all blocks are stored consecutively on one cylinder: 6.46 ms (one average seek) + 8.33 ms (time per rotation) × 16 (number of rotations) ≈ 139.7 ms.
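The comparison in Ex 13.4 worked out with the slide's figures (11 ms per random block access, 6.46 ms average seek, 8.33 ms per rotation, 16 rotations to read the cylinder):

```python
# Random placement: every one of the 1024 blocks pays full latency.
random_ms = 1024 * 11                 # 11264 ms, about 11 s

# One cylinder: a single seek, then 16 full rotations.
sequential_ms = 6.46 + 8.33 * 16      # about 139.7 ms

print(random_ms, round(sequential_ms, 2))
```

Cylinder organization here is roughly 80× faster, which is why sequential layout matters so much for large scans.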

23 13.3.3 Using Multiple Disks If we have n disks, read/write performance increases by a factor of n. Striping – distributing a relation across multiple disks in this pattern: Data on disk 1: R1, R1+n, R1+2n, … Data on disk 2: R2, R2+n, R2+2n, … Data on disk n: Rn, Rn+n, Rn+2n, … Ex 13.5: We request 1024 blocks with n = 4: 6.46 ms (one average seek) + 8.33 ms (time per rotation) × 16/4 (number of rotations) ≈ 39.8 ms.
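The striping pattern above is just assignment modulo the number of disks. A small sketch (disks numbered 0..n−1 here, rather than the 1-based R1..Rn on the slide):

```python
# Block i of a striped relation lives on disk i mod n.
def disk_for_block(block, n_disks):
    return block % n_disks

n = 4
layout = {d: [b for b in range(12) if disk_for_block(b, n) == d]
          for d in range(n)}
print(layout)  # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
```

Consecutive blocks land on different disks, so a sequential scan keeps all n disks busy at once.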

24 13.3.4 Mirroring Disks Mirroring disks – having two or more disks hold identical copies of the data. Benefit 1: if n disks are mirrors of each other, the system can survive the crash of up to n − 1 disks. Benefit 2: with n disks, read performance increases by a factor of n. Performance increases further if the controller selects, for each read, the disk whose head is closest to the desired block.

25 13.3.5 Disk Scheduling and the Elevator Algorithm
The disk controller runs this algorithm to select which of several pending requests to process first. Pseudocode:
requests[]  // array of all unprocessed data requests
upon receiving a new data request:
    requests[].add(new request)
while (requests[] is not empty):
    move head to next location
    if (head location holds data in requests[]):
        retrieve data
        remove data from requests[]
    if (head reaches end):
        reverse head direction
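The pseudocode above can be sketched as runnable code. This simplified version assumes all requests are known up front (a real controller accepts new requests while the sweep is in progress, as the next slide's example shows), and it returns the order in which cylinders are served.

```python
# Elevator (SCAN) scheduling: serve pending requests in cylinder order
# while sweeping in one direction, then reverse.
def elevator_order(start, requests):
    pending = sorted(set(requests))
    up = [c for c in pending if c >= start]          # served on the upward sweep
    down = [c for c in pending if c < start][::-1]   # then downward, in reverse
    return up + down

print(elevator_order(0, [8000, 24000, 56000, 16000, 64000, 40000]))
# [8000, 16000, 24000, 40000, 56000, 64000]
```

Starting mid-disk changes the split: from cylinder 30000 the head first sweeps up through the higher cylinders, then comes back down through the lower ones.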

26 13.3.5 Disk Scheduling and the Elevator Algorithm (cont'd)
(Timeline of the example, reconstructed from the slide's animation.) The head starts with requests pending at cylinders 8000, 24000, and 56000; requests for 16000, 64000, and 40000 arrive while the sweep is in progress:
Data at 8000 retrieved (t = 4.3); request for 16000 arrives (t = 10); data at 24000 retrieved (t = 13.6); request for 64000 arrives (t = 20); data at 56000 retrieved (t = 26.9); request for 40000 arrives (t = 30); data at 64000 retrieved (t = 34.2); data at 40000 retrieved (t = 45.5); data at 16000 retrieved (t = 56.8).

27 13.3.5 Disk Scheduling and the Elevator Algorithm (cont'd)
Completion times for the same six requests:
Elevator algorithm: 4.3, 13.6, 26.9, 34.2, 45.5, 56.8
FIFO algorithm: 4.3, 13.6, 26.9, 42.2, 59.5, 70.8

28 13.3.6 Prefetching and Large-Scale Buffering
If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
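A minimal prefetching sketch, assuming the application supplies its predicted request order; `read_block`, `PrefetchingReader`, and `window` are illustrative names, not any real API.

```python
# On a buffer miss, fetch the missed block plus the next predicted blocks,
# so later requests are served from memory without touching the disk.
class PrefetchingReader:
    def __init__(self, read_block, window=4):
        self.read_block = read_block   # stand-in for a real disk read
        self.window = window           # how many blocks to fetch per miss
        self.buffer = {}               # blocks already in main memory

    def get(self, block, predicted_next):
        if block not in self.buffer:
            for b in [block] + predicted_next[: self.window - 1]:
                self.buffer[b] = self.read_block(b)
        return self.buffer[block]
```

With `window=2`, reading blocks 0, 1, 2, 3 in their predicted order triggers only two disk visits instead of four.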

29 Questions An opportunity for questions and discussions.

