Copyright © Clifford Neuman - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE

CS582: Distributed Systems
Lecture 15 - October 22, 2003
File System - Performance (slides by Dr. Katia Obraczka)
Dr. Shahab Baqai, LUMS

Outline
Leases
– Continuum of cache consistency mechanisms.
Log-Structured File System and RAID
– FS performance from the storage management point of view.

Review
File Systems.
File System Case Studies:
– NFS.
– Sprite.
– Andrew.
– Coda.

Caching
Improves performance in terms of response time, availability during disconnected operation, and fault tolerance.
Price: consistency.
Methods:
▪ Timestamp-based invalidation – check on use.
▪ Callbacks.

Leases
Time-based cache consistency protocol.
Contract between client and server.
– Lease grants its holder control over writes to the corresponding data item during the lease term.
– Server must obtain approval from the holder of the lease before modifying data.
– When the holder grants approval for a write, it invalidates its local copy.

Protocol Description 1
At T = 0, the client's read goes to the server:
(1) read(file-name)
(2) server returns file, lease(term)
At T < term, the client reads from its cache:
(1) read(file-name)
(2) file served from the local cache
If the file is still in the cache and the lease is still valid, there is no need to go to the server.

Protocol Description 2
At T > term, the read goes back to the server:
(1) read(file-name)
(2) if the file changed: file, extended lease
On writes, at T = 0:
(1) write(file-name)
The server defers the write request until it gets approval from the lease holder(s) or the lease expires.
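A minimal sketch of this lease protocol in Python (all class and method names here are invented for illustration; real implementations differ): the client serves reads from its cache while the lease is valid, and the server blocks writes until holders approve by invalidating their copies.

```python
import time

LEASE_TERM = 10.0  # seconds; an illustrative policy knob

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (contents, lease_expiry)

    def read(self, name):
        entry = self.cache.get(name)
        if entry and time.time() < entry[1]:
            return entry[0]                 # lease still valid: serve from cache
        contents, expiry = self.server.read(self, name)
        self.cache[name] = (contents, expiry)
        return contents

    def invalidate(self, name):
        # Called by the server on behalf of a writer:
        # grant approval by dropping the local copy.
        self.cache.pop(name, None)

class Server:
    def __init__(self):
        self.files = {}    # name -> contents
        self.leases = {}   # name -> {client: expiry}

    def read(self, client, name):
        expiry = time.time() + LEASE_TERM
        self.leases.setdefault(name, {})[client] = expiry
        return self.files.get(name), expiry

    def write(self, name, contents):
        # Defer the write until every outstanding lease is approved or expired.
        now = time.time()
        for client, expiry in list(self.leases.get(name, {}).items()):
            if expiry > now:
                client.invalidate(name)
            del self.leases[name][client]
        self.files[name] = contents

server = Server()
a = Client(server)
server.write("f", "v1")
a.read("f")                # a now caches "f" under a lease
server.write("f", "v2")    # a's copy is invalidated before the write proceeds
assert a.read("f") == "v2"
```

Note that write latency is bounded by the lease term: even an unreachable holder only delays a write until its lease expires.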

Considerations
Unreachable lease holder(s)?
Leases and callbacks:
– Consistency?
– Lease term?

Lease Term
Short leases:
– Minimize delays due to failures.
– Minimize impact of false sharing.
– Reduce storage requirements at the server (expired leases reclaimed).
Long leases:
– More efficient for repeated access with little write sharing.

Lease Management 1
Client requests a lease extension before the lease expires, in anticipation of the file being accessed.
– Performance improvement?

Lease Management 2
Multiple files per lease.
– Performance improvement?
– Example: one lease per directory.
– System files: widely shared but infrequently written.
– False sharing?
– Multicast lease extensions periodically.

Lease Management 3
Lease term based on file access characteristics.
– Heavily write-shared file: lease term = 0.
– Longer lease terms for distant clients.
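One way to picture such an adaptive policy; every constant and threshold below is invented for illustration, not taken from the lecture:

```python
def choose_lease_term(write_share_rate, client_rtt):
    """Illustrative lease-term policy (all numbers are made up).

    write_share_rate: writes/sec to this file by other clients.
    client_rtt: round-trip time to this client, in seconds.
    """
    if write_share_rate > 1.0:
        return 0.0                          # heavily write-shared: don't cache
    base_term = 10.0                        # seconds
    return base_term + 100.0 * client_rtt   # longer leases for distant clients
```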

Clock Synchronization Issues
Servers and clients should be roughly synchronized.
– If the server's clock advances too fast or the client's clock is too slow: inconsistencies.

Next...
Papers on file system performance from the storage management perspective.
Issues:
– Disk access time >>> memory access time.
– Discrepancy between disk access time improvements and other components (e.g., CPU).
Minimize the impact of disk access time by:
– Reducing the number of disk accesses, or
– Reducing access time by performing parallel accesses.

Log-Structured File System
Built as an extension to the Sprite FS (Sprite LFS).
New disk storage technique that tries to use disks more efficiently.
Assumes a main memory cache for files.
Larger memories make the cache more effective at satisfying reads.
– Most of the working set is cached.
Thus, most of the disk access cost is due to writes!

Main Idea
Batch multiple writes in the file cache.
– Transform many small writes into 1 large one.
– Close to the disk's full bandwidth utilization.
Write to disk in a single write to a contiguous region of the disk called the log.
– Eliminates seeks.
– Improves crash recovery.
▪ Sequential structure of the log.
▪ Only the most recent portion of the log needs to be examined.
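A minimal sketch of the batching idea, assuming a simple append-only disk object; the segment size and all names are illustrative:

```python
import io

SEGMENT_SIZE = 512 * 1024      # illustrative; Sprite LFS used large segments

class LogFS:
    def __init__(self, disk):
        self.disk = disk           # file-like object, treated as append-only
        self.buffer = bytearray()  # in-memory segment under construction

    def write_block(self, data):
        # Small writes only fill the in-memory buffer...
        self.buffer.extend(data)
        if len(self.buffer) >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        # ...and reach disk as one large sequential write (no seeks).
        self.disk.write(bytes(self.buffer))
        self.buffer.clear()

fs = LogFS(io.BytesIO())
for _ in range(10):
    fs.write_block(b"x" * 4096)    # ten small writes...
fs.flush()                         # ...reach "disk" as one big append
```

Dirty blocks accumulate in memory and hit the platter as one sequential transfer, so the per-write seek cost is amortized across the whole segment.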

LFS Structure
Two key functions:
– How to retrieve information from the log.
– How to manage free disk space.

File Location and Retrieval 1
Allows random access to information in the log.
– Goal is to match or improve read performance.
– Keeps indexing structures with the log.
Each file has an i-node containing:
– File attributes (type, owner, permissions).
– Disk addresses of the first 10 blocks.
– For files larger than 10 blocks, the i-node contains pointers to more data.

File Location and Retrieval 2
In UNIX FS:
– Fixed mapping between disk address and file i-node: the disk address is a function of the file id.
In LFS:
– I-nodes are written to the log.
– An i-node map keeps the current location of each i-node.
– I-node maps usually fit in the main memory cache.
(Figure: the i-node map translates a file id to the i-node's disk address.)
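A sketch of these two structures; field and class names are invented, and the real Sprite LFS on-disk layout is more involved:

```python
class Inode:
    """Per-file index structure, as on the slide."""
    def __init__(self, attrs):
        self.attrs = attrs       # type, owner, permissions
        self.direct = []         # disk addresses of the first 10 blocks
        self.indirect = None     # address of an indirect block, for larger files

class InodeMap:
    """file id -> current disk address of that file's i-node."""
    def __init__(self):
        self.addr = {}

    def update(self, file_id, inode_addr):
        # Called each time an i-node is rewritten at the head of the log.
        self.addr[file_id] = inode_addr

    def lookup(self, file_id):
        # Usually satisfied from main memory, so no extra disk access.
        return self.addr[file_id]

imap = InodeMap()
imap.update(file_id=42, inode_addr=81920)  # i-node appended at log offset 81920
assert imap.lookup(42) == 81920
```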

Free Space Management
Goal: maintain large, contiguous free chunks of disk space for writing data.
Problem: fragmentation.
Approaches:
– Threading around used blocks.
▪ Skip over active blocks and thread the log through free extents.
– Copying.
▪ Active data copied in compacted form to the head of the log.
▪ Generates contiguous free space.
▪ But expensive!

Free Space Management in LFS
Divide the disk into large, fixed-size segments.
– Segment size is large enough that transfer time (for read/write) >>> seek time.
Hybrid approach:
– Combination of threading and copying.
– Copying: segment cleaning.
– Threading between segments.

Segment Cleaning
Process of copying "live" data out of a segment before rewriting the segment.
A number of segments are read into memory; live data is identified and written back to a smaller number of clean, contiguous segments.
The segments that were read are marked as "clean".
Some bookkeeping is needed: update the files' i-nodes to point to the new block locations, etc.
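A sketch of the cleaning loop; the helper functions and segment capacity are invented for illustration:

```python
def clean_segments(segments, is_live, write_segment):
    """Copy live blocks out of dirty segments into fewer clean ones.

    segments: list of segments, each a list of blocks.
    is_live(block): whether the block is still referenced by some i-node.
    write_segment(blocks): appends a clean segment to the log and returns
        the new block addresses (bookkeeping: i-nodes must then be
        updated to point at these).
    """
    SEGMENT_BLOCKS = 128   # illustrative segment capacity
    live = [b for seg in segments for b in seg if is_live(b)]
    new_addrs = []
    for i in range(0, len(live), SEGMENT_BLOCKS):
        new_addrs.extend(write_segment(live[i:i + SEGMENT_BLOCKS]))
    # The segments we read from are now clean and reusable.
    return new_addrs

log = []
def write_segment(blocks):
    start = len(log)
    log.extend(blocks)
    return list(range(start, start + len(blocks)))

clean_segments([[1, 2], [3, 4]], is_live=lambda b: b != 2,
               write_segment=write_segment)
assert log == [1, 3, 4]   # dead block 2 was not copied forward
```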

Crash Recovery
When a crash occurs, the last few disk operations may have left the disk in an inconsistent state.
– E.g., a new file written but its directory entry not updated.
At reboot time, the OS must correct possible inconsistencies.
Traditional UNIX FS: needs to scan the whole disk.

Crash Recovery in Sprite LFS 1
The locations of the last disk operations are at the end of the log.
– Easy to perform crash recovery.
2 recovery strategies:
– Checkpoints and roll-forward.
Checkpoints:
– Positions in the log where everything is consistent.

Crash Recovery in Sprite LFS 2
After a crash, scan the disk backward from the end of the log to the last checkpoint, then scan forward to recover as much information as possible: roll-forward.
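A sketch of roll-forward over the log tail, assuming a simplified record format (position, file id, block address) invented for this example:

```python
def recover(log_records, checkpoint_pos):
    """Rebuild indexing state from the log tail after a crash.

    log_records: list of (position, file_id, block_addr) appends, in log order.
    checkpoint_pos: log position of the most recent checkpoint, where all
        indexing structures were known to be consistent.
    Returns the reconstructed i-node map (starting empty here; a real system
    would start from the snapshot stored with the checkpoint).
    """
    inode_map = {}
    # Roll forward: only the tail written after the checkpoint is examined,
    # not the whole disk.
    for pos, file_id, block_addr in log_records:
        if pos > checkpoint_pos:
            inode_map[file_id] = block_addr   # recover as much as possible
    return inode_map

tail = [(100, "a", 5000), (240, "b", 5120), (360, "a", 5240)]
assert recover(tail, checkpoint_pos=200) == {"b": 5120, "a": 5240}
```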

More on LFS
The paper discusses their experience implementing and using LFS.
Performance evaluation using benchmarks.
Cleaning overhead.

Redundant Arrays of Inexpensive Disks (RAID)
Improve disk access time by using arrays of disks.
Motivation:
– Disks are getting inexpensive.
– Lower-cost disks:
▪ Less capacity.
▪ But cheaper, smaller, and lower power.
Paper's proposal: build I/O systems as arrays of inexpensive disks.
– E.g., 75 inexpensive disks have 12 times the I/O bandwidth of expensive disks of the same total capacity.

RAID Organization 1
Interleaved disks.
– Supercomputing applications.
– Transfer of large blocks of data at high rates.
(Figure: grouped read – a single read spread over multiple disks.)

RAID Organization 2
Independent disks.
– Transaction processing applications.
– Database partitioned across disks.
– Concurrent access to independent items.
(Figure: reads and writes proceed independently on different disks.)

Problem: Reliability
Disk unreliability causes frequent backups.
What happens with 100 times the number of disks?
– MTTF becomes prohibitively low.
– Fault tolerance is required; otherwise disk arrays are too unreliable to be useful.
RAID: use extra disks containing redundant information.
– Similar to redundant transmission of data.

RAID Levels
Different levels provide different reliability, cost, and performance.
MTTF is a function of the total number of disks, the number of data disks in a group (G), the number of check disks per group (C), and the number of groups.
C is determined by the RAID level.

First RAID Level
Mirrors.
– Most expensive approach.
– All disks duplicated (G=1 and C=1).
– Every write to a data disk results in a write to the check disk.
– Double the cost, half the capacity.

Second RAID Level
Hamming code.
Interleave data across the disks in a group.
Add enough check disks to detect/correct errors.
A single parity disk detects a single error.
Makes sense for large data transfers.
Small transfers mean all disks must be accessed (to check whether the data is correct).

Third RAID Level
Lower cost by reducing C to 1.
– Single parity disk.
Rationale:
– Most check disks in RAID 2 are used to detect which disk failed.
– Disk controllers already do that.
– Data on a failed disk can be reconstructed by computing the parity of the remaining disks and comparing it with the parity for the full group.
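The reconstruction step is plain XOR. A small self-contained illustration with 3 data disks and 1 parity disk (block contents chosen arbitrarily):

```python
def parity(blocks):
    """Bytewise XOR parity across a group of equal-sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the failed disk's block: XOR of the parity and the survivors."""
    return parity(surviving_blocks + [parity_block])

# Example: 3 data disks + 1 parity disk.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
p = parity(data)
assert reconstruct([data[0], data[2]], p) == data[1]   # disk 1 failed
```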

Fourth RAID Level
Try to improve the performance of small transfers using parallelism.
Transfer units are stored in a single sector.
– Reads are independent, i.e., errors can be detected without having to use other disks (rely on the controller).
– Also, maximum disk rate.
– Writes still need multiple disk accesses.

Fifth RAID Level
Tries to achieve parallelism for writes as well.
Distributes data as well as check information across all disks.
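A sketch of the parity placement idea: rotating the parity disk per stripe means writes to different stripes can update parity on different disks in parallel. The left-symmetric rotation below is one common choice, not something the slides prescribe:

```python
def raid5_layout(stripe, n_disks):
    """Return (parity_disk, data_disks) for a stripe under a rotating layout."""
    parity_disk = (n_disks - 1 - stripe) % n_disks
    data_disks = [d for d in range(n_disks) if d != parity_disk]
    return parity_disk, data_disks

# With 5 disks, parity rotates across stripes 0..4:
for s in range(5):
    print(s, raid5_layout(s, 5))   # parity on disk 4, 3, 2, 1, 0 in turn
```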

Failure Correction Techniques for Disk Arrays
Examines codes for reliably storing information in disk arrays.
Tries to make the disk array at least as reliable as a single disk.
– As the number of disks increases, reliability deteriorates.

Redundancy Metrics
Disk as a stack of bits.
– The ith bit of each disk forms the ith codeword in the redundancy encoding.
Mean time to data loss (MTTDL).
Check disk overhead: check disks / data disks.
Update penalty: number of check disks to be updated.

1d-Parity
Single-erasure-correcting scheme.
For G data disks, 1 check disk holds the parity of all G disks.
(Figure: a group of G=4 data disks plus one parity disk.)
– Check disk overhead: 1/G.
– Update penalty: 1.
– Group size: G+1.

2d-Parity
G² data disks arranged in a 2-dimensional array.
For each row and each column, 1 check disk stores the parity of that row or column.
– A failed disk belongs to 2 groups => it can be reconstructed from the data of either.
– Any set of 2 erasures can be corrected: double-erasure-correcting.
– Check disk overhead: 2G/G² = 2/G.
– Update penalty: 2.
– Group size: G+1.
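A toy illustration with G=3, where a single bit stands in for each disk's contents (the layout and values are invented for the example):

```python
def parity_2d(data):
    """Row and column parity bits for a G x G array of data bits."""
    G = len(data)
    row_parity = [sum(row) % 2 for row in data]
    col_parity = [sum(data[r][c] for r in range(G)) % 2 for c in range(G)]
    return row_parity, col_parity

data = [[1, 0, 1],
        [0, 1, 1],
        [1, 1, 0]]
rows, cols = parity_2d(data)

# Two erasures in different rows and columns: each failed "disk" is
# rebuilt from its row group alone (the column group would also work).
saved = data[0][2], data[1][0]
for (r, c) in [(0, 2), (1, 0)]:
    others = sum(data[r][k] for k in range(3) if k != c) % 2
    data[r][c] = (rows[r] + others) % 2   # row parity recovers the lost bit
assert (data[0][2], data[1][0]) == saved
```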

More...
Linear codes:
– Compute check bits as the parity of subsets of the information bits.
Double-erasure and triple-erasure correcting codes:
– Full-2 and 2d-parity: double-erasure-correcting.
– Full-3 and additive-3: triple-erasure-correcting.