Flexible Storage Allocation
A. L. Narasimha Reddy
Department of Electrical and Computer Engineering, Texas A&M University
Students: Sukwoo Kang (now at IBM Almaden), John Garrison

Outline
- Big Picture
- Part I: Flexible Storage Allocation
  - Introduction and Motivation
  - Design of Virtual Allocation
  - Evaluation
- Part II: Data Distribution in Networked Storage Systems
  - Introduction and Motivation
  - Design of User-Optimal Data Migration
  - Evaluation
- Part III: Storage Management Across Diverse Devices
- Conclusion

Storage Allocation
- Conventional allocation assigns the entire storage space at file system creation time.
- Storage space owned by one operating system cannot be used by another.
[Figure: statically allocated disks for Windows NT (NTFS), Linux (ext2), and AIX (JFS); one file system runs out of space while free capacity remains elsewhere.]

Big Picture
- Memory systems employ virtual memory for several reasons; current storage systems lack such flexibility.
- Current file systems allocate storage statically at the time of their creation.
  - As a result, disk space is not allocated well across multiple file systems.

File Systems with Virtual Allocation
- When a file system is created with X GB:
  - Allow it to be backed by only Y GB of real storage, where Y << X.
  - The remaining space is kept in one common available pool.
  - As the file system grows, storage space is allocated on demand.
[Figure: the same three file systems drawing from a 100 GB common storage pool; actual allocations start small (e.g., 10 GB) and grow on demand.]

Our Approach to Design
- Employ an allocate-on-write policy:
  - Storage space is allocated when the data is written.
  - All data is written to disk sequentially, ordered by the time at which it is written to the device.
  - Once data is written, it can be accessed from the same location, i.e., data is updated in place.
[Figure: physical disk addressed by physical block address.]

Allocate-on-write Policy
- Storage space is allocated in units of an extent when data is written.
- An extent is a group of file system blocks:
  - Fixed size.
  - Retains more spatial locality.
  - Reduces the mapping information that must be maintained.
[Figure: a write at time t = t' allocates one extent on the physical disk.]
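To make the extent granularity concrete, here is a minimal sketch of the address arithmetic, with hypothetical parameters (4 KB file system blocks grouped into 256 KB extents; the slides do not fix these sizes):

```c
#include <stdio.h>

/* Hypothetical sizes -- the slides do not specify these values. */
#define BLOCK_SIZE        4096UL            /* one file system block */
#define EXTENT_SIZE       (256UL * 1024UL)  /* one VA extent */
#define BLOCKS_PER_EXTENT (EXTENT_SIZE / BLOCK_SIZE)

/* A logical block address falls into exactly one extent. */
static unsigned long extent_index(unsigned long lba)  { return lba / BLOCKS_PER_EXTENT; }
static unsigned long extent_offset(unsigned long lba) { return lba % BLOCKS_PER_EXTENT; }

int main(void)
{
    unsigned long lba = 1000;
    printf("block %lu -> extent %lu, offset %lu\n",
           lba, extent_index(lba), extent_offset(lba));
    return 0;
}
```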

Allocate-on-write Policy (cont.)
- Data is written to disk sequentially, ordered by write time.
  - Further writes to the same data are updated in place.
  - VA (virtual allocation) requires an additional data structure to track the mapping.
[Figure: writes at t = t' and t = t'' (t'' > t') allocate Extent 0 and Extent 1 in sequence.]

Block Map
- The block map keeps a mapping from logical storage locations to real (physical) storage locations.
[Figure: writes at t = t' and t = t'' allocate Extent 0 and Extent 1; the block map records where each logical extent lives on the physical disk.]
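The following is a minimal user-space sketch of such a block map, with a flat array standing in for the prototype's in-memory lookup structure; all names are illustrative, not the prototype's API:

```c
#include <stdlib.h>

#define UNMAPPED ((unsigned long)-1)

/* One entry per logical extent of the virtual disk: either the
 * physical extent currently backing it, or UNMAPPED. */
struct block_map {
    unsigned long *phys_extent;
    unsigned long  num_extents;   /* virtual disk size, in extents */
    unsigned long  next_free;     /* physical extents are handed out sequentially */
};

static struct block_map *block_map_create(unsigned long num_extents)
{
    struct block_map *m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    m->phys_extent = malloc(num_extents * sizeof(*m->phys_extent));
    if (!m->phys_extent) {
        free(m);
        return NULL;
    }
    for (unsigned long i = 0; i < num_extents; i++)
        m->phys_extent[i] = UNMAPPED;   /* nothing written yet */
    m->num_extents = num_extents;
    m->next_free = 0;
    return m;
}
```

The prototype actually keeps this mapping in an in-memory hash table for fast lookup (slide "Implementation & Experimental Setup"); a flat array is simply the easiest stand-in to read.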

VA Metadata
- The block map is maintained in memory and regularly written to disk to harden it against system failures.
- VA metadata is the on-disk representation of the block map.
[Figure: the in-memory block map hardened to an on-disk VA metadata region.]
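A minimal sketch of the hardening step, assuming the on-disk VA metadata is a straight serialized copy of the map and letting stdio `FILE` I/O stand in for the block device (illustrative only):

```c
#include <stdio.h>
#include <unistd.h>   /* fileno, fsync */

/* Write the in-memory block map (an array of physical extent numbers)
 * to the reserved on-disk VA metadata region and force it to stable
 * storage.  Mappings created since the last hardening exist only in
 * memory and would be lost in a crash. */
static int harden_block_map(const unsigned long *map, unsigned long n_entries,
                            FILE *metadata_region)
{
    rewind(metadata_region);
    if (fwrite(map, sizeof(*map), n_entries, metadata_region) != n_entries)
        return -1;
    if (fflush(metadata_region) != 0)
        return -1;
    return fsync(fileno(metadata_region));  /* reach stable storage, not the page cache */
}
```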

On-disk Layout & Storage Expansion
- When capacity is exhausted or the storage expansion threshold is reached, the physical disk can be expanded onto other available storage resources.
  - The file system remains unaware of the actual space allocation and expansion.
[Figure: on-disk layout — FS metadata, VA metadata, and Extents 0–2 on the physical disk; the virtual disk continues with Extents 3–7 once expansion is triggered at the threshold.]
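The expansion trigger reduces to a one-line check; a sketch, assuming the threshold is a fraction of the currently provisioned physical extents (the slides do not give its exact form):

```c
/* Trigger expansion once allocated physical extents cross the
 * threshold fraction of what is currently provisioned; the file
 * system above continues to see the unchanged virtual disk. */
static int needs_expansion(unsigned long allocated_extents,
                           unsigned long provisioned_extents,
                           double threshold /* e.g. 0.9, hypothetical */)
{
    return (double)allocated_extents >= threshold * (double)provisioned_extents;
}
```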

Write Operation
[Figure: a write request flows from the application through the file system and buffer/page cache layers to the block I/O layer (VA), which searches the VA block map, allocates a new extent and updates the mapping information if needed, writes to disk (FS metadata, VA metadata, extents), hardens the VA metadata, and acknowledges the page.]

Read Operation
[Figure: a read request flows from the application through the file system and buffer/page cache layers to the block I/O layer (VA), which searches the VA block map and reads the mapped extent from disk.]
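Putting the two I/O paths together, a user-space sketch of the VA layer's request handling; a byte array stands in for the disk and every name is illustrative (real code sits at the block I/O layer and also schedules metadata hardening):

```c
#include <string.h>

#define EXTENT_SIZE (256UL * 1024UL)   /* hypothetical, as in the earlier sketch */
#define UNMAPPED    ((unsigned long)-1)

/* map[lext] holds the physical extent backing logical extent lext,
 * or UNMAPPED if that extent has never been written. */

/* Write path: first write to a logical extent claims the next
 * sequential physical extent; later writes update in place. */
static void va_write(unsigned long *map, unsigned long *next_free,
                     unsigned char *disk, unsigned long lext,
                     const void *buf, size_t len)
{
    if (map[lext] == UNMAPPED)              /* allocate-on-write */
        map[lext] = (*next_free)++;
    memcpy(disk + map[lext] * EXTENT_SIZE, buf, len);
    /* ... then harden VA metadata and acknowledge the page ... */
}

/* Read path: a mapped extent is read from its physical location; an
 * unmapped extent was never written, so return zeros. */
static void va_read(const unsigned long *map, const unsigned char *disk,
                    unsigned long lext, void *buf, size_t len)
{
    if (map[lext] == UNMAPPED)
        memset(buf, 0, len);
    else
        memcpy(buf, disk + map[lext] * EXTENT_SIZE, len);
}
```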

Allocate-on-write vs. Other Work
- Key differences from log-structured file systems (LFS):
  - Only allocation is done at the end of the log.
  - Updates are done in place after allocation.
- LVM still ties up storage at the time of file system creation.

Design Issues
- Extent-based policy example (with ext2):
  - Notation: I (inode), B (data block), V (VA block map); A → B means B is allocated to A.
- File system-based policy example (with ext3 ordered mode).
- VA metadata hardening (file system integrity):
  - Must preserve a specific update ordering between VA metadata and file system (meta)data.

Design Issues (cont.)
- Extent size:
  - Larger extents reduce the block map size and retain more spatial locality, but can cause data fragmentation (worked example below).
- Reclaiming the allocated storage space of deleted files:
  - Needed to continue providing the benefits of virtual allocation.
  - Without reclamation, virtual allocation can degenerate into static allocation.
- Interaction with RAID:
  - RAID remaps blocks to physical devices to provide device characteristics; VA remaps blocks for flexibility.
  - Need to resolve the performance impact of VA's extent size versus RAID's chunk size.
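To see the extent-size trade-off in numbers, a quick calculation of the block map footprint, assuming 8-byte map entries (an assumption; the slides do not give the entry size):

```c
#include <stdio.h>

/* Block map size = (disk size / extent size) * entry size.
 * Larger extents shrink the map, but scattered small writes then
 * each claim a whole extent, fragmenting the data. */
int main(void)
{
    unsigned long long disk = 1ULL << 40;   /* 1 TB disk */
    unsigned long long sizes[] = { 64ULL << 10, 256ULL << 10, 1ULL << 20 };
    for (int i = 0; i < 3; i++) {
        unsigned long long entries = disk / sizes[i];
        printf("extent %4llu KB -> %8llu entries, %4llu MB map\n",
               sizes[i] >> 10, entries, entries * 8 >> 20);
    }
    return 0;   /* prints 128 MB, 32 MB, 8 MB respectively */
}
```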

Spatial Locality Observations & Issues
- Metadata and data separation.
- Data clustering: reduce seek distance.
- Multiple file systems.
- Data placement policy:
  - Allocate hot data in the high-data-rate region of the disk.
  - Allocate hot data in the middle of the partition.

Implementation & Experimental Setup
- Virtual allocation prototype:
  - Kernel module for Linux.
  - Employs an in-memory hash table to speed up VA block map lookups.
- Setup:
  - 3 GHz Pentium 4 processor, 1 GB main memory.
  - Red Hat Linux 9.
  - Ext2 and ext3 file systems.
- Workloads:
  - Bonnie++ (large-file workload)
  - Postmark (small-file workload)
  - TPC-C (database workload)

VA Metadata Hardening
- Compare ext2 against VA-EXT2-EX.
- Compare ext3 against VA-EXT3-EX and VA-EXT3-FS.

Reclaiming Allocated Storage Space
- Reclaim operation targets the space of deleted large files.
- How to keep track of deleted files?
  - Employed a stackable file system that maintains a duplicate block bitmap.
  - Alternatively, could employ the "Life or Death at Block-Level" (OSDI '04) work.
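A sketch of the bitmap-diff idea behind reclamation, assuming the stackable layer snapshots the file system's block bitmap so that deletions show up as 1→0 transitions (illustrative, not the prototype's actual code):

```c
/* Compare the previous and current copies of the file system block
 * bitmap; a bit that went from 1 (allocated) to 0 (free) marks a
 * block whose extent may be reclaimable into the common pool. */
static void reclaim_scan(const unsigned char *old_bmp,
                         const unsigned char *new_bmp,
                         unsigned long nbits,
                         void (*reclaim)(unsigned long block))
{
    for (unsigned long b = 0; b < nbits; b++) {
        int was = (old_bmp[b / 8] >> (b % 8)) & 1;
        int now = (new_bmp[b / 8] >> (b % 8)) & 1;
        if (was && !now)
            reclaim(b);   /* block freed by a delete */
    }
}
```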

VA with RAID-5
- Large-file workload, small-file workload, and large-file workload with NVRAM.
- Used ext2 on software RAID-5 with VA.
- NVRAM-X%: NVRAM sized at X% of the total VA metadata size.
[Chart configurations: VA-RAID-5 NO-HARDEN, VA-RAID-5 NVRAM-17%, VA-RAID-5 NVRAM-4%, VA-RAID-5 NVRAM-1%.]

Data Placement Policy (Postmark)
- VA NORMAL partition: same data rate across the partition.
- VA ZCAV partition: hot data is placed in the high-data-rate region of the partition.
- VA-NORMAL: allocation starts from the outer cylinders.
- VA-MIDDLE: allocation starts from the middle of the partition.

Multiple File Systems
- VA-7GB: two 3.5 GB partitions, 30% utilization.
- VA-32GB: two 16 GB partitions, 80% utilization.
- Used Postmark.
- VA-HALF: the second file system is created after 40% of the first file system has been written.
- VA-FULL: the second file system is created after 80% of the first file system has been written.

Real-World Deployment of Virtual Allocation
- Prototype built.

VA in a Networked Storage Environment
- The flexible allocation provided by VA raises a trade-off between locality and load balance.

Part II: Data Distribution
- Locality-based approach:
  - Uses data migration (e.g., HP AutoRAID).
  - Migrates "hot" data from the slower device (remote disk) to the faster device (local disk).
- Load-balancing-based approach (striping):
  - Exploits multiple devices to support the required data rates (e.g., Slice, OSDI '00).
[Figure: hot data placed on the fast device, cold data on the slow device.]

User-Optimal Data Migration
- Locality is exploited first: data is migrated from Disk B to Disk A.
- Load balancing is also considered: if the load on Disk A becomes too high, data is migrated back from Disk A to Disk B.

Migration Decision Issues
- Where to migrate: use I/O request response time to choose the target disk.
- When to migrate: a migration threshold; migration from Disk A to Disk B is initiated only when the threshold is exceeded.
- How to migrate: limit the number of concurrent migrations (migration token).
- What data to migrate: active data.
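A sketch of how these four decisions might compose; every threshold, field, and name here is hypothetical — the slides give the ingredients, not the exact formula:

```c
struct disk_stats {
    double resp_time_ms;   /* recent average I/O response time */
};

struct migrator {
    int    tokens;         /* limits concurrent migrations ("migration token") */
    double threshold_ms;   /* response-time gap that justifies moving data */
};

/* Decide whether to migrate one piece of active (hot) data from disk
 * 'from' to disk 'to'.  Where: toward the lower response time.
 * When: only past the threshold.  How: only with a token in hand. */
static int should_migrate(struct migrator *mg,
                          const struct disk_stats *from,
                          const struct disk_stats *to)
{
    if (mg->tokens <= 0)
        return 0;                                   /* too many migrations in flight */
    if (from->resp_time_ms - to->resp_time_ms < mg->threshold_ms)
        return 0;                                   /* gap too small to be worth it */
    mg->tokens--;                                   /* returned when migration completes */
    return 1;
}
```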

Design Issues
- Allocation policy:
  - Striping with user-optimal migration improves data access locality.
  - Sequential allocation with user-optimal migration improves load balancing.
- Multi-user environment:
  - Each user migrates data in a user-selfish manner.
  - Migrations nevertheless tend to improve the performance of all users over longer periods of time.

Evaluation
- Implemented as a kernel block device driver.
- Evaluated using the SPECsfs benchmark.
- Configuration.
[Figure: SPECsfs performance curves, single-user and multi-user.]

Single-User Environment
- Striping with user-optimal migration.
- Sequential allocation with user-optimal migration.
- Configuration naming: (allocation policy)-(migration policy), where STR = striping, SEQ = sequential allocation, NOMIG = no migration, and MIG = user-optimal migration.

Single-User Environment (cont.)
- Comparison between migration systems.
  - Migration based on locality: hot data moves remote → local; cold data moves local → remote.

Multi-User Environment – Striping
- Server A: load from 100 to 700.
- Server B: load from 50 to 350.

Multi-User Environment – Sequential Allocation
- Server A: load from 100 to 1100.
- Server B: load from 30 to 480.

Storage Management Across Diverse Devices
- Flash storage is becoming widely available:
  - More expensive than hard drives.
  - Faster random accesses.
  - Low power consumption.
- In laptops now; in hybrid storage systems soon.
- Manage data across different devices:
  - Match application needs to device characteristics.
  - Optimize for performance and power consumption.

Motivation
- VFS allows many file systems underneath.
- VFS maintains a 1-to-1 mapping from namespace to storage.
- Can we provide different storage options for different files for a single user?
  - /user1/file1 → storage system 1, /user2/file2 → storage system 2, ...

Normal File System Architecture

Umbrella File System

Example Data Organization

Motivation: Policy-Based Storage
- User or system administrator choice:
  - Allow different types of files on different devices.
  - Reliability, performance, power consumption.
- Layered architecture:
  - Leverage the benefits of the underlying file systems.
  - Map applications to file systems and underlying storage.
- Policy decisions can depend on namespace and metadata.
  - Example: files not touched in a week → slow storage system.

Rules Structure
- Provided at mount time.
- User specified.
- Based on inode values (metadata) and filenames (namespace).
- Provides an array of branches.

Umbrella File System
- Sits under the VFS to enforce policy.
- Policy is enforced at open and close times.
- Policy is also enforced periodically (less often).
- UmbrellaFS acts as a "router" for files, based not only on namespace but also on metadata.

Inode Rules Structure

Rule | Inode/Filename | Field              | Match | Value                     | Branch
-----|----------------|--------------------|-------|---------------------------|-----------
1    | Inode          | file permissions   | =     | Read Only                 | /fs1, /fs2
2    | Filename       | n/a                |       |                           |
3    | Inode          | file creation time | >=    | 8:00 am, August 3rd, 2007 | /fs2
4    | Inode          | file length        | <     | 20 KB                     | /fs3
...

Inode Rules
- Rules are provided in order of precedence; the first match wins.
- Each rule compares an inode value against its condition.
  - At file creation, some inode values are indeterminate.
  - Rules on such fields are passed over.
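A sketch of first-match evaluation over the inode rules, assuming a small tagged condition type; "indeterminate" fields at file creation are simply skipped. All names are illustrative, not UmbrellaFS's actual structures:

```c
#include <stddef.h>

enum field { F_PERMISSIONS, F_CREATE_TIME, F_LENGTH };
enum op    { OP_EQ, OP_GE, OP_LT };

struct inode_rule {
    enum field  field;
    enum op     op;
    long        value;
    const char *branch;      /* e.g. "/fs2" */
};

/* Return the branch of the first rule whose condition holds, or the
 * default branch.  'known' flags mark fields that are determinate;
 * at file creation, rules on unknown fields are passed over. */
static const char *route_inode(const struct inode_rule *rules, size_t n,
                               const long vals[3], const int known[3],
                               const char *def)
{
    for (size_t i = 0; i < n; i++) {
        if (!known[rules[i].field])
            continue;                 /* indeterminate at creation: skip */
        long v = vals[rules[i].field];
        int hit = (rules[i].op == OP_EQ) ? v == rules[i].value
                : (rules[i].op == OP_GE) ? v >= rules[i].value
                                         : v <  rules[i].value;
        if (hit)
            return rules[i].branch;   /* first match wins */
    }
    return def;
}
```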

Filename Rules Structure

Rule | Match String      | Branch
-----|-------------------|-----------
1    | /*.avi            | /fs2, /fs1
2    | /home/*.txt       | /fs1
3    | /home/jgarrison/* | /fs3
...

Filename Rules
- Once the first filename rule is triggered, all filename rules are checked.
- Matching is similar to longest-prefix matching.
- Double index based on:
  - Path matching
  - Filename matching
- Example:
  - Rules: /home/*/*.bar and /home/jgarrison/foo.bar
  - File: /home/jgarrison/foo.bar
  - The file matches the second rule more closely (path depth 3 with 7 matched filename characters vs. path depth 3 with 4 matched filename characters).
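A sketch of the specificity comparison from the example, keyed first on matched path depth and then on literally matched filename characters — a simplification of UmbrellaFS's double index, with illustrative names:

```c
struct match_score {
    int path_depth;   /* directory components matched */
    int name_chars;   /* literal (non-wildcard) filename characters matched */
};

/* Pick the more specific of two matching rules: deeper path wins; on
 * a tie, more literally matched filename characters win.  For
 * /home/jgarrison/foo.bar, the rule /home/jgarrison/foo.bar (depth 3,
 * 7 chars) beats /home/*/*.bar (depth 3, 4 chars: ".bar"). */
static int more_specific(struct match_score a, struct match_score b)
{
    if (a.path_depth != b.path_depth)
        return a.path_depth > b.path_depth;
    return a.name_chars > b.name_chars;
}
```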

Evaluation
- Overhead:
  - Throughput
  - CPU-limited benchmarks
  - I/O-limited benchmarks
- Example improvement

UmbrellaFS Overhead

CPU Limited Benchmarks

I/O Limited Benchmarks

Flash vs. RAID5 Read Performance

Flash vs. RAID5 Write Performance

Flash and Disk Hybrid System

Disks with Encryption Hardware

Conclusion
- Virtual allocation provides flexibility:
  - Improves the flexibility of managing storage across multiple file systems and platforms.
- Enabled user-optimal migration:
  - Balances disk access locality and load automatically and transparently.
  - Adapts to changes in workloads and in the load on each storage device.
- Policy-based storage: Umbrella File System.
  - Allows matching application characteristics to devices.