DEFER Cache – an Implementation. Sudhindra Rao and Shalaka Prabhu. Thesis Defense, Master of Science, Department of ECECS, OSCAR Lab.


DEFER Cache2 Overview
- Related Work
- DEFER Cache Architecture
- Implementation – Motivation and Challenges
- Results and Analysis
- Conclusion
- Future Work

DEFER Cache3 Overview
- Accessing remote memory is faster than accessing local disks; hence co-operative caching
- Current schemes: fast reads, slow writes
- Goal: combine replication with logging for fast, reliable write-back to improve write performance
  - DCD and RAPID already do something similar
- New architecture: Distributed, Efficient and Reliable (DEFER) Co-operative Cache
- DEFER Cache
  - Duplication & logging
  - Co-operative caching for both read and write requests
  - Vast performance gains (up to 11.5x speedup) due to write-back [9]

DEFER Cache4 Co-operative Caching
- High-performance, scalable LAN
  - Slow speed of the file server disks
- Increasing RAM in the server
  - File server with 1GB RAM
  - 64 clients with 64MB RAM each = 4GB
- Cost-effective solution: use remote memory for caching
  - 6-12 ms to access 8KB of data from disk vs. 1.05 ms from a remote client
  - Highly scalable
  - But all related work focuses on read performance

DEFER Cache5 Other Related Work
- N-Chance Forwarding
  - Forwards singlets to a remote host on a capacity miss
  - Re-circulates N times, then writes to the server disk
  - Uses a write-through cache
- Co-operative caching using hints
- Global Memory System
- Remote memory servers
- Log-structured storage systems: LFS, Disk Caching Disk, RAPID
- NVRAM: not cost-effective with current technology
- What's DEFER?
  - Improve write performance
  - DCD using distributed systems

DEFER Cache6 Log-based write mechanism
- DCD [7] and RAPID [8] implement log-based writes
- Improvement in small writes using the log
- Reliability and data availability from the log partition
[Figure: DCD-like structure of DEFER – local cache and segment buffer in memory, log partition and data partition on the local disk, plus the remote cache.]

DEFER Cache7 Logging algorithm – writing a segment
- Write 128KB of LRU data to a cache-disk segment in one large write (see the sketch below)
- Pick LRU data to capture temporal locality, improve reliability and reduce disk traffic
- Most data will be overwritten repeatedly
[Figure: RAM cache, segment buffer and mapping table before and after a segment write to a free log segment on the cache disk.]
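As a concrete illustration, here is a minimal C sketch of a DCD-style segment flush: LRU dirty blocks already packed into the segment buffer are written to a free cache-disk segment with one large write, and a mapping table records where each block now lives. The names (flush_segment, map_entry, SEG_SIZE, the 8KB block size) are illustrative assumptions, not the thesis code.

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192
#define SEG_SIZE   (128 * 1024)              /* one 128KB log segment   */
#define SEG_BLOCKS (SEG_SIZE / BLOCK_SIZE)

struct map_entry { uint64_t lba; off_t log_off; int valid; };

/* Pack the LRU dirty blocks into the segment buffer and write them to
 * the next free cache-disk segment with a single large write.         */
int flush_segment(int cache_disk_fd, off_t free_seg_off,
                  char seg_buf[SEG_SIZE],
                  struct map_entry table[], size_t nblocks)
{
    if (nblocks > SEG_BLOCKS) nblocks = SEG_BLOCKS;

    /* One large sequential write captures many small writes at once. */
    if (pwrite(cache_disk_fd, seg_buf, SEG_SIZE, free_seg_off) != SEG_SIZE)
        return -1;

    /* Remember where each block now lives on the log partition so a
     * later read (or de-stage to the data partition) can find it.    */
    for (size_t i = 0; i < nblocks; i++) {
        table[i].log_off = free_seg_off + (off_t)i * BLOCK_SIZE;
        table[i].valid   = 1;
    }
    return 0;
}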

DEFER Cache8 Garbage Collection
- Data is written into the cache-disk continuously
- The cache-disk (log writes) will fill eventually
- Most of the data in the cache-disk is "garbage" caused by overwriting
- The garbage must be cleaned to free log-disk space (see the sketch below)
[Figure: log disk on a client before and after garbage collection.]
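A hedged sketch of the cleaning step in C: only blocks still marked valid are copied forward, overwritten ("garbage") blocks are dropped, and the scanned log segments can then be reused. The structures (log_seg, log_block) are assumptions for illustration.

#include <stddef.h>
#include <stdint.h>

struct log_block { uint64_t lba; int valid; };
struct log_seg   { struct log_block blk[16]; };

/* Compact live blocks from 'from' segments into 'to', returning the
 * number of live blocks kept; the caller then marks the old segments
 * as free log space.                                                 */
size_t clean_segments(struct log_seg *from, size_t nsegs,
                      struct log_block *to, size_t cap)
{
    size_t kept = 0;
    for (size_t s = 0; s < nsegs; s++)
        for (size_t b = 0; b < 16; b++)
            if (from[s].blk[b].valid && kept < cap)
                to[kept++] = from[s].blk[b];   /* live data survives  */
    return kept;                               /* garbage is dropped  */
}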

DEFER Cache9 DEFER Cache Architecture
- Typical distributed (client-server) system
- Applications run on workstations (clients) and access files from the server
- Local disks on clients are used only for booting, swapping and logging
- Local RAM is divided into an I/O cache and a segment buffer
- The local disk has a corresponding log partition

DEFER Cache10 DEFER Cache Algorithms
- DEFER is DCD distributed over the network – the best of co-operative caching and logging
- Reads are handled exactly as in N-Chance Forwarding
- Writes are duplicated immediately and logged after a pre-determined time interval M
- Dirty singlets are forwarded as in N-Chance
- Three logging strategies are used (see the write-path sketch after this list):
  - Server Logging
  - Client Logging
  - Peer Logging
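A sketch of the write path summarized above, in C: the dirty block is duplicated to remote memory right away and stamped so the later M-second sweep can log it. The strategy names come from the slide; the helper functions and structures are placeholder assumptions, not the thesis API.

#include <time.h>

enum log_strategy { SERVER_LOGGING, CLIENT_LOGGING, PEER_LOGGING };

struct dirty_block { unsigned long lba; char *data; time_t mtime; };

/* Placeholder stubs standing in for the network/membership layer. */
static void send_copy_to(int host, const struct dirty_block *b)
{ (void)host; (void)b; }
static int server_id(void) { return 0; }
static int my_id(void)     { return 1; }
static int peer_of(int id) { return id + 1; }   /* toy peer mapping  */

/* On every write the block is duplicated immediately; who logs it to
 * disk after M seconds depends on the strategy in use.              */
void defer_write(struct dirty_block *b, enum log_strategy s)
{
    b->mtime = time(NULL);                 /* for the M-second sweep */

    switch (s) {
    case SERVER_LOGGING:                   /* server caches and logs */
        send_copy_to(server_id(), b);
        break;
    case CLIENT_LOGGING:                   /* server caches a copy;   */
        send_copy_to(server_id(), b);      /* this client logs locally */
        break;
    case PEER_LOGGING:                     /* assigned peer caches and logs */
        send_copy_to(peer_of(my_id()), b);
        break;
    }
}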

DEFER Cache11 Server Logging
- The client copies the block to the server cache on a write
- The Server Table maintains consistency
- The server invalidates other clients and logs the contents of its segment buffer
- Drawback: increased load on the server due to logging
[Figure: server-logging protocol – 1. lock ownership request; 3. lock ownership granted; 4. copy sent to the server cache; the server updates the Server Table, invalidates other clients, frees the lock and logs its segment buffer to the log disk.]

DEFER Cache12 Client Logging
- Advantage: server load is decreased
- Disadvantage: availability of the block is affected
[Figure: client-logging protocol – lock ownership request and grant (steps 1, 3); data copied to the server cache (4); after 'M' seconds the writing client logs from its segment buffer to its own log disk (5); on logging complete (6) the server removes the dirty blocks sent by Client 1 from its cache; the server also updates the Server Table, invalidates other clients and frees the lock.]

DEFER Cache13 Peer Logging
- Each workstation is assigned a peer; the peer performs the logging
- Advantage: reduces server load without compromising availability
[Figure: peer-logging protocol – lock ownership request and grant (steps 1, 3); a copy of the block is sent to the peer and the Server Table is updated (4); the server sends invalidations; after 'M' seconds the peer logs from its segment buffer to its log disk (5). A peer-mapping table relates each client to its peer.]

DEFER Cache14 Reliability
- Every M seconds, blocks that were modified within the last M to 2M seconds are logged
- Thus, for M = 15, data modified more than 30 seconds ago is guaranteed to be on disk (at most 30 seconds of updates can be lost)
- Most UNIX systems use a delayed write-back of 30 seconds
- M can be reduced to increase the logging frequency without introducing high overhead
- With DEFER, blocks are both logged and duplicated (a sketch of the M-second sweep follows)
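A minimal C sketch of the M-second sweep implied above: every M seconds, dirty blocks whose last modification is at least M seconds old (and not yet logged) are flushed to the log disk, bounding possible data loss to 2M seconds. The structures and log_to_disk() helper are illustrative assumptions.

#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define M 15  /* seconds between sweeps */

struct cached_block { time_t mtime; int dirty; int logged; };

/* Placeholder for the real segment-buffer/log-disk write path. */
static void log_to_disk(struct cached_block *b) { b->logged = 1; }

void m_second_sweep(struct cached_block *cache, size_t n)
{
    for (;;) {
        sleep(M);
        time_t now = time(NULL);
        for (size_t i = 0; i < n; i++) {
            /* modified at least M seconds ago and not yet logged */
            if (cache[i].dirty && !cache[i].logged &&
                now - cache[i].mtime >= M)
                log_to_disk(&cache[i]);
        }
    }
}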

DEFER Cache15 Crash Recovery (Peer Logging)
- The recovery algorithm works on the on-log-disk version of the data
- The in-memory and on-log-disk copies are on different hosts
- Find the blocks that were updated by the crashed client, and its peer information (see the sketch below)
- The server initiates recovery of those blocks from the peer
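A hedged C sketch of the recovery step: the server scans its directory table for blocks last written by the crashed client and asks that client's peer to replay them from its log disk. The table layout and request_replay() helper are assumptions for illustration.

#include <stddef.h>

struct dir_entry { unsigned long lba; int last_writer; int peer; };

/* Hypothetical RPC that asks 'peer' to read 'lba' back from its log disk. */
static void request_replay(int peer, unsigned long lba) { (void)peer; (void)lba; }

void recover_client(struct dir_entry *table, size_t n, int crashed)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].last_writer == crashed)
            request_replay(table[i].peer, table[i].lba);
}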

DEFER Cache16 Simulation Results Simulation [9] using disksim – synthetic and real world traces

DEFER Cache17 Real Workloads
- Snake: Peer Logging 4.9x; Server and Client Logging 4.4x
- Cello: Peer Logging 8.5x; Server and Client Logging 7.3x

DEFER Cache Implementation

DEFER Cache19 DEFER Cache Architecture
[Figure: Defer_client and Defer Server components, each with a local cache, remote cache, segment buffer (Segments 1 and 2) and server queue; arrows show writes to the remote cache, sends/receives to the peer or server, and logging into the segment buffer.]

DEFER Cache20 DEFER Cache design – design principles
- Use only commodity hardware that is available in typical systems.
- Avoid storage-media dependencies, such as requiring only SCSI or only IDE disks.
- Keep the data structures and mechanisms simple.
- Support reliable persistence semantics.
- Separate mechanisms and policies.

DEFER Cache21 Implementation
- Implemented on Linux (open source)
- Client logging implemented as a device driver or a library – no change to the system kernel or application code
- Linux device drivers create a custom block device attached to the network device; system calls are overloaded via a loadable kernel module
- The network device uses Reliable UDP to ensure fast and reliable data transfer
- A library is also provided for testing and for non-Linux systems, again via system-call overloading
- An alternative approach using NVRAM is under test

DEFER Cache22 DEFER as a module
[Figure: the Defer_module registers/unregisters its capabilities with the kernel proper via init_module()/cleanup_module(); it uses kernel functions such as printk, add_to_request_queue, ioctl, generic_make_request, send/recv and ll_rw_blk, and implements read, write, open, close, the M-second algorithm and garbage collection, calling into nbd for the network device. A skeleton of such a module is sketched below.]
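For orientation, here is a minimal loadable block-driver skeleton in C in the spirit of this figure. Header locations and the register_blkdev() signature differ between kernel versions, so treat it as an outline under those assumptions rather than the thesis driver.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/blkdev.h>

static int defer_major;

static int __init defer_init(void)
{
    /* Ask for a dynamic major number for the "defer" block device. */
    defer_major = register_blkdev(0, "defer");
    if (defer_major < 0) {
        printk(KERN_ERR "defer: register_blkdev failed\n");
        return defer_major;
    }
    /* A real driver would set up the request queue, the segment
     * buffer, the M-second logging thread and the network (nbd)
     * hook here.                                                  */
    printk(KERN_INFO "defer: loaded, major %d\n", defer_major);
    return 0;
}

static void __exit defer_exit(void)
{
    unregister_blkdev(defer_major, "defer");
    printk(KERN_INFO "defer: unloaded\n");
}

module_init(defer_init);
module_exit(defer_exit);
MODULE_LICENSE("GPL");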

DEFER Cache23 Data Management
- Plugs into the OS as a custom block device backed by memory and a disk
- The disk is managed independently of the OS
- The request_queue is intercepted and redirected to the Defer module
- Read/write are overridden with Defer read/write
- Interfaces with the network device to transfer data
- Interfaces with the kernel by registering special capabilities: logging, de-stage, garbage collection, and data recovery on crash

DEFER Cache24 DEFER Cache – Implementation
- Simulation results show an 11.5x speedup
- DEFER Cache was implemented on a real system to support the simulation results
- A multi-hierarchy cache structure can be implemented at:
  - Application level
  - File-system level
  - Layered device driver
  - Controller level
- A kernel device driver was selected as it achieves both efficiency and flexibility

DEFER Cache25 Implementation Design
- The implementation is derived from the DCD implementation; DEFER Cache can be considered a DCD over a distributed system.
- The design consists of three modules:
  - Data management: implements the caching activities on the local machine.
  - Network interface: implements the network transfer of blocks to/from the server and clients.
  - Coordinating daemons: coordinate the activities of the two modules above.

DEFER Cache26 Data Management
- A custom block device driver is developed and plugged into the kernel at run time.
- The driver is modified according to the DEFER Cache design.
- The request function of the device driver is modified.
- RAM read/write is replaced by the DEFER Cache read/write call.

DEFER Cache27 Network Interface
- Implemented as a network block device (NBD) driver.
- NBD presents a block device on the local client but connects to a remote machine that actually hosts the data.
- It is a local-disk representation of a remote machine and can be mounted and accessed as a normal block device.
- All read/write requests are transferred over the network to the remote machine.
- Consists of three parts: the NBD client, the NBD driver and the NBD server (a toy server loop is sketched below).
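A toy user-space server loop in C illustrating the idea: block requests arrive over a socket and are served from a backing file. The request format here is invented for illustration; it is not the real NBD wire protocol or the thesis code.

#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

struct req { uint32_t write; uint64_t offset; uint32_t len; };

/* 'conn' is a connected TCP socket; 'backing_fd' is the file or raw
 * partition that actually holds the exported block device's data.   */
void serve(int conn, int backing_fd)
{
    struct req r;
    char buf[64 * 1024];

    while (recv(conn, &r, sizeof r, MSG_WAITALL) == sizeof r) {
        if (r.len > sizeof buf) break;            /* keep the toy simple    */
        if (r.write) {                            /* client -> backing file */
            if (recv(conn, buf, r.len, MSG_WAITALL) != (ssize_t)r.len) break;
            pwrite(backing_fd, buf, r.len, (off_t)r.offset);
        } else {                                  /* backing file -> client */
            ssize_t n = pread(backing_fd, buf, r.len, (off_t)r.offset);
            if (n > 0) send(conn, buf, (size_t)n, 0);
        }
    }
}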

DEFER Cache28 NBD – Design
[Figure: NBD architecture – the user-space NBD client and NBD server, and the kernel-space NBD driver with init_module(), ioctl(), transmit(), request(), register_blkdev(), blk_init_queue() and the default request queue.]

DEFER Cache29 NBD – Client
[Same NBD architecture figure as slide 28.]

DEFER Cache30 NBD – Driver
[Same NBD architecture figure as slide 28.]

DEFER Cache31 NBD – Driver
[Same NBD architecture figure as slide 28.]

DEFER Cache32 Linux Device Driver Issues
- Linux device drivers for the Data Management and Network Interface modules were implemented successfully.
- They could not be thoroughly tested and validated; kernel-level work posed the following problems:
  - Clustering of I/O requests by the kernel
  - Kernel memory corruption
  - Synchronization problems
  - No specific debugging tool

DEFER Cache33 User-mode Implementation
- The DEFER Cache implementation was switched to user mode.
- Advantages:
  - High flexibility: all data can be manipulated by the user as required.
  - Easier to design and debug.
  - A good design can improve performance.
- Disadvantage: response time is slower, and worse if data is swapped.

DEFER Cache34 User-Mode Design
- Simulates the drivers in user mode (a setup sketch follows).
- All data structures used by the device drivers are duplicated in user space.
- Uses a raw disk.
- 32MB of buffer space is allocated for DEFER Cache in RAM, emulating the I/O buffer cache.
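A minimal C sketch of this setup: a 32MB RAM buffer emulating the I/O buffer cache in front of a raw disk partition. The device path is only a placeholder; real code would need the right permissions (and buffer alignment if O_DIRECT were used).

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CACHE_BYTES (32u * 1024 * 1024)   /* 32MB emulated I/O cache */

int main(void)
{
    char *cache = malloc(CACHE_BYTES);
    if (!cache) return 1;

    /* Raw partition used as the log/cache disk, bypassing the file
     * system; "/dev/sdb1" is only an example path.                  */
    int disk = open("/dev/sdb1", O_RDWR | O_SYNC);
    if (disk < 0) { perror("open raw disk"); free(cache); return 1; }

    /* ... user-level DEFER read/write, logging and garbage collection
     * would operate on 'cache' and de-stage to 'disk' from here ...  */

    close(disk);
    free(cache);
    return 0;
}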

DEFER Cache35 DEFER Server – Implementation
- Governs the entire cluster of workstations.
- Maintains its own I/O cache and a directory table.
- The server directory table maintains consistency in the system; a server-client handshake is performed on every write update.
- Each directory-table entry records the last writer, which is used for garbage collection and data recovery.

DEFER Cache36 Initial Testing
- Basic idea: accessing remote data is faster than accessing data on the local disk. Is the LAN faster than disk access?
- Since UDP is used as the network protocol, the UDP transfer delay was measured on the underlying 100Mbps Ethernet network using a UDP monitor program (a minimal probe is sketched below).
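A minimal UDP round-trip probe in C in the spirit of the monitor program: send a request of a given size to an echo peer and time the reply. The host, port and message size are illustrative assumptions.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char msg[8192] = {0};                         /* 8KB request        */
    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(9000);                /* example echo port  */
    inet_pton(AF_INET, "192.168.0.2", &peer.sin_addr);

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    sendto(s, msg, sizeof msg, 0, (struct sockaddr *)&peer, sizeof peer);
    recvfrom(s, msg, sizeof msg, 0, NULL, NULL);  /* echoed response    */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("UDP round trip: %.3f ms\n", ms);
    close(s);
    return 0;
}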

DEFER Cache37 UDP monitor – Results
[Graph: effect of varying the response message size on response time.]

DEFER Cache38 Benchmark Program
- An in-house benchmark program was developed; it runs on each workstation.
- Requests are generated using a history table, producing temporal and spatial locality (a generator sketch follows this list).
- The following parameters can be modified at runtime:
  - Working set size
  - Client cache size
  - Server cache size
  - Block size
  - Correlation factor (c)
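A C sketch of a locality-aware request generator of this kind: with probability c (the correlation factor) the next block is drawn near a recent entry in a small history table, otherwise it is chosen uniformly from the working set. The table size, working-set size and neighbourhood width are illustrative assumptions.

#include <stdlib.h>

#define HISTORY 32
#define WORKING_SET_BLOCKS (256 * 1024)   /* e.g. 2GB of 8KB blocks */

static unsigned long history[HISTORY];
static int hist_len, hist_pos;

unsigned long next_block(double c)
{
    unsigned long blk;

    if (hist_len && (double)rand() / RAND_MAX < c) {
        /* temporal + spatial locality: revisit a recent block,
         * possibly shifted by a few neighbouring blocks          */
        unsigned long base = history[rand() % hist_len];
        blk = (base + rand() % 8) % WORKING_SET_BLOCKS;
    } else {
        blk = (unsigned long)rand() % WORKING_SET_BLOCKS;  /* uniform */
    }

    history[hist_pos] = blk;              /* remember for reuse      */
    hist_pos = (hist_pos + 1) % HISTORY;
    if (hist_len < HISTORY) hist_len++;
    return blk;
}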

DEFER Cache39 Results (working set size) Result of varying file size on Bandwidth (c=1)

DEFER Cache40 Results (small writes) Effect of small writes

DEFER Cache41 Results (Response time for small writes) Result of varying file size on Response time (c=1)

DEFER Cache42 Results (Response time for sharing data) Result of varying file size on Response time (c=0.75)

DEFER Cache43 Results (Varying client cache size) Result of varying Client Cache Size on Bandwidth

DEFER Cache44 Results (varying server cache size) Result of varying Server Cache Size on Bandwidth

DEFER Cache45 Results (latency) Latency comparison of DEFER Cache and Baseline System

DEFER Cache46 Results (Delay measurements) Delay comparison of DEFER Cache and Baseline System

DEFER Cache47 Results (Execution time) Execution time for DEFER Cache and Baseline system

DEFER Cache48 Conclusions
- Improves write performance for co-operative caching.
- Reduces the small-write penalty.
- Ensures reliability and data availability.
- Improves overall file-system performance.

DEFER Cache49 Future Work
- Improve the user-level implementation: extend kernel-level functionality to user level; intercept system calls and modify them to implement the DEFER read/write calls.
- Kernel-level implementation: implement DEFER Cache fully at kernel level and plug it into the kernel.

DEFER Cache50 Thank you