DEFER Cache – an Implementation
  Sudhindra Rao and Shalaka Prabhu
  Thesis Defense, Master of Science
  Department of ECECS, OSCAR Lab
  - Overview
  - Related Work
  - DEFER Cache Architecture
  - Implementation: Motivation and Challenges
  - Results and Analysis
  - Conclusion
  - Future Work
Overview
  - Accessing remote memory is faster than accessing local disks; hence co-operative caching
  - Current schemes: fast reads, slow writes
  - Goal: combine replication with logging for fast, reliable write-back to improve write performance
  - DCD and RAPID already do something similar
  - New architecture: Distributed, Efficient and Reliable (DEFER) Co-operative Cache
    - Duplication and logging
    - Co-operative caching for both read and write requests
    - Vast performance gains (up to 11.5x speedup) due to write-back [9]
Co-operative Caching
  - High-performance, scalable LAN
  - Slow file server disks
  - Increasing RAM in the server
    - File server with 1GB RAM
    - 64 clients with 64MB RAM each = 4GB
  - Cost-effective solution: using remote memory for caching
    - 6-12 ms to access 8KB of data from disk vs. 1.05 ms from a remote client
  - Highly scalable
  - But all related work focuses on read performance
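The figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the constants come from the slide, while the function names are made up for illustration.

```python
# Figures quoted on the slide; the helper names are hypothetical.
CLIENTS = 64
CLIENT_RAM_MB = 64
DISK_MS_RANGE = (6.0, 12.0)   # time to read 8KB from the server disk
REMOTE_MS = 1.05              # time to read 8KB from a remote client's memory


def aggregate_client_ram_gb(clients=CLIENTS, ram_mb=CLIENT_RAM_MB):
    """Total client memory available for co-operative caching, in GB."""
    return clients * ram_mb / 1024


def remote_speedup_range(disk_ms=DISK_MS_RANGE, remote_ms=REMOTE_MS):
    """Ratio of disk access time to remote-memory access time (low, high)."""
    return tuple(d / remote_ms for d in disk_ms)


if __name__ == "__main__":
    print(aggregate_client_ram_gb())   # 4.0
    low, high = remote_speedup_range()
    print(round(low, 1), round(high, 1))
```

With these numbers, remote memory is roughly 5.7 to 11.4 times faster than the disk for an 8KB access, and the clients together hold four times the server's RAM.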
Other Related Work
  - N-Chance Forwarding
    - Forwards singlets to a remote host on a capacity miss
    - A singlet is re-circulated N times and then written to the server disk
    - Uses a write-through cache
  - Co-operative caching using hints
  - Global Memory System
  - Remote Memory Servers
  - Log-structured storage systems: LFS, Disk Caching Disk, RAPID
  - NVRAM: not cost-effective with current technology
  - What’s DEFER? Improve write performance; DCD using distributed systems
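The N-Chance eviction rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the value of N, the random choice of target host, and all names (`Block`, `evict`, `copies`) are assumptions.

```python
import random

N = 2  # recirculation limit; the actual value is not given on the slide


class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.recirculation = 0  # how many times this singlet was forwarded


def evict(block, holder, clients, copies, rng=random):
    """Decide what happens to a block evicted from `holder`.

    copies maps block_id -> set of clients currently caching the block.
    Returns ("discard", None), ("forward", peer) or ("server", None).
    """
    copies[block.block_id].discard(holder)
    if copies[block.block_id]:
        return ("discard", None)       # another cached copy exists: not a singlet
    peers = [c for c in clients if c != holder]
    if block.recirculation >= N or not peers:
        return ("server", None)        # out of chances: goes to the server disk
    peer = rng.choice(peers)           # assumed policy: pick a random remote host
    block.recirculation += 1
    copies[block.block_id].add(peer)
    return ("forward", peer)
```

A singlet is forwarded at most N times; a block with another cached copy is simply dropped from the evicting client.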
Log-based write mechanism
  - DCD [7] and RAPID [8] implement log-based writes
  - The log improves small-write performance
  - Reliability and data availability from the log partition
  [Figure: DCD-like structure of DEFER. Labels: Segment Buffer, Data Partition, Local disk, Remote Cache, Memory, Log Partition, Local Cache]
Logging algorithm: writing a segment
  - Write 128KB of LRU data to a cache-disk segment in one large write
  - Pick LRU data to capture temporal locality, improve reliability and reduce disk traffic
  - Most data will be overwritten repeatedly
  [Figure: RAM cache, mapping table, segment buffer and cache disk with free and log segments, shown before and after the segment write]
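The segment write can be illustrated with a small in-memory model. It is a sketch under assumptions: the 8KB block size is borrowed from the earlier timing figures, and the class and method names are hypothetical.

```python
from collections import OrderedDict

BLOCK_SIZE = 8 * 1024                             # assumed block size
SEGMENT_SIZE = 128 * 1024                         # segment size from the slide
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE   # 16 blocks per segment


class LogCache:
    def __init__(self):
        self.ram = OrderedDict()  # block_id -> data, least recently used first
        self.log = []             # cache disk: list of segments of (block_id, data)
        self.mapping = {}         # block_id -> (segment number, slot) of the live copy

    def write(self, block_id, data):
        """Write into the RAM cache and mark the block most recently used."""
        self.ram[block_id] = data
        self.ram.move_to_end(block_id)
        self.mapping.pop(block_id, None)  # any logged copy is now stale

    def destage_segment(self):
        """Move up to one segment of LRU blocks to the log in one large write."""
        count = min(BLOCKS_PER_SEGMENT, len(self.ram))
        if count == 0:
            return None
        segment = [self.ram.popitem(last=False) for _ in range(count)]
        segment_no = len(self.log)
        self.log.append(segment)
        for slot, (block_id, _) in enumerate(segment):
            self.mapping[block_id] = (segment_no, slot)
        return segment_no
```

Because recently written blocks stay in RAM, a block that is overwritten repeatedly never reaches the log until it cools down, which is the disk-traffic saving the slide refers to.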
Garbage Collection
  - Data is written to the cache-disk continuously as log writes
  - The cache-disk will eventually fill
  - Most of the data in the cache-disk is “garbage” caused by overwriting
  - Garbage must be cleaned to free log-disk space
  [Figure: log disk on a client before and after garbage collection]
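One way to picture the cleaning step is to copy only the live records into fresh segments. The sketch below assumes a mapping table that points at the live copy of each block; the function name and data layout are illustrative, not taken from the thesis.

```python
def collect_garbage(log, mapping, blocks_per_segment=16):
    """Compact a log by keeping only live block copies.

    log:     list of segments, each a list of (block_id, data) records
    mapping: block_id -> (segment number, slot) of the live copy
    Returns (new_log, new_mapping).
    """
    live = []
    for segment_no, segment in enumerate(log):
        for slot, (block_id, data) in enumerate(segment):
            # A record is live only if the mapping table still points at it.
            if mapping.get(block_id) == (segment_no, slot):
                live.append((block_id, data))

    # Repack the live records densely into new segments.
    new_log = [live[i:i + blocks_per_segment]
               for i in range(0, len(live), blocks_per_segment)]
    new_mapping = {}
    for segment_no, segment in enumerate(new_log):
        for slot, (block_id, _) in enumerate(segment):
            new_mapping[block_id] = (segment_no, slot)
    return new_log, new_mapping
```

Records whose block was overwritten later are skipped, so the space they occupied becomes free log disk.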
DEFER Cache Architecture
  - Typical distributed system (client-server)
  - Applications run on workstations (clients) and access files from the server
  - Local disks on clients are used only for booting, swapping and logging
  - Local RAM is divided into an I/O cache and a segment buffer
  - Local disk has a corresponding log partition
DEFER Cache Algorithms
  - DEFER is DCD distributed over the network
    - Best of co-operative caching and logging
  - Reads are handled exactly as in N-Chance Forwarding
  - Writes are immediately duplicated and logged after a pre-determined time interval M
  - Dirty singlets are forwarded as in N-Chance
  - Three logging strategies: Server Logging, Client Logging, Peer Logging
Server Logging
  - Client copies the block to the server cache on a write
  - Server table maintains consistency
  - Invalidates clients and logs the contents of the segment buffer
  - Increased load on the server due to logging
  [Figure: Client 1 ... Client n and the Server with segment buffer and log disk. Labels: W Request; 1. Lock Ownership Request; Invalidate; 3. Lock Ownership Granted; 4. Send a Copy to Server Cache; Update Server Table; Free the Lock]
Client Logging
  - Advantage: server load is decreased
  - Disadvantage: availability of the block is affected
  [Figure: Client 1 (with segment buffer and log disk), Client 2 ... Client n and the Server. Labels: W; 1. Lock Ownership Request; Invalidate; 3. Lock Ownership Granted; 4. Copy Data to Server Cache; Update Server Table; Free the Lock; 5. After ‘M’ seconds; 6. Logging Complete. Remove the dirty blocks sent by Client 1 from the server cache]
Peer Logging
  - Each workstation is assigned a peer; the peer performs logging
  - Advantage: reduces server load without compromising availability
  [Figure: Client 1, Client 2 (with segment buffer and log disk) ... Client n and the Server. Labels: W Request; 1. Lock Ownership Request; Send Invalidate; Invalidate; 3. Lock Ownership Granted; 4. Send a Copy of the Block to the Peer; 4. Update Server Table; 5. After ‘M’ seconds; n-5n Peer mapping]
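The three strategies differ mainly in which host holds the duplicate of a written block and which host writes it to a log disk. The sketch below summarizes that reading of the three slides; the `Strategy` enum, the `placement` function and the `peer_of` table are hypothetical names, and the actual peer assignment is not specified here.

```python
from enum import Enum


class Strategy(Enum):
    SERVER = "server"
    CLIENT = "client"
    PEER = "peer"


def placement(strategy, writer, peer_of):
    """Return (duplicate_holder, logger) for a block written by `writer`.

    peer_of maps each client to its assigned peer (illustrative table).
    """
    if strategy is Strategy.SERVER:
        return ("server", "server")   # copy goes to the server, server logs it
    if strategy is Strategy.CLIENT:
        return ("server", writer)     # copy goes to the server, writer logs locally
    if strategy is Strategy.PEER:
        peer = peer_of[writer]
        return (peer, peer)           # copy goes to the peer, peer logs it
    raise ValueError(f"unknown strategy: {strategy}")
```

Under this reading, only peer logging keeps both the duplicate and the log off the server while still placing them on a different host from the writer.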
Reliability
  - Every M seconds, blocks that were modified within the last M to 2M seconds are logged
  - Thus, for M = 15, every modification is written to the log disk within 30 seconds
  - Most UNIX systems use a delayed write-back of 30 seconds
  - M can be reduced to increase the frequency of logging without introducing high overhead
  - With DEFER, blocks are both logged and duplicated
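The timing rule can be sketched as a periodic pass that logs every dirty block whose last modification is at least M seconds old. This is one plausible reading of the slide; the function names and the dictionary of modification times are assumptions.

```python
M = 15  # logging interval in seconds, as in the slide's example


def blocks_to_log(dirty, now, m=M):
    """Select dirty blocks whose last modification is at least m seconds old.

    dirty maps block_id -> time of last modification, in seconds.
    """
    return sorted(b for b, t in dirty.items() if now - t >= m)


def run_pass(dirty, now, m=M):
    """One logging pass: log the selected blocks and drop them from `dirty`."""
    logged = blocks_to_log(dirty, now, m)
    for block_id in logged:
        del dirty[block_id]
    return logged


if __name__ == "__main__":
    dirty = {"a": 1, "b": 14}      # both modified before the pass at t = 15
    print(run_pass(dirty, 15))     # neither block is M seconds old yet
    print(run_pass(dirty, 30))     # both are logged, aged 29 and 16 seconds
```

With passes every M seconds, a block waits at most one extra interval, so no modification stays unlogged for 2M seconds or more.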
Crash Recovery (Peer Logging)
  - The recovery algorithm works on the on-log-disk version of the data
  - The in-memory and on-log-disk copies are on different hosts
  - Find the blocks that were updated by the crashed client, and the peer information
  - The server initiates recovery of the blocks from the peer
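The recovery steps can be sketched as a lookup in the server's directory followed by a fetch from the peer's log. All names and data structures below are hypothetical stand-ins for the real tables.

```python
def recover(crashed, directory, peer_of, peer_logs):
    """Recover the blocks last written by a crashed client.

    directory: block_id -> last writer (the server's directory table)
    peer_of:   client -> assigned peer
    peer_logs: peer -> {block_id: data} as found on that peer's log disk
    Returns (recovered, missing).
    """
    peer = peer_of[crashed]
    wanted = [b for b, writer in directory.items() if writer == crashed]
    log = peer_logs.get(peer, {})
    recovered = {b: log[b] for b in wanted if b in log}
    missing = [b for b in wanted if b not in log]
    return recovered, missing
```

Because the log lives on the peer rather than on the crashed client, the server can rebuild the blocks without waiting for the client to come back.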
Simulation Results
  - Simulation [9] using disksim with synthetic and real-world traces
Real Workloads
  - Snake: Peer Logging 4.9x; Server and Client Logging 4.4x
  - Cello: Peer Logging 8.5x; Server and Client Logging 7.3x
DEFER Cache Implementation
DEFER Cache Architecture
  [Figure: Defer Server and Defer_client, each with a local cache, a remote cache, a server queue and a segment buffer (Segment 1, Segment 2). Arrows: write to remote cache; send to peer/server; receive from peer/server; logging to segment buffer]
DEFER Cache design
  Design principles followed:
  - Use only commodity hardware that is available in typical systems
  - Avoid storage media dependencies, such as requiring only SCSI or only IDE disks
  - Keep the data structures and mechanisms simple
  - Support reliable persistence semantics
  - Separate mechanisms and policies
Implementation
  - Implemented on Linux (open source)
  - Client logging implemented as a device driver or a library; no change to the system kernel or application code
  - Uses Linux device drivers to create a custom block device attached to the network device; a loadable kernel module provides system call overloading
  - Network device uses Reliable UDP for fast and reliable data transfer
  - Also provides a library for testing and for implementation on non-Linux systems; the library provides system call overloading
  - Alternative approach using NVRAM is under test
DEFER as a module
  [Figure: Defer_module between the kernel proper and the network device. Labels: init_module(), cleanup_module(); register_capability(), unregister_capability(); printk, add_to_request_queue, ioctl, generic_make_request, send/recv, ll_rw_blk; read, write, open, close, M-sec algorithm, garbage collect, call nbd; Data; Kernel functions]
Data Management
  - Plugs into the OS as a custom block device that contains memory and a disk
  - Disk is managed independently of the OS
  - request_queue is intercepted to redirect requests to the Defer module
  - Read/write are overridden with Defer read/write
  - Interfaces with the network device to transfer data
  - Interfaces with the kernel by registering special capabilities: logging, de-stage, garbage collection, data recovery on crash
DEFER Cache: Implementation
  - Simulation results show an 11.5x speedup
  - DEFER Cache implemented in a real-time system to support the simulation results
  - A multi-hierarchy cache structure can be implemented at one of several levels:
    - Application level
    - File system level
    - Layered device driver
    - Controller level
  - Kernel device driver selected because it achieves efficiency and flexibility
Implementation Design
  - Implementation derived from the DCD implementation
  - DEFER Cache can be considered a DCD over a distributed system
  - The design consists of three modules:
    - Data management: implements the caching activities on the local machine
    - Network interface: implements the network transfer of blocks to/from server/client
    - Coordinating daemons: coordinate the activities of the two modules above
Data Management
  - Custom block device driver developed and plugged into the kernel at run time
  - Driver modified according to the DEFER Cache design
  - Request function of the device driver modified
  - Read/write for RAM replaced by DEFER Cache read/write calls
Network Interface
  - Implemented as a network block device (NBD) driver
  - NBD simulates a block device on the local client but connects to a remote machine that actually hosts the data
  - Acts as a local disk representation for a remote client
  - Can be mounted and accessed as a normal block device
  - All read/write requests are transferred over the network to the remote machine
  - Consists of three parts: NBD client, NBD driver, NBD server
NBD: Design, Client, Driver
  [Figure, repeated on four consecutive slides: NBD Client, NBD Driver, Kernel and NBD Server, split between user space and kernel space. Labels: init_module(), ioctl(), transmit(), request(), register_blkdev(), blk_init_queue(), Default Queue]
Linux Device Driver Issues
  - Linux device drivers for the Data Management and Network Interface modules were successfully implemented
  - They could not be thoroughly tested and validated
  - Problems encountered:
    - Clustering of I/O requests by the kernel
    - Kernel memory corruption
    - Synchronization problems
    - No specific debugging tool
User-mode Implementation
  - Implementation of DEFER Cache switched to user mode
  - Advantages
    - High flexibility: all data can be manipulated by the user according to requirements
    - Easier to design and debug
    - Good design can improve performance
  - Disadvantages
    - Response time is slower, and worse if data is swapped
User-Mode Design
  - Simulates the drivers in user mode
  - All data structures used by the device drivers are duplicated in user space
  - Uses a raw disk
  - 32MB of buffer space allocated for DEFER Cache in RAM
  - Emulates the I/O buffer cache
DEFER Server: Implementation
  - Governs the entire cluster of workstations
  - Maintains its own I/O cache and a directory table
  - Server directory table maintains consistency in the system
  - Server-client handshake performed on every write update
  - Server directory table entry reflects the last writer
  - Used for garbage collection and data recovery
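The handshake and the last-writer entry can be sketched as a small directory class. This is an illustration under assumptions: the class, its fields and its method names are invented, and the real server also handles caching and logging that are left out here.

```python
class DirectoryTable:
    """Toy model of the server directory table (names are hypothetical)."""

    def __init__(self):
        self.last_writer = {}  # block_id -> client that wrote the block last
        self.holders = {}      # block_id -> set of clients caching the block
        self.locked_by = {}    # block_id -> client holding lock ownership

    def request_lock(self, block_id, client):
        """Grant lock ownership unless another client already holds it."""
        if self.locked_by.get(block_id, client) != client:
            return False
        self.locked_by[block_id] = client
        return True

    def commit_write(self, block_id, client):
        """Record the write, free the lock, return the clients to invalidate."""
        if self.locked_by.get(block_id) != client:
            raise PermissionError("write without lock ownership")
        stale = self.holders.get(block_id, set()) - {client}
        self.holders[block_id] = {client}
        self.last_writer[block_id] = client
        del self.locked_by[block_id]
        return sorted(stale)
```

The `last_writer` entry is what a recovery or garbage-collection pass would consult to find which client's log holds the newest copy of a block.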
Initial Testing
  - Basic idea: accessing remote data is faster than accessing data on a local disk
  - Is LAN speed faster than disk access speed?
  - UDP is the network protocol, so the UDP transfer delay was measured
  - Underlying network: 100Mbps Ethernet
  - Measured with a UDP monitor program
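A rough lower bound on the transfer delay follows from the link rate alone. The sketch below assumes an 8KB block, as in the earlier disk comparison, and ignores protocol headers, propagation delay and software overhead; the function name is made up.

```python
def wire_time_ms(payload_bytes, link_mbps):
    """Time to put `payload_bytes` on a link of `link_mbps` megabits per second."""
    bits = payload_bytes * 8
    return bits / (link_mbps * 1_000_000) * 1000


if __name__ == "__main__":
    # 8KB block on 100Mbps Ethernet: about 0.66 ms of pure transmission time
    print(round(wire_time_ms(8 * 1024, 100), 3))
```

The result, about 0.66 ms, is below the 6-12 ms disk access time quoted earlier, which is why measuring the actual UDP delay is the natural first test.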
UDP monitor: Results
  [Figure: effect of varying response message size on response time]
Benchmark Program
  - Benchmark program developed in-house
  - Generates requests using a history table
  - Generates temporal locality and spatial locality
  - Runs on each workstation
  - The following parameters can be modified at runtime:
    - Working set size
    - Client cache size
    - Server cache size
    - Block size
    - Correlation factor (c)
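A history table can produce temporal locality by re-issuing recently requested blocks. The generator below is only a sketch of that idea, not the in-house benchmark: `reuse_probability` is a hypothetical knob and is not claimed to be the correlation factor c, and spatial locality is not modeled.

```python
import random
from collections import deque


def generate_requests(count, working_set_blocks, reuse_probability,
                      history_size=32, seed=0):
    """Generate block numbers, re-referencing recent ones for temporal locality."""
    rng = random.Random(seed)
    history = deque(maxlen=history_size)  # the history table of recent requests
    requests = []
    for _ in range(count):
        if history and rng.random() < reuse_probability:
            block = rng.choice(history)                # repeat a recent block
        else:
            block = rng.randrange(working_set_blocks)  # pick from the working set
        history.append(block)
        requests.append(block)
    return requests
```

Raising `reuse_probability` concentrates requests on a few blocks, while a value of 0 yields uniform accesses over the working set.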
Results (working set size)
  [Figure: result of varying file size on bandwidth (c=1)]
Results (small writes)
  [Figure: effect of small writes]
Results (response time for small writes)
  [Figure: result of varying file size on response time (c=1)]
Results (response time for sharing data)
  [Figure: result of varying file size on response time (c=0.75)]
Results (varying client cache size)
  [Figure: result of varying client cache size on bandwidth]
Results (varying server cache size)
  [Figure: result of varying server cache size on bandwidth]
Results (latency)
  [Figure: latency comparison of DEFER Cache and the baseline system]
Results (delay measurements)
  [Figure: delay comparison of DEFER Cache and the baseline system]
Results (execution time)
  [Figure: execution time for DEFER Cache and the baseline system]
Conclusions
  - Improves write performance for co-operative caching
  - Reduces the small-write penalty
  - Ensures reliability and data availability
  - Improves overall file system performance
Future Work
  - Improve the user-level implementation
    - Extend kernel-level functionality to user level
    - Intercept system-level calls and modify them to implement DEFER read/write calls
  - Kernel-level implementation
    - Successfully implement DEFER Cache at kernel level and plug it into the kernel
Thank you