Single Writer/Multiple Reader (SWMR)

Single Writer/Multiple Reader (SWMR) Copyright 2017, The HDF Group.

SWMR Outline: Introduction; SWMR programming model; File locking under SWMR. Copyright © 2015 The HDF Group. All rights reserved.

Introduction: The SWMR Concept

Data access to a file being written: new data elements are added by the writer to a dataset in the HDF5 file and can be read by a reader with no IPC necessary. No communication between the processes and no file locking are required. The processes can run on the same or on different platforms, as long as they share a common file system that is POSIX compliant. The orderly operation of the metadata cache is crucial to SWMR functioning. A number of APIs have been developed to handle the requests from writer and reader processes and to give applications the control over the metadata cache they might need. Copyright © 2015 The HDF Group. All rights reserved.

The Challenge: with one writer and multiple readers accessing the same HDF5 file, the basic engineering challenge is to ensure that the readers always see a coherent (though possibly not up-to-date) HDF5 file.

HDF5 Metadata Cache: Whenever an object is read or written, the metadata items (object headers, B-tree nodes, heaps, etc.) associated with the object are placed in the metadata cache. Metadata items stay in the cache until evicted using a Least Recently Used (LRU) policy: a dirty entry that reaches the bottom of the LRU list is flushed and returned to the head of the list, while clean entries that reach the bottom are evicted. The file may not be in a consistent state unless all metadata items are flushed to the file. An HDF5 application always sees a consistent file because current metadata items are either in the cache or flushed. How can one make the HDF5 file always consistent?

Metadata Flush Dependencies: Suppose we have a metadata item (item 1) which contains a reference to the address of another metadata item (item 2) in the file. Copyright © 2015 The HDF Group. All rights reserved.

Metadata Flush Dependencies: If we add a new metadata item (item 3) to the file and update the reference in item 1 to point to it, we have to be careful about the order in which the metadata is flushed out of the cache.

Metadata Flush Dependencies: If the reference-containing item is flushed before the new item, the reader may follow the new reference before the referenced item is on disk and read garbage, creating an invalid state (BAD).

Metadata Flush Dependencies: If the new metadata item is flushed before the reference-containing item, the reader will not be fully up to date, but the file it sees will still be consistent (OK).

Metadata Flush Dependencies: The solution: HDF5 implements flush dependencies in its internal data structures to ensure that metadata cache flush operations occur in the proper order. Copyright © 2015 The HDF Group. All rights reserved.

SWMR Approach: All communication between the processes is done through the HDF5 file. An HDF5 file under SWMR access has to reside on a file system that complies with POSIX write() semantics, i.e., write ordering is preserved (http://pubs.opengroup.org/onlinepubs/9699919799/):
"After a write() to a regular file has successfully returned: Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified. Any subsequent successful write() to the same byte position in the file shall overwrite that file data."
And: "Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics. Note that this is specified in terms of read() and write(). The XSI extensions readv() and writev() also obey these semantics. A new "high-performance" write analog that did not follow these serialization requirements would also be permitted by this wording. This volume of POSIX.1-2008 is also silent about any effects of application-level caching (such as that done by stdio)."
Also: "This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control."
Copyright © 2015 The HDF Group. All rights reserved.

SWMR Implementation: Implemented for the raw data "append only" scenario; no creation or deletion of datasets, groups, or attributes is allowed at this time. Works on GPFS, Lustre, Linux Ext3/Ext4, FreeBSD UFS2, and OS X HFS+. Does not work on NFS or Samba. Documentation: https://support.hdfgroup.org/HDF5/docNewFeatures/ Available in HDF5 1.10.* releases. Copyright © 2015 The HDF Group. All rights reserved.

Building and using the feature: Don't build and run the tests on NFS; use a local directory, GPFS, or Lustre. To build and install HDF5, run: configure <options>; make; make check; make install. Then follow the SWMR programming model. Copyright © 2015 The HDF Group. All rights reserved.

SWMR Programming Model. Copyright © 2015 The HDF Group. All rights reserved.

Setting up the SWMR Writer. Precondition: create a file with the latest file format; close the file. Writer path 1: call H5Fopen using the H5F_ACC_SWMR_WRITE flag and start writing datasets. Writer path 2: call H5Fcreate using the latest file format flag; create groups and datasets; add and close attributes; then call H5Fstart_swmr_write to start SWMR access to the file. In both cases, flush data periodically.
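
A minimal sketch of the first path, assuming the file data.h5 and a dataset /dset already exist and were created with the latest file format (both names are placeholders):
// Open an existing file for SWMR writing; H5F_ACC_SWMR_WRITE must be
// combined with H5F_ACC_RDWR.
fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
fid = H5Fopen("data.h5", H5F_ACC_RDWR | H5F_ACC_SWMR_WRITE, fapl);
dset_id = H5Dopen2(fid, "/dset", H5P_DEFAULT);
// ... extend the dataset, write new elements, and call H5Dflush() periodically ...
H5Dclose(dset_id);
H5Fclose(fid);
H5Pclose(fapl);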

Caution: Do not add new groups, datasets, or attributes while writing under SWMR access! The HDF5 library will not fail, but the data may be corrupted. We will try to address this in future releases. Copyright © 2015 The HDF Group. All rights reserved.

Setting up the SWMR Reader. Reader: call H5Fopen using the H5F_ACC_SWMR_READ flag; poll, checking the size of the dataset to see if there is new data available for reading; read the new data, if any. A side effect of SWMR access: a lower chance of getting a corrupted file when the writer process is killed.

Example of a SWMR Writer
// Create the file using the latest file format property as shown
fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
fid = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
// Create file objects such as datasets and groups.
// Close attributes and named datatype objects. Groups and
// datasets may remain open before starting SWMR access to them.
// Start SWMR access to the file
status = H5Fstart_swmr_write(fid);
// Reopen datasets and start writing
H5Dwrite(dset_id, ...);
H5Dflush(dset_id);   // call periodically to flush the data of a particular dataset
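
The write itself is elided above. Below is a hedged sketch of one "append only" step for a 1-D, chunked, unlimited dataset of native ints; dset_id, count, and buf are assumed to come from the surrounding application:
// Append `count` new elements to the end of the dataset.
hsize_t old_dims[1], new_dims[1], start[1], nelem[1];
fspace = H5Dget_space(dset_id);
H5Sget_simple_extent_dims(fspace, old_dims, NULL);
H5Sclose(fspace);
nelem[0]    = count;
new_dims[0] = old_dims[0] + count;
H5Dset_extent(dset_id, new_dims);                   // grow the dataset
fspace = H5Dget_space(dset_id);                     // dataspace with the new extent
start[0] = old_dims[0];
H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, nelem, NULL);
mspace = H5Screate_simple(1, nelem, NULL);
H5Dwrite(dset_id, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
H5Dflush(dset_id);                                  // make the new data visible to readers
H5Sclose(mspace);
H5Sclose(fspace);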

Example of a SWMR Reader
// Open the file using the SWMR read flag
fid = H5Fopen(filename, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);
// Open the dataset, then poll its dimensions, read any new data, and refresh; repeat.
dset_id = H5Dopen(…);
space_id = H5Dget_space(dset_id);
while (…) {
    H5Dread(…);                       // read any new data that has arrived
    H5Drefresh(dset_id);              // refresh the dataset's cached metadata from the file
    space_id = H5Dget_space(dset_id); // re-check the dataset's dimensions
}
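
A slightly fuller polling loop, as a hedged sketch: it assumes a 1-D dataset of native ints, a buffer buf large enough for the new elements, and application-defined keep_polling and poll_seconds. H5Drefresh() is called first so that the dataspace reflects any growth made by the writer, and only the newly appended tail is read:
hsize_t last_size = 0, dims[1];
while (keep_polling) {
    H5Drefresh(dset_id);                            // re-read the dataset's metadata from the file
    fspace = H5Dget_space(dset_id);
    H5Sget_simple_extent_dims(fspace, dims, NULL);
    if (dims[0] > last_size) {
        hsize_t start[1] = {last_size};
        hsize_t nelem[1] = {dims[0] - last_size};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, nelem, NULL);
        mspace = H5Screate_simple(1, nelem, NULL);
        H5Dread(dset_id, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
        H5Sclose(mspace);
        last_size = dims[0];
    }
    H5Sclose(fspace);
    sleep(poll_seconds);                            // wait a few seconds between polls
}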

Controlling SWMR Access. Copyright © 2015 The HDF Group. All rights reserved.

APIs for controlling SWMR writing and reading: The application can control when data is visible to readers using data flushing and refreshing: H5Dflush flushes all buffers associated with a dataset; H5Drefresh clears the buffers and reloads from the disk. The application can also control metadata cache (MDC) flushing of an object with H5Odisable_mdc_flushes and H5Oenable_mdc_flushes. Copyright © 2015 The HDF Group. All rights reserved.
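
A minimal sketch of how a writer might use the MDC-control calls to keep one object's metadata from being flushed while a group of related updates is in progress (dset_id is assumed to be an open dataset):
H5Odisable_mdc_flushes(dset_id);   // pin this object's metadata in the cache
// ... extend the dataset and write several related pieces of data ...
H5Oenable_mdc_flushes(dset_id);    // allow the cache to flush this object again
H5Dflush(dset_id);                 // push the data and metadata to the file for readers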

APIs for controlling SWMR writing: H5DOappend appends data to a dataset; it extends the dataspace and writes the new elements. APIs to control flush behavior when an append reaches a specified boundary: H5Pget(set)_append_flush() for a dataset access property list calls the specified callback function and flushes the dataset; H5Pget(set)_object_flush_cb() for a file access property list sets a callback function to invoke when an object flush occurs in the file. Copyright © 2015 The HDF Group. All rights reserved.
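
A hedged sketch combining the two writer-side calls: H5DOappend() (declared in hdf5_hl.h, the high-level library) appends one element along the first dimension, and H5Pset_append_flush() installs a 100-element boundary with a hypothetical callback named boundary_cb; fid is an already-open file and /dset a placeholder dataset path:
#include <stdio.h>
#include "hdf5.h"
#include "hdf5_hl.h"
static herr_t boundary_cb(hid_t dset_id, hsize_t *cur_dims, void *udata)
{
    // Invoked whenever an append crosses the boundary set below.
    (void)dset_id; (void)udata;
    printf("dataset has grown to %llu elements\n", (unsigned long long)cur_dims[0]);
    return 0;
}
…
hsize_t boundary[1] = {100};        // flush after every 100 appended elements
dapl = H5Pcreate(H5P_DATASET_ACCESS);
H5Pset_append_flush(dapl, 1, boundary, boundary_cb, NULL);
dset_id = H5Dopen2(fid, "/dset", dapl);
int value = 42;
H5DOappend(dset_id, H5P_DEFAULT, 0, 1, H5T_NATIVE_INT, &value);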

h5watch and other tools. Copyright © 2015 The HDF Group. All rights reserved.

h5watch allows monitoring the growth of a dataset: it prints new elements whenever the application extends the dataset's size and adds data; for compound datasets it prints data for the specified fields. Examples: h5watch --help and h5watch --polling=5 ./f.h5/g/ds. Copyright © 2015 The HDF Group. All rights reserved.

File locking in HDF5 1.10.0. Copyright © 2015 The HDF Group. All rights reserved.

Concurrent Access to an HDF5 File: The HDF5 library employs two means to regulate access to HDF5 files: file locking API calls to apply or remove an advisory lock on an open file, and setting a flag in the file's superblock to mark the file as open for writing.

Concurrent Access to HDF5 file File locking API calls to apply or remove an advisory lock on an open file. Files will be locked during the H5Fopen() or H5Fcreate() call. Locks can be shared (read) or exclusive (write). Locks will lock the entire file, not regions in the file. Locks will be released automatically when the file closes. Note that these will also be used for non-SWMR access as a way to prevent inappropriate file access (e.g., two writers). Copyright © 2015 The HDF Group. All rights reserved.

Concurrent Access to an HDF5 File: Setting a flag in the file's superblock to mark the file as open for writing: the library marks the file when it is opened for writing, based on the file open access flags; this happens for both SWMR and non-SWMR writing. This marking ensures file consistency for concurrent accesses. The library clears the flag when the file closes.

Writer Actions When a writer process creates/opens a file without SWMR: Place an exclusive lock on the file—the file will remain locked until it closes. Ensure the file's superblock is not already marked for writing or SWMR writing mode. Mark the file's superblock for writing mode. When a writer process creates/opens a file with SWMR write access: Place an exclusive lock on the file. Mark the file for writing and SWMR writing mode. Release the lock before returning from H5Fopen/H5Fcreate.

Reader Actions: When a reader process opens a file without SWMR: place a shared lock on the file; ensure the file is not already marked for writing or SWMR writing mode. When a reader process opens a file with SWMR read access: ensure the file is marked for writing and SWMR writing mode. Copyright © 2015 The HDF Group. All rights reserved.

File locking in HDF5 1.10.1: The feature was introduced to guard against "unauthorized access" to a file under construction: it prevents multiple writers from modifying a file and prevents readers from accessing a file under construction in non-SWMR mode. The file locking calls used in HDF5 1.10.0 (including patch1) will fail when the underlying file system does not support file locking or when locks have been disabled. An environment variable named HDF5_USE_FILE_LOCKING can be set to 'FALSE' to disable locking; it then becomes the user's responsibility to avoid problematic access patterns (e.g., multiple writers accessing the same file). The error message was improved to identify file locking problems.
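
A small sketch of disabling locking from C before the first file open (the shell equivalent is export HDF5_USE_FILE_LOCKING=FALSE); the file name is a placeholder, and the application then has to guarantee safe access patterns on its own:
#include <stdlib.h>
#include "hdf5.h"
// Disable HDF5 file locking, e.g. on a file system where locks are unsupported.
setenv("HDF5_USE_FILE_LOCKING", "FALSE", 1);
fid = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);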

HDF5 1.10.0 Backward/Forward Compatibility Issues

Backward/Forward compatibility issues: HDF5 1.10.0 will always read files created by earlier versions. By default, HDF5 1.10.0 will create files that can be read by HDF5 1.8.*. HDF5 1.10.0 will create files incompatible with version 1.8 if new features are used. Tools to "downgrade" a file created by HDF5 1.10.0: h5format_convert (SWMR files; doesn't rewrite raw data) and h5repack (VDS, SWMR, and other features; does rewrite data).

Known issues: The HDF5 command-line tools h5dump and h5ls are not "SWMR-ized". H5DOappend is not atomic.

Known limitations: SWMR only allows adding new raw data, not new datasets, attributes, or groups; extending the current design to full SWMR is possible (modulo the great complexity of implementation), while extending it to MWMR (multiple writer/multiple reader) is questionable. The SWMR design cannot be extended to work on NFS or an object store. SWMR is slow and is not a real-time feature (it doesn't guarantee a response within specified time constraints). We are looking into new designs based on the page buffering feature.

Thank You! Questions?