The HDF Group (www.hdfgroup.org): Single Writer/Multiple Reader (SWMR), 10/17/15



SWMR Outline
- Introduction
- Current implementation
- SWMR programming model
- File locking under SWMR
- SWMR demo

INTRODUCTION (ICALEPCS 2015, 10/17/15)

Basic Idea

Concurrent read access to HDF5 files
[Diagram: Writer -> HDF5 File -> Reader]
New data elements are added to a dataset in the file, which can be read by a reader, with no IPC necessary.

SWMR Approach
- All communication between processes must be performed via the HDF5 file.
- An HDF5 file under SWMR access must reside on a file system that complies with POSIX write() semantics.

The Challenge
[Diagram: Writer -> Data -> HDF5 File -> Reader]
The basic engineering challenge is to ensure that the readers always see a coherent (though possibly not up-to-date) HDF5 file.

HDF5 Writer State
[Diagram labels: Writer Process, Writer State, Metadata Cache, Physical File]

HDF5 Reader State
[Diagram labels: Reader Process, Writer State, Cache, File, Reader State]
A reader process can only see the state contained in the physical file.

HDF5 State Badness
[Diagram label: Reader Process]
So how do we address* this? :'(
* Pun very much intended.

Preventing File Address Badness
[Diagram: Physical File holding metadata item 1 and metadata item 2; item 1 stores the address of metadata item 2, shown as (2)]
Most importantly, internal file pointers in the physical file must never point to invalid (unflushed, etc.) file addresses.

Preventing File Address Badness (continued)
[Diagram: Physical File; item 1 stores address (2); marked BAD]

Metadata Flush Dependencies
Suppose we have a metadata item (parent) which refers to another metadata item (child) in the file.
[Diagram: metadata item 1 (parent) stores the address of metadata item 2 (child), shown as (2)]

Metadata Flush Dependencies
If we add a new child item to the file and update the reference in the parent, we have to be careful about the order in which the metadata is flushed out of the cache.
[Diagram: the parent now stores the address of the new child, shown as (3); the old child is still present]

Metadata Flush Dependencies
[Diagram: Writer -> HDF5 File -> Reader. In the file, item 1 stores address (3), where the reader may find garbage. BAD]
If the parent is flushed before the new child, the reader may attempt to load the unflushed child from the disk, creating an invalid state.

Metadata Flush Dependencies
[Diagram: Writer -> HDF5 File -> Reader, with metadata items 1, 2 and 3. OK]
If the new child metadata item is flushed before the updated parent item, the reader will not be fully up to date, but will still be consistent.

Metadata Flush Dependencies
[Diagram: Writer -> HDF5 File -> Reader, with metadata items 1, 2 and 3. OK]
Solution: HDF5 implements flush dependencies in the internal data structures to ensure that metadata cache flush operations occur in the proper order.
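The ordering rule can be illustrated with a toy model. The sketch below is not HDF5's internal implementation; `item_t`, `flush_item` and `file_is_coherent` are hypothetical names introduced only for illustration. It assumes each cached item refers to at most one child and that "flushing" simply clears a dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy metadata cache entry; NOT an HDF5 internal structure. */
typedef struct item {
    bool dirty;           /* true while the item exists only in the cache */
    struct item *child;   /* item whose file address this one stores, or NULL */
} item_t;

/* Flush one item, flushing any dirty child first so that the file never
 * holds a pointer to an unflushed address. Returns the number of items
 * written to the file. */
int flush_item(item_t *it)
{
    int written = 0;

    if (it == NULL || !it->dirty)
        return 0;
    if (it->child != NULL)
        written += flush_item(it->child);          /* child before parent */
    assert(it->child == NULL || !it->child->dirty); /* the flush dependency */
    it->dirty = false;                              /* "write" this item */
    return written + 1;
}

/* A reader sees a coherent file if no clean (on-disk) item points to a
 * dirty (cache-only) child. */
bool file_is_coherent(const item_t *it)
{
    for (; it != NULL; it = it->child)
        if (!it->dirty && it->child != NULL && it->child->dirty)
            return false;
    return true;
}
```

In this model, marking a parent clean while its child is still dirty reproduces the BAD case, whereas calling `flush_item` on the parent always leaves the file in the OK state.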

Data access to file being written
- Implemented for the raw data "append only" scenario.
- No creation or deletion of datasets, groups, and attributes is allowed at this time.
- Works on GPFS, Lustre, Linux Ext3, Ext4, FreeBSD UFS2, OS X HFS+.
- Does not work on NFS or Samba.
- Documentation and source: ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/SWMR/
- Testers are needed!

SWMR PROGRAMMING MODEL

Setting SWMR writer
Precondition:
- Create a file with the latest file format; close the file.
Writer:
- Call H5Fopen using the H5F_ACC_SWMR_WRITE flag.
- Start writing datasets.
- Periodically flush data.
or
Writer:
- Call H5Fcreate using the latest file format flag.
- Create groups and datasets; add attributes and close attributes.
- Call H5Fstart_swmr_write to start SWMR access to the file.
- Periodically flush data.

Setting SWMR reader
Reader:
- Call H5Fopen using the H5F_ACC_SWMR_READ flag.
- Poll, checking the size of the dataset to see if there is new data available for reading.
- Read new data, if any.
Side effect of SWMR access:
- Fault tolerance

Example of SWMR writer

    // Create the file using the latest file format property as shown
    fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    fid = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // Create file objects such as datasets and groups.
    // Close attributes and named datatype objects. Groups and
    // datasets may remain open before starting SWMR access to
    // them.

    // Start SWMR access to the file
    status = H5Fstart_swmr_write(fid);

    // Reopen datasets and start writing
    H5Dwrite(dset_id, …);

    // Periodically flush the data for a particular dataset
    H5Dflush(dset_id);

Example of SWMR reader

    // Open the file using the SWMR read flag
    fid = H5Fopen(filename, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);

    // Open the dataset, poll dimensions, read new data and refresh; repeat.
    dset_id = H5Dopen(…);
    space_id = H5Dget_space(…);
    while (…) {
        H5Dread(…);                   // read if any new data has arrived
        H5Drefresh(dset_id);          // reload the dataset's metadata from the file
        space_id = H5Dget_space(…);   // re-query the (possibly larger) dataspace
    }
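The polling step leaves open how a reader decides which elements are new. One possible approach, sketched below, is to remember the dimensions seen at the previous poll and select only the rows appended since then. The helper `new_rows_selection` and the type `extent_t` are hypothetical (the latter is a stand-in for HDF5's `hsize_t`, so that the block compiles without the HDF5 headers), and the sketch assumes the dataset grows only along dimension 0.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long extent_t;   /* stand-in for HDF5's hsize_t */

/* Fill start[] and count[] with the block of elements appended along
 * dimension 0 since the last poll. Returns false when nothing is new. */
bool new_rows_selection(int rank,
                        const extent_t *seen_dims,  /* dims at last poll */
                        const extent_t *cur_dims,   /* dims after refresh */
                        extent_t *start, extent_t *count)
{
    assert(rank >= 1);
    if (cur_dims[0] <= seen_dims[0])
        return false;                        /* no growth since last poll */
    start[0] = seen_dims[0];                 /* first unread row */
    count[0] = cur_dims[0] - seen_dims[0];   /* number of new rows */
    for (int i = 1; i < rank; i++) {         /* other dimensions: take all */
        start[i] = 0;
        count[i] = cur_dims[i];
    }
    return true;
}
```

In a real reader, `cur_dims` would come from `H5Sget_simple_extent_dims` on the refreshed dataspace, and `start` and `count` could be passed to `H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, count, NULL)` before calling `H5Dread`.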

CONTROLLING SWMR ACCESS

APIs for controlling SWMR writing and reading
The application can control when data is visible using data flushing and refreshing:
- H5Dflush: flushes all buffers associated with a dataset
- H5Drefresh: clears the buffers and reloads from the disk
The application can control metadata cache (MDC) flushing of an object:
- H5Odisable_mdc_flushes
- H5Oenable_mdc_flushes

APIs for controlling SWMR writing
- H5DOappend appends data to a dataset: it extends the dataspace and writes the new elements.
- APIs to control flush behavior when an append reaches a specified boundary:
  - H5Pset_append_flush()/H5Pget_append_flush() for a dataset access property list: calls the specified callback function and flushes the dataset.
  - H5Pset_object_flush_cb()/H5Pget_object_flush_cb() for a file access property list: sets a callback function to invoke when an object flush occurs in the file.
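To make the idea of a flush boundary concrete, here is a toy model of how often a flush would fire for a given append pattern. It is not library code: `append_state_t` and `model_append` are hypothetical, and the rule used (a flush fires when the size along the append axis reaches or crosses a multiple of the boundary) is an assumption about the behavior, not something the slides state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of boundary-triggered flushing (an assumed rule, not library
 * code): a flush fires whenever an append makes the dataset size along
 * the append axis reach or cross a multiple of `boundary`. */
typedef struct {
    unsigned long long size;       /* current extent along the append axis */
    unsigned long long boundary;   /* flush boundary; 0 disables flushing */
    unsigned flushes;              /* number of flushes triggered so far */
} append_state_t;

/* Append `extension` elements; returns 1 if this append hit a boundary. */
int model_append(append_state_t *st, unsigned long long extension)
{
    assert(st != NULL);
    unsigned long long old_size = st->size;

    st->size += extension;
    if (st->boundary != 0 &&
        st->size / st->boundary != old_size / st->boundary) {
        st->flushes++;   /* here the callback would run, then the flush */
        return 1;
    }
    return 0;
}
```

With a boundary of 4 and ten single-element appends, this model flushes twice (at sizes 4 and 8), which shows the trade-off: a larger boundary means fewer flushes but longer delays before readers can see new data.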

H5WATCH AND OTHER TOOLS

h5watch
- Monitors the growth of a dataset.
- Prints new elements whenever the application extends the size and adds data.
- For compound datasets, prints data for specified fields.
Example:
    h5watch --help
    h5watch --polling=5 ./f.h5/g/ds

Other command-line tools
- We plan to make h5dump and h5ls SWMR-enabled.
- The rest of the tools (h5diff, h5repack, h5copy, h5jam, etc.) will exit gracefully, reporting that the file is under construction.

FILE LOCKING UNDER SWMR

Concurrent Access to HDF5 file
The HDF5 library will employ two means to regulate access to HDF5 files:
- File locking API calls to apply or remove an advisory lock on an open file.
- Setting a flag in the file's superblock to mark the file as open for writing.

Concurrent Access to HDF5 file
File locking API calls to apply or remove an advisory lock on an open file:
- Files will be locked during the H5Fopen() or H5Fcreate() call.
- Locks can be shared (read) or exclusive (write).
- Locks will lock the entire file, not regions in the file.
- When non-blocking lock calls are available, locks will not block.
- Locks will be released automatically when the file closes.
- Alternatively, the user can unlock the file using the system's unlock call; however, care will have to be taken to match the HDF5 library's file locking scheme.
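The locking behavior described above (whole-file, advisory, shared or exclusive, non-blocking) can be sketched with the POSIX-style `flock(2)` call. The slides do not name the system call the library uses, so `flock` here is an assumption, and `open_locked` is a hypothetical helper rather than an HDF5 function.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/* Open `path` and try to take a whole-file advisory lock without blocking.
 * exclusive != 0 requests a write lock, otherwise a shared read lock.
 * Returns the open descriptor, or -1 if the open or the lock failed
 * (errno is EWOULDBLOCK when another holder has a conflicting lock). */
int open_locked(const char *path, int exclusive)
{
    assert(path != NULL);
    int fd = open(path, exclusive ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return -1;

    int op = (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB;
    if (flock(fd, op) != 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;   /* closing fd releases the lock */
}
```

Because the lock is advisory, it only constrains processes that also request a lock; a process that opens the file without locking is not stopped, which is one reason the library pairs locking with the superblock flag.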

Concurrent Access to HDF5 file
Setting a flag in the file's superblock to mark the file as open for writing:
- The library will mark the file when opened for writing based on file open access flags.
- This will happen for both SWMR and non-SWMR writing.
- This marking ensures file consistency for concurrent accesses.
- The library will clear the flag when the file closes.
- Only understandable by HDF x (file format change)

Writer Actions
When a writer process creates/opens a file without SWMR:
- Place an exclusive lock on the file; the file will remain locked until it closes.
- Ensure the file's superblock is not already marked for writing or SWMR writing mode.
- Mark the file's superblock for writing mode.
When a writer process creates/opens a file with SWMR write access:
- Place an exclusive lock on the file.
- Ensure the file's superblock is not already marked for writing or SWMR writing mode.
- Mark the file for writing and SWMR writing mode.
- Release the lock before returning from H5Fopen/H5Fcreate.

Reader Actions
When a reader process opens a file without SWMR:
- Place a shared lock on the file.
- Ensure the file is not already marked for writing or SWMR writing mode.
When a reader process opens a file with SWMR read:
- Place a shared lock on the file.
- Ensure the file is marked for writing and SWMR writing mode.
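The superblock checks in the writer and reader actions can be summarized as a small decision function. This is a sketch of the rules as stated on the slides, not library code: `superblock_flags_t`, `open_mode_t` and `try_open` are hypothetical names, and the model deliberately ignores the file locks and the clearing of flags on close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OPEN_READ,         /* reader without SWMR */
    OPEN_WRITE,        /* writer without SWMR */
    OPEN_SWMR_READ,    /* reader with SWMR read access */
    OPEN_SWMR_WRITE    /* writer with SWMR write access */
} open_mode_t;

/* Toy stand-in for the two marks kept in the file's superblock. */
typedef struct {
    bool writing;        /* file is marked as open for writing */
    bool swmr_writing;   /* file is marked as open for SWMR writing */
} superblock_flags_t;

/* Apply the superblock rules for one open attempt. Returns true if the
 * open is allowed; a successful writer open also sets the marks. */
bool try_open(superblock_flags_t *sb, open_mode_t mode)
{
    assert(sb != NULL);
    bool marked = sb->writing || sb->swmr_writing;

    switch (mode) {
    case OPEN_READ:
        return !marked;                       /* no writer may be active */
    case OPEN_SWMR_READ:
        return sb->writing && sb->swmr_writing; /* needs a SWMR writer */
    case OPEN_WRITE:
        if (marked)
            return false;
        sb->writing = true;
        return true;
    case OPEN_SWMR_WRITE:
        if (marked)
            return false;
        sb->writing = true;
        sb->swmr_writing = true;
        return true;
    }
    return false;
}
```

Under these rules a SWMR reader succeeds only while a SWMR writer has the file marked, and a non-SWMR reader is refused whenever any writer mark is set.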

SWMR Compatibility Matrix
[Two slides showing the compatibility matrix; the table contents did not survive in this transcript.]

Is an HDF5 file under SWMR access?
We will provide APIs to get information on a file accessed under SWMR:
- Does H5Fopen fail because of the existing file lock? H5LTcheck_lock_error (under implementation)
- When H5Fopen succeeds, is the file being accessed by a SWMR writer? TBD

Demo
HDF5 provides some tests you may try; see the SWMR UG, section 6.
We will be using test/use_append_chunk to write a 3D dataset by planes (chunk size 1x2056x256).
- Use h5watch to see data coming in.
- Interrupt use_append_chunk.
- Use the h5clear tool to clear the flags.
- Use h5dump to see the data.

Thank You! Questions?