Www.hdfgroup.org The HDF Group Multi-threading in HDF5: Paths Forward Current implementation - Future directions May 30-31, 2012HDF5 Workshop at PSI 1.

Slides:



Advertisements
Similar presentations
Operating Systems Part III: Process Management (Process Synchronization)
Advertisements

Global Environment Model. MUTUAL EXCLUSION PROBLEM The operations used by processes to access to common resources (critical sections) must be mutually.
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
Chapter 6 Process Synchronization Bernard Chen Spring 2007.
Chapter 6: Process Synchronization
Background Concurrent access to shared data can lead to inconsistencies Maintaining data consistency among cooperating processes is critical What is wrong.
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
8a-1 Programming with Shared Memory Threads Accessing shared data Critical sections ITCS4145/5145, Parallel Programming B. Wilkinson Jan 4, 2013 slides8a.ppt.
Study of Hurricane and Tornado Operating Systems By Shubhanan Bakre.
Computer Systems/Operating Systems - Class 8
Lock-free Cache-friendly Software Queue for Decoupled Software Pipelining Student: Chen Wen-Ren Advisor: Wuu Yang 學生 : 陳韋任 指導教授 : 楊武 Abstract Multicore.
Threading Part 4 CS221 – 4/27/09. The Final Date: 5/7 Time: 6pm Duration: 1hr 50mins Location: EPS 103 Bring: 1 sheet of paper, filled both sides with.
1 Threads CSCE 351: Operating System Kernels Witawas Srisa-an Chapter 4-5.
Threads Load new page Page is loading Browser still responds to user (can read pages in other tabs)
Concurrent Processes Lecture 5. Introduction Modern operating systems can handle more than one process at a time System scheduler manages processes and.
Deadlock CSCI 444/544 Operating Systems Fall 2008.
Concurrency: Mutual Exclusion, Synchronization, Deadlock, and Starvation in Representative Operating Systems.
Chapter 6 – Concurrent Programming Outline 6.1 Introduction 6.2Monitors 6.2.1Condition Variables 6.2.2Simple Resource Allocation with Monitors 6.2.3Monitor.
PRASHANTHI NARAYAN NETTEM.
Instructor: Umar KalimNUST Institute of Information Technology Operating Systems Process Synchronization.
1 Advanced Computer Programming Concurrency Multithreaded Programs Copyright © Texas Education Agency, 2013.
Understanding Operating Systems Fifth Edition Chapter 6 Concurrent Processes.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
1 Previous lecture review n Out of basic scheduling techniques none is a clear winner: u FCFS - simple but unfair u RR - more overhead than FCFS may not.
Fast Multi-Threading on Shared Memory Multi-Processors Joseph Cordina B.Sc. Computer Science and Physics Year IV.
Threading and Concurrency Issues ● Creating Threads ● In Java ● Subclassing Thread ● Implementing Runnable ● Synchronization ● Immutable ● Synchronized.
The HDF Group HDF5 Datasets and I/O Dataset storage and its effect on performance May 30-31, 2012HDF5 Workshop at PSI 1.
HDF 1 New Features in HDF Group Revisions HDF and HDF-EOS Workshop IX November 30, 2005.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Introduction to Concurrency.
Games Development 2 Concurrent Programming CO3301 Week 9.
1 HDF5 Life cycle of data Boeing September 19, 2006.
Copyright ©: University of Illinois CS 241 Staff1 Threads Systems Concepts.
Kernel Locking Techniques by Robert Love presented by Scott Price.
Chapter 6 – Process Synchronisation (Pgs 225 – 267)
Processor Architecture
CSC Multiprocessor Programming, Spring, 2012 Chapter 11 – Performance and Scalability Dr. Dale E. Parson, week 12.
Department of Computer Science and Software Engineering
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Concurrency & Dynamic Programming.
The HDF Group HDF5 Chunking and Compression Performance tuning 10/17/15 1 ICALEPCS 2015.
The HDF Group Single Writer/Multiple Reader (SWMR) 110/17/15.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Processes and Threads.
Debugging Threaded Applications By Andrew Binstock CMPS Parallel.
Threads-Process Interaction. CONTENTS  Threads  Process interaction.
Threads. Readings r Silberschatz et al : Chapter 4.
Eraser: A dynamic Data Race Detector for Multithreaded Programs Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson Presenter:
SMP Basics KeyStone Training Multicore Applications Literature Number: SPRPxxx 1.
Mutual Exclusion -- Addendum. Mutual Exclusion in Critical Sections.
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
Kendo: Efficient Deterministic Multithreading in Software M. Olszewski, J. Ansel, S. Amarasinghe MIT to be presented in ASPLOS 2009 slides by Evangelos.
Concepts of Multithreading CS 378 – Mobile Computing for iOS Dr. William C. Bulko.
Introduction to Operating Systems Concepts
Chapter 4 – Thread Concepts
Process Management Deadlocks.
Background on the need for Synchronization
Introduction to HDF5 Session Five Reading & Writing Raw Data Values
HDF5 Metadata and Page Buffering
Chapter 4 – Thread Concepts
Multithreaded Programming in Java
Current status and future work
143a discussion session week 3
Multithreading.
Fast Communication and User Level Parallelism
Concurrency: Mutual Exclusion and Process Synchronization
CSE 153 Design of Operating Systems Winter 19
NT Executive Resources
CS 144 Advanced C++ Programming May 7 Class Meeting
CSC Multiprocessor Programming, Spring, 2011
CS Introduction to Operating Systems
Presentation transcript:

The HDF Group Multi-threading in HDF5: Paths Forward Current implementation - Future directions May 30-31, 2012HDF5 Workshop at PSI 1

Outline Introduction Current implementation Paths forward: Improve concurrency Reduce latency Conclusions and Recommendations May 30-31, 2012HDF5 Workshop at PSI2

INTRODUCTION May 30-31, 2012HDF5 Workshop at PSI3

Introduction HDF5 design principles Flexibility Adaptability to new computational environments Current challenges: Multi-threaded applications run on multi-core systems HDF5 thread-safe library cannot support concurrency built into such applications May 30-31, 2012HDF5 Workshop at PSI4

CURRENT IMPLEMENTATION May 30-31, 2012HDF5 Workshop at PSI5

Current Implementation HDF5 uses single global semaphore Controls modification of memory and file data structures: One thread at a time enters the library An application thread enters HDF5 API routine, acquires semaphore Other threads are blocked until the thread completes API call and releases semaphore No simultaneous modifications of data structures that can cause file corruption No race conditions when several threads try to modify a memory data structure May 30-31, 2012HDF5 Workshop at PSI6

Current Implementation Pros: Current implementation provides thread-safety needed to avoid corruption of data structures Cons: No concurrent use of HDF5 library by multi- threaded applications May 30-31, 2012HDF5 Workshop at PSI7

PATHS FORWARD May 30-31, 2012HDF5 Workshop at PSI8

Improving Concurrency Replace single global semaphore with semaphores that guard individual data structures Pros: Greater level of concurrency No corruption of internal data structures Each thread waits only when it needs to modify a data structure locked by another thread Reduces waiting time for a resource to become available May 30-31, 2012HDF5 Workshop at PSI9

Improving Concurrency Cons: Replacing the global semaphore with individual semaphores, locks, etc. requires careful analysis of HDF5 data structures and their interactions 300K lines of C code in library will require 4-6 FTE years of knowledgeable staff Significant future maintenance effort Testing challenges May 30-31, 2012HDF5 Workshop at PSI10

Reducing Latency Reduce waiting time for each thread to acquire global semaphore Reduce time by removing known HDF5 bottlenecks: I/O performance “Compute bound” (CB) operations Datatype conversions Compression and other filters General overhead E.g., structures for storing and accessing chunked datasets and metadata May 30-31, 2012HDF5 Workshop at PSI11

Reducing Latency: I/O Performance Support asynchronous I/O (AIO) access to data in HDF5 file AIO initiated within the library in response to an API call Completes in the background after API call has returned Global semaphore is released when API call returns – less waiting time May 30-31, 2012HDF5 Workshop at PSI12

Reducing Latency: CB operations Use multiple threads within HDF5 library to Perform datatype conversion Perform compression on one chunk Multiple threads work on one chunk Perform compression on many chunks Each thread works on a chunk May 30-31, 2012HDF5 Workshop at PSI13

Reducing Latency: General Optimizations Traditional optimizations Some examples: Algorithm improvements for handling Chunk cache Hyperslab selections Memory usage Data structure improvements Chunk indices with O(1) lookup speed Advanced B-tree implementations May 30-31, 2012HDF5 Workshop at PSI14

Reducing Latency Pros: Smaller development effort, ~ 1.5 FTE years Localized changes to the library Easier to maintain Incremental improvements Cons: Still uses global semaphore May 30-31, 2012HDF5 Workshop at PSI15

RECOMMENDATIONS May 30-31, 2012HDF5 Workshop at PSI16

Decision Reduce Latency Decision factors: Available expertise Cost Already funded features: AIO Using multiple threads to compress a chunk Future maintainability May 30-31, 2012HDF5 Workshop at PSI17

Other considerations Approaches are not mutually exclusive Both can be implemented in the future if funding is available May 30-31, 2012HDF5 Workshop at PSI18

The HDF Group Thank You! Questions? May 30-31, 2012HDF5 Workshop at PSI 19