Shared File Performance Improvements: LDLM Lock Ahead
Patrick Farrell


Lock Ahead: Quick Refresher
Lets user space request LDLM extent locks with an ioctl
Allows optimizing for various IO patterns by avoiding unnecessary LDLM lock contention
Focused on improving shared-file IO performance
You were at my LUG talk, right?
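To make the "request locks ahead of IO" idea concrete, here is a small sketch of the extents a rank in a strided N-to-1 shared-file write would ask to lock before writing. The struct and function names (`la_extent`, `lockahead_extent`) are invented for this sketch; the actual ioctl interface and its argument layout are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent descriptor: [start, end] in bytes, inclusive,
 * mirroring how LDLM extent locks describe byte ranges. */
struct la_extent {
    uint64_t start;
    uint64_t end;
};

/* For a strided N-to-1 write (rank 'rank' of 'nranks', each writing
 * 'chunk'-byte blocks round-robin), compute the i-th extent this rank
 * would request ahead of its i-th write.  Requesting exactly these
 * extents avoids the server expanding locks into each other and the
 * resulting lock ping-pong between clients. */
struct la_extent lockahead_extent(uint64_t rank, uint64_t nranks,
                                  uint64_t chunk, uint64_t i)
{
    struct la_extent e;
    e.start = (i * nranks + rank) * chunk;
    e.end   = e.start + chunk - 1;
    return e;
}
```

With 4 ranks and 1 MB chunks, rank 0 would request [0, 1 MB), [4 MB, 5 MB), … and rank 1 would request [1 MB, 2 MB), [5 MB, 6 MB), …, so no two ranks ever contend for the same extent.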

High Level Design
Uses the same machinery as the existing asynchronous glimpse lock (AGL) implementation
Glimpse locks are a special lock type that allows information to be extracted without taking a full lock
In particular, glimpse locks on OSTs are used to get file size
AGLs have a lot of what we need…

High Level Design
AGLs: used by statahead to speculatively gather size information
The statahead thread requests AGL locks
Notable features:
LDLM lock request without a corresponding IO operation
Asynchronous: the requesting thread does not wait for a reply from the server

High Level Design
A lock ahead request has no IO to do, so the AGL model is a good fit
Asynchronous requests are critical to requesting a large number of locks ahead of IO
If we had to wait for each lock request, the performance gains would be lost
The server must not expand lock ahead requests, so a new LDLM flag is added for that
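The "performance gains would be lost" point is just latency arithmetic. The sketch below makes the assumption explicit: synchronous requests serialize on the server round trip, while async requests pay the round trip roughly once. The 0.5 ms RTT and 2 µs per-request CPU cost used in the test are illustrative assumptions, not measurements.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* If each lock request waited for its server reply, wall time scales
 * as nlocks * RTT (serialized on the network). */
double sync_wall_seconds(uint64_t nlocks, double rtt)
{
    return (double)nlocks * rtt;
}

/* Fired asynchronously from ptlrpcd, requests overlap: the network
 * latency is paid roughly once, plus per-request CPU cost. */
double async_wall_seconds(uint64_t nlocks, double rtt, double cpu)
{
    return rtt + (double)nlocks * cpu;
}
```

At 100,000 locks and a 0.5 ms RTT, the synchronous version spends ~50 seconds just waiting on the network, which would dwarf any shared-file IO improvement.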

High Level Design: Wrinkles
Problems came in three forms:
OFD glimpse callback/size checking problems
Async lock request handling
Race conditions
Servers need to be able to get the current file size from clients (ofd_intent_{policy,cb})
These exploit the assumption that every write lock is being used for actual IO
So the most distant write lock on any object will know the current size; the server only needs to ask that lock about size

OFD Changes
Lock ahead violates that assumption: a write extent lock (PW) can exist without a corresponding IO request, so the 'most distant' lock may have incorrect size information
Solution: starting from the most distant lock, glimpse each lock until you find one whose size is *inside* the extent of the lock (thanks, Andreas)
Not ideal, but except for lock ahead there will almost never be a large number of write locks on one object
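The walk described above can be sketched as follows. This is a simplified user-space model, not the actual `ofd_intent_policy` code: each granted PW lock covers an extent, its holder would report a known size, and a lock-ahead lock that has seen no IO reports a size below its own extent, so the walk skips it.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy model of a granted PW lock: extent [start, end] (inclusive)
 * plus the file size its holder would report if glimpsed. */
struct pw_lock {
    uint64_t start, end;
    uint64_t known_size;
};

/* locks[] sorted by extent end, descending (most distant first).
 * Glimpse each lock in turn; a holder whose reported size falls
 * inside its own extent has actually written there, so its answer
 * reflects the real end of file. */
uint64_t glimpse_walk(const struct pw_lock *locks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (locks[i].known_size >= locks[i].start &&
            locks[i].known_size <= locks[i].end + 1)
            return locks[i].known_size;
    }
    return 0; /* no lock has seen IO; fall back to on-disk size */
}
```

With only lock ahead creating extra un-IOed PW locks, the loop usually terminates after one or two glimpses, which is why the cost is tolerable outside the large-file case on the next slide.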

OFD Changes
Err. Oleg felt there was a race condition:
In the normal case, the "most distant lock" will not change midstream, because the resource is locked and no new locks can be granted
In this case, multiple clients can be writing, so while the glimpse callbacks are being sent, a different lock can become the "most distant" active lock
Thoughts?

OFD Changes
Possible performance problems for lock ahead when writing a large file
For example, 100 GB per OST in 1 MB blocks: ~100,000 locks per OST
That's a lot of callbacks, and also a lot of contention in there (the lock lists have to be allocated atomically to avoid deadlocks)
Impact TBD – race conditions have impeded the larger tests that would show this problem…
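A quick sanity check on the slide's arithmetic: 100 GB per OST locked in 1 MB pieces is 102,400 locks, i.e. the "~100,000 locks per OST" order of magnitude quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of PW locks the glimpse walk might have to traverse when a
 * file is locked ahead in fixed-size pieces on one OST object. */
uint64_t locks_per_ost(uint64_t bytes_per_ost, uint64_t lock_bytes)
{
    return bytes_per_ost / lock_bytes;
}
```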

Race Conditions
"NEVER sleep in PTLRPCD! NEVER!" – Oleg Drokin
Async lock requests are made by ptlrpcd threads (instead of the requesting thread sleeping on the reply)
ldlm_completion_ast: can result in a sleep
ldlm_completion_ast_async: alternate implementation that doesn't sleep
Long story, but the issue was the sleeping
Required some other tweaks; will ask about them on the mailing list(?), but it looks good

Race Conditions
LU-1669: replace the write mutex with a range lock
Now multiple threads can race LDLM requests on the same object
Lock ahead is an easy way to expose these races, but most of them apply to normal IO as well
IO completes correctly, but unnecessary lock requests are generated

Race Conditions
LU-6398: two processes, P1 and P2
P1 starts a write and generates an LDLM lock request
P1 waits for a reply from the server
P2 starts a read of the same region of the file
P2 cannot match the lock requested by P1, since P1 is still waiting for a reply
P2 waits for a reply from the server
P1 receives its reply; the lock is granted on the whole file
P2 receives its reply; its lock is blocked by the lock granted to P1
The lock granted to P1 is then called back
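The interleaving above can be modeled with a toy state machine. This is a deliberate simplification of the client-side logic, not Lustre's real structures: the key property is that a lock still waiting on its server reply cannot be matched by another thread, so the overlapping read sends a second RPC that ends up conflicting and calling back P1's expanded lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the LU-6398 interleaving.  All names and states here
 * are invented for the sketch. */
enum lock_state { LS_ENQUEUED, LS_GRANTED };

struct toy_lock {
    enum lock_state state;
    bool write; /* PW vs PR */
};

/* ldlm_lock_match analogue: only a granted lock can satisfy another
 * thread's IO without a new enqueue. */
static bool can_match(const struct toy_lock *l)
{
    return l->state == LS_GRANTED;
}

/* Returns the number of enqueue RPCs sent in the racy interleaving:
 * P1 enqueues a write; before the reply arrives, P2 tries to match,
 * fails, and enqueues a read of the same region. */
int lu6398_interleaving(void)
{
    int rpcs = 0;
    struct toy_lock p1 = { LS_ENQUEUED, true };

    rpcs++;                 /* P1's write enqueue is in flight */

    if (!can_match(&p1))
        rpcs++;             /* P2 cannot match it, so P2 enqueues too */

    p1.state = LS_GRANTED;  /* server reply: granted on the whole file */
    /* P2's reply now finds its extent blocked by P1's expanded lock,
     * so P1's lock is called back -- the wasted round trips an
     * enqueueing list (next slide) would avoid. */
    return rpcs;
}
```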

Race Conditions
The likely fix for LU-6398 is an enqueueing list to go with the waiting and granted lists
Lock the resource(?) for the duration of ldlm_lock_match and add to the enqueueing list after that (if necessary)
Not essential to fix, but it would be nice
LU-6397 is a special case related to new objects (fixed – thanks, Jinshan)

Questions
Do you have any?
The lock ahead work is in LU-6179
If you want to help, test cases would be especially welcome
Cray will provide these, but community assistance would speed things up
Happy to answer questions later or by email

Other Information
Thanks to everyone for comments & input