Shared File Performance Improvements: LDLM Lock Ahead (Patrick Farrell)


1 Shared File Performance Improvements: LDLM Lock Ahead
Patrick Farrell (paf@cray.com)

2 Lock Ahead: Quick Refresher
Let user space request LDLM extent locks with an ioctl
Allows optimizing for various IO patterns by avoiding unnecessary LDLM lock contention
Focused on improving shared file IO performance
You were at my LUG talk, right?
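For concreteness, here is a minimal user-space sketch of such a request. It uses the llapi_ladvise()/LU_LADVISE_LOCKAHEAD interface that eventually landed from the LU-6179 work; the exact ioctl was still under discussion at the time of this talk, and the file path and extent are just examples.

/* Hedged sketch: ask the client to take a PW extent lock ahead of IO.
 * Assumes the llapi_ladvise() lockahead interface as it later landed;
 * not necessarily the interface discussed in this talk. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <lustre/lustreapi.h>

int main(void)
{
	struct llapi_lu_ladvise advice = { 0 };
	int fd, rc;

	fd = open("/mnt/lustre/shared_file", O_RDWR);	/* hypothetical path */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	advice.lla_advice          = LU_LADVISE_LOCKAHEAD;
	advice.lla_lockahead_mode  = MODE_WRITE_USER;	/* request a write (PW) extent lock */
	advice.lla_peradvice_flags = LF_ASYNC;		/* do not wait for the grant */
	advice.lla_start           = 0;			/* first byte of the requested extent */
	advice.lla_end             = (1 << 20) - 1;	/* last byte: one 1 MiB chunk */

	rc = llapi_ladvise(fd, 0, 1, &advice);
	if (rc < 0)
		perror("llapi_ladvise");

	close(fd);
	return rc < 0 ? 1 : 0;
}

In a real shared-file workload each process would issue many such advices, one per extent it expects to write, before starting IO.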

3 High Level Design
Uses the same machinery as the existing asynchronous glimpse lock (AGL) implementation
Glimpse locks are a special lock type which allows information to be extracted without taking a full lock
In particular, glimpse locks on OSTs are for file size
AGLs have a lot we need…

4 High Level Design
AGLs: Used by statahead to speculatively gather size information
Statahead thread requests AGL locks
Notable features:
LDLM lock request without a corresponding IO operation
Asynchronous: Requesting thread does not wait for the reply from the server

5 High Level Design
A lock ahead request has no IO to do, so the AGL model is a good fit
Asynchronous requests are critical to requesting a large number of locks ahead of IO
If we had to wait for each lock request, the performance gains would be lost
The server must not expand lock ahead requests, so a new LDLM flag is added for that (see the fragment below)
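A heavily hedged fragment of what the server-side check might look like; the flag name follows LDLM_FL_NO_EXPANSION as it eventually landed, and the surrounding call is from my reading of lustre/ldlm/ldlm_extent.c rather than from this talk.

/* Sketch only, not a standalone program.  LDLM_FL_NO_EXPANSION is the
 * flag name that eventually landed; ldlm_extent_policy() is the existing
 * server-side extent-growing routine.  The point is simply that
 * expansion is skipped when the flag is set. */
if (!(lock->l_flags & LDLM_FL_NO_EXPANSION))
	/* normal path: grow the granted extent to reduce future enqueues */
	ldlm_extent_policy(res, lock, flags);
/* lock ahead path: grant exactly the extent the client asked for */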

6 High Level Design: Wrinkles
Problems came in three forms:
OFD glimpse callback/size checking problems
Async lock request handling
Race conditions
Servers need to be able to get the current file size from clients (ofd_intent_{policy,cb})
They exploit the assumption that every write lock is being used for actual IO
So the most distant write lock on any object will know the current size, and only that lock needs to be asked about size

7 OFD Changes
Lock ahead violates that assumption: a write extent lock (PW) can exist without a corresponding IO request, so the ‘most distant’ lock may have an incorrect size
Solution: starting from the most distant lock, glimpse each lock until you find one whose size is *inside* the extent of that lock (Thanks, Andreas); a simplified model is sketched below
Not ideal, but except for lock ahead, there will almost never be a large number of write locks on one object
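To make the walk concrete, here is a small user-space model of the size check. The struct and the glimpse function are stand-ins introduced for illustration, not real OFD/LDLM symbols; only the walk order and the termination test reflect the change described above.

/* Simplified model of the OFD size-discovery walk; all names here are
 * illustrative stand-ins, not real Lustre symbols. */
#include <stddef.h>

struct write_lock {
	unsigned long long start;	/* extent start, in bytes */
	unsigned long long end;		/* extent end, in bytes */
};

/* Stand-in for sending a glimpse callback to the client holding 'lck'
 * and returning the file size that client knows about. */
unsigned long long glimpse_client(const struct write_lock *lck);

/* Walk the write locks from most distant to least distant and stop at
 * the first one whose client reports a size inside that lock's own
 * extent: that client has actually done IO under the lock, so its size
 * is authoritative.  A lock ahead lock with no IO behind it reports a
 * size outside its extent and the walk moves on. */
unsigned long long find_current_size(const struct write_lock *locks, size_t nr)
{
	unsigned long long size = 0;
	size_t i;

	for (i = 0; i < nr; i++) {	/* locks[0] is the most distant */
		size = glimpse_client(&locks[i]);
		if (size >= locks[i].start && size <= locks[i].end)
			break;
	}
	return size;
}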

8 OFD Changes
Err. Oleg felt there was a race condition:
In the normal case, the “most distant lock” will not change midstream, because the resource is locked and no new locks can be granted
In this case, multiple clients can be writing, so while the glimpse callbacks are being sent, a different lock can become the “most distant” active lock
Thoughts?

9 OFD Changes
Possible performance problems for lock ahead when writing a large file
For example, 100 GB per OST with 1 MB blocks means roughly 100,000 locks per OST
That’s a lot of callbacks, and also lots of contention in there (lock lists have to be allocated atomically to avoid deadlocks)
Impact TBD – race conditions have impeded the larger tests that would show this problem…

10 Race Conditions
“NEVER sleep in PTLRPCD! NEVER!” – Oleg Drokin
Async lock requests are made by ptlrpcd threads (instead of the requesting thread sleeping on the reply)
ldlm_completion_ast: can result in a sleep
ldlm_completion_ast_async: alternate implementation, doesn’t sleep (see the sketch below)
Long story, but the issue was the sleeping
Required some other tweaks, will ask about them on the mailing list(?), but looks good
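A hedged sketch of the client-side enqueue info this implies. It is a fragment rather than a standalone program, and the blocking/glimpse callback names are assumptions based on the existing osc code, not something stated in the talk.

/* Fragment, not standalone: fill in the enqueue info for a lock ahead
 * request issued from ptlrpcd context.  Callback names are assumptions. */
struct ldlm_enqueue_info einfo = {
	.ei_type  = LDLM_EXTENT,
	.ei_mode  = LCK_PW,
	.ei_cb_bl = osc_ldlm_blocking_ast,
	/* ldlm_completion_ast() can sleep waiting for the grant, which is
	 * forbidden in a ptlrpcd thread; the async variant records the
	 * server's reply and returns, letting the lock complete later. */
	.ei_cb_cp = ldlm_completion_ast_async,
	.ei_cb_gl = osc_ldlm_glimpse_ast,
};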

11 Race Conditions
LU-1669: Replace the write mutex with a range lock
Now, multiple threads can race LDLM requests on the same object
Lock ahead is an easy way to expose these races, but most of them apply to normal IO as well
IO completes, but unnecessary lock requests are generated

12 Race Conditions
LU-6398: Two processes, P1 and P2
P1 starts a write, generating an LDLM lock request
P1 waits for the reply from the server
P2 starts a read of the same region of the file
P2 cannot match the lock requested by P1, since P1 is still waiting for a reply
P2 waits for a reply from the server
P1 receives its reply; the lock is granted on the whole file
P2 receives its reply; its lock is blocked by the lock granted to P1
The lock for P1 is called back

13 Race Conditions
Likely fix for LU-6398 is an enqueueing list to go with the waiting & granted lists
Lock the resource(?) for the duration of ldlm_lock_match and add to the enqueueing list after that (if necessary)
Not essential to fix, but would be nice
LU-6397 is a special case related to new objects (Fixed – Thanks, Jinshan)

14 Questions
Do you have any?
Lock ahead work is in LU-6179
If you want to help, some test cases would be especially welcome
Cray will provide these, but community assistance would speed things up
Happy to answer questions later or by email (paf@cray.com)

15 Other Information
Thanks to everyone for comments & input

