Synchronizing Lustre file systems
Dénes Németh, Balázs Fülöp, Dr. János Török, Dr. Imre Szeberényi
The current state of the art
Partially solved
– Conventional local file systems
– Off-line operation (rsync)
Problems
– Must walk through the whole directory structure
– Must know in advance what will change (inotify)
– Does not work on distributed file systems
– Scalability problems
The environment - Lustre
Distributed
– Stripes (parts of a file) on separate hosts
– ~ clients (reading, writing)
Redundant
– File system and file metadata
Fault tolerance
– Transaction-driven operations
– Rollback capability
Lustre – synchronization
Distributed
– Hosts: absolute event sequencing (is the time accurate enough?)
– Clients: extreme efficiency
Redundant
Fault tolerance
– Pulling the plug during synchronization: moving, tracking events
– Rollback: synchronize to transactions
The basic Lustre concept
[Diagram labels: Lustre Server Side, Lustre Client Side, Metadata Server (failover, ~ „inode”), Object Storage Targets]
Moving the information - metadata
[Diagram labels: Lustre Server Side, Lustre Client Side, Metadata Server, Object Storage Targets, Lustre Metadata Access, Kernel space]
Components: Local Event Sequencer, Global Event Sequencer, Event Reporter, Event Multiplexer, Event Processor
How to move the information
[Diagram labels: Metadata Server, Local Event Sequencer, Global Event Sequencer, Event Reporter, Event Multiplexer, Event Processor; link labels: Block Device, Proc File System, TCP/IP Network]
Block Device
– Asynchronous notification system calls: select (timeout); read, write (blocking)
– Max events/sec
– Relatively complicated access
Proc File System
– Easy access from user space
– Notifications through signals
– Possibility for multiple reporters
– Minimal network usage
– Usually not a bottleneck
– ER & EM can be deployed together or separately
TCP/IP Network
– Just multiplexing events
– No problems
– No authorization or registration (fixed configuration)
TCP/IP Network
– Big difficulties
– Sequencing = accurate timing
– Network delay
– Delay from FS overload
– Connection to all MDS
– Can be a bottleneck
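The block-device access pattern named on this slide (select with a timeout, then a blocking read) can be illustrated with a minimal sketch. It is not the authors' implementation: a POSIX pipe stands in for the event device, and the newline-delimited event format, the function name `read_events` and the sample events are assumptions made only for this example.

```python
import os
import select


def read_events(fd, timeout_s, max_events):
    """Collect up to max_events newline-terminated events from fd.

    Returns early when select() times out with nothing to read,
    or when the other end is closed (EOF).
    """
    events = []
    buffer = b""
    while len(events) < max_events:
        # Wait until fd is readable or the timeout expires.
        readable, _, _ = select.select([fd], [], [], timeout_s)
        if not readable:
            break  # timeout: no new events arrived
        chunk = os.read(fd, 4096)  # does not block: select reported data
        if not chunk:
            break  # EOF
        buffer += chunk
        # Split complete lines off the buffer; keep any partial line.
        while b"\n" in buffer and len(events) < max_events:
            line, buffer = buffer.split(b"\n", 1)
            events.append(line.decode())
    return events


def demo():
    # Hypothetical stand-in for the event device: an ordinary pipe.
    read_end, write_end = os.pipe()
    os.write(write_end, b"create /a\nunlink /b\n")
    collected = read_events(read_end, timeout_s=0.1, max_events=10)
    os.close(write_end)
    os.close(read_end)
    return collected
```

The timeout keeps the reader responsive when no events arrive, which is the reason to prefer select over a bare blocking read in a reporter loop.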
Accurate sequencing
[Plot: output increases linearly with the number of local sequencers]
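As a rough illustration of what a global sequencer has to do, the sketch below merges the already ordered streams of several local sequencers into one global order. It assumes that every event carries a timestamp accurate enough to compare across hosts, which is exactly the open question raised earlier in the slides. The `Event` fields and the function name `global_sequence` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: float  # assumed comparable across hosts
    mds_id: int       # which metadata server reported the event
    local_seq: int    # sequence number assigned by the local sequencer
    payload: str


def global_sequence(
    streams: Iterable[Iterable[Event]],
) -> Iterator[Tuple[int, Event]]:
    """Merge per-MDS streams (each already sorted) into one global order.

    Events compare field by field, so equal timestamps are ordered by
    mds_id and then by local_seq, which makes the result deterministic.
    """
    merged = heapq.merge(*streams)
    for global_seq, event in enumerate(merged, start=1):
        yield global_seq, event


mds0 = [Event(1.0, 0, 1, "create /a"), Event(3.0, 0, 2, "write /a")]
mds1 = [Event(2.0, 1, 1, "mkdir /d"), Event(3.0, 1, 2, "unlink /b")]
ordered = list(global_sequence([mds0, mds1]))
```

A k-way merge does constant work per event apart from the heap operations, which fits the low resource usage reported for the global sequencer, but the sketch ignores network delay and late-arriving events.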
Average sequence performance
Server has enough threads
– Performance OK
Server needs more threads
– Performance DROPS
– Why? ~ 5000 event/thread
„Graceful degradation”
– Linear drop in performance
– Constant QoS
Resource usage on the global sequencer
[Plot annotations: at most 2 ms in each second, ~ 0]
How to commit the changes
[Diagram labels: SFS 1, SFS 2, SFS 3, each with MDS and OST; Event Reporter, Event Multiplexer, Event Processor, Committer Client; event labels A 4 and B 3]
How to execute „3” if „4” has already happened?
– Unfortunately, there is no really good solution
Event sequence error resolution
1. Ostrich policy
– Drop all events with a conflicting sequence
2. Conflict detection
– Is the event applicable?
– In design stage …
3. Replaying the already committed events
– Currently no Lustre support
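The first strategy above, dropping events whose sequence number conflicts with what has already been committed, could look roughly like the following sketch. The class `OstrichCommitter` and its fields are hypothetical and only illustrate the policy; the conflict test used here (sequence number not greater than the last applied one) is an assumption.

```python
class OstrichCommitter:
    """Apply events in arrival order; drop any that arrive too late."""

    def __init__(self):
        self.last_applied = 0  # highest sequence number committed so far
        self.applied = []      # (seq, action) pairs that were committed
        self.dropped = []      # sequence numbers that were discarded

    def commit(self, seq, action):
        # An event is treated as conflicting when a later (or the same)
        # sequence number has already been committed.
        if seq <= self.last_applied:
            self.dropped.append(seq)
            return False
        self.applied.append((seq, action))
        self.last_applied = seq
        return True


committer = OstrichCommitter()
first = committer.commit(4, "A")   # event 4 arrives first and is applied
second = committer.commit(3, "B")  # event 3 arrives late and is dropped
```

The policy is simple and cheap, but every dropped event leaves the replica out of sync, which is why conflict detection and replay are listed as alternatives.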
Questions?
Thank you for your attention!