Overview of Lustre ECE, U of MN Changjin Hong (Prof. Tewfik’s group) Monday, Aug. 19, 2002
Outline
  – Reference
  – Lustre Cluster
  – Lustre System Components
  – Distributed Lock Manager
  – Object Based Storage
  – Conclusion (security issues)
Reference
  – Lustre: A SAN File System for Linux
  – Several presentation materials from Dr. Peter J. Braam
A Lustre Cluster
[Figure: cluster diagram. Surviving labels are node counts: 10,000’s, 1,000’s and 10’s of nodes.]
Key Design Issue: Scalability
I/O throughput
  – How to avoid bottlenecks
Metadata scalability
  – How can 10,000’s of nodes work on files in the same folder
Cluster recovery
  – If something fails, how can transparent recovery happen
Management
  – Adding, removing, replacing systems; data migration & backup
System Components
Interaction between systems
[Figure: interaction diagram between Client, MDS and OST. Surviving labels:]
  – CMD protocol: (directory) metadata handling, inode updates, concurrency
  – Pre-allocation, file creation, recovery, file status
  – OS protocol: file I/O, allocation of blocks, striping, security enforcement
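The slide lists striping under the OS protocol but does not say how a file offset is mapped onto objects. A minimal sketch, assuming plain round-robin striping with a fixed stripe size (the function name `locate` and both parameters are hypothetical, not taken from Lustre):

```python
def locate(offset, stripe_size, stripe_count):
    """Map a file byte offset to (object index, byte offset inside that object).

    Assumption: stripes of `stripe_size` bytes are dealt out round-robin
    across `stripe_count` objects. This is an illustration only.
    """
    stripe_no = offset // stripe_size          # which stripe the byte falls in
    obj_index = stripe_no % stripe_count       # which object holds that stripe
    rounds = stripe_no // stripe_count         # full rounds completed before it
    obj_offset = rounds * stripe_size + offset % stripe_size
    return obj_index, obj_offset


# With 4-byte stripes over 3 objects, byte 13 sits in stripe 3,
# which wraps around to object 0 at offset 5.
first = locate(0, 4, 3)
second = locate(5, 4, 3)
wrapped = locate(13, 4, 3)
```

Consecutive stripes land on different objects, so a large sequential read can be served by several OSTs at once.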
Client File System
A directory tree, subdivided into filesets, with cluster-wide Unix file sharing semantics
CMD protocol
  – Transaction-based
  – Authenticated access
  – Write-behind caching for metadata updates with strict data/metadata coherency
Metadata Service (MDS)
All access to a file is governed by the MDS, which directly or indirectly authorizes access
Controls the namespace and manages inodes
Load-balanced cluster service for scalability (a well-balanced API, a stackable framework for logical MDS, replicated MDS)
Journaled, batched metadata updates
Object Storage Targets (OST)
Keep file data objects
File I/O service ▷ access to the objects
Block allocation for data objects is done on the OST, which makes it distributed and scalable
OST s/w modules
  – OBD server, lock server
  – Object storage driver, OBD filter
  – Portal API
VAXCluster DLM adapted
Distributed Lock Manager
For a generic and rich lock service
Lock resources: resource database
  – Organize resources in trees
High performance
  – Node that acquires a resource manages the tree
Big Picture
[Figure: resource tree and namespace. Surviving labels: Name1 to Name4; Obj.1 to Obj.4; Resource manager; RR; distributed resource directory / hash function (LDWV) / lock directory; Apps.]
Mechanism in resource DB
Hash binary string % N ▷ get h
Look up system in lock directory weight vector (LDWV): LDWV[h] ▷ find system K
Systems
  – May occupy 0, 1 or more slots in LDWV
  – Number of slots is the lock directory weight
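The lookup above can be sketched in a few lines. This is an illustration under stated assumptions: the slide does not name the hash function, so SHA-1 is used here only as a stand-in, and the helper names `build_ldwv` and `directory_node` as well as the node names are hypothetical.

```python
import hashlib


def build_ldwv(weights):
    """Expand {system: weight} into the slot vector.

    A system with lock directory weight w occupies w slots, so a weight
    of 0 means the system holds no part of the lock directory.
    """
    ldwv = []
    for system, weight in weights.items():
        ldwv.extend([system] * weight)
    return ldwv


def directory_node(resource_name, ldwv):
    """Hash the resource name, reduce it modulo N slots, return LDWV[h]."""
    digest = hashlib.sha1(resource_name).digest()   # assumed hash function
    h = int.from_bytes(digest, "big") % len(ldwv)   # h in [0, N)
    return ldwv[h]


# node-a has weight 2, node-b weight 1, node-c weight 0 (no slots).
weights = {"node-a": 2, "node-b": 1, "node-c": 0}
ldwv = build_ldwv(weights)
owner = directory_node(b"/projects/data", ldwv)
```

Because every node computes the same hash over the same vector, any node can find the directory system for a resource without asking a central server; giving a system more slots sends it a proportionally larger share of lookups.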
Lustre DLM features
Low concurrency
  – Want write-back caching
High concurrency
  – Want load balancing in cluster
  – Subdivide directories etc. with hashes
  – Want server of request to limit lock revocations -> ops. on the MD cluster in a client-server RPC model
Deadlock detection
Object Based Storage
Object Based Storage Device
  – More intelligent than a block device
Speaks storage at “inode level”
  – create, unlink, read, write, getattr, setattr…
  – Iterators, security, almost arbitrary processing
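To make the "inode level" interface concrete, here is a toy in-memory object device exposing the operations named on the slide. It is a sketch only: the class name `InMemoryOBD`, the method signatures and the attribute layout are assumptions for illustration and are not the Lustre OBD API.

```python
class InMemoryOBD:
    """Toy object device: callers name objects by id, never by disk block."""

    def __init__(self):
        self._objects = {}   # object id -> {"data": bytearray, "attrs": dict}
        self._next_id = 1

    def create(self, obj_id=None):
        """Create an object, optionally with a caller-chosen id."""
        if obj_id is None:
            obj_id = self._next_id
        if obj_id in self._objects:
            raise FileExistsError(obj_id)
        self._objects[obj_id] = {"data": bytearray(), "attrs": {}}
        self._next_id = max(self._next_id, obj_id + 1)
        return obj_id

    def unlink(self, obj_id):
        del self._objects[obj_id]

    def write(self, obj_id, offset, data):
        buf = self._objects[obj_id]["data"]
        if len(buf) < offset:                      # zero-fill any hole
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, obj_id, offset, length):
        return bytes(self._objects[obj_id]["data"][offset:offset + length])

    def getattr(self, obj_id):
        obj = self._objects[obj_id]
        return {"size": len(obj["data"]), **obj["attrs"]}

    def setattr(self, obj_id, **attrs):
        self._objects[obj_id]["attrs"].update(attrs)


obd = InMemoryOBD()
oid = obd.create()
obd.write(oid, 0, b"hello")
obd.write(oid, 7, b"!")          # leaves a two-byte hole at offsets 5 and 6
obd.setattr(oid, mode=0o644)
contents = obd.read(oid, 0, 8)
attrs = obd.getattr(oid)
```

Space allocation happens inside `write`, on the device side, which is the point of contrast with a block device where the file system above must track blocks itself.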
Components of OB Storage
Storage Object Device Drivers
  – Class drivers: attach driver to interface
  – Targets, clients: remote access
  – Direct drivers: to manage physical storage
  – Logical drivers: for intelligence & storage management
Object storage application (OSA)
  – (Cluster) file systems
  – Advanced storage: parallel I/O, snapshots
  – Specialized apps.: caches, db’s, filesrv
System Interface
Modules
  – Load the kernel modules to get drivers of a certain type
  – Name devices to be of a certain type
  – Build stacks of devices with assigned types
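The three steps above (get drivers of a type, name devices by type, build a stack) can be mimicked in user space. The sketch below is an analogy, not the kernel interface: the registry `DRIVER_TYPES`, the decorator `register`, the type names "direct" and "counting", and `build_stack` are all hypothetical.

```python
DRIVER_TYPES = {}   # hypothetical registry: type name -> driver class


def register(type_name):
    """Stand-in for loading a module: makes a driver type available by name."""
    def wrap(cls):
        DRIVER_TYPES[type_name] = cls
        return cls
    return wrap


@register("direct")
class DirectDriver:
    """Bottom of a stack: keeps object data itself (here, in a dict)."""

    def __init__(self, lower=None):
        self.objects = {}

    def write(self, obj_id, data):
        self.objects[obj_id] = bytes(data)

    def read(self, obj_id):
        return self.objects[obj_id]


@register("counting")
class CountingDriver:
    """Logical driver: counts operations, then passes them to the device below."""

    def __init__(self, lower):
        self.lower = lower
        self.ops = 0

    def write(self, obj_id, data):
        self.ops += 1
        self.lower.write(obj_id, data)

    def read(self, obj_id):
        self.ops += 1
        return self.lower.read(obj_id)


def build_stack(type_names):
    """Instantiate drivers bottom-up; each device wraps the one below it."""
    device = None
    for name in type_names:
        device = DRIVER_TYPES[name](device)
    return device


dev = build_stack(["direct", "counting"])
dev.write(1, b"abc")
data = dev.read(1)
```

Since every layer exposes the same `read`/`write` interface, layers can be reordered or added without changing the code that uses the top of the stack.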
Layering of Object Drivers
Interaction of Obj. Storage s/w modules
Benefits: clustering / storage management
Suitable for use in a SAN file system
Shared at the level of an individual block
Object namespace: divided into object groups
  – Being able to create objects with given object IDs is very advantageous
  – Good for snapshots!
Hot file migration
Conclusion
Object Based Storage
  – Processes disk operations at the higher level of individual files and file inodes, rather than at the low-level h/w disk block level
Security Issues
  – Auxiliary services in cluster: LDAP, PKI, Kerberos
  – Purpose: CFS / MDS / OST
      – Authenticate to each other
      – Set up session keys
Etc.
GSS-API for authentication and integrity checks
Remote DMA
  – Must NEVER bypass the security processing layer
  – Request processing: authentication is checked by a higher-level layer in the networking stack