XtreemFS
Olga Rocheeva
Version 1.0 was released in 2009, and XtreemFS is still in use today. It is a fault-tolerant, object-based file system written in Java.
XtreemFS is…
... an open source file system. It is distributed freely and can be used by anyone without limitations.
... a POSIX file system. Users can mount and access XtreemFS like any other common file system. Applications access XtreemFS via the standard file system interface, i.e. without having to be rebuilt against a specialized API, and XtreemFS supports a POSIX-compliant access control model.
... a globally distributed file system. Unlike cluster file systems, an XtreemFS installation is not restricted to a single administrative domain or cluster. It can span the globe and may comprise servers in different administrative domains.
... a failure-tolerant file system. Replication can keep the system alive and the data safe. In this respect, XtreemFS differs from most other open-source file systems.
... a multi-platform file system. Server and client modules can be installed and run on different platforms, including most Linux distributions, Solaris, Mac OS X and Windows.
... a customizable file system. Since XtreemFS can be used in different environments, administrators can adapt it to the specific needs of their users. Customizable policies make it possible to change the behavior of XtreemFS in terms of authentication, access control, striping, replica placement, replica selection and more. Such policies can be selected from a set of predefined policies, or implemented by administrators and plugged into the system.
... a secure file system. To ensure security in an untrusted, worldwide network, all network traffic can be encrypted with SSL connections, and users can be authenticated with X.509 certificates.
XtreemFS is not…
... a high-performance cluster file system. Even though XtreemFS reaches acceptable throughput rates on a local cluster, it cannot compete with specialized cluster file systems in terms of raw performance. Most such file systems have an optimized network stack and protocols, and a substantially larger development team. If you have huge amounts of data on a local cluster and need little beyond high throughput, a cluster file system is probably the better alternative.
... a replacement for a local file system. Even though XtreemFS can be set up and mounted on a single machine, the additional software stack degrades performance, which makes XtreemFS a poor replacement for a local file system.
1 File System Architectures
The first two architectures that follow are general designs, not XtreemFS.
TYPICAL GRID FILE SYSTEM
OBJECT-BASED FILE SYSTEM
XTREEM FILE SYSTEM
In contrast to block-based file systems, the management of available and used storage space is offloaded from the metadata server to the storage servers. Rather than inode lists with block addresses, file metadata contains lists of the storage servers responsible for the objects, together with striping policies that define how to translate between byte offsets and object IDs. This implies that object sizes may vary from file to file.
2 Distribution
RAID 0 Striping
RAID 0 splits data into stripes evenly across two or more disks, without parity information, redundancy, or fault tolerance. Since different stripes can be accessed in parallel, a whole file can be read or written with the aggregated network and storage bandwidth of multiple servers. XtreemFS splits a file up into a set of stripes of a fixed size and distributes them across a set of storage servers in a round-robin fashion. The size of an individual stripe as well as the number of storage servers used can be configured on a per-file or per-directory basis.
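The byte-offset-to-object translation described above can be sketched in a few lines of Python; the function and parameter names are illustrative, not XtreemFS's actual API:

```python
def locate(offset, stripe_size, osds):
    # Map a byte offset to the object that holds it, and to the OSD
    # storing that object; objects are placed round-robin across OSDs.
    object_id = offset // stripe_size
    osd = osds[object_id % len(osds)]
    return object_id, osd

# With a 128 KiB stripe size and three OSDs, consecutive objects land
# on osd-1, osd-2, osd-3, osd-1, ...
print(locate(5 * 131072 + 10, 131072, ["osd-1", "osd-2", "osd-3"]))
```

This is why striping needs no central allocation table: any client that knows the stripe size and the OSD list can compute an object's location.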
3 XtreemFS Servers
The XtreemFS URL shows on which MRC a volume is hosted and the name of the volume.
DIR - DIRECTORY SERVICE
Central service registry: it keeps the address mappings that services need to translate UUIDs to hostname and port. MRCs and OSDs also use the DIR to synchronize their clocks. The DIR is currently a single instance. Persistent data is stored in BabuDB, a non-transactional key-value store.
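The UUID-to-address registry can be sketched as below; this is a minimal illustration of the idea, not the real DIR interface:

```python
class DirectoryService:
    """Minimal sketch of the DIR's address-mapping registry
    (names and methods are illustrative, not the XtreemFS API)."""

    def __init__(self):
        self._mappings = {}

    def register(self, uuid, hostname, port):
        # A service (MRC or OSD) announces its current address.
        self._mappings[uuid] = (hostname, port)

    def resolve(self, uuid):
        # Other services translate a UUID into "host:port".
        host, port = self._mappings[uuid]
        return f"{host}:{port}"
```

Indirection through UUIDs means a server can change its address (e.g. after a move) without invalidating the metadata that references it.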
DIR - DIRECTORY SERVICE (CONT.)
The MRC uses the DIR to discover storage servers (OSDs). The service and address mapping records are stored in their XDR representation, which means that the DIR database must be deleted or converted if the data structures change. External Data Representation (XDR) is a standard data serialization format that allows data to be transferred between different kinds of computer systems. It is implemented as a software library of functions that is portable between operating systems and independent of the transport layer.
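To make the XDR encoding concrete, here is a hand-rolled sketch of two XDR primitives (unsigned integers and strings) applied to a hypothetical address-mapping record; the record layout is an assumption for illustration, not the actual DIR schema:

```python
import struct

def xdr_uint(value):
    # XDR encodes unsigned integers as 4 bytes, big-endian.
    return struct.pack(">I", value)

def xdr_string(s):
    # XDR strings: 4-byte length, then the bytes, zero-padded
    # to a multiple of 4 bytes.
    data = s.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding

# Hypothetical mapping record: (uuid, hostname, port).
record = xdr_string("osd-1") + xdr_string("example.org") + xdr_uint(32640)
```

Because the bytes on disk follow this fixed layout with no schema header, any change to the record's fields changes the byte layout, which is exactly why the database must be converted or rebuilt when the data structures change.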
MRC - METADATA AND REPLICA CATALOG
The MRC stores the directory tree and file metadata such as file name, size or modification time. The MRC authenticates users and authorizes access to files.
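A metadata record of the kind the MRC keeps might look like the following sketch; the field names are illustrative, but they reflect the point made earlier that file metadata lists responsible OSDs and a striping policy instead of block addresses:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FileMetadata:
    # Sketch of an MRC metadata record (illustrative, not the real schema):
    # instead of inode block lists, it names the OSDs holding the file's
    # objects plus the stripe size used to map offsets to object IDs.
    name: str
    size: int
    mtime: int
    stripe_size: int
    osd_uuids: List[str] = field(default_factory=list)

meta = FileMetadata("report.txt", 4096, 1690000000, 131072,
                    ["osd-1", "osd-2"])
```

Note that the record contains no per-block bookkeeping: free-space management lives entirely on the OSDs.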
OSD - OBJECT STORAGE DEVICE
An OSD stores the objects that make up files. Clients read and write file data directly on the OSDs.
Client
The client combines these servers into a file system.
A client mounts one of the MRC's volumes in a local directory and translates file system calls into RPCs sent to the respective servers. The client is implemented as a FUSE user-level driver that runs as a normal process. FUSE itself is a kernel-userland hybrid that connects the user-land driver to the Linux Virtual File System (VFS) layer, where file system drivers usually live.
Client-Server Interaction
4 File Replication
Read-only Replication
Read-only replicas are either full or partial. Full replicas copy the file data from other replicas immediately when they are created; XtreemFS uses a rarest-first strategy (similar to BitTorrent) to increase the replication factor as quickly as possible. In contrast, partial replicas are initially empty and fetch the file data (objects) on demand when requested by a client. Partial replicas also pre-fetch a small number of objects to reduce latency for further client reads. Limitations: files that are read-only replicated can only be opened in read-only mode and cannot be modified. To allow existing applications to take advantage of read-only replication without modifications, XtreemFS offers "replicate-on-close". When the default replication policy for a volume is set to "ronly", files can be opened and modified like regular files until they are closed. Once a file is closed, it is set to read-only and replicated according to the replication factor set for the volume. This mode should not be used for data safety, however, as there are no guarantees that all replicas were created successfully when the close() operation returns. For data safety, users need to use read/write replication.
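The rarest-first idea mentioned above is simple to sketch: a replica that is filling itself fetches the object currently held by the fewest other replicas, so that scarce objects get duplicated before common ones. The data layout is invented for illustration:

```python
def pick_rarest(object_counts):
    # object_counts maps object id -> number of replicas holding it.
    # Fetch the rarest object first; break ties by object id so the
    # choice is deterministic.
    return min(object_counts, key=lambda obj: (object_counts[obj], obj))

# Object 1 exists on only one replica, so it is fetched first.
counts = {0: 3, 1: 1, 2: 2}
print(pick_rarest(counts))  # -> 1
```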
Read-Write File Replication Protocol
The XtreemFS URL shows on which MRC the volume is hosted and the name of the volume. This file has three replicas and is replicated with the WqRq policy (majority voting).
File Replication Implementation
1. A client contacts a random storage server to read or write a file. Clients receive a full list of replicas when opening a file, from which they can choose a storage server.
2. That storage server coordinates the primary lease with the other storage servers and becomes the primary replica for the file; the other replicas act as backups. Once a server has become primary, it brings the replicas to a consistent state by executing the so-called replica reset: it collects the file's state from the other storage servers and calculates the correct state. If necessary, the replicas exchange data to bring themselves up to date.
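The replica reset can be sketched as follows, under the simplifying assumption that the file's state is summarized by a single version number per replica (the real protocol tracks per-object state):

```python
def replica_reset(states):
    # states maps server uuid -> last known version of the file.
    # The new primary adopts the highest version as the correct state
    # and reports which backups are stale and must fetch updates.
    latest = max(states.values())
    stale = [srv for srv, version in states.items() if version < latest]
    return latest, stale

# Server "a" missed the last two updates and must catch up.
print(replica_reset({"a": 3, "b": 5, "c": 5}))  # -> (5, ['a'])
```

After the reset, the primary knows it holds the latest state, which is what lets it later serve reads locally.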
File Replication Implementation (Cont.)
3. The client can now execute operations on the file. A read, as shown in the diagram, is executed locally on the primary without communication with the other replicas: thanks to the replica reset, the primary can guarantee that it has the latest state of the file on disk.
4. Write operations executed by the client are applied first locally on the primary server. The primary then sends the updates to the backups, and when the backups have finished the write, the client receives the acknowledgment.
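The write path can be sketched as below. Assuming a WqRq-style majority policy (mentioned earlier in the deck), the write is acknowledged once a majority of all replicas, primary included, have applied the update; the classes and names are illustrative only:

```python
class Replica:
    """Toy replica: applies updates to a log if the server is up."""
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def apply(self, update):
        if self.up:
            self.log.append(update)
        return self.up

def replicated_write(primary, backups, update):
    # Apply on the primary first, then forward to the backups.
    # Acknowledge once a majority of all replicas holds the update.
    primary.apply(update)
    acks = 1  # the primary's own copy
    needed = (1 + len(backups)) // 2 + 1
    for backup in backups:
        if backup.apply(update):
            acks += 1
        if acks >= needed:
            return True
    return acks >= needed
```

With three replicas, one failed backup is tolerated (2 of 3 acknowledge), but if both backups are down the write cannot reach a majority and fails.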
File Replication Implementation (Cont.)
5. Should the primary server fail or get disconnected, the client will try to contact the other storage servers. Once the storage servers see that the primary has failed, they elect a new primary, which executes the replica reset as in step 2. The client handles this transparently: applications and users will only notice a delay, not errors.
5 Security
Security
Authentication: by default, authentication in XtreemFS is based on local user names and depends on the trustworthiness of clients and networks. In case a more secure solution is needed, X.509 certificates can be used.
Authorization: XtreemFS supports the standard UNIX permission model, which allows for assigning individual access rights to file owners, owning groups and other users.
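The standard UNIX permission check mentioned above works by selecting one of three bit triplets (owner, group, other) and testing the requested rights against it; a minimal sketch:

```python
def check_access(mode, want, uid, gid, owner_uid, owner_gid):
    # mode: permission bits, e.g. 0o640; want: requested rights
    # (4 = read, 2 = write, 1 = execute). Exactly one triplet applies:
    # owner if the uid matches, else group, else "other".
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7
    elif gid == owner_gid:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & want) == want

# 0o640: owner may read/write, group may read, others get nothing.
print(check_access(0o640, 2, 1001, 100, 1000, 100))  # -> False
```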
QUESTIONS?