RozoFS Architecture Overview: RozoFS components edition 1.4 23/01/2015.



RozoFS architecture overview: Components

[Diagram: a client node runs Rozofsmount on /fs1/home/; the control path (metadata) goes to the metadata server (Exportd), and the data path goes directly to the storage nodes (Storage Sid1 .. Sid4).]

© This document is proprietary and confidential. No part of this document may be disclosed in any manner to a third party without the prior written consent of FIZIANS SAS.

Storage component

[Diagram: a storage node runs a storage process that hosts several storages ([cid1,sid1], [cid2,sid1], ..., [cidn,sid1]); each storage is backed by a file system (e.g. XFS) over devices (Device 0 .. Device n) built on physical disks, optionally grouped by RAID (0, 0+1, 5, 6).]

- A storage (cid/sid) is a set of logical disks (devices) with the same capacity and performance.
- On the same server, RozoFS can provide storages based on different technologies.
- Note: the configuration can be done with or without a RAID controller.

RozoFS clusters and volumes

[Diagram: Volume 1 groups Cluster 1 .. Cluster n; each cluster lists its storages (Sid1:host_1 .. Sidn:host_n) spread across the storage nodes host_1 .. host_n.]

- A RozoFS cluster (cid) is a uniform set of storages (sid) in terms of disk capacity and performance.
- A cluster id is unique within a RozoFS system.

Mapping filesystems on volumes

[Diagram: Filesystem 1 .. Filesystem j map onto Volume 1 (Cluster 1 .. Cluster n); Filesystem j+1 .. Filesystem j+k map onto Volume 2 (Cluster n+1 .. Cluster n+p).]

- RozoFS supports configurations with multiple volumes.
- A volume can host more than one file system (thin provisioning).
- There are quotas (hard and soft) per file system.
- A file system is identified by a unique id (eid) within the configuration.

File localization within a filesystem

[Diagram: within a filesystem hosted on a volume (Volume 1, Cluster 1 .. Cluster n), a file's blocks are turned into Mojette Transform projections, which are stored on the storages (cid/sid) of the storage nodes.]

RozoFS configuration

[Diagram: three configuration scopes.]

- Exportd node: the export configuration lists the filesystems (Eid1:/metadata/fs1, vid=1) and, per volume (Volume 1 .. Volume i), the clusters and their storages (Cluster 1 .. Cluster n, Sid1:host1 .. Sid4:host4).
- Storage node: the storage configuration (storage_conf) declares the listening endpoints and, per storage, its pathname and device count ([cid1,sid1]: pathname1, device_count; [cid2,sid1]: pathname2, device_count).
- Rozofsmount node: an fstab entry (rozofsmount mount_path rozofs) declares the mount.
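The three configuration scopes above can be modeled as follows. This is a simplified sketch: the field names and layout are illustrative, not the actual RozoFS configuration-file syntax.

```python
# Hypothetical, simplified model of the RozoFS configuration scopes.
# Field names are assumptions for illustration only.

exportd_conf = {
    "exports": [{"eid": 1, "root": "/metadata/fs1", "vid": 1}],
    "volumes": [{
        "vid": 1,
        "clusters": [
            {"cid": 1, "storages": {1: "host1", 2: "host2", 3: "host3", 4: "host4"}},
        ],
    }],
}

storage_conf = {
    "listening_endpoints": [("0.0.0.0", 41001)],   # assumed address/port
    "storages": [
        {"cid": 1, "sid": 1, "root": "/srv/rozofs/storage_1_1", "device_count": 2},
    ],
}

def storages_for_export(eid):
    """Resolve the (cid, sid, host) set a client would learn for an export."""
    export = next(e for e in exportd_conf["exports"] if e["eid"] == eid)
    volume = next(v for v in exportd_conf["volumes"] if v["vid"] == export["vid"])
    return [(c["cid"], sid, host)
            for c in volume["clusters"]
            for sid, host in sorted(c["storages"].items())]

print(storages_for_export(1))
# → [(1, 1, 'host1'), (1, 2, 'host2'), (1, 3, 'host3'), (1, 4, 'host4')]
```

The key point is that only the exportd holds the volume/cluster topology; storage nodes only know their own local storages.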

RozoFS architecture overview: Components

[Diagram: the client node runs Rozofsmount on /fs1/home/; the control path goes to the metadata server (Exportd), whose export configuration maps Eid1:/metadata/fs1 (vid=1) onto Volume 1 .. Volume i, each made of clusters (Cluster 1 .. Cluster n, Sid1:host1 .. Sid4:host4); the data path goes directly to the storage nodes (Storage Sid1:host1 .. Storage Sid4:host4).]

Typical RozoFS deployments

RozoFS native mode (scale-out NAS)

[Diagram: Linux clients running Rozofsmount use the native protocol over a GigE infrastructure shared by data storage and metadata; storage-and-metadata nodes run Storage, and one node runs Exportd.]

Note: the exportd function can also reside on some of the storage nodes.

RozoFS cluster: NAS mode

[Diagram: Windows, Linux, UNIX and Apple clients access the cluster through SMB, NFS, AFP, etc. over a client-side GigE infrastructure; the cluster nodes run Rozofsmount, Storage and Exportd over a second GigE infrastructure dedicated to data storage and metadata.]

Virtualisation solution with RozoFS: CloudStack + KVM

[Diagram: hypervisor nodes each run Rozofsmount and Storage (one node also runs Exportd); they are connected by a GigE infrastructure for data storage and metadata plus a standard GigE infrastructure at the clients/applications level, with access to the external network.]

RozoFS basic exchanges

RozoFS basic exchanges: inter-component interfaces

[Diagram: on the client node, Rozofsmount talks to the metadata server for metadata operations/mount, cluster configuration, storage monitoring and projections deletion; one or more Storcli processes (Storcli 1 .. Storcli n) handle read/write/truncate towards the storage nodes (Storage Sid1 .. Sid4).]

RozoFS basic exchanges: filesystem mounting

[Diagram: rozofsmount is started with "rozofsmount -H exportd_host -E /metadata/fs1 /fs1/home/". It sends a mount request for /metadata/fs1 to the exportd; the exportd resolves the export in its configuration (Eid1:/metadata/fs1, vid=1; Volume 1 .. Volume i; Cluster 1 .. Cluster n, Sid1:host1 .. Sid4:host4) and returns the clusters list. Storcli 1 then opens a TCP connection towards each of the 4 storages (Storage Sid1:host1 .. Storage Sid4:host4).]
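The mount exchange can be sketched as follows. The real protocol is RPC-based; the message shapes and function names here are assumptions for illustration.

```python
# Minimal sketch of the mount exchange between rozofsmount and exportd.
# Message shapes and names are illustrative, not the real wire protocol.

EXPORTD_CONF = {
    "/metadata/fs1": {
        "eid": 1, "vid": 1,
        "clusters": [{"cid": 1, "storages": {1: "host1", 2: "host2",
                                             3: "host3", 4: "host4"}}],
    },
}

def exportd_mount(export_path):
    """Exportd side: resolve the export and return its eid + clusters list."""
    conf = EXPORTD_CONF[export_path]
    return {"eid": conf["eid"], "clusters": conf["clusters"]}

def rozofsmount(exportd_host, export_path, mount_point):
    """Client side: issue the mount, then open one connection per storage."""
    reply = exportd_mount(export_path)          # control path: mount request
    connections = [(c["cid"], sid, host)        # data path: the 4 TCP opens
                   for c in reply["clusters"]
                   for sid, host in sorted(c["storages"].items())]
    return reply["eid"], connections

eid, conns = rozofsmount("exportd_host", "/metadata/fs1", "/fs1/home/")
print(eid, len(conns))   # → 1 4
```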

RozoFS basic exchanges: file creation

[Sequence: the application/VFS calls open("/fs1/home/foo", O_CREAT|O_RDWR, 0640); rozofsmount sends mknod(EID, parent_fid, "foo", O_RDWR, 0640) to the metadata server (exportd, Eid1:/metadata/fs1, vid=1) and receives attrs(FID, cid1:{sid1..sid4}, 0640, etc.), from which it builds the file descriptor.]

Export_mknod:
1) Allocate a unique file id (FID).
2) Volume distribute(EID).
3) Insert (FID, "foo") in the parent directory.
4) Write the new file attributes.
5) Update the parent attributes.

Volume distribute(EID):
1) Get the volume (VID) associated with the EID.
2) Get the cluster list (CID).
3) Get 4 storages (SID) for a cluster.

FID: unique file identifier. Parent_fid: FID of the parent directory.
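The export_mknod steps can be sketched as follows. The data structures (directory map, attribute map) and the cluster-selection policy are assumptions; the on-disk metadata format is not shown.

```python
import uuid

# Illustrative sketch of exportd's mknod: allocate a FID, pick a
# distribution (cid + 4 sids) on the export's volume, record the entry.

VOLUME = {"vid": 1, "clusters": [{"cid": 1, "sids": [1, 2, 3, 4, 5, 6]}]}
directories = {"parent_fid": {}}   # parent directory: name -> FID
attributes = {}                    # FID -> file attributes

def volume_distribute(volume):
    """Pick a cluster of the volume and 4 of its storages (policy assumed)."""
    cluster = volume["clusters"][0]
    return cluster["cid"], cluster["sids"][:4]

def export_mknod(parent_fid, name, mode):
    fid = str(uuid.uuid4())                   # 1) unique file id
    cid, sids = volume_distribute(VOLUME)     # 2) volume distribute
    directories[parent_fid][name] = fid       # 3) insert in parent directory
    attributes[fid] = {"cid": cid, "sids": sids, "mode": mode}  # 4) attributes
    # 5) parent attributes (mtime, size, ...) would be updated here
    return fid, attributes[fid]

fid, attrs = export_mknod("parent_fid", "foo", 0o640)
print(attrs["cid"], attrs["sids"])   # → 1 [1, 2, 3, 4]
```

Note how the distribution (cid + sids) becomes part of the file attributes: the client learns where the file's projections live from the mknod reply itself.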

RozoFS basic exchanges: file opening

[Sequence: the application calls open("/fs1/home/foo", O_RDWR); the VFS/rozofsmount sends lookup(EID, parent_fid, "foo") to the metadata server (exportd) and receives the file attributes (FID3, cid:{sid1,sid2,sid3,sid4}, atime, mtime, ...); rozofsmount copies them into its local context and allocates a file descriptor (fd 1).]

Export_lookup:
1) Get the file FID from the parent directory (cache or disk): the directory-entries cache maps names to FIDs (name1 -> FID1, name2 -> FID2, foo -> FID3, ...).
2) Get the file attributes (cache or disk): the attributes cache maps FIDs to attributes (FID1 -> attrs1, FID2 -> attrs2, FID3 -> attrs3, ...).
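The two-level lookup (directory-entries cache, then attributes cache, each falling back to disk) can be sketched as follows. All structures are illustrative.

```python
# Sketch of exportd's lookup path with its two caches, falling back to
# "disk" dictionaries on a miss. Structures and names are assumptions.

disk_dirs = {"parent_fid": {"foo": "FID3"}}
disk_attrs = {"FID3": {"cid": 1, "sids": [1, 2, 3, 4]}}
dentry_cache, attr_cache = {}, {}

def export_lookup(parent_fid, name):
    # 1) name -> FID, from the directory-entries cache or disk
    key = (parent_fid, name)
    if key not in dentry_cache:
        dentry_cache[key] = disk_dirs[parent_fid][name]
    fid = dentry_cache[key]
    # 2) FID -> attributes, from the attributes cache or disk
    if fid not in attr_cache:
        attr_cache[fid] = disk_attrs[fid]
    return fid, attr_cache[fid]

fid, attrs = export_lookup("parent_fid", "foo")
print(fid, attrs["sids"])   # → FID3 [1, 2, 3, 4]
```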

RozoFS basic exchanges: synchronous file write

[Sequence: the application/VFS calls len = pwrite(fd, offset, size, buffer); rozofsmount passes write(fd1, offset, size, data) to storcli, which applies the forward Mojette Transform and writes the projections (prj1, prj2, prj3) in parallel to the storages (write(FID3, prj1) to Sid1, write(FID3, prj2) to Sid2, write(FID3, prj3) to Sid3); each storage answers with a status. Rozofsmount then updates the written blocks on the exportd (Wr_blks(EID1, FID3, offset, size) -> Attrs(attrs3)), which maintains its attributes cache (FID1 -> attrs1, FID2 -> attrs2, FID3 -> attrs3, ...) and disk copy (Eid1:/metadata/fs1).]

Write (rozofsmount):
1) Find the context associated with fd1 (FID3, cid:{sid1,sid2,sid3,sid4}, file size, atime, mtime, ...).
2) Submit the data to write to storcli.
3) Wait for the end of the write.
4) Update the written blocks on the exportd.
5) Return the written length (or an error code) to the upper layer.

Write projections (storcli):
1) Generate the projections.
2) Send all the projection writes in parallel.
3) Wait for all the write responses.

Write_blocks (file attributes update, exportd):
1) Update the time information.
2) Update the size if it is greater.
3) Update the cache and the disk.

Redundancy level (2+1): 2 reads, 3 writes.
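The (2+1) write path can be sketched as below: split a block into 2 parts, add 1 redundancy part, write the 3 "projections" in parallel and wait for all responses. A plain XOR parity stands in for the Mojette Transform here, and the storage layer is a dictionary; both are assumptions for illustration.

```python
# Sketch of the (2+1) write path: parallel writes, wait for all responses.
# XOR parity replaces the Mojette Transform; names are illustrative.

from concurrent.futures import ThreadPoolExecutor

storages = {1: {}, 2: {}, 3: {}}          # sid -> {(fid, idx): projection}

def forward_transform(data):
    half = len(data) // 2
    p1, p2 = data[:half], data[half:]
    p3 = bytes(a ^ b for a, b in zip(p1, p2))   # redundancy "projection"
    return [p1, p2, p3]

def storage_write(sid, fid, idx, proj):
    storages[sid][(fid, idx)] = proj
    return "ok"

def storcli_write(fid, data):
    projections = forward_transform(data)
    with ThreadPoolExecutor() as pool:                 # parallel writes
        futures = [pool.submit(storage_write, sid, fid, i, prj)
                   for i, (sid, prj) in enumerate(zip((1, 2, 3), projections))]
        statuses = [f.result() for f in futures]       # wait for all responses
    return len(data) if all(s == "ok" for s in statuses) else -1

print(storcli_write("FID3", b"abcdefgh"))   # → 8
```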

RozoFS basic exchanges: file read

[Sequence: the application/VFS calls len = pread(fd, offset, size, buffer); rozofsmount passes pread(fd1, offset, size) to storcli, which reads the projections in parallel from the storages (Read(FID3, prj1, offset_prj) from Sid1, Read(FID3, prj2, offset_prj) from Sid2), applies the inverse Mojette Transform and returns the data and its length.]

Read (rozofsmount):
1) Find the context associated with fd1 (FID3, cid:{sid1,sid2,sid3,sid4}, file size, atime, mtime, ...).
2) Request the data from storcli.
3) Return the requested data to the VFS.

Read projections (storcli):
1) Send the read requests in parallel.
2) Wait for the projection data returned by the storages.
3) Rebuild the initial block.

Redundancy level (2+1): 2 reads, 3 writes.
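The corresponding (2+1) read path: fetch any 2 of the 3 stored projections and rebuild the block. As in the write sketch, a plain XOR parity stands in for the inverse Mojette Transform; all names are illustrative.

```python
# Sketch of the (2+1) read path: rebuild the block from any 2 of the
# 3 stored "projections". XOR parity stands in for the Mojette inverse.

p1, p2 = b"abcd", b"efgh"
p3 = bytes(a ^ b for a, b in zip(p1, p2))        # redundancy projection
stored = {1: p1, 2: p2, 3: p3}                   # sid -> projection

def inverse_transform(available):
    """Rebuild the initial block from any 2 of the 3 projections."""
    if 1 in available and 2 in available:        # nominal case
        return available[1] + available[2]
    if 1 in available:                           # p2 = p1 ^ p3
        p2r = bytes(a ^ b for a, b in zip(available[1], available[3]))
        return available[1] + p2r
    p1r = bytes(a ^ b for a, b in zip(available[2], available[3]))
    return p1r + available[2]                    # p1 = p2 ^ p3

# Nominal case: sid1 + sid2; failure case: sid2 + sid3 (sid1 down).
print(inverse_transform({1: p1, 2: p2}))   # → b'abcdefgh'
print(inverse_transform({2: p2, 3: p3}))   # → b'abcdefgh'
```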

RozoFS basic exchanges: file deletion

[Sequence: the application/VFS calls unlink("/fs1/home/foo"); rozofsmount sends unlink(EID, parent_fid, "foo") to the metadata server (exportd), which answers with the parent attributes (or an error code); a trash thread later deletes the projections (FID3) on the storages (Sid1 .. Sid4).]

File deletion (exportd):
1) Remove the file from the parent directory (disk and cache).
2) Delete the attributes of the file (disk and cache).
3) Update the parent attributes.
4) Insert the file reference in the trash (list and disk); the trash thread then asks the storages to delete the file's projections.
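The two-phase delete (metadata update now, projection deletion later by a trash thread) can be sketched as follows. The structures, and running the trash step synchronously, are illustrative assumptions.

```python
# Sketch of the two-phase delete: unlink updates the metadata and queues
# the FID in a trash list; a trash thread later deletes the projections.

from collections import deque

directories = {"parent_fid": {"foo": "FID3"}}
attributes = {"FID3": {"sids": [1, 2, 3, 4]}}
storages = {sid: {"FID3"} for sid in (1, 2, 3, 4)}  # sid -> FIDs held
trash = deque()

def export_unlink(parent_fid, name):
    fid = directories[parent_fid].pop(name)   # 1) remove directory entry
    attrs = attributes.pop(fid)               # 2) delete file attributes
    # 3) parent attributes (mtime, nlink, ...) would be updated here
    trash.append((fid, attrs["sids"]))        # 4) queue for the trash thread
    return "ok"

def trash_thread_step():
    """Delete the projections of one trashed file on its storages."""
    fid, sids = trash.popleft()
    for sid in sids:
        storages[sid].discard(fid)

export_unlink("parent_fid", "foo")
trash_thread_step()
print(sum(len(s) for s in storages.values()))   # → 0
```

Deferring the projection deletion keeps the unlink latency independent of the number of storages to contact.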

RozoFS data path
- Mojette Transform performances
- Mojette Transform use cases

Mojette Transform performances

[Chart: encoding/decoding performance with 2 redundancy projections (4+2).]

1. Mojette encoding/decoding is not CPU intensive and fits well on the client side.
2. Mojette decoding time does not depend on the number of failures.

Mojette Transform performances

[Chart: encoding/decoding performance with 4 redundancy projections (8+4).]

Write: Mojette Transform Forward (4+2)

[Diagram: a 4 KB file-system block is divided into four 1 kB parts; the Mojette erasure coding transform turns them into 6 projections, each written by a storaged process to its local file system.]

The initial block is divided into 4 parts. The Mojette Transform generates 6 projections. Any 4 of the 6 projections are enough to rebuild the initial block.
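A toy forward Mojette transform illustrates how the 4 lines of a block become 6 projections: each projection sums the block's elements along one direction. The direction set and layout below are illustrative, and the inverse (rebuilding from any 4 projections) is not shown.

```python
# Toy forward Mojette transform: one projection per direction (p, q=1),
# where element (k, l) is accumulated into bin b = k + p*l.
# Direction set is an assumption for illustration; inverse not shown.

def mojette_forward(lines, directions):
    rows, cols = len(lines), len(lines[0])
    projections = {}
    for p in directions:
        bins = {}
        for l in range(rows):
            for k in range(cols):
                b = k + p * l
                bins[b] = bins.get(b, 0) + lines[l][k]
        projections[p] = bins
    return projections

# A "block" of 4 lines, and 6 directions -> 6 projections (4+2).
block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
projs = mojette_forward(block, [-2, -1, 0, 1, 2, 3])

# Each projection is a different set of partial sums of the same data:
total = sum(sum(row) for row in block)
print(all(sum(bins.values()) == total for bins in projs.values()))   # → True
```

Because each projection only adds elements along lines, encoding costs additions rather than the field multiplications of classical Reed-Solomon style codes, which is why it is cheap enough to run on the client.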

Mojette Transform Forward + Write Process: RozoFS data-path write service, file-system block forward transformation (nominal use case)

[Diagram: the user payload is split into User Data Blocks UDB 1 .. UDB 5; each UDB is transformed into projections (proj 1.1 .. proj 3.5) written to the OSD nodes 1 .. 4 of the optimal distribution; the spare node(s) stay idle in the nominal case.]

- The set of OSDs is provided within the metadata associated with the file (RozoFS layout distribution, OSD nodes (1,2,3,4)).
- The user payload is split into User Data Blocks (4K or 8K).
- The Mojette transform is applied to each UDB.

RozoFS data-path write service: nominal use case sequence diagram

- Write transactions are performed in parallel.
- The write service ends upon receiving all the responses from the OSD nodes.

Mojette Transform Forward + Write Process: RozoFS data-path write service, failure use case

[Diagram: as in the nominal case, UDB 1 .. UDB 5 are transformed into projections; when an OSD of the optimal distribution (OSD nodes (1,2,3,4)) fails, its projections (proj 3.1 .. proj 3.5) are written to a spare node instead.]

- A spare OSD is used in case of failure of an OSD belonging to the optimal distribution.
- The write operation is successful when the n+m projections are successfully written.

RozoFS data-path write service: failure sequence diagram

Read: Mojette Transform Inverse (1/2)

[Diagram: six storaged processes each hold a 1 kB projection on their local file system; the inverse Mojette erasure coding transform rebuilds the 4 KB file-system block.]

Read 4 projections from any of the 6 storage nodes.

Read: Mojette Transform Inverse (2/2)

[Diagram: same layout as (1/2), with one storage node failed.]

In case of a failure of one node, another one is selected among the set of servers associated with the file.

RozoFS data-path read service: filesystem block Mojette inverse transformation (nominal use case)

[Diagram: the read selects projections 2, 3 and 4 from the OSD nodes of the optimal distribution (RozoFS layout distribution, OSD nodes (1,2,3,4)) and rebuilds the UDB (4K or 8K) with the read + inverse Mojette Transform.]

- The read process selects n projections among the n+m projections to rebuild a User Data Block.
- It can be any subset of n projections in the n+m projection set.
- Read transactions towards the OSDs are performed in parallel, to minimize the data transfer delay over the network.

RozoFS data-path read service: sequence diagram (nominal use case)

RozoFS data-path read service: failure use case

[Diagram: as in the nominal case, but the read falls back to the remaining OSDs of the distribution when a projection read fails.]

Reading is attempted on the remaining OSDs in case of a read projection failure:
- Disk failure
- Network failure
- Out-of-date projection

RozoFS data-path read service: failure sequence diagram

Fast projection recovery time:
- Start a guard timer on the first projection read reply.
- At timer expiration, read requests are propagated towards the remaining OSDs.
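The guard-timer recovery can be sketched with asyncio: issue the initial reads in parallel, arm a short timer once the first reply arrives, and re-request any projection still missing at expiration from a remaining OSD. Delays, timer value and names are illustrative.

```python
# Sketch of guard-timer recovery: first reply arms a timer; stragglers
# still pending at expiration are re-read from a remaining OSD.

import asyncio

async def read_projection(osd, delay):
    await asyncio.sleep(delay)              # simulated network/disk latency
    return f"prj@osd{osd}"

async def read_with_guard_timer(guard=0.05):
    # Initial reads towards the optimal distribution (osd2 never answers).
    tasks = {asyncio.ensure_future(read_projection(1, 0.01)),
             asyncio.ensure_future(read_projection(2, 10.0))}
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    # First reply received: arm the guard timer for the stragglers.
    done2, pending = await asyncio.wait(pending, timeout=guard)
    results = [t.result() for t in done | done2]
    if pending:                             # timer expired: give up on them
        for t in pending:
            t.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        results.append(await read_projection(3, 0.01))   # remaining OSD
    return sorted(results)

print(asyncio.run(read_with_guard_timer()))   # → ['prj@osd1', 'prj@osd3']
```

Arming the timer only after the first reply means the timeout adapts to current network conditions instead of being a fixed worst-case value.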

RozoFS data-path read service failure sequence diagram: case of a CRC32 error

- The CRC error is detected on the storage node.
- The storage node reports that the read failure is due to a CRC error.
- After rebuilding the initial data, the storcli process triggers a forward transform.
- The forward transform concerns only the faulty projection.
- There might be more than one block to regenerate (depending on the number of CRC errors).
- Once the projection has been regenerated, it is sent back to the associated storage node.

Data integrity

End-to-end data integrity

RozoFS projection authentication: one CRC32 per projection.
- The CRC includes the payload of the projection as well as the block identifier.
- A block identifier is defined by the file inode allocated by RozoFS and its offset in the file (block_id = fid + offset(i)).
- The CRC32 is stored along with the projection.

[Diagram: a block goes through the Mojette erasure coding transform; each resulting 1 kB projection is stored with its checksum.]
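The per-projection authentication can be sketched as follows: a CRC32 computed over the block identifier (fid + block offset) and the projection payload, stored alongside the projection and verified on read. The byte encoding of the block identifier is an assumption for illustration.

```python
# Sketch of per-projection CRC32 authentication: the checksum covers
# the block identifier and the payload. Encoding details are assumed.

import struct
import zlib

def block_id(fid, offset):
    return fid + struct.pack("<Q", offset)     # fid bytes + 64-bit offset

def store_projection(fid, offset, payload):
    crc = zlib.crc32(block_id(fid, offset) + payload)
    return payload, crc                        # stored side by side

def check_projection(fid, offset, payload, crc):
    return zlib.crc32(block_id(fid, offset) + payload) == crc

fid = b"\x01" * 16
payload, crc = store_projection(fid, 4096, b"projection-bytes")
print(check_projection(fid, 4096, payload, crc))              # → True
print(check_projection(fid, 4096, b"corrupted-bytes", crc))   # → False
```

Covering the block identifier (not just the payload) means the check also catches a projection that is intact but stored at, or returned for, the wrong offset.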

Projection self-healing in RozoFS

[Diagram, three steps:]
1) The application issues a read to RozoFS. The data could be rebuilt with OSD 1 and OSD 2, but the checksum on OSD 1 reveals that its projection is corrupted on disk.
2) RozoFS reads the projection on OSD 3 instead; block 0 is rebuilt with the projections from OSD 2 and OSD 3, and good data is returned to the application.
3) RozoFS regenerates the corrupted projection from the rebuilt block and sends it to OSD 1 for rewriting.