1
Deduplication in Storage Systems
Joseph Fernandes, Ewen Pinto, Srinivas Billava
2
Who are we?
Joseph Fernandes (Senior Engineer, Red Hat Storage)
Ewen Pinto (VI Sem MCA, NMAMIT, Nitte)
Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)
3
Agenda
What is Dedupe
Why Dedupe
Types of Dedupe
What is Deduped
Where it's Deduped
When it's Deduped
Challenges in Dedupe
Current work
4
What is Deduplication?
An intelligent way of storing data: redundant copies are removed and only one instance is stored.
5
What is Deduplication?
Data units are identified by a hash index
Redundant data units are replaced by pointers
The hash algorithm should have minimal collisions
Search should be precise and fast
Should have a rich metadata filter: modification frequency, I/O sizes, etc.
Should deal with the distributed nature of data
Should do load balancing
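The hash index and pointer replacement described on this slide can be sketched in a few lines. This is a minimal illustration, not YADL's implementation: the class name `DedupeStore`, its methods, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: one copy per unique data unit."""

    def __init__(self):
        self.units = {}      # hash index: digest -> the single stored copy
        self.refcount = {}   # digest -> number of pointers to that copy

    def put(self, data: bytes) -> str:
        """Store a data unit and return a pointer (its digest)."""
        digest = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        if digest not in self.units:
            self.units[digest] = data              # first instance: keep it
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest                              # duplicates only get a pointer

    def get(self, pointer: str) -> bytes:
        """Follow a pointer back to the stored data unit."""
        return self.units[pointer]


store = DedupeStore()
p1 = store.put(b"hello world")
p2 = store.put(b"hello world")      # redundant copy, stored once
p3 = store.put(b"something else")
```

Here `p1` and `p2` are the same pointer, so the redundant unit costs only an index entry. A dictionary lookup stands in for the "precise and fast" search; a real system would also need the metadata filter, distribution, and load balancing that the slide lists.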
6
Why dedupe?
Reduces Total Cost of Ownership (TCO): storage, network
Used in: backup/archive, disaster recovery, replication (local/remote)
7
What is deduped? File Level (Single instancing)
[Diagram labels: File 1, File 2, HASH 1]
8
What is deduped? File Level (Single instancing)
[Diagram labels: File 1, HASH 1, Pointer, File 2]
10
What is deduped? Block Level
[Diagram labels: File 1, File 2, B1, HASH 1]
11
Fixed Block Chunking
File is divided into equal-length blocks
Pros: Faster!
Cons: Not space efficient!
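One common reason fixed-size blocks lose space efficiency is boundary shifting: inserting a single byte moves every later block boundary, so blocks that used to match no longer do. The sketch below illustrates this; the helper names, the 4-byte block size, and SHA-256 are assumptions chosen for readability.

```python
import hashlib


def fixed_chunks(data: bytes, block_size: int):
    """Split data into equal-length blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int):
    """Hash every fixed-size block of data."""
    return [hashlib.sha256(b).hexdigest() for b in fixed_chunks(data, block_size)]


original = b"AAAABBBBCCCCDDDD"
appended = original + b"EEEE"      # new data at the end: boundaries unchanged
shifted = b"x" + original          # one byte inserted at the front

h_orig = block_hashes(original, 4)
shared_after_append = set(h_orig) & set(block_hashes(appended, 4))
shared_after_insert = set(h_orig) & set(block_hashes(shifted, 4))
```

Appending keeps all four original blocks dedupable, while the one-byte insert leaves none in common, even though almost all of the data is unchanged.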
12
Fixed Block Chunking
[Diagram label: File]
13
Variable Block Chunking
File is chunked into variable-length blocks
Block size is determined by content
Rolling hash algorithm: Rabin-Karp
RHash = p^(n-1) * a[0] + p^(n-2) * a[1] + p^(n-3) * a[2] + ... + p * a[n-2] + a[n-1]
If (RHash & fingerprint) == 0 { Chunk! }
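The rolling-hash rule on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not YADL's code: the function name `rolling_chunks`, the window size, the base `p`, the prime modulus, and the parameter name `mask` (playing the role of the slide's "fingerprint") are all chosen for the example. The slide's formula has no modulus; one is added here only to keep the hash bounded.

```python
import random


def rolling_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                   p: int = 257, mod: int = (1 << 31) - 1):
    """Cut a chunk wherever the Rabin-Karp hash of the last `window`
    bytes has all `mask` bits equal to zero."""
    chunks, start, h = [], 0, 0
    top = pow(p, window - 1, mod)        # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod   # drop the oldest byte
        h = (h * p + byte) % mod                     # shift, add the newest byte
        # Require at least `window` bytes per chunk so that the hashed
        # window lies entirely inside the current chunk.
        if i + 1 - start >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks


rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(20000))
before = rolling_chunks(data)
after = rolling_chunks(b"!" + data)      # one byte inserted at the front
shared = set(before) & set(after)
```

Because each cut depends only on the bytes inside the window, chunk boundaries tend to realign shortly after an insertion, so most chunks in `before` reappear in `after`. The price is one hash update per byte, which is the "slower" side of the trade-off.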
14
Variable Block Chunking
[Diagram label: File]
15
Variable Block Chunking
Pros: Space efficient!
Cons: Slower!
16
Where it's Deduped? Client Side
Pros: Less network traffic
Cons: Heavier clients (CPU/memory, metadata storage)
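A rough sketch of why client-side dedupe saves network traffic: the client hashes its blocks locally, asks the server which digests it lacks, and uploads only those. The `Server` class, its `missing` and `upload` methods, and the `client_backup` helper are hypothetical names invented for this illustration; they do not describe any particular product's protocol.

```python
import hashlib


class Server:
    """Hypothetical backup server holding one copy of each block."""

    def __init__(self):
        self.blocks = {}                     # digest -> block

    def missing(self, digests):
        """Return the digests the server has not stored yet."""
        return [d for d in digests if d not in self.blocks]

    def upload(self, block: bytes):
        self.blocks[hashlib.sha256(block).hexdigest()] = block


def client_backup(server: Server, blocks):
    """Hash locally, then send only the blocks the server lacks.
    Returns the number of payload bytes that crossed the 'network'."""
    by_digest = {hashlib.sha256(b).hexdigest(): b for b in blocks}
    sent = 0
    for digest in server.missing(list(by_digest)):
        server.upload(by_digest[digest])
        sent += len(by_digest[digest])
    return sent


server = Server()
first = client_backup(server, [b"a" * 100, b"b" * 100, b"a" * 100])
second = client_backup(server, [b"a" * 100, b"c" * 100])
```

The hashing and the `by_digest` table live on the client, which is the CPU, memory, and metadata cost the slide lists under cons.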
17
Where it's Deduped? Server Side
Pros: Lighter clients
Cons: More network traffic
18
When it's Deduped?
Inline dedupe
Offline dedupe
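The slide only names the two modes. In the usual sense of the terms, inline dedupe removes duplicates on the write path, while offline (post-process) dedupe stores data as-is and removes duplicates in a later pass. The sketch below contrasts them; the class and method names are assumptions made for the example.

```python
import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class InlineStore:
    """Dedupes on the write path: duplicates never reach storage."""

    def __init__(self):
        self.blocks = {}       # digest -> block
        self.layout = []       # logical order of blocks, as digests

    def write(self, block: bytes):
        d = digest(block)      # hashing cost is paid on every write
        self.blocks.setdefault(d, block)
        self.layout.append(d)


class OfflineStore:
    """Writes raw blocks first; a later pass removes duplicates."""

    def __init__(self):
        self.raw = []          # everything lands here as-is
        self.blocks = {}
        self.layout = []

    def write(self, block: bytes):
        self.raw.append(block)             # fast path: no hashing

    def dedupe_pass(self):
        for block in self.raw:
            d = digest(block)
            self.blocks.setdefault(d, block)
            self.layout.append(d)
        self.raw.clear()                   # raw copies can now be reclaimed


inline, offline = InlineStore(), OfflineStore()
for block in [b"x", b"y", b"x"]:
    inline.write(block)
    offline.write(block)
raw_before_pass = len(offline.raw)
offline.dedupe_pass()
```

Both stores end up with the same two unique blocks. The offline store temporarily holds all three raw copies, trading extra space for a cheaper write path.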
19
Challenges in Dedupe
Single point of failure: “Last line of defense! Or fall off the cliff!”
Performance
Distributed Dedupe
20
Current Work: YADL
“Yet Another Dedupe Library”
Stream-based user-space dedupe library
File or Object or Block
The Future: YADL-E
21
Current Work: YADL
https://github.com/YADL/yadl
Contributors: Ewen Pinto, Srinivas B, Karthik US, Sukumar Poojary
22
THANK YOU