Deduplication in Storage Systems


Presentation on theme: "Deduplication in Storage Systems"— Presentation transcript:

1 Deduplication in Storage Systems
Joseph Fernandes, Ewen Pinto, Srinivas Billava

2 Who are we?
Joseph Fernandes (Senior Engineer, Red Hat Storage)
Ewen Pinto (VI Sem MCA, NMAMIT, Nitte)
Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)

3 Agenda
What is Dedupe
Why Dedupe
Types of Dedupe
What is Deduped
Where it's Deduped
When it's Deduped
Challenges in Dedupe
Current work

4 What is Deduplication? An intelligent way of storing data: redundant copies are removed and only one instance of each piece of data is stored.

5 What is Deduplication?
Data units are identified by a hash index
Redundant data units are replaced by pointers
Hash algorithm with minimal collisions
Search should be precise and fast
Should have a rich metadata filter: modification frequency, IO sizes, etc.
Should deal with the distributed nature of data
Should do load balancing
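The first two points on this slide (a hash index plus pointers) can be sketched in a few lines. This is a minimal in-memory illustration, not the presenters' implementation: the `DedupeStore` class, its method names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class DedupeStore:
    """Toy in-memory store: one copy per unique data unit, plus pointers."""

    def __init__(self):
        self.index = {}     # hash -> the single stored copy of a data unit
        self.pointers = {}  # name -> hash of the data unit it refers to

    def put(self, name, data):
        """Store data under name; return True only if new bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()  # assumed hash choice
        is_new = digest not in self.index
        if is_new:
            self.index[digest] = data
        # A redundant unit costs only a pointer to the existing copy.
        self.pointers[name] = digest
        return is_new

    def get(self, name):
        """Follow the pointer back to the stored copy."""
        return self.index[self.pointers[name]]


store = DedupeStore()
store.put("a", b"hello world")
store.put("b", b"hello world")     # duplicate: only a pointer is added
store.put("c", b"something else")
```

After these three writes the index holds two data units while three names point into it. A real system would also need the metadata, distribution, and load-balancing concerns listed on the slide, which this sketch leaves out.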

6 Why dedupe?
Reduces Total Cost of Ownership (TCO): storage, network
Used in: backup/archive, disaster recovery, replication (local/remote)

7 What is deduped? File Level (Single instancing)
[Diagram: File 1, # HASH 1, File 2]

8 What is deduped? File Level (Single instancing)
[Diagram: File 1, # HASH 1, Pointer, File 2]

9 What is deduped? File Level (Single instancing)
[Diagram: File 1, # HASH 1, File 2]
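File-level single instancing can be approximated by hashing whole files and grouping the ones whose hashes match. The sketch below is only illustrative; the helper names `file_digest` and `group_identical_files` and the use of SHA-256 are assumptions, not something taken from the slides.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, bufsize=1 << 20):
    """Hash a whole file in fixed-size reads so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(bufsize)
            if not block:
                break
            h.update(block)
    return h.hexdigest()


def group_identical_files(paths):
    """Return lists of paths whose contents hash to the same value."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    # Only groups with more than one member are dedupe candidates.
    return [g for g in groups.values() if len(g) > 1]
```

In each returned group, one file could be kept as the single instance and the others replaced by pointers to it, which is the step the diagram on slide 8 labels "Pointer".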

10 What is deduped? Block Level
[Diagram: File 1, File 2, B1, # HASH 1]

11 Fixed Block Chunking
File is divided into equal-length blocks
Pros: Faster!
Cons: Not space efficient!
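A small experiment shows why fixed-size blocks are fast but not always space efficient: inserting a single byte at the front of a file shifts every block boundary, so none of the old blocks match any more. The 8-byte block size and the helper names here are assumptions chosen to keep the example readable.

```python
import hashlib


def fixed_chunks(data, size=8):
    """Split data into equal-length blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(chunks):
    """Set of SHA-256 digests, one per distinct block."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


original = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"  # 32 bytes, four 8-byte blocks
appended = original + b"6789"                   # bytes added at the end
shifted = b"!" + original                       # one byte inserted at the front

base = chunk_hashes(fixed_chunks(original))
shared_after_append = base & chunk_hashes(fixed_chunks(appended))
shared_after_insert = base & chunk_hashes(fixed_chunks(shifted))

print(len(shared_after_append))  # 4: all original blocks are still found
print(len(shared_after_insert))  # 0: every block boundary has moved
```

Appending leaves the existing blocks untouched, so they all deduplicate; inserting at the front changes the content of every block, so nothing does.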

12 Fixed Block Chunking
[Diagram: File]

13 Variable Block Chunking
File is chunked into variable-length blocks
Block size is determined by content
Rolling hash algorithm: Rabin-Karp
RHash = p^(n-1) * a[0] + p^(n-2) * a[1] + p^(n-3) * a[2] + ... + p * a[n-2] + a[n-1]
If (RHash & fingerprint) == 0 { Chunk! }
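The rule on this slide can be sketched as a content-defined chunker: keep a Rabin-Karp style hash over a sliding window of the last n bytes and cut a chunk whenever the low bits of the hash are zero. Everything numeric here is an assumption for illustration (window size, base p, modulus, mask, minimum and maximum chunk sizes), and `MASK` plays the role the slide calls "fingerprint".

```python
import random

WINDOW = 16                       # n: bytes covered by the rolling hash
BASE = 257                        # p in the slide's formula
MOD = (1 << 31) - 1               # keeps the hash value bounded
MASK = (1 << 6) - 1               # cut when the low 6 bits of the hash are zero
TOP = pow(BASE, WINDOW - 1, MOD)  # weight p^(n-1) of the oldest byte


def cdc_chunks(data, min_size=16, max_size=256):
    """Split data at positions chosen by the content, not by the offset."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: remove the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * TOP) % MOD
        # Shift the window by one position and add the new byte.
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= min_size and (h & MASK) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0   # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"!" + data             # one byte inserted at the front

shared = set(cdc_chunks(data)) & set(cdc_chunks(shifted))
print(len(cdc_chunks(data)), "chunks,", len(shared), "shared after the insert")
```

Because a cut depends only on the bytes inside the window, the boundaries tend to fall in the same places in the data after the inserted byte, so most chunks are found again. The minimum and maximum sizes are a common practical addition rather than something stated on the slide.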

14 Variable Block Chunking
[Diagram: File]

15 Variable Block Chunking
Pros: Space efficiency!
Cons: Slower!

16 Where is it Deduped? Client Side
Pros: Less network traffic
Cons: Heavier clients (CPU/memory, metadata storage)

17 Where is it Deduped? Server Side
Pros: Lighter clients
Cons: More network traffic
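The network trade-off between the two placements can be made concrete with a toy exchange. In this sketch of client-side dedupe, the client chunks and hashes its data, asks the server which hashes it lacks, and uploads only those chunks. The `Server` class, its methods, and the 4-byte chunk size are hypothetical names and values for illustration; they are not an API from YADL or any other library.

```python
import hashlib


class Server:
    """Toy chunk store keyed by digest."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        """Tell the client which digests the server has not seen yet."""
        return [d for d in digests if d not in self.chunks]

    def upload(self, chunks_by_digest):
        self.chunks.update(chunks_by_digest)


def client_side_backup(server, data, size=4):
    """Chunk and hash on the client; return the payload bytes actually sent."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    by_digest = {hashlib.sha256(p).hexdigest(): p for p in pieces}
    wanted = server.missing(list(by_digest))
    server.upload({d: by_digest[d] for d in wanted})
    return sum(len(by_digest[d]) for d in wanted)


server = Server()
first = client_side_backup(server, b"AAAABBBBAAAACCCC")  # 16 bytes of data
second = client_side_backup(server, b"AAAABBBBDDDD")     # 12 bytes of data
print(first, second)  # 12 4
```

The client pays for the chunking and hashing, which is the "heavier clients" cost, but sends 12 and then 4 payload bytes instead of 16 and 12. With server-side dedupe the client would send all the data and the server would do this work on arrival.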

18 When is it Deduped?
Inline dedupe
Offline dedupe
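The difference between the two timings can be sketched with two toy stores. It is assumed here that "inline" means deduplicating on the write path and "offline" means storing everything first and deduplicating in a later pass; the class and method names are hypothetical.

```python
import hashlib


def digest(block):
    return hashlib.sha256(block).hexdigest()


class InlineStore:
    """Dedupes on the write path: a duplicate block is never stored twice."""

    def __init__(self):
        self.blocks = {}  # digest -> block
        self.refs = []    # logical block order, recorded as digests

    def write(self, block):
        d = digest(block)              # hashing cost is paid on every write
        self.blocks.setdefault(d, block)
        self.refs.append(d)


class OfflineStore:
    """Stores blocks as written; a later pass removes the duplicates."""

    def __init__(self):
        self.raw = []     # blocks exactly as written, duplicates included
        self.blocks = {}
        self.refs = []

    def write(self, block):
        self.raw.append(block)         # fast path: no hashing here

    def dedupe_pass(self):
        for block in self.raw:
            d = digest(block)
            self.blocks.setdefault(d, block)
            self.refs.append(d)
        self.raw.clear()               # temporary space is reclaimed


inline, offline = InlineStore(), OfflineStore()
for block in (b"x" * 8, b"y" * 8, b"x" * 8):
    inline.write(block)
    offline.write(block)
raw_before_pass = len(offline.raw)
offline.dedupe_pass()
```

Both stores end up holding two unique blocks, but the offline store temporarily held all three, while the inline store did the hashing work during each write.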

19 Challenges in Dedupe
Single point of failure: “Last line of defense! Or fall off the cliff!”
Performance
Distributed Dedupe

20 Current Work: YADL, “Yet Another Dedupe Library”
Stream-based user-space dedupe library
File or Object or Block
The Future: YADL-E

21 Current Work: YADL https://github.com/YADL/yadl Contributors:
Ewen Pinto Srinivas B Karthik US Sukumar Poojary

22 THANK YOU

