DeDu: Building a Deduplication Storage system over Cloud computing This paper appears in : Computer Supported Cooperative work in Design(CSCWD),2011 15.


1 DeDu: Building a Deduplication Storage system over Cloud computing
This paper appears in: Computer Supported Cooperative Work in Design (CSCWD), 2011 15th International Conference
Date of Conference: 8-10 June 2011
Author(s): Zhe Sun, Jun Shen, Fac. of Inf., Univ. of Wollongong, Wollongong, NSW, Australia; Jianming Yong, Fac. of Bus., Univ. of Southern Queensland, Toowoomba, QLD, Australia
Speaker: Yen-Yi Chen MA190104
Date: 2013/05/28

2 Outline
Introduction
Two issues to be addressed
Deduplication
Theories and approaches
System design
Simulations and Experiments
Conclusions

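3 Introduction
The rise of cloud computing and distributed system architectures
Information explosion and massive data volumes
Rising cost of storage equipment
Increasing data transfer and easing network bandwidth usage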

4 Introduction
System name: DeDu
Front-end: deduplication application
Back-end: Hadoop Distributed File System (HDFS) and HBase

5 Two issues to be addressed
How does the system identify duplicates? * Hash functions: MD5 and SHA-1
How does the system manage the data? * HDFS and HBase
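The first issue, identifying duplicates with MD5 and SHA-1, can be illustrated with a minimal Python sketch. The function name `file_fingerprint` and the choice to concatenate the two digests into one identifier are assumptions made for illustration; the slides only state that both hash functions are used.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return a combined MD5 + SHA-1 hex string for the given content.

    Concatenating the two digests is an assumed scheme, not one
    confirmed by the slides.
    """
    md5_hex = hashlib.md5(data).hexdigest()    # 32 hex characters
    sha1_hex = hashlib.sha1(data).hexdigest()  # 40 hex characters
    return md5_hex + sha1_hex


# Identical content yields the same fingerprint; different content does not.
a = file_fingerprint(b"hello world")
b = file_fingerprint(b"hello world")
c = file_fingerprint(b"hello world!")
print(a == b, a == c)
```

Two files whose fingerprints match are treated as duplicates, so only one copy needs to be kept.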

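6 Deduplication
[Figure: duplicate data chunks (A, B, C) reduced to single stored copies in the data store]
1. Data chunks are evaluated to determine a unique signature for each.
2. Signature values are compared to identify all duplicates.
3. Duplicate data chunks are replaced with pointers to a single stored chunk, saving storage space.

Category | File-level | Block-level
Comparison level | File | Block
Comparison scope | Entire specified volume | Entire specified volume
Advantages | Best capacity reduction for a single file | Can compare across files, and can also match duplicated portions underlying different files
Disadvantages | Ineffective for already-encoded files; two completely identical files are still stored twice | Consumes more processing resources
Deduplication ratio | 1:2 to 1:5 | 1:200 or even higher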
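The three deduplication steps above can be sketched in Python as a tiny block-level store. The class name `ChunkStore`, the fixed 4-byte chunk size, and the use of SHA-1 as the chunk signature are all assumptions chosen to keep the example small; they are not taken from the paper.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Hypothetical in-memory store that keeps one copy of each chunk."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # signature -> stored chunk

    def put(self, data: bytes, chunk_size: int = 4) -> List[str]:
        """Split data into chunks and return a list of pointers (signatures)."""
        pointers = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            # Step 1: compute a signature for the chunk.
            signature = hashlib.sha1(chunk).hexdigest()
            # Step 2: compare against known signatures; store only new chunks.
            self.chunks.setdefault(signature, chunk)
            # Step 3: record a pointer instead of another copy.
            pointers.append(signature)
        return pointers

    def get(self, pointers: List[str]) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.chunks[p] for p in pointers)


store = ChunkStore()
data = b"AAAABBBBCCCCAAAABBBBAAAA"
ptrs = store.put(data)
print(len(ptrs), "pointers,", len(store.chunks), "unique chunks stored")
```

Here six chunks are written but only three are stored, and the original data can still be rebuilt from the pointer list.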

7 Theories and approaches
A. The architecture of source data and link files
B. Architecture of deduplication cloud storage system

8 Source data and link files

9 Deduplication Cloud storage system

10 System design
A. Data organisation
B. Storage of the files
C. Access to the files
D. Deletion of files

11 Data organisation

12 Storage of the files

13 Access to the files

14 Deletion of files
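The storage, access, and deletion operations listed in the system design could be sketched roughly as follows. This is a minimal in-memory illustration, not the authors' implementation: the class `DedupFileStore` is hypothetical, plain dictionaries stand in for HDFS (source data) and HBase (index), and the reference-counting rule used on deletion is an assumption.

```python
import hashlib


class DedupFileStore:
    """Hypothetical file-level deduplication store using link files."""

    def __init__(self):
        self.source = {}  # fingerprint -> content (stands in for HDFS)
        self.index = {}   # fingerprint -> number of links (stands in for HBase)
        self.links = {}   # file name -> fingerprint (the link files)

    @staticmethod
    def _fingerprint(data):
        # Assumed scheme: MD5 and SHA-1 digests concatenated.
        return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()

    def store(self, name, data):
        """Store a file; upload content only if its fingerprint is new."""
        if name in self.links:
            self.delete(name)  # overwriting releases the old link first
        fp = self._fingerprint(data)
        if fp not in self.source:
            self.source[fp] = data
            self.index[fp] = 0
        self.index[fp] += 1
        self.links[name] = fp
        return fp

    def read(self, name):
        """Follow the link file to the single stored copy."""
        return self.source[self.links[name]]

    def delete(self, name):
        """Remove the link; drop the source data when no links remain."""
        fp = self.links.pop(name)
        self.index[fp] -= 1
        if self.index[fp] == 0:
            del self.index[fp]
            del self.source[fp]


fs = DedupFileStore()
fs.store("a.txt", b"same content")
fs.store("b.txt", b"same content")
print(len(fs.links), "link files,", len(fs.source), "stored copy")
```

Deleting `a.txt` in this sketch removes only its link; the content stays available through `b.txt` until that link is deleted as well.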

15 Simulations and Experiments

16 Performance evaluations

17 Conclusions
1. With fewer data nodes, writing efficiency is high but reading efficiency is low.
2. With more data nodes, writing efficiency is low but reading efficiency is high.
3. When a single file is large, the time to calculate hash values is higher but the transmission cost is low.
4. When a single file is small, the time to calculate hash values is lower but the transmission cost is high.

18 Thank you for listening

