
1 School of EECS, Peking University; Microsoft Research Asia
UStore: A Low Cost Cold and Archival Data Storage System for Data Centers
Quanlu Zhang †, Yafei Dai †, Fengqian Li #, Lintao Zhang *
† Peking University, # Shanghai Jiao Tong University, * Microsoft Research Asia

2 A BRIEF INTRODUCTION TO CLOUD STORAGE

3 “Cold Storage Is Hot Again” -- IDC Technology Assessment

4
- Hotmail: 5~22 GB per account, OneDrive: 7~25 GB per account
- User generated data: video feeds, sensor inputs, operational logs
- Long term archiving for financial and medical data
- System backups
- …

5 [Chart labels: "Managing storage growth"; "Designing, deploying, and managing Backup, Recovery, and Archive solutions"]
Source: Managing Storage: Trends, Challenges, and Options (2013-2014).

6 Much of the Data Is Cold or Archival
- Hot data: very low latency, high bandwidth
- Cold data: low bandwidth, (relatively) low latency
- Archival data: predictable workload, can tolerate long latency
[Figure: Facebook Photo Access Patterns. Source: Facebook, 2013]

7 What Characteristics Does An Ideal Cold and Archival Storage Possess?
- Cheap
  - Low capital expense
  - Low operational expense
- Incrementally deployable
  - No need to over-provision too much
- Good Performance
  - Reasonable throughput
  - Relatively low access latency
- Reliable and Available

8 Which Storage Media?
- Magnetic Disk
- Optical Disk
- Tape

9 Magnetic Disk is Promising for Cold and Archival Storage
- The average cost per gigabyte fell from $437,500 in 1980 to $0.05 in 2013
- Shingled Magnetic Recording
  - High capacity
- Helium-filled hard drives
  - Low power
  - High capacity

10 How to Connect and Manage Large Numbers of Disks to Provide Storage Service?

11 Interconnection Technologies
- SATA
  - 6.0 Gb/s transfer speed
  - SATA multiplier: supports only 15 devices, does not support cascading
- SAS
  - 6 Gb/s transfer speed
  - SAS expander
- Fibre Channel
  - High bandwidth, but also high expense
- Ethernet
  - An ARM processor attaches and exposes the disk
  - Requires dedicated ARMs and network infrastructure

12 USB based Storage for Data Center
- USB 3.0
  - 5.0 Gb/s transfer speed (up to 10 Gb/s for USB 3.1), 300~400 MB/s realistic throughput
  - Tree-structured hubs address up to 127 devices
  - Supported by most new chipsets, very (very) cheap
[Figure labels: USB Hub, Existing Server, Disk Array Box]
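To make the 127-device limit concrete, the following is a minimal sketch of how many disks fit below one root port once the hubs themselves are counted against the address budget. The 4-port hub size and all function names are assumptions for illustration, not details from the slides, and the sketch ignores the USB limit on hub nesting depth.

```python
import math

# USB device addresses are 7 bits wide, and hubs consume addresses too.
USB_MAX_DEVICES = 127


def hubs_needed(num_disks: int, ports_per_hub: int) -> int:
    """Minimum number of hubs needed to attach num_disks below one root port.

    Each hub occupies one existing port and provides ports_per_hub new ones,
    so every added hub yields (ports_per_hub - 1) extra free ports.
    """
    if num_disks <= 1:
        return 0
    return math.ceil((num_disks - 1) / (ports_per_hub - 1))


def fits_address_budget(num_disks: int, ports_per_hub: int) -> bool:
    """True if disks plus hubs stay within the 127-device address limit."""
    return num_disks + hubs_needed(num_disks, ports_per_hub) <= USB_MAX_DEVICES


def max_disks(ports_per_hub: int) -> int:
    """Largest disk count that still fits the address budget."""
    n = 0
    while fits_address_budget(n + 1, ports_per_hub):
        n += 1
    return n


# Hypothetical example: a tree built only from 4-port hubs.
print(max_disks(4), hubs_needed(max_disks(4), 4))
```

Under these assumptions the usable disk count per root is noticeably below 127, because a large share of the addresses goes to hubs.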

13 The Problems of the Naïve Design
- Limited performance
  - An enclosure of ~100 disks with only 400 MB/s throughput
- Single point of failure
  - Failure of the root hub or the server causes total data loss
Desired Design
- Traditional wisdom: multi-path attached storage is expensive

14 Two Primitives
[Figure labels: Hub, Switch, Control]

15 The Data Plane (Simple Tree)

16 The Data Plane (2-Way Redundancy)
[Figure labels: Server 1, Server 2]

17 The Data Plane (4 Output Ports)

18 The Data Plane (4 Output Ports)

19 The Control Plane
- What can be controlled?
  - Switches and power to each disk
[Figure label: Control Plane]

20 SOFTWARE DESIGN

21 Software Design
- Serve storage allocation and access
- Detect failures and implement quick failover
- Provide an appropriate interface for upper layer services and applications

22 Software Architecture
[Diagram components: UStore ClientLib, iSCSI Initiator; multiple hosts, each running iSCSI Target, UStore EndPoint, and USB Monitor; Interconnect Fabric]

23 Software Architecture
[Diagram components: UStore ClientLib, iSCSI Initiator; multiple hosts, each running iSCSI Target, UStore EndPoint, and USB Monitor; Interconnect Fabric; UStore Master (Paxos); Primary Controller and Backup Controller, which control hubs, switches, and disks. Labelled flows: Heartbeat Messages, Control Commands]
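The slide names heartbeat messages and a UStore Master but does not spell out the detection logic. A minimal sketch of timeout-based failure detection, assuming hosts report to the master periodically, could look like the following. The class name, the timeout value, and the use of explicit timestamps are all illustrative assumptions rather than details of UStore.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time per host and reports suspected failures."""

    timeout_s: float  # hypothetical threshold, not taken from the slides
    last_seen: dict = field(default_factory=dict)

    def record(self, host: str, now: float) -> None:
        """Note that a heartbeat from `host` arrived at time `now`."""
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> list:
        """Hosts whose most recent heartbeat is older than the timeout."""
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage with made-up timestamps (seconds).
monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("S1", now=0.0)
monitor.record("S2", now=0.0)
monitor.record("S2", now=4.0)   # S2 keeps reporting, S1 goes silent
suspected = monitor.failed_hosts(now=5.0)
```

Passing the clock in explicitly keeps the sketch deterministic; a real monitor would read a monotonic clock instead.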

24 Configuring Interconnect Fabric
[Diagram: servers S1, S2, S3, S4; disks D1 to D9; UStore Master]
UStore Master state:
- S1: D1, D4
- S2: D7, D8
- S3: D2, D3
- S4: D5, D6, D7
Connect D1 to S3 and D4 to S2

25 Configuring Interconnect Fabric
[Diagram: servers S1, S2, S3, S4; disks D1 to D9; UStore Master]
UStore Master state:
- S1: Crash
- S2: D7, D8, D4
- S3: D2, D3, D1
- S4: D5, D6, D7
Connect D1 to S3 and D4 to S2
Reconfiguration Completion
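The reconfiguration on these two slides can be sketched as a small bookkeeping routine: when a server crashes, each of its disks is switched to the other server its switch can reach, and the master's table is updated. The function name and the `alternate` wiring table below are hypothetical, chosen only so that the result matches the slide's example; the slides do not say how the master picks the target server.

```python
def reassign_on_failure(assignment, alternate, failed):
    """Return (new server -> disks mapping, switch commands) after `failed` crashes.

    assignment: dict mapping server name to the list of disks attached to it.
    alternate:  dict mapping disk name to the other server its switch can
                reach (hypothetical wiring, not taken from the slides).
    """
    new_assignment = {s: list(d) for s, d in assignment.items() if s != failed}
    commands = []
    for disk in assignment.get(failed, []):
        target = alternate[disk]
        if target not in new_assignment:
            raise RuntimeError(f"no live path for {disk}")
        new_assignment[target].append(disk)
        commands.append(f"Connect {disk} to {target}")
    return new_assignment, commands


# Master state as listed on slide 24.
before = {
    "S1": ["D1", "D4"],
    "S2": ["D7", "D8"],
    "S3": ["D2", "D3"],
    "S4": ["D5", "D6", "D7"],
}
# Assumed second path for the two disks attached to S1.
alternate = {"D1": "S3", "D4": "S2"}

after, commands = reassign_on_failure(before, alternate, failed="S1")
```

With this wiring, `commands` holds the two connect operations shown on the slide and `after` matches the table on slide 25 without the crashed S1 entry.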

26 UStore Prototype

27 COST COMPARISON

28 Cost Comparison
Capital Expense

System                     Media           Capital Expense   Without Disks
DELL PowerVault MD3260i    Near-line SAS   $3,340,000        $1,525,000
Sun StorageTek SL150       LTO6 Tape       $1,748,000        -
Pergamum                   SATA HD         $756,000          $415,000
BACKBLAZE                  SATA HD         $598,000          $257,000
UStore                     SATA HD         $456,000          $115,000

Operational Expense
- Low power consumption
- Low cooling cost
- Low space occupation
- Low operational cost
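As a quick check on the table, the short sketch below computes how much cheaper UStore's capital expense is relative to each alternative, and the disk cost implied by subtracting the "Without Disks" column. The dollar figures are copied from the slide; the helper names are illustrative.

```python
from typing import Optional

# Figures copied from the cost comparison table (US dollars).
capex = {
    "DELL PowerVault MD3260i": 3_340_000,
    "Sun StorageTek SL150": 1_748_000,
    "Pergamum": 756_000,
    "BACKBLAZE": 598_000,
    "UStore": 456_000,
}
without_disks = {
    "DELL PowerVault MD3260i": 1_525_000,
    "Sun StorageTek SL150": None,  # shown as "-" on the slide
    "Pergamum": 415_000,
    "BACKBLAZE": 257_000,
    "UStore": 115_000,
}


def savings_vs(baseline: str, system: str = "UStore") -> float:
    """Fraction by which `system` is cheaper than `baseline` in capital expense."""
    return 1 - capex[system] / capex[baseline]


def implied_disk_cost(system: str) -> Optional[int]:
    """Capital expense minus the without-disks figure, or None if not given."""
    rest = without_disks[system]
    return None if rest is None else capex[system] - rest


for name in capex:
    if name != "UStore":
        print(f"{name}: UStore is {savings_vs(name):.0%} cheaper")
```

The implied disk cost comes out identical for the three SATA HD systems, so the differences among them lie entirely in the non-disk hardware.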

29 PERFORMANCE EVALUATION

30 Throughput
[Charts: 4 MB sequential, 4 KB sequential]
The SATA-to-USB bridge, USB hub, and USB switch have little impact on disk performance

31 Total Throughput
- Duplex throughput of one root: 540 MB/s
- Total throughput of our prototype: 2160 MB/s
Total throughput increases with the number of disks

32 Switching Time

33 Whole System’s Power Consumption

34 CONCLUSION AND FUTURE WORK

35 Conclusion
- Cheap
  - Low capital expense
  - Low operational expense
- Incrementally deployable
  - No need to over-provision too much
- Good Performance
  - Reasonable throughput
  - Relatively low access latency
- Reliable and Available

36 Future Work
- Provide data redundancy in UStore, leveraging the loose coupling between disks and servers

37 Thank You! Questions?

38 Failure Rate
- MTTF of servers is 3.4 months
- MTTF of disks is 10-50 years

39 Prototype’s interconnect topology

40 Power Management
- Many mechanisms have been proposed for power saving in storage systems
  - Managing data redundancy and placement
- Provide a disk control interface that allows upper layer services to control the state of the disks that belong to them (spin-down/spin-up)
- Spin down disks after a configured interval
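The spin-down policy in the last bullet can be sketched as a per-disk idle timer. The class below is a minimal illustration under assumed names and units; it only tracks state and does not issue real spin-down commands, and the slides do not give the actual interval.

```python
from dataclasses import dataclass


@dataclass
class DiskPowerState:
    """Idle-timer bookkeeping for one disk (illustrative, not UStore's API)."""

    name: str
    idle_timeout_s: float       # the configured interval; the value is up to the operator
    last_access_s: float = 0.0
    spinning: bool = True

    def on_access(self, now: float) -> None:
        """An I/O request spins the disk up if needed and resets the idle timer."""
        self.spinning = True
        self.last_access_s = now

    def tick(self, now: float) -> None:
        """Spin the disk down once it has been idle for the configured interval."""
        if self.spinning and now - self.last_access_s >= self.idle_timeout_s:
            self.spinning = False


# Usage with a made-up 600-second interval.
disk = DiskPowerState("D1", idle_timeout_s=600)
disk.on_access(now=0)
disk.tick(now=599)      # still within the interval, keeps spinning
still_up = disk.spinning
disk.tick(now=600)      # interval elapsed, spins down
spun_down = not disk.spinning
```

An upper layer service that owns the disk could call the same state object directly to force a spin-down or spin-up, which is the control interface the slide describes.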

41 Power Consumption
[Figure labels: 1W, Disk, Hub]

