Published by Aubrey Brown. Modified over 8 years ago.
1
School of EECS, Peking University; Microsoft Research Asia
UStore: A Low Cost Cold and Archival Data Storage System for Data Centers
Quanlu Zhang †, Yafei Dai †, Fengqian Li #, Lintao Zhang ∗
† Peking University
# Shanghai Jiao Tong University
∗ Microsoft Research Asia
2
A BRIEF INTRODUCTION TO CLOUD STORAGE
3
“Cold Storage Is Hot Again” -- IDC Technology Assessment
4
Hotmail: 5~22 GB per account; OneDrive: 7~25 GB per account
User-generated data: video feeds, sensor inputs, operational logs
Long-term archiving for financial and medical data
System backups
… …
5
Managing storage growth
Designing, deploying, and managing Backup, Recovery, and Archive solutions
Source: Managing Storage: Trends, Challenges, and Options (2013-2014).
6
Much of the Data Is Cold or Archival
Hot data: very low latency, high bandwidth
Cold data: low bandwidth, (relatively) low latency
Archival data: predictable workload, can tolerate long latency
Figure: Facebook Photo Access Patterns (Source: Facebook, 2013)
7
What Characteristics Does an Ideal Cold and Archival Storage System Possess?
Cheap
– Low capital expense
– Low operational expense
Incrementally deployable
– No need to over-provision too much
Good performance
– Reasonable throughput
– Relatively low access latency
Reliable and available
8
Which Storage Media?
Magnetic Disk
Optical Disk
Tape
9
Magnetic Disk Is Promising for Cold and Archival Storage
The average cost per gigabyte fell from $437,500 in 1980 to $0.05 in 2013
Shingled Magnetic Recording
– High capacity
Helium-filled hard drives
– Low power
– High capacity
10
How to Connect and Manage Large Numbers of Disks to Provide Storage Service?
11
Interconnection Technologies
SATA
– 6.0 Gb/s transfer speed
– A SATA port multiplier supports only 15 devices and cannot be cascaded
SAS
– 6 Gb/s transfer speed
– SAS expander
Fibre Channel
– High bandwidth, but also high expense
Ethernet
– An ARM processor attaches to and exposes the disk
– Requires dedicated ARM processors and network infrastructure
12
USB-Based Storage for the Data Center
USB 3.0
– 5.0 Gb/s transfer speed (up to 10 Gb/s for USB 3.1), 300~400 MB/s realistic throughput
– Tree-structured hubs to address up to 127 devices
– Supported by most new chipsets, very (very) cheap
Diagram labels: Existing Server, USB Hub, Disk Array Box
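As a rough illustration of the 127-device limit, the sketch below estimates how many addresses a hub tree uses. It rests on assumptions that are not on the slide: every external hub consumes one address of its own, all hubs have the same number of downstream ports, and the limit on hub nesting depth is ignored. The function names and the port counts are hypothetical.

```python
USB_MAX_ADDRESSES = 127  # limit quoted on the slide


def hubs_needed(disks, ports_per_hub):
    """Minimum number of identical hubs needed to fan one root port out to `disks` leaves.

    Each hub occupies one existing port and offers `ports_per_hub` new ones,
    so it adds (ports_per_hub - 1) free ports to the tree.
    """
    if disks < 1 or ports_per_hub < 2:
        raise ValueError("need at least one disk and hubs with two or more ports")
    return -(-(disks - 1) // (ports_per_hub - 1))  # ceiling division


def addresses_used(disks, ports_per_hub):
    """Addresses consumed by the disks plus the hubs (assumption: one per hub)."""
    return disks + hubs_needed(disks, ports_per_hub)


def fits_on_one_root(disks, ports_per_hub):
    """True if the whole tree stays within the 127-address limit."""
    return addresses_used(disks, ports_per_hub) <= USB_MAX_ADDRESSES


if __name__ == "__main__":
    for ports in (4, 7):  # hypothetical hub sizes
        print(ports, hubs_needed(100, ports), fits_on_one_root(100, ports))
```

Under these assumptions, the hub count matters as much as the disk count: narrower hubs need more of them, and each one takes an address away from the disks.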
13
The Problems of the Naïve Design
Limited performance
– An enclosure of ~100 disks shares only 400 MB/s of throughput
Single point of failure
– Failure of the root hub or the server causes total data loss
Desired Design
Traditional wisdom: multi-path attached storage is expensive
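The performance problem can be made concrete with a quick calculation. The sketch below assumes, for illustration only, that all disks behind one root are active at once and share its bandwidth evenly; the function name is hypothetical.

```python
def per_disk_share_mb_s(root_throughput_mb_s, active_disks):
    """Even share of one root's bandwidth per concurrently active disk."""
    if active_disks <= 0:
        raise ValueError("active_disks must be positive")
    return root_throughput_mb_s / active_disks


if __name__ == "__main__":
    # Figures from the slide: ~100 disks behind a 400 MB/s root.
    print(per_disk_share_mb_s(400, 100))  # 4.0 MB/s per disk
```

With the slide's figures each disk gets about 4 MB/s when every disk is busy, which is why a single tree is a bottleneck.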
14
Two Primitives
Hub
Switch
Diagram label: Control
15
The Data Plane (Simple Tree)
16
The Data Plane (2-Way Redundancy)
Diagram labels: Server 1, Server 2
17
The Data Plane (4 Output Ports)
18
The Data Plane (4 Output Ports)
19
The Control Plane
What can be controlled?
– Switches and power to each disk
Diagram label: Control Plane
20
SOFTWARE DESIGN
21
Software Design
Serve storage allocation and access requests
Detect failures and fail over quickly
Provide an appropriate interface for upper-layer services and applications
22
Software Architecture
Diagram labels: Interconnect Fabric; Host; UStore ClientLib; iSCSI Initiator; iSCSI Target, UStore EndPoint, and USB Monitor (repeated for each host); …
23
Software Architecture
Diagram labels: Interconnect Fabric; UStore ClientLib; iSCSI Initiator; iSCSI Target, UStore EndPoint, and USB Monitor (repeated for each host); UStore Master; Heartbeat Messages; Primary Controller; Backup Controller; Control Hubs, Switches, Disks; Control Commands; Paxos; …
24
Configuring Interconnect Fabric
Servers: S1, S2, S3, S4
Disks: D1, D2, D3, D4, D5, D6, D7, D8, D9
UStore Master assignment table:
S1 : D1, D4
S2 : D7, D8
S3 : D2, D3
S4 : D5, D6, D7
Connect D1 to S3 and D4 to S2
25
Configuring Interconnect Fabric
Servers: S1, S2, S3, S4
Disks: D1, D2, D3, D4, D5, D6, D7, D8, D9
UStore Master assignment table:
S1 : Crash
S2 : D7, D8, D4
S3 : D2, D3, D1
S4 : D5, D6, D7
Connect D1 to S3 and D4 to S2
Reconfiguration complete
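The reconfiguration on these two slides can be sketched as a small table update. This is an illustration, not the UStore Master's actual logic: the `paths` map (which servers each disk can be switched to) and the least-loaded tie-breaking rule are assumptions, and `fail_over` is a hypothetical name. The assignment table is copied as shown on the slide.

```python
def fail_over(assignment, paths, crashed):
    """Move every disk of `crashed` to a live server it has a path to.

    Returns the new server->disks table and the list of (disk, server)
    connect commands, in the order they were decided.
    """
    new_assignment = {s: list(d) for s, d in assignment.items() if s != crashed}
    commands = []
    for disk in assignment.get(crashed, []):
        live = [s for s in paths.get(disk, []) if s in new_assignment]
        if not live:
            raise RuntimeError(f"no live server can reach {disk}")
        # Assumed policy: pick the least-loaded reachable server, name as tie-break.
        target = min(live, key=lambda s: (len(new_assignment[s]), s))
        new_assignment[target].append(disk)
        commands.append((disk, target))
    return new_assignment, commands


if __name__ == "__main__":
    # Assignment table as shown on the slide before the crash of S1.
    assignment = {
        "S1": ["D1", "D4"],
        "S2": ["D7", "D8"],
        "S3": ["D2", "D3"],
        "S4": ["D5", "D6", "D7"],
    }
    # Hypothetical 2-way paths, chosen so the result matches the slide.
    paths = {"D1": ["S1", "S3"], "D4": ["S1", "S2"]}
    table, commands = fail_over(assignment, paths, "S1")
    print(commands)  # [('D1', 'S3'), ('D4', 'S2')]
    print(table)
```

With those assumed paths the sketch issues the same two commands as the slide: connect D1 to S3 and D4 to S2.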
26
UStore Prototype
27
COST COMPARISON
28
Cost Comparison
Capital Expense
System                     Media           Capital Expense   Without Disks
DELL PowerVault MD3260i    Near-line SAS   $3,340,000        $1,525,000
Sun StorageTek SL150       LTO6 Tape       $1,748,000        -
Pergamum                   SATA HD         $756,000          $415,000
BACKBLAZE                  SATA HD         $598,000          $257,000
UStore                     SATA HD         $456,000          $115,000
Operational Expense
– Low power consumption
– Low cooling cost
– Low space occupation
– Low operational cost
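The table can be read more easily as relative savings. The sketch below only does arithmetic on the figures from the slide; the dictionary layout and function names are illustrative, and the "disk cost" is simply the difference between the two columns.

```python
# (capital expense, capital expense without disks) in USD, from the slide.
# None marks the entry the slide leaves blank.
COSTS = {
    "DELL PowerVault MD3260i": (3_340_000, 1_525_000),
    "Sun StorageTek SL150": (1_748_000, None),
    "Pergamum": (756_000, 415_000),
    "BACKBLAZE": (598_000, 257_000),
    "UStore": (456_000, 115_000),
}


def disk_cost(system):
    """Cost attributable to the media: total minus the 'without disks' figure."""
    total, without = COSTS[system]
    return None if without is None else total - without


def ustore_saving(system, without_disks=False):
    """Fraction by which UStore is cheaper than `system` in the chosen column."""
    column = 1 if without_disks else 0
    other = COSTS[system][column]
    if other is None:
        return None
    return 1 - COSTS["UStore"][column] / other


if __name__ == "__main__":
    for name in COSTS:
        if name == "UStore":
            continue
        total = ustore_saving(name)
        hardware = ustore_saving(name, without_disks=True)
        print(name, f"{total:.1%}", "n/a" if hardware is None else f"{hardware:.1%}")
```

One thing the arithmetic makes visible: the three SATA-based systems carry an identical disk cost, so the differences between them come entirely from the surrounding hardware.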
29
PERFORMANCE EVALUATION
30
Throughput
Figure panels: 4 MB sequential, 4 KB sequential
The SATA-to-USB bridge, USB hub, and USB switch have little impact on disk performance
31
Total Throughput
Duplex throughput of one root: 540 MB/s
Total throughput of our prototype: 2160 MB/s
Total throughput increases as disks are added
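A simple model shows how these numbers fit together. It is a sketch under assumptions not stated on the slide: the prototype is taken to have four roots (inferred from 2160 / 540), each root is capped at its duplex throughput, and the per-disk rate of 100 MB/s is a made-up illustrative value.

```python
ROOT_CAP_MB_S = 540  # duplex throughput of one root, from the slide


def aggregate_throughput_mb_s(disks_per_root, per_disk_mb_s, root_cap_mb_s=ROOT_CAP_MB_S):
    """Sum over roots of min(demand from its active disks, root capacity)."""
    return sum(min(n * per_disk_mb_s, root_cap_mb_s) for n in disks_per_root)


if __name__ == "__main__":
    per_disk = 100  # hypothetical per-disk rate in MB/s, not from the slides
    for n in (1, 2, 4, 8):
        print(n, aggregate_throughput_mb_s([n] * 4, per_disk))
```

In this model throughput grows with the number of active disks until every root saturates, at which point four roots give 4 × 540 = 2160 MB/s.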
32
Switching Time
33
Whole System’s Power Consumption
34
CONCLUSION AND FUTURE WORK
35
Conclusion
Cheap
– Low capital expense
– Low operational expense
Incrementally deployable
– No need to over-provision too much
Good performance
– Reasonable throughput
– Relatively low access latency
Reliable and available
36
Future Work
Provide data redundancy in UStore, leveraging the loose coupling between disks and servers
37
Thank You!
Questions?
38
Failure Rate
MTTF of servers is 3.4 months
MTTF of disks is 10-50 years
39
Prototype’s Interconnect Topology
40
Power Management
Many mechanisms have been proposed for saving power in storage systems
– e.g., managing data redundancy and placement
Provide a disk control interface that lets upper-layer services control the state of the disks that belong to them (spin-down/spin-up)
Spin down disks after a configured idle interval
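The spin-down policy can be sketched as an idle timer per disk. This is a minimal illustration, not the UStore interface: the class and method names are hypothetical, time is passed in explicitly so the logic is easy to test, and the actual spin-up/spin-down commands are represented only by a state flag.

```python
class DiskPowerManager:
    """Tracks last access per disk and spins down disks idle for too long."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self.last_access = {}  # disk -> time of last access (seconds)
        self.spun_up = {}      # disk -> True if currently spinning

    def access(self, disk, now):
        """Record an access; return True if the disk had to be spun up first."""
        woke = not self.spun_up.get(disk, False)
        self.spun_up[disk] = True
        self.last_access[disk] = now
        return woke

    def tick(self, now):
        """Spin down every disk idle for at least the timeout; return their names."""
        stopped = []
        for disk, up in self.spun_up.items():
            if up and now - self.last_access[disk] >= self.idle_timeout_s:
                self.spun_up[disk] = False
                stopped.append(disk)
        return sorted(stopped)


if __name__ == "__main__":
    manager = DiskPowerManager(idle_timeout_s=600)  # hypothetical 10-minute interval
    manager.access("D1", now=0)
    manager.access("D2", now=500)
    print(manager.tick(now=600))          # D1 has been idle for the full interval
    print(manager.access("D1", now=700))  # accessing it again requires a spin-up
```

An upper-layer service could call `access` before each I/O and `tick` periodically; the timeout trades spin-up latency against power saved.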
41
Power Consumption
Figure labels: 1 W, Disk, Hub