Toward Achieving Tapeless Backup at PB Scales. Hakim Weatherspoon, University of California, Berkeley. Frontiers in Distributed Information Systems, San Francisco.

Presentation transcript:

Slide 1: Toward Achieving Tapeless Backup at PB Scales
Hakim Weatherspoon, University of California, Berkeley.
Frontiers in Distributed Information Systems, San Francisco. Thursday, July 31, 2003.

Slide 2: OceanStore Context: Ubiquitous Computing
Computing everywhere:
– Desktop, laptop, palmtop.
– Cars, cellphones.
– Shoes? Clothing? Walls?
Connectivity everywhere:
– Rapid growth of bandwidth in the interior of the net.
– Broadband to the home and office.
– Wireless technologies such as CDMA, satellite, laser.

Slide 3: Archival Storage
Where is persistent information stored?
– Want: geographic independence for availability, durability, and freedom to adapt to circumstances.
How is it protected?
– Want: encryption for privacy, secure naming and signatures for authenticity, and Byzantine commitment for integrity.
Is it available/durable?
– Want: redundancy with continuous repair and redistribution for long-term durability.

Slide 4: Path of an Update (figure).

Slide 5: Questions about Data
– How do we use redundancy to protect data against loss?
– How do we verify data?
– How many resources are needed to keep data durable? Storage? Bandwidth?

Slide 6: Archival Dissemination Built into Update
Erasure codes:
– Redundancy without the overhead of strict replication.
– Produce n fragments, where any m are sufficient to reconstruct the data; m < n. Rate r = m/n; storage overhead is 1/r. (See the sketch below.)
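As a concrete illustration of the erasure-coding idea on slide 6, here is a minimal sketch of an (m, n) code based on polynomial interpolation over a prime field; any m of the n fragments reconstruct the data. This is an illustration only, not Pond's actual codec, and the field size, function names, and fragment format are choices made for this example.

```python
# Minimal sketch of an (m, n) erasure code via polynomial interpolation
# over a prime field: any m of the n fragments reconstruct the data.
# Illustration only -- not Pond's codec; field size and API are invented.
P = 2**31 - 1  # prime modulus; data symbols are integers in [0, P)

def lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` at x (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data, n):
    """Systematically encode m = len(data) symbols into n fragments."""
    m = len(data)
    points = list(zip(range(1, m + 1), data))    # f(1..m) = the data symbols
    return [(x, lagrange_eval(points, x)) for x in range(1, n + 1)]

def decode(fragments, m):
    """Recover the m data symbols from any m surviving fragments."""
    points = fragments[:m]
    return [lagrange_eval(points, x) for x in range(1, m + 1)]

# Example: m = 4, n = 8 gives rate r = 1/2 and storage overhead 1/r = 2.
data = [10, 20, 30, 40]
fragments = encode(data, n=8)
assert decode(fragments[3:7], m=4) == data       # any 4 of the 8 suffice
```

With the parameters used later in the talk (m = 16, n = 32), the rate is r = 1/2, the storage overhead is 2x, and a block survives as long as any 16 of its 32 fragments remain reachable.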

Slide 7: Durability
Fraction of Blocks Lost Per Year (FBLPY)*:
– r = 1/4, erasure-encoded block (e.g., m = 16, n = 64).
– Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time. (A simplified loss model is sketched below.)
– The n = 4 fragment case is equivalent to replication on four servers.
* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, in Proc. of IPTPS 2002.
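To see why spreading the same overhead across more fragments helps, here is a deliberately simplified loss model of my own, not the FBLPY analysis from the cited IPTPS paper (which accounts for repair epochs and server lifetimes): assume each fragment independently survives with probability p, and a block is lost only if fewer than m of its n fragments survive.

```python
# Simplified, assumed model: fragments fail independently; a block is
# lost if fewer than m of its n fragments survive. Not the full FBLPY
# derivation from the IPTPS 2002 paper, only an illustration.
from math import comb

def block_loss_prob(n, m, p):
    """P(block lost) = P(fewer than m of n fragments survive), survival prob p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m))

p = 0.90  # hypothetical per-fragment survival probability between repairs
print(block_loss_prob(4, 1, p))    # replication on 4 servers: 1e-4
print(block_loss_prob(64, 16, p))  # m = 16, n = 64 at the same 4x overhead:
                                   # many orders of magnitude smaller
```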

Slide 8: Naming and Verification Algorithm
Use a cryptographically secure hash algorithm to detect corrupted fragments.
Verification tree:
– n is the number of fragments.
– Store log(n) + 1 hashes with each fragment.
– Total of n · (log(n) + 1) hashes.
Top hash is the block GUID (B-GUID):
– Fragments and blocks are self-verifying. (See the sketch below.)
[Figure: a hash tree over encoded fragments F1–F4 with leaf hashes H1–H4, interior hashes H12 and H34, data hash Hd, and top hash H14 forming the B-GUID; e.g., Fragment 1 is shipped with H2, H34, Hd, and its fragment data.]
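The following is a minimal hash-tree sketch of the verification scheme on slide 8, assuming n is a power of two and using SHA-1 as the secure hash; the proof layout and function names are invented for this example rather than taken from Pond. Each fragment ships with the log2(n) sibling hashes on its path plus the data hash Hd, so anyone holding only the B-GUID can verify an individual fragment.

```python
# Sketch of the fragment verification tree, assuming n is a power of two.
# Each fragment carries its log2(n) sibling hashes plus Hd (log(n) + 1
# hashes total); the root combined with Hd stands in for the B-GUID.
# One plausible arrangement, not Pond's wire format.
import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha1(b"".join(parts)).digest()  # SHA-1 stands in for the secure hash

def build_tree(fragments, data):
    """Return (b_guid, per-fragment proofs). Proof = sibling hashes bottom-up + Hd."""
    hd = H(data)
    level = [H(f) for f in fragments]
    proofs = [[] for _ in fragments]
    idx = list(range(len(fragments)))           # leaf position of each fragment
    while len(level) > 1:
        for i, leaf in enumerate(idx):
            proofs[i].append(level[leaf ^ 1])   # sibling hash at this level
            idx[i] = leaf // 2
        level = [H(level[k], level[k + 1]) for k in range(0, len(level), 2)]
    b_guid = H(level[0], hd)                    # top hash combined with data hash
    return b_guid, [proof + [hd] for proof in proofs]

def verify(fragment, index, proof, b_guid):
    """Recompute the path from one fragment and its proof; compare to the B-GUID."""
    node, leaf = H(fragment), index
    for sib in proof[:-1]:                      # all entries except Hd are siblings
        node = H(node, sib) if leaf % 2 == 0 else H(sib, node)
        leaf //= 2
    return H(node, proof[-1]) == b_guid

frags = [b"F1", b"F2", b"F3", b"F4"]
guid, proofs = build_tree(frags, data=b"original block data")
assert all(verify(f, i, proofs[i], guid) for i, f in enumerate(frags))
assert not verify(b"corrupted", 0, proofs[0], guid)
```

Checking log(n) + 1 hashes lets a server reject a corrupted fragment without fetching the rest of the block, which is what makes fragments self-verifying.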

Slide 9: Naming and Verification Algorithm (repeats the content of slide 8).

Slide 10: Enabling Technology (figure: a block's GUID and its encoded fragments).

Slide 11: Complex Objects I (figure: a data block d is the unit of coding; its verification tree produces the GUID of d, and the encoded fragments are the unit of archival storage).

Slide 12: Complex Objects II (figure: data blocks d1–d9 are organized under a data B-tree with indirect blocks and root block M; the version GUID (VGUID) names the root, and each block is separately coded into fragments, the unit of archival storage).

Slide 13: Complex Objects III (figure: two versions, VGUID_i and VGUID_i+1, share unchanged data blocks via copy-on-write; modified blocks d'8 and d'9 hang off a new root M that keeps a backpointer to the previous version; AGUID = hash{name + keys}). (See the sketch below.)
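Here is a toy sketch of the versioned, content-addressed structure in slides 11 through 13. It is illustrative only: the real object is a B-tree with indirect blocks, whereas this sketch uses a flat list of block GUIDs as the root, and the class and variable names are invented. The point it demonstrates is that blocks are named by their hashes, a version is named by the hash of its root (the VGUID), and copy-on-write lets a new version reuse every block it did not change.

```python
# Illustrative sketch (not Pond's data structures): content-addressed
# blocks, versions named by the hash of their root, and copy-on-write
# sharing of unchanged blocks between versions.
import hashlib

def guid(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()            # block GUID = hash of contents

store = {}                                           # GUID -> block, shared by all versions

class Version:
    def __init__(self, blocks, prev_vguid=None):
        self.block_guids = [guid(b) for b in blocks]     # leaves (flat here, not a real B-tree)
        self.prev_vguid = prev_vguid                     # backpointer to the previous version
        root = "|".join(self.block_guids) + "|" + (prev_vguid or "")
        self.vguid = guid(root.encode())                 # version GUID = hash of the root

def put_version(blocks, prev=None):
    for b in blocks:
        store.setdefault(guid(b), b)                 # unchanged blocks are stored only once
    return Version(blocks, prev.vguid if prev else None)

v1 = put_version([b"d1", b"d2", b"d3"])
v2 = put_version([b"d1", b"d2", b"d3 modified"], prev=v1)   # copy on write
assert v1.vguid != v2.vguid                          # each version is read-only and uniquely named
assert len(store) == 4                               # d1 and d2 are shared; only one new block stored
```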

Slide 14: Mutable Data
A real system needs mutable data:
– An entity in the network maintains the A-GUID to V-GUID mapping.
– Byzantine commitment for integrity: verifies client privileges, creates a serial order, and atomically applies each update. (Sketched below.)
Versioning system:
– Each version is inherently read-only.
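A hypothetical sketch of the mutable-data path on slide 14, with the Byzantine agreement among inner-ring replicas deliberately omitted and all names invented: a serializer maps each AGUID to its latest VGUID, verifies client privileges, imposes a serial order, and applies each update atomically by producing a new read-only version.

```python
# Hypothetical sketch (not the Pond inner ring): a single serializer that
# maps AGUIDs to their latest VGUIDs, checks privileges, orders updates,
# and applies each one by producing a new immutable version. Byzantine
# agreement among replicas is omitted.
import hashlib

def h(s: str) -> str:
    return hashlib.sha1(s.encode()).hexdigest()

class Serializer:
    def __init__(self):
        self.latest = {}                  # AGUID -> latest VGUID
        self.versions = {}                # VGUID -> (payload, previous VGUID)
        self.writers = {}                 # AGUID -> authorized client keys

    def create(self, name: str, owner_key: str) -> str:
        aguid = h(name + owner_key)       # AGUID = hash{name + keys}
        self.writers[aguid] = {owner_key}
        self.latest[aguid] = None
        return aguid

    def update(self, aguid: str, client_key: str, payload: str) -> str:
        if client_key not in self.writers[aguid]:
            raise PermissionError("client not authorized")     # verify privileges
        prev = self.latest[aguid]
        vguid = h(payload + (prev or ""))                       # new read-only version
        self.versions[vguid] = (payload, prev)
        self.latest[aguid] = vguid                              # serial order, applied atomically here
        return vguid

s = Serializer()
doc = s.create("report.txt", owner_key="alice-pk")
v1 = s.update(doc, "alice-pk", "draft 1")
v2 = s.update(doc, "alice-pk", "draft 2")
assert s.versions[v2][1] == v1            # versions chain back to their predecessors
```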

Slide 15: Deployment
PlanetLab global network:
– 98 machines at 42 institutions, in North America, Europe, Asia, and Australia.
– 1.26 GHz Pentium III (1 GB RAM) and 1.8 GHz Pentium 4 (2 GB RAM) machines.
– North American machines (2/3) are on Internet2.

Slide 16: Deployment
Deployed the storage system in November:
– ~50 physical machines.
– 100 virtual nodes: 3 clients, 93 storage servers, 1 archiver, 1 monitor.
– Supports the OceanStore API: NFS, IMAP, etc.
– Fault injection.
– Fault detection and repair.

Slide 17: Performance
Performance of the archival layer:
– Performance of an OceanStore server in archiving objects.
– Analyze the operations of archiving data (this includes signing updates in a BFT protocol).
– Configurations: no archiving vs. synchronous archiving (m = 16, n = 32).
Experiment environment:
– OceanStore servers were analyzed on a 42-node cluster.
– Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs, 1.5 GB ECC PC133 SDRAM, and two 36 GB IBM UltraStar 36LZX hard drives.
– The machines use a single Intel PRO/1000 XF gigabit Ethernet adaptor to connect to a Packet Engines gigabit switch, and run a Linux SMP kernel.

Slide 18: Performance: Throughput
Data throughput:
– No archive: 8 MB/s.
– Archive: 2.8 MB/s.

Slide 19: Performance: Latency
Latency:
– Fragmentation: y-intercept 3 ms, slope 0.3 s/MB.
– Archive latency = no-archive latency + fragmentation. (Illustrated below.)
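Restated as a formula (my restatement of the linear fit above; the baseline latency in the example call is hypothetical): archive latency = no-archive latency + 3 ms + 0.3 s/MB times the update size.

```python
# Linear model implied by the slide: fragmentation costs a 3 ms
# y-intercept plus 0.3 s per MB; archive latency adds that cost to the
# no-archive latency. The 30 ms baseline below is hypothetical.
def fragmentation_ms(update_mb: float) -> float:
    return 3.0 + 0.3 * 1000.0 * update_mb          # 0.3 s/MB expressed in ms

def archive_latency_ms(no_archive_ms: float, update_mb: float) -> float:
    return no_archive_ms + fragmentation_ms(update_mb)

print(archive_latency_ms(30.0, 2.0))               # hypothetical 2 MB update -> 633 ms
```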

Slide 20: Closer Look: Update Latency
The threshold signature dominates small-update latency:
– Common RSA tricks are not applicable.
– Batch updates to amortize the signature cost. (See the sketch below.)
– Tentative updates hide latency.
[Table: update latency (ms) by key size (512 b and larger) and update size (4 kB and MB-scale), reporting 5th-percentile, median, and 95th-percentile times; the numeric entries are missing from this transcript.]
Latency breakdown (phase: time in ms):
– Check: 0.3
– Serialize: 6.1
– Apply: 1.5
– Archive: 4.5
– Sign: 77.8
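A back-of-the-envelope sketch of the batching argument, using the phase times from the latency breakdown above and my simplifying assumption that only the signing phase is paid once per batch while every other phase remains per-update:

```python
# Phase times (ms) from the latency breakdown above. Assumption for this
# sketch: the threshold signature is paid once per batch; the remaining
# phases are paid once per update.
PHASES_MS = {"check": 0.3, "serialize": 6.1, "apply": 1.5, "archive": 4.5, "sign": 77.8}

def per_update_latency_ms(batch_size: int) -> float:
    unsigned = sum(t for phase, t in PHASES_MS.items() if phase != "sign")
    return unsigned + PHASES_MS["sign"] / batch_size

for k in (1, 10, 100):
    print(k, round(per_update_latency_ms(k), 1))   # 90.2, 20.2, 13.2 ms per update
```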

Slide 21: Current Situation
– Stabilized the routing layer under churn and extraordinary circumstances.
NSF infrastructure grant:
– Deploy the code as a service for Berkeley.
– Target 1/3 PB.
Future collaborations:
– CMU for a PB store.
– Internet Archive?

Slide 22: Conclusion
A storage-efficient, self-verifying mechanism:
– Erasure codes are good.
Self-verifying data assist in:
– Secure read-only data.
– Secure caching infrastructures.
– Continuous adaptation and repair.
For more information, see the papers:
– Pond: the OceanStore Prototype.
– Naming and Integrity: Self-Verifying Data in P2P Systems.