1
Peer-to-peer archival data trading
Brian Cooper and Hector Garcia-Molina
Stanford University
2
Data trading 2 Problem: Fragile Data
Data: easy to create, hard to preserve
Broken tapes; human deletions; going out of business
3
Data trading 3 Replication-based preservation
5
Data trading 5 Motivation
Several systems use replication: preserve digital collections; SAV, others
Archival part of digital library: individual organizations cooperate; not a lot of money to spend
6
Data trading 6 Goal
Reliable replication of digital collections
Given that: resources are limited; sites are autonomous; not all sites are equal
Traditional methods: central control; random; replicate popular
Metric: reliability, not necessarily “efficiency”
7
Data trading 7 Our solution
Data trading: “I’ll store a copy of your collection if you’ll store a copy of mine”
Sites make local decisions: who to trade with; how many copies to make; how much space to provide; etc.
8
Data trading 8 Trading network
A series of binary, peer-to-peer trading links
[Figure: trading network of sites A, B, C, D, E, F, G, H]
9
Data trading 9 Architecture
[Figure: architecture diagram. Labels: Users, Filesystem, InfoMonitor, SAV Archive, Reliability layer, Archived data, Internet, Local archive, Remote archive]
10
Data trading 10 Overview
Trading model
Trading algorithm
Simulating trading
Simulation results
11
Data trading 11 Trading model
16
Data trading 16 Trading model
Archive site: an autonomous archiving provider
Digital collection: a set of related digital materials
Archival storage: stores locally and remotely owned digital collections
Archiving client: deposits and retrieves materials
Data reliability: probability that data is not lost
17
Data trading 17 Deeds
A right to use space at another site
Bookkeeping mechanism for trades
Used, saved, split, or transferred
Trading algorithm: sites trade deeds; sites exercise deeds to replicate collections
[Figure: sample deed. Text: Deed for space; For use by: Library of Congress or for transfer; 623 gigabytes; Stanford University]
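The deed bookkeeping on this slide could be sketched roughly as follows. This is only an illustration: the `Deed` class, its field names, and the `split`, `transfer`, and `use` methods are hypothetical and do not come from the presentation. The sample values read the slide's deed as space at Stanford University for use by the Library of Congress, which is an interpretation of the figure text.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """A right to use space at another site (hypothetical representation)."""
    issuer: str        # site that provides the space
    holder: str        # site currently entitled to use the space
    gigabytes: float   # space still covered by this deed

    def split(self, amount: float) -> "Deed":
        """Carve `amount` GB off into a new deed; the rest stays in this one."""
        if not 0 < amount <= self.gigabytes:
            raise ValueError("split amount must be positive and within the deed")
        self.gigabytes -= amount
        return Deed(self.issuer, self.holder, amount)

    def transfer(self, new_holder: str) -> None:
        """Hand the deed to another site."""
        self.holder = new_holder

    def use(self, collection_gb: float) -> None:
        """Exercise the deed to store a copy of a collection at the issuer."""
        if collection_gb > self.gigabytes:
            raise ValueError("collection does not fit in the deeded space")
        self.gigabytes -= collection_gb


# Sample deed from the slide (assumed reading: Stanford provides the space).
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)
part = deed.split(200)     # split: 200 GB carved off, 423 GB remain
part.transfer("Site X")    # transferred to a hypothetical third site
deed.use(400)              # used: the leftover space is simply saved for later
```

Keeping "saved" implicit (an unused deed just retains its remaining space) is a design choice of this sketch, not something the slides specify.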
18
Data trading 18 Deed trading
[Figure: sites A, B, and C; Collections 1, 2, and 3]
19
Data trading 19 The challenge
[Figure: sites A, B, and C holding copies of Collections 1, 2, and 3]
20
Data trading 20 The challenge
[Figure: sites A, B, and C holding further copies of Collections 1, 2, and 3]
21
Data trading 21 Alternative solutions
Are there other ways besides trading?
22
Data trading 22 Other solutions: central control
[Figure: sites A, B, and C holding copies of Collections 1, 2, and 3]
23
Data trading 23 Other solutions: client-based
[Figure: sites A, B, and C holding copies of Collections 1, 2, and 3]
24
Data trading 24 Other solutions: random
[Figure: sites A, B, and C holding copies of Collections 1, 2, and 3]
25
Data trading 25 Why is trading good?
High reliability: framework for replication
Site autonomy: make local decisions; no submission to external authority
Fairness: contribute more = more reliability; must contribute resources
[Figure: trading network of sites A, B, C, D, E, F, G, H]
26
Data trading 26 Decisions facing an archive
Who to trade with
Providing space
Advertising space
Picking a number of copies
Joining a cluster
Coping with varying site reliabilities
27
Data trading 27 How do we evaluate policies?
Trading simulator: generate scenario; simulate trading with different policies; evaluate reliability for each policy; compare each policy
28
Data trading 28 Simulation parameters
Number of sites: 2 to 15
Site reliability: 0.5 to 0.8
Collections per site: 4 to 25
Data per collection: 50 GB to 1000 GB
Space per site: 2x data to 7x data
Replication goal: 2 to 15 copies
Scenarios per simulation: 200
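As a rough illustration of the "generate scenario" step, the parameter ranges above could be sampled like this. The slides give only the ranges, so several things here are assumptions: uniform sampling, reading "space per site" as a multiple of the site's own data, and the `generate_scenario` function and dictionary keys, which are hypothetical names.

```python
import random


def generate_scenario(rng: random.Random) -> dict:
    """Draw one scenario from the parameter ranges on the slide (uniformly, by assumption)."""
    sites = []
    for i in range(rng.randint(2, 15)):                  # number of sites
        collections_gb = [rng.uniform(50, 1000)          # data per collection, in GB
                          for _ in range(rng.randint(4, 25))]
        total_gb = sum(collections_gb)
        sites.append({
            "name": f"site{i}",
            "reliability": rng.uniform(0.5, 0.8),        # site reliability
            "collections_gb": collections_gb,
            "space_gb": total_gb * rng.uniform(2, 7),    # 2x to 7x the site's own data
        })
    return {"sites": sites, "replication_goal": rng.randint(2, 15)}


# 200 scenarios per simulation; a fixed seed per scenario keeps runs repeatable.
scenarios = [generate_scenario(random.Random(seed)) for seed in range(200)]
```

Each scenario would then be replayed once per policy so that policies are compared on identical inputs.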
29
Data trading 29 Reliability
Site reliability: will a site fail? Example: 0.9 = 10% chance of failure
Data reliability: how safe is the data, despite site failures? Example: 320 year MTTF
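One simple way to connect the two notions is sketched below. It assumes that sites fail independently and that a collection is lost only when every site holding a copy fails. The slides do not state this model, and the `data_reliability` helper is a hypothetical name.

```python
from math import prod


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, assuming independent site failures."""
    # Each site fails with probability (1 - r); the data is lost only if all of them fail.
    return 1.0 - prod(1.0 - r for r in site_reliabilities)


# Three copies on sites with reliability 0.9 each: about 0.999.
three_copies = data_reliability([0.9, 0.9, 0.9])
```

Under this model a single copy is exactly as reliable as the site holding it, and each extra copy multiplies the loss probability by that site's failure probability.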
30
Data trading 30 Example: trading strategy
Who should we try to trade with?
The most reliable sites?
Sites with reliability close to ours?
The sites we have traded with before?
Some other policy (like random)?
31
Data trading 31 Example: trading strategy
[Figure: plot labeled R=0.8]
32
Data trading 32 Example: reliability estimates
Cannot predict when a site will fail
Estimate site reliabilities: past performance; reputation; components (Arturo Crespo’s work)
How does that impact policies? Estimate error affects resulting data reliability
33
Data trading 33 Example: reliability estimates
[Figure: plot with label "Ignore reliability"]
34
Data trading 34 Results
How much space? How many copies?
Related questions: more space = more copies
Result: for n copies, provide n + 1 space
No need for central control, lots of space
35
Data trading 35 Results
Clusters of sites? Social or political clusters, e.g. all universities within a particular state
Is the cluster big enough? What if it isn’t?
Result: a few archives are sufficient, e.g. 5 archives to make 3 copies; too many sites is counter-productive
36
Data trading 36 Trading clusters
37
Data trading 37 Trading strategy
Goal: pick a good trading partner
Strategy = order to contact remote sites
Strategies
Clustering: trade with previous partners
Best fit: trade with the site whose free space “best fits” the collection
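Since a strategy is defined as an order in which to contact remote sites, the two strategies on this slide could be sketched as sort orders. The function names, the dictionary layout, and the reading of "best fits" as "smallest leftover free space among sites that can hold the collection" are assumptions made for illustration.

```python
def clustering_order(sites, previous_partners):
    """Contact previous trading partners first; otherwise keep the given order."""
    # sorted() is stable, and False sorts before True, so partners move to the front.
    return sorted(sites, key=lambda s: s["name"] not in previous_partners)


def best_fit_order(sites, collection_gb):
    """Contact sites whose free space most tightly fits the collection first."""
    fits = [s for s in sites if s["free_gb"] >= collection_gb]
    too_small = [s for s in sites if s["free_gb"] < collection_gb]
    # Among sites that can hold the collection, least leftover space comes first.
    return sorted(fits, key=lambda s: s["free_gb"] - collection_gb) + too_small


# Hypothetical remote sites and their advertised free space.
sites = [
    {"name": "A", "free_gb": 900},
    {"name": "B", "free_gb": 300},
    {"name": "C", "free_gb": 500},
    {"name": "D", "free_gb": 100},
]
by_cluster = [s["name"] for s in clustering_order(sites, {"C"})]
by_fit = [s["name"] for s in best_fit_order(sites, 250)]
```

With these sample values, clustering contacts C first, while best fit for a 250 GB collection contacts B first and leaves D, which is too small, for last.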
38
Data trading 38 Trading strategy
New strategies to deal with reliability: highest reliability; lowest reliability; closest reliability
Weighted strategies: weighted clustering; weighted best fit
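The three reliability-based strategies can likewise be sketched as contact orders over estimated site reliabilities. The function names and sample values are hypothetical, and the weighted variants are left out because the slides do not say how the weights are defined.

```python
def highest_reliability(sites):
    """Contact the most reliable sites first."""
    return sorted(sites, key=lambda s: s["reliability"], reverse=True)


def lowest_reliability(sites):
    """Contact the least reliable sites first."""
    return sorted(sites, key=lambda s: s["reliability"])


def closest_reliability(sites, own_reliability):
    """Contact sites whose reliability is nearest to the local site's first."""
    return sorted(sites, key=lambda s: abs(s["reliability"] - own_reliability))


# Hypothetical remote sites with estimated reliabilities.
remote = [
    {"name": "A", "reliability": 0.5},
    {"name": "B", "reliability": 0.65},
    {"name": "C", "reliability": 0.8},
]
highest = [s["name"] for s in highest_reliability(remote)]
lowest = [s["name"] for s in lowest_reliability(remote)]
closest = [s["name"] for s in closest_reliability(remote, own_reliability=0.6)]
```

For a local site with reliability 0.6, the closest-reliability order here is B, then A, then C.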
39
Data trading 39 Current and future work
Bidding versus direct trading: local site holds an auction; bids = size of local site’s deed
“Deviant” sites: greedy sites; follow protocol but do not play nice
Access: support searching over collections; distribute indexes via trading
40
Data trading 40 Current and future work
Security
Will sites actually preserve data?
Will they give it to others?
Can I protect sensitive information?
What if I fail and lose my keys?
Can I authenticate myself?
41
Data trading 41 Other parts of SAV project
SAV data model: write-once objects; signature-based naming
How to get objects into SAV: InfoMonitor (filesystem); other inputs (Web, DBMS, etc.)
Modeling archival repositories (Arturo Crespo): choose best components and design
42
Data trading 42 Related work
Peer-to-peer replication: SAV, Intermemory, LOCKSS, OceanStore…
Fault tolerant systems: RAID, mirrored disks, replicated databases
Caching systems (Andrew, Coda)
Barter/auction based systems: ContractNet
Distributed resource allocation: File Allocation Problem
43
Data trading 43 Conclusion
Important, exciting area: preservation critical; difficult to accomplish
Many decisions are ad hoc today: an effective framework is needed; scientific evaluation of decisions
Trading networks replicate data: model for trading networks; trading algorithm; simulation results
[Figure: trading network of sites A, B, C, D, E, F, G, H]
44
Data trading 44 For more information
cooperb@stanford.edu
http://www-diglib.stanford.edu/