1
Peer-to-peer archival data trading. Brian Cooper. Joint work with Hector Garcia-Molina (and others). Stanford University.
2
Data trading 2 Problem: fragile data. Data is easy to create but hard to preserve: broken tapes, human deletions, organizations going out of business.
3
Data trading 3 Replication-based preservation
5
Data trading 5 Motivation: Several systems use replication to preserve digital collections (SAV, others), as the archival part of a digital library. Individual organizations cooperate and do not have a lot of money to spend.
6
Data trading 6 Goal: reliable replication of digital collections, given that resources are limited, sites are autonomous, and not all sites are equal. Traditional methods: central control; random; replicate popular. Metric: reliability, not necessarily “efficiency”.
7
Data trading 7 Our solution: data trading. “I’ll store a copy of your collection if you’ll store a copy of mine.” Sites make local decisions: who to trade with, how many copies to make, how much space to provide, etc.
8
Data trading 8 Trading network: a series of binary, peer-to-peer trading links. [Figure: sites A through H connected by trading links.]
9
Data trading 9 Architecture. [Figure labels: Users, Filesystem, InfoMonitor, SAV Archive, Service layer, Reliability layer, Archived data, Internet, Local archive, Remote archive.] This architecture was developed with Arturo Crespo.
10
Data trading 10 Overview: trading model; trading algorithm; optimizing (and simulating) trading; some results; some stuff we are still working on.
11
Data trading 11 Trading model
16
Data trading 16 Trading model. Archive site: an autonomous archiving provider. Digital collection: a set of related digital materials. Archival storage: stores locally and remotely owned digital collections. Archiving client: deposits and retrieves materials. Data reliability: the probability that data is not lost.
17
Data trading 17 Deeds: a right to use space at another site. A bookkeeping mechanism for trades; a deed can be used, saved, split, or transferred. Trading algorithm: sites trade deeds, then exercise deeds to replicate collections. [Figure: sample deed reading “Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University.”]
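The deed operations named on this slide (use, save, split, transfer) can be sketched as a small immutable record. This is only an illustration, not the SAV implementation: the `Deed` class, its field names, and the reading of the sample deed (Stanford University as the site granting the space, Library of Congress as the holder) are assumptions, and "Site C" is a made-up recipient.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deed:
    """Hypothetical bookkeeping record: a right to use space at another site."""
    issuer: str      # site whose space the deed covers (assumed: Stanford)
    holder: str      # site currently entitled to use the space
    gigabytes: int   # amount of space the deed is good for

    def split(self, amount):
        """Split into two deeds: one for `amount` GB, one for the remainder."""
        if not 0 < amount < self.gigabytes:
            raise ValueError("split amount must be between 0 and the deed size")
        return (replace(self, gigabytes=amount),
                replace(self, gigabytes=self.gigabytes - amount))

    def transfer(self, new_holder):
        """Hand the deed to another site; the issuer stays the same."""
        return replace(self, holder=new_holder)


# The sample deed from the slide: 623 GB, for use by the Library of Congress.
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)

# Use 200 GB now (to store a replica) and save the rest for later.
used, saved = deed.split(200)

# A saved deed can also be transferred to a third site ("Site C" is made up).
given = saved.transfer("Site C")
```

Keeping deeds immutable means every split or transfer produces a new record, which makes the bookkeeping easy to audit.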
18
Data trading 18 Deed trading. [Figure: sites A, B, and C with Collection 1, Collection 2, and Collection 3.]
19
Data trading 19 The challenge. [Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3.]
20
Data trading 20 The challenge. [Figure: sites A, B, and C holding further copies of Collection 1, Collection 2, and Collection 3.]
21
Data trading 21 Alternative solutions: are there other ways besides trading?
22
Data trading 22 Other solutions: central control. [Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3.]
23
Data trading 23 Other solutions: client-based. [Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3.]
24
Data trading 24 Other solutions: random. [Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3.]
25
Data trading 25 Why is trading good? High reliability: a framework for replication. Site autonomy: make local decisions; no submission to external authority. Fairness: contribute more = more reliability; must contribute resources. [Figure: trading network of sites A through H.]
26
Data trading 26 Decisions facing an archive: who to trade with; how much to trade; when to ask for a trade; providing space; advertising space; picking a number of copies; coping with varying site reliabilities; what to do with acquired resources; how to deliver other services. Many, many degrees of freedom!
27
Data trading 27 Our approach. Define a basic trading protocol (deed trading): assume all sites follow the same rules; a basic system for trading. Extend: not all sites are equal; some are more reliable or trusted. Extend: sites have freedom to negotiate (bid trading). Extend: some sites are malicious; ensure documents survive despite evildoers. For each model, what policies are best?
28
Data trading 28 How do we evaluate policies? Trading simulator: generate a scenario, simulate trading with different policies, evaluate reliability for each policy, and compare the policies.
29
Data trading 29 Simulation parameters
Number of sites: 2 to 15
Site reliability: 0.5 to 0.8
Collections per site: 4 to 25
Data per collection: 50 GB to 1000 GB
Space per site: 2x data to 7x data
Replication goal: 2 to 15 copies
Scenarios per simulation: 200
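As a rough illustration of how scenarios could be drawn from these ranges, here is a minimal sketch. The slide gives only the ranges, so several things are assumptions: that each value is sampled uniformly and independently, that "space per site" is a multiple of the site's own data, and the names `Site`, `Scenario`, and `generate_scenario`. The actual simulator may instead fix some parameters and vary others.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    reliability: float          # probability that the site does not fail
    collection_sizes_gb: list   # one entry per locally owned collection
    space_gb: float             # total storage the site provides


@dataclass
class Scenario:
    sites: list
    replication_goal: int       # target number of copies per collection


def generate_scenario(rng):
    """Draw one scenario from the parameter ranges on the slide (uniformly, by assumption)."""
    sites = []
    for _ in range(rng.randint(2, 15)):                       # number of sites
        sizes = [rng.uniform(50, 1000)                        # data per collection (GB)
                 for _ in range(rng.randint(4, 25))]          # collections per site
        space = sum(sizes) * rng.uniform(2, 7)                # 2x to 7x the site's data
        sites.append(Site(rng.uniform(0.5, 0.8), sizes, space))
    return Scenario(sites, rng.randint(2, 15))                # replication goal


rng = random.Random(0)  # fixed seed so a run can be repeated
scenarios = [generate_scenario(rng) for _ in range(200)]      # scenarios per simulation
```

Passing the random generator in explicitly keeps each simulation reproducible, which matters when different policies are compared on the same set of scenarios.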
30
Data trading 30 Reliability. Site reliability: will a site fail? Example: 0.9 = 10% chance of failure. Data reliability: how safe is the data, despite site failures? Example: 320-year MTTF.
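The link between site reliability and data reliability can be illustrated with a short calculation. This sketch assumes that sites fail independently and that a collection is lost only when every site holding a copy fails; the slide does not state these assumptions, and the function name `data_reliability` is made up. It does not model time, so it says nothing about how an MTTF figure such as the 320 years above is derived.

```python
from math import prod


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, assuming independent site failures."""
    # Each site fails with probability (1 - r); the data is lost only if all of them fail.
    return 1 - prod(1 - r for r in site_reliabilities)


# One copy at a site with reliability 0.9: a 10% chance of loss.
single = data_reliability([0.9])

# Two copies at two such sites: both must fail, so the chance of loss is 0.1 * 0.1.
double = data_reliability([0.9, 0.9])

# Copies at the two extremes of the simulated range (0.5 and 0.8).
mixed = data_reliability([0.5, 0.8])
```

Under these assumptions the three values come out at about 0.9, 0.99, and 0.9, which shows why extra copies at even mediocre sites raise data reliability quickly.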
31
Data trading 31 Basic trading approach: how does trading work, assuming all sites follow “the rules”? Example: advertising policy. “Let’s trade. How much space do you have?” [Figure: sites A and B.]
32
Data trading 32 Advertising policy. “I have 120 GB” (120 GB). Space fractional policy: “I have 60 GB” (60 GB). Data proportional policy: “I have 40 GB” (40 GB; label: Data). [Figure: three exchanges between sites A and B.]
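The three answers on this slide can be reproduced with a small sketch of advertising policies. Much of it is assumed rather than taken from the slide: the function names, the reading of the first answer as "advertise all free space", the fraction of 0.5, and the reading of the data proportional policy as a multiple (here 2x) of the advertising site's own 20 GB of data. The numbers were chosen only so that the outputs match the 120 GB, 60 GB, and 40 GB shown.

```python
def advertise_all(free_gb):
    """Advertise every free gigabyte (assumed reading of the first answer)."""
    return free_gb


def advertise_space_fraction(free_gb, fraction=0.5):
    """Space fractional policy: advertise a fixed fraction of free space (fraction is assumed)."""
    return free_gb * fraction


def advertise_data_proportional(free_gb, own_data_gb, multiple=2.0):
    """Data proportional policy: advertise a multiple of the site's own data,
    capped at the space actually free (multiple and cap are assumptions)."""
    return min(free_gb, own_data_gb * multiple)


free_gb = 120       # free space at the site being asked (from the slide)
own_data_gb = 20    # hypothetical amount of locally owned data

offers = {
    "all free space": advertise_all(free_gb),
    "space fractional": advertise_space_fraction(free_gb),
    "data proportional": advertise_data_proportional(free_gb, own_data_gb),
}
```

Writing each policy as a plain function makes it straightforward to swap policies in a simulator and compare the resulting reliability.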
33
Data trading 33 Result
34
Data trading 34 Extend: some sites > others. A site may prefer certain sites: more reliable, better reputation, part of the same system. Example: who to trade with? [Figure: site A choosing among three candidate sites.]
35
Data trading 35 Who to trade with?
36
Data trading 36 Extend: freedom to negotiate. Bid for trades. A: “How much do I pay for 100 GB of your space?” Bids: “80 GB”, “95 GB”, “120 GB”.
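The auction on this slide can be sketched in a few lines. The slide shows only the question and the three bids, so the rest is assumed: that each bid is the amount of A's space a bidder wants in return for 100 GB of its own, that A accepts the lowest price, and the bidder names B, C, and D.

```python
def pick_winning_bid(bids):
    """Return (bidder, price_gb) for the cheapest bid; lowest-price-wins is an assumption."""
    if not bids:
        raise ValueError("no bids received")
    return min(bids.items(), key=lambda item: item[1])


requested_gb = 100  # space that site A wants to acquire (from the slide)

# Price each bidder asks, in GB of A's space (bidder names are hypothetical).
bids = {"B": 80, "C": 95, "D": 120}

winner, price_gb = pick_winning_bid(bids)
```

With these bids the sketch selects B at 80 GB, so A would give up less space than it gains. Whether a real site should always take the lowest bid, or also weigh the bidder's reliability, is exactly the kind of policy question the talk raises.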
37
Data trading 37 Bid trading questions: When do I call auctions? How much do I bid? Can I take advantage of the system by being clever?
38
Data trading 38 Extend: some sites are malicious. Secure services: Publish (makes copies to survive failures); Search (find documents); Retrieve (get a copy of a document). Challenges: an attacker may delete a copy, provide fake search results, provide an altered document, disrupt message routing, … Joint work with Mayank Bawa and Neil Daswani.
39
Data trading 39 Current and future work. Access: support searching over collections; distribute indexes via trading. Prototype implementation: basic SAV architecture implemented; trading protocol/policies must be added. Develop security techniques further.
40
Data trading 40 Current and future work: other topics of interest. Designing peer-to-peer primitives; building other p2p services; other ways of acquiring data; how to archive active systems; semantic archiving; managing “format obsolescence”; finding data once it is archived.
41
Data trading 41 Other parts of SAV project. SAV data model: write-once objects; signature-based naming. How to get objects into SAV: InfoMonitor (filesystem); other inputs (Web, DBMS, etc.). Modeling archival repositories (Arturo Crespo): choose best components and design.
42
Data trading 42 Related work. Peer-to-peer replication: SAV, Intermemory, LOCKSS, OceanStore, … Fault tolerant systems: RAID, mirrored disks, replicated databases; caching systems (Andrew, Coda); deep storage (Tivoli). Barter/auction based systems: ContractNet. Distributed resource allocation: File Allocation Problem.
43
Data trading 43 Conclusion. Important, exciting area: preservation is critical but difficult to accomplish. Many decisions are ad hoc today; an effective framework is needed, with scientific evaluation of decisions. Trading networks replicate data: model for trading networks; trading algorithm; simulation results. [Figure: trading network of sites A through H.]
44
Data trading 44 For more information: cooperb@stanford.edu, http://www-diglib.stanford.edu/