Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University.


1 Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University

2 Data trading 2 Problem: Fragile Data Data: easy to create, hard to preserve Broken tapes Human deletions Going out of business

3 Data trading 3 Replication-based preservation

5 Motivation: several systems (SAV, others) use replication to preserve digital collections; archival part of a digital library; individual organizations cooperate; not a lot of money to spend.

6 Goal: reliable replication of digital collections, given that resources are limited, sites are autonomous, and not all sites are equal. Traditional methods: central control; random; replicate popular. Metric: reliability, not necessarily “efficiency”.

7 Our solution: data trading. “I’ll store a copy of your collection if you’ll store a copy of mine.” Sites make local decisions: who to trade with, how many copies to make, how much space to provide, etc.

8 Trading network: a series of binary, peer-to-peer trading links. [Diagram: sites A through H joined by trading links]

9 Architecture [Diagram labels: Users, Filesystem, InfoMonitor, SAV Archive, Service layer, Reliability layer, Archived data, Local archive, Remote archive, Internet] This architecture developed with Arturo Crespo.

10 Overview: trading model; trading algorithm; optimizing (and simulating) trading; some results; some stuff we are still working on.

16 Trading model. Archive site: an autonomous archiving provider. Digital collection: a set of related digital materials. Archival storage: stores locally and remotely owned digital collections. Archiving client: deposits and retrieves materials. Data reliability: probability that data is not lost.

17 Deeds: a right to use space at another site; a bookkeeping mechanism for trades; can be used, saved, split, or transferred. Trading algorithm: sites trade deeds; sites exercise deeds to replicate collections. [Example deed: “Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University.”]
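
The deed operations listed on this slide (used, saved, split, transferred) can be sketched as a small data structure. This is only an illustration, not the SAV implementation: the `Deed` class, its field names, and the reading of the example deed as issued by Stanford University for use by the Library of Congress are all assumptions, and "Site C" is a made-up third site.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """A right to use `size_gb` of space at `issuer`, held by `holder`."""
    issuer: str      # site that must provide the space (assumed field)
    holder: str      # site currently entitled to use it (assumed field)
    size_gb: float   # amount of space the deed covers

    def split(self, part_gb: float) -> "Deed":
        """Carve off a smaller deed; this deed keeps the remainder."""
        if not 0 < part_gb < self.size_gb:
            raise ValueError("part must be smaller than the deed")
        self.size_gb -= part_gb
        return Deed(self.issuer, self.holder, part_gb)

    def transfer(self, new_holder: str) -> None:
        """Hand the deed to another site."""
        self.holder = new_holder

    def exercise(self, collection_gb: float) -> float:
        """Use the deed to store a collection; return the unused space."""
        if collection_gb > self.size_gb:
            raise ValueError("collection does not fit in the deed")
        self.size_gb -= collection_gb
        return self.size_gb


deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
spare = deed.split(123)          # keep 500 GB, carve off a 123 GB deed
spare.transfer("Site C")         # hypothetical third site
remaining = deed.exercise(200)   # replicate a 200 GB collection at the issuer
```

A deed that is neither exercised nor transferred is simply kept, which corresponds to "saved" on the slide.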

18 Deed trading [Diagram: sites A, B, and C; Collection 1, Collection 2, Collection 3]

19 The challenge [Diagram: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

20 The challenge [Diagram, next build: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

21 Alternative solutions: are there other ways besides trading?

22 Other solutions: central control [Diagram: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

23 Other solutions: client-based [Diagram: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

24 Other solutions: random [Diagram: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

25 Why is trading good? High reliability: framework for replication. Site autonomy: make local decisions; no submission to external authority. Fairness: contribute more = more reliability; must contribute resources. [Diagram: sites A through H]

26 Decisions facing an archive: who to trade with; how much to trade; when to ask for a trade; providing space; advertising space; picking a number of copies; coping with varying site reliabilities; what to do with acquired resources; how to deliver other services. Many, many degrees of freedom!

27 Our approach. Define a basic trading protocol (deed trading): assume all sites follow the same rules; basic system for trading. Extend: not all sites are equal; some are more reliable or trusted. Extend: sites have freedom to negotiate (bid trading). Extend: some sites are malicious; ensure documents survive despite evildoers. For each model, what policies are best?

28 How do we evaluate policies? Trading simulator: generate scenario; simulate trading with different policies; evaluate reliability for each policy; compare each policy.

29 Simulation parameters
Number of sites: 2 to 15
Site reliability: 0.5 to 0.8
Collections per site: 4 to 25
Data per collection: 50 GB to 1000 GB
Space per site: 2x data to 7x data
Replication goal: 2 to 15 copies
Scenarios per simulation: 200
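
To make the parameter ranges concrete, here is a rough sketch of a scenario generator. It is not the authors' simulator: the `Site` record, the `generate_scenario` function, and the choice to draw every value uniformly and independently are assumptions made for illustration, and the replication goal is left out because it belongs to the trading policy rather than to the scenario.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    reliability: float     # probability the site does not fail
    collections_gb: list   # size of each locally owned collection
    space_gb: float        # total storage the site provides


def generate_scenario(rng: random.Random) -> list:
    """Draw one random scenario from the parameter ranges on the slide."""
    sites = []
    for _ in range(rng.randint(2, 15)):              # number of sites
        n_collections = rng.randint(4, 25)           # collections per site
        sizes = [rng.uniform(50, 1000) for _ in range(n_collections)]
        space_factor = rng.uniform(2, 7)             # space is 2x to 7x data
        sites.append(Site(
            reliability=rng.uniform(0.5, 0.8),
            collections_gb=sizes,
            space_gb=space_factor * sum(sizes),
        ))
    return sites


rng = random.Random(0)  # fixed seed so runs are repeatable
scenarios = [generate_scenario(rng) for _ in range(200)]  # scenarios per simulation
```

Each policy under test would then be run against the same 200 scenarios so that reliability figures are comparable.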

30 Reliability. Site reliability: will a site fail? Example: 0.9 = 10% chance of failure. Data reliability: how safe is the data, despite site failures? Example: 320 year MTTF.
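
One simple way to connect the two notions, assuming sites fail independently (an assumption not stated on the slide), is that a collection is lost only if every site holding a copy fails. The function name `data_reliability` is made up for this sketch, and it does not attempt to reproduce the MTTF figure.

```python
from math import prod


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, given independent failures."""
    # Each site fails with probability (1 - r); data is lost only if all fail.
    return 1.0 - prod(1.0 - r for r in site_reliabilities)


one_copy = data_reliability([0.9])                # a single copy on a 0.9 site
three_copies = data_reliability([0.9, 0.9, 0.9])  # copies on three 0.9 sites
```

With one copy the data is exactly as safe as its site (about 0.9); with three copies the loss probability falls to about 0.1 × 0.1 × 0.1 = 0.001.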

31 Basic trading approach: how does trading work, assuming all sites follow “the rules”? Example: advertising policy. “Let’s trade. How much space do you have?” [Diagram: sites A and B]

32 Advertising policy: “I have 120 GB” (120 GB). Space fractional policy: “I have 60 GB” (60 GB). Data proportional policy: “I have 40 GB” (40 GB; Data). [Diagram: sites A and B, shown once per policy]
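
The three answers on this slide can be reproduced with a short sketch. The function names, the fraction of one half, the proportionality factor of 1, and the reading of "data" as the 40 GB of data owned by the advertising site are all assumptions chosen to match the numbers shown; the slide does not define them.

```python
def advertise_all(free_gb: float) -> float:
    """Plain advertising: report all free space."""
    return free_gb


def advertise_fraction(free_gb: float, fraction: float = 0.5) -> float:
    """Space fractional: report only a fixed fraction of free space."""
    return free_gb * fraction


def advertise_data_proportional(free_gb: float, data_gb: float,
                                factor: float = 1.0) -> float:
    """Data proportional: report space in proportion to locally owned data."""
    return min(free_gb, data_gb * factor)  # never promise more than is free


free_gb = 120   # free space from the slide
data_gb = 40    # assumed amount of data the site owns

answers = (
    advertise_all(free_gb),
    advertise_fraction(free_gb),
    advertise_data_proportional(free_gb, data_gb),
)
```

Under these assumptions the three policies answer 120 GB, 60 GB, and 40 GB, as on the slide.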

33 Result

34 Extend: some sites > others. May prefer certain sites: more reliable; better reputation; part of same system. Example: who to trade with? [Diagram: site A and three candidate sites marked “?”]

35 Who to trade with?

36 Extend: freedom to negotiate. Bid for trades: “How much do I pay for 100 GB of your space?” Replies: “80 GB”, “95 GB”, “120 GB”. [Diagram: site A]
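
A minimal sketch of how the asking site might choose among the replies, assuming each reply is the amount of the asker's space a bidder wants in return and that the cheapest offer wins. Neither rule is stated on the slide, and the bidder names B, C, and D are hypothetical.

```python
def pick_winner(bids_gb: dict) -> tuple:
    """Return (bidder, price) for the lowest bid; ties go to the first seen."""
    winner = min(bids_gb, key=bids_gb.get)
    return winner, bids_gb[winner]


# Replies to "How much do I pay for 100 GB of your space?"
bids = {"B": 80, "C": 95, "D": 120}   # bidder names are made up
winner, price = pick_winner(bids)
```

Here the asker would trade with B, giving up 80 GB of its own space for 100 GB of B's. A real policy could also weigh bidder reliability, not just price.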

37 Bid trading questions: When do I call auctions? How much do I bid? Can I take advantage of the system by being clever?

38 Extend: some sites are malicious. Secure services: Publish (makes copies to survive failures); Search (find documents); Retrieve (get a copy of a document). Challenges: attacker may delete copy; attacker may provide fake search results; attacker may provide altered document; attacker may disrupt message routing; … Joint work with Mayank Bawa and Neil Daswani.

39 Current and future work. Access: support searching over collections; distribute indexes via trading. Prototype implementation: basic SAV architecture implemented; trading protocol/policies must be added. Develop security techniques further.

40 Current and future work, other topics of interest: designing peer-to-peer primitives; building other p2p services; other ways of acquiring data; how to archive active systems; semantic archiving; managing “format obsolescence”; finding data once it is archived.

41 Other parts of SAV project. SAV data model: write-once objects; signature-based naming. How to get objects into SAV: InfoMonitor (filesystem); other inputs (Web, DBMS, etc.). Modeling archival repositories (Arturo Crespo): choose best components and design.

42 Related work. Peer-to-peer replication: SAV, Intermemory, LOCKSS, OceanStore… Fault tolerant systems: RAID, mirrored disks, replicated databases; caching systems (Andrew, Coda); deep storage (Tivoli). Barter/auction based systems: ContractNet. Distributed resource allocation: File Allocation Problem.

43 Conclusion. Important, exciting area: preservation critical; difficult to accomplish. Many decisions are ad hoc today: an effective framework is needed; scientific evaluation of decisions. Trading networks replicate data: model for trading networks; trading algorithm; simulation results. [Diagram: sites A through H]

44 For more information: cooperb@stanford.edu, http://www-diglib.stanford.edu/

