Creating Trading Networks


1 Creating Trading Networks
Thom Lutkenhouse Digital Preservation March 25, 2004

2 Paper Covered
"Creating Trading Networks of Digital Archives," Proceedings of JCDL 2001, by Brian Cooper and Hector Garcia-Molina

3 Focus of Research “Preserving the Bits”
Developing P2P trading networks considering:
  Varying reliability rates
  Political affiliations between archives
  Different levels of archival investment
  Estimation of global failure rates

4 Outside the Scope
Legal issues
Format and platform changes
Access issues
Provenance

5 Digital Collection
“A set of related digital material that is managed by an archive site. Examples include issues of a digital journal, geographic information service data, or a collection of technical reports. . . we consider the collection to be a single unit for the purposes of replication.”

6 Archival Storage Calculation
N = bytes of local digital collection
F × N = total storage space
Ptotal (public storage to trade) = F × N − N
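The relation Ptotal = F × N − N can be checked with a few lines of Python. This is only a sketch: the function name `public_storage` and the sample numbers (a 100 GB collection, F = 3) are assumptions made for illustration and do not come from the paper.

```python
def public_storage(local_bytes: float, storage_factor: float) -> float:
    """Return Ptotal = F * N - N, the space left over for trading.

    local_bytes    -- N, the size of the site's own collections
    storage_factor -- F, the multiplier giving total storage (F * N)
    """
    if storage_factor < 1:
        # With F < 1 the site could not even hold its own collections.
        raise ValueError("storage_factor must be at least 1")
    return storage_factor * local_bytes - local_bytes


# Hypothetical numbers: 100 GB of local collections and F = 3
# leave 200 GB of public storage to trade.
print(public_storage(100, 3))
```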

7 Data Reliability
Global data reliability: the probability that no collection owned by any site is lost
Local data reliability: the probability that no collection owned by a particular site is lost

8 Calculation of Global Reliability
Each site represents an archive, and each number a digital collection stored in that archive. Here we assume we can accurately estimate local data reliability; for this example, each site has a reliability of 0.9 (a 10% chance of data loss).
Site A: collections 1, 3
Site B: collections 2, 3
Site C: collections 3, 1, 2

9 Calculation of Global Reliability
Probability of losing collection 1 (sites A and C fail, B survives): 0.1 * 0.9 * 0.1 = 0.009
Probability of losing collection 2 (sites B and C fail, A survives): 0.9 * 0.1 * 0.1 = 0.009
Probability of losing collection 3 (all three sites fail): 0.1 * 0.1 * 0.1 = 0.001
Sum of above = 0.019
Global data reliability = 1 - 0.019 = 0.981
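The same figure can be reproduced by enumerating every combination of site failures. The sketch below assumes, as the multiplication in the example implies, that sites fail independently. The function and variable names are illustrative and not taken from the paper.

```python
from itertools import product

# Placement from the example: site -> collections stored there.
placement = {"A": {1, 3}, "B": {2, 3}, "C": {1, 2, 3}}
# Each site is assumed to survive with probability 0.9.
site_reliability = {"A": 0.9, "B": 0.9, "C": 0.9}


def global_reliability(placement, site_reliability):
    """Probability that every collection survives on at least one site."""
    sites = sorted(placement)
    collections = set().union(*placement.values())
    total = 0.0
    # Walk through all up/down combinations of the sites.
    for state in product([True, False], repeat=len(sites)):
        prob = 1.0
        surviving = set()
        for site, up in zip(sites, state):
            r = site_reliability[site]
            prob *= r if up else 1.0 - r
            if up:
                surviving |= placement[site]
        # Count this state only if no collection was lost.
        if surviving == collections:
            total += prob
    return total


print(round(global_reliability(placement, site_reliability), 3))
```

Enumeration costs 2^n states for n sites, so it suits small examples like this one rather than large networks.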

10 Mean Time to Failure
Given data reliability over a certain interval, we can calculate Mean Time to Failure (MTTF). This is the expected number of years before data loss. MTTF is the principal metric used to judge the effectiveness of the simulated trading strategies.
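The slides do not give the conversion formula. One common way to do it, shown here purely as an assumption, is to treat the reliability as a per-year survival probability with independent years, so that the time to first loss is geometric with mean 1 / (1 − R). The function name `mttf_years` is hypothetical.

```python
def mttf_years(reliability_per_year: float) -> float:
    """Expected years until data loss, assuming independent yearly
    intervals that each survive with the given probability."""
    if not 0.0 <= reliability_per_year < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return 1.0 / (1.0 - reliability_per_year)


# If 0.981 were a one-year reliability, the MTTF under this
# assumption would be roughly 52.6 years.
print(round(mttf_years(0.981), 1))
```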

11 Clusters
Sites that have agreed to form partnerships for political, social or economic reasons, e.g., all libraries in a state university system

12 Building the Trading Network
Determine the number of sites in the network
Estimate the reliability of each site:
  Past behavior of site
  Components of site’s storage mechanism
  Reputation of site or institution

13 Deeds
“A deed represents the right of a local site to use space at a remote site. Deeds can be used to store collections, kept for future use, transferred to other sites that need them or split into smaller deeds.”

14 Conducting Trades
Trades are executed by means of a Deed Trading algorithm that is run at each participating site in accordance with its local Trading Strategy. The Trading Strategy determines the order in which other sites will be contacted to initiate a trade. Strategies may include:
  Best Fit
  Worst Fit
  Clustering (trading with sites you’ve traded with before)
  Best Reliability
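A trading strategy can be thought of as a sort order over candidate sites. The sketch below is illustrative only. The slides do not define Best Fit or Worst Fit, so the readings used here (smallest or largest adequate free space first, as in memory allocation) are assumptions, as are the `Site` fields and the `contact_order` function.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_space: float          # GB available for trade (hypothetical field)
    reliability: float         # estimated site reliability
    traded_before: bool = False


def contact_order(sites, strategy, needed):
    """Return sites in the order they would be contacted."""
    if strategy == "best_fit":
        # Assumed meaning: tightest adequate fit first.
        fits = [s for s in sites if s.free_space >= needed]
        return sorted(fits, key=lambda s: s.free_space)
    if strategy == "worst_fit":
        # Assumed meaning: most free space first.
        fits = [s for s in sites if s.free_space >= needed]
        return sorted(fits, key=lambda s: s.free_space, reverse=True)
    if strategy == "clustering":
        # Previous trading partners first; sorted() keeps ties in order.
        return sorted(sites, key=lambda s: not s.traded_before)
    if strategy == "best_reliability":
        return sorted(sites, key=lambda s: s.reliability, reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")


sites = [
    Site("X", 50, 0.90),
    Site("Y", 200, 0.70, traded_before=True),
    Site("Z", 120, 0.95),
]
for name in ("best_fit", "worst_fit", "clustering", "best_reliability"):
    print(name, [s.name for s in contact_order(sites, name, needed=100)])
```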

15 Weighted Trading
Available space is weighted by reliability in determining a fair trade.
Example: 100 GB × 0.75 reliability = 75 GB × 1.00 reliability
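The weighting rule in the example amounts to equating space times reliability on both sides of the trade. A minimal sketch, with the hypothetical function name `fair_trade_size`:

```python
def fair_trade_size(offered_gb: float,
                    offered_reliability: float,
                    partner_reliability: float) -> float:
    """Space the partner should give in return so that
    space * reliability is equal on both sides."""
    if partner_reliability <= 0:
        raise ValueError("partner_reliability must be positive")
    return offered_gb * offered_reliability / partner_reliability


# The example from the slide: 100 GB at reliability 0.75 is a fair
# trade for 75 GB at reliability 1.00.
print(fair_trade_size(100, 0.75, 1.00))
```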

16 Deed Trading Algorithm
[Flowchart; the branch layout was lost in extraction. Boxes shown:]
Process steps: Determine size of deed needed and number of copies to make; Check Available Space at Trade Site; Copy Collection; Seek Deed for Remaining Needed Space; Choose a site to trade with based on Trading Strategy; Check if Adequate Local Space for Trade; Check if Other Sites Have Deeds for Trade Site; Trade; Done
Decision points (Yes/No): Have deed? Deeds with enough space available? Big enough? Enough space? Enough copies?

17 Simulation Results

18 Trading Policy Results
“. . .it is always best for the high reliability sites to use the closest reliable strategy, and for the low reliability sites to use clustering.”

19 Same Size vs. Weighted Strategy

20 Reliability Estimates
“. . .when estimates are inaccurate by 30 percent, archives using closest reliability can only achieve a local MTTF of 200 years, versus 500 in the ideal case.”

21 Cluster Sizes

22 Weaknesses of the System
Class warfare: High reliability sites realize maximum performance when they trade exclusively amongst themselves, but the system as a whole performs best when site reliability is ignored
Accurate estimation of site reliability: Difficult to account for all factors: hardware, bankruptcy, natural disasters, war, terrorism, interdimensional crossrips

23 Further Work
Distributed access services
Additional compensation means:
  Money
  Processing power
Accommodating more dynamic collections

24 Thank you!

