Slide 1: TeraGrid Archival Migration of Data to the XD Era
Phil Andrews et al.

"The Moving Finger writes; and having writ, / Moves on; nor all your Piety nor Wit / Shall lure it back to cancel half a Line, / Nor all your Tears wash out a Word of it." – Omar Khayyam

Users' view: when they give us their data, they expect it to be available even when the original recipient is not.
Slide 2: Significant archival data (~20 PB) is at TG RP sites unfunded in XD
- What to do about data at current TeraGrid RP sites that do not yet have funds for the XD era?
- Do we have a communal obligation to continue data availability past the funding of the centers that accepted it? The NSF thinks so!

"It's Later Than You Think!" – A Tale of Two Cities, Charles Dickens
Slide 3: Task force to consider the issue
- Members from most (maybe all) sites
- Send me email (andrewspl@utk.edu) if you want to participate
- NSF wants to see a plan, and is encouraging the idea of a general replication approach at remote sites
- If we replicate data at currently unfunded sites, then we are covered whatever happens
- Awkward funding implications
Slide 4: More than one approach possible
- In the past, we moved one archive (CTC) physically and another (PSC) across the network
- Both moves were successful, but we have never replicated an entire archive
- Network moves require several months; physical moves are very concerning
- Data is offline or frozen during a move

"A merry road, a mazy road, and such as we did tread / The night we went to Birmingham by way of Beachy Head!" – G. K. Chesterton
Slide 5: How much data are we talking about?
- Approximately 10 PB total at each of SDSC and NCSA; other sites also have significant data
- 10 Gb/s = 10 PB/100 days (checked in the sketch below)
- Only TACC, NICS, and PSC are continually funded into the XD era at the moment
- NCSA -> Track 1 funding
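A quick back-of-the-envelope check of the rate quoted above, as a minimal Python sketch; it assumes a fully utilized 10 Gb/s link with zero protocol overhead, which is what nudges the ideal ~93 days toward the quoted 100:

```python
# How long does 10 PB take over a 10 Gb/s link?
# Assumes full link utilization and zero protocol overhead.
DATA_BITS = 10e15 * 8   # 10 PB in bits (1 PB = 1e15 bytes)
LINK_BPS = 10e9         # 10 Gb/s

seconds = DATA_BITS / LINK_BPS
print(f"{seconds / 86_400:.0f} days")  # ~93 ideal days; ~100 with overhead
```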
Slide 6: Option 1: Physical move
- Advantages: can wait until the last minute, possibly funding-neutral, doesn't stress the network, keeps physical resources in TG
- Disadvantages: dangerous, data unavailable for weeks, the site could regain funding later, the new host must handle the format
- The nuclear option: very awkward, mixed data tapes, lays waste to an existing archive. Forced upon us if we wait too long!

"Out of this nettle, danger, we pluck this flower, safety." – Shakespeare
Slide 7: Option 2: Network transfer
- Cannot move 20 PB in any reasonable time; must rely on there being only 2-3 PB of real data per site
- Advantages: data is checked during transfer, no danger of data loss, the site can recover
- Disadvantages: ties up network and people resources, a long process, doubles archival requirements

"For though his body's under hatches, / His soul has gone aloft." – Charles Dibdin
Slide 8: Option 3: Archival replication
- Advantages: a more general approach; increases TeraGrid's value added; intellectually stimulating rather than maudlin
- Disadvantages: a more involved process; could lead to drastically increased archival requirements

"There is a tide in the affairs of men / Which, taken at the flood, leads on to fortune" – Shakespeare
Slide 9: Replication approaches
General middleware:
1. iRODS can do replication, but it must manage the data; it can't import general or SRB data (see the sketch after this list)
2. SRB is slow
Infrastructure:
1. HPSS archives can be connected via wide-area GPFS (HPSS 6.2.2, June '08)
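To make the iRODS point concrete, here is a minimal sketch of driving replication through the `irepl` icommand; the resource and object names are hypothetical placeholders, and this only works for data iRODS already manages, which is exactly the caveat above:

```python
# Sketch: replicate iRODS-managed data objects onto a second resource.
# Resource and object names are hypothetical placeholders.
import subprocess

REMOTE_RESOURCE = "remoteSiteResc"
DATA_OBJECTS = [
    "/teragrid/home/archive/run42.tar",
    "/teragrid/home/archive/run43.tar",
]

for obj in DATA_OBJECTS:
    # irepl adds a replica of a managed data object on the named
    # resource; it cannot import files iRODS does not already manage.
    subprocess.run(["irepl", "-R", REMOTE_RESOURCE, obj], check=True)
```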
Slide 10: What to do now?
- The clock is ticking; if we are to investigate options, we must do it soon
- SDSC, TACC, and NCSA are looking at iRODS
- SDSC runs HPSS as one archive, and exports GPFS
- Propose trying the GPFS-HPSS Integration (GHI) approach for replication between HPSS archives
Slide 11: Using WAN GPFS to connect Archives
Slide 12: Will other file systems work?
- Can we use other approaches for Lustre?
- pNFS does have a proposed mechanism for replication via caching: Panache
- Will global file systems and HPSS come in pairs?
- Is a more general (but less efficient) middleware approach (iRODS?) preferable?

"Pay no attention to that man behind the curtain" – L. Frank Baum
Slide 13: GHI status
- Some features already released
- Support for multiple HPSS archives is not there yet; due next year
- Timing could be tricky, but we could start with pre-release software
- Due for beta testing at NERSC and NCSA; the GPFS and HPSS teams are interested (spoke at SC)
Slide 14: Discussion
- Is replication worth the effort?
- Will sites be prepared for a physical move, if necessary?
- If there is no physical move, how do we fund the resources?
- Do we let users say "move everything"?
- We need an inventory of data!
- Are users rendering this discussion moot?
Slide 15: Philosophy
- The current funding approach allows a continual ebb and flow of RP sites: we can handle the computational impact, but not the archival one!
- We need an archival organization that allows for frequent gain and loss of Data RPs
- It is hard to wait for XD to solve this problem

"I must go in and out" – Bernard Shaw
Slide 16: Need to know what data is where
- We don't know which site has how much data, or on what media
- The choice of media can have a major impact on how quickly data can be moved or replicated
- We need a good story to take to the NSF with a funding request for a better archival organization
- We need a data census! (a strawman tally follows below)

"In those days a decree went out from Caesar Augustus that the whole world should be counted" – Luke 2:1
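As a strawman for what such a census might tally, a per-site and per-media rollup along these lines would answer the "which site, how much, what media" questions; every name and number here is an illustrative placeholder, not a real holding:

```python
# Toy data census: tally holdings per site and per media type.
# All records below are illustrative placeholders, not real figures.
from collections import defaultdict

holdings = [  # (site, media, petabytes)
    ("SDSC", "tape", 8.0), ("SDSC", "disk", 2.0),
    ("NCSA", "tape", 9.0), ("NCSA", "disk", 1.0),
    ("PSC",  "tape", 1.5),
]

per_site, per_media = defaultdict(float), defaultdict(float)
for site, media, pb in holdings:
    per_site[site] += pb
    per_media[media] += pb

print("PB per site :", dict(per_site))
print("PB per media:", dict(per_media))  # media mix drives move/replication speed
```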