Data Archives: Migration and Maintenance
Douglas J. Mink
Telescope Data Center, Smithsonian Astrophysical Observatory
NSF
Archiving Issues
● What do we save? Reduced data? Raw data? Calibration data? Data products? Publications?
● How do we access the data? Google search? (ADS) Through discipline portal(s)? (VO registries) Through a data center?
● Where does the data live? One or a few data centers? Many data centers reachable through a few portals?
● The number of centers is limited by long-term funding
Migration Issues
● Why do we migrate archives? Better access, cheaper storage, more compact storage
● What do we migrate from an archive? Everything? Reduced data only? Data products?
● What do we do with the old media? Paper and glass are more stable than digital media! Magnetic tapes may be more stable than optical disks!
● Who pays? Is migration maintenance? Or is a new, more useful archive being created?
Maintenance Issues
● What are the costs? Space, staffing, equipment maintenance and repair
● Backup protection? What is the safest way to back up a multi-Terabyte archive? Cloning to other sites improves access as well as providing backup
● How do we maintain old media? Old media may be more stable over time than new media! Can we maintain older, less compact data?
Some US Astronomical Archives
Online (All NASA Funded)
● Hubble Space Telescope: 17.6 Terabytes
● Two-Micron All-Sky Survey: 5 Terabytes
● Sloan Digital Sky Survey: 1 Terabyte online (50 more offline)
● Palomar-QUEST: 6 Terabytes (1/month since 9/2003)
Off-Line
● NOAO Save-the-Bits: 44.4 Terabytes (7.6 Terabytes/year)
● HPSSP (Harvard Plate Stack Scanning Project): 200 Terabytes
Future
● LSST (Large Synoptic Survey Telescope): 7 Terabytes per night!
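To put these sizes in perspective, the sketch below converts them into rough time scales using only the numbers quoted on this slide. The function and variable names are illustrative, and treating the LSST figure as a steady 7 Terabytes every night is a simplifying assumption.

```python
# Archive sizes in Terabytes, as quoted on the slide.
ARCHIVES_TB = {
    "Hubble Space Telescope": 17.6,
    "Two-Micron All-Sky Survey": 5.0,
    "NOAO Save-the-Bits": 44.4,
    "HPSSP": 200.0,
}

LSST_TB_PER_NIGHT = 7.0   # quoted LSST rate; assumed constant here
NOAO_TB_PER_YEAR = 7.6    # quoted NOAO Save-the-Bits growth rate


def nights_to_match(size_tb, rate_tb_per_night=LSST_TB_PER_NIGHT):
    """Number of nights at the given rate needed to equal an archive size."""
    return size_tb / rate_tb_per_night


def years_at_rate(size_tb, rate_tb_per_year):
    """Years of accumulation an archive represents at a constant yearly rate."""
    return size_tb / rate_tb_per_year


if __name__ == "__main__":
    for name, size in ARCHIVES_TB.items():
        print(f"{name}: {size} TB = {nights_to_match(size):.1f} LSST nights")
    years = years_at_rate(ARCHIVES_TB["NOAO Save-the-Bits"], NOAO_TB_PER_YEAR)
    print(f"NOAO Save-the-Bits: about {years:.1f} years at its current rate")
```

On these numbers, every existing archive listed here corresponds to less than a month of LSST output.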
Growing Astronomical Catalogs
● 1989: HST Guide Star Catalog, 25,541,952 sources
● 1996: USNO-A1.0 Catalog, 488,006,860 sources
● 1998: USNO-A2.0 Catalog, 526,280,881 sources
● 2001: GSC II Catalog (2.2.01), 998,402,801 sources
● 2002: USNO-B1.0 Catalog, 1,036,366,767 sources
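The growth in this list can be summarized with a little arithmetic. The sketch below computes the overall growth factor and an implied doubling time from the first and last entries. It assumes smooth exponential growth between 1989 and 2002, which is only a rough reading, since these are different catalogs from different projects rather than one growing dataset.

```python
import math

# (year, catalog, number of sources), as listed on the slide.
CATALOGS = [
    (1989, "HST Guide Star Catalog", 25_541_952),
    (1996, "USNO-A1.0", 488_006_860),
    (1998, "USNO-A2.0", 526_280_881),
    (2001, "GSC II (2.2.01)", 998_402_801),
    (2002, "USNO-B1.0", 1_036_366_767),
]


def growth_factor(first, last):
    """Ratio of source counts between two catalog entries."""
    return last[2] / first[2]


def doubling_time_years(first, last):
    """Doubling time implied by assuming exponential growth between entries."""
    years = last[0] - first[0]
    return years * math.log(2) / math.log(growth_factor(first, last))


if __name__ == "__main__":
    first, last = CATALOGS[0], CATALOGS[-1]
    print(f"Growth {first[0]} to {last[0]}: x{growth_factor(first, last):.1f}")
    print(f"Implied doubling time: {doubling_time_years(first, last):.1f} years")
```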
Virtual Observatory Portals
● US: ADS (links from publications); Goddard (SkyView visualization); JHU, NCSA, Caltech (VO Registry modelling); IPAC (IRSA, etc.); SAO WCSTools (desktop catalog access)
● England: The Grid
● France: CDS (Aladin/VizieR/SIMBAD)
International Virtual Observatory Alliance (IVOA)
● Registries: Searchable databases containing descriptions of data available in the Virtual Observatory
● Data Model: Standards for data format and content
● VOTable: XML transfer format for metadata
● UCD: Unified Content Descriptors (so everyone doesn't make up their own names for the same things)
● Data Access Layer (DAL): User interface
● Protocols: Open interfaces to large archives ease multi-level links
(NSF funds US participation in IVOA)
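To make the VOTable and UCD items concrete, here is a minimal sketch that reads a tiny VOTable-style document with Python's standard library and uses UCDs to find a column regardless of what its publisher named it. The table contents are made-up illustrative values, the helper names `read_votable` and `column_for_ucd` are hypothetical, and the sample omits the XML namespace that real VOTable files declare.

```python
import xml.etree.ElementTree as ET

# Illustrative VOTable-style document (made-up rows, no XML namespace).
VOTABLE_XML = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="example_sources">
      <FIELD name="RA" datatype="double" unit="deg" ucd="pos.eq.ra;meta.main"/>
      <FIELD name="Dec" datatype="double" unit="deg" ucd="pos.eq.dec;meta.main"/>
      <FIELD name="Vmag" datatype="float" unit="mag" ucd="phot.mag;em.opt.V"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.6847</TD><TD>41.2690</TD><TD>3.4</TD></TR>
          <TR><TD>83.8221</TD><TD>-5.3911</TD><TD>4.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return (fields, rows): FIELD attribute dicts and rows keyed by name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        # All columns in this sample are numeric, so convert to float.
        rows.append({f["name"]: float(c) for f, c in zip(fields, cells)})
    return fields, rows


def column_for_ucd(fields, ucd_prefix):
    """Find the first column whose UCD starts with the given prefix."""
    for f in fields:
        if f.get("ucd", "").startswith(ucd_prefix):
            return f["name"]
    return None


if __name__ == "__main__":
    fields, rows = read_votable(VOTABLE_XML)
    ra_col = column_for_ucd(fields, "pos.eq.ra")
    print(ra_col, [row[ra_col] for row in rows])
```

Looking columns up by UCD rather than by name is what lets a client combine tables from archives that label the same quantity differently.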
IVOA Registries
[Diagram: two Full Searchable Registries, two Local Publishers (harvestable registries), a Local Searchable Registry, a Client, Data, and the DAL, with "Replicate" links between registries.]
Migrating Harvard's Astronomical Plate Collection from Paper and Glass to Bits
● 500,000 glass plates covering the entire sky from
● Basis for fundamental discoveries in astronomy, such as using Cepheid variable stars as cosmic yardsticks
● A legacy of long-term commitment to astronomical photography and research
● Astronomy will not have an equivalent time frame from digital observations until
CfA/PSSG,
International Astronomical Union Resolution B3, 2000
Safeguarding the Information in Photographic Observations
The International Astronomical Union,
Recognising that unless urgent action is taken, this unique historical record of astronomical phenomena will be lost to future generations of astronomers,
Recommends the transfer of the historic observations onto modern media by digital techniques, which will provide worldwide access to the data so as to benefit astronomical research in a way that is well matched to the tools of the researcher in the future.
Step 0: List what is in the archive (on the web)
Typical large glass plate
First: Digitize Metadata From hand-written cards and logbooks
Digital access to plate metadata (interactive web page)
Results of metadata search
Next: Digital access to image data Move the plates out of the 20th century
Proposed access to digital images
● User supplies: object or coordinates and time
● Stack Catalog search, using the FITS Header Archive (WCS information), returns: plate names and object (x,y)
● FITS extractor reads the FITS or TIFF Image Archive (100 Terabytes) and returns: FITS images of plate portions
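The catalog-search step of this proposed pipeline can be sketched as follows: given sky coordinates and a date range, find the plates that contain the position and report where on each plate it falls. This is a rough illustration under stated assumptions, not the project's actual design. The `Plate` record, the `search_plates` function, and the sample plates are all hypothetical, and each plate is assumed to be unrotated (north up, east left) with a uniform scale. A real system would use the full WCS stored in the FITS header archive.

```python
import math
from dataclasses import dataclass


@dataclass
class Plate:
    """Hypothetical plate-catalog record (not the real archive schema)."""
    name: str
    ra_deg: float             # plate center, right ascension
    dec_deg: float            # plate center, declination
    year: float               # epoch of exposure
    arcsec_per_pixel: float   # assumed uniform scale of the scan
    width_px: int
    height_px: int


def tangent_plane_offsets(ra, dec, ra0, dec0):
    """Gnomonic projection about (ra0, dec0); returns (xi, eta) in degrees.

    Returns None when the point is 90 degrees or more from the center.
    """
    ra, dec, ra0, dec0 = (math.radians(v) for v in (ra, dec, ra0, dec0))
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    if cos_c <= 0:
        return None
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return math.degrees(xi), math.degrees(eta)


def search_plates(plates, ra, dec, year_min, year_max):
    """Return (plate name, x, y) for plates containing the position."""
    hits = []
    for p in plates:
        if not (year_min <= p.year <= year_max):
            continue
        offsets = tangent_plane_offsets(ra, dec, p.ra_deg, p.dec_deg)
        if offsets is None:
            continue
        xi, eta = offsets
        # East is to the left (x decreases), north is up (y increases).
        x = p.width_px / 2 - xi * 3600.0 / p.arcsec_per_pixel
        y = p.height_px / 2 + eta * 3600.0 / p.arcsec_per_pixel
        if 0 <= x < p.width_px and 0 <= y < p.height_px:
            hits.append((p.name, x, y))
    return hits


if __name__ == "__main__":
    # Made-up plates for demonstration only.
    demo = [
        Plate("demo-001", 10.0, 20.0, 1910.5, 2.0, 8000, 10000),
        Plate("demo-002", 200.0, -30.0, 1925.0, 2.0, 8000, 10000),
    ]
    for name, x, y in search_plates(demo, 10.0, 21.0, 1900, 1950):
        print(f"{name}: x={x:.0f}, y={y:.0f}")
```

The returned plate names and (x,y) positions are what an extractor would need in order to cut the matching portion out of each scanned image.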