Astronomical Data Archiving and Curation Clive Page AstroGrid Project University of Leicester 2004 March 22.

1 Astronomical Data Archiving and Curation Clive Page AstroGrid Project University of Leicester 2004 March 22

2 Importance of Data Archiving in Astronomy
No observation can be repeated exactly, as the sky is always changing
  – After a violent event (e.g. supernova explosion) earlier observations are crucial
Observations over a long period can identify
  – Variability
  – Proper motions
In recent years all data come in digital form
Important earlier datasets on photographic plates have now mostly been digitised.

3 Principal Data Types in Archives
Raw data from telescopes
Observing logs
Calibration datasets
Calibrated/reduced data:
  – Images
  – Spectra
  – Time-series
Derived data products:
  – Source catalogues
  – Sky survey image collections

4 Data Formats
A variety, but FITS format predominates:
  – FITS can store arrays and tables, and encapsulates data and metadata, but…
Standards have evolved, so older FITS files are less compatible
Individual observatory conventions also exist
Metadata are vital, but are sometimes to be found only:
  – In associated software packages or documentation
  – In the heads of those developing the software
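To make "encapsulates data and metadata" concrete: a FITS header is a sequence of 80-character cards, with the keyword in columns 1-8 and, for value cards, "= " in columns 9-10. The sketch below is a simplified illustration only, not a full implementation of the FITS standard (it ignores continuation cards, complex values and observatory-specific conventions); the function names `parse_card` and `convert` are hypothetical, and the example cards are made up.

```python
def convert(text):
    """Turn the text of a non-string FITS value into a Python value."""
    if text in ("T", "F"):          # FITS logical values
        return text == "T"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        # Older files may write exponents Fortran-style, e.g. 1.5D3.
        return float(text.replace("D", "E"))
    except ValueError:
        return text                 # leave anything unrecognised as text


def parse_card(card):
    """Split one header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]
    keyword = card[:8].strip()
    # Value cards carry "= " in columns 9-10; others are commentary.
    if card[8:10] != "= ":
        return keyword, None, card[8:].strip()
    body = card[10:]
    stripped = body.lstrip()
    if stripped.startswith("'"):
        # Quoted string: a doubled quote ('') stands for a literal quote.
        chars = []
        i = 1
        while i < len(stripped):
            if stripped[i] == "'":
                if stripped[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(stripped[i])
            i += 1
        value = "".join(chars).rstrip()
        comment = stripped[i + 1:].partition("/")[2].strip()
        return keyword, value, comment
    text, _, comment = body.partition("/")
    return keyword, convert(text.strip()), comment.strip()


# Made-up example cards, for illustration only.
cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "NAXIS1  =                 2048 / image width",
    "TELESCOP= 'INT     '           / telescope name",
    "COMMENT archive copy",
]
header = [parse_card(c) for c in cards]
```

The point of the sketch is that the metadata travel in the same file as the data; what a parser like this cannot recover is the meaning of non-standard keywords, which is where the observatory conventions and undocumented knowledge mentioned above come in.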

5 Important UK data archive sites
Cambridge – Astronomical Survey Unit (CASU):
  – INT wide-field survey, APM catalogue, VizieR mirror, UKIRT archive. In future: WFCAM, VISTA.
Edinburgh – Wide-field Astronomy Unit (WFAU):
  – SuperCOSMOS images and catalogue, 6dF galaxy survey, Sloan Digital Sky Survey (SDSS) copy. In future: WFCAM, VISTA.
Leicester – Data Archive Service (LEDAS):
  – EXOSAT, GINGA, ASCA, ROSAT, XMM; Chandra mirror, many optical datasets. In future: SWIFT, SuperWASP source archive.

6 Important UK data archive sites (continued)
Manchester – Jodrell Bank:
  – MERLIN, HI surveys, European VLBI datasets, pulsar catalogues. In future: e-MERLIN archive.
Rutherford Laboratory:
  – World Data Centre for STP, CLUSTER and ISO UK data centres, Starlink software collection and data archive. In future: SuperWASP image archive.
UCL – Mullard Space Science Laboratory:
  – YOHKOH, SOHO, TRACE, RESIK and other solar/STP archives.

7 Database management systems
DBMS currently used by UK archives include:
  – BROWSE – written at ESOC/ESTEC in 1980s
  – DB2 (IBM)
  – Ingres
  – miniSQL – free simple DBMS
  – MySQL – open source, supports many web sites
  – PostgreSQL – open source, good spatial indexing
  – SQL Server (Microsoft)
  – Sybase ASE
  – WFCtools – written at Harvard/SAO for accessing large optical catalogues

8 User access methods
Residual telnet/ssh services
  – Allow registered users to perform DBMS operations, store their own subsets, etc.
  – Mostly obsolescent
FTP access for large downloads
Web interfaces use CGI with Perl, PHP, or Python
  – Results mostly returned as HTML tables/GIFs, with some FITS and VOTable.
No use (pre-AstroGrid) of XML-based Web Services (XForms, SOAP, WSDL etc.)
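As an illustration of why a format such as VOTable is easier for client software to consume than HTML tables, the sketch below reads a tiny VOTable-style document with Python's standard library. It assumes a simplified document with no XML namespace and only TABLEDATA serialisation; real VOTables may declare a namespace and use other serialisations. The `read_votable` helper and the sample rows are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up VOTable-style document (no namespace, TABLEDATA only).
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="name" datatype="char" arraysize="*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>M31</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD><TD>M42</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

# Map a few VOTable datatypes to Python converters; others stay as text.
CONVERTERS = {"double": float, "float": float,
              "int": int, "short": int, "long": int}


def read_votable(text):
    """Return the first table's rows as a list of dicts keyed by FIELD name."""
    root = ET.fromstring(text)
    table = root.find("./RESOURCE/TABLE")
    # The FIELD elements are the column metadata: name, datatype, unit.
    fields = [(f.get("name"), CONVERTERS.get(f.get("datatype"), str))
              for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text or "" for td in tr.findall("TD")]
        rows.append({name: conv(cell)
                     for (name, conv), cell in zip(fields, cells)})
    return rows


rows = read_votable(SAMPLE)
```

Because the column names, types and units are declared in the FIELD elements, a client does not have to scrape them out of page layout, which is the main practical difference from HTML results.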

9 Problems – (1) technical
Data storage: thanks to Moore’s Law, new datasets are much bigger than old ones. May get adequate storage for existing data from:
  – New big projects like WFCAM, SWIFT, e-MERLIN, VISTA?
  – SRIF funding?
International Virtual Observatory Alliance (IVOA) is developing new standards, e.g. for tabular data, registry, query language.
  – These have to be implemented before they are fully stable.
DBMS: freeware like MySQL and PostgreSQL is improving rapidly and is probably adequate.
  – If not, licence costs may be substantial.
Database middleware (OGSA-DAI, ELDAS)
  – Still developing, not quite ready for large-scale use

10 Problems – (2) structural
Data preservation requires migration to new platforms and new DBMS every few years
Many DBMS in use are incapable of supporting the functionality required, e.g. no spatial indexing
  – This also implies migration to a new DBMS
AstroGrid (and other VO projects) will supply the middleware, but have no remit (and no funding) to update the archives themselves.
Serious data mining research will require serious processing power near the data stores (e.g. an Astronomical Data Warehouse).
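To show what "spatial indexing" buys an archive, here is a minimal sketch of one simple approach: bucketing sources into declination zones so that a cone search only inspects zones overlapping the search circle rather than the whole catalogue. This is an illustrative technique chosen for brevity, not the method used by any of the DBMS named in this talk; the class `ZoneIndex`, its parameters and the sample sources are all hypothetical.

```python
import math
from collections import defaultdict


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


class ZoneIndex:
    """Sources bucketed into horizontal declination strips ("zones")."""

    def __init__(self, zone_height_deg=1.0):
        self.height = zone_height_deg
        self.zones = defaultdict(list)

    def _zone(self, dec):
        return int(math.floor((dec + 90.0) / self.height))

    def add(self, name, ra, dec):
        self.zones[self._zone(dec)].append((name, ra, dec))

    def cone_search(self, ra, dec, radius_deg):
        """Names of sources within radius_deg of (ra, dec)."""
        # Only zones that overlap the declination range of the cone
        # can contain matches, so the rest of the sky is never scanned.
        lo = self._zone(max(-90.0, dec - radius_deg))
        hi = self._zone(min(90.0, dec + radius_deg))
        hits = []
        for zone in range(lo, hi + 1):
            for name, s_ra, s_dec in self.zones.get(zone, ()):
                # Exact test on the sphere; also handles RA wrap at 0/360.
                if separation_deg(ra, dec, s_ra, s_dec) <= radius_deg:
                    hits.append(name)
        return hits


# Made-up sources, for illustration only.
index = ZoneIndex(zone_height_deg=1.0)
index.add("a", 10.0, 20.0)
index.add("b", 10.1, 20.05)
index.add("c", 200.0, -45.0)
index.add("e", 359.9, 0.0)
near = index.cone_search(10.0, 20.0, 0.5)
```

A production system would also restrict the right ascension range within each zone and keep the buckets in indexed database tables; without some scheme of this kind, every positional query degenerates into a scan of the full catalogue.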

11 Problems – (3) managerial
VO software from AstroGrid includes MySpace: a temporary user space on remote systems.
  – Optional, but highly desirable because of the need to “shift the results not the data”
  – Will sites give space to users unknown to them?
  – How to administer many ad-hoc groups of users?
Creation of the VO Registry will require considerable input from managers of existing data archives; the exact mechanism is TBD.

12 Manpower
Additional manpower needed for:
  – Migration of existing data collections to new platforms, and often to new DBMS
  – Installation of AstroGrid and other VO software
  – Provision of metadata to the Registry
  – Implementation and operation of MySpace
  – Setting up astronomical data warehouse facilities at a few sites

13 Funding problems
SRIF funding is for hardware only, not manpower
AstroGrid2 bid failed to get funding for its data centre support elements
PPARC grant applications to support data archiving and curation have an unhappy history: they tend to fall between research and projects funding lines.

14 Summary
Archives have a vital role in astronomy
  – They are basically in good shape in that no important bits have been lost (as far as I know)
  – But we have been muddling through
Technical problems look soluble
Data storage – we may be able to find enough
Much work needed on current archives for them to survive into the VO era.
Additional skilled manpower will be essential – sources of support for this are lacking
Continuity is vital for archives – this is a long-term problem with no obvious solution.

