Astronomical data curation and the Wide-Field Astronomy Unit
Bob Mann
Wide-Field Astronomy Unit, Institute for Astronomy, School of Physics, University of Edinburgh
2/15 Outline
- Who we are
  - Introduction to the Wide-Field Astronomy Unit
- What we do
  - Sky survey data curation: past, present and future
  - Data curation and the Virtual Observatory
- What we could do with you
  - What WFAU could do for the DCC
  - What the DCC could do for WFAU
- Questions
4/15 Wide-Field Astronomy Unit
- Funded to curate optical and near-infrared sky survey data for UK (and European) community
- Based at Royal Observatory Edinburgh
  - ~35 years of sky survey data curation at ROE
- Evolving data holdings:
  - Photographic plates
  - Digital scans of photographic plates
  - Born-digital data
- WFAU formed in 1999: group moved into UoE
- Currently 12 grant-funded + 2 academic staff
  - Mix of astronomers, IT professionals & hybrids
6/15 Sky survey data life-cycle: e.g. WFCAM
- Images taken at telescope: UKIRT, in Hawaii
- Data reduction pipeline run in Cambridge
  - Removes instrumental signatures
  - Produces final, clean images
  - Detects and characterises sources in images
- Data transferred to Edinburgh
  - Ingest source catalogues and image metadata into relational database, store image files on disk
  - Combine data from multiple nights: new images, catalogues
  - Publish release databases via web interface
- On a per-night basis
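The ingest step above can be sketched in a few lines. This is a minimal illustration only, assuming ingest happens in nightly batches and using Python's built-in sqlite3 in place of the production database. Every table name, column name, file name and value below is hypothetical and is not taken from the WFCAM archive schema.

```python
import sqlite3

# Hypothetical nightly batch. Image pixels stay on disk; only the
# metadata (including the file path) goes into the database.
images = [
    # (image_id, night, file_path, band)
    (1, "night-001", "/archive/night-001/frame_0001.fits", "K"),
]
sources = [
    # (source_id, image_id, ra_deg, dec_deg, mag)
    (101, 1, 150.1021, 2.2053, 17.4),
    (102, 1, 150.1100, 2.2101, 18.9),
]

def ingest_night(conn, images, sources):
    """Load one night's image metadata and source catalogue atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS image ("
            "image_id INTEGER PRIMARY KEY, night TEXT, "
            "file_path TEXT, band TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS source ("
            "source_id INTEGER PRIMARY KEY, "
            "image_id INTEGER REFERENCES image(image_id), "
            "ra_deg REAL, dec_deg REAL, mag REAL)"
        )
        conn.executemany("INSERT INTO image VALUES (?, ?, ?, ?)", images)
        conn.executemany("INSERT INTO source VALUES (?, ?, ?, ?, ?)", sources)

conn = sqlite3.connect(":memory:")
ingest_night(conn, images, sources)
n_images = conn.execute("SELECT COUNT(*) FROM image").fetchone()[0]
n_sources = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
print(n_images, n_sources)
```

Wrapping each night in a single transaction means a failed transfer leaves no half-loaded night behind, which is one plausible reason to ingest per night.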
7/15 WFAU’s main survey archives
- Past: SuperCOSMOS
  - Based on digital scans of photographic plates
  - Database: ~5TB: largest tables ~10^9 rows
  - Images: ~35,000 user requests (10GB) per month
- Present ( ): WFCAM
  - Near-infrared: ~700 registered users
  - ~500 million rows of database results per month
  - ~125GB of flat file image data per month
- Near-future ( ): VISTA
  - ~3 x data rates/volume of WFCAM
8/15 WFAU’s future plans
- Large Synoptic Survey Telescope
  - US-led public/private project
  - We’re trying to get UK to buy into it
- Data challenges immense
  - WFCAM takes ~20TB of image data per year
  - LSST will take ~20TB of image data per night: ~60PB images, ~8PB database ( )
- LSST stimulating a lot of data management R&D in the US:
  - Commercial: Google
  - Academic: “Sci-DB” (M. Stonebraker, D. DeWitt)
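The scale of the jump is easier to see as arithmetic. The sketch below uses only the figures quoted on this slide, plus two assumptions of mine: decimal units (1PB = 1000TB) and an upper bound of 365 observing nights per year, which ignores weather and downtime.

```python
# Figures quoted on the slide.
wfcam_tb_per_year = 20
lsst_tb_per_night = 20
lsst_total_images_pb = 60

# Assumptions, not from the slide.
TB_PER_PB = 1000          # decimal units
NIGHTS_PER_YEAR = 365     # upper bound: observing every night

# LSST yearly image volume and its ratio to WFCAM's.
lsst_tb_per_year = lsst_tb_per_night * NIGHTS_PER_YEAR
rate_ratio = lsst_tb_per_year / wfcam_tb_per_year

# Nights (and years) of observing needed to reach the quoted image total.
nights_to_fill = lsst_total_images_pb * TB_PER_PB / lsst_tb_per_night
years_to_fill = nights_to_fill / NIGHTS_PER_YEAR

print(f"LSST/WFCAM yearly image rate: up to ~{rate_ratio:.0f}x")
print(f"Nights to reach {lsst_total_images_pb}PB: {nights_to_fill:.0f} "
      f"(~{years_to_fill:.1f} years at {NIGHTS_PER_YEAR} nights/year)")
```

In other words, under these assumptions LSST would collect in one night roughly what WFCAM collects in a year.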
9/15 The Virtual Observatory
- Goal: an interoperable federation of all the world’s astronomical data resources
- International Virtual Observatory Alliance
  - Coordinates VO development worldwide
  - Acts as W3C-like standards body for the VO
- AstroGrid:
  - Only project to have developed a full VO system
10/15 Virtual Observatory components
- Registry
  - Metadata for all data published to the VO
- Standard data access protocols
  - For tabular data, images, spectra, time series, etc.
- Standard web service wrappers for application code
  - Enabling asynchronous calls, workflow, etc.
- Distributed data storage system
  - Presenting transparent aggregated logical view to user
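As one concrete example of a standard data access protocol for tabular data, the IVOA Simple Cone Search takes a sky position and radius as the HTTP GET parameters RA, DEC and SR, all in decimal degrees. The slide does not name this protocol; it is brought in here purely as an illustration. The sketch below only builds the request URL, and the service address is a hypothetical placeholder, not a real WFAU endpoint.

```python
from urllib.parse import urlencode

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL (no network access)."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("Dec must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Some services already carry fixed parameters in their base URL.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"

# Hypothetical service address, for illustration only.
url = cone_search_url("https://example.org/vo/conesearch", 150.1, 2.2, 0.05)
print(url)
```

Because every compliant service accepts the same three parameters, a client can query any archive found through the registry without archive-specific code.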
11/15 Curation challenges for WFAU
- More data analysis services in the data centre
  - Data volumes too large for user download
  - WFAU must provide data analysis services & hardware
- Integration of data and knowledge
  - Third-party annotations which can be used in queries
    - “Object X in database Y is a quasar”
    - “X-ray source A is the same object as radio source B”
  - Better linkage between archives and online literature
- Keeping staff up to date on technologies/techniques
  - Mostly learn by doing – do we make best choices?
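One way to picture "annotations which can be used in queries" is a separate annotation table joined to the archive's source table. The sketch below is an assumption about how that might look, not a description of any WFAU or DCC system. It uses Python's built-in sqlite3, and all table names, column names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Archive table, owned by the data centre (hypothetical schema).
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag REAL
);
-- Third-party assertions, kept apart from the curated data.
CREATE TABLE annotation (
    source_id INTEGER REFERENCES source(source_id),
    label TEXT, annotator TEXT
);
INSERT INTO source VALUES (1, 10.0, -5.0, 18.2);
INSERT INTO source VALUES (2, 10.1, -5.1, 19.7);
INSERT INTO source VALUES (3, 10.2, -5.2, 17.1);
INSERT INTO annotation VALUES (1, 'quasar', 'user_a');
INSERT INTO annotation VALUES (3, 'star', 'user_b');
""")

# "Sources that someone has labelled a quasar and that are brighter
# than magnitude 19": the annotation takes part in the selection.
rows = conn.execute(
    "SELECT s.source_id, s.mag, a.annotator "
    "FROM source AS s JOIN annotation AS a ON a.source_id = s.source_id "
    "WHERE a.label = ? AND s.mag < ? "
    "ORDER BY s.source_id",
    ("quasar", 19.0),
).fetchall()
print(rows)
```

Keeping the annotator alongside each label lets a user decide whose assertions to trust, and leaves the curated catalogue itself unchanged.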
13/15 WFAU and DCC: What we can do for you
- Case studies, exemplars, etc.
  - WFAU is a well-established, competent group
  - Astronomy is a relatively small, cohesive community, used to interdisciplinary collaboration
  - Astronomers are early adopters of IT and recognise value of data curation
  - VO is a rich, functional e-Science infrastructure
- Collaborations to date:
  - Raj Bose – distributed annotation service
  - James Cheney – paper on data centre security
14/15 WFAU and DCC: What you can do for us
- Policy advice
  - Increasingly need to convince research councils of benefits of long term data curation – cost/benefit
- Technical advice – from DCC or its Associates
  - Should we use iRODS for LSST?
  - Do any XML databases have decent performance?
  - Do the VO metadata standards make sense?
- Curation manual
  - When will the rest appear?
- Training
  - e.g. NeSC course on relational database design
15/15 WFAU and DCC: Questions
- What is the DCC’s model for collaboration?
  - Can’t collaborate with everyone on everything
- Scientists & digital librarians live in different worlds: how do you bridge that divide?
  - Interdisciplinary work requires sustained interaction
- What do you want from scientific data curators?
  - What can you offer us in return?
- Few of my colleagues know anything about the DCC
  - Does that surprise you?