
Introduction to Sky Survey Problems
Bob Mann

Introduction to sky survey database problems
Astronomical data
Astronomical databases
– The Virtual Observatory: concept & status
– Large sky survey databases
Spatial indexing in astronomical databases
Case study: SDSS & SkyServer

Observational Astronomy
Electromagnetic spectrum (figure: sky images from IRAS 25 μm, 2MASS 2 μm, DSS optical, IRAS 100 μm, NVSS 20 cm, GB 6 cm, ROSAT ~keV and WENSS 92 cm)

Astronomical data: in original form
Optical
– Image: array of pixel values
X-ray
– Event list: positions, arrival times and energies of all detected photons
Radio
– Interferometric visibilities: sparse samples of the Fourier transform of the sky brightness over a region
Very different types of data
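To make the contrast between these forms concrete, here is a minimal sketch that bins a toy X-ray event list into a counts image, turning the X-ray "original form" into an optical-style array of pixel values. The event values, the `bin_events` name and the simple pixel-grid convention are hypothetical; real event files carry more columns and need astrometric calibration that is not shown.

```python
# Hypothetical X-ray event list: (x, y, arrival_time_s, energy_keV) per photon.
events = [
    (10.2, 3.7, 0.5, 1.2),
    (10.8, 3.1, 1.9, 0.7),
    (2.4, 7.9, 2.3, 4.5),
    (10.5, 3.9, 4.0, 2.2),
]

def bin_events(events, nx, ny, pixel_size=1.0, emin=0.0, emax=float("inf")):
    """Accumulate photon events into an ny-by-nx counts image.

    Only events with emin <= energy < emax are kept; positions that fall
    outside the image are dropped. Arrival times are ignored here.
    """
    image = [[0] * nx for _ in range(ny)]
    for x, y, _t, energy in events:
        if not (emin <= energy < emax):
            continue
        i, j = int(y // pixel_size), int(x // pixel_size)
        if 0 <= i < ny and 0 <= j < nx:
            image[i][j] += 1
    return image

img = bin_events(events, nx=12, ny=10)
hard = bin_events(events, nx=12, ny=10, emin=1.0)   # energy-filtered image
```

The point of the design is that the event list keeps strictly more information: the same photons can be re-binned with any energy or time cut, which a fixed image cannot offer.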

Astronomical data: in final form
Most research is done using catalogue data
– i.e. tables of attributes of detected sources, mainly discrete sources (stars, galaxies, etc.)
– Data compression: a catalogue is a few % of the image data volume
– Amenable to representation in a relational DB
Natural indexing by location in the sky
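A minimal sketch of what "a catalogue in a relational DB, indexed by sky position" can look like, using Python's built-in `sqlite3`. The table name `source`, its columns and the sample rows are hypothetical; a real survey catalogue has hundreds of attributes per source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE source (
        obj_id INTEGER PRIMARY KEY,
        ra     REAL NOT NULL,   -- Right Ascension, degrees
        dec    REAL NOT NULL,   -- Declination, degrees
        mag    REAL             -- apparent magnitude in one band
    )""")
# Ordinary B-tree index on position, declination first, so that a
# declination band can be read with a single range scan.
conn.execute("CREATE INDEX source_dec_ra ON source (dec, ra)")
conn.executemany("INSERT INTO source VALUES (?, ?, ?, ?)", [
    (1, 150.10, 2.20, 18.3),
    (2, 150.12, 2.25, 21.0),
    (3, 210.00, -5.00, 17.1),
])
# Positional cut combined with an attribute predicate.
rows = conn.execute(
    "SELECT obj_id, mag FROM source "
    "WHERE dec BETWEEN ? AND ? AND ra BETWEEN ? AND ? AND mag < ? "
    "ORDER BY obj_id",
    (2.0, 2.5, 150.0, 150.2, 20.0)).fetchall()
```

Note that this RA/Dec box is only a rough approximation of an area of sky: it ignores the cos(Dec) shrinking of RA intervals and the wrap-around at RA = 0/360.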

Astronomical Databases
Sky survey archives
– Homogeneous data, standard reduction pipeline
– "Science Archive": do science on the DB
Telescope archives
– Semi-indexed collections of raw data files from all observations taken; heterogeneous
– Download data for reduction and analysis
Specialist data centres: collections of catalogues
Bibliographic databases: scans of major journals

The Virtual Observatory
Concept:
– Interoperable federation of all the world's significant astronomical databases
– Facilitates multi-wavelength astronomy
Status:
– Several projects underway, e.g. AstroGrid in the UK
– 5+ years' work needed to create a fully working VO
The VO sets the context for the design of new sky survey databases

AstroGrid:
Consortium:
– Edinburgh, Leicester, Cambridge, RAL, MSSL, Jodrell Bank, Queen's Belfast
3-year (~£4M) project:
– 1-year Phase A study: finished end of 2002
– 2-year Phase B implementation: to end of 2004
Web (later Grid) service framework, in Java
Currently building web services, portals, etc.; researching OGSA and OGSA-DAI

Large sky survey databases
Major science driver for AstroGrid, and for the VO
– New science: mining multi-wavelength data
Largest are optical/near-infrared sky surveys
Largest of these hosted in Edinburgh:
– Current: SuperCOSMOS, SDSS (mirror)
– Future: WFCAM, VISTA
– Each yields 1-10 TB of catalogue data in an RDBMS

Spatial queries in astronomy
Two important types:
– Select entries (with a predicate) in an area of sky
– Match entries (especially between two tables)
The second is a special case of the first
– i.e. both boil down to "point within distance of point"
– but the distances in the two cases can be very different
Advantage in using a hierarchical spatial indexing scheme
– Perform each spatial query at the appropriate granularity
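The claim that matching is a special case of selection can be written down directly. In this brute-force sketch (all names hypothetical, and no index used), both query types are built on a single `within` predicate; a cross-match is just one cone search per source in the first catalogue.

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec1 - dec2) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra1 - ra2) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def within(p, q, radius_deg):
    """The shared primitive: is point p within radius_deg of point q?
    Points are tuples whose first two items are (ra, dec) in degrees."""
    return ang_sep_deg(p[0], p[1], q[0], q[1]) <= radius_deg

def cone_select(catalogue, centre, radius_deg, predicate=lambda src: True):
    """Type 1: entries satisfying a predicate inside a circle on the sky."""
    return [s for s in catalogue if predicate(s) and within(s, centre, radius_deg)]

def cross_match(cat_a, cat_b, radius_deg):
    """Type 2: every (a, b) pair closer than radius_deg, i.e. one
    cone search in cat_b around each entry of cat_a."""
    return [(a, b) for a in cat_a for b in cone_select(cat_b, a, radius_deg)]

cat_a = [(10.0, 20.0, "a1"), (50.0, -30.0, "a2")]
cat_b = [(10.0001, 20.0001, "b1"), (50.5, -30.0, "b2")]
pairs = cross_match(cat_a, cat_b, radius_deg=1 / 3600)   # 1 arcsec tolerance
```

The difference between the two cases is mostly the radius: a selection may cover arcminutes or degrees, while a catalogue match typically uses a tolerance of order arcseconds. This nested loop is O(N·M), which is exactly what a spatial index is meant to avoid.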

Spatial Indexing in Astronomy
The Celestial Sphere
Many coordinate systems
Most common is the equatorial system, with Right Ascension and Declination as analogues of Longitude & Latitude
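Most spatial code starts by converting equatorial coordinates to a unit vector, where spherical geometry turns into vector algebra. A minimal sketch follows; the function names are our own, and the hours-to-degrees helper reflects the common practice of quoting RA in hours (24h = 360°).

```python
import math

def radec_to_xyz(ra_deg, dec_deg):
    """Equatorial (RA, Dec) in degrees -> unit vector on the celestial sphere.
    RA plays the role of longitude, Dec of latitude."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def xyz_to_radec(x, y, z):
    """Inverse: unit vector -> (RA in [0, 360), Dec in [-90, 90]) degrees."""
    ra = math.degrees(math.atan2(y, x)) % 360.0
    dec = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return ra, dec

def hms_to_deg(h, m, s):
    """RA quoted in hours, minutes, seconds -> degrees (1h = 15 deg)."""
    return 15.0 * (h + m / 60.0 + s / 3600.0)
```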

Spatial indexing in astronomical databases
Basic DBMS indexes are 1-D, e.g. B-trees
Some DBMSs support general 2-D indexing
– Usually using R-trees (or variants), built on rectangles: astronomical experiments not too successful [Clive]
Some DBMSs have native spatial indexing
– Little knowledge of this in astronomy; want to know more
But the celestial sphere is a sphere(!)
– Many geographical spatial DBs use planar projections
So astronomers have felt the need to develop spatial indexing prescriptions of their own
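Why "the celestial sphere is a sphere" matters can be seen in two lines of arithmetic. This sketch (names hypothetical) compares a naive planar distance in (RA, Dec) with the true great-circle separation, at high declination and across the RA = 0/360 seam.

```python
import math

def naive_planar_sep(ra1, dec1, ra2, dec2):
    """Treats (RA, Dec) in degrees as flat x, y coordinates (wrong on a sphere)."""
    return math.hypot(ra1 - ra2, dec1 - dec2)

def spherical_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees from the dot product of unit vectors."""
    def vec(ra, dec):
        ra, dec = math.radians(ra), math.radians(dec)
        return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    a, b = vec(ra1, dec1), vec(ra2, dec2)
    c = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

# 1 deg apart in RA at Dec = 60: on the sky this is only about 0.5 deg.
print(naive_planar_sep(10, 60, 11, 60), spherical_sep(10, 60, 11, 60))
# Straddling RA = 0/360: the two points are 1 deg apart, not 359 deg.
print(naive_planar_sep(359.5, 0, 0.5, 0), spherical_sep(359.5, 0, 0.5, 0))
```

Any index built on a planar projection has to handle both effects somewhere, and both get worse towards the poles.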

Hierarchical Triangular Mesh (HTM)
Developed by the Sloan survey archive team at JHU
Start with the projection of an octahedron onto the sphere, and subdivide each triangle into four by joining the midpoints of its edges
Generate a unique pixel ID code based on position in the sky and level in the hierarchy; this can be indexed with a B-tree
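The subdivision scheme can be sketched in a few dozen lines. This is an illustrative reconstruction, not the JHU HTM library: the root-triangle numbering (S0..S3 as IDs 8..11, N0..N3 as 12..15) and the child vertex ordering follow our reading of the published HTM description and should be treated as assumptions. What the sketch does show faithfully is the key property: each level appends two bits, so the ID encodes both position and level and is an ordinary integer a B-tree can index.

```python
import math

def radec_to_vec(ra_deg, dec_deg):
    """(RA, Dec) in degrees -> unit vector on the sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _midpoint(a, b):
    """Midpoint of an edge, pushed back out onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(m, m))
    return (m[0] / n, m[1] / n, m[2] / n)

def _inside(p, a, b, c, eps=1e-12):
    """True if p lies in spherical triangle (a, b, c), vertices counter-clockwise
    seen from outside: p must be on the inner side of all three great-circle edges."""
    return (_dot(_cross(a, b), p) >= -eps and _dot(_cross(b, c), p) >= -eps
            and _dot(_cross(c, a), p) >= -eps)

# Octahedron vertices on the unit sphere.
V0, V1, V2, V3, V4, V5 = (0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)
# Eight root triangles (assumed numbering): S0..S3 -> 8..11, N0..N3 -> 12..15.
ROOTS = [(8, (V1, V5, V2)), (9, (V2, V5, V3)), (10, (V3, V5, V4)), (11, (V4, V5, V1)),
         (12, (V1, V0, V4)), (13, (V4, V0, V3)), (14, (V3, V0, V2)), (15, (V2, V0, V1))]

def htm_id(ra_deg, dec_deg, depth):
    """Integer trixel ID of the point at the given depth (depth 0 = root triangles)."""
    p = radec_to_vec(ra_deg, dec_deg)
    tid, tri = next((t, tr) for t, tr in ROOTS if _inside(p, *tr))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        k = next(i for i, child in enumerate(children) if _inside(p, *child))
        tid, tri = tid * 4 + k, children[k]  # each level appends two bits
    return tid
```

For example, `htm_id(45, 45, 0)` lands in the northern root triangle with ID 15, and at any deeper level the leading bits of the ID are still 15.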

Hierarchical Equal Area Iso-Latitude Pixelisation (HEALPix)
Developed by Kris Gorski (now JPL/Caltech)
Start by dividing the sphere into twelve equal-area curvilinear quadrilaterals, then divide each into four, and repeat
Like HTM, produces a pixel code on which a B-tree index can be made
(Ian: HEALPix in Oracle?)
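The hierarchy can be summarised in a little integer arithmetic. This sketch covers only the bookkeeping, not the position-to-pixel mapping (for that, use an established HEALPix implementation): with resolution parameter Nside there are 12·Nside² pixels of equal area, and in the NESTED numbering scheme the four children of pixel p are 4p to 4p+3. Function names are our own.

```python
import math

FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2   # about 41253 square degrees

def npix(nside):
    """12 base pixels, each split into nside * nside pixels."""
    return 12 * nside * nside

def pixel_area_sq_deg(nside):
    """All HEALPix pixels at a given resolution have the same area."""
    return FULL_SKY_SQ_DEG / npix(nside)

def nested_children(p):
    """NESTED scheme: pixel p at resolution nside splits into
    pixels 4p .. 4p+3 at resolution 2 * nside."""
    return [4 * p + k for k in range(4)]

def nested_parent(p):
    return p // 4

def nested_descendant_range(p, levels):
    """Descendants of p, `levels` subdivisions down, form the
    contiguous integer range [lo, hi), which suits a B-tree range scan."""
    return p * 4 ** levels, (p + 1) * 4 ** levels
```

As with HTM, the contiguous descendant ranges are what let an ordinary B-tree on the pixel code answer "everything inside this coarse pixel" with one range scan.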

Sky survey DB case study: SkyServer for SDSS
Sloan Digital Sky Survey (SDSS):
– First of a new generation of sky surveys
US-led team, dedicated telescope & camera
Image half of the northern sky in 5 optical bands
Then obtain optical spectra for 1,000,000 galaxies
Estimated ~1 TB of catalogue data

SDSS Archive
First of a new generation of sky survey archives
– Represents the state of the art in sky survey databases
Developed by Alex Szalay's team at Johns Hopkins
Project started in earnest in about 1996
– OODBMSs seen as the coming thing
– SDSS chose Objectivity/DB for their archive: ~15 staff-years of effort later, they'd rewritten much of the DBMS themselves… and then jumped ship and started using MS SQL Server!
– SkyServer (in collaboration with Jim Gray, MS Research)

SkyServer design considerations
Power & flexibility to pose arbitrary queries
Simple: many astronomers don't know SQL!
Hide messy spherical trigonometry
– Distance on the sphere between (a1,d1) and (a2,d2), as an angle in radians, is given in SQL by
2.0*asin(sqrt(square(sin(0.5*(radians(d1-d2)))) + cos(radians(d1))*cos(radians(d2))*square(sin(0.5*(radians(a1-a2))))))
– Don't want users typing this
– Don't really want the DBMS to evaluate expressions like this often
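The SQL expression above is the haversine formula. A direct Python transcription (our own function names) makes it easier to check, and comparing it with the algebraically equivalent spherical law of cosines shows why the haversine form is the one worth hiding behind a function: the cosine form rounds to zero for the very small separations that matter in cross-matching.

```python
import math

def sky_distance_rad(a1, d1, a2, d2):
    """Transcription of the SQL expression on this slide:
    2*asin(sqrt(sin^2((d1-d2)/2) + cos d1 cos d2 sin^2((a1-a2)/2))).
    Inputs in degrees; the result is an angle in radians."""
    r = math.radians
    h = (math.sin(0.5 * r(d1 - d2)) ** 2
         + math.cos(r(d1)) * math.cos(r(d2)) * math.sin(0.5 * r(a1 - a2)) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, h)))  # clamp guards rounding near antipodes

def law_of_cosines_rad(a1, d1, a2, d2):
    """Equivalent formula; poorly conditioned for very small separations."""
    r = math.radians
    c = (math.sin(r(d1)) * math.sin(r(d2))
         + math.cos(r(d1)) * math.cos(r(d2)) * math.cos(r(a1 - a2)))
    return math.acos(max(-1.0, min(1.0, c)))

tiny = 1e-7  # degrees, i.e. 0.36 milliarcseconds
print(sky_distance_rad(0, 0, 0, tiny), law_of_cosines_rad(0, 0, 0, tiny))
```

For moderate separations the two agree to rounding error; at the `tiny` offset the cosine of the separation is indistinguishable from 1.0 in double precision, so the second function returns exactly zero while the haversine form keeps full relative precision.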

SkyServer spatial queries
Simple table-valued functions exposed to the user:
– e.g. select count(*) from fGetNearbyObjEq(a,d,radius)
(a,d) = (Right Ascension, Declination)
Functions call a SQL Server Extended Stored Procedure
– HTM index manipulation routines, implemented in a Dynamically Linked Library (DLL)
– DLL generated from the HTM package in C++
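The same idea, a clean function in front of messy trigonometry, can be imitated in SQLite from Python. This is only an analogy: `get_nearby_obj_eq` is a hypothetical stand-in for a table-valued function like fGetNearbyObjEq, the radius is taken in arcminutes as an assumption for this sketch, and `create_function` registers a Python callable rather than an extended stored procedure in a DLL. It also uses a plain declination index for the coarse cut instead of HTM.

```python
import math
import sqlite3

def sep_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcminutes (haversine form); inputs in degrees."""
    r = math.radians
    h = (math.sin(0.5 * r(dec1 - dec2)) ** 2
         + math.cos(r(dec1)) * math.cos(r(dec2)) * math.sin(0.5 * r(ra1 - ra2)) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 60.0

def connect():
    conn = sqlite3.connect(":memory:")
    # Register the trigonometry once, as a function callable from SQL.
    conn.create_function("sep_arcmin", 4, sep_arcmin)
    conn.execute("CREATE TABLE source (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL)")
    conn.execute("CREATE INDEX source_dec ON source (dec)")
    return conn

def get_nearby_obj_eq(conn, ra, dec, radius_arcmin):
    """Hypothetical analogue of fGetNearbyObjEq: (obj_id, distance_arcmin)
    rows within the radius, nearest first. Users never see the trig."""
    r_deg = radius_arcmin / 60.0
    return conn.execute(
        "SELECT obj_id, sep_arcmin(ra, dec, ?, ?) AS dist FROM source "
        "WHERE dec BETWEEN ? AND ? "            # coarse cut: B-tree range scan
        "AND sep_arcmin(ra, dec, ?, ?) <= ? "   # exact spherical test on survivors
        "ORDER BY dist",
        (ra, dec, dec - r_deg, dec + r_deg, ra, dec, radius_arcmin)).fetchall()

conn = connect()
conn.executemany("INSERT INTO source VALUES (?, ?, ?)",
                 [(1, 180.0, 0.0), (2, 180.01, 0.0), (3, 180.5, 0.0)])
nearby = get_nearby_obj_eq(conn, 180.0, 0.0, 1.0)
```

A user-level query then reads like the slide's example, e.g. counting the rows returned, while the per-row callback into Python hints at the overhead the following slides complain about.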

Lessons from the HTM implementation in SkyServer
SQL is not great for spherical trigonometry
– Messy to write, slow to compute
Have to define stored procedures/functions
– Expose a clean interface to users
– Let them pose queries the way they want to
Replace trig operations by integer arithmetic
– Library of HTM index operations underneath
Precompute tables of neighbouring objects
– Far fewer spatial match operations at query time
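"Replace trig operations by integer arithmetic" works because of how HTM IDs are built: each level appends two bits, so every descendant of a trixel shares its ID as a prefix, and all of them fall in one contiguous integer range. A region approximated by a set of trixels therefore becomes a handful of B-tree range scans. In this sketch the `htm_id` column and its values are hypothetical (made up rather than computed from positions), with leaf level 2 to keep the numbers small.

```python
import sqlite3

def descendant_range(trixel_id, level, leaf_level):
    """Half-open range [lo, hi) of leaf-level IDs below an HTM trixel.
    Each level appends two bits, so descendants share the parent's prefix."""
    shift = 2 * (leaf_level - level)
    return trixel_id << shift, (trixel_id + 1) << shift

def objects_in_cover(conn, cover, leaf_level):
    """cover: (trixel_id, level) pairs approximating a region of sky.
    Each pair becomes one integer range scan; no trigonometry here."""
    found = []
    for tid, level in cover:
        lo, hi = descendant_range(tid, level, leaf_level)
        found += [row[0] for row in conn.execute(
            "SELECT obj_id FROM source WHERE htm_id >= ? AND htm_id < ?", (lo, hi))]
    return sorted(found)

# Toy catalogue with leaf_level = 2, so IDs lie in [128, 256).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (obj_id INTEGER PRIMARY KEY, htm_id INTEGER)")
conn.execute("CREATE INDEX source_htm ON source (htm_id)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, 240), (2, 247), (3, 255), (4, 200)])
```

Trixels lying wholly inside a query region can be accepted on the integer test alone; only those straddling its boundary would still need an exact distance check, which is where the remaining trigonometry goes.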

Problems with this approach
How easy is it to develop stored procedures, etc.?
– Needs detailed knowledge of the DBMS
– Extended Stored Procedure calls are slow
How well will the query optimiser use HTM?
– …less well than a built-in spatial index?…
– …but that might be poorly suited to astronomical applications…
How easy is it to implement all this in DBMSs other than SQL Server?
But this works reasonably well in practice!