The Australian Virtual Observatory: Clusters and Grids
David Barnes, Astrophysics Group
Overview
– What is a Virtual Observatory?
– Scientific motivation
– International scene
– Australian scene
– DataGrids for VOs
– ComputeGrids for VOs
– Sketch of AVO DataGrid and ComputeGrid
– Clustering experience at Swinburne
What is a Virtual Observatory?
– A Virtual Observatory (VO) is a distributed, uniform interface to the data archives of the world's major astronomical observatories.
– A VO is explored with advanced data mining and visualisation tools which exploit the unified interface to enable cross-correlation and combined processing of distributed and diverse datasets.
– VOs will rely on, and provide motivation for, the development of national and international computational and data grids.
Scientific motivation
– Understanding of astrophysical processes depends on multi-wavelength observations and input from theoretical models.
– As telescopes and instruments grow in complexity, surveys generate massive databases which require increasing expertise to comprehend.
– Theoretical modelling codes are growing in sophistication and readily consume all available compute time.
– Major advances in astrophysics will be enabled by transparently cross-matching, cross-correlating and inter-processing otherwise disparate data.
[Figure: sample multi-wavelength data for the galaxy IC5332 (Ryan-Weber) – panels: blue; H-alpha spectral line; infrared; HI spectral line column density; HI spectral line velocity field; HI spectral line velocity dispersion.]
[Figure: HI profile from public release.]
International scene
– AstroGrid (UK): phase A (1 yr R&D) complete; phase B (3 yr implementation) funded £3.7M.
– Astrophysical Virtual Observatory (EU): phase A (3 yr R&D) funded €4.0M.
– National Virtual Observatory (US, us-vo.org): 5 yr framework development funded USD 10M.
Australian scene
– Australian Virtual Observatory: phase A (1 yr common-format archive implementation) funded AUD 260K (2003 LIEF grant [Melb, Syd, ATNF, AAO]).
– Data archives are:
  – HIPASS: 1.4 GHz continuum and HI spectral line survey
  – SUMSS: 843 MHz continuum survey
  – S4: digital images of the southern sky in five optical filters
  – ATCA archive: continuum and spectral line images of the southern sky
  – 2dFGRS: optical spectra of >200K southern galaxies
  – and more...
DataGrids for VOs
– The archives listed on the previous slide range from ~10 GB to ~10 TB in processed (reduced) size.
– Providing just the processed images and spectra on-line requires a distributed, high-bandwidth network of data servers – that is, a DataGrid.
– Users may also want simple operations, such as smoothing or filtering, applied at the data server (see the sketch below). This is a Virtual DataGrid.
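As a concrete illustration of such a server-side operation, here is a minimal sketch in modern Python, assuming the astropy and scipy libraries are available; the function name and archive path are hypothetical, not part of any real AVO interface. The point is that the smoothing runs at the data server, so only the derived product crosses the network.

    # Hypothetical sketch of a server-side "smooth" operation for a
    # Virtual DataGrid node. Names and paths are illustrative only.
    import numpy as np
    from astropy.io import fits
    from scipy.ndimage import gaussian_filter

    def smooth_image(path, fwhm_pix):
        """Return a Gaussian-smoothed copy of an archive image."""
        with fits.open(path) as hdul:
            data = hdul[0].data.astype(np.float64)
            header = hdul[0].header
        # convert FWHM in pixels to the Gaussian sigma
        sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
        smoothed = gaussian_filter(data, sigma=sigma)
        return fits.PrimaryHDU(data=smoothed, header=header)

    # e.g. serve a smoothed cutout instead of the raw archive file:
    # hdu = smooth_image("archive/field123.fits", fwhm_pix=3.0)
    # hdu.writeto("field123_smoothed.fits", overwrite=True)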
ComputeGrids for VOs
– More complex operations may be applied, requiring significant processing:
  – source detection and parameterisation (see the toy sketch below)
  – reprocessing of raw or intermediate data products with new calibration algorithms
  – combined processing of raw, intermediate or "final product" data from different archives
– These operations require a distributed, high-bandwidth network of computational nodes – that is, a ComputeGrid.
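The first of these operations can be boiled down to a toy sketch (modern Python with numpy and scipy assumed; all names are illustrative, and real survey source finders are far more sophisticated): threshold the image, group bright pixels into islands, and parameterise each island.

    # Minimal, hypothetical source detection and parameterisation
    # on a 2-D image, as a stand-in for a ComputeGrid operation.
    import numpy as np
    from scipy import ndimage

    def detect_sources(image, nsigma=5.0):
        """Find islands of pixels above nsigma times the noise and
        return crude parameters (position, flux, size) for each."""
        noise = np.std(image)                # crude noise estimate
        mask = image > nsigma * noise
        labels, nsrc = ndimage.label(mask)   # connected components
        sources = []
        for i in range(1, nsrc + 1):
            ys, xs = np.nonzero(labels == i)
            flux = image[ys, xs].sum()
            # flux-weighted centroid as a simple position estimate
            y0 = np.average(ys, weights=image[ys, xs])
            x0 = np.average(xs, weights=image[ys, xs])
            sources.append({"x": x0, "y": y0,
                            "flux": flux, "npix": len(xs)})
        return sources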
[Diagram: possible initial players in the Australian Virtual Observatory Data and Compute Grids, linked by GrangeNet. Sites: Melbourne, Adelaide, Canberra, Sydney, Parkes?. Nodes: Swinburne (data, CPU?), ATNF/AAO (data, theory?), APAC (CPU), VPAC (CPU, theory). Archives: HIPASS, SUMSS, ATCA, 2dFGRS, RAVE, Gemini?.]
Swinburne
– 1998–2000: 40 Compaq Alpha workstations
– 2001: +16 Dell dual PIII rackmount servers
– 2002: +30 Dell dual P4 workstations
– mid 2002: +60 Dell dual P4 rackmount servers
– November 2002: placed 180th in the Top500 with 343 sustained Gflop/s (APAC 63rd with 825 Gflop/s)
– +30 Dell dual P4 rackmount servers installed mid 2002 at the Parkes telescope in NSW
– pseudo-Grid, with data pre-processed in real time at the telescope and shipped back in slow time (see the sketch below)
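The pseudo-Grid pattern amounts to a watcher loop at the telescope: a minimal Python sketch follows, where the paths and the reduce() step are placeholders rather than the actual Parkes pipeline, and files are assumed to appear atomically in the incoming directory.

    # Hypothetical sketch of the Parkes "pseudo-Grid" pattern: each
    # newly recorded file is pre-processed in real time, and the
    # (smaller) product is queued for slow-time shipment offsite.
    import os
    import shutil
    import time

    INCOMING = "/data/incoming"   # raw files from the recorder
    OUTGOING = "/data/outgoing"   # reduced products awaiting transfer

    def reduce(raw_path, out_path):
        # placeholder for the real-time reduction step (calibration,
        # averaging, compression); here we just copy the file
        shutil.copyfile(raw_path, out_path)

    def watch(poll_s=10):
        seen = set()
        while True:
            for name in sorted(os.listdir(INCOMING)):
                if name not in seen:
                    reduce(os.path.join(INCOMING, name),
                           os.path.join(OUTGOING, name))
                    seen.add(name)
            time.sleep(poll_s)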
Swinburne activities
– N-body simulation codes:
  – galaxy formation
  – stellar disk astrophysics
  – cosmology
– Pulsar searching and timing (1 GB/min data recording)
– Survey processing as a coarse-grained problem (see the sketch below)
– Rendering of virtual reality content
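Survey processing is coarse-grained because each cube (or field) reduces independently of all the others, so the work farms out trivially across nodes. A sketch of the pattern using Python's multiprocessing, with local worker processes standing in for cluster nodes; process_cube and the file list are placeholders.

    # Coarse-grained survey processing: one independent task per cube.
    from multiprocessing import Pool

    def process_cube(path):
        """Reduce one survey cube; independent of all other cubes."""
        # ... bandpass correction, gridding, source finding ...
        return path, "ok"

    if __name__ == "__main__":
        cubes = ["cube%03d.fits" % i for i in range(100)]  # placeholder list
        with Pool(processes=8) as pool:
            for path, status in pool.imap_unordered(process_cube, cubes):
                print(path, status)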
Clustering costs

  configuration                                      price/node   price/cpu
  1 cpu, 256 MB std mem, 20 GB disk, ethernet        1.3K         1.3K
  2 cpu, 1 GB fast mem, 20 GB disk, ethernet         4.4K         2.2K
  2 cpu, 2 GB fast mem, 60 GB SCSI disk, ethernet    8.0K         4.0K
  Giganet, Myrinet, ... interconnect (per node)      1.5K         1.5K (1 cpu), 0.8K (2 cpu)

  (estimates incl. on-site warranty; 2nd-fastest cpu; excl. infrastructure)
Some ideas...
– desktop cluster: the astro group has 6 dual-cpu workstations.
  – Add MPI, PVM and Nimrod libraries and the Ganglia monitoring tool to get a 12-cpu loose cluster with 8 GB memory (see the MPI sketch below).
  – Use MOSIX to provide transparent job migration, with workstations joining the cluster at night-time.
– pre-purchase cluster: the university buys ~500 desktops/yr – use them for ~6 months!
  – Build up a cluster from desktops purchased ahead of demand, and cycle machines out as they are deployed to desks.
  – Gain the compute power of new CPUs without any real effect on end-users.
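For flavour, a toy MPI job for such a loose cluster, written with the modern mpi4py bindings for brevity (a slide-era setup would more likely have used C with MPICH or LAM/MPI; the task and result are illustrative).

    # Toy "loose cluster" MPI job: split a bag of independent tasks
    # round-robin across processes on the group's workstations.
    # Run with e.g.:  mpiexec -n 12 python loose_cluster.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id, 0..size-1
    size = comm.Get_size()   # total processes across workstations

    tasks = list(range(120))
    my_tasks = tasks[rank::size]
    my_result = sum(t * t for t in my_tasks)   # stand-in for real work

    # combine partial results on rank 0
    total = comm.reduce(my_result, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares:", total)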