Distributed Processing of Future Radio Astronomical Observations

Ger van Diepen
ASTRON, Dwingeloo
ATNF, Sydney

ADASS2007; GvD

Contents
- Introduction
- Data Distribution
- Architecture
- Performance issues
- Current status and future work

Data Volume in future telescopes

LOFAR
- 37 stations (666 baselines), growing to 63 stations (1953 baselines)
- 128 subbands of 256 channels (32768 channels)
- 666*32768*4*8 bytes/sec = 700 MByte/sec
- 5 hour observation gives 12 TBytes

ASKAP (spectral line observation)
- 45 stations (990 baselines)
- 32 beams, 16384 channels each
- 990*32*16384*4*8 bytes/10 sec = 1.6 GByte/sec
- 12 hour observation gives 72 TBytes
- One day observing > entire world radio archive

ASKAP continuum: 280 GBytes (64 channels)
MeerKAT similar to ASKAP

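The data-volume arithmetic on this slide can be reproduced with a few lines of Python. This is a sketch only: the function names are invented for illustration, and reading the factors 4 and 8 as four polarisation products of 8 bytes each is an assumption, since the slide does not label them.

```python
def baselines(stations: int) -> int:
    """Number of station pairs, excluding autocorrelations."""
    return stations * (stations - 1) // 2


def data_rate(stations, channels, beams=1, integration_s=1.0,
              pols=4, bytes_per_vis=8):
    """Bytes per second; pols and bytes_per_vis are assumed meanings of 4 and 8."""
    per_dump = baselines(stations) * beams * channels * pols * bytes_per_vis
    return per_dump / integration_s


# LOFAR: 37 stations, 128 subbands of 256 channels, 1 second integration
lofar_rate = data_rate(37, 128 * 256)
lofar_total = lofar_rate * 5 * 3600          # 5 hour observation

# ASKAP spectral line: 45 stations, 32 beams, 16384 channels, 10 second integration
askap_rate = data_rate(45, 16384, beams=32, integration_s=10.0)
askap_total = askap_rate * 12 * 3600         # 12 hour observation

print(f"LOFAR: {lofar_rate / 1e6:.0f} MByte/sec, {lofar_total / 1e12:.1f} TBytes")
print(f"ASKAP: {askap_rate / 1e9:.2f} GByte/sec, {askap_total / 1e12:.1f} TBytes")
```

The same function gives the baseline counts quoted above (666, 1953 and 990) and shows that the volume scales linearly with channels and beams but quadratically with the number of stations.
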
Key Issues

                 Traditionally               Future
Size             Few GBytes                  Several TBytes
Processing time  weeks-months                < 1 day
Mode             Interactively               Automatic pipeline
Archived?        Always                      Some data
Where            Desktop                     Dedicated machine
IO               Many passes through data    Only few passes possible
Package used     AIPS, Miriad, Casa, ..      ?

Data Distribution
- Visibility data need to be stored in a distributed way
- Limited use for parallel IO
- Too much data to share across the network
- Bring processes to the data, NOT data to the processes

Data Distribution
- Distribution must be efficient for all purposes (flagging, calibration, imaging, deconvolution)
- Process locally where possible and exchange as little data as possible
- Loss of a data partition should not be too painful
- Spectral partitioning seems the best candidate

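Spectral partitioning can be pictured as giving each worker a contiguous range of frequency channels. The sketch below is illustrative only: `partition_channels` is a hypothetical helper, not part of the framework described in the talk, and an even contiguous split is just one possible scheme.

```python
def partition_channels(n_channels: int, n_parts: int) -> list[range]:
    """Split channels 0..n_channels-1 into n_parts contiguous, near-equal ranges."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(n_channels, n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` partitions take one additional channel.
        size = base + (1 if i < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


# LOFAR-like case from the data volume slide: 32768 channels over 128 partitions
lofar_parts = partition_channels(32768, 128)
print(len(lofar_parts), len(lofar_parts[0]))
```

With such a split, losing one partition removes only a narrow slice of the band, which fits the requirement that the loss of a partition should not be too painful.
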
Architecture
Connection types: Socket, MPI, Memory, DB

Data Processing
- A series of steps has to be performed on the data (solve, subtract, correct, image, ...)
- Master gets steps from a control process (e.g. Python)
- If possible, a step is sent directly to the appropriate workers
- Some steps (e.g. solve) need iteration
  - Substeps are sent to workers
  - Replies are received and forwarded to other workers

Calibration Processing
Solving non-linearly:

  do {
    1: get normal equations
    2: send eq to solver
    3: get solution
    4: send solution
  } while (!converged)

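The loop above can be sketched in Python with a deliberately tiny toy problem. Everything here is an assumption made for illustration: a single real gain g, observed values equal to g*g times a known model, and plain function calls standing in for the master-worker messages. It is not the LOFAR or CONRAD solver.

```python
def normal_equations(g, model, observed):
    """Worker side: accumulate J^T J and J^T r over its own data partition."""
    jtj = jtr = 0.0
    for m, v in zip(model, observed):
        j = 2.0 * g * m        # derivative of g*g*m with respect to g
        r = v - g * g * m      # residual of this sample
        jtj += j * j
        jtr += j * r
    return jtj, jtr


def solve(partitions, g=1.0, tol=1e-12, max_iter=50):
    """Master and solver side: merge the equations, update g, repeat."""
    for iteration in range(1, max_iter + 1):
        # 1 and 2: get normal equations from each worker and hand them to the solver
        parts = [normal_equations(g, m, v) for m, v in partitions]
        jtj = sum(p[0] for p in parts)
        jtr = sum(p[1] for p in parts)
        # 3: get solution (a scalar Gauss-Newton update)
        dg = jtr / jtj
        # 4: send solution (the next pass evaluates with the updated g)
        g += dg
        if abs(dg) < tol:
            return g, iteration
    raise RuntimeError("solver did not converge")


# Two data partitions, simulated with a true gain of 3.0 (observed = 9 * model)
partitions = [([1.0, 2.0], [9.0, 18.0]), ([0.5, 4.0], [4.5, 36.0])]
gain, n_iter = solve(partitions)
print(gain, n_iter)
```

Only the two accumulated sums leave each worker per iteration, however large its partition, which is why exchanging normal equations keeps network traffic small.
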
Performance: IO
- Distributed IO, yet 24 minutes to read 72 TByte once
- IO should be asynchronous to avoid idle CPU
- Which storage to use is a deployment decision
  - Local disks (RAID)
  - SAN or NAS
- Sufficient IO bandwidth to all machines is needed
- Calibration and imaging are used repeatedly, so the data will be accessed multiple times
- BUT operate on chunks of data (work domain) to keep data in memory while performing many steps on them
- Possibly store in multiple resolutions
- Tiling for efficient IO if there are different access patterns

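Two points from this slide lend themselves to a small sketch: the aggregate read rate implied by reading 72 TByte in 24 minutes, and the work-domain idea of reading a chunk once and running all steps on it while it is in memory. The function names and the two example steps are hypothetical stand-ins, not framework code.

```python
def aggregate_rate(total_bytes: float, seconds: float) -> float:
    """Aggregate read rate in bytes per second."""
    return total_bytes / seconds


def run_work_domains(samples, domain_size, steps):
    """Read each work domain once, then apply every step to it in memory."""
    reads = 0
    results = []
    for start in range(0, len(samples), domain_size):
        domain = samples[start:start + domain_size]   # the single read of this chunk
        reads += 1
        for step in steps:                            # many steps on the in-memory chunk
            domain = step(domain)
        results.extend(domain)
    return results, reads


rate = aggregate_rate(72e12, 24 * 60)
print(f"{rate / 1e9:.0f} GByte/sec summed over all machines")

# Stand-ins for real processing steps such as subtract and correct
steps = [lambda d: [x - 1 for x in d], lambda d: [2 * x for x in d]]
processed, n_reads = run_work_domains(list(range(10)), 4, steps)
print(processed, n_reads)
```

Applying the steps one after another over the whole data set would need one pass per step; with work domains the number of reads depends only on the number of chunks.
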
Performance: Network
- Process locally where possible
- Send as little data as possible (normal equations are small matrices)
- Overlap operations, e.g. form the normal equations for the next work domain while the solver solves the current work domain

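The overlap described above can be sketched with Python's standard `concurrent.futures` module. The two functions are toy stand-ins, and a background thread replaces the separate worker and solver processes of a real deployment, so this shows only the ordering of the operations, not real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def form_equations(domain):
    """Stand-in for a worker forming normal equations for one work domain."""
    return sum(x * x for x in domain), sum(domain)


def solve(equations):
    """Stand-in for the solver."""
    sxx, sx = equations
    return sx / sxx


def pipeline(domains):
    """Solve domain i while the equations for domain i+1 are being formed."""
    results = []
    if not domains:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(form_equations, domains[0])
        for next_domain in domains[1:]:
            equations = pending.result()                        # equations for the current domain
            pending = pool.submit(form_equations, next_domain)  # start forming the next ones
            results.append(solve(equations))                    # solve meanwhile
        results.append(solve(pending.result()))                 # the last domain
    return results


domains = [[1.0, 2.0], [3.0], [2.0, 2.0]]
print(pipeline(domains))
```

The results are identical to a strictly sequential loop; only the waiting time between forming and solving is hidden.
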
Performance: CPU
- Parallelisation (OpenMP, ...)
- Vectorisation (SSE instructions)
- Keep data in CPU cache as much as possible, so smallish data arrays
- Optimal layout of data structures
- Keep intermediate results if not changing
- Reduce number of operations by reducing the resolution

Current status
- Basic framework has been implemented and is used in LOFAR and CONRAD calibration and imaging
- Can be deployed on a cluster or supercomputer (or desktop)
- Tested on Sun cluster, Cray XT3, IBM PC cluster, MacBook
- Resource DB describes cluster layout and data partitioning. Hence the master can derive which processor should process which part of the data.

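How a master might use such a resource description can be sketched as a simple lookup. The layout, node names and functions below are all hypothetical; the talk does not describe the actual format of the Resource DB.

```python
from bisect import bisect_right

# Hypothetical layout: (first channel of the partition, node that stores it)
RESOURCE_DB = [(0, "node01"), (8192, "node02"), (16384, "node03"), (24576, "node04")]
N_CHANNELS = 32768
STARTS = [start for start, _ in RESOURCE_DB]


def node_for_channel(channel: int) -> str:
    """Return the node holding the partition that contains this channel."""
    if not 0 <= channel < N_CHANNELS:
        raise ValueError(f"channel {channel} is outside 0..{N_CHANNELS - 1}")
    return RESOURCE_DB[bisect_right(STARTS, channel) - 1][1]


def nodes_for_range(first: int, last: int) -> list[str]:
    """Return all nodes whose partitions overlap channels first..last (inclusive)."""
    if first > last:
        raise ValueError("first must not exceed last")
    node_for_channel(first)   # range checks
    node_for_channel(last)
    i0 = bisect_right(STARTS, first) - 1
    i1 = bisect_right(STARTS, last) - 1
    return [node for _, node in RESOURCE_DB[i0:i1 + 1]]


print(node_for_channel(8192))
print(nodes_for_range(8000, 17000))
```

A step that touches only part of the band would then be sent only to the nodes returned by the lookup.
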
Parallel processed image (Tim Cornwell)
- Runs on ATNF’s Sun cluster “minicp”
  - 8 nodes
  - Each node = 2 * dual core Opterons, 1TB, 12GB
- Also on Cray XT3 at WASP (Perth, WA)
- Data simulated using AIPS++
- Imaged using CONRAD synthesis software
  - New software using casacore
  - Running under OpenMPI
- Long integration continuum image
  - 8 hours integration
  - 128 channels over 300 MHz
  - Single beam
- Use 1, 2, 4, 8, 16 processing nodes for calculation of residual images
  - Scales well
  - Must scale up a hundredfold or more

Future work
- More work needed on robustness
  - Discard partition when processor or disk fails
  - Move to other processor if possible (e.g. replicated)
- Store data in multiple resolutions?
- Use master-worker in flagging, deconvolution
- Worker can use accelerators like GPGPU, FPGA, Cell (maybe through RapidMind)
- Worker can be a master itself to make use of BG/L in a PC cluster

Future work
- Extend to image processing (few TBytes)
  - Source finding
  - Analysis
  - Display
  - VO access?

Thank you
- Joint work with people at ASTRON, ATNF, and KAT
- More detail in next talk about LOFAR calibration
- See poster about CONRAD software

Ger van Diepen