1
Software Correlator
Jongsoo Kim, Korea Astronomy and Space Science Institute
2
Future Correlator Projects
SKA phase I and phase II
  Test Correlator
  Verification Correlator
  SKA1 AIP and SKA2
Upgrade plan of a correlator for ACA (ALMA Compact Array)
Korea, Japan, and Taiwan are interested in correlators.
3
Correlators for Radio Interferometry
ASIC (Application-Specific Integrated Circuit):
FPGA (Field-Programmable Gate Array): Japan, Taiwan, …
Software (high-level languages, e.g., C/C++): Korea
  Rapid development
  Expandability
  …
4
Current Status of SC
LBA (Australian Long Baseline Array)
  8 antennas (Parkes, … 22-64m, GHz)
  DiFX software correlator (2006; Deller et al. 2007, 2011)
VLBA (Very Long Baseline Array)
  10 antennas (25m, 330MHz - 86GHz)
  DiFX
MPIfR (Max Planck Institute for Radio Astronomy)
  Mark4 DiFX
5
Current Status of SC (cont.)
GMRT (Giant Metrewave Radio Telescope)
  30 antennas (45m, 50MHz-1.5GHz), 32MHz
  ASIC software correlator (Roy et al. 2010)
LOFAR (Low Frequency Array)
  LBA (Low Band Antennae) 10-90MHz
  HBA (High Band Antennae) 110-250MHz
  IBM BlueGene/P: software correlation
6
Correlation Theorem, FX-correlator
F-step (FT): ~log2(Nc) operations per sample
X-step (CMAC): ~N operations per sample
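The two steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names (`f_step`, `x_step`, `fx_correlate`) are hypothetical, and the F-step uses a naive DFT for readability, whereas a real correlator would use an FFT (which is where the ~log2(Nc) cost per sample comes from).

```python
import cmath

def f_step(samples, n_chan):
    """F-step: cut one antenna's time series into blocks of n_chan samples
    and Fourier-transform each block (naive DFT here, an FFT in practice)."""
    spectra = []
    for start in range(0, len(samples) - n_chan + 1, n_chan):
        block = samples[start:start + n_chan]
        spectra.append([
            sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n_chan)
                for t in range(n_chan))
            for k in range(n_chan)
        ])
    return spectra

def x_step(spectra_per_antenna):
    """X-step: complex multiply-accumulate (CMAC) for every antenna pair
    i <= j and every channel, accumulated over all blocks."""
    n_ant = len(spectra_per_antenna)
    n_chan = len(spectra_per_antenna[0][0])
    vis = {}
    for i in range(n_ant):
        for j in range(i, n_ant):
            acc = [0j] * n_chan
            for si, sj in zip(spectra_per_antenna[i], spectra_per_antenna[j]):
                for k in range(n_chan):
                    acc[k] += si[k] * sj[k].conjugate()
            vis[(i, j)] = acc
    return vis

def fx_correlate(antenna_samples, n_chan):
    """Run the F-step on every antenna, then the X-step on all pairs."""
    return x_step([f_step(s, n_chan) for s in antenna_samples])

# Two antennas receiving the same tone (one cycle per 4 samples).
tone = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
vis = fx_correlate([tone, tone], n_chan=4)
```

With N antennas the X-step loop visits N(N+1)/2 pairs, so the work per input sample grows with N, while the F-step cost depends only on the number of channels.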
7
FLOPS of the X-step in FX correlator
FLOPS = 4 x 8 x N(N+1)/2 x B x Nb ≈ 16 N² B Nb
4 is from the four polarization products
8 is from 4 multiplications and 4 additions
N(N+1)/2 is the number of auto- and cross-correlations for N antennas (stations)
Dish array (N=250, B=1 GHz, Nb=1): 16 x 250² GFLOPS = 1 PFLOPS
Sparse AA (N=50, B=380 MHz, Nb=160): 16 x 50² x 160 x 0.38 GFLOPS = 2.43 PFLOPS
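The two estimates can be reproduced with a short calculation. The sketch below is an illustration, not part of the original slides: the function name `x_step_flops` is hypothetical, and the factor 4 is assumed to count the four polarization products of dual-polarization data (consistent with the 32 flops per dual-polarization pair quoted on the arithmetic-intensity slide).

```python
def x_step_flops(n_stations, bandwidth_hz, n_beams=1, exact=False):
    """FLOPS needed by the X-step of an FX correlator.

    exact=False uses the slide's approximation N(N+1)/2 ~ N^2/2,
    which gives 16 * N^2 * B * Nb.
    """
    pol_products = 4      # assumption: RR, RL, LR, LL
    flops_per_cmac = 8    # 4 multiplications + 4 additions
    if exact:
        pairs = n_stations * (n_stations + 1) / 2
    else:
        pairs = n_stations ** 2 / 2
    return pol_products * flops_per_cmac * pairs * bandwidth_hz * n_beams

dish_array = x_step_flops(250, 1e9)               # about 1 PFLOPS
sparse_aa = x_step_flops(50, 380e6, n_beams=160)  # about 2.43 PFLOPS
```

Passing `exact=True` keeps the N(N+1)/2 term and raises the dish-array figure by a factor of 251/250, which is why the approximation is adequate for sizing.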
8
top500
9
Tesla Roadmap
10
Tesla K10 (Kepler)
Single precision: 2x2.288 TFLOPS
CUDA cores: 2x1536
Memory bandwidth: 2x160 GB/s
Memory: 2x4 GB
11
CoDR of a Software Correlator for the dish array
[Diagram: 250 dishes connected to 250 nodes (CPUs+(GPUs)) over 100 Gb/s Ethernet]
Data rate per dish: 2x4x2x1GHz = 16Gb/s (2 pols, 4-bit sampling, Nyquist, BW)
Required BW: >32Gb/s
Performance per node: >4 TFLOPS
12
CoDR of a Software Correlator for the sparse AA
[Diagram: 50 stations, each connected to 16 subclusters over 100 Gb/s Ethernets]
Data rate per station: 60Gb/s x 16
Performance per subcluster: >150 TFLOPS
13
Conclusions
Korea (software) and Japan and Taiwan (FPGA) share an interest in correlators, not only for the SKA but probably also for the ACA.
Japan (NAOJ) is also developing a software correlator for VLBI.
We (KJT) might take part together in the CSP (Central Signal Processor) domain of the SKA project.
14
CoDR for SKA Phase I, Memo 125
Key Sciences: H I and Pulsars
Sparse Aperture Array
  MHz, A/Tsys = 2000 m²/K, Lmax = 100 km
Dish Array
  GHz, A/Tsys = 1000 m²/K, m dishes
  single-pixel feeds, Lmax = 100 km
Construction:
Budget: 350M Euros
15
Design goals
Connect antennas and computer nodes with a simple network topology
Use future technology developments of HPC clusters
Simplify programming
16
Cost and Power Estimates of SCs
            # of nodes  Cost per node  Cost of IB per port  Power per node  Total cost  Total power
                        [kEuros]       [kEuros]             [kW]            [M Euros]   [MW]
Dish Array  250         5              1                    1.0             1.5         0.25
Sparse AA   800         5              1                    1.0             4.8         0.80
Total       1050                                                            6.3         1.05
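The totals in the table follow from simple multiplication. The sketch below assumes one InfiniBand port per node and assumes that the per-node figures listed for the dish array (5 kEuros per node, 1 kEuro per port, 1.0 kW) also apply to the sparse AA nodes, which is consistent with the totals shown. The function name `cluster_totals` is hypothetical.

```python
def cluster_totals(n_nodes, node_cost_keur=5.0, ib_port_cost_keur=1.0,
                   node_power_kw=1.0):
    """Return (total cost in M Euros, total power in MW) for a cluster.

    Assumes one IB port per node (an assumption, not stated on the slide).
    """
    total_cost_meur = n_nodes * (node_cost_keur + ib_port_cost_keur) / 1000.0
    total_power_mw = n_nodes * node_power_kw / 1000.0
    return total_cost_meur, total_power_mw

dish_cost, dish_power = cluster_totals(250)      # 1.5 M Euros, 0.25 MW
sparse_cost, sparse_power = cluster_totals(800)  # 4.8 M Euros, 0.80 MW
total_cost = dish_cost + sparse_cost             # 6.3 M Euros
total_power = dish_power + sparse_power          # 1.05 MW
```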
17
Data Rate per Dish
Pure data: 2 (pol) x 4 (bit/sample) x 2 (Nyquist) x 1GHz (BW) = 16Gb/s
Encoding overhead
  20% (8b/10b; PCIe 2.0)
  1.5% (128b/130b; PCIe 3.0)
UDP (User Datagram Protocol) overhead
  < 1% (28 bytes for headers / 65,507 bytes data length)
Oversample overhead
  ~ 20% (memo 130)
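The figures on this slide can be checked with a few lines of arithmetic. In the sketch below the function names are hypothetical, and the encoding overhead is taken as the fraction of the line rate spent on non-payload bits, which is the convention that reproduces the quoted 20% and 1.5%.

```python
def pure_data_rate_bps(bandwidth_hz, n_pol=2, bits_per_sample=4):
    """Pure data rate: polarizations x bits/sample x Nyquist factor x bandwidth."""
    nyquist_factor = 2
    return n_pol * bits_per_sample * nyquist_factor * bandwidth_hz

def line_overhead_fraction(payload_bits, line_bits):
    """Fraction of the line rate used by the encoding rather than the payload."""
    return 1.0 - payload_bits / line_bits

def required_line_rate_bps(payload_rate_bps, payload_bits, line_bits):
    """Line rate needed to carry a given payload rate through the encoding."""
    return payload_rate_bps * line_bits / payload_bits

dish_rate = pure_data_rate_bps(1e9)                 # 16 Gb/s
overhead_8b10b = line_overhead_fraction(8, 10)      # 0.20
overhead_128b130b = line_overhead_fraction(128, 130)  # about 0.015
udp_overhead = 28 / 65507                           # well below 1%
line_rate_pcie3 = required_line_rate_bps(dish_rate, 128, 130)  # 16.25 Gb/s
```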
18
Communication between Computer Nodes I
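[Diagram, state after FFT: Node 1 transforms stream 1 and Node 2 transforms stream 2; each node holds all Nc channels of its own stream. Input rate per node: 2x4x2xB; FFT output rate: 2x8x2xB.]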
19
Communication between Computer Nodes II
[Diagram, state after communication: Node 1 and Node 2 each hold Nc/2 channels of both streams 1 and 2. Input rate per node: 2x4x2xB; FFT output rate: 2x8x2xB.]
20
All-to-All Communication between Computer Nodes III
[Diagram, state after communication: four nodes, each holding Nc/4 channels of all four streams 1-4. Input rate per node: 2x4x2xB; FFT output rate: 2x8x2xB.]
For N>>1: BW(interconnect) = 2 (2x4x2xB) = 32B
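The redistribution shown on these three slides (often called a corner turn) can be sketched as follows. This is a toy model under stated assumptions: the function names are hypothetical, the channel count is assumed to divide evenly among the nodes, and 2x8x2xB is read as the FFT output rate of one node, of which a fraction (N-1)/N must leave the node.

```python
def corner_turn(per_node_spectra):
    """All-to-all exchange of channelized data.

    per_node_spectra[i][c] is channel c of the stream transformed on node i.
    Returns one dict per node mapping each channel that node owns afterwards
    to the list of values from every stream.
    """
    n_nodes = len(per_node_spectra)
    n_chan = len(per_node_spectra[0])
    chans_per_node = n_chan // n_nodes  # assumes n_chan % n_nodes == 0
    result = []
    for node in range(n_nodes):
        first = node * chans_per_node
        result.append({
            c: [per_node_spectra[src][c] for src in range(n_nodes)]
            for c in range(first, first + chans_per_node)
        })
    return result

def interconnect_rate_bps(n_nodes, bandwidth_hz, n_pol=2, bits_after_fft=8):
    """Outgoing rate per node: the share of FFT output owned by other nodes."""
    fft_output_rate = n_pol * bits_after_fft * 2 * bandwidth_hz
    return fft_output_rate * (n_nodes - 1) / n_nodes

spectra = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]
after = corner_turn(spectra)
rate_250_nodes = interconnect_rate_bps(250, 1e9)  # approaches 32 Gb/s
```

For two nodes each node sends half of its output; as N grows the outgoing fraction approaches one, which gives the 32B limit quoted for N>>1.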
21
Data Rate per Station
Pure data: 2 (pol) x 4 (bit/sample) x 2 (Nyquist) x 0.38GHz (BW) x 160 (beams) = 972.8 Gb/s
Encoding overhead
  20% (8b/10b; PCIe 2.0)
  1.5% (128b/130b; PCIe 3.0)
UDP (User Datagram Protocol) overhead
  < 1% (28 bytes for headers / 65,507 bytes data length)
Oversample overhead
  ~ 20% (memo 130)
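The station rate, and the share of it that each of the 16 subclusters receives, can be computed as below. The function names are hypothetical; the even split across subclusters is an assumption that matches the "60Gb/s x 16" figure on the sparse AA design slide.

```python
def station_data_rate_bps(bandwidth_hz=0.38e9, n_beams=160, n_pol=2,
                          bits_per_sample=4):
    """Pure data rate of one station: pol x bits x Nyquist x bandwidth x beams."""
    nyquist_factor = 2
    return n_pol * bits_per_sample * nyquist_factor * bandwidth_hz * n_beams

def per_subcluster_rate_bps(station_rate_bps, n_subclusters=16):
    """Rate sent to each subcluster, assuming channels are split evenly."""
    return station_rate_bps / n_subclusters

station_rate = station_data_rate_bps()                  # 972.8 Gb/s
subcluster_rate = per_subcluster_rate_bps(station_rate)  # 60.8 Gb/s
```

A per-link rate of about 60.8 Gb/s fits within a single 100 Gb/s Ethernet link before the encoding and oversampling overheads are added.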
22
Channelized data in a station
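[Diagram: each station's Nchan channels (160 beams) are split into 16 slices of Nchan/16 channels, one slice per subcluster (subcluster1, subcluster2, ..., subcluster16)]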
23
Connectivity between stations and nodes in a subcluster
[Diagram: stations connected to the nodes of one subcluster over 100 Gb/s Ethernets]
Data rate per station: ~60Gb/s
Required BW: >60Gb/s
Performance per node: >3 TFLOPS
24
AI (Arithmetic Intensity)
Definition: number of operations (flops) per byte
AI = 8 flops / 16 bytes (Ri, Rj) = 0.5
AI = 32 flops / 32 bytes (Ri, Li, Rj, Lj) = 1.0 for a 1x1 tile
AI = 2.4 for 3x2 tiles
Since AIs are small numbers, correlation calculations are bound by the memory bandwidth.
Performance: AI x memory BW (=102GB/s)
Wayth+ (2009), van Nieuwpoort+ (2009)
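The three AI values follow one pattern, which the sketch below makes explicit. It assumes each sample is a single-precision complex number (8 bytes) and that a w x h tile loads w + h antenna samples to compute w x h baselines; the function names are hypothetical.

```python
BYTES_PER_COMPLEX = 8  # assumption: two 4-byte floats per complex sample

def tile_arithmetic_intensity(width, height, n_pol=2):
    """Flops per byte loaded when correlating a width x height tile of baselines."""
    flops = 8 * n_pol * n_pol * width * height          # 8 flops per CMAC
    bytes_loaded = BYTES_PER_COMPLEX * n_pol * (width + height)
    return flops / bytes_loaded

def memory_bound_gflops(ai, mem_bw_gbytes_per_s=102.0):
    """Attainable performance when memory bandwidth is the limit."""
    return ai * mem_bw_gbytes_per_s

ai_single_pol = tile_arithmetic_intensity(1, 1, n_pol=1)  # 8 / 16 = 0.5
ai_1x1 = tile_arithmetic_intensity(1, 1)                  # 32 / 32 = 1.0
ai_3x2 = tile_arithmetic_intensity(3, 2)                  # 192 / 80 = 2.4
gflops_3x2 = memory_bound_gflops(ai_3x2)                  # about 245 GFLOPS
```

Larger tiles reuse each loaded sample for more baselines, which is why the AI rises from 1.0 to 2.4 when going from 1x1 to 3x2 tiles.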
25
Performance of Tesla C1060 as a function of AI
Performance is, indeed, memory-bound. The maximum measured performance is about 1/3 of the peak performance.
26
SKA Phase I: Preliminary System, Memo 130
Sparse Aperture Array
  50 stations
  Bandwidth: 380 MHz ( MHz)
  Number of beams: 480 (160)
  Dual polarizations
  Bits per sample: 4 bits