The LOFAR Blue Gene/L Correlator
Next Generation Correlators, June 26th–29th, 2006
Stichting ASTRON (Netherlands Foundation for Research in Astronomy)


1. The LOFAR Blue Gene/L Correlator
Stichting ASTRON (Netherlands Foundation for Research in Astronomy), Dwingeloo, the Netherlands
John W. Romein, P. Chris Broekema, Ellen van Meijeren, Kjeld van der Schaaf, Walther H. Zwart

2. LOFAR
- distributed sensor network
- simple receivers
- 20–240 MHz
- 37–77+ stations
- virtual telescope
- 32 in central core
- remote stations
- central processing on supercomputer (Groningen)

3. Outline
- central processing
- Blue Gene/L
- work distribution
- the correlator
- performance
- discussion

4. The LOFAR Central Processor

5. Signal Processing Steps
- Delay, PolyPhase Filter, FX Correlator, Flagging

6. Characteristics
- 37–77+ stations
- 160 subbands; 32 MHz bandwidth
- input:
  - 195 kHz; 2 pols; i16complex
  - 10–20 GB/s
- after PPF:
  - 763 Hz; 256 channels; 2 pols; complex float
- output after correlation:
  - 703–3003+ baselines; 256 channels; 4 pols; 1 s integration; complex float
  - 1–4 GB/s
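
The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope check, not code from the presentation. It assumes an exact subband sample rate of 195,312.5 Hz (the slide rounds this to 195 kHz), 4 bytes per i16complex sample, and 8 bytes per complex float; all function and constant names are made up for illustration.

```python
# Back-of-envelope check of the Characteristics slide.
# ASSUMPTION: the exact subband rate is 195,312.5 Hz; the slide says "195 kHz".
SUBBANDS = 160
SUBBAND_RATE_HZ = 195_312.5
CHANNELS = 256
POLS = 2
BYTES_I16COMPLEX = 4      # 16-bit real + 16-bit imaginary
BYTES_COMPLEX_FLOAT = 8   # 32-bit real + 32-bit imaginary


def baselines(stations: int) -> int:
    """Number of station pairs, autocorrelations included: n(n+1)/2."""
    return stations * (stations + 1) // 2


def input_rate_gb_s(stations: int) -> float:
    """Aggregate input data rate over all subbands, in 10^9 bytes/s."""
    return stations * SUBBANDS * SUBBAND_RATE_HZ * POLS * BYTES_I16COMPLEX / 1e9


def output_rate_gb_s(stations: int, integration_s: float = 1.0) -> float:
    """Aggregate correlated output rate over all subbands, in 10^9 bytes/s."""
    per_subband = baselines(stations) * CHANNELS * POLS * POLS * BYTES_COMPLEX_FLOAT
    return per_subband * SUBBANDS / integration_s / 1e9


# Channel width after splitting one subband into 256 channels.
channel_width_hz = SUBBAND_RATE_HZ / CHANNELS

if __name__ == "__main__":
    for n in (37, 77):
        print(n, baselines(n), input_rate_gb_s(n), output_rate_gb_s(n))
    print(channel_width_hz)
```

Under these assumptions, 37 and 77 stations give 703 and 3003 baselines and roughly 9.25 and 19.25 GB/s of input, in line with the rounded "10–20 GB/s" on the slide.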

7. The Blue Gene/L
- 700 MHz dual PowerPC 440
- 256 MB RAM per core
- 2 FPUs per core
- complex numbers support
- 2 FMAs/cycle → 2.8 GFLOP/s per core
- Ethernet, tree, torus networks
- synchronous communication!
- 12,288 cores → 34.4 TFLOP/s & 768 Gb/s
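
The peak numbers on this slide follow from the clock rate and the FMA throughput. A minimal sketch of that arithmetic, counting each fused multiply-add as two floating-point operations (the constant names are illustrative only):

```python
# Peak floating-point rate implied by the Blue Gene/L slide.
CLOCK_HZ = 700e6      # 700 MHz PowerPC 440
FMAS_PER_CYCLE = 2    # two FMAs per cycle per core
FLOPS_PER_FMA = 2     # one multiply plus one add
CORES = 12_288

peak_per_core = CLOCK_HZ * FMAS_PER_CYCLE * FLOPS_PER_FMA  # FLOP/s
peak_total = peak_per_core * CORES                         # FLOP/s

if __name__ == "__main__":
    print(peak_per_core / 1e9, "GFLOP/s per core")
    print(peak_total / 1e12, "TFLOP/s in total")
```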

8. External I/O
- 16 compute cores behind 1 Gb/s Ethernet interface
- I/O node bridges between Ethernet and tree
- create TCP socket on compute node
- 768 Psets
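
The pset count and the aggregate external bandwidth can be derived from the core count given earlier in the talk. The sketch below assumes the 12,288 cores are split evenly into psets of 16 cores, each with one 1 Gb/s link; the names are illustrative.

```python
# External I/O capacity implied by the slides.
CORES = 12_288
CORES_PER_PSET = 16
LINK_GBIT_S = 1.0  # one Gigabit Ethernet interface per pset

psets = CORES // CORES_PER_PSET
aggregate_gbit_s = psets * LINK_GBIT_S
# Fair share per core if all 16 cores of a pset communicate at once.
per_core_mbit_s = LINK_GBIT_S * 1000 / CORES_PER_PSET

if __name__ == "__main__":
    print(psets, aggregate_gbit_s, per_core_mbit_s)
```

With every core in a pset communicating at once, the fair share is 62.5 Mb/s per core.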

9. Work Distribution (1/2)
- parallel in subbands (160)
- 1 subband: too much work for 1 core
- use specialized cores

10. Work Distribution (2/2)
- parallel in subbands
- distribute each second of sampled data round-robin over cores
- each core filters, shifts phase, and correlates
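
A round-robin assignment of this kind might look like the sketch below. It is only an illustration: the function name, the core numbering, and the idea of a fixed group of cores per subband are assumptions, not details from the slides.

```python
def assign_core(subband: int, second: int, cores_per_subband: int) -> int:
    """Return the (hypothetical) global index of the core that handles
    one second of data of one subband.

    Each subband owns a contiguous group of cores; successive seconds
    are handed to the cores of that group in round-robin order.
    """
    if cores_per_subband <= 0:
        raise ValueError("cores_per_subband must be positive")
    return subband * cores_per_subband + second % cores_per_subband


if __name__ == "__main__":
    # With 6 cores per subband, a core gets a new second of data every
    # 6 seconds, so it may take up to 6 s to process one second.
    for second in range(8):
        print(second, assign_core(subband=0, second=second, cores_per_subband=6))
```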

11. The Correlator
- weight partially flagged data
- floating point

    FOR stat2 IN 1 .. NrStations DO
      FOR stat1 IN 1 .. stat2 DO
        FOR pol1 IN [X,Y] DO
          FOR pol2 IN [X,Y] DO
            sum = (0,0)
            FOR time IN 1 .. IntegrationTime DO
              sum += samples[stat1][time][pol1] * ~samples[stat2][time][pol2]
            END
            correlation[baseline(stat1,stat2)][pol1][pol2] = sum
          END
        END
      END
    END
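
The pseudocode above translates almost line by line into Python. The version below is a plain reference sketch for checking results on small inputs, not the optimized correlator. It uses 0-based indices, omits flagging and weighting, and assumes a triangular baseline numbering; that numbering, like the data layout, is a guess made for illustration.

```python
POLS = ("X", "Y")


def baseline(stat1: int, stat2: int) -> int:
    """Triangular index for stat1 <= stat2, 0-based (assumed numbering)."""
    return stat2 * (stat2 + 1) // 2 + stat1


def correlate(samples):
    """samples[station][time][pol] -> complex.

    Returns a dict mapping (baseline, pol1, pol2) to the integrated
    product of station 1's sample and the conjugate of station 2's.
    """
    nr_stations = len(samples)
    nr_times = len(samples[0])
    correlation = {}
    for stat2 in range(nr_stations):
        for stat1 in range(stat2 + 1):  # includes the autocorrelation
            for pol1 in POLS:
                for pol2 in POLS:
                    acc = 0j
                    for t in range(nr_times):
                        acc += (samples[stat1][t][pol1]
                                * samples[stat2][t][pol2].conjugate())
                    correlation[(baseline(stat1, stat2), pol1, pol2)] = acc
    return correlation


if __name__ == "__main__":
    demo = [
        [{"X": 1 + 1j, "Y": 2 + 0j}, {"X": 1j, "Y": 1 + 0j}],   # station 0
        [{"X": 1 + 0j, "Y": 1j}, {"X": 2 + 0j, "Y": -1j}],      # station 1
    ]
    for key, value in sorted(correlate(demo).items()):
        print(key, value)
```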

12. Correlator Optimizations
- written in assembly
- correlate 3x2 stations
- why? see next slide
- treat autocorrelations differently
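
One way to see why correlating a small block of stations at once pays off is to count memory loads against floating-point instructions per time step. The sketch below is an illustrative cost model, not taken from the presentation. It assumes one load per station per polarization, and two FPU instructions per complex multiply-accumulate (the figure quoted on the following slide).

```python
def tile_costs(a: int, b: int) -> dict:
    """Per-time-step costs of correlating a block of a stations against
    a block of b stations (illustrative model, names are hypothetical)."""
    pols = 2
    loads = (a + b) * pols               # one complex sample per station and polarization
    correlations = a * b * pols * pols   # complex multiply-accumulates
    fpu_instructions = 2 * correlations  # two instructions per multiply-accumulate
    return {
        "loads": loads,
        "accumulators": correlations,    # one running sum per correlation
        "fpu_instructions": fpu_instructions,
        "fpu_per_load": fpu_instructions / loads,
    }


if __name__ == "__main__":
    for shape in ((1, 1), (2, 2), (3, 2)):
        print(shape, tile_costs(*shape))
```

In this model a 1x1 tile issues 2 FPU instructions per load, while a 3x2 tile issues 4.8, at the price of 24 accumulators that have to stay in registers.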

13. Correlator Code
- 2 instructions per correlation/integration
- hide FPU latencies
- interleave with other correlations
- minimize #loads
- hide load latencies
- use large register file
- concurrent FPU ops & loads

    …
    fxcpnsma X0X2, X0, X2, X0X2
    lfpsux   X3, p3, inc
    fxcpnsma X0Y2, X0, Y2, X0Y2
    lfpsux   Y3, p3, inc
    fxcpnsma Y0X2, Y0, X2, Y0X2
    fxcpnsma Y0Y2, Y0, Y2, Y0Y2
    fxcpnsma X1X2, X1, X2, X1X2
    fxcpnsma X1Y2, X1, Y2, X1Y2
    fxcpnsma Y1X2, Y1, X2, Y1X2
    fxcpnsma Y1Y2, Y1, Y2, Y1Y2
    fxcxma   X0X2, X0, X2, X0X2
    fxcxma   X0Y2, X0, Y2, X0Y2
    fxcxma   Y0X2, Y0, X2, Y0X2
    fxcxma   Y0Y2, Y0, Y2, Y0Y2
    fxcxma   X1X2, X1, X2, X1X2
    fxcxma   X1Y2, X1, Y2, X1Y2
    fxcxma   Y1X2, Y1, X2, Y1X2
    fxcxma   Y1Y2, Y1, Y2, Y1Y2
    …

Annotation on the listing: X0Y2 += X0 * ~Y2
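
The "2 instructions per correlation" figure is plausible because a complex multiply-accumulate with a conjugated operand splits into two steps, each updating the real and imaginary parts of the accumulator with one multiply-add apiece. The sketch below shows one such split. Which hardware instruction carries out which step is an assumption here, not something the slides state, and the function name is hypothetical.

```python
def cmac_two_step(acc: complex, x: complex, y: complex) -> complex:
    """Compute acc + x * conj(y) as two paired multiply-add steps."""
    re, im = acc.real, acc.imag
    # Step 1: scale y by the real part of x; the imaginary lane is negated.
    re += x.real * y.real
    im -= x.real * y.imag
    # Step 2: scale y by the imaginary part of x, with the lanes crossed.
    re += x.imag * y.imag
    im += x.imag * y.real
    return complex(re, im)


if __name__ == "__main__":
    acc, x, y = 0.5 + 0.5j, 1 + 2j, 3 - 4j
    print(cmac_two_step(acc, x, y), acc + x * y.conjugate())
```

Each step performs two multiply-adds, so one complex multiply-accumulate costs four FMAs, or eight floating-point operations.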

14. Computational Performance
- 1 second of station samples, 1 subband, 1 core
- correlator: 98% of FPU peak performance!
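
To put the 98% figure in context, the correlator workload for one subband can be estimated from numbers given earlier in the talk. The sketch below is a rough estimate under stated assumptions: 195,312.5 samples per second per subband (the slides say 195 kHz), 8 floating-point operations per complex multiply-accumulate, and autocorrelations counted like any other baseline. It leaves out filtering, phase shifting, and I/O.

```python
PEAK_FLOPS_PER_CORE = 2.8e9  # per-core peak from the Blue Gene/L slide


def correlator_flops_per_second(stations: int,
                                sample_rate_hz: float = 195_312.5) -> float:
    """Estimated correlator work for one subband, in FLOP per second of data."""
    baselines = stations * (stations + 1) // 2
    pol_pairs = 4        # XX, XY, YX, YY
    flops_per_cmac = 8   # 4 multiplies + 4 adds
    return baselines * pol_pairs * sample_rate_hz * flops_per_cmac


def cores_needed(stations: int, efficiency: float = 0.98) -> float:
    """Cores required to correlate one subband in real time."""
    return correlator_flops_per_second(stations) / (PEAK_FLOPS_PER_CORE * efficiency)


if __name__ == "__main__":
    print(correlator_flops_per_second(37) / 1e9, "GFLOP per second of data")
    print(cores_needed(37), "cores for correlation alone")
```

Under these assumptions, 37 stations need about 4.4 GFLOP per second of data, or roughly 1.6 cores at 98% of peak for correlation alone, consistent with one subband being too much work for a single core.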

15. Network Performance
- need multiple concurrently communicating cores
- one core does not achieve 1 Gbit/s
- OS problem

16. Overall Performance
- 37 stations, 1 subband, 195 kHz → 256 channels on 6 cores
- I/O limited

17. The EoR (Epoch of Reionization) observation mode
- computationally most challenging mode
- 32–37 stations
- 160 subbands
- ~24 beams
- i4complex input samples
- 10 second integration time
- requires ~25 (!) TFLOP/s
- need 6-rack capacity
- need faster communication
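
The 6-rack requirement can be related to the roughly 25 TFLOP/s figure with simple arithmetic. The sketch below assumes 2,048 cores per Blue Gene/L rack (1,024 dual-core nodes), a number that does not appear on the slides, and reuses the 2.8 GFLOP/s per-core peak quoted earlier; the function names are illustrative.

```python
CORES_PER_RACK = 2_048        # ASSUMPTION: 1,024 dual-core nodes per rack
PEAK_FLOPS_PER_CORE = 2.8e9   # from the Blue Gene/L slide


def racks_at_peak(required_flops: float) -> float:
    """Racks needed if every core ran at 100% of its peak rate."""
    return required_flops / (CORES_PER_RACK * PEAK_FLOPS_PER_CORE)


def required_fraction_of_peak(required_flops: float, racks: int) -> float:
    """Sustained fraction of peak needed on a machine of the given size."""
    return required_flops / (racks * CORES_PER_RACK * PEAK_FLOPS_PER_CORE)


if __name__ == "__main__":
    need = 25e12  # about 25 TFLOP/s for the EoR mode
    print(racks_at_peak(need))
    print(required_fraction_of_peak(need, racks=6))
```

Under these assumptions the mode needs about 4.4 racks at peak, or about 73% of peak sustained across all six racks.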

18. Discussion & Conclusions
- software: great flexibility
- Blue Gene/L: excellent computational performance
- correlator achieves 98% of FPU peak
- need faster communication
- estimated development time: < 1 man-year
- paper: http://www.astron.nl/~romein/ [SPAA'06]

