Next Generation Correlators, June 26–29, 2006

The LOFAR Blue Gene/L Correlator

John W. Romein, P. Chris Broekema, Ellen van Meijeren, Kjeld van der Schaaf, Walther H. Zwart
Stichting ASTRON (Netherlands Foundation for Research in Astronomy), Dwingeloo, the Netherlands

LOFAR
- distributed sensor network
- simple receivers, 20–240 MHz
- 37–77+ stations form a virtual telescope: 32 in the central core, the rest remote stations
- central processing on a supercomputer in Groningen

Outline
- central processing
- Blue Gene/L
- work distribution
- the correlator
- performance
- discussion

The LOFAR Central Processor

Signal Processing Steps
- delay compensation
- polyphase filter (PPF)
- FX correlator
- flagging
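The polyphase filter step can be sketched as follows. This is a minimal, unoptimized illustration and not the LOFAR implementation. Several parts are assumptions made for the sketch: the Hamming-windowed sinc prototype filter, the tap count `nr_taps`, and the names `prototype_filter` and `polyphase_filter_bank`. A real channelizer would also use an FFT where this sketch uses a direct DFT.

```python
import cmath
import math


def prototype_filter(nr_channels, nr_taps):
    """Hamming-windowed sinc low-pass prototype (assumed design, not LOFAR's)."""
    length = nr_channels * nr_taps
    center = (length - 1) / 2
    coeffs = []
    for n in range(length):
        x = (n - center) / nr_channels          # pass band one channel wide
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        coeffs.append(sinc * window)
    return coeffs


def polyphase_filter_bank(samples, nr_channels, nr_taps):
    """Channelize complex samples; returns one list of nr_channels values per frame."""
    h = prototype_filter(nr_channels, nr_taps)
    nr_frames = len(samples) // nr_channels - nr_taps + 1
    frames = []
    for m in range(nr_frames):
        start = m * nr_channels
        # FIR step: each polyphase branch p sums nr_taps weighted input samples.
        branch = [sum(h[t * nr_channels + p] * samples[start + t * nr_channels + p]
                      for t in range(nr_taps))
                  for p in range(nr_channels)]
        # DFT across the branches (a real implementation would use an FFT).
        frames.append([sum(branch[p] * cmath.exp(-2j * math.pi * p * k / nr_channels)
                           for p in range(nr_channels))
                       for k in range(nr_channels)])
    return frames


# Usage: a complex tone centred on channel 3 of 8 ends up in channel 3.
tone = [cmath.exp(2j * math.pi * 3 * n / 8) for n in range(80)]
frames = polyphase_filter_bank(tone, nr_channels=8, nr_taps=4)
print([round(abs(v), 2) for v in frames[0]])
```

With `nr_channels=256`, the same function would split a subband into the 256 channels listed on the next slide. The slides do not state the number of taps, so `nr_taps` is left as a free parameter.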

Characteristics
- 37–77+ stations
- 160 subbands; 32 MHz bandwidth
- input: 195 kHz subbands; 2 polarizations; i16complex samples; 10–20 GB/s
- after the PPF: 256 channels of 763 Hz; 2 polarizations; complex float
- output after correlation: 703–3003+ baselines; 256 channels; 4 polarization products; 1 s integration; complex float; 1–4 GB/s
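The data rates on this slide follow from its other figures. The back-of-envelope sketch below rests on three assumptions: the exact subband width is 195.3125 kHz (consistent with 256 channels of 763 Hz), an i16complex sample takes 4 bytes, and a complex-float output value takes 8 bytes. The function names are hypothetical.

```python
SUBBAND_HZ = 195_312.5   # assumed exact width: 256 channels x ~763 Hz


def input_rate(nr_stations, nr_subbands=160, nr_pols=2, bytes_per_sample=4):
    """Station input in bytes/s; an i16complex sample is 2 x 16 bits = 4 bytes."""
    return nr_stations * nr_subbands * SUBBAND_HZ * nr_pols * bytes_per_sample


def nr_baselines(nr_stations):
    """Station pairs including autocorrelations: n(n+1)/2."""
    return nr_stations * (nr_stations + 1) // 2


def output_rate(nr_stations, nr_subbands=160, nr_channels=256, nr_pol_products=4,
                bytes_per_value=8, integration_s=1.0):
    """Correlator output in bytes/s; a complex float is 8 bytes."""
    values = nr_baselines(nr_stations) * nr_subbands * nr_channels * nr_pol_products
    return values * bytes_per_value / integration_s


for n in (37, 77):
    print(f"{n} stations: {nr_baselines(n)} baselines, "
          f"in {input_rate(n) / 1e9:.2f} GB/s, out {output_rate(n) / 1e9:.2f} GB/s")
```

For 37 and 77 stations this gives 9.25 and 19.25 GB/s in, and about 0.92 and 3.94 GB/s out, in line with the slide's 10–20 GB/s and 1–4 GB/s. Counting n(n+1)/2 baselines, which includes autocorrelations, reproduces the 703 and 3003 figures.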

The Blue Gene/L
- 700 MHz dual-core PowerPC 440 nodes
- 256 MB RAM per core
- 2 FPUs per core with complex-number support: 2 FMAs per cycle, 2.8 GFLOP/s per core
- Ethernet, tree, and torus networks
- synchronous communication!
- 12,288 cores: 34.4 TFLOP/s peak and 768 Gb/s external I/O

External I/O
- 16 compute cores sit behind one I/O node with a 1 Gb/s Ethernet interface
- the I/O node bridges between Ethernet and the tree network
- TCP sockets are created on the compute nodes
- 768 psets (processor sets: one I/O node plus its compute cores)
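The headline figures on the Blue Gene/L and External I/O slides can be checked with a few lines of arithmetic. This sketch assumes that each fused multiply-add counts as two floating-point operations and that each pset has exactly one 1 Gb/s link.

```python
CLOCK_HZ = 700e6
FMAS_PER_CYCLE = 2        # two FPUs per core, one fused multiply-add each
FLOPS_PER_FMA = 2         # a multiply plus an add
NR_CORES = 12_288
CORES_PER_PSET = 16       # compute cores behind one I/O node
LINK_BPS = 1e9            # one 1 Gb/s Ethernet interface per pset

peak_per_core = CLOCK_HZ * FMAS_PER_CYCLE * FLOPS_PER_FMA
peak_total = peak_per_core * NR_CORES
nr_psets = NR_CORES // CORES_PER_PSET
external_io_bps = nr_psets * LINK_BPS

print(f"{peak_per_core / 1e9:.1f} GFLOP/s per core, {peak_total / 1e12:.1f} TFLOP/s total")
print(f"{nr_psets} psets, {external_io_bps / 1e9:.0f} Gb/s external I/O")
```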

Work Distribution (1/2)
- parallel in subbands (160)
- one subband is too much work for one core
- use specialized cores

Work Distribution (2/2)
- parallel in subbands
- distribute each second of sampled data round-robin over the cores
- each core filters, shifts phase, and correlates
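The round-robin assignment can be sketched minimally as follows. The sketch assumes, hypothetically, that each subband owns a fixed, contiguous group of `cores_per_subband` cores. `core_for` is an illustrative name, and the real core numbering on the Blue Gene/L may differ.

```python
def core_for(subband, second, cores_per_subband):
    """Global index of the core that processes `second` of `subband`.

    Hypothetical layout: subband s owns cores s*cores_per_subband ..
    (s+1)*cores_per_subband - 1 and hands them consecutive seconds in turn.
    """
    return subband * cores_per_subband + second % cores_per_subband


# Usage: with 6 cores per subband, subband 2 cycles through cores 12..17.
schedule = [core_for(2, second, 6) for second in range(8)]
print(schedule)  # [12, 13, 14, 15, 16, 17, 12, 13]
```

Under this scheme, each core has `cores_per_subband` seconds to filter, phase-shift, and correlate its second of data before its next turn. The group therefore keeps up in real time as long as processing one second of data takes less than that. The Overall Performance slide uses 6 cores for one 37-station subband.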

The Correlator
- weight partially flagged data
- floating point

    FOR stat2 IN 1..NrStations DO
      FOR stat1 IN 1..stat2 DO
        FOR pol1 IN [X,Y] DO
          FOR pol2 IN [X,Y] DO
            sum = (0,0)
            FOR time IN 1..IntegrationTime DO
              sum += samples[stat1][time][pol1] * ~samples[stat2][time][pol2]
            END
            correlation[baseline(stat1,stat2)][pol1][pol2] = sum
          END
        END
      END
    END

(~ denotes complex conjugation.)
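The loop nest of the correlator pseudocode translates directly into runnable Python. Like the pseudocode, this sketch handles one channel. It omits the weighting of partially flagged data. The triangular `baseline` index (stat1 ≤ stat2, 0-based, autocorrelations included) is an assumed layout, not necessarily LOFAR's.

```python
def baseline(stat1, stat2):
    """Assumed triangular index for stat1 <= stat2 (0-based, autocorrelations included)."""
    return stat2 * (stat2 + 1) // 2 + stat1


def correlate(samples):
    """samples[station][time][pol] (complex) -> correlation[baseline][pol1][pol2]."""
    nr_stations = len(samples)
    nr_pols = len(samples[0][0])
    nr_baselines = nr_stations * (nr_stations + 1) // 2
    correlation = [[[0j] * nr_pols for _ in range(nr_pols)] for _ in range(nr_baselines)]
    for stat2 in range(nr_stations):
        for stat1 in range(stat2 + 1):
            for pol1 in range(nr_pols):
                for pol2 in range(nr_pols):
                    total = 0j
                    # Integrate over time: x * conj(y), the "~" in the pseudocode.
                    for x, y in zip(samples[stat1], samples[stat2]):
                        total += x[pol1] * y[pol2].conjugate()
                    correlation[baseline(stat1, stat2)][pol1][pol2] = total
    return correlation


# Usage: 3 stations, 4 time samples, polarizations X=0 and Y=1.
samples = [[[complex(s + 1, p) for p in range(2)] for _ in range(4)] for s in range(3)]
corr = correlate(samples)
print(corr[baseline(0, 1)][0][1])  # (8-4j)
```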

Correlator Optimizations
- written in assembly
- correlate 3×2 stations at a time (why? see next slide)
- treat autocorrelations differently
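Why correlate a block of stations at a time? The rough count below covers one time step for a tile of `stations_a` × `stations_b` cross-correlation baselines. It assumes one paired load per complex sample and two paired-FPU instructions per polarization product, as in the Correlator Code slide. `tile_cost` is a hypothetical helper.

```python
def tile_cost(stations_a, stations_b, nr_pols=2, instr_per_product=2):
    """Per time step, for a stations_a x stations_b tile of cross-correlation baselines.

    Assumes one paired load per complex sample and two paired-FPU instructions
    per polarization product; returns (loads, fpu_instructions, accumulators).
    """
    loads = (stations_a + stations_b) * nr_pols
    products = stations_a * stations_b * nr_pols * nr_pols
    return loads, products * instr_per_product, products


for shape in ((1, 1), (2, 2), (3, 2)):
    loads, fpu, acc = tile_cost(*shape)
    print(f"{shape[0]}x{shape[1]}: {loads} loads, {fpu} FPU instructions, "
          f"{acc} accumulators, {fpu / loads:.1f} FPU instructions per load")
```

Under these assumptions, a single baseline needs one load per two FPU instructions. A 3×2 tile needs 10 loads for 48 FPU instructions, so the loads are much easier to overlap with floating-point work. The 24 accumulators also show why a large register file matters.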

Correlator Code
- 2 instructions per correlation per integration step
- hide FPU latencies by interleaving with other correlations
- minimize the number of loads
- hide load latencies: use the large register file; overlap FPU operations and loads

    …
    fxcpnsma X0X2,X0,X2,X0X2
    lfpsux   X3,p3,inc
    fxcpnsma X0Y2,X0,Y2,X0Y2
    lfpsux   Y3,p3,inc
    fxcpnsma Y0X2,Y0,X2,Y0X2
    fxcpnsma Y0Y2,Y0,Y2,Y0Y2
    fxcpnsma X1X2,X1,X2,X1X2
    fxcpnsma X1Y2,X1,Y2,X1Y2
    fxcpnsma Y1X2,Y1,X2,Y1X2
    fxcpnsma Y1Y2,Y1,Y2,Y1Y2
    fxcxma   X0X2,X0,X2,X0X2
    fxcxma   X0Y2,X0,Y2,X0Y2
    fxcxma   Y0X2,Y0,X2,Y0X2
    fxcxma   Y0Y2,Y0,Y2,Y0Y2
    fxcxma   X1X2,X1,X2,X1X2
    fxcxma   X1Y2,X1,Y2,X1Y2
    fxcxma   Y1X2,Y1,X2,Y1X2
    fxcxma   Y1Y2,Y1,Y2,Y1Y2
    …

Each accumulator, such as X0Y2, is updated by one fxcpnsma and one fxcxma; together they compute X0Y2 += X0 * ~Y2. The eight updates are interleaved to hide FPU latency, and the lfpsux loads (X3, Y3) overlap with the arithmetic.
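The instruction pair used for each update can be modeled in Python on (primary, secondary) = (real, imaginary) register pairs. The semantics below are our reading of the Blue Gene/L paired-FPU ("double hummer") instructions. Operands are in the order target, A, C, B, as in the listing. Treat these semantics as an assumption to check against IBM's documentation.

```python
def fxcpnsma(a, c, b):
    """Assumed: primary = a.p*c.p + b.p, secondary = -(a.p*c.s) + b.s."""
    return (a[0] * c[0] + b[0], -(a[0] * c[1]) + b[1])


def fxcxma(a, c, b):
    """Assumed: primary = a.s*c.s + b.p, secondary = a.s*c.p + b.s."""
    return (a[1] * c[1] + b[0], a[1] * c[0] + b[1])


def conj_mac(acc, x, y):
    """acc += x * conj(y) on (real, imag) pairs, using the two instructions in order."""
    acc = fxcpnsma(x, y, acc)   # like: fxcpnsma X0Y2,X0,Y2,X0Y2
    return fxcxma(x, y, acc)    # like: fxcxma   X0Y2,X0,Y2,X0Y2


# Usage: compare against Python's complex arithmetic.
x, y = (1.0, 2.0), (3.0, 4.0)
print(conj_mac((0.0, 0.0), x, y), complex(*x) * complex(*y).conjugate())
```

Together the two instructions add x_r·y_r + x_i·y_i to the real part and x_i·y_r - x_r·y_i to the imaginary part, which is exactly x · conj(y).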

Computational Performance
- workload: 1 second of station samples, 1 subband, on 1 core
- the correlator reaches 98% of FPU peak performance!
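A rough count of the work in that benchmark puts the 98% in perspective. The count assumes 8 floating-point operations per complex multiply-accumulate, 4 polarization products, and all n(n+1)/2 baselines. Autocorrelations are in fact treated differently, so this slightly overestimates. The channel count cancels out because, with critical sampling, channels × channel sample rate equals the subband width (assumed here to be 195.3125 kHz).

```python
SUBBAND_HZ = 195_312.5     # assumed exact subband width (256 channels x ~763 Hz)
PEAK_FLOPS = 2.8e9         # per core
EFFICIENCY = 0.98          # fraction of FPU peak reported on the slide


def correlator_flops(nr_stations, nr_pol_products=4, flops_per_cmac=8):
    """FLOPs to correlate 1 s of one subband, autocorrelations counted as baselines."""
    nr_baselines = nr_stations * (nr_stations + 1) // 2
    return nr_baselines * SUBBAND_HZ * nr_pol_products * flops_per_cmac


def seconds_on_one_core(nr_stations):
    return correlator_flops(nr_stations) / (PEAK_FLOPS * EFFICIENCY)


print(f"{correlator_flops(37) / 1e9:.2f} GFLOP, {seconds_on_one_core(37):.2f} s on one core")
```

For 37 stations this is about 4.4 GFLOP, or roughly 1.6 s on one core even at 98% of peak. Correlating a single 37-station subband therefore needs more than one core on its own.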

Network Performance
- need multiple concurrently communicating cores
- one core does not achieve 1 Gb/s
- OS problem

Overall Performance
- 37 stations, 1 subband: 195 kHz → 256 channels, on 6 cores
- I/O limited
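A back-of-envelope look at the input side follows. It assumes an exact subband width of 195.3125 kHz, 32 bits per i16complex sample, and whole seconds dealt out round-robin over the cores. The names are hypothetical, and this is arithmetic, not a measured result.

```python
SUBBAND_HZ = 195_312.5   # assumed exact subband width
BITS_PER_SAMPLE = 32     # i16complex: 16-bit real + 16-bit imaginary
NR_POLS = 2
LINK_BPS = 1e9           # one Ethernet interface per pset
CORES_PER_PSET = 16


def subband_input_bps(nr_stations):
    """Input bit rate of one subband from all stations."""
    return nr_stations * SUBBAND_HZ * NR_POLS * BITS_PER_SAMPLE


def per_core_input_bps(nr_stations, cores_per_subband):
    """Average per-core input when whole seconds are dealt out round-robin."""
    return subband_input_bps(nr_stations) / cores_per_subband


per_core = per_core_input_bps(37, cores_per_subband=6)
print(f"subband: {subband_input_bps(37) / 1e6:.1f} Mb/s, per core: {per_core / 1e6:.1f} Mb/s")
print(f"16 such cores: {CORES_PER_PSET * per_core / LINK_BPS:.2f} x the 1 Gb/s link")
```

For 37 stations, one subband carries 462.5 Mb/s, which is about 77 Mb/s per core when spread over 6 cores. If all 16 cores behind one 1 Gb/s I/O node received input at that rate, they would need about 1.23 Gb/s.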

The EoR (Epoch of Reionization) Observation Mode
- computationally the most challenging mode
- 32–37 stations
- 160 subbands, ±24 beams
- i4complex input samples
- 10-second integration time
- requires ±25 (!) TFLOP/s
- needs 6-rack capacity
- needs faster communication
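The following is a rough estimate of the correlation work alone in this mode. It assumes, as our reading of the slide, that each of the ±24 beams carries all 160 subbands. It also uses the same per-sample cost as the standard mode: 8 FLOPs per complex multiply-accumulate and 4 polarization products. Because it covers correlation only, it need not match the slide's ±25 TFLOP/s total. The 10-second integration time affects only the output rate, not the FLOP count.

```python
SUBBAND_HZ = 195_312.5      # assumed exact subband width
PEAK_FLOPS_TOTAL = 34.4e12  # 12,288 cores x 2.8 GFLOP/s (rounded)


def correlation_flops_per_s(nr_stations, nr_subbands, nr_beams,
                            nr_pol_products=4, flops_per_cmac=8):
    """Correlation work only; independent of channel count and integration time."""
    nr_baselines = nr_stations * (nr_stations + 1) // 2
    return (nr_baselines * nr_subbands * nr_beams * SUBBAND_HZ
            * nr_pol_products * flops_per_cmac)


eor = correlation_flops_per_s(nr_stations=37, nr_subbands=160, nr_beams=24)
print(f"{eor / 1e12:.1f} TFLOP/s, {eor / PEAK_FLOPS_TOTAL:.0%} of peak")
```

With 37 stations this comes to about 16.9 TFLOP/s for correlation, roughly half the system's 34.4 TFLOP/s peak.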

Discussion & Conclusions
- software: great flexibility
- Blue Gene/L: excellent computational performance; the correlator achieves 98% of FPU peak
- need faster communication
- estimated development time: < 1 man-year
- paper: http://www.astron.nl/~romein/ [SPAA'06]