Implementation of a streaming database management system on a Blue Gene architecture for measurement data processing. Erik Zeitler, Uppsala Database Laboratory (UDBL)
Looking out into space: Use large radio telescopes!
Problem: size matters. We have hit the limit of how large a single telescope can be built.
Use many large radio telescopes? Combine their measurements using signal processing so that they act together as one HUGE telescope. Drawbacks: they look in one direction only, and they are expensive…
Solution: use a huge number of small antennas (broadband, multi-direction receivers). This enables new scientific applications (and challenges).
Scientific applications: the re-ionization epoch (the first 10^5 years, when hydrogen was forming); deep extragalactic surveys (to boldly go…); transient sources, i.e. all-sky surveys of gamma-ray bursts, flare stars and supernovae; ultra-high-energy cosmic rays; pulsars.
Antennas, antennas, antennas… Each broadband radio receiver covers 80 to 300 MHz in 3 dimensions and produces 0.9 Gbps of raw data. A central site plus 20 outstations are located within a circular area 350 km in diameter, with 13 × 10^3 (13,000) antennas in total.
System overview: antennas; basic beam forming (FPGAs); network (GbE, 10GbE); central processing facility (Linux clusters, IBM Blue Gene/L); off-line analysis (PCs, workstations, Blue Gene).
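The basic beam forming step (done in FPGAs in this system) can be illustrated with the textbook delay-and-sum technique. This Python sketch is not the project's FPGA code: `delay_and_sum` is a hypothetical name, and it assumes the steering delay of each antenna for the chosen direction is already known as a whole number of samples.

```python
from typing import List, Sequence


def delay_and_sum(signals: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Form one beam by undoing each antenna's arrival delay and summing.

    Assumption: antenna i receives the wavefront `delays[i]` whole samples
    late, so signals[i][t + delays[i]] belongs to wavefront instant t for
    every antenna.
    """
    if len(signals) != len(delays):
        raise ValueError("need exactly one delay per antenna")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")
    if not signals:
        return []
    # Keep only the time range where every shifted signal still has samples.
    n = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[t + d] for s, d in zip(signals, delays))
            for t in range(max(n, 0))]


if __name__ == "__main__":
    pulse = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    # The same wavefront reaching three antennas 0, 2 and 1 samples late.
    antennas = [pulse, [0.0, 0.0] + pulse, [0.0] + pulse]
    print(delay_and_sum(antennas, [0, 2, 1]))  # coherent sum: 3x the pulse
```

Steering the beam means choosing a different set of delays. Signals arriving from the steered direction add coherently, while signals from other directions partly cancel.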
Central processing tasks: FFT; signal correlation; calibration; RFI mitigation (removing radio-frequency interference, i.e. noise from human activities); stratosphere plasma; subtracting known objects; transient analysis; peak detection.
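The first task in the list, the FFT, turns a window of time-domain samples into a frequency spectrum. The pure-Python sketch below uses the standard recursive radix-2 Cooley-Tukey algorithm. The 256-sample window length and the synthetic test tone are assumptions chosen for illustration. A production system would use an optimized FFT library rather than this code.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(window: Sequence[float]) -> List[float]:
    """Squared magnitude of each frequency bin of one window."""
    return [abs(c) ** 2 for c in fft(window)]


if __name__ == "__main__":
    n = 256
    # Synthetic test tone at frequency bin 10 (not real LOFAR data).
    tone = [math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
    spectrum = power_spectrum(tone)
    # For real input the upper half mirrors the lower half, so search bins 0..n/2-1.
    print(max(range(n // 2), key=spectrum.__getitem__))  # → 10
```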
Computing challenges: multiple incoming data streams (20 Tbps); multiple experiments; complex computations; demand for rapid reconfiguration of the computing systems. Use case: on-line transient analysis.
Central processing facilities: on-line processing on a Linux cluster (buffering) and a lightweight BG/L (beam) with 6 racks, 6144 compute nodes + 96 I/O nodes; off-line processing on Linux clusters, SAN, GRID, …
Blue Gene: a dataflow supercomputer. The LLNL installation has 64 racks (65536 CPUs) delivering 70 TFLOPS within the floor area of a tennis court.
BG/L architecture. I/O node: 2 CPUs, Linux; each I/O node coordinates 64 compute nodes; 512 MB RAM. Compute node: 2 CPUs running a single-threaded lightweight OS, typically one CPU for computation and one CPU for communication; 512 MB RAM.
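The 64-to-1 ratio between compute nodes and I/O nodes accounts for the figures quoted for the 6-rack installation: 6144 / 64 = 96 I/O nodes. The sketch below checks this arithmetic. The contiguous numbering in `io_node_for` (nodes 0 to 63 served by I/O node 0, and so on) is a hypothetical simplification for illustration, not the actual BG/L node-to-I/O-node mapping.

```python
from collections import Counter

# Figures taken from the slides; the contiguous numbering below is an assumption.
COMPUTE_NODES_PER_IO_NODE = 64
COMPUTE_NODES = 6144


def io_node_for(compute_node: int,
                per_io: int = COMPUTE_NODES_PER_IO_NODE) -> int:
    """Hypothetical mapping: nodes 0..63 -> I/O node 0, 64..127 -> I/O node 1, ..."""
    if compute_node < 0:
        raise ValueError("node ids are non-negative")
    return compute_node // per_io


def io_nodes_needed(compute_nodes: int,
                    per_io: int = COMPUTE_NODES_PER_IO_NODE) -> int:
    """Ceiling division: each group of up to `per_io` compute nodes needs one I/O node."""
    return -(-compute_nodes // per_io)


if __name__ == "__main__":
    print(io_nodes_needed(COMPUTE_NODES))  # → 96
    groups = Counter(io_node_for(n) for n in range(COMPUTE_NODES))
    print(len(groups), set(groups.values()))  # → 96 {64}
```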
User agent
UDBL project: Implement a very high-performance stream database manager based on the AmosII DB kernel. Utilize the BG/L computing environment for scalable data stream queries involving user-defined computations. Implement specialized query optimization: planning the BG/L node configuration for given stream queries, and re-configuration when interesting phenomena occur.
So far (after 4 months): implementing primitives for data computation, aggregation, communication and fusion. Proof-of-concept cases: signal processing, peak detection, stream join. Benchmark based on real LOFAR/LOIS data; performance analysis for stream databases.
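One of the proof-of-concept cases is a stream join. As a point of reference, this is a minimal Python sketch of one common technique, a symmetric hash join over a time window. It is not necessarily the technique used in the project. The event layout, the stream ids "L"/"R" and the name `window_join` are all assumptions, and the sketch requires events to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Iterable, Iterator, Tuple

# Hypothetical event layout: (stream id "L" or "R", timestamp, join key, payload)
Event = Tuple[str, float, Hashable, object]


def window_join(events: Iterable[Event],
                window: float) -> Iterator[Tuple[object, object]]:
    """Symmetric hash join of two streams over a time window.

    Emits (left_payload, right_payload) for every pair with equal keys whose
    timestamps differ by at most `window`. Assumes the interleaved events
    arrive in non-decreasing timestamp order.
    """
    buffers: Dict[str, Dict[Hashable, Deque[Tuple[float, object]]]] = {
        "L": defaultdict(deque), "R": defaultdict(deque)}
    for side, ts, key, payload in events:
        if side not in buffers:
            raise ValueError(f"unknown stream id {side!r}")
        other = "R" if side == "L" else "L"
        probe = buffers[other][key]
        # Expire tuples that are too old to match this or any later event.
        while probe and ts - probe[0][0] > window:
            probe.popleft()
        for _, other_payload in probe:
            yield (payload, other_payload) if side == "L" else (other_payload, payload)
        buffers[side][key].append((ts, payload))
```

Expiry only happens when a key is probed, so buffers for keys that stop appearing are never cleared. A long-running implementation would also need periodic eviction.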
A simple example:
gnuplot(peakdetect(vector_elements(winagg(vector_elements(readlofarvectorfile("temp.DAT")),256,256))));
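For readers unfamiliar with AmosII, this is a rough Python analogue of the dataflow in the query. The slide does not define the semantics of these functions, so the following behaviour is assumed: `winagg(s, 256, 256)` is taken to mean windows of 256 elements with a stride of 256 (tumbling windows), and `peakdetect` is taken to report each window whose maximum exceeds a threshold. The sketch does not reproduce the exact nesting of the query. Synthetic vectors stand in for `readlofarvectorfile("temp.DAT")`, and `print` stands in for `gnuplot`.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def vector_elements(vectors: Iterable[Sequence[float]]) -> Iterator[float]:
    """Flatten a stream of vectors into a stream of their elements."""
    for v in vectors:
        yield from v


def winagg(stream: Iterable[float], size: int, stride: int) -> Iterator[List[float]]:
    """Windows of `size` elements, a new one starting every `stride` elements.

    size == stride gives tumbling windows; a trailing partial window is dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    buf: List[float] = []
    skip = 0  # elements to discard between windows when stride > size
    for x in stream:
        if skip:
            skip -= 1
            continue
        buf.append(x)
        if len(buf) == size:
            yield list(buf)
            if stride >= size:
                buf.clear()
                skip = stride - size
            else:
                del buf[:stride]  # keep the overlap for the next window


def peakdetect(windows: Iterable[Sequence[float]],
               threshold: float) -> Iterator[Tuple[int, int, float]]:
    """Yield (window index, offset, value) for each window maximum above threshold."""
    for i, w in enumerate(windows):
        offset = max(range(len(w)), key=w.__getitem__)
        if w[offset] > threshold:
            yield i, offset, w[offset]


if __name__ == "__main__":
    # Synthetic stand-in for the LOFAR file: 64 vectors of 8 samples each.
    vectors = [[0.0] * 8 for _ in range(64)]
    vectors[37][4] = 5.0  # a spike at flat sample index 300
    peaks = peakdetect(winagg(vector_elements(vectors), 256, 256), threshold=1.0)
    print(list(peaks))  # → [(1, 44, 5.0)]
```

Because every stage is a generator, samples flow through the pipeline one at a time and nothing is materialized beyond the current window. This matches the streaming style of the query.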
Other application areas: other space physics research projects at IRFU; network traffic analysis; financial (stock market) information; content analysis of streaming media.
Questions?