Published by Clifford Garrison. Modified over 6 years ago.
1
Operating a hardware VLBI correlator can be interesting at the best of times. Most of this is because from our point of view it looks like this:
2
input output MAC* (*) monitoring and control
It takes some input, produces output and, if we're lucky, we have a means to monitor and control it. In short: it is a black box.
3
Herding FPGAs or: distributed monitor and control
Harro Verkouter, Joint Institute for VLBI in Europe
My name is Harro Verkouter. I work at the Joint Institute for VLBI in Europe and I'm going to talk about distributed monitoring and control.
4
At JIVE we were happy to wheel out the hardware MarkIV correlator and
5
switch to a software correlator, but ...
6
At some point JIVE got involved in developing the UniBoard, a high-performance signal processing board.
7
16x 10 Gbps ethernet 16x 10 Gbps ethernet
It features eight state-of-the-art FPGAs, 16x 10 Gbps of I/O at one end of the board, 16x 10 Gbps of I/O at the other end, and an on-board mesh which connects each FPGA in one column to all FPGAs in the other column. Finally, there is a 1 Gbps control connection to each individual FPGA. As part of the project, JIVE is responsible for implementing a VLBI correlator on this board.
8
[Diagram: input, output, MAC*]
Let's start filling in boxes from this scheme. The question becomes: how to provoke something useful out of this?
9
[Diagram: file server (x3), UDP/IPv4, output, MAC*]
Start by reading data from files and sending it via UDP into the correlator.
10
[Diagram: file server (x3), UDP/IPv4, capture, MAC*]
At the output we capture the data to a RAID file system.
11
[Diagram: unb_control, file server, UDP/IPv4, capture]
The UniBoard itself must be controlled.
12
[Diagram: unb_control, correlator_control, file server, UDP/IPv4, capture]
And it will come as no surprise that all of the components themselves must be controlled. Is this the whole picture? NO!
13
[Diagram: unb_control, correlator_control, file server, UDP/IPv4, capture]
We can have multiple UniBoards.
14
[Diagram: unb_control, correlator_control, file server, Mark5ABC, UDP/IPv4, capture; annotations: ≤ 32, ≤ 4/Board]
The UniBoard supports data from up to 32 stations as input. The output can be distributed over many capture nodes if the data rate gets too high.
15
distributed ⇒ much communication
correlator_control
- multiple UniBoards
- up to 32 input data servers
- multiple capture nodes
- heterogeneous
Summarizing: correlator_control will be a large distributed system. How to code this?
16
Which language/system?
MarkIV correlator control s/w
- archaic C++ (predates STL!)
- home grown message passing
- id. for distributed processing
Other C/C++ implementations
- MPI library
- Boost library
- LOFAR control s/w
- ...
We looked at different possibilities for writing our correlator control system.
17
Which language/system?
JAVA
- XMLRPC or RMI for remote execution
- no distributed multiprocessing
Python
- multiprocessing module (threads, not machines)
- message passing module
We also looked at other languages. However, with all these possibilities, I spot a pattern. Do you see it?
18
THEY’RE ALL BOLTED ON!
21
Write the control system
from scratch in another language! The obvious
22
http://www.erlang.org
- functional programming language
- high level
- compiles to byte code (cf. Python, JAVA)
- developed by Ericsson AB since 1987
- designed for monitoring/control
- designed for concurrency
- designed to build distributed systems
- open sourced in 1998
We chose the Erlang language. It is a functional, high-level programming language developed for industrial-strength distributed system monitoring and control. It is also not very new; it has been in development since the eighties of the previous millennium.
23
Functional goodies
- immutable data
- functions can return functions
- functions can take functions as arguments
- pattern matching + guard expressions
- no for() or while() loops
- less Lines Of Code
- you have to think about your code
Being a functional programming language, Erlang automatically gives you these goodies. Of these I would like to highlight the two most important ones.
24
Functional goodies (same list), with the callout "code explodes!" on pattern matching + guard expressions
Calling a function with arguments that do not match the expected pattern, or with values that fail the guard expressions, will make your code explode, which is a good thing.
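A minimal sketch of such an "explosion", not taken from the talk. The module name `guards_demo` and the function `configure/1` are hypothetical; the limit of 32 is the station count quoted earlier. A call that matches no clause raises `error:function_clause` instead of silently continuing with bad data.

```erlang
-module(guards_demo).
-export([configure/1, try_bad_call/0]).

%% Hypothetical helper (not from the talk): accept a station count.
%% The guard admits only integers in the range 1..32.
configure(NStations) when is_integer(NStations),
                          NStations >= 1, NStations =< 32 ->
    {ok, NStations}.

%% No clause of configure/1 matches 33, so the call raises
%% error:function_clause, which is caught here only to make it visible.
try_bad_call() ->
    try configure(33)
    catch error:function_clause -> exploded
    end.
```

In normal use one would not catch the error at all: the failing process dies and whoever watches it gets to decide what happens next.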
25
Functional goodies (same list), with the callout "head explodes!" on "you have to think about your code"
This is the hardest thing about functional programming: you actually HAVE to THINK about your program. Typically, this results in the following failure mode .... However, none of this explains WHY Erlang is good for us.
26
Pid = spawn('node@host', Function, [Arg1, Arg2, ...])
- built-in
- executes a function on a remote node!
- creates a new, concurrent, Erlang process
- "almost, but not quite, ..." (no unix process)
- lightweight: 10^6 processes/node easily
- returned process identifier "Pid" is a built-in type
- encodes which process is running where
The built-in spawn function allows executing a function on a remote node. This creates a new, concurrent Erlang process. Erlang processes are almost, but not quite, ENTIRELY unlike unix processes. The spawn function returns the Erlang process ID of the new process. It is an actual type in the language and encodes where the process is running.
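The call on the slide is shorthand. The actual built-ins that take a node are `spawn(Node, Fun)` and `spawn(Node, Module, Function, Args)`. The sketch below uses the four-argument form; the module `spawn_demo` and its functions are made up for illustration, and it targets the local node (`node()`) so that it runs without a cluster. With a real remote node name, the module would also have to be loaded on that node.

```erlang
-module(spawn_demo).
-export([start/0, worker/1]).

%% Runs in the new process: report back who and where we are.
worker(Parent) ->
    Parent ! {hello_from, self(), node()}.

start() ->
    %% node() is the local node; replace it with e.g. 'name@host'
    %% to start the process on another machine.
    Pid = spawn(node(), ?MODULE, worker, [self()]),
    receive
        {hello_from, Pid, Node} ->
            %% node(Pid) extracts the location encoded in the pid.
            {ok, is_pid(Pid), Node =:= node(Pid)}
    after 1000 ->
        timeout
    end.
```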
27
Pid ! Message
receive Message -> do_something() end
- communication exclusively via messages
- syntax elements:
  - operator ! for sending
  - keyword receive accepts incoming messages
As said, in distributed systems there is a lot of communication going on. In Erlang this is strictly done by sending messages to a process. The *language* contains syntax elements to actually support this. The send operator allows for easy sending of a message to a process. The receive *keyword* allows a process to wait for incoming messages and take action upon receipt.
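As an illustration of the two syntax elements, here is a small echo process. This is a sketch with made-up names (`echo_demo`, `loop/0`), not code from the talk. The sender includes its own pid in the message so that the receiver knows where to reply.

```erlang
-module(echo_demo).
-export([start/0, loop/0]).

%% Wait for messages; echo anything tagged with a sender pid.
loop() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},
            loop();
        stop ->
            ok
    end.

start() ->
    Pid = spawn(?MODULE, loop, []),
    Pid ! {self(), ping},
    %% Only accept a reply that comes from the process we asked.
    Reply = receive
                {Pid, Answer} -> Answer
            after 1000 ->
                timeout
            end,
    Pid ! stop,
    Reply.
```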
28
LE = <<V:32/unsigned-little>>
BE = <<V:9/unsigned-big>>
- binaries are a built-in type
- arbitrary sizes of bit fields
- syntax elements:
  - keywords << and >> for creation
  - keyword : for specification of field width etc.
In distributed systems, endianness always plays a big role. Besides that, dealing with binary data is always a pain, using masks and shifts. To deal with that, Erlang contains extensive syntax and language support for binary data. Binary forms can be encoded and decoded intuitively. This makes dealing with binary protocol data a breeze.
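A short sketch of the bit syntax at work. The module `bits_demo` and the 9-bit/7-bit header layout are invented for illustration. Note that the type specifiers follow a slash: `Value:Size/Specifiers`.

```erlang
-module(bits_demo).
-export([encode/1, decode/1, header/2]).

%% The same 32-bit value in little-endian and big-endian byte order.
encode(V) ->
    {<<V:32/unsigned-little>>, <<V:32/unsigned-big>>}.

%% Pattern matching does the decoding; no masks or shifts needed.
decode(<<V:32/unsigned-little>>) ->
    V.

%% Hypothetical 16-bit header: a 9-bit field followed by a 7-bit field.
header(A, B) ->
    <<A:9/unsigned-big, B:7/unsigned-big>>.
```

For example, `bits_demo:encode(1)` yields `{<<1,0,0,0>>, <<0,0,0,1>>}`, and `bits_demo:header(257, 3)` packs the two fields into the two bytes `<<128,131>>`.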
29
{_,Code,_} = code:get_object_code(Module)
code, load_binary, [Code])
- executes code:load_binary(Code) remotely
- transfers the Code blob via the network
Another great pain in distributed systems is distributing the executable code. In Erlang it is possible to read in the compiled byte code for a module. Then it is trivial to load that code on a remote node.
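A hedged sketch of how this can look with the standard library. The exact remote call on the slide is cut off in this transcript, so the use of `rpc:call/4` here is an assumption; the module name `push_code` is made up. The real `code:load_binary` takes three arguments (module, file name, binary), all of which `code:get_object_code/1` returns.

```erlang
-module(push_code).
-export([push/2]).

%% Fetch the compiled byte code of Module on this node and load it
%% on Node. The binary travels over the network as a call argument.
push(Node, Module) ->
    case code:get_object_code(Module) of
        {Module, Binary, Filename} ->
            rpc:call(Node, code, load_binary, [Module, Filename, Binary]);
        error ->
            %% No object code for Module in the local code path.
            {error, no_object_code}
    end.
```

On success the remote `code:load_binary/3` returns `{module, Module}`.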
30
[Diagram: processes spread over hosts A, X, Y and Z: init(), job0(), job1(), reader0_job0(), writer1_job1(), writerN(), readerZ(), process23(), process42()]
The Achilles heel of any distributed system is that typically, when you start something, it creates processes all over the place.
31
[Diagram: the same processes spread over hosts A, X, Y and Z]
The problem usually is, if something fails, to find out WHAT failed and WHERE it failed.
32
Erlang divides processes into two classes: workers and supervisors
[Diagram legend: worker, supervisor]
Erlang divides processes into two classes: workers and supervisors. Supervisors monitor child processes. The nice thing is that, besides workers, supervisors can monitor other supervisors as well. This way you can build a process or supervision tree. Now suppose one of the workers somewhere crashes ....
33
[Diagram: supervision tree of workers and supervisors; {error, Reason} is passed up at each level]
What happens is that its supervisor catches that error and propagates it upwards to its own supervisor. This repeats until we're at the root of the tree.
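The propagation can be sketched with plain links and `trap_exit`. This is a hand-rolled illustration with made-up names (`watch_demo`, `watcher/1`, the exit reason `disk_full`), not the OTP `supervisor` behaviour and not the talk's code. A watcher process is linked to a worker, receives the worker's exit as a message, and passes `{error, Reason}` one level up.

```erlang
-module(watch_demo).
-export([start/0]).

%% Root of a two-level tree: start a watcher and wait for its report.
start() ->
    Root = self(),
    spawn(fun() -> watcher(Root) end),
    receive
        {error, Reason} -> {error, Reason}
    after 1000 ->
        timeout
    end.

%% Middle level: trap exits so a crashing child arrives as a message
%% instead of killing this process too.
watcher(Parent) ->
    process_flag(trap_exit, true),
    Worker = spawn_link(fun() -> exit(disk_full) end),
    receive
        {'EXIT', Worker, Reason} ->
            Parent ! {error, Reason}
    end.
```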
34
Let’s put it all together
35
ccs.jive.nl fileserv0.jive.nl unb_ctl.jive.nl capture.jive.nl
This is an overview of all machines that we need. Let's make the transition to processes.
36
ccs.jive.nl:
    UNB = spawn('unb_ctl.jive.nl', unb_control, [...])
unb_control:
    setup_board() -> fpga:read(version), fpga:write(Config).
The process begins with correlator control booting. First of all, it needs a UniBoard controller. The controller is spawned on the remote host. unb_control proceeds by interacting with the UniBoard.
37
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    UNB = spawn('unb_ctl.jive.nl', unb_control, [...])
    CAP = spawn('capture.jive.nl', capturer, [...])
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
38
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    UNB = spawn('unb_ctl.jive.nl', unb_control, [...])
    CAP = spawn('capture.jive.nl', capturer, [...])
    SEND = spawn('fileserv0.jive.nl', sender, [...])
sender:
    spawn(file_reader,..)
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
39
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    UNB = spawn('unb_ctl.jive.nl', unb_control, [...])
    CAP = spawn('capture.jive.nl', capturer, [...])
    SEND = spawn('fileserv0.jive.nl', sender, [...])
sender:
    spawn(file_reader,..)
file_reader:
    FD = open_file(Infile)
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
40
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    UNB = spawn('unb_ctl.jive.nl', unb_control, [...])
    CAP = spawn('capture.jive.nl', capturer, [...])
    SEND = spawn('fileserv0.jive.nl', sender, [...])
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
Everybody is now waiting for things to happen.
41
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    CAP ! {start, OutfileName}
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
    FD = file:open(Out)
do_capture
42
ccs.jive.nl fileserv0.jive.nl
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
do_capture:
    receive {udp,Data} -> file:write(FD,Data) end
43
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    case UNB ! fifo_level of ...
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
do_capture:
    receive {udp,Data} -> file:write(FD,Data) end
44
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    case UNB ! fifo_level of
        empty -> SEND ! {do, };
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
do_capture:
    receive {udp,Data} -> file:write(FD,Data) end
45
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    case UNB ! fifo_level of
        empty -> SEND ! {do, }, UNB ! {do, }
sender:
    receive {do,UTSecond} -> ... end
file_reader:
    receive {do,UTSecond} -> Frame = file:read(FD), udp:write(Frame) end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
do_capture:
    receive {udp,Data} -> file:write(FD,Data) end
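The `case UNB ! fifo_level of` line on these slides reads as shorthand for "ask unb_control for its FIFO level and act on the answer". In real Erlang, `Pid ! Msg` evaluates to `Msg` itself, so a query needs an explicit reply message. Below is one common way to write that, sketched with made-up names (`fifo_demo`, `query/2`, a stand-in `unb_loop/1` that always answers with a fixed level). It is not the talk's implementation.

```erlang
-module(fifo_demo).
-export([start/0, unb_loop/1, query/2]).

%% Stand-in for unb_control: answers fifo_level queries.
unb_loop(Level) ->
    receive
        {From, Ref, fifo_level} ->
            From ! {Ref, Level},
            unb_loop(Level);
        stop ->
            ok
    end.

%% Send a request tagged with a unique reference and wait for the
%% reply carrying the same reference.
query(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

start() ->
    UNB = spawn(?MODULE, unb_loop, [empty]),
    Result = case query(UNB, fifo_level) of
                 empty  -> send_next_second;  % here one would message the sender
                 _Other -> wait
             end,
    UNB ! stop,
    Result.
```

The unique reference keeps a late or unrelated message from being mistaken for the answer.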
46
Works like a charm!
- file_sender and do_capture in pure Erlang
  - trivial to implement
  - too slow
- Erlang can interact with external (UNIX) processes
  - implement file_sender/do_capture in C
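Erlang talks to external programs through ports. The sketch below is only an illustration of that mechanism: it assumes a Unix system with an `echo` executable on the PATH, and `echo` stands in for a C helper such as the sender or capture program. The module name `port_demo` is made up.

```erlang
-module(port_demo).
-export([run/0]).

%% Start an external program as a port and collect its output.
run() ->
    Exe = os:find_executable("echo"),   % assumption: echo exists on this system
    Port = open_port({spawn_executable, Exe},
                     [{args, ["frame"]}, binary, exit_status]),
    collect(Port, <<>>).

%% Output arrives as {Port, {data, Bin}} messages; the exit code
%% arrives as {Port, {exit_status, N}} once the program has ended.
collect(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            collect(Port, <<Acc/binary, Data/binary>>);
        {Port, {exit_status, Status}} ->
            {Status, Acc}
    after 5000 ->
        timeout
    end.
```

To the rest of the Erlang system, the port looks much like any other process that sends and receives messages, which is what lets the control logic stay in Erlang while the data path moves to C.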
47
ccs.jive.nl fileserv0.jive.nl
ccs.jive.nl:
    case UNB ! fifo_level of
        empty -> SEND ! {do, };
sender:
    receive {do,UTSecond} -> ... end
unb_control:
    receive fifo_level -> ...; {do,UTSecond} -> ... end
capturer:
    receive {start,F} -> ...; stop -> ... end
cpp_sender:
    fd = open("..", O_RDONLY);
    while( true ) {
        if( new_second ) {
            read(fd, buf, frame_sz);
            write(sok, buf, frame_sz);
        }
    }
cpp_capture:
    fd = open(".", O_WRONLY);
    while( true ) {
        read(sok, buf, N);
        write(fd, buf, N);
    }
48
At some point JIVE got involved in developing the UniBoard, a high-performance signal processing board.
49
Everything you read on the internet about Erlang is true ...