Improving Robustness in Distributed Systems Per Bergqvist Erlang User Conference 2001 (courtesy CellPoint Systems AB)

1 Improving Robustness in Distributed Systems Per Bergqvist per@synapse.se Erlang User Conference 2001 (courtesy CellPoint Systems AB)

2 Design base
- Cluster of cooperating hosts
- Erlang and C
- COTS hardware based
- Unix based (i.e. Solaris or Linux)
- 10/100/1000 Base-T back plane ("system area network")

3 Cluster
- Shared, distributed system configuration
- Each host has ONE cluster controller
- Controllers dispatch and supervise worker tasks
- Master cluster controller: holds the configuration database (persistent replica)
- Slave cluster controller: gets its configuration from the master cluster controllers
- The cluster is DOWN when all master cluster controllers are inaccessible
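The DOWN rule on this slide can be read as a predicate over master reachability. A minimal Python sketch, assuming a hypothetical `Controller` record with a `reachable` flag (neither name comes from the talk, and how reachability is detected is left open):

```python
from dataclasses import dataclass


@dataclass
class Controller:
    # Hypothetical record; field names are illustrative only.
    host: str
    role: str          # "master" or "slave"
    reachable: bool


def cluster_is_down(controllers):
    """Return True when no master cluster controller is reachable."""
    masters = [c for c in controllers if c.role == "master"]
    return not any(c.reachable for c in masters)
```

Note that slave reachability does not enter the decision: a cluster with every slave up and every master unreachable still counts as DOWN under this reading.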

4 Typical system (diagram; labels: Firewall Switch Traffic Control)

5 Cluster Key Benefits
- Single system view
- Enforces decoupling of parts of O&M from actual traffic processing

6 Implementing a cluster
- Hierarchy: Cluster -> Host -> Node -> NodeData
- Cluster global parameters
- Subscription mechanisms for configuration changes
- Mnesia as configuration database on master cluster controllers
- Home-brewed configuration distribution to slave controllers (NOT using Mnesia)
- (Worker) node supervision
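To make the hierarchy and the subscription idea concrete, here is a small Python sketch. It is not the implementation described in the talk (which used Erlang and Mnesia); the class name `ClusterConfig` and its methods are hypothetical, and it only models host -> node -> key/value data with callbacks fired on change.

```python
from collections import defaultdict


class ClusterConfig:
    """Hypothetical in-memory model: host -> node -> key -> value."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(dict))
        self._subscribers = defaultdict(list)   # (host, node) -> callbacks

    def subscribe(self, host, node, callback):
        """Register callback(key, old, new) for changes to one node's data."""
        self._subscribers[(host, node)].append(callback)

    def set(self, host, node, key, value):
        old = self._data[host][node].get(key)
        self._data[host][node][key] = value
        if old != value:                        # notify only on real changes
            for callback in self._subscribers[(host, node)]:
                callback(key, old, value)

    def get(self, host, node, key, default=None):
        return self._data.get(host, {}).get(node, {}).get(key, default)
```

A worker node would subscribe to its own `(host, node)` entry and reconfigure itself in the callback, which keeps O&M changes decoupled from the traffic-processing code.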

7 Mnesia gotchas
- First distributed node startup
- Disallow writes when not all replicas are accessible
- Use a timeout on table load, then force load
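The last two points describe a control flow that can be sketched independently of Mnesia. In Erlang the relevant calls are `mnesia:wait_for_tables/2` and `mnesia:force_load_table/1`; in the Python sketch below they are passed in as hypothetical stand-in callables, and the write guard assumes the slide means "refuse writes unless every replica is reachable".

```python
def load_tables(tables, wait_for_tables, force_load_table, timeout_ms=30000):
    """Wait for tables with a timeout, then force-load whatever is missing.

    wait_for_tables(tables, timeout_ms) is assumed to return the list of
    tables that did NOT load in time; force_load_table(table) loads one
    table from the local replica. Both are stand-ins, not real APIs.
    """
    missing = wait_for_tables(tables, timeout_ms)
    for table in missing:
        force_load_table(table)
    return missing


def write_allowed(replica_hosts, reachable_hosts):
    """Permit writes only while every replica host is reachable."""
    return set(replica_hosts) <= set(reachable_hosts)
```

Force loading trades consistency for availability: the local replica may be stale, which is one reason to pair it with the write guard.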

8 ... BUT ...
- TCP based distribution
- Network partitioning

9 Network parameters
- Align TCP retransmission intervals with Erlang heartbeats
- Align TCP and IP rerouting parameters
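One way to reason about "alignment" is to compare how long TCP keeps retransmitting with how long the Erlang heartbeat takes to declare a node down. The Python sketch below uses a simplified model that is an assumption, not something stated in the talk: the retransmission timeout doubles after each expiry up to a cap, and the heartbeat detects a dead node within `net_ticktime` plus or minus a quarter, as the Erlang kernel documentation describes for the `net_ticktime` parameter. Real stacks differ in detail.

```python
def tcp_giveup_time(initial_rto, max_rto, timeouts):
    """Seconds until TCP gives up, assuming the RTO doubles after each
    expiry (capped at max_rto) and the connection is abandoned after
    `timeouts` consecutive expiries."""
    total, rto = 0.0, float(initial_rto)
    for _ in range(timeouts):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


def tick_detection_window(net_ticktime):
    """(earliest, latest) seconds at which a silent node is declared down."""
    return 0.75 * net_ticktime, 1.25 * net_ticktime


if __name__ == "__main__":
    # Illustrative numbers only; not recommended settings.
    print(tcp_giveup_time(initial_rto=1.0, max_rto=8.0, timeouts=5))
    print(tick_detection_window(60))
```

If the TCP figure is far larger than the heartbeat window, the two layers disagree about when a peer is gone, which is the mismatch the slide warns about.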

10 Typical system II: Dual back plane (diagram; labels: Firewall Switch Traffic Control)

11 Erlang multi-homing problem (diagram: Host A, Host B, Host C)

12 Multi-home Erlang w/ TCP
- Add an alias interface to the loopback i/f
- Patch the TCP distribution to bind to the alias
- Publish the alias interface via (all wanted) real hw i/f's
- Method 1: Static routes and gratuitous/proxy ARP
- Method 2: Use a new (routing) protocol

13 ARP method
- Implement a utility to:
  - broadcast unsolicited ARP responses
  - respond to ARP requests for the alias i/f address
- Add static routes on all far end systems
- NOTE: all real i/f's need to be on the same IP subnet
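For readers unfamiliar with unsolicited (gratuitous) ARP responses, the following Python sketch builds one such Ethernet frame by hand. It only constructs the bytes; it is not the utility from the talk, the function name is hypothetical, and the choice of a broadcast target hardware address is one common convention rather than something the slides specify.

```python
import socket
import struct


def gratuitous_arp_reply(mac: bytes, ip: str) -> bytes:
    """Build a broadcast ARP reply announcing that `ip` is at `mac`."""
    broadcast = b"\xff" * 6
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, op 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += mac + ip_bytes         # sender hardware / protocol address
    arp += broadcast + ip_bytes   # target: broadcast MAC, same IP as sender

    return eth_header + arp
```

Actually transmitting the frame needs a raw link-layer socket and the privileges that go with it, which is platform specific and left out here.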

14 New routing protocol
- Broadcast (Ethernet frames) what you have, including interface priority
- Let the far end select a path based on what it receives and when
- Far end dynamically sets up host routes
- Use short retransmission intervals
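The far-end selection logic can be sketched as a small table of recently heard advertisements. This Python example is an illustration under stated assumptions, not the protocol from the talk: the class `PathTable`, the `hold_time` parameter, and the rule "highest priority among fresh advertisements wins" are all hypothetical choices.

```python
import time


class PathTable:
    """Track advertisements per (alias address, receiving interface)."""

    def __init__(self, hold_time=1.0):
        self.hold_time = hold_time   # seconds an advertisement stays valid
        self._seen = {}              # (alias_ip, iface) -> (priority, heard_at)

    def on_advertisement(self, alias_ip, iface, priority, now=None):
        now = time.monotonic() if now is None else now
        self._seen[(alias_ip, iface)] = (priority, now)

    def best_path(self, alias_ip, now=None):
        """Return the interface to route via, or None if nothing is fresh."""
        now = time.monotonic() if now is None else now
        fresh = [
            (priority, iface)
            for (ip, iface), (priority, heard_at) in self._seen.items()
            if ip == alias_ip and now - heard_at <= self.hold_time
        ]
        return max(fresh)[1] if fresh else None
```

A short `hold_time` combined with short retransmission intervals on the sender side means a failed link drops out of the table quickly, after which the host route would be moved to the next best interface.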

15 Erlang multi-homing resolved? (diagram: Host A, Host B, Host C)

16 Summing up
- Erlang can support multihoming with some additional work
- By using a loopback alias i/f, link failure becomes a routing problem (peer-peer association is kept intact)
- Solaris TCP/IP stack parameters are:
  - hard to find (only in out-of-date app. notes)
  - hard to set "right"
  - host global
- A distribution mechanism with built-in support for multi-homing is preferred

17 Erlang Distribution over SCTP Per Bergqvist et al per@synapse.se Erlang User Conference 2002

