ISIS2 RUNTIME PARAMETERS
Ken Birman, Cornell University

Parameters
• Many features of Isis2 depend on parameters that you can modify to “shape” the behavior of the platform.
• They give you very fine control over the behavior of Isis2.
• There are three main categories of parameters:
  1. Those that determine how the system will start up
  2. Those that determine how it sends messages
  3. Those that control limits, timeouts and other bounds

Startup Parameters
What happens when you call IsisSystem.Start()?

How IsisSystem.Start() works
1. The library initializes itself and determines the IP address of “local host.” If the host has several IP addresses, it picks the last of the IPv4 addresses.
2. The system scans the environment variables to read the values of the parameters. These override the default values compiled into Isis2.
   • In Linux/bash, use “export” to set them, either in .bashrc or in a shell script. Or call setenv(3).
   • In Windows, use the “set” command, or call Environment.SetEnvironmentVariable("something", somevalue);

How IsisSystem.Start() works (continued)
3. Next, the system decides which network interfaces it should use (all of them, unless you tell it otherwise by setting ISIS_NETWORK_INTERFACES).
   • Set this if you expect to run on machines that have a “production” network and a “management” network.
   • Otherwise leave ISIS_NETWORK_INTERFACES alone.
4. Having done this, it attempts to contact the ORACLE.
   • If the ORACLE isn’t found, it restarts the ORACLE.
   • Otherwise, it asks the ORACLE to let it join the ISISMEMBERS system group.
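
The rule that environment variables override the compiled-in defaults can be sketched as follows. This is a minimal illustration, not the library's own code: `ReadParameter` is a hypothetical helper, the default values are placeholders, and `ISIS_SKETCH_UNSET` is a made-up name used only to show the fallback.

```csharp
using System;

// Hypothetical helper (not part of Isis2): an environment variable, when set,
// replaces the default value that was compiled into the program.
string ReadParameter(string name, string compiledDefault)
{
    var value = Environment.GetEnvironmentVariable(name);
    return string.IsNullOrEmpty(value) ? compiledDefault : value;
}

// Set a parameter for this process only. In a real application this would
// have to happen before the call to IsisSystem.Start().
Environment.SetEnvironmentVariable("ISIS_TTL", "1");

Console.WriteLine(ReadParameter("ISIS_TTL", "4"));           // the environment value is used
Console.WriteLine(ReadParameter("ISIS_SKETCH_UNSET", "45")); // the placeholder default is used
```

Setting the variable from code affects only the current process; variables exported from a shell script apply to every process launched from that shell.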

Logging
• Normally, upon restart, Isis2 creates a log file for messages printed by the library.
• You can inhibit this by setting ISIS_MUTE=true.
• You can also direct that messages be echoed to the Debug stream rather than the Console when calling IsisSystem.Start().
• If you allow logging and want to write to the log, call IsisSystem.Write() or IsisSystem.WriteLine().
• Output goes to the log and also to the Console or the Debug stream.

Fast start: But there can only be one…
• For extreme speed, you can tell Isis2 not to hunt for the ORACLE (by specifying an argument to IsisSystem.Start).
• It will restart instantly. But if you launch two instances this way, they won’t communicate with one another.
• So… do this only in the first instance that you launch.

Overwhelming the Membership Oracle
• If processes start one by one, no issue…
• But what if you try to start 50 at once, or 500?
[Figure: a process says “Hello?” to the Oracle and the Oracle answers “Welcome!”]

Master/Worker
• If a system will be big, launching hundreds of members can overload the ORACLE.
• Better performance: add many members all at the same time.
• In this case use the Master/Worker pattern:
  • The Master starts first and collects a list of the workers.
  • Workers start after the master and register with it.
  • Then the Master can add a batch of workers to the system, and to any groups that are desired.

Master: Accumulates workers, tells them what to do

    // GOAL (the number of workers wanted) and myGroup are assumed to be defined elsewhere.
    static void beMaster(string[] args)
    {
        IsisSystem.Start();
        Semaphore waitForWorkers = new Semaphore(0, 1);
        bool fullyStaffed = false;
        List<Address> myWorkers = new List<Address>();

        // Accumulate workers
        IsisSystem.RegisterAsMaster((NewWorker)delegate(Address worker)
        {
            lock (myWorkers)
            {
                if (fullyStaffed)
                    IsisSystem.RejectWorker(worker);
                else
                {
                    myWorkers.Add(worker);
                    if (myWorkers.Count == GOAL)
                    {
                        fullyStaffed = true;
                        waitForWorkers.Release(1);
                    }
                }
            }
        });

        // Main thread waits until enough workers have connected,
        // then starts them all at once
        waitForWorkers.WaitOne();
        IsisSystem.BatchStart(myWorkers);

        // This delays until they have all finished their batch start
        IsisSystem.WaitForWorkerSetup(myWorkers);

        // Then adds them all to groups we may want to use
        Group.MultiJoin(myWorkers, new Group[] { myGroup });

        // In front of this next line do whatever you want this application to do
        IsisSystem.WaitForever();

        // If the master shuts down, its workers will too
        IsisSystem.Shutdown();
    }

RunAsWorker: Let Master run the show

    static void beWorker(string[] args)
    {
        // This next line assumes that argument 0 is the master's Address.
        // You can also use new Address(mastersHost, 0) if you know the host IP
        // address of the master but don't know the master's pid.
        IsisSystem.RunAsWorker(args[0]);

        // This line blocks until the master issues the BatchStart() call.
        // Notice that in this one special case we call it AFTER RunAsWorker!
        IsisSystem.Start();

        // Before calling this next line do whatever setup this worker must do:
        // create your group handles and register callbacks, but don't call Join.
        // For example, you might call g = new Group("something"), then
        // g.ViewHandlers += myViewHandler; and so on: anything needed to have the
        // group ready for a Join. But you call WorkerSetupDone() INSTEAD of g.Join().
        IsisSystem.WorkerSetupDone();

        // Now, for each group the Master created using a MultiJoin, wait
        // for its first view to be reported. This is one way to do that
        // (myGroups is assumed to hold the group handles created above):
        foreach (Group g in myGroups)
            while (!g.HasFirstView)
                Thread.Sleep(250);

        // WaitForever freezes the main thread, but if the worker has joined
        // groups (or gets added to groups by the master using MultiJoin()), the
        // worker could be quite active: receiving messages, sending them, etc.
        IsisSystem.WaitForever();

        // If the master shuts down, the worker will throw an
        // IsisException("master termination").
        // If this next line actually executes, this particular worker will exit.
        // In effect, this worker is a normal Isis application by now, except that
        // if the master terminates, it does too. In particular, it can
        // deliberately choose to leave the system if it wishes to do so.
        IsisSystem.Shutdown();
    }

Master/Worker Timeline
[Figure: timeline with three columns: Worker, Master and Oracle]

Worker:
  1. IsisSystem.RunAsWorker(mAddress);
  2. IsisSystem.Start();
  3. Group g = new Group(“myGroup”); ... Attach handlers for g, but don’t call Join
  4. IsisSystem.WorkerSetupDone();
  5. foreach (Group g in myGroups) while (!g.HasFirstView) Thread.Sleep(250); (ends when the new view arrives)
  6. IsisSystem.WaitForever();

Master:
  1. IsisSystem.Start(); ... Accumulate workers
  2. Reached goal: IsisSystem.BatchStart(myWorkers);
  3. IsisSystem.WaitForWorkerSetup(myWorkers); (returns when setup is done for all workers)
  4. Group.MultiJoin(myWorkers, new Group[] { myGroup }); (produces the new view)
  5. IsisSystem.WaitForever();
  The master also creates its own group handle: Group myGroup = new Group(“myGroup”); ... Attach handlers for myGroup, then myGroup.Join();

Why does this help?
• Workers only send one message to the Master.
  • Hence it experiences less load.
• It adds them all at once, first to the system, then to whatever groups the application will use.
  • Hence only one group view needs to be sent, and it can be sent efficiently, using a broadcast.
• Overall load is much reduced.

Messaging Parameters
How to control which Internet protocols Isis2 uses

IP multicast / ISIS_UNICAST_ONLY
• Isis2 will broadcast to find the ORACLE unless you tell it not to do so.
• Default: OK to use IP multicast, UDP, broadcast.
• ISIS_UNICAST_ONLY: don’t use IP multicast. Still requires UDP (the older ISIS_TCP_ONLY feature was eliminated starting in Isis v2.1).
• You must list the machines on which the Isis2 ORACLE will run if you put the system in ISIS_UNICAST_ONLY mode: ISIS_HOSTS=“…”

Normal versus UNICAST_ONLY
• With normal IP multicast, packets are still sent directly.
• With ISIS_UNICAST_ONLY, packets travel on a tree of point-to-point links and must be forwarded, perhaps log2(N) times.
[Figure: IP multicast compared with a unicast tree with power-of-2 “reach”]
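
To get a feel for the forwarding cost, the sketch below counts how many doubling rounds are needed before N members hold a message, assuming each round doubles the number of processes reached. `MaxForwardingHops` is a hypothetical helper, not an Isis2 call, and the real tree layout may differ.

```csharp
using System;

// Hypothetical helper: number of doubling rounds needed to reach `members`
// processes, which equals the ceiling of log2(members).
int MaxForwardingHops(int members)
{
    if (members < 1) throw new ArgumentOutOfRangeException(nameof(members));
    int hops = 0;
    long reached = 1;                 // the sender already holds the message
    while (reached < members)
    {
        reached *= 2;                 // every holder forwards to one new process
        hops++;
    }
    return hops;
}

Console.WriteLine(MaxForwardingHops(50));
Console.WriteLine(MaxForwardingHops(500));
```

Under this assumption a tenfold increase in group size adds only a few forwarding steps, which is why the unicast tree remains usable when IP multicast is unavailable.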

ISIS_HOSTS
• The idea is to list the places where the ORACLE can run:
  ISIS_HOSTS=c1.cs.cornell.edu,c2.cs.cornell.edu
  … or ISIS_HOSTS= ,
• Processes running on other machines can join the system but can’t restart it from scratch.

ISIS_HOSTS: numerical is best!
• We have seen bugs in the Linux DNS when accessed from Mono. Sometimes it hangs.
• To avoid this, use fully numerical IP addresses when you set the values in ISIS_HOSTS.
  • Use the IPv4 addresses of the machines on which you want the ORACLE to run. In this case DNS never hangs.
  • The “ping” and “traceroute” commands are examples of ways you can look these up.
• On Windows, string names are fine. On Linux, they work, but don’t put the DNS under heavy load.
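
One way to guard against accidentally putting DNS names into ISIS_HOSTS is to check the list before exporting it. The sketch below is an assumption-laden illustration: `IsNumericIPv4List` is a hypothetical helper, and the addresses come from the 192.0.2.0/24 documentation range, not from any real deployment.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical check (not part of Isis2): true only if every comma-separated
// entry is a dotted-quad IPv4 address rather than a DNS name.
bool IsNumericIPv4List(string hosts)
{
    var entries = hosts.Split(',').Select(h => h.Trim()).ToArray();
    return entries.All(h =>
        h.Count(c => c == '.') == 3 &&
        IPAddress.TryParse(h, out var address) &&
        address.AddressFamily == AddressFamily.InterNetwork);
}

string candidate = "192.0.2.10,192.0.2.11";   // placeholder addresses
if (IsNumericIPv4List(candidate))
    Environment.SetEnvironmentVariable("ISIS_HOSTS", candidate);

Console.WriteLine(Environment.GetEnvironmentVariable("ISIS_HOSTS"));
```

The check never touches DNS, so it cannot hang the way a name lookup can.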

ISIS_PORTp
• The system uses two standard IP ports:
  • ISIS_PORTp: for p2p messages
  • ISIS_PORTa: set to ISIS_PORTp+1, for acks/nacks
• These ports should not be blocked by your firewall.
• On Linux, also check iptables, which is like a firewall.
• If two instances of Isis2 use non-overlapping port ranges, they will not notice one another.

ISIS_MAXIPMCADDRS
• When permitted to use IP multicast, Isis2 tries not to overuse that feature:
  • ISIS_MCRANGE_LOW: low end of the IPMC address range Isis2 should use. By default, CLASSD+5000, where CLASSD is /8
  • ISIS_MCRANGE_HIGH: high end of the IPMC range
  • ISIS_MAXIPMCADDRS: limit on how many multicast addresses Isis2 can use, system-wide. It is perfectly reasonable to set this to a small number, like 5 or 10. The system should work if ISIS_MAXIPMCADDRS ≥ 2.
• If ISIS_UNICAST_ONLY is true, then no IPMC addresses are used at all.

ISIS_TTL
• Broadcast and multicast messages are automatically relayed by routers.
• Each “hop” causes the “time to live” (TTL) field in the message to be decremented.
• If the TTL reaches zero, the router drops the packet.
• Isis2 initializes the TTL value using ISIS_TTL.
• You can set this to 0 or 1 to confine the system to a single segment of your network.
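
The decrement-and-drop rule can be modelled in a few lines. This is a simplified sketch of the behavior described above, not router or Isis2 code, and `CrossesRouters` is a hypothetical name.

```csharp
using System;

// Simplified model: each router decrements the TTL and drops the packet
// when the result reaches zero.
bool CrossesRouters(int ttl, int routers)
{
    for (int i = 0; i < routers; i++)
    {
        ttl--;
        if (ttl <= 0) return false;   // dropped at this router
    }
    return true;                      // delivered
}

Console.WriteLine(CrossesRouters(1, 0)); // same segment, no router involved
Console.WriteLine(CrossesRouters(1, 1)); // a TTL of 1 does not survive a router
Console.WriteLine(CrossesRouters(3, 2)); // a larger TTL crosses two routers
```

In this model a TTL of 1 reaches every machine on the local segment and nothing beyond it, which matches the advice to use a small value to keep the system on one segment.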

ISIS_MAXMSGLEN
• Automatically adjusted, but you can provide a recommended value if you wish.
  • Isis2 will override the value in some situations.
  • Normally not something you would need to modify.
• If a message is too large, Isis2 will automatically fragment it and reassemble it prior to delivery.
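
Fragmentation and reassembly in general work along the lines of the sketch below. It is not Isis2's wire format: `Fragment` and `Reassemble` are hypothetical helpers that only show the idea of cutting a byte array into pieces no longer than a maximum length and joining them again in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split a message into pieces of at most maxLen bytes.
List<byte[]> Fragment(byte[] message, int maxLen)
{
    if (maxLen <= 0) throw new ArgumentOutOfRangeException(nameof(maxLen));
    var fragments = new List<byte[]>();
    for (int offset = 0; offset < message.Length; offset += maxLen)
    {
        int size = Math.Min(maxLen, message.Length - offset);
        var piece = new byte[size];
        Array.Copy(message, offset, piece, 0, size);
        fragments.Add(piece);
    }
    return fragments;
}

// Hypothetical helper: concatenate the pieces in their original order.
byte[] Reassemble(IEnumerable<byte[]> fragments)
{
    return fragments.SelectMany(f => f).ToArray();
}

byte[] original = Enumerable.Range(0, 10).Select(i => (byte)i).ToArray();
List<byte[]> pieces = Fragment(original, 4);
byte[] rebuilt = Reassemble(pieces);
Console.WriteLine(pieces.Count + " fragments, intact: " + rebuilt.SequenceEqual(original));
```

A real protocol also has to number the fragments and cope with loss and reordering; the sketch assumes in-order, reliable delivery of every piece.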

Other limits and timeouts
These are less often changed

ISIS_DEFAULTTIMEOUT
• Normally 45 seconds. OK to reduce if you wish.
  • Failure detection needs twice this long, hence 90 seconds.
  • This applies if you kill a process “suddenly” (e.g. ^C) or if the machine on which it was running crashes.
• 45 seconds is very slow, but on cloud computing systems long delays happen more often than you would expect!
• On lightly loaded clusters, you can set ISIS_DEFAULTTIMEOUT much lower, but not less than 2 seconds.
• If you design a failure sensing solution of your own, call Isis.ProcessFailed(who) to tell us if a process crashes.
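
The arithmetic behind these numbers is simple enough to sketch. `FailureDetectionSeconds` is a hypothetical helper that encodes only the two rules stated above (detection takes twice the timeout, and the timeout should not go below 2 seconds); the units expected by the ISIS_DEFAULTTIMEOUT variable itself are not stated in these slides, so the sketch does not set it.

```csharp
using System;

// Hypothetical helper: worst-case delay before a sudden crash is detected.
int FailureDetectionSeconds(int defaultTimeoutSeconds)
{
    if (defaultTimeoutSeconds < 2)
        throw new ArgumentOutOfRangeException(
            nameof(defaultTimeoutSeconds), "the timeout should not be below 2 seconds");
    return 2 * defaultTimeoutSeconds;
}

Console.WriteLine(FailureDetectionSeconds(45)); // the default setting
Console.WriteLine(FailureDetectionSeconds(5));  // a lightly loaded cluster
```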

Help! I’ve been poisoned!
• If a process throws this exception, it means that some other process thought it had failed.
• If a dead process reappears, live members send it a “you have been poisoned” message.
• Prevents system partitioning.
• Rule in Isis2: Only allow a single partition to remain alive at one time. If a partition forms, immediately shut one side down (the side lacking a majority).
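
The majority rule can be written as a one-line test. This is a sketch of the rule as stated, not Isis2's membership code; `HasMajority` is a hypothetical name, and how Isis2 treats an exact tie is not stated in these slides, so the sketch treats a tie as lacking a majority.

```csharp
using System;

// Hypothetical helper: a partition may stay alive only if it contains
// strictly more than half of the members of the previous view.
bool HasMajority(int partitionSize, int previousViewSize)
{
    if (partitionSize < 0 || partitionSize > previousViewSize)
        throw new ArgumentOutOfRangeException(nameof(partitionSize));
    return 2 * partitionSize > previousViewSize;
}

Console.WriteLine(HasMajority(3, 5)); // the larger side of a 3/2 split
Console.WriteLine(HasMajority(2, 5)); // the smaller side of the same split
```

Because two disjoint partitions cannot both hold a strict majority, at most one side survives.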

Speeding up failure detection
• If a process will exit (rather than crash), call IsisSystem.Shutdown() first.
  • This rapidly announces the departure, and the process will immediately be removed from the groups it belongs to.
  • Like a fast failure notification, as if it said “bye!”
• You can also eliminate a group rapidly (without killing its members) using g.Terminate().

Hints for EC2 users
• On EC2 we recommend using ISIS_UNICAST_ONLY.
• EC2 gives you a “virtual cluster” with nodes numbered from IP address xxx.xxx.xxx.0. You can use this range to set ISIS_HOSTS even before launching your application.
• If you use the Master/Worker startup mode, you can tell the system the master is at:
  new Address(xxx.xxx.xxx.0, 0);
• This works because the master will run on node xxx.xxx.xxx.0 (due to ISIS_HOSTS) and the pid is ignored in the BeWorker call, so using 0 is fine.

Debugging Isis2 issues
How can it be done?

Debugging is hard…
• … debugging distributed systems even harder.
• Useful tools:
  • Visual Studio. Keep in mind that even an exception thrown inside Isis2 could be caused by a mistake in your code. All those upcalls will be issued from Isis2 stacks!
  • You can call IsisSystem.GetState() to obtain a string representing the state of the Isis system itself. But you’ll need help from Cornell experts to understand this data.
  • You can call IsisSystem.RunTimeStatsState() to obtain a self-explanatory string with counts of messages sent and received. The data itself is in IsisSystem.RTS, and you can access this at runtime.

Suggestions
• Isis2 is multithreaded. So write thread-safe code.
• Don’t block during upcalls from Isis2 into your code. The library assumes that upcalls will complete quickly and could malfunction otherwise.
• Isis2 has a lot of threads. Don’t let this worry you.
• We gave you the source code. If you notice a bug, post it to isis2.codeplex.com on the “issues” page.
• Post questions on the codeplex “discussions” page.
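
One common way to keep upcalls short is to hand the work to a background consumer through a thread-safe queue. The sketch below uses only the .NET standard library; `OnMessage` stands in for whatever handler your application registers and is a hypothetical name, as is the simulated slow work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var pending = new BlockingCollection<string>();
int processed = 0;

// Background consumer: the slow work happens here, outside the upcall.
var consumer = Task.Run(() =>
{
    foreach (var message in pending.GetConsumingEnumerable())
    {
        Thread.Sleep(10);                      // stands in for slow processing
        Interlocked.Increment(ref processed);  // thread-safe counter update
    }
});

// Hypothetical handler body: enqueue the message and return at once.
void OnMessage(string message) => pending.Add(message);

for (int i = 0; i < 5; i++) OnMessage("update " + i);

pending.CompleteAdding();   // no more messages: lets the consumer loop finish
consumer.Wait();
Console.WriteLine("processed " + processed + " messages");
```

The handler never sleeps or waits on a lock held for long, so the calling thread is released immediately, and the shared counter is updated with `Interlocked` to stay thread-safe.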