NetSolve
Steven Whitham, Jeremy Woods
Architecture
- A system of “loosely connected” machines: they can sit on a LAN or on an international network
- Heterogeneous system: machines with incompatible data formats can be used at the same time
- One or many NetSolve “Agents” can exist in a NetSolve system
  - Each agent has a view of the NetSolve resources (i.e. the computational servers used for calculations)
  - The agent is responsible for selecting the best resource
  - Assuming changes to the NetSolve system are rare, all agents eventually have the same view of the overall system
- The range of applications a machine can serve is extended through configuration files
- Platforms: C, FORTRAN, MATLAB, Mathematica, Java (is wonderful)
How It Works - Overview
- The NetSolve Client sends a problem to the NetSolve Agent
- The NetSolve Agent determines which NetSolve Resource is “best” suited for the problem and sends the problem to that resource
- The NetSolve System returns the result to the NetSolve Client
- Communication is via TCP/IP
- External Data Representation (XDR) is used between hosts with incompatible data formats
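To make the XDR point concrete, here is a minimal sketch of how an array of doubles looks in XDR form: a 4-byte big-endian length followed by big-endian IEEE 754 doubles. It illustrates the encoding only; the function names are hypothetical and this is not NetSolve's actual wire protocol.

```python
import struct
from typing import List, Sequence


def xdr_encode_doubles(values: Sequence[float]) -> bytes:
    """Encode a variable-length array of doubles in XDR form.

    XDR stores the element count as a 4-byte big-endian unsigned integer,
    followed by each double as an 8-byte big-endian IEEE 754 value.
    """
    count = len(values)
    return struct.pack(">I", count) + struct.pack(f">{count}d", *values)


def xdr_decode_doubles(data: bytes) -> List[float]:
    """Decode bytes produced by xdr_encode_doubles back into a list."""
    (count,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(f">{count}d", data, 4))


# Hypothetical payload: the same bytes are produced on any host,
# regardless of its native byte order.
payload = xdr_encode_doubles([1.0, 2.5, -3.0])
```

Because both sides agree on this canonical layout, a little-endian client and a big-endian server can exchange numeric data without knowing each other's native format.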
How It Works - Details
- To classify and describe a problem, NetSolve uses a 3-tuple of <name, inputs, outputs>
- Users must then call the problem with the specific calling sequence that this description defines
- Once the client sends the problem to an agent, the agent uses the network time and the computation time to estimate the execution time on each machine in the system, and chooses the approximately best one
- The estimate depends on 3 types of parameters: client-dependent, static server-dependent, and dynamic server-dependent
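The <name, inputs, outputs> description can be pictured as a small record that a call is checked against. The sketch below is an illustration only: the class, its field contents, and the `matches_call` check are hypothetical and do not reproduce NetSolve's real data structures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProblemDescriptor:
    """Hypothetical stand-in for the <name, inputs, outputs> 3-tuple."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]

    def matches_call(self, n_inputs: int, n_outputs: int) -> bool:
        """Check that a call supplies the expected number of arguments."""
        return len(self.inputs) == n_inputs and len(self.outputs) == n_outputs


# Hypothetical example problem: solve A x = b.
linear_solve = ProblemDescriptor(
    name="linear_solve",
    inputs=("matrix A", "vector b"),
    outputs=("vector x",),
)
```

A client-side wrapper could reject a malformed call locally with such a check before contacting the agent.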
How It Works - Details Cont...
- The agent builds a ranked list of the machines best suited to the computation and sends the list to the client
- The client traverses the list until the problem is solved, marking any machines that were not able to finish the request
- If the problem is still unsolved when the client reaches the end of the list, the client requests a new list from the agent; the new list contains only machines that have not been tried yet
- The result is sent directly to the client from the machine that did the computation
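The client-side traversal described above can be sketched as a simple retry loop. Everything here is illustrative: the callback names, the `max_rounds` cap, and the demo servers are assumptions made for the example, not part of NetSolve's API.

```python
from typing import Callable, List, Optional, Set, Tuple


def solve_with_failover(
    request_ranked_servers: Callable[[Set[str]], List[str]],
    try_server: Callable[[str], Optional[str]],
    max_rounds: int = 3,  # assumption: the slides give no retry limit
) -> Tuple[Optional[str], Set[str]]:
    """Walk ranked server lists until one server returns a result.

    request_ranked_servers(failed) plays the agent: it returns a ranked
    list that excludes the servers already marked as failed.
    try_server(name) returns the result, or None if the server could not
    finish the request.
    """
    failed: Set[str] = set()
    for _ in range(max_rounds):
        candidates = request_ranked_servers(failed)
        if not candidates:
            break  # the agent has nothing left to offer
        for server in candidates:
            result = try_server(server)
            if result is not None:
                return result, failed
            failed.add(server)  # mark the machine and move on
    return None, failed


def demo() -> Tuple[Optional[str], Set[str]]:
    """Toy run: four servers, lists of two, only 's3' succeeds."""
    pool = ["s1", "s2", "s3", "s4"]

    def agent(excluded: Set[str]) -> List[str]:
        return [s for s in pool if s not in excluded][:2]

    def attempt(server: str) -> Optional[str]:
        return "result from s3" if server == "s3" else None

    return solve_with_failover(agent, attempt)
```

In the toy run the first list (`s1`, `s2`) fails entirely, so the client asks for a second list and succeeds on `s3`.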
Execution parameters
- Client-Dependent
  - Size of data sent
  - Size of data received
  - Size of problem
- Static Server-Dependent
  - Network characteristics between local host and machine
  - The complexity of the algorithm used by the machine
  - The performance of the machine
- Dynamic Server-Dependent
  - Workload
Workload Model
- The best machine is an approximation because of workload variability
- It is too expensive to continuously update the agent with each machine's workload
- A machine only broadcasts its workload when it has changed significantly
- The choice of time slice and confidence intervals is extremely important
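The "broadcast only on significant change" idea can be sketched with a fixed threshold. This is a deliberate simplification: the slides mention time slices and confidence intervals, which are not modeled here, and the class name and the default threshold are assumptions.

```python
from typing import Optional


class WorkloadReporter:
    """Decides whether a new workload sample is worth broadcasting."""

    def __init__(self, threshold: float = 10.0) -> None:
        # Assumption: a change of `threshold` workload units counts
        # as "significant"; the slides do not give a value.
        self.threshold = threshold
        self.last_reported: Optional[float] = None

    def observe(self, current: float) -> bool:
        """Return True (and remember the value) if it should be broadcast."""
        if (
            self.last_reported is None
            or abs(current - self.last_reported) >= self.threshold
        ):
            self.last_reported = current
            return True
        return False
```

With a threshold of 10, samples of 50, 55, 61 trigger broadcasts for 50 and 61 only, since 55 is within 10 units of the last reported value.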
Load Balancing
- The best machine is determined by predicting the smallest execution time T for a given problem
- Time is split into time to send and receive data (Tn) and time to compute (Tc)
- Tn is calculated using:
  - Network latency
  - Size of data sent
  - Size of data returned
- Tc is calculated using:
  - Size of the problem
  - Complexity of the algorithm
  - Performance of the server, depending on its current workload and its optimal performance capabilities
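A minimal sketch of the T = Tn + Tc estimate follows. The slide lists only latency and data sizes for Tn, so the bandwidth term is an assumption added to turn bytes into seconds; the `ServerInfo` fields, the function names, and the example numbers are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ServerInfo:
    name: str
    latency_s: float                     # network latency to the server
    bandwidth_bytes_per_s: float         # assumption: not listed on the slide
    performance_ops_per_s: float         # performance under current workload
    complexity: Callable[[int], float]   # operations needed for a problem size


def network_time(server: ServerInfo, bytes_sent: int, bytes_returned: int) -> float:
    """Tn: latency plus transfer time for data sent and returned."""
    return server.latency_s + (bytes_sent + bytes_returned) / server.bandwidth_bytes_per_s


def compute_time(server: ServerInfo, problem_size: int) -> float:
    """Tc: operation count divided by the server's current performance."""
    return server.complexity(problem_size) / server.performance_ops_per_s


def estimated_time(server: ServerInfo, problem_size: int,
                   bytes_sent: int, bytes_returned: int) -> float:
    """T = Tn + Tc."""
    return network_time(server, bytes_sent, bytes_returned) + compute_time(server, problem_size)


def pick_best(servers: Sequence[ServerInfo], problem_size: int,
              bytes_sent: int, bytes_returned: int) -> ServerInfo:
    """Choose the server with the smallest predicted execution time."""
    return min(servers, key=lambda s: estimated_time(s, problem_size, bytes_sent, bytes_returned))


def cubic(n: int) -> float:
    """Hypothetical O(n^3) algorithm cost."""
    return 2.0 * n ** 3


near = ServerInfo("near", 0.001, 1e8, 1e8, cubic)  # close but slow
far = ServerInfo("far", 0.1, 1e6, 1e9, cubic)      # distant but fast
```

With these made-up numbers, a small problem (`n = 100`) is best served by the nearby machine because network time dominates, while a large one (`n = 1000`) is worth shipping to the faster remote machine.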
Load Balancing - Performance Model
- Equation:

$$p = \frac{P \times 100 \times n}{100 \times n + \max\left(w - 100 \times (n - 1),\ 0\right)}$$

- Where:
  - p = estimated performance
  - P = raw performance of server
  - w = workload
  - n = # processors on server
- NetSolve provides a system for creating a Problem Description File (PDF), which defines the complexity of a computational algorithm
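The performance model translates directly into a short function. Reading w as a percentage, where 100 corresponds to one fully busy processor, is an interpretation suggested by the 100 x (n - 1) term rather than something the slide states; the function name is hypothetical.

```python
def estimated_performance(raw_performance: float, workload: float, n_processors: int) -> float:
    """Estimated performance p for raw performance P, workload w, n processors.

    p = P * 100 * n / (100 * n + max(w - 100 * (n - 1), 0))
    """
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    # Workload only hurts once it exceeds what the other n - 1
    # processors can absorb.
    excess = max(workload - 100.0 * (n_processors - 1), 0.0)
    return raw_performance * 100.0 * n_processors / (100.0 * n_processors + excess)
```

For example, with P = 100 a single-processor server at workload 100 is estimated at 50, while a two-processor server at the same workload still delivers the full 100 because the second processor absorbs the load.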
Fault Tolerance
- Each NetSolve process (agent or machine) is an independent entity
- The NetSolve agent automatically detects a server failure when it is unable to establish a TCP connection, and may drop the server if it does not reboot
- When the NetSolve agent receives a request, it generates a weighted list of servers able to support it
- If the first server refuses the request, the request is passed to each subsequent server in the list until it is accepted
- Improper implementation of server resources will eventually cause the server to be dropped
Drawbacks
- Physical network layout
  - Network Address Translation (NAT)
    - NetSolve tracks components by IP address
    - Access is unavailable if a resource is behind NAT
  - Wide range of ports
    - Now blocked by most commercial firewalls
- The core of NetSolve is still used today, but its implementation has been updated to support more applications
References
- Henri Casanova, Jack Dongarra. NetSolve: A Network Server for Solving Computational Science Problems. April 29, 1996.
- Henri Casanova, Jack Dongarra. Applying NetSolve’s Network-Enabled Server. Proceedings of Heterogeneous Computing Workshop, 1998.
- Henri Casanova, Jack Dongarra. NetSolve: a Network-Enabled Solver; Examples and Users. Proceedings of Heterogeneous Computing Workshop, 1998.
- Dorian C. Arnold, Jack Dongarra. The NetSolve Environment: Progressing Towards the Seamless Grid. 2000 International Conference on Parallel Processing (ICPP-2000), August 2000.
- Thomas Brady, Eugene Konstantinov, Alexey Lastovetsky. SmartNetSolve: High-Level Programming System for High Performance Grid Computing. IPDPS, IEEE Computer Society, 2006.
- Asim YarKhan, Jack Dongarra, Keith Seymour. GridSolve: The Evolution of A Network Enabled Solver. Proceedings of the 2006 International Federation for Information Processing (IFIP) Working Conference, 2006.
Thank You Questions?