Steven Whitham Jeremy Woods


NetSolve

Architecture
A system of “loosely connected” machines, meaning they can be on a LAN or even an international network.
A heterogeneous system: machines with incompatible data formats can be used at the same time.
One or many NetSolve “Agents” can exist in a NetSolve system.
Each agent has its own view of the NetSolve resources (i.e. the computational servers used for calculations).
The agent is responsible for selecting the best resource.
Assuming changes to the NetSolve system are rare, eventually all agents will have the same view of the overall system.
The range of supported machines and applications can be extended by using configuration files.
Platforms: C, FORTRAN, MATLAB, Mathematica, Java.

How It Works - Overview
The NetSolve Client sends a problem to the NetSolve Agent.
The NetSolve Agent determines which NetSolve Resource is “best” suited for the problem and sends the problem to that resource.
The NetSolve System returns the result to the NetSolve Client.
Communication is via TCP/IP.
External Data Representation (XDR) is used between hosts with incompatible data formats.
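To make the XDR point concrete, here is a minimal sketch of how an array of doubles could be put into XDR form before crossing the network. It uses only Python's standard `struct` module; the function names are hypothetical and are not part of NetSolve. XDR fixes the byte order as big-endian, which is what lets hosts with different native formats exchange the same bytes.

```python
import struct


def xdr_encode_doubles(values):
    """Encode a variable-length array of doubles in XDR form (illustrative helper)."""
    # XDR layout: 4-byte big-endian unsigned element count,
    # followed by each element as an 8-byte big-endian IEEE 754 double.
    header = struct.pack(">I", len(values))
    body = struct.pack(f">{len(values)}d", *values)
    return header + body


def xdr_decode_doubles(payload):
    """Decode bytes produced by xdr_encode_doubles back into a list of floats."""
    (count,) = struct.unpack_from(">I", payload, 0)
    return list(struct.unpack_from(f">{count}d", payload, 4))


wire = xdr_encode_doubles([1.0, 2.5, -3.0])
print(len(wire))                 # 28 bytes: 4 for the count, 3 * 8 for the values
print(xdr_decode_doubles(wire))  # [1.0, 2.5, -3.0]
```

Both sides convert to and from this one wire format, so neither needs to know the other's native representation.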

How It Works - Details
To classify and describe a problem, NetSolve uses a 3-tuple of <name, inputs, outputs>.
Users must then invoke the problem with a specific calling sequence, which defines its format.
Once the client sends the problem to an agent, the agent uses the network time and computation time to estimate the execution time on each machine in the system, and chooses the approximate best one.
This estimate depends on 3 types of parameters: client-dependent, static server-dependent, and dynamic server-dependent.
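The <name, inputs, outputs> tuple can be pictured as a small record that a call is checked against. The sketch below is only an illustration: `ProblemSpec`, `matches_calling_sequence`, and the `linear_solve` example are hypothetical names, and Python types stand in for whatever object descriptions NetSolve actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemSpec:
    """Hypothetical stand-in for a <name, inputs, outputs> problem description."""
    name: str
    inputs: tuple    # expected type of each input argument, in calling order
    outputs: tuple   # type of each returned object


def matches_calling_sequence(spec, args):
    """Return True if args has the right number and types of inputs for spec."""
    if len(args) != len(spec.inputs):
        return False
    return all(isinstance(arg, kind) for arg, kind in zip(args, spec.inputs))


# Example: solve A x = b, taking a matrix and a vector and returning a vector.
linear_solve = ProblemSpec(name="linear_solve", inputs=(list, list), outputs=(list,))

print(matches_calling_sequence(linear_solve, ([[2.0]], [4.0])))  # True
print(matches_calling_sequence(linear_solve, ([[2.0]],)))        # False
```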

How It Works - Details Cont...
The agent ranks the machines best suited to the computation and sends the ranked list to the client.
The client traverses the list until the problem is solved, marking any machines that were not able to finish the request.
If the problem is still unsolved when the client reaches the end of the list, the client requests a new list from the agent, and this new list contains only new machines.
The result is then sent directly to the client from the machine that did the computation.
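The list traversal described above can be sketched as a simple retry loop. This is an assumption-laden illustration, not NetSolve's client code: `request_list` and `try_server` are hypothetical callables standing in for the agent query and the remote call, and `max_rounds` is an invented safety limit.

```python
def solve_with_failover(request_list, try_server, max_rounds=3):
    """Try ranked servers in order, asking the agent for a fresh list when one runs out.

    request_list(exclude=...) -> ranked server names, skipping the excluded ones (hypothetical)
    try_server(server)        -> the result, or raises ConnectionError on failure (hypothetical)
    """
    failed = set()  # machines marked as unable to finish the request
    for _ in range(max_rounds):
        candidates = request_list(exclude=failed)
        if not candidates:
            break
        for server in candidates:
            try:
                # The result comes straight back from the server that computed it.
                return try_server(server), failed
            except ConnectionError:
                failed.add(server)
    raise RuntimeError("no server could solve the problem")


# Usage with stand-in agent and servers: only "d" is able to finish the request.
ranked = ["a", "b", "c", "d"]

def fake_agent(exclude):
    return [s for s in ranked if s not in exclude][:2]

def fake_try(server):
    if server != "d":
        raise ConnectionError(server)
    return "solved by d"

result, failed = solve_with_failover(fake_agent, fake_try)
print(result, sorted(failed))
```

Passing the failed set back to the agent is one way to guarantee that the second list contains only machines that have not been tried yet.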

Execution parameters
Client-Dependent: size of data sent; size of data received; size of problem.
Static Server-Dependent: network characteristics between local host and machine; the complexity of the algorithm used by the machine; the performance of the machine.
Dynamic Server-Dependent: workload.

Workload Model
The best machine is an approximation because of workload variability.
It is too expensive to continuously update the agent with each machine's workload.
A machine only broadcasts its workload when it has changed significantly.
The choice of time slice and confidence intervals is extremely important.
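One simple way to picture "broadcast only on significant change" is a reporter that remembers the last value it sent. The sketch below uses a fixed threshold in place of the time slices and confidence intervals the slide mentions, so it is a simplification; `WorkloadReporter` and its `threshold` parameter are hypothetical.

```python
class WorkloadReporter:
    """Decides whether a new workload sample is worth broadcasting to the agent."""

    def __init__(self, threshold=10.0):
        self.threshold = threshold  # assumed: minimum change that counts as significant
        self.last_sent = None       # last workload value actually broadcast

    def observe(self, workload):
        """Return True if this sample should be broadcast, False if it can be skipped."""
        if self.last_sent is None or abs(workload - self.last_sent) >= self.threshold:
            self.last_sent = workload
            return True
        return False


reporter = WorkloadReporter(threshold=10.0)
samples = [50, 54, 58, 61, 65, 72]
print([reporter.observe(w) for w in samples])
# [True, False, False, True, False, True]
```

Changes are measured against the last broadcast value rather than the previous sample, so a slow drift is still reported once it adds up.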

Load Balancing
The best machine is determined by predicting the smallest execution time T for a given problem.
T is split into the time to send and receive data (Tn) and the time to compute (Tc), so T = Tn + Tc.
Tn is calculated using: network latency; size of data sent; size of data returned.
Tc is calculated using: size of the problem; complexity of the algorithm; performance of the server, depending on its current workload and its optimal performance capabilities.
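The prediction can be sketched as follows. The slide lists the inputs but not the exact formulas, so the details here are assumptions: transfer time is modelled as latency plus bytes divided by a bandwidth figure, compute time as operation count divided by the server's estimated performance, and all server names and numbers are invented.

```python
def network_time(latency_s, bytes_sent, bytes_returned, bandwidth_bytes_per_s):
    """Tn: assumed model of latency plus transfer time for both directions."""
    return latency_s + (bytes_sent + bytes_returned) / bandwidth_bytes_per_s


def compute_time(problem_size, complexity, performance_ops_per_s):
    """Tc: operation count from the complexity function divided by server performance."""
    return complexity(problem_size) / performance_ops_per_s


def best_server(servers, problem_size, complexity, bytes_sent, bytes_returned):
    """Return the name of the server with the smallest predicted T = Tn + Tc."""
    def predicted(name):
        s = servers[name]
        tn = network_time(s["latency"], bytes_sent, bytes_returned, s["bandwidth"])
        tc = compute_time(problem_size, complexity, s["performance"])
        return tn + tc
    return min(servers, key=predicted)


# Invented servers: one close but slow, one distant but fast.
servers = {
    "near_slow": {"latency": 0.01, "bandwidth": 1e7, "performance": 1e8},
    "far_fast": {"latency": 0.2, "bandwidth": 1e6, "performance": 1e9},
}

def cubic(n):
    return n ** 3  # assumed O(n^3) algorithm

print(best_server(servers, 1000, cubic, 8_000_000, 8_000))  # far_fast
print(best_server(servers, 100, cubic, 80_000, 800))        # near_slow
```

With these made-up numbers the small problem stays on the nearby machine because transfer cost dominates, while the large one is worth sending to the faster machine.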

Load Balancing - Performance Model
Where: p = estimated performance, P = raw performance of server, w = workload, n = number of processors on server.
Equation:
$$p = \frac{P \times 100 \times n}{100 \times n + \max(w - 100 \times (n - 1),\ 0)}$$
NetSolve provides a system for creating a Problem Description File (PDF), which defines the complexity of a computational algorithm.
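A few worked values help show how the performance equation behaves. The function below is a direct transcription of that equation; the interpretation of w as a percentage (100 meaning one fully busy processor) is an assumption suggested by the constants in the formula, and the function name is hypothetical.

```python
def estimated_performance(raw, workload, processors=1):
    """p = P * 100 * n / (100 * n + max(w - 100 * (n - 1), 0))."""
    spare = 100 * (processors - 1)  # load the other processors can absorb
    return raw * 100 * processors / (100 * processors + max(workload - spare, 0))


print(estimated_performance(100, 0))       # 100.0  idle single processor
print(estimated_performance(100, 100))     # 50.0   one processor, fully loaded
print(estimated_performance(100, 100, 2))  # 100.0  second processor absorbs the load
print(estimated_performance(100, 200, 2))  # about 66.7, both processors busy
```

On a multiprocessor server the estimate only starts to drop once the workload exceeds what the other n - 1 processors can take on.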

Fault Tolerance
Each NetSolve process (agent or machine) is an independent entity.
The NetSolve agent automatically detects a server failure when it is unable to establish a TCP connection, and may drop the server if it does not reboot.
When the NetSolve agent receives a request, it generates a weighted list of servers able to support it.
If the first server refuses the request, the request is passed to each subsequent server in the list until it is accepted.
Improper implementation of server resources will eventually cause the server to be dropped.
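Failure detection by TCP connection can be sketched with the standard `socket` module. This is a hypothetical illustration: `tcp_probe`, `ServerRegistry`, and the rule of dropping a server after `max_failures` consecutive failed probes are assumptions, since the slide does not say how long the agent waits for a reboot.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ServerRegistry:
    """Tracks servers and drops those that keep failing the probe (assumed policy)."""

    def __init__(self, probe=tcp_probe, max_failures=3):
        self.probe = probe
        self.max_failures = max_failures
        self.failures = {}  # (host, port) -> consecutive failed probes

    def add(self, host, port):
        self.failures[(host, port)] = 0

    def check_all(self):
        """Probe every server once and drop those at the failure limit."""
        for address in list(self.failures):
            if self.probe(*address):
                self.failures[address] = 0  # server is back, reset the count
            else:
                self.failures[address] += 1
                if self.failures[address] >= self.max_failures:
                    del self.failures[address]

    def servers(self):
        return sorted(self.failures)
```

The probe is injectable so the drop policy can be exercised without a real network, for example with `ServerRegistry(probe=lambda host, port: host == "up")`.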

Drawbacks
Physical network layout:
Network Address Translation (NAT): NetSolve tracks components by IP address, so access is unavailable if a resource is behind NAT.
Wide range of ports: these are now blocked by most commercial firewalls.
The core of NetSolve is still used today, but its implementation has been updated to support more applications.


Thank You Questions?