Building Scalable, High Performance Cluster and Grid Networks: The Role Of Ethernet Thriveni Movva CMPS 5433
Overview
- About Grids/Clusters
- Uses of Grid Computing
- Differences between Grids and Clusters
- Benefits of the Grid
- Grid Architecture
- Building Ethernet Networks for Grids/Clusters
- Examples of Ethernet Grids/Clusters
- Conclusion/Summary
What Is a Grid Computer?
- A hardware and software system that integrates a collection of distributed components: computer systems, storage, etc.
- Solves large-scale computation problems
- Appears to the user as a single, large "virtualized" computing system
- Consists of geographically dispersed computers
What Is a Cluster?
- A multiprocessor system consisting of co-located computers and storage, viewed as though it were a single computer
- Connected through fast local area networks (localized within a room or building)
- Provides more speed and/or reliability than a single computer
- More cost-effective than single computers of comparable speed or reliability
Uses of Grid Computing
- Computer systems and other resources need not be dedicated to individual users or applications
- They can be pooled and shared dynamically according to changing needs
- Over the Internet, Grid-based resource sharing and collaborative problem solving can be extended to multi-institutional "Virtual Organizations"
Differences between Grids and Clusters
Grids:
- dispersed over a local, metropolitan, or wide area network
- span administrative boundaries
- focus on problems in distributed computing and resource sharing
- distribute workloads among different machine types and operating systems
Clusters:
- localized within a room or building
- single administration
- focus on compute-intensive problems and HPC
- homogeneous (single type of processor and OS)
Benefits of the Grid
Grid computing offers a number of potential uses and benefits that can be broadly categorized as:
- High Performance Computing (HPC)
- Data Federation and Collaboration
- Resource Allocation and Optimization
High Performance Computing (HPC)
- Computationally intensive, parallelizable applications benefit most
- Uses arrays of numerous commodity or specialized systems
- Most Grid applications fall into the HPC classification
Advantages of HPC:
- Cost-effective solutions to critical problems
- High return on investment
- Solves problems that were previously unsolvable within a given time and cost
- Solves problems too large for conventional supercomputers
Fields in which the HPC Grid has successfully addressed a wide range of computational problems include: climate/weather/ocean modeling and simulation, Internet search engines, signal/image processing, pharmaceutical research, and military forces simulation
Data Federation and Collaboration
- Consolidates data from different sources into a single data service
- Hides data location, local ownership, and infrastructure from the application
- No disruption of data by local users, applications, or data management policies
- Facilitates a wide range of integrated applications, such as: corporate performance dashboards, marketing analysis tools, customer service applications, data mining applications
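A minimal sketch of the federation idea, assuming two hypothetical departmental databases (the class, source names, and records below are illustrative, not from any real product): one query interface hides which backend owns each record.

```python
class FederatedDataService:
    """Presents multiple data sources as one logical data service."""

    def __init__(self, sources):
        # sources: mapping of source name -> list of record dicts
        self.sources = sources

    def query(self, **filters):
        # Gather matching records from every source; the caller never
        # sees where a record physically lives.
        results = []
        for records in self.sources.values():
            for rec in records:
                if all(rec.get(k) == v for k, v in filters.items()):
                    results.append(rec)
        return results

# Two hypothetical departmental databases federated into one view
crm = [{"customer": "Acme", "region": "east"}]
erp = [{"customer": "Globex", "region": "west"},
       {"customer": "Initech", "region": "east"}]

service = FederatedDataService({"crm": crm, "erp": erp})
east = service.query(region="east")   # draws from both sources transparently
```

The application asks one service one question; the middleware layer decides where the answer actually lives.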
Resource Allocation and Optimization
- Sharing of computing and storage improves resource utilization
- For example, applications and batch jobs can be transferred to an idle server
Benefits of resource optimization:
- Reclaims much of the stranded capacity of the computing infrastructure
- Reduces the level of capital investment
- No modification of existing applications required
Grid Computing Architecture
The basic architecture of a Grid consists of:
- User Interface
- Applications
- Grid Middleware
- Computing Resources
- Grid Network
Applications
Classification of parallel applications:
Embarrassingly Parallel Computations (EPC)
- Divided into independent parts
- Allocated to multiple processors for simultaneous execution
- No communication is required between the processors
- Example: testing large integers to determine prime numbers
Parametric and Data Parallel Computations
- Also referred to as Nearly Embarrassingly Parallel Computations (NEPC)
- Each processor works on an independent subset of the data
- The data is later gathered by a single process
- Example: Internet search engines
Loosely Coupled Synchronous Parallel Computations
- Require inter-process communication among a small subset of processors before the computation can be completed
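The EPC case above can be sketched in a few lines: primality tests on a batch of integers are fully independent, so they can be farmed out to workers with no inter-worker communication. (A real Grid would dispatch the parts to separate nodes; here a local thread pool stands in for the worker pool.)

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(n):
    """Trial-division primality test -- each call is fully independent."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

candidates = [101, 102, 103, 104, 105, 107]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() partitions the work across workers; results are gathered
    # back in input order by the master process
    verdicts = list(pool.map(is_prime, candidates))

primes = [n for n, ok in zip(candidates, verdicts) if ok]  # [101, 103, 107]
```

Because no worker ever needs another worker's result, this class of application scales almost linearly with the number of processors.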
Grid Middleware
- Gives the Grid the semblance of a single computer system
- Provides coordination among the Grid's computing resources
- Provides location transparency
- Allows applications to run over a virtualized layer of networked resources
- Available from system vendors and independent software vendors
- Example: Globus Toolkit
Functions of Middleware
Discovery and monitoring
- Discovers what resources or services are available
- Monitors their status
Resource allocation and management
- Matches application requirements to the available computing resources
- Creates and schedules remote jobs as required
- Ensures optimum load balancing and resource utilization
Security
- Shared resources may contain sensitive information
- Secures communications and authenticates user identities, e.g. using SSL/TLS
Message passing
- Used by compute-intensive parallel applications for inter-process communication
- Examples: MPI (Message Passing Interface) and PVM (Parallel Virtual Machine)
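A highly simplified sketch of the resource-allocation function above (node names, fields, and the least-loaded policy are illustrative assumptions; real middleware such as the Globus Toolkit does far more): match a job's requirements against advertised resources and pick the least-loaded node that satisfies them.

```python
# Hypothetical resource directory, as a discovery service might report it
resources = [
    {"node": "n1", "cpus": 8,  "mem_gb": 16, "load": 0.7},
    {"node": "n2", "cpus": 16, "mem_gb": 64, "load": 0.2},
    {"node": "n3", "cpus": 16, "mem_gb": 32, "load": 0.1},
]

def allocate(job, pool):
    """Return the least-loaded node meeting the job's CPU/memory needs."""
    candidates = [r for r in pool
                  if r["cpus"] >= job["cpus"] and r["mem_gb"] >= job["mem_gb"]]
    if not candidates:
        return None   # no match: the job must wait or be rejected
    return min(candidates, key=lambda r: r["load"])

job = {"cpus": 12, "mem_gb": 48}
chosen = allocate(job, resources)   # only n2 has both 12+ CPUs and 48+ GB
```

Discovery keeps the `resources` list current; allocation consults it per job; load balancing falls out of the selection policy.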
Ethernet Networks for Clusters and Grids
- Single-switch Clusters
- Large Clusters
- Ethernet Grid Networks
Single-switch Clusters
- Built using a single high-availability Gigabit Ethernet switch/router as the cluster interconnect
- The maximum size of a single-switch Ethernet cluster is determined by the non-blocking port capacity of the switch
- Current switch/routers can interconnect more than 600 GbE-connected servers
- All server ports are configured to be in the same subnet
Large Clusters
- Built using meshes of federated Ethernet switches
- The switches are arranged in non-blocking, constant bisectional bandwidth (CBB) topologies
- CBB topologies scale to support thousands of cluster nodes and provide high-bandwidth connectivity to the network
- The core of the cluster gives each node switch an equal share of the load to avoid blocking of ports
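A back-of-the-envelope sizing sketch, assuming a generic two-tier leaf/spine mesh (not any particular vendor's topology): the CBB property means each leaf switch dedicates as many ports to uplinks as to servers, so the fabric never oversubscribes.

```python
def cbb_two_tier(ports_per_switch):
    """Max non-blocking cluster size for a two-tier CBB leaf/spine mesh."""
    p = ports_per_switch
    down = p // 2          # leaf ports facing servers
    up = p - down          # leaf ports facing the core (equal share: CBB)
    spines = up            # one uplink from every leaf to every spine switch
    leaves = p             # each spine port terminates one leaf switch
    return {"servers": leaves * down, "leaves": leaves, "spines": spines}

fabric = cbb_two_tier(48)   # e.g. 48-port GbE switches
# 48 leaves x 24 servers each = 1152 non-blocking server ports,
# well into the "thousands of nodes" range the slide describes
```

Halving each switch's server-facing ports is the price of the constant bisectional bandwidth guarantee; in exchange, the cluster size grows roughly with the square of the switch port count.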
Ethernet Grid Networks (campus Grid network based on Ethernet switching)
- Ethernet allows a cluster to participate in a broader campus or enterprise Grid structure
- Desktop computers and workstations are connected to the campus Grid network using GbE
- Server farms outside the cluster are connected to site switches using GbE
- The campus LAN gives high priority to general Grid traffic and ensures critical Grid traffic does not incur added latency
Grid Tools
Tools used to prioritize critical Grid traffic:
Priority queuing
- The forwarding capacity of a congested port is immediately allocated to any high-priority traffic that enters the queue
Rate limiting and policing
- Limits the amount of lower-priority traffic that enters the network
Weighted Random Early Discard (WRED)
- Packet loss can be eliminated if buffers are never allowed to fill to capacity and overflow
- Overflows can be avoided by applying WRED to the lower-priority traffic
- WRED eliminates the possibility of high-priority packets arriving at a buffer that is already overflowing with lower-priority packets
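The WRED mechanism above can be sketched as a drop-probability curve (threshold and probability values here are illustrative assumptions, not from any particular switch): as the average queue depth grows, low-priority packets are discarded with increasing probability, so the buffer never fills and high-priority packets always find room.

```python
import random

def wred_drop_probability(avg_queue, min_th=20, max_th=80, max_p=0.5):
    """Linear WRED drop curve between the min and max thresholds."""
    if avg_queue < min_th:
        return 0.0           # queue shallow: never drop
    if avg_queue >= max_th:
        return 1.0           # queue near capacity: drop all low-priority
    # ramp linearly from 0 to max_p between the two thresholds
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def admit(avg_queue, high_priority):
    """High-priority packets bypass WRED; low-priority ones may be dropped."""
    if high_priority:
        return True
    return random.random() >= wred_drop_probability(avg_queue)
```

Dropping a few low-priority packets early signals TCP senders to back off before the buffer overflows, which is how WRED keeps critical Grid traffic out of tail-drop situations.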
Examples of Ethernet Clusters/Grids
TeraGrid
- A multi-institutional effort to build and deploy the world's most comprehensive computing infrastructure for open scientific research
NASA
- NASA uses the ESDCD "Grid of clusters" to help scientists increase their understanding of the Earth, the solar system, and the universe through computational modeling and processing of space-borne observations
Conclusion/Summary
- Ethernet continues to evolve as a highly cost-effective and flexible technology
- The majority of parallel and general Grid applications are well served by the performance characteristics of Ethernet as the cluster/Grid interconnect
- In the future, Ethernet end-to-end data transfer bandwidth, message latency, and CPU utilization will improve dramatically thanks to NIC enhancements and volume production leading to price declines
- These developments are expected to improve the overall performance of existing Ethernet clusters/Grids and extend cluster/Grid technology to a broader range of commercial enterprises