High Performance Computing Cluster: OSCAR
Team Members: Jin Wei, Pengfei Xuan
CPSC 424/624 Project (Spring 2011)
Instructor: Dr. Grossman



Outline
  1. Background
  2. Installation
  3. Management
  4. Security
  5. Administration

DIY Supercomputer
  HPC = Computer + Network + OS + Management Software

Background: Introduction
  Clemson Palmetto: 12,392 cores, TeraFlops
  TOP1: Tianhe-1A (China): 186,368 cores, 4,701 TeraFlops

HPC Network Topology
  Three sets of networks: Management, Parallel Computing, Storage
  Centralized Storage

Installation
  Easy management
  Batch OS install
  Batch software install

Management
  Cluster Management
    Partitions a cluster into multiple logical computers
    Maps logical computers (clusters) onto servers (nodes)
    Multiple independent OS configurations
    Manages and monitors logical computer (cluster) status
    Cluster status to the management system
    Job scheduling and management
    Manages and monitors operating system instances (nodes)
    Node status to the management system
  System Management
    Management of overall system configuration
    Redundant management servers with automatic failover
    Designed to anticipate and tolerate failures
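The partitioning and status reporting described on this slide can be illustrated with a minimal Python sketch. It is not OSCAR code: the `LogicalCluster` class, the `partition` and `cluster_status` functions, and the node and OS names are all hypothetical, chosen only to show how logical computers might be mapped onto physical nodes and summarized for a management system.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalCluster:
    """Hypothetical logical computer: a name, an OS configuration, and its nodes."""
    name: str
    os_config: str
    nodes: list = field(default_factory=list)


def partition(nodes, requests):
    """Assign physical nodes to logical clusters in request order.

    requests is a list of (name, os_config, node_count) tuples.
    Raises ValueError if the requests need more nodes than exist.
    """
    needed = sum(count for _, _, count in requests)
    if needed > len(nodes):
        raise ValueError(f"requested {needed} nodes, only {len(nodes)} available")
    clusters, start = [], 0
    for name, os_config, count in requests:
        clusters.append(LogicalCluster(name, os_config, nodes[start:start + count]))
        start += count
    return clusters


def cluster_status(cluster, node_up):
    """Summarize a logical cluster from per-node status (node name -> bool)."""
    up = [n for n in cluster.nodes if node_up.get(n, False)]
    return {"cluster": cluster.name, "up": len(up), "total": len(cluster.nodes)}


# Illustrative data only: eight nodes split into two logical clusters.
nodes = [f"node{i:02d}" for i in range(1, 9)]
clusters = partition(nodes, [("batch", "centos", 5), ("test", "debian", 3)])
node_up = {n: True for n in nodes}
node_up["node07"] = False  # simulate one failed node
for c in clusters:
    print(cluster_status(c, node_up))
```

Each logical cluster carries its own OS configuration, which mirrors the "multiple independent OS configurations" point; node status rolls up into cluster status in the same direction the slide describes.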

Management
  Server Management
    Automatic discovery of server hardware
    Remote server control (power on/off, cycle)
    Scalable, fast diskless or data-less booting for large node count systems
    Server redundancy and failover
    Provides server status to the management system
  Network Management
    Automatic discovery of interconnect hardware
    Multiple interconnect fabric topologies
    Redundant paths and networks
    Load balancing and failover
    Network status to the management system
  Storage Management
    Scalable root file systems for diskless or data-less nodes
    Multiple global storage configurations
    High bandwidth to secondary storage for data and checkpointing
    Provides server status to the management system
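Server redundancy with failover can be sketched in a few lines of Python. This is one simple policy (priority order, first healthy server wins), assumed here for illustration; the slides do not say which policy the management software actually uses, and the server names and the `pick_active` function are hypothetical.

```python
def pick_active(servers, is_healthy):
    """Return the first healthy server in priority order, or None if all are down."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Illustrative health table; a real system would probe the servers instead.
servers = ["mgmt1", "mgmt2"]
health = {"mgmt1": True, "mgmt2": True}

active = pick_active(servers, health.get)        # primary is healthy
health["mgmt1"] = False                          # simulate a primary failure
active_after = pick_active(servers, health.get)  # standby takes over
print(active, "->", active_after)
```

Passing the health check in as a function keeps the selection logic independent of how status is actually collected.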

Security Control Model

Administration (C3 Tool Suite: 'Cluster Command & Control')
  cexec: executes any standard command on all cluster nodes, e.g. cexec mkdir /tmp
  ckill: terminates a user-specified process on all cluster nodes, e.g. ckill my_program_abc
  cget: retrieves files or directories from all cluster nodes
  cpush: distributes files or directories to all cluster nodes
  cpushimage: updates the system image on all cluster nodes using an image captured by the SystemImager tool
  crm: removes files or directories from all cluster nodes
  cshutdown: shuts down or restarts all cluster nodes
  cnum: returns a node range number based on node name
  cname: returns node names based on node ranges
  clist: returns all clusters and their type in a configuration file
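The idea behind cexec, running one command on every node at once, can be approximated with a short Python sketch. This is not how C3 is implemented; it assumes passwordless ssh to each node, and the function names (`ssh_argv`, `run_local`, `run_everywhere`, `dry_run`) and node names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import shlex
import subprocess


def ssh_argv(node, command):
    """Build the argv for running a shell command on one node over ssh."""
    return ["ssh", node, command]


def run_local(argv):
    """Default runner: execute argv and return (exit code, stdout)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout


def run_everywhere(nodes, command, runner=run_local):
    """Run command on every node in parallel; return {node: runner result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = pool.map(lambda n: runner(ssh_argv(n, command)), nodes)
        return dict(zip(nodes, results))


def dry_run(argv):
    """Runner that only reports the command line it would have executed."""
    return 0, shlex.join(argv)


# Dry run of the slide's example command on two assumed node names.
result = run_everywhere(["node1", "node2"], "mkdir /tmp", runner=dry_run)
for node, (code, text) in result.items():
    print(node, code, text)
```

The runner is injectable so the fan-out logic can be checked without touching a real cluster; dropping the `runner=dry_run` argument would execute the command over ssh.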

Other Administration Tools
  System Installation Suite (SIS): Installs the client nodes. SIS also provides the database from which OSCAR obtains its cluster configuration information. SIS is an image-based install tool: an image is essentially a copy of all the files that get installed on a client. The image is stored on the server and can be accessed for customization or updates. You can even chroot into the image and perform builds.
  Switcher Environment Manager: Provides a simple mechanism that lets users manipulate their environment.
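The image-based idea, a server-side copy of every file a client should have, can be illustrated with a small Python sketch that compares an image against a client's files and plans what to change. It is a conceptual model only, not SIS's actual mechanism: the `plan_update` function, the in-memory dictionaries standing in for file trees, and the sample paths are all assumptions.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to detect files that differ."""
    return hashlib.sha256(data).hexdigest()


def plan_update(image, client):
    """Compare an image with a client's files (both map path -> bytes).

    Returns (to_copy, to_delete): paths the client must fetch from the image,
    and paths on the client that the image does not contain.
    """
    to_copy = sorted(p for p, data in image.items()
                     if p not in client or digest(client[p]) != digest(data))
    to_delete = sorted(p for p in client if p not in image)
    return to_copy, to_delete


# Illustrative file trees: the image on the server and one client's disk.
image = {"/etc/hosts": b"10.0.0.1 head\n", "/etc/motd": b"welcome\n"}
client = {"/etc/hosts": b"10.0.0.1 head\n", "/etc/motd": b"old\n", "/tmp/stale": b"x"}

to_copy, to_delete = plan_update(image, client)
print("copy:", to_copy)
print("delete:", to_delete)
```

Because the image is the single source of truth, updating it once on the server and re-running the comparison per client is enough to bring every node back in line.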


Questions?