Building and managing production bioclusters. Chris Dagdigian, BioSilico Vol. 2, No. 5, September 2004. Presented by Ankur Dhanik.

Computer cluster
A computer cluster consists of connected computers, servers and other resources that act as a single system. Such a system can perform tasks previously delegated to machines costing hundreds of thousands to millions of dollars, while using resources more efficiently. A general benefit is a flexible research-computing infrastructure that can be tuned and adapted to meet changing research and user demands. Areas of scientific inquiry previously dismissed as impossible become feasible.

Biology and cluster configuration
Cluster configuration is influenced by the intended application mix. In bioinformatics, the problems most commonly seen are serial computing problems, also referred to as "embarrassingly parallel". These problems can be broken down into a series of independent steps, each of which can be completed in any order without affecting the result. Examples include large-scale bioinformatics sequence analysis and repeated experimentation.

Biology and cluster configuration
Sequence analysis
– compare one sequence against a database of many sequences.
– performance can be vastly increased by simply dividing the query sequences up and running multiple searches at the same time on separate machines (see the sketch below).
Experimentation
– run slight variations of the same program thousands or millions of times in a row.
– this workload can be handled by loosely coupled compute clusters, also known as compute farms.
High-performance clusters such as Beowulf systems, which are designed for tightly coupled problems, are not required.
The large number of embarrassingly parallel problems is the primary driver for the widespread adoption of clusters.
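To make the decomposition concrete, here is a minimal Python sketch of the "divide the query sequences" idea: a multi-sequence FASTA file is split into chunks that can each be searched against the database independently, on any node and in any order. The file names and chunk count are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: round-robin the sequences of a multi-FASTA file into
# N smaller files so each chunk can be searched independently on any node.
import os

def split_fasta(path, n_chunks, out_dir="chunks"):
    """Distribute the sequences in `path` across n_chunks smaller FASTA files."""
    os.makedirs(out_dir, exist_ok=True)
    outs = [open(os.path.join(out_dir, f"query.{i}.fa"), "w") for i in range(n_chunks)]
    seq_index = -1
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):            # header line starts a new sequence
                seq_index += 1
            if seq_index >= 0:                  # skip anything before the first header
                outs[seq_index % n_chunks].write(line)
    for out in outs:
        out.close()
    return [out.name for out in outs]

if __name__ == "__main__":
    # Each chunk becomes one independent cluster job (e.g. a search against a
    # shared database); the per-chunk results are concatenated afterwards.
    chunks = split_fasta("queries.fa", n_chunks=8)
    print("\n".join(chunks))
```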

A typical biocluster: users submit work through a software-based distributed resource management (DRM) layer to a group of small, inexpensive servers (nodes 1 to N) connected by an Ethernet network.

Portal architecture: a portal machine (also known as the "master", "head" or "login" node) sits between the public local area network and the private cluster network, which connects the cluster compute elements and the file server.

Design considerations
Reliability, availability and security
– compute nodes should be anonymous and interchangeable to support non-disruptive troubleshooting, maintenance and upgrade activities.
– critical failure points such as file servers and portal machines need to be duplicated or made resilient to failure.
Flexibility and scalability
– multiple competing users, workflows and projects should be supported simultaneously.
Manageability
– administrative overhead should be minimized; this requires methodologies for automating or reducing administration tasks.
– the software DRM layer needs to ensure that business and scientific priorities can dynamically alter the allocation of computing resources.

Pre-purchase decisions
DRM
– simplifies interaction with the cluster at both the user level and the administration level.
– an important decision.
– the most commonly deployed DRM software suites in the life sciences are Sun Grid Engine (SGE) and Platform LSF (a minimal job-submission sketch follows below).
– for flexible yet sophisticated resource sharing and job scheduling, especially among many different groups or projects, LSF still has the edge in functionality and ease of configuration.
– installing a sophisticated Grid Engine configuration can be an adventure.
– experience suggests that LSF requires the least amount of resources to install, configure and maintain over time.
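As an illustration of how work is handed to a DRM, the hedged Python sketch below submits the query chunks as a Sun Grid Engine array job via qsub. It assumes qsub is on the PATH and that run_search.sh is a hypothetical wrapper script that picks its input chunk from the SGE_TASK_ID environment variable SGE sets for each array task; an equivalent LSF submission would use bsub with a job array.

```python
# Hedged sketch: submit the chunked searches as an SGE array job from Python.
import subprocess

def submit_array_job(n_tasks, script="run_search.sh", job_name="seqsearch"):
    cmd = [
        "qsub",
        "-N", job_name,        # job name shown in qstat
        "-cwd",                # run tasks in the current working directory
        "-t", f"1-{n_tasks}",  # array job: one task per query chunk
        script,                # hypothetical wrapper reading $SGE_TASK_ID
    ]
    # qsub reports the assigned job id on stdout
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_array_job(n_tasks=8))
```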

Choosing hardware
Science and scientific application demands should drive the hardware configuration. In the absence of specific application benchmarks, dual-processor Intel Xeon based servers are a sensible default compute node configuration.
Networking technology
– switched Gigabit Ethernet is affordable and should be the default interconnect for cluster systems.
– alternative cluster interconnects such as InfiniBand and Myrinet offer higher performance, but at a large cost, and few existing life-science application codes can benefit from these technologies.

Choosing hardware
Storage
– the speed of cluster storage is usually a performance bottleneck.
– network storage: storage devices accessed over a computer network rather than connected directly to the computer, e.g. NAS (network-attached storage), SAN (storage area network) or hybrid architectures.
– large internal disk drives within each compute node can be used to cache data needed for data-intensive cluster jobs (a caching sketch follows below).
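A minimal sketch of the local-disk caching idea follows: a shared database file is copied from network storage to a node-local scratch directory once, and subsequent jobs on that node read the local copy instead of hitting the NAS/SAN. All paths are hypothetical.

```python
# Minimal sketch: cache a shared database file on node-local scratch disk.
import os
import shutil

def cached_copy(shared_path, scratch_dir="/scratch/dbcache"):
    """Return a node-local path to the file, copying it from shared storage
    only if the local copy is missing or older than the shared one."""
    os.makedirs(scratch_dir, exist_ok=True)
    local_path = os.path.join(scratch_dir, os.path.basename(shared_path))
    if (not os.path.exists(local_path)
            or os.path.getmtime(local_path) < os.path.getmtime(shared_path)):
        shutil.copy2(shared_path, local_path)   # copy2 preserves timestamps
    return local_path

if __name__ == "__main__":
    db = cached_copy("/nas/databases/nr.fasta")
    print(f"search jobs on this node should read {db}")
```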

Deploying, monitoring and management
Maintenance methodology
– if a cluster node enters a faulted state, the power to the host is cycled and the node is wiped and reinstalled via the network.
– if the node fails to successfully rejoin the cluster, it is disabled and considered failed.
– it is replaced later at the convenience of the operator (the decision flow is sketched below).
Prepackaged methods exist for handling remote, unattended operating-system installations and rebuilds, e.g. SystemImager for Linux clusters and NetBoot for Apple hardware running Mac OS X.
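The maintenance methodology above can be expressed as a short decision loop. In the sketch below, power_cycle, network_reinstall, rejoined_cluster and disable_node are hypothetical stand-ins for site-specific tooling (e.g. IPMI power control, SystemImager/PXE reinstalls and DRM host-administration commands); only the control flow follows the slides.

```python
# Sketch of the faulted-node maintenance flow; the helpers are hypothetical
# stand-ins for site-specific tooling.
import time

def power_cycle(node):
    print(f"power-cycling {node}")                 # e.g. via IPMI or a managed PDU

def network_reinstall(node):
    print(f"reinstalling {node} over the network")  # e.g. SystemImager/NetBoot

def rejoined_cluster(node):
    return False                                    # poll the DRM for host state here

def disable_node(node):
    print(f"{node} disabled and marked failed")

def handle_faulted_node(node, rejoin_timeout_s=1800, poll_s=60):
    """Wipe and reinstall a faulted node; mark it failed if it never rejoins."""
    power_cycle(node)
    network_reinstall(node)
    waited = 0
    while waited < rejoin_timeout_s:
        if rejoined_cluster(node):
            return True
        time.sleep(poll_s)
        waited += poll_s
    disable_node(node)   # replaced later at the operator's convenience
    return False
```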

Conclusions
Building high-quality clusters for use in computational biology is a non-trivial task. It is important to
– understand user and application requirements.
– actively participate in the DRM selection process.
– avoid fixation on raw price/performance figures that might not reflect the true costs of deploying, managing and supporting distributed systems.
– beware of "total solutions".

Questions & discussion
How difficult or easy is it to detect failure modes (hardware, code, process)?
How difficult is it to run a cluster with mixed-architecture nodes?