High Availability Planning for Tier-2 Services

Presentation transcript:

High Availability Planning for Tier-2 Services
Shawn McKee/AGLT2
USATLAS Tier-2/Tier-3 Workshop
Ann Arbor, May 27th, 2008

Motivation
Within the USATLAS facilities we have ever-increasing Tier-2 capability, soon to average >1000 processors per Tier-2.
Loss of critical services can totally disrupt running jobs and the overall system.
Wouldn’t it be nice to have a more “robust” system for enabling our critical services?
At AGLT2 we are looking at both VMware ESX and Xen Enterprise as possible solutions for a base system.
Robust service options could be layered on top…

Hardware Level
We need to build a resilient (redundant) hardware basis to support the higher-level services.
To do this we ordered a matched set of servers, an iSCSI backend storage system and a management node.
All systems have redundant power supplies, network connections, network switches, RAID 5 or RAID 1 disk configurations and UPS-based power.
High availability is the easier goal: loss of one physical server causes the other to instantiate any missing servers.
Fault tolerance (“resilience”) is more difficult…
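
The high-availability behaviour described on this slide can be illustrated with a small sketch. It is only a toy model, not how VMware ESX or Xen Enterprise implement failover: the `Host` class, the `fail_over` function, and the host and service names are all hypothetical, and the placement rule (restart each lost server on the least-loaded surviving host) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server and the virtual servers it currently runs (toy model)."""
    name: str
    alive: bool = True
    vms: set = field(default_factory=set)


def fail_over(hosts):
    """Re-instantiate the servers of every dead host on a surviving host.

    Assumption for this sketch: each lost server goes to the surviving
    host that currently runs the fewest servers (ties broken by name).
    Returns a mapping of server name -> new host name.
    """
    survivors = [h for h in hosts if h.alive]
    if not survivors:
        raise RuntimeError("no surviving host to fail over to")

    moved = {}
    for dead in (h for h in hosts if not h.alive):
        for vm in sorted(dead.vms):
            target = min(survivors, key=lambda h: (len(h.vms), h.name))
            target.vms.add(vm)
            moved[vm] = target.name
        dead.vms.clear()
    return moved


def demo_failover():
    # Hypothetical matched pair of servers hosting grid services.
    a = Host("esx1", vms={"gatekeeper", "gums"})
    b = Host("esx2", vms={"condor-head"})
    a.alive = False  # simulate loss of one physical server
    moved = fail_over([a, b])
    return moved, sorted(b.vms)
```

Note that this only restarts the missing servers; anything they held in memory is gone, which is why fault tolerance is the harder problem.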

Diagram of AGLT2 Test Platform

Concepts for Fault Tolerance
Make sure stateful services store information in “shared” storage areas.
Use techniques like DRBD (Distributed Replicated Block Device) to distribute critical information between independent locations.
Evaluate/modify server applications for their ability to fail over to another running system instance without losing their context.
GOAL: Run a resilient Globus Gatekeeper instance which will not lose its running jobs if the underlying physical node crashes or goes offline.
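
A minimal sketch of the first and third points above, under stated assumptions: a stateful service writes its job table to a file on shared storage after every change, so a second instance can pick up the context after a crash. The `JobStore` and `Gatekeeper` classes here are hypothetical stand-ins and have nothing to do with the real Globus Gatekeeper code; a temporary directory plays the role of the shared (for example iSCSI- or DRBD-backed) storage area.

```python
import json
import os
import tempfile


class JobStore:
    """Persists a job table as JSON on a (presumed shared) filesystem path."""

    def __init__(self, path):
        self.path = path

    def save(self, jobs):
        # Write to a temporary file in the same directory, then rename it
        # over the old file, so a crash mid-write never leaves a
        # half-written job table behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def load(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}


class Gatekeeper:
    """Toy stateful service: recovers its jobs from the store at startup."""

    def __init__(self, store):
        self.store = store
        self.jobs = store.load()

    def submit(self, job_id, state="running"):
        self.jobs[job_id] = state
        self.store.save(self.jobs)  # persist before acknowledging the job


def demo_recovery():
    with tempfile.TemporaryDirectory() as shared:
        path = os.path.join(shared, "jobs.json")
        primary = Gatekeeper(JobStore(path))
        primary.submit("job-1")
        primary.submit("job-2")
        del primary  # simulate the physical node going offline
        standby = Gatekeeper(JobStore(path))  # fail-over instance
        return standby.jobs
```

The design choice worth noting is that state is written before a job is acknowledged; a service that keeps its context only in memory cannot be made fault tolerant by the virtualization layer alone.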

Summary
We need to test and evaluate both VMware and Xen to determine which can provide us the most effective environment.
The FNAL folks have started evaluating resilient grid services (see: http://www2.twgrid.org/event/isgc2008/Presentation%20Meterial/Operation%20&%20Management/Operation%20&%20Management_Eileen%20Berman.pdf ).
We need to test these configurations on our systems.
We want a very robust basis to build our critical grid services on: GUMS, Globus Gatekeeper, Condor headnode, Kerberos, NIS, AFS, dCache, etc.