Fabric Management at CERN, BT, July 16th 2002 (CERN.ch)
The Problem
–~6,000 PCs and another ~1,000 boxes
–Only 1/3 of the total capacity is at CERN… Grid Computing.
–c.f. ~1,500 PCs and ~150 disk servers at CERN today.
The Past
Automated management tools developed to handle multi-architecture clusters with a few tens of nodes.
Good points
–Much automation
–Solid set of tools
–Much accumulated experience
Bad points
–Can't cope with the number of systems we have today
–Configuration information stored in multiple locations
–Monitoring at system level, but users see service failures
Where we are going
Use Linux standards
–RPM, LSB, …
Single location (/interface) for configuration information
–Which nodes are in which clusters
–Node roles, states, required software
–Personnel roles (who is allowed to perform what)
Better installation tools
–Guaranteed reproducibility across nodes and over time
–Making use of configuration information
»Multiple distinct system images
Service-level monitoring
–Making use of configuration information
State management for
–System reconfiguration requests
»Both system upgrades and reconfigurations to reflect workload changes
–Automatic recovery procedures (and non-automatic if necessary…)
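The "single location for configuration information" idea above can be sketched as a small in-memory store that answers both cluster-membership queries and state-change requests. This is a minimal illustration only: the names (`NodeConfig`, `CONFIG`, the node and package names) are assumptions for the example, not CERN's actual schema or tooling.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NodeConfig:
    """Per-node record: cluster membership, roles, state, required software."""
    cluster: str
    roles: List[str]
    state: str
    packages: List[str]  # required RPMs

# Hypothetical central configuration store (node name -> record).
CONFIG = {
    "lxbatch001": NodeConfig(cluster="lxbatch", roles=["batch-worker"],
                             state="production",
                             packages=["kernel", "lsf-client"]),
    "lxdisk007": NodeConfig(cluster="lxdisk", roles=["disk-server"],
                            state="maintenance",
                            packages=["kernel", "castor-server"]),
}

def nodes_in_cluster(cluster: str) -> List[str]:
    """Answer 'which nodes are in which clusters' from the single store."""
    return sorted(n for n, c in CONFIG.items() if c.cluster == cluster)

def request_state(node: str, new_state: str) -> str:
    """Record a reconfiguration request; a real system would queue,
    validate, and drive recovery/upgrade actions from this."""
    CONFIG[node].state = new_state
    return CONFIG[node].state
```

With every consumer (installation, monitoring, state management) reading the same store, the node lists, required packages, and states cannot drift apart across multiple locations, which is the failure mode listed under "Bad points" above.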