Cluster / Grid Status Update

University of Sussex
Emyr James

Contents
History of the Cluster
Hardware / Architecture
Software Inventory
Usage
Problems So Far...
Future Plans

History
Money from a grant for ATLAS work was available to pay for hardware and support.
Design: Jeremy Maris, Alces, Dell.
Installation: the above plus ClusterVision.
Hardware and basic software (OS, SGE, Lustre, ATLAS) were ready to go when I joined.

Architecture: Feynman
The cluster is split into two parts: Feynman and Apollo.
Lustre filesystem shared by both; InfiniBand interconnect.
Head nodes: PowerEdge R610, dual 6-core Xeon 2.67 GHz, 48 GB RAM (4 GB/core).
Feynman compute nodes: 8x Dell PowerEdge R410, same specification as the head nodes.
Feynman total: 120 cores, ~420 GB RAM.

Architecture: Apollo
Apollo: 16 nodes.
Nodes 1-10 and 13-16: same specification as the Feynman nodes.
Nodes 11 and 12: PowerEdge R815, 4x 12-core AMD Opteron @ 2.2 GHz, 256 GB RAM.
Total: 288 cores, ~1.2 TB RAM.

Architecture: Lustre
2 MDSs with automatic failover.
3 OSSs, each with 3x 10 TB RAID6 OSTs.
Total of 81 TB available after formatting.
Currently 48 TB used, but this needs cleaning up.
Great performance; in use with no problems at all for 9 months.
Also an NFS server for home directories.
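A quick way to keep an eye on the 48 TB of 81 TB currently in use is to summarise per-OST usage from lfs df. The following is a minimal sketch, not a script from the cluster itself; it assumes the standard lfs df column layout and a hypothetical mount point of /mnt/lustre.

```python
#!/usr/bin/env python
# Minimal sketch: per-OST usage summary parsed from `lfs df`.
# The mount point below is an assumption, not the cluster's actual path.
import subprocess

def ost_usage(mount="/mnt/lustre"):
    out = subprocess.check_output(["lfs", "df", mount]).decode()
    for line in out.splitlines():
        fields = line.split()
        # OST lines look like: lustre-OST0000_UUID  <1K-blocks>  <used>  <avail> ...
        if fields and fields[0].endswith("_UUID") and "OST" in fields[0]:
            total_tb = int(fields[1]) / 1024.0 ** 3
            used_tb = int(fields[2]) / 1024.0 ** 3
            print("%-22s %6.1f TB total, %6.1f TB used" %
                  (fields[0], total_tb, used_tb))

if __name__ == "__main__":
    ost_usage()
```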

Architecture: Data Centre

Software
Sun Grid Engine scheduler.
Ganglia for monitoring.
Bright Cluster Manager.
Python scripts for viewing queue and accounting information (see the sketch below).
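As an illustration of the queue-viewing scripts mentioned above, the sketch below totals running and pending slots per user from Sun Grid Engine's qstat XML output. It is not the script actually in use; the command line and report format are assumptions.

```python
#!/usr/bin/env python
# Sketch of a queue overview built on SGE's `qstat -u '*' -xml` output.
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

def queue_summary():
    xml_out = subprocess.check_output(["qstat", "-u", "*", "-xml"])
    root = ET.fromstring(xml_out)
    slots_by_user_state = Counter()
    for job in root.iter("job_list"):
        user = job.findtext("JB_owner", "unknown")
        state = job.get("state", "?")              # "running" or "pending"
        slots = int(job.findtext("slots", "1"))
        slots_by_user_state[(user, state)] += slots
    return slots_by_user_state

if __name__ == "__main__":
    for (user, state), slots in sorted(queue_summary().items()):
        print("%-12s %-8s %4d slots" % (user, state, slots))
```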

Usage
Mainly ATLAS at the moment.
Typical usage: download data to Lustre using dq2-get on the head node, then run ROOT analysis jobs on the compute nodes (a submission sketch follows below).
Analysis done here has contributed to two ATLAS papers so far, both SUSY related.
The SNO+ group is starting to use it.
CryoEDM is developing MC code.
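The submission half of that workflow might look like the sketch below: the data are assumed to have already been fetched to Lustre with dq2-get on the head node, and the dataset path, SGE directives and ROOT macro name are illustrative assumptions rather than the group's actual scripts.

```python
#!/usr/bin/env python
# Hedged sketch: submit one ROOT analysis job per input file found on Lustre.
# LUSTRE_DATA and runAnalysis.C are hypothetical names used for illustration.
import os
import subprocess
import tempfile

LUSTRE_DATA = "/mnt/lustre/atlas/data"   # assumed dq2-get destination

def submit_root_job(input_file, macro="runAnalysis.C"):
    """Write a one-off SGE job script and hand it to qsub."""
    script = """#!/bin/bash
#$ -cwd
#$ -j y
#$ -o {base}.log
root -l -b -q '{macro}("{inp}")'
""".format(base=os.path.basename(input_file), macro=macro, inp=input_file)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    subprocess.check_call(["qsub", path])

if __name__ == "__main__":
    for name in sorted(os.listdir(LUSTRE_DATA)):
        if name.endswith(".root"):
            submit_root_job(os.path.join(LUSTRE_DATA, name))
```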

Lustre Performance

Problems
Job IO going to the NFS server: addressed with user education and boilerplate scripts (see the sketch below).
LDAP connection issues: resolved by using nscd.
DNS registration issue.
Lustre RAID server battery cache.
Thunder!
No problems due to the Lustre filesystem itself.
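The boilerplate given to users is essentially a stage-in / stage-out wrapper so that job IO lands on Lustre scratch rather than the NFS-served home area. The sketch below shows the general idea only; the scratch location, environment variables and copy-back pattern are assumptions, not the cluster's actual boilerplate.

```python
#!/usr/bin/env python
# Illustrative stage-in/stage-out wrapper: copy inputs from the NFS home
# area to Lustre scratch, run the job there, then copy results back.
import os
import shutil
import subprocess

NFS_HOME = os.path.expanduser("~")
SCRATCH = "/mnt/lustre/scratch/%s" % os.environ.get("USER", "nobody")  # assumed path

def run_with_scratch(cmd, inputs, outputs):
    """Stage inputs to Lustre, run the command there, copy outputs home."""
    workdir = os.path.join(SCRATCH, "job_%s" % os.environ.get("JOB_ID", "local"))
    os.makedirs(workdir, exist_ok=True)
    for name in inputs:
        shutil.copy(os.path.join(NFS_HOME, name), workdir)
    subprocess.check_call(cmd, cwd=workdir)
    for name in outputs:
        shutil.copy(os.path.join(workdir, name), NFS_HOME)

if __name__ == "__main__":
    run_with_scratch(["root", "-l", "-b", "-q", "runAnalysis.C"],
                     inputs=["runAnalysis.C"], outputs=["histos.root"])
```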

Future
'Gridification' is ongoing so that the site becomes part of SouthGrid / GridPP.
Investigating procurement options for a 60 TB addition to the Lustre filesystem.
Adding more nodes: SNO+ grant application, pooling money from ITS and other research groups.
Reorganise into a single cluster.