Apollo HPC Cluster
Jeremy Maris, Research Computing, IT Services, University of Sussex

Apollo Cluster – People
- IT Services: Jeremy Maris, Alhamdu Bello, Bernie Broughton
- Maths and Physical Sciences (EPP): vacant (was Matt Raso-Barnett), Albert Asawaroengchai


Apollo Cluster – Aims
- Shared infrastructure and support from IT Services
- Fairshare use of central resources
- Extension of the facility by departments:
  - Storage (adding Lustre OSTs, SAN storage)
  - CPU (power paid by the department)
  - Software licences
- Departments are guaranteed 90% exclusive use of their nodes; the remaining 10% is shared with others, plus backfill of idle time (a Grid Engine quota sketch follows below)
- Enhancement by IT Services as budgets allow
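One way the 90%/10% split could be expressed is with a Grid Engine resource quota set. This is a minimal sketch, not the cluster's actual configuration: the host group and user list names are hypothetical, and 120 slots is simply ~10% of the 1216 Physics cores.

    # Cap non-owner ("guest") usage of a department's nodes at ~10% of its slots.
    # Created/edited with:  qconf -arqs dept_guest_limit   (opens an editor)
    #
    #   {
    #      name         dept_guest_limit
    #      description  "Non-Physics users limited to ~10% of Physics slots"
    #      enabled      TRUE
    #      limit        users !@physics_users hosts @physics_nodes to slots=120
    #   }
    #
    qconf -srqs dept_guest_limit    # display the quota set once defined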

Apollo Cluster – Data Centre
- 24 x 45U water-cooled racks
- 96 A per rack, 18 kW cooling per rack
- Current capacity ~350 kW; upgrade planned with another 5 racks + PDU
- UPS (5 minutes), 1 MW generator
- 2 x 10 Gb JANET links

Apollo Cluster – Hardware
Current total of ~3250 cores:
- Physics: 1216 cores (16 x 64-core, 8 x 12-core, 4 x 16-core + 2 GPU nodes)
- GridPP: 304 cores (4 x 64-core, 3 x 16-core)
- Engineering: 400 cores (16 x 16-core, 2 x 64-core, 4 x K40 GPUs)
- Informatics: 256 cores (4 x 64-core)
- BSMS: 128 cores (2 x 64-core)
- Chemistry: 176 cores (16 x 8-core + 3 x 16-core)
- Life Sciences: 128 cores (1 x 64-core + 4 x 16-core)
- Economics: 16 cores (1 x 16-core)
- ITS: 456 cores (mainly Intel 12-core nodes, 48 GB RAM/node)
Storage and software:
- 40 TB NFS home filesystems
- 500 TB Lustre filesystem (scratch), QDR IB, IPoIB
- Bright Cluster Manager, Univa Grid Engine
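For orientation, a few standard commands a user might run on a login node to see the node mix and storage described above; the Lustre mount point /mnt/lustre is an assumption.

    qhost                     # Grid Engine view of all execution hosts: cores, memory, load
    lfs df -h /mnt/lustre     # capacity and usage of the Lustre scratch filesystem (mount point assumed)
    df -h $HOME               # usage of the NFS home filesystem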

Apollo Cluster – Lustre
- Patched Lustre 2.5.3 on CentOS 6
- 8 OSS, 21 OSTs: R510 and R730 servers + MD1200 and MD1400 enclosures
- Mix of 2, 3 and 6 TB disks
- Subscription to Lustre community edition support: $2000 per OSS per annum, taken on a 2-OSS test system
  - Privileged access to repositories
  - EDU support area – can see all EDU tickets
  - Can only raise support tickets on the test cluster
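A minimal sketch of how the OSS/OST layout can be inspected; the client mount point /mnt/lustre is an assumption, and these are generic Lustre tools rather than site-specific procedures.

    lctl dl                   # on an OSS: list the local Lustre devices (OST instances)
    lctl list_nids            # NIDs (network identifiers) this server answers on
    lfs osts /mnt/lustre      # on a client: enumerate the 21 OSTs and their status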

Apollo Cluster – Storage
- Dothill SAN + FalconStor virtualisation
- Nexenta ZFS research storage (140 TB)
- NFS home filesystems: R510 and R730
- Astronomy N-body simulation data from PRACE: R730XD, MD1200, MD3640; growing from 100 TB to >400 TB of NFS storage
- 12-disk RAID6 volumes + LVM (see the sketch below)
- Backup: Legato Networker + LTO5
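One plausible way a hardware RAID6 LUN ends up as LVM-managed NFS storage, as referenced above; the device names, volume group name, mount point and export network are illustrative assumptions, not the cluster's real configuration.

    # Hypothetical example: put LVM on top of two RAID6 LUNs and export over NFS.
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_astro /dev/sdb /dev/sdc
    lvcreate -l 100%FREE -n lv_astro vg_astro
    mkfs.xfs /dev/vg_astro/lv_astro
    mkdir -p /export/astro
    mount /dev/vg_astro/lv_astro /export/astro
    echo '/export/astro 192.168.0.0/16(rw,no_root_squash)' >> /etc/exports   # placeholder network
    exportfs -ra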

Apollo Cluster – Provisioning
- Bright Cluster Manager 6.1 base
- Image-based system, with Puppet tailoring for the Grid
- Lustre, NFS and Grid service nodes
- Puppet-managed VMs for Grid service nodes
- SRM has 1 Gb Ethernet, not InfiniBand – need real hardware…
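A rough sketch of what image-based provisioning plus Puppet tailoring looks like in practice; these are generic Bright Cluster Manager and Puppet commands, not the site's actual workflow or manifest layout.

    cmsh -c "softwareimage; list"   # review the node images Bright provisions from
    # After a node is (re)imaged, apply its service-specific role (Grid, Lustre, NFS):
    puppet agent --test --noop      # dry run: show what Puppet would change
    puppet agent --test             # apply the configuration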

Accounting – 200 active users
[Usage charts: October 2016 and March 2016]

Apollo Cluster – Summer Upgrade
- HPE procurement: 55 x Xeon E5-2640 v3 nodes, 880 cores
- Omni-Path, half-bandwidth tree
- Lustre router between TrueScale and Omni-Path (LNet router sketch below)
- CentOS 7.2 for most nodes, SL6 for Grid
- Bright Cluster Manager 7.1
- 4-node Hadoop instance
- Univa Grid Engine 8.4.0
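The Lustre router works at the LNet layer: a node with an interface on each fabric forwards traffic between two o2ib networks. The sketch below uses generic LNet module options; the interface names and router NID are assumptions.

    # /etc/modprobe.d/lustre.conf on the router node (one HCA on TrueScale, one HFI on Omni-Path):
    options lnet networks="o2ib0(ib0),o2ib1(ib1)" forwarding=enabled

    # On an Omni-Path client: reach the TrueScale-side Lustre servers via the router
    # (192.168.1.1 is a placeholder for the router's o2ib1 NID):
    options lnet networks="o2ib1(ib0)" routes="o2ib0 192.168.1.1@o2ib1"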

Apollo Cluster – Challenges
- ~65% utilisation of the cluster – fragmented by dedicated queues
- Heterogeneous hardware: QDR IB and Omni-Path, AMD and Intel nodes
- Use job classes to select appropriate nodes (sketch below)
- Cgroups to limit/manage resources
- More use of fairshare
- Backfill with ATLAS production jobs (~1000 cores)
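As an illustration of the job-class and cgroup points above (not the cluster's actual configuration): Univa Grid Engine job classes let users request a node type without knowing queues or host groups, and cgroup enforcement is switched on in the execution-host configuration. The class name and cgroup parameter values are assumptions.

    qconf -ajc intel.omnipath        # define a job class whose resource requests pin jobs to the Intel/Omni-Path nodes (name hypothetical)
    qsub -jc intel.omnipath job.sh   # users pick a node type by class rather than by queue

    # Enable cgroup-based limits in the host/global configuration (example values):
    qconf -mconf                     # set e.g.: cgroups_params cgroup_path=/sys/fs/cgroup cpuset=true freezer=true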