JASMIN/CEMS and EMERALD: Scientific Computing Developments at STFC
Peter Oliver, Martin Bly, Scientific Computing Department, STFC
HEPiX Fall 2012, Beijing, 19 October 2012

Outline
– STFC Compute and Data
– National and International Services
– Summary

STFC sites
– Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus
– Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
– Polaris House, Swindon, Wiltshire
– Chilbolton Observatory, Stockbridge, Hampshire
– UK Astronomy Technology Centre, Edinburgh
– Isaac Newton Group of Telescopes, La Palma
– Joint Astronomy Centre, Hawaii

What we do
The nuts and bolts that make it work: enabling scientists, engineers and researchers to develop world-class science, innovation and skills.

SCARF
Providing resources for STFC facilities, staff and their collaborators:
– ~2,700 cores, InfiniBand interconnect, Panasas file system, managed as one entity
– ~50 peer-reviewed publications per year
– Additional capacity added each year for general use; facilities such as the CLF add capacity using their own funds
– National Grid Service partner
– Local access using MyProxy-SSO: users log in with their federal id and password
– UK e-Science Certificate access
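To illustrate the MyProxy-based single sign-on, a minimal Python sketch of fetching a short-lived proxy credential with the standard myproxy-logon client. The server hostname, lifetime and output path are assumptions for illustration; the slides do not give the actual SCARF/NGS endpoint.

    import subprocess

    # Hypothetical MyProxy endpoint; the real SCARF/NGS server name is not given in the slides.
    MYPROXY_SERVER = "myproxy.example.ac.uk"

    def fetch_proxy(username, lifetime_hours=12, out_path="/tmp/x509up_scarf"):
        """Retrieve a short-lived X.509 proxy via myproxy-logon (prompts for the federal password)."""
        cmd = [
            "myproxy-logon",
            "-s", MYPROXY_SERVER,        # MyProxy server to contact
            "-l", username,              # federal id used for single sign-on
            "-t", str(lifetime_hours),   # requested proxy lifetime in hours
            "-o", out_path,              # where to write the proxy credential
        ]
        subprocess.run(cmd, check=True)
        return out_path

    if __name__ == "__main__":
        fetch_proxy("fedid1234")         # hypothetical federal id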

NSCCS (National Service for Computational Chemistry Software)
Providing national and international compute, training and support:
– EPSRC mid-range service: SGI Altix UV SMP system, 512 CPUs, 2 TB shared memory
– A large-memory SMP was chosen over a traditional cluster as this best suits the computational chemistry applications
– Supports over 100 active users, ~70 peer-reviewed papers per year, over 40 applications installed
– Authentication using NGS technologies
– Portal to submit jobs: access for less computationally aware chemists

Tier-1 Architecture
– >8,000 processor cores
– >500 disk servers (10 PB)
– Tape robot (10 PB) with >37 dedicated T10000 tape drives (A/B/C)
– CASTOR storage pools for ATLAS, CMS, LHCb and GEN
– SJ5 and OPN network connections

e-Infrastructure South
– Consortium of UK universities (Oxford, Bristol, Southampton, UCL) which formed the Centre for Innovation with STFC as a partner
– Two new services (£3.7M): IRIDIS (Southampton, x86-64) and EMERALD (STFC, GPGPU cluster)
– Part of a larger investment in e-infrastructure:
  – A Midland Centre of Excellence (£1M), led by Loughborough University
  – West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
  – E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
  – MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick

EMERALD
Providing resources to the consortium and partners:
– Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
– Largest production GPU facility in the UK: 372 NVIDIA Tesla M2090 GPUs
– Scientific applications still under discussion; computational chemistry front runners are AMBER, NAMD, GROMACS and LAMMPS
– Eventually hundreds of applications covering all sciences

EMERALD occupies 6 racks (photo slide).

EMERALD Hardware I
15 x SL6500 chassis, each with 4 GPU compute nodes of 2 CPUs and 3 NVIDIA M2090 GPUs, i.e. 8 CPUs and 12 GPUs per chassis; power ~3.9 kW per chassis.
– SL6500 scalable-line chassis: 4 x 1200 W power supplies, 4 fans, 4 x 2U half-width SL390s servers
– SL390s nodes: 2 x Intel E5649 (2.53 GHz, 6 cores, 80 W), 3 x NVIDIA M2090 GPGPUs (512 CUDA cores each), 48 GB DDR3 memory, 1 x 146 GB 15k SAS HDD, HP QDR InfiniBand and 10 GbE ports, dual 1 Gb network ports

EMERALD Hardware II
12 x SL6500 chassis, each with 2 GPU compute nodes of 2 CPUs and 8 NVIDIA M2090 GPUs, i.e. 4 CPUs and 16 GPUs per chassis; power ~4.6 kW per chassis.
– SL6500 scalable-line chassis: 4 x 1200 W power supplies, 4 fans, 2 x 4U half-width SL390s servers
– SL390s nodes: 2 x Intel E5649 (2.53 GHz, 6 cores, 80 W), 8 x NVIDIA M2090 GPGPUs (512 CUDA cores each), 96 GB DDR3 memory, 1 x 146 GB 15k SAS HDD, HP QDR InfiniBand and 10 GbE, dual 1 Gb network ports
Together the two tranches account for the 372-GPU total quoted earlier; a quick check is sketched below.
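A back-of-envelope check in Python that the per-chassis figures above reproduce the 372-GPU headline figure (all counts are taken from the two hardware slides; nothing else is assumed):

    # Tranche I: 15 chassis, 4 nodes per chassis, 2 CPUs and 3 GPUs per node
    tranche1_gpus = 15 * 4 * 3      # 180 GPUs
    tranche1_cpus = 15 * 4 * 2      # 120 CPU sockets

    # Tranche II: 12 chassis, 2 nodes per chassis, 2 CPUs and 8 GPUs per node
    tranche2_gpus = 12 * 2 * 8      # 192 GPUs
    tranche2_cpus = 12 * 2 * 2      # 48 CPU sockets

    print(tranche1_gpus + tranche2_gpus)   # 372 GPUs, matching the headline figure
    print(tranche1_cpus + tranche2_cpus)   # 168 CPU sockets (1,008 cores at 6 cores per socket)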

EMERALD Software
System:
– Red Hat Enterprise Linux 6.x
– Platform LSF
– CUDA toolkit, SDK and libraries
– Intel and Portland compilers
Scientific applications:
– Still under discussion; computational chemistry front runners are AMBER, NAMD, GROMACS and LAMMPS
– Eventually hundreds of applications covering all sciences

EMERALD: Managing a GPU Cluster
– GPUs are more power efficient and deliver more Gflops/Watt than x86_64 servers. True, but each 4U chassis draws ~1.2 kW per U of rack space, so a fully populated rack requires 40+ kW: hard to cool, needing additional in-row coolers and cold-aisle containment.
– Uneven power demand stresses the air-conditioning and power infrastructure: a 240-GPU job takes the cluster from 31 kW at idle to 80 kW almost instantly.
– Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W
– Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W
The arithmetic behind these figures is sketched below.
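A short Python sketch of that power and efficiency arithmetic (the per-chassis power, job power and HPL efficiencies are taken from the slide; the 42U rack height is an assumption):

    # Power density: ~4.6 kW in a 4U chassis (tranche II figures)
    kw_per_u = 4.6 / 4                   # ~1.15 kW per U of rack space
    rack_units = 42                      # assumed standard rack height
    print(kw_per_u * rack_units)         # ~48 kW for a fully populated rack, i.e. "40+ kW"

    # Step change in demand when a large GPU job starts
    idle_kw, job_kw = 31, 80
    print(job_kw - idle_kw)              # ~49 kW swing, stressing power and cooling

    # Measured HPL efficiency: GPU cluster vs X5675 (x86_64) cluster
    gpu_gflops_per_w, x86_gflops_per_w = 1.4, 0.5
    print(gpu_gflops_per_w / x86_gflops_per_w)   # ~2.8x better Gflops/W on the GPU system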

JASMIN/CEMS: CEDA Data Storage and Services
– Curated data archive, with archive management services and archive access services (HTTP, FTP, helpdesk, ...)
– Data-intensive scientific computing: global and regional datasets and models at high spatial and temporal resolution
– Private cloud: flexible access to high-volume and complex data for the climate and earth observation communities
– Online workspaces and services for sharing and collaboration

JASMIN/CEMS: Procurement and Installation
– Deadline (or the funding is gone): 31st March 2012 for "doing science"
– Government procurement: £5M, tender to order in under 4 weeks
– Machine-room upgrades and a large cluster delivery competing for time
– Bare floor to operation in 6 weeks; 6 hours from power-up to 4.6 PB of ActiveStor 11 mounted at RAL
– "Doing science" on 14th March
– 3 satellite-site installs in parallel (Leeds 100 TB, Reading 500 TB, ISIC 600 TB)
– Oct 2011 to Mar 2012: BIS funds, tender, order, build, network, complete

JASMIN/CEMS at RAL
– 12 racks with mixed servers and storage
– 15 kW/rack peak (180 kW total)
– Enclosed cold aisle with in-aisle cooling
– 600 kg per rack (7.2 tonnes total)
– Distributed 10 Gb network (1 Terabit/s bandwidth)
– Single 4.5 PB global file system
– Two VMware vSphere pools of servers with dedicated image storage
– 6 weeks from bare floor to a working 4.6 PB

JASMIN/CEMS Infrastructure
– Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3 TB drives in total)
– Computing: a 'cloud' of hundreds of virtual machines hosted on 20 Dell R610 servers
– Networking: 10 Gb Gnodal throughout; dedicated "lightpath" links to UK and EU supercomputers
– Physical: 12 racks, enclosed aisle, in-row chillers
– Capacity: 4.6 PB usable at RAL (6.6 PB raw), equivalent to 920,000 DVDs (a 1.47 km high tower of DVDs)
– Performance: 1.03 Tb/s total storage bandwidth, equivalent to copying 1,500 DVDs per minute
– Single namespace: one single file system, managed as one system
– Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
The DVD equivalences are back-of-envelope figures; see the sketch below.
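A reproduction of those equivalences in Python, assuming a ~5 GB DVD that is 1.2 mm thick (the DVD parameters are assumptions, not from the slides):

    DVD_BYTES = 5e9           # assumed capacity of one DVD
    DVD_THICKNESS_M = 1.2e-3  # assumed thickness of one disc

    capacity_bytes = 4.6e15               # 4.6 PB usable
    dvds = capacity_bytes / DVD_BYTES
    print(dvds)                           # ~920,000 DVDs
    print(dvds * DVD_THICKNESS_M / 1000)  # ~1.1 km tower (the slide quotes 1.47 km, implying slightly different per-disc figures)

    bandwidth_bytes_per_s = 1.03e12 / 8   # 1.03 Tb/s is roughly 129 GB/s
    print(bandwidth_bytes_per_s * 60 / DVD_BYTES)   # ~1,500 DVDs' worth of data per minute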

JASMIN/CEMS Networking
– Gnodal 10 Gb networking: 160 x 10 Gb ports in a 4 x GS4008 switch stack
– Compute: 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage; 8 Dell servers for compute; Dell EqualLogic iSCSI arrays for VM images; all 10 Gb connected
– The 10 Gb network has already been upgraded with 80 more Gnodal ports for compute expansion

What is Panasas Storage?
"A complete hardware and software storage solution."
– Ease of management: a single management console for 4.6 PB
– Performance: parallel access via DirectFlow, NFS and CIFS; fast parallel reconstruction
– ObjectRAID: all files stored as objects, with a RAID level per file; vertical, horizontal and network parity
– Distributed parallel file system: parts (objects) of every file on every blade; all blades transmit and receive in parallel
– Global namespace
– Battery UPS: enough to shut down cleanly
– 1 x 10 Gb uplink per shelf, so performance scales with size (see the sketch below)
– Each shelf comprises a director blade and storage blades
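Because every shelf brings its own 10 Gb uplink, aggregate bandwidth scales with shelf count. A one-line Python check against the figures quoted earlier (shelf count and per-shelf uplink are from the slides):

    shelves = 103
    uplink_gbps = 10                      # one 10 Gb uplink per shelf
    print(shelves * uplink_gbps / 1000)   # 1.03 Tb/s aggregate, matching the quoted storage bandwidth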

PanActive Manager (management console screenshot).

Panasas in Operation
Reliability:
– 1,133 blades, 206 power supplies and 103 shelf network switches: 1,442 components in total
– Soak testing revealed 27 faults; in operation there have been 7 faults, with no loss of service
– ~0.6% failure per year, compared to ~5% per year for commodity storage (see the sketch below)
Performance:
– Random IO: 400 MB/s per host; sequential IO: 1 GB/s per host
– External performance: 10 Gb connected, sustained 6 Gb/s
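A rough Python check of the quoted annual failure rate, assuming the 7 in-service faults accumulated over roughly the first year of operation (the slide does not state the period explicitly):

    components = 1133 + 206 + 103        # blades + power supplies + shelf switches = 1,442
    in_service_faults = 7                # faults seen in operation, none causing loss of service
    print(100 * in_service_faults / components)   # ~0.5% of components per year, in line with the quoted ~0.6%
    print(0.05 / 0.006)                  # commodity storage at ~5%/year fails roughly 8x as often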

Infrastructure Solutions: Systems Management
– Backups: system and user data
– SVN: codes and documentation
– Monitoring: Ganglia, Cacti, power management
– Alerting: Nagios
– Security: intrusion detection, patch monitoring
– Deployment: Kickstart, LDAP, inventory database
– VMware: server consolidation and extra resilience; 150+ virtual servers supporting all e-Science activities; development cloud

e-Infrastructures
Lead role in national and international e-infrastructures:
– Authentication: lead and develop the UK e-Science Certificate Authority; ~30,000 certificates issued in total, ~3,000 current; easy integration with the UK Access Management Federation
– Authorisation: use existing EGI tools
– Accounting: lead and develop EGI APEL accounting; 500M records, 400 GB of data; ~282 sites publish records; ~12 GB/day loaded into the main tables; usually 13 months of detail, with summary data back to 2003; integrated into existing HPC-style services
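For scale, a rough Python estimate of what those APEL volumes imply (pure arithmetic on the quoted figures; the average record size and daily record count are inferences, not stated on the slide):

    records = 500e6        # ~500M accounting records
    total_gb = 400         # ~400 GB of data in total
    daily_gb = 12          # ~12 GB/day loaded into the main tables

    avg_record_bytes = total_gb * 1e9 / records
    print(avg_record_bytes)                    # ~800 bytes per record on average
    print(daily_gb * 1e9 / avg_record_bytes)   # ~15 million records loaded per day at that size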

e-Infrastructures (continued)
– User management: lead and develop the NGS UAS service; a common portal for project owners to manage project and user allocations, display trends and make decisions (policing)
– Information (what services are available?): lead and develop the EGI information portal, GOCDB; 2,180 registered GOCDB users belonging to 40 registered NGIs; 1,073 registered sites hosting a total of 4,372 services; downtime entries entered via GOCDB
– Training and support: a training marketplace tool developed to promote training opportunities, resources and materials; SeIUCCR summer schools supporting 30 students on a 1-week course (120 applicants)

Summary
High-performance computing and data:
– SCARF, NSCCS, JASMIN, EMERALD, GridPP Tier-1
Managing e-infrastructures:
– Authentication, authorisation, accounting
– Resource discovery
– User management, help and training

Information
Website:
Contact: Pete Oliver, peter.oliver at stfc.ac.uk
Questions?