1
JASMIN/CEMS and EMERALD: Scientific Computing Developments at STFC
Peter Oliver, Martin Bly
Scientific Computing Department
October 2012
2
Outline
STFC Compute and Data
National and International Services
Summary
3
Isaac Newton Group of Telescopes, La Palma
UK Astronomy Technology Centre, Edinburgh
Polaris House, Swindon, Wiltshire
Chilbolton Observatory, Stockbridge, Hampshire
Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
Joint Astronomy Centre, Hawaii
Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus
4
What we do…
The nuts and bolts that make it work: enabling scientists, engineers and researchers to develop world-class science, innovation and skills.
5
SCARF
Providing resources for STFC facilities, staff and their collaborators
~2,700 cores, InfiniBand interconnect, Panasas filesystem, managed as one entity
~50 peer-reviewed publications per year
Additional capacity added each year for general use; facilities such as CLF add capacity using their own funds
National Grid Service partner
Local access via MyProxy-SSO: users log in with their federal id and password (see the sketch below)
UK e-Science Certificate access
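To illustrate the MyProxy-SSO login flow above, here is a minimal sketch (Python) of fetching a short-lived proxy credential with the standard myproxy-logon client; the server hostname and lifetime are assumptions, not the actual SCARF/NGS configuration.

import getpass
import subprocess

# Hypothetical MyProxy endpoint; the real SCARF/NGS server name will differ.
MYPROXY_SERVER = "myproxy.example.ac.uk"

def fetch_proxy(username, lifetime_hours=12):
    """Obtain a short-lived proxy certificate using the federal id/password."""
    subprocess.run(
        ["myproxy-logon",
         "-s", MYPROXY_SERVER,        # MyProxy server to contact
         "-l", username,              # federal user id
         "-t", str(lifetime_hours)],  # requested proxy lifetime in hours
        check=True,                   # raise if credential retrieval fails
    )

if __name__ == "__main__":
    # myproxy-logon itself prompts interactively for the federal password.
    fetch_proxy(getpass.getuser())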
6
NSCCS (National Service for Computational Chemistry Software)
Providing national and international compute, training and support
EPSRC Mid-Range Service: SGI Altix UV SMP system, 512 CPUs, 2 TB shared memory
A large-memory SMP was chosen over a traditional cluster as this best suits the computational chemistry applications
Supports over 100 active users
~70 peer-reviewed papers per year
Over 40 applications installed
Authentication using NGS technologies
Portal for job submission: access for less computationally aware chemists
7
Tier-1 Architecture
CPU farm and separate CASTOR storage instances (with their storage pools) for ATLAS, CMS, LHCb and GEN, connected to SJ5 and the OPN
>8,000 processor cores
>500 disk servers (10 PB)
Tape robot (10 PB)
>37 dedicated T10000 tape drives (A/B/C)
8
E-infrastructure South
Consortium of UK universities (Oxford, Bristol, Southampton, UCL) formed the Centre for Innovation, with STFC as a partner
Two new services (£3.7M):
IRIDIS (Southampton): x86-64 cluster
EMERALD (STFC): GPGPU cluster
Part of a larger investment in e-infrastructure:
A Midland Centre of Excellence (£1M), led by Loughborough University
West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick
9
EMERALD
Providing resources to the consortium and partners
Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
Largest production GPU facility in the UK
372 NVIDIA Tesla M2090 GPUs
Scientific applications still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS
Eventually hundreds of applications covering all sciences
10
[Photo: 6 racks]
11
EMERALD Hardware I
15 x SL6500 chassis:
4 x GPU compute nodes per chassis, each with 2 CPUs and 3 NVIDIA M2090 GPUs = 8 CPUs and 12 GPUs per chassis; power ~3.9 kW
SL6500 scalable-line chassis: 4 x 1200 W power supplies, 4 fans, 4 x 2U half-width SL390s servers
SL390s nodes:
2 x Intel E5649 (2.53 GHz, 6 cores, 80 W)
3 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
48 GB DDR3 memory
1 x 146 GB 15k SAS HDD
HP QDR InfiniBand and 10 GbE ports
Dual 1 Gb network ports
12
EMERALD Hardware II
12 x SL6500 chassis:
2 x GPU compute nodes per chassis, each with 2 CPUs and 8 NVIDIA M2090 GPUs = 4 CPUs and 16 GPUs per chassis; power ~4.6 kW
SL6500 scalable-line chassis: 4 x 1200 W power supplies, 4 fans, 2 x 4U half-width SL390s servers
SL390s nodes:
2 x Intel E5649 (2.53 GHz, 6 cores, 80 W)
8 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
96 GB DDR3 memory
1 x 146 GB 15k SAS HDD
HP QDR InfiniBand and 10 GbE ports
Dual 1 Gb network ports
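Taken together, the two hardware tranches account for the 372 M2090 GPUs quoted earlier:

\[ \underbrace{15 \times 4 \times 3}_{\text{3-GPU nodes}} + \underbrace{12 \times 2 \times 8}_{\text{8-GPU nodes}} = 180 + 192 = 372 \ \text{GPUs} \]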
13
EMERALD Software
System:
Red Hat Enterprise Linux 6.x
Platform LSF (a minimal submission sketch follows below)
CUDA toolkit, SDK and libraries
Intel and Portland compilers
Scientific applications still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS
Eventually hundreds of applications covering all sciences
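As a flavour of how jobs reach the cluster through Platform LSF, the sketch below (Python) submits a hypothetical AMBER GPU job via bsub; the job name, core count, module names, resource limits and input files are illustrative assumptions, and GPU selection syntax depends on the local LSF configuration.

import subprocess
import textwrap

# Minimal sketch of submitting a GPU job to Platform LSF; directives and
# module/file names are placeholders, not the actual EMERALD configuration.
job_script = textwrap.dedent("""\
    #BSUB -J amber_gpu_test         # job name
    #BSUB -n 12                     # CPU cores (one SL390s node)
    #BSUB -W 02:00                  # wall-clock limit, hh:mm
    #BSUB -o amber_gpu_test.%J.out  # combined stdout/stderr
    # GPU selection syntax is site-specific under LSF and is omitted here.
    module load cuda amber
    mpirun pmemd.cuda.MPI -O -i prod.in -p topol.prmtop -c coords.inpcrd
""")

# bsub reads the job script from standard input.
subprocess.run(["bsub"], input=job_script, text=True, check=True)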
14
EMERALD: Managing a GPU Cluster
GPUs are more power-efficient and give more Gflops/W than x86_64 servers. True, but:
Each 4U chassis means ~1.2 kW per U of rack space; a full rack requires 40+ kW
Hard to cool: additional in-row coolers and cold-aisle containment
Uneven power demand stresses the air-conditioning and power infrastructure: a 240-GPU job takes the cluster from idle (~31 kW) to ~80 kW almost instantly
Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W
Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W
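The power and efficiency figures above can be cross-checked roughly from the numbers on this slide:

\[ \frac{4.6\ \text{kW}}{4\ \text{U}} \approx 1.15\ \text{kW/U}, \qquad \frac{1.4\ \text{Gflops/W}}{0.5\ \text{Gflops/W}} \approx 2.8\times \ \text{(GPU vs. x86 HPL efficiency)} \]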
15
JASMIN/CEMS
CEDA data storage & services:
Curated data archive
Archive management services
Archive access services (HTTP, FTP, helpdesk, …)
Data-intensive scientific computing:
Global/regional datasets & models
High spatial and temporal resolution
Private cloud:
Flexible access to high-volume & complex data for the climate & earth observation communities
Online workspaces
Services for sharing & collaboration
16
JASMIN/CEMS: procurement and installation (Oct 2011 – 8 Mar 2012: BIS funds → tender → order → build → network complete)
Deadline (or funding gone!): 31st March 2012 for "doing science"
Government procurement: £5M from tender to order in under 4 weeks
Machine room upgrades and the large cluster competing for time
Bare floor to operation in 6 weeks
6 hours from power-off to 4.6 PB of ActiveStor 11 mounted at RAL
"Doing science" on 14th March
3 satellite-site installs in parallel (Leeds 100 TB, Reading 500 TB, ISIC 600 TB)
17
JASMIN/CEMS at RAL
12 racks with mixed servers and storage
15 kW/rack peak (180 kW total)
Enclosed cold aisle + in-aisle cooling
600 kg/rack (7.2 tonnes total)
Distributed 10 Gb network (1 Terabit/s bandwidth)
Single 4.6 PB global file system
Two VMware vSphere pools of servers with dedicated image storage
6 weeks from bare floor to a working 4.6 PB
18
JASMIN/CEMS Infrastructure Configuration
Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3 TB drives in total)
Computing: a "cloud" of hundreds of virtual machines hosted on 20 Dell R610 servers
Networking: 10 Gb Gnodal throughout; "lightpath" dedicated links to UK and EU supercomputers
Physical: 12 racks, enclosed aisle, in-row chillers
Capacity: 4.6 PB usable at RAL (6.6 PB raw), equivalent to 920,000 DVDs (a 1.47 km high tower of DVDs)
High performance: 1.03 Tb/s total storage bandwidth, equivalent to copying 1,500 DVDs per minute
Single namespace: one single file system, managed as one system
Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
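The headline capacity and bandwidth figures follow directly from the per-shelf numbers:

\[ 2208 \times 3\ \text{TB} = 6624\ \text{TB} \approx 6.6\ \text{PB (raw)}, \qquad 103\ \text{shelves} \times 10\ \text{Gb/s} \approx 1.03\ \text{Tb/s} \]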
19
JASMIN/CEMS Networking
Gnodal 10 Gb networking: 160 x 10 Gb ports in a 4 x GS4008 switch stack
Compute: 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage; 8 Dell servers for compute
Dell EqualLogic iSCSI arrays (VM images)
All 10 Gb connected
The 10 Gb network has already been upgraded with 80 more Gnodal ports for compute expansion
20
What is Panasas Storage?
"A complete hardware and software storage solution"
Ease of management: a single management console for 4.6 PB
Performance: parallel access via DirectFlow, NFS and CIFS; fast parallel reconstruction
ObjectRAID: all files stored as objects; RAID level set per file; vertical, horizontal and network parity
Distributed parallel file system: parts (objects) of files on every blade; all blades transmit/receive in parallel
Global namespace
Battery UPS: enough to shut down cleanly
1 x 10 Gb uplink per shelf: performance scales with size
Each shelf comprises a director blade and storage blades
21
PanActive Manager (management console screenshot)
22
Panasas in Operation
Reliability:
1,133 blades, 206 power supplies and 103 shelf network switches: 1,442 components
Soak testing revealed 27 faults
In operation: 7 faults, with no loss of service
~0.6% failures per year, compared with ~5% per year for commodity storage
Performance:
Random IO: 400 MB/s per host
Sequential IO: 1 GB/s per host
External performance: 10 Gb connected, sustained 6 Gb/s
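As a rough cross-check of the quoted failure rate, the 7 in-operation faults across the 1,133 blades correspond to

\[ \frac{7}{1133} \approx 0.6\% \]

consistent with the ~0.6% per year figure and roughly an order of magnitude below the ~5% per year typical of commodity storage.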
23
Infrastructure Solutions: Systems Management
Backups: system and user data
SVN: codes and documentation
Monitoring: Ganglia, Cacti, power management
Alerting: Nagios
Security: intrusion detection, patch monitoring
Deployment: Kickstart, LDAP, inventory database
VMware: server consolidation and extra resilience; 150+ virtual servers supporting all e-Science activities; development cloud
24
e-Infrastructures
Lead role in national and international e-infrastructures
Authentication: lead and develop the UK e-Science Certificate Authority; ~30,000 certificates issued in total, ~3,000 current; easy integration with the UK Access Management Federation
Authorisation: use existing EGI tools
Accounting: lead and develop EGI APEL accounting; 500M records, 400 GB of data; ~282 sites publish records; ~12 GB/day loaded into the main tables; usually 13 months of detail, with summary data back to 2003; integrated into existing HPC-style services
25
e-Infrastructures (continued)
Lead role in national and international e-infrastructures
User management: lead and develop the NGS UAS service; a common portal for project owners to manage project and user allocations, display trends and make decisions (policing)
Information (what services are available?): lead and develop the EGI information portal, GOCDB; 2,180 registered GOCDB users belonging to 40 registered NGIs; 1,073 registered sites hosting a total of 4,372 services; 12,663 downtime entries entered via GOCDB
Training & support: a Training Marketplace tool developed to promote training opportunities, resources and materials; SeIUCCR summer schools supporting 30 students for a 1-week course (120 applicants)
26
Summary
High-performance computing and data: SCARF, NSCCS, JASMIN, EMERALD, GridPP Tier-1
Managing e-infrastructures: authentication, authorisation, accounting; resource discovery; user management, help and training
27
Information
Website: http://www.stfc.ac.uk/SCD
Contact: Pete Oliver, peter.oliver at stfc.ac.uk
Questions?