Published by Justin Harper. Modified over 8 years ago.
1
GRACE at UCL
2
www.ucl.ac.uk/research-it-services
When one size can't fit all: Scalable HPC For Research Delivery
ISD/RITS/RCPS - Owain Kenway
Grace/Legion/Software Stack/Legion DI
3
State of Research Computing Services: Legion
Legion has been UCL's primary local compute resource since 2007.
Almost none of the original hardware is still in service.
Gradual upgrade over time.
Absorbing other services.
7-year-old core network technology: 1G Ethernet.
4
State of Research Computing Services: Legion
Gradual upgrade over time means the service is fragmented:
  8 different node types!
  Some have InfiniBand, some don't!
  PIs buy the hardware they need.
5
Parallel vs Serial
In general:
  Iridis 3 → parallel
  Legion → high throughput
Parallel
  Single job spans multiple nodes
  Tightly coupled parallelisation, usually in MPI
  Sensitive to network performance
  Currently primarily chemistry, physics, engineering
High throughput
  Lots (tens of thousands) of independent jobs on different data
  High I/O
  Currently primarily biosciences and physics
  In the future, digital humanities
6
Parallel
Many processes on many processors work simultaneously and communicate with each other.
Diagram labels: Input Data, Output Data.
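To make the "communicate with each other" point concrete, here is a minimal sketch of a tightly coupled job. It is a single-process simulation written for illustration, not MPI and not code from the talk: the function names (`split`, `step`, `run`) and the 3-point averaging update are assumptions. Each simulated "rank" owns part of an array and must read its neighbours' edge values at every step before it can update its own part, which is why this kind of job is sensitive to network performance.

```python
# Single-process simulation of a tightly coupled parallel job (illustrative only).
# Each "rank" owns a chunk of a 1-D array and needs its neighbours' edge values
# at every step before it can update its own chunk.

def split(values, n_ranks):
    """Divide values into n_ranks contiguous chunks of near-equal size.

    Assumes len(values) >= n_ranks so that no chunk is empty.
    """
    size, extra = divmod(len(values), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        end = start + size + (1 if rank < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def step(chunks):
    """One smoothing step: every rank fetches edge values, then updates."""
    new_chunks = []
    for rank, chunk in enumerate(chunks):
        # "Communication": read halo values from the neighbouring ranks.
        # The outermost ranks reuse their own edge value as the boundary.
        left = chunks[rank - 1][-1] if rank > 0 else chunk[0]
        right = chunks[rank + 1][0] if rank < len(chunks) - 1 else chunk[-1]
        padded = [left] + chunk + [right]
        # "Computation": 3-point average over the padded chunk.
        new_chunks.append([
            (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks


def run(values, n_ranks, n_steps):
    """Run n_steps smoothing steps with the data split over n_ranks."""
    chunks = split(values, n_ranks)
    for _ in range(n_steps):
        chunks = step(chunks)
    return [x for chunk in chunks for x in chunk]
```

The answer does not depend on how many ranks the data is split across, but no rank can advance a step until its neighbours' values from the previous step are available. On a real cluster that exchange happens over the interconnect at every step.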
7
High Throughput
Many processes operate independently of each other and in any order.
Diagram labels: Input Data, Output Data.
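As a rough illustration of the high throughput pattern, the sketch below runs one independent task per input and collects results in whatever order they finish. It uses a local thread pool as a stand-in for a cluster scheduler; `analyse`, `run_all` and the sample data are hypothetical names and values invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(sample):
    """Stand-in for one independent job: it needs no other job's data."""
    return sum(sample) / len(sample)


def run_all(samples, workers=4):
    """Run one job per named sample; jobs may complete in any order."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each submitted job back to the name of its input.
        futures = {pool.submit(analyse, data): name
                   for name, data in samples.items()}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(run_all({"a": [1, 2, 3], "b": [10, 20]}))
```

Because no job waits on another, throughput scales with the number of workers until something shared, typically I/O, becomes the bottleneck.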
8
Iridis Retirement
In summer 2015, Southampton were due to retire Iridis.
This meant that we would lose ~71 TeraFlops of compute capacity, and the ability to run large parallel jobs!
We also wanted to retire the original Legion hardware, which was 7 years old, losing another 20 TeraFlops.
Luckily, we had £1.5 million to spend!
9
State of Research Computing Services: Grace
Grace went “into service” on the 2nd December 2015.
Completely new service for parallel compute.
All nodes are connected to storage by 40 gigabit InfiniBand.
InfiniBand is the primary network in the cluster (IP over IB: looks like a “normal” network).
Designed with network capacity to double in size over time.
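For a sense of scale, the sketch below compares the nominal line rates mentioned in the talk: 1 gigabit Ethernet on Legion's core network and 40 gigabit InfiniBand on Grace. It is a back-of-the-envelope calculation only. The 1 TB dataset size is an assumption chosen for illustration, and the figures ignore encoding, protocol overhead and storage speed, so real transfers will be slower.

```python
def transfer_seconds(size_bytes, link_gbit_per_s):
    """Ideal time to move size_bytes over a link at its nominal line rate."""
    bits = size_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_terabyte = 1e12  # hypothetical dataset size, in bytes
    for label, rate in [("1G Ethernet", 1), ("40G InfiniBand", 40)]:
        seconds = transfer_seconds(one_terabyte, rate)
        print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```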
10
To replace UCL's Iridis 3 service and the retired Legion nodes we required ~90 TeraFlops sustained.
Grace was benchmarked at ~180 TeraFlops.
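The numbers quoted in the talk can be checked with a few lines of arithmetic. All four figures come from the slides and are approximate ("~"), so the result is indicative rather than exact.

```python
# Approximate figures quoted in the slides, in TeraFlops.
IRIDIS_TFLOPS = 71        # capacity lost when Iridis retires
OLD_LEGION_TFLOPS = 20    # capacity lost by retiring original Legion nodes
REQUIRED_TFLOPS = 90      # stated replacement requirement
GRACE_TFLOPS = 180        # Grace benchmark result

lost = IRIDIS_TFLOPS + OLD_LEGION_TFLOPS
ratio = GRACE_TFLOPS / REQUIRED_TFLOPS

print(f"Capacity being retired: ~{lost} TeraFlops")
print(f"Grace vs requirement:   ~{ratio:.1f}x")
```

The retired capacity (~91 TeraFlops) matches the stated ~90 TeraFlops requirement, and Grace's benchmark is about twice that requirement.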
11
Grace / Legion
12
Legion/Grace have a common software stack.
Red Hat Enterprise Linux + Son of Grid Engine + Environment modules
Common set of:
  Compilers (so you can compile your own code)
  Libraries
  Applications
It's likely the application you use is already available, or we can install it for you.
Scripted builds of applications (so we can easily install new versions for you)
xCAT management software (which allows us to manage the cluster)
Easy to move between the services (you have the same environment on both machines)
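As a sketch of what a shared scheduler and module environment means in practice, the helper below renders the text of a Grid Engine job script. It is an illustration under stated assumptions, not the team's tooling. The `#$ -N`, `#$ -l h_rt=`, `#$ -cwd`, `#$ -t` and `#$ -pe` directives, the `$SGE_TASK_ID` variable and the `module load` command are standard Grid Engine and Environment Modules features. The function name, the module name `example/1.0`, the parallel environment name `mpi` and the commands are hypothetical, and the real values are site-specific.

```python
def render_job_script(name, wallclock, modules, command,
                      array_tasks=None, pe=None):
    """Return the text of a Grid Engine job script (illustrative sketch).

    array_tasks: number of tasks for an array job (high throughput style).
    pe: (pe_name, slots) tuple for a parallel job; pe_name is site-specific.
    """
    lines = [
        "#!/bin/bash -l",
        f"#$ -N {name}",            # job name
        f"#$ -l h_rt={wallclock}",  # wallclock limit, H:M:S
        "#$ -cwd",                  # run from the submission directory
    ]
    if array_tasks is not None:
        lines.append(f"#$ -t 1-{array_tasks}")
    if pe is not None:
        pe_name, slots = pe
        lines.append(f"#$ -pe {pe_name} {slots}")
    lines.append("")
    # The same module names would resolve on both services if, as the
    # slide says, the environment is common to both.
    lines.extend(f"module load {module}" for module in modules)
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical array job: 1000 independent tasks, one input file each.
    print(render_job_script(
        name="htc-example",
        wallclock="0:10:0",
        modules=["example/1.0"],  # hypothetical module name
        command="./analyse input.$SGE_TASK_ID",
        array_tasks=1000,
    ))
```

An array job (`array_tasks`) corresponds to the high throughput pattern, while a parallel environment request (`pe`) corresponds to a single job spanning multiple nodes.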
13
Wherever possible the UCL Research Computing Platform Services Team's work is Open Source and on GitHub:
https://github.com/UCL-RITS/rcps-buildscripts
https://github.com/UCL-RITS/rcps-modulefiles
You can deploy it on your resources/desktop (application licenses permitting).
14
The Future – Legion “Data Intensive”
Although Legion now does only high throughput computing, it's not designed for it.
  Some issues with I/O.
  We need to retire some old hardware.
So the next major upgrade is redesigning Legion for HTC.
  Replace old “Nehalem” nodes.
  Replace/upgrade 1G Ethernet I/O subsystem.
  Local mirroring of common datasets.
Coming ~summer 2017!
The then-current iteration of the software stack.
15
None of this would have been possible without:
UCL: Dr Ian Kirker, Heather Kelly, Brian Alston, Thomas Jones, Luke Sudbery, William Hay, Colin Byelong, Prof. Dario Alfe, Dr Javier Herrero, Dr Jörg Saßmannshausen, Mike Atkins, Greg Dyer
OCF/Lenovo/DDN: Georgina Ellis, Arif Ali, Jagjit Reehal, Jim Roche, Richard Mansfield
and certainly many, many others.
THANKS!
16
Grace has effectively doubled the capacity for parallel compute available to researchers at UCL.
Visit www.ucl.ac.uk/research-it-services/grace to download these slides after the event.
What did you think? Join the conversation on Twitter with #GraceAtUCL.
Don’t forget to follow us for access to the event video and today’s polling results.