The TeraGyroid Project - Aims and Achievements
Richard Blake, Computational Science and Engineering Department, CCLRC Daresbury Laboratory
ANL Royal Society, June 2004

This ambitious project was the result of an international collaboration linking the USA's TeraGrid and the UK's e-Science Grid, jointly funded by NSF and EPSRC. Transatlantic optical bandwidth is supported by British Telecommunications.
Overview
– Project Objectives
– The TeraGyroid Scientific Experiment
– Testbed and Partners
– Applications Porting and the RealityGrid Environment
– Grid Software Infrastructure
– Visualisation
– Networking
– What Was Done
– Project Objectives - How Well Did We Do?
– Lessons Learned
UK-TeraGrid HPC Project Objectives
Joint experiment combining high-end computational facilities in the UK e-Science Grid (HPCx and CSAR) and the TeraGrid sites:
– world-class computational science experiment
– enhanced expertise and experience to benefit the UK and USA
– inform the construction and operation of national and international grids
– stimulate long-term strategic technical collaboration
– support long-term scientific collaborations
– experiments with clear scientific deliverables
– choice of applications based on community codes
– inform a future programme of complementary experiments
The TeraGyroid Scientific Experiment
[Figure: high-density isosurface of the late-time configuration in a ternary amphiphilic fluid, as simulated on a 64³ lattice by LB3D. Gyroid ordering coexists with defect-rich, sponge-like regions.]
The dynamical behaviour of such defect-rich systems can only be studied with very large-scale simulations, in conjunction with high-performance visualisation and computational steering.
The RealityGrid Project
Mission: "Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind."
– aims to predict the realistic behaviour of matter using diverse simulation methods
– LB3D: a highly scalable, grid-based code to model the dynamics and hydrodynamics of complex multiphase fluids (a generic sketch of the underlying lattice Boltzmann scheme follows below)
– mesoscale simulation enables access to larger physical scales and longer timescales
– the RealityGrid environment enables multiple steered and spawned simulations, with the visualised output streamed to a distributed set of collaborators at Access Grid (AG) nodes across the USA and UK
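LB3D itself is a parallel Fortran90 code for ternary amphiphilic fluids; purely to illustrate the lattice Boltzmann technique it builds on, here is a minimal single-phase D2Q9 BGK sketch in Python. The grid size, relaxation time and initial perturbation are all illustrative, and none of this reproduces LB3D's actual model:

```python
import numpy as np

# Minimal single-phase D2Q9 lattice Boltzmann (BGK) sketch.
# Illustrative only: LB3D is a 3D ternary amphiphilic model.
NX, NY, TAU = 64, 64, 0.8
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])    # lattice velocities
w = np.array([4/9] + [1/9]*4 + [1/36]*4)              # quadrature weights

def equilibrium(rho, ux, uy):
    """Second-order expansion of the Maxwell-Boltzmann distribution."""
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return rho * w[:, None, None] * (1.0 + cu + 0.5 * cu**2 - usq)

# Start from a small sinusoidal density wave so the run does something.
rho0 = 1.0 + 0.01 * np.sin(2 * np.pi * np.arange(NX) / NX)[:, None] * np.ones(NY)
f = equilibrium(rho0, np.zeros((NX, NY)), np.zeros((NX, NY)))

for step in range(1000):
    rho = f.sum(axis=0)                               # density moment
    ux = (c[:, 0, None, None] * f).sum(axis=0) / rho  # velocity moments
    uy = (c[:, 1, None, None] * f).sum(axis=0) / rho
    f += (equilibrium(rho, ux, uy) - f) / TAU         # BGK collision
    for i in range(9):                                # streaming (periodic)
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
```

The collide-and-stream structure is what makes the method so scalable: each site updates from purely local data plus nearest neighbours, so the lattice decomposes cleanly across thousands of processors.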
Testbed and Project Partners
RealityGrid partners:
– University College London (application, visualisation, networking)
– University of Manchester (application, visualisation, networking)
– Edinburgh Parallel Computing Centre (application)
– Tufts University (application)
TeraGrid sites:
– Argonne National Laboratory (visualisation, networking)
– National Center for Supercomputing Applications (compute)
– Pittsburgh Supercomputing Center (compute, visualisation)
– San Diego Supercomputer Center (compute)
UK high-end computing services:
– HPCx, run by the University of Edinburgh and CCLRC Daresbury Laboratory (compute, networking, coordination)
– CSAR, run by the University of Manchester and CSC (compute, visualisation)
Computer Servers
~7 TB of memory and ~5K processors in an integrated resource. The TeraGyroid project has access to a substantial fraction of the world's largest supercomputing resources, including the whole of the UK's supercomputing facilities and the USA's TeraGrid machines. The largest simulations are in excess of one billion lattice sites.
Networking
[Map: UK e-Science Grid sites (Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Laboratory and RAL) linked by BT-provisioned bandwidth via NetherLight in Amsterdam to the TeraGrid.]
Applications Porting
– LB3D is written in Fortran90
– of order 128 variables per grid point, so 1 Gpoint ≈ 1 TB (see the arithmetic below)
– various compiler issues had to be overcome at different sites
– site configuration issues are important, e.g. I/O access to high-speed global file systems for checkpoint files
– connectivity of high-speed file systems to the network
– multi-homing was required on several systems to separate the control network from the data network
– port forwarding was required for compute nodes on private networks
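The 1 Gpoint ≈ 1 TB figure follows directly from the per-site state, assuming 8-byte double-precision variables (the slides do not state the precision explicitly):

$$128\ \tfrac{\text{variables}}{\text{site}} \times 8\ \tfrac{\text{B}}{\text{variable}} = 1\ \tfrac{\text{KiB}}{\text{site}}, \qquad 2^{30}\ \text{sites} \times 1\ \tfrac{\text{KiB}}{\text{site}} = 1\ \text{TiB} \approx 1.1\ \text{TB}.$$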
Exploring Parameter Space through Computational Steering
[Figure sequence:]
– Initial condition: random water/surfactant mixture; self-assembly starts.
– Rewind and restart from checkpoint.
– Lamellar phase: surfactant bilayers between water layers.
– Cubic micellar phase, low surfactant density gradient.
– Cubic micellar phase, high surfactant density gradient.
(A sketch of the checkpoint-rewind steering loop follows below.)
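The checkpoint/rewind pattern behind this slide can be sketched as follows; every name here (`sim`, `steerer`, the command kinds) is a hypothetical placeholder standing in for the RealityGrid steering library, not its actual API:

```python
# Sketch of a steer/rewind/restart loop. All object and method names are
# hypothetical placeholders, not the actual RealityGrid steering API.
def steered_run(sim, steerer, steps_per_epoch=10_000, epochs=100):
    checkpoints = []
    for epoch in range(epochs):
        sim.advance(steps_per_epoch)          # run a block of LB time steps
        checkpoints.append(sim.checkpoint())  # persist the full lattice state
        cmd = steerer.poll()                  # non-blocking: any instruction?
        if cmd is None:
            continue                          # nothing from the scientist
        if cmd.kind == "rewind":
            # Abandon an uninteresting trajectory and branch from an earlier
            # state with new coupling parameters chosen by the steerer.
            sim.restore(checkpoints[cmd.epoch])
            sim.set_params(cmd.params)
        elif cmd.kind == "stop":
            break
    return sim
```

The point of the pattern is that a scientist watching the visualisation can prune dead ends early, so machine time goes into the physically interesting regions of parameter space.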
RealityGrid Environment
– computations ran at HPCx, CSAR, SDSC, PSC and NCSA
– visualisation ran at Manchester, UCL, Argonne, NCSA and Phoenix
– scientists steered calculations from UCL and Boston over the Access Grid
– visualisation output and collaboration sessions were multicast to Phoenix and displayed on the show floor at the University of Manchester booth
Visualisation Servers
– amphiphilic fluids produce exotic mesophases with a range of complex morphologies, so visualisation is essential
– the complexity of these data sets (128 variables) makes visualisation a challenge
– rendering uses the VTK library, with patches refreshed each time new data become available (a minimal VTK pipeline is sketched below)
– the video stream is multicast to the Access Grid using the FLXmitter library
– SGI OpenGL Vizserver allows remote control of the visualisation
– visualisation of billion-node models requires 64-bit hardware and multiple rendering units
– achieved visualisation of a 1024³ lattice using a ray-tracing algorithm developed at the University of Utah, on a 100-processor Altix on the show floor at SC'03
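As a rough sketch of what a VTK isosurfacing stage looks like in Python (the file name and isovalue are hypothetical, and the project's actual pipeline additionally streamed patches incrementally and fed the Access Grid multicast):

```python
import vtk

# Minimal VTK isosurface pipeline; the file name and contour value are
# hypothetical, and the real TeraGyroid pipeline was far more elaborate.
reader = vtk.vtkStructuredPointsReader()
reader.SetFileName("lb3d_density_t0600000.vtk")   # one scalar field per file

contour = vtk.vtkContourFilter()                  # marching-cubes isosurface
contour.SetInputConnection(reader.GetOutputPort())
contour.SetValue(0, 0.45)                         # isovalue for the interface

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(contour.GetOutputPort())
mapper.ScalarVisibilityOff()

actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)

window.Render()
interactor.Start()
```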
Grid Software Infrastructure
– various versions of the Globus Toolkit: 2.2.3, 2.2.4, 2.4.3 and 3.1 (including GT2 compatibility bundles)
– used GRAM, GridFTP and Globus I/O with no incompatibilities (an example GridFTP transfer is sketched below)
– did not use MDS, owing to concerns about the robustness and utility of its data
– a 64-bit build of GT2 was required for the AIX (HPCx) system; some grief due to its tendency to require custom-patched versions of third-party libraries
– a lot of system-management effort was required to work with, and around, the toolkit
– a more scalable CA system is needed, one that does not require every system administrator to study everyone else's certificates
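Checkpoint files were moved between sites with GridFTP; a transfer of that kind can be driven from Python as below. The host names and paths are invented for illustration; `globus-url-copy` is the standard GT2 client, with `-p` requesting parallel TCP streams and `-vb` printing transfer performance:

```python
import subprocess

# Invoke GT2's globus-url-copy to move a checkpoint between two sites.
# Hosts and paths are hypothetical.
subprocess.run(
    ["globus-url-copy", "-vb", "-p", "8",
     "gsiftp://hpcx.example.ac.uk/scratch/lb3d/checkpoint_0600k.dat",
     "gsiftp://lemieux.example.edu/scratch/lb3d/checkpoint_0600k.dat"],
    check=True,
)
```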
ANL Royal Society - June 2004 TeraGyroid Network
ANL Royal Society - June 2004 VizEng2 PHOENIX SimEng1 UK SimEng2 PSC Disk1 UK Networking
Networking
– online visualisation requires O(1 Gbps) of bandwidth for the larger problem sizes
– steering requires 100% reliable, near-real-time data transport across the Grid to the visualisation engines
– reliable transfer is achieved using TCP/IP, whose per-segment acknowledgement and retransmission (to detect and repair loss) slows transport, limits data-transfer rates, and limits LB3D steering of larger systems (see the bandwidth-delay estimate below)
– point-to-n-point transport for visualisation, storage and job migration uses n times more bandwidth, since unicast is used
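The TCP limitation is essentially the bandwidth-delay product. Assuming a transatlantic round-trip time of roughly 100 ms (an illustrative figure; the slides report no RTT), a single stream sustaining 1 Gbps needs a window of

$$W = B \times \mathrm{RTT} = 10^{9}\ \text{bit/s} \times 0.1\ \text{s} = 10^{8}\ \text{bit} \approx 12.5\ \text{MB},$$

far larger than the default TCP windows of the era, so an untuned stream spends most of its time stalled waiting for acknowledgements.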
What Was Done?
The TeraGyroid experiment represents the first use of collaborative, steerable, spawned and migrated processes based on capability computing:
– generated 2 TB of data
– explored the multi-dimensional fluid-coupling parameter space with 64³ simulations, accelerated through steering
– studied finite-size periodic-boundary-condition effects, exploring the stability of the density of defects in the 64³ simulations as they are scaled up to 128³, 256³, 512³ and 1024³
– ran 100K to 1,000K time steps
– explored the stability of the crystalline phases to perturbations and variations in effective surfactant temperature
Results so far: 128³ and 256³ simulations are clear of finite-size effects; a perfect crystal did not form in 128³ systems within 600K steps; and statistics of defect numbers, velocities and lifetimes require large systems, as only these contain sufficient defects.
World's Largest Lattice Boltzmann Simulation?
– 1024³ lattice sites
– initial state built by periodically tiling, then perturbing, the 128³ simulations
– finite-size-effect-free dynamics
– 2048 processors and 1.5 TB of memory
– one minute per time step on 2048 processors
– 3000 time steps
– 1.2 TB of visualisation data
– run on LeMieux at Pittsburgh Supercomputing Center
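The memory figure is broadly consistent with the per-site estimate given earlier:

$$1024^{3}\ \text{sites} \times 1\ \tfrac{\text{KiB}}{\text{site}} = 2^{40}\ \text{B} = 1\ \text{TiB} \approx 1.1\ \text{TB},$$

with the remaining ~0.4 TB presumably halo-exchange and working buffers (an inference, not stated in the slides).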
Access Grid Screen at SC'03 during the SC Global Session on Application Steering
Measured Transatlantic Bandwidths during SC'03
Demonstrations and Presentations
Demonstrations of the TeraGyroid experiment at SC'03:
TeraGyroid on the PSC booth:
– Tue 18, 10:00-11:00
– Thu 20, 10:00-11:00
RealityGrid and TeraGyroid on the UK e-Science booth:
– Tue 18, 16:00-16:30
– Wed 19, 15:30-16:00
RealityGrid during the SC'03 poster session:
– Tue 18, 17:00-19:00
HPC Challenge presentations:
– Wed 19, 10:30-12:00
SC Global session on steering:
– Thu 20, 10:30-12:00
Plus demonstrations and real-time output at the University of Manchester and HPCx booths.
Most Innovative Data Intensive Application - SC'03
Project Objectives - How Well Did We Do? (1)
World-class computational science experiment:
– science analysis is ongoing, leading to new insights into the properties of complex fluids at unprecedented scales
– SC'03 award for 'Most Innovative Data Intensive Application'
Enhanced expertise and experience to benefit the UK and USA:
– first transatlantic federation of major HEC facilities
– applications need to be adaptable to different architectures
Inform the construction and operation of national and international grids:
– most insight gained into end-to-end network integration, performance and dual-homed systems
– remote visualisation, steering and checkpointing require high bandwidth that is dedicated and reservable
– results fed directly into the ESLEA proposal to exploit the UKLight optical switched network infrastructure
Stimulate long-term strategic technical collaboration:
– strengthened relationships between the Globus, networking and visualisation groups
Project Objectives - How Well Did We Do? (2)
Support long-term scientific collaborations:
– built on strong and fruitful existing scientific collaborations between researchers in the UK and USA
Experiments with clear scientific deliverables:
– an explicit science plan was published, approved and then executed; data analysis is ongoing
Choice of applications based on community codes:
– experience will benefit other grid-based applications, in particular in the computational engineering community
Inform a future programme of complementary experiments:
– report to be made available on the RealityGrid website
– EPSRC is initiating another call for proposals, not targeting SC'04
Lessons Learned
– How should such projects be supported - full peer review?
– Timescales were very tight (September to November).
– Resource estimates need to be flexible.
– Complementary experiments are needed for the US and UK to reciprocate benefits.
– HPC centres, e-Science and networking groups can work very effectively together on challenging common goals.
– Site configuration issues are very important, especially network access.
– Visualisation capabilities in the UK need upgrading.
– A scalable CA and dual-homed systems are needed.
– Network QoS is very important for checkpointing, remote steering and visualisation.
– Do it again?