George Em Karniadakis, Division of Applied Mathematics. The CRUNCH group: Cross-Site Simulations on the TeraGrid. Spectral elements; micro/nano-fluidics; parallel computing.
Grand-Challenge Problem 1: Turbulence – Drag Crisis (Tightly-Coupled Problem). Turbulence: the last frontier in classical physics. Climate, environment, transport, energy, … Re = 300,000 (CPU ~ Re^3) requires 20 billion DOFs; memory: 4 TBytes.
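The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the CPU ~ Re^3 scaling quoted on the slide, takes Re = 10,000 (the maximum DNS Reynolds number quoted on the "DNS versus Experiments" slide) as the reference point, and assumes 1 TByte = 10^12 bytes and 8-byte double-precision values. The function names are illustrative, not taken from any CRUNCH code.

```python
def cpu_cost_ratio(re_target, re_reference, exponent=3):
    """Relative CPU cost under the assumed scaling CPU ~ Re**exponent."""
    return (re_target / re_reference) ** exponent


def bytes_per_dof(memory_bytes, dofs):
    """Average memory footprint per degree of freedom."""
    return memory_bytes / dofs


# Cost of Re = 300,000 relative to Re = 10,000 under CPU ~ Re^3.
ratio = cpu_cost_ratio(300_000, 10_000)

# 4 TBytes (assumed 4e12 bytes) spread over 20 billion DOFs.
per_dof = bytes_per_dof(4e12, 20e9)

# Equivalent number of 8-byte doubles stored per DOF.
doubles_per_dof = per_dof / 8

print(f"CPU cost ratio: {ratio:,.0f}x")
print(f"Memory per DOF: {per_dof:.0f} bytes (~{doubles_per_dof:.0f} doubles)")
```

Under these assumptions the drag-crisis run costs roughly 27,000 times the CPU time of the Re = 10,000 case and stores about 200 bytes (25 doubles) per DOF.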
Grand-Challenge Problem 2: Human Arterial Tree (Loosely-Coupled Problem). Wave propagation in a model of the arterial circulation (data for 55 main arteries from J.J. Wang and K. Parker, 1997).
First Parallel TeraGrid Paradigm. [Diagram labels: TG site; whole flow domain; NCSA IA64; SDSC IA64; in-site communication (at each site); cross-site communication; all-to-all.]
DNS versus Experiments: max Re = 10,000. [Figure panels: DNS; experiments (Rockwell, 2004); energy spectrum, marked -5/3 (black: simulation, blue: experiment); RMS velocity.]
Turbulence: Single-Site Performance. Panels: fixed problem size; fixed workload. PSC: Compaq Alpha EV68, 1 GHz; 300 million DOFs, 2-level MPI. MPICH-G2 and MPI perform similarly (SDSC/IA-64).
Turbulence: Cross-Site Performance. Panels: fixed problem size; fixed workload. Half of the processors from NCSA, half from SDSC; Intel IA-64 processors (Itanium-2, 1.5 GHz). Slow-down factor: 1.5. [Diagram labels: SDSC TG; NCSA TG; FFT; matrix transposition.]
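The slide lists FFT and matrix transposition next to the 1.5 slow-down. As a rough illustration of why a transposition is communication-heavy when ranks are split between two sites, the sketch below simulates an all-to-all transpose of a row-slab-decomposed square matrix in plain Python. It is not the group's solver and uses no MPI; the function names, the slab layout, and the "first half of the ranks on one site" placement are assumptions made for this example.

```python
def slab_transpose(local_rows, num_ranks):
    """Simulate an all-to-all transpose of a square matrix.

    local_rows[r] holds the contiguous rows owned by rank r.
    Returns the transposed matrix in the same row-slab layout.
    """
    m = len(local_rows[0])          # rows owned by each rank
    n = m * num_ranks               # global matrix dimension

    # Step 1: each source rank cuts its slab into num_ranks column blocks;
    # block dst is the message sent from src to dst (the all-to-all).
    messages = [
        [[row[dst * m:(dst + 1) * m] for row in local_rows[src]]
         for dst in range(num_ranks)]
        for src in range(num_ranks)
    ]

    # Step 2: each destination rank transposes every received block
    # locally and places it at the column offset of its source rank.
    result = []
    for dst in range(num_ranks):
        slab = [[0] * n for _ in range(m)]
        for src in range(num_ranks):
            block = messages[src][dst]
            for i in range(m):
                for j in range(m):
                    slab[j][src * m + i] = block[i][j]
        result.append(slab)
    return result


def count_cross_site_messages(num_ranks):
    """Messages crossing sites if the first half of the ranks is on one site."""
    half = num_ranks // 2
    return sum(
        1
        for src in range(num_ranks)
        for dst in range(num_ranks)
        if (src < half) != (dst < half)
    )


# Usage: a 4 x 4 matrix on 4 ranks (one row each), two ranks per site.
matrix = [[4 * r + c for c in range(4)] for r in range(4)]
slabs = [[row] for row in matrix]
transposed = [row for slab in slab_transpose(slabs, 4) for row in slab]
print(transposed)
print(count_cross_site_messages(4), "of", 4 * 3, "off-rank messages cross sites")
```

With an even split of P ranks, P^2/2 of the P(P-1) off-rank messages cross the site boundary, so more than half of the transpose traffic travels over the wide-area link in this model.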
1D Model – Sherwin et al. / Imperial College. [Figure: P(t) and U(t) waveforms at the ascending aorta, thoracic aorta, femoral, and tibial arteries; labels W1, W2; inflow conditions U(t); outflow conditions (peripheral resistance).]
Platelet Aggregation in Arterioles and Venules. Parameters: vessel diameter - 50 µm, vessel length µm, blood velocity µm/s, platelet diameter - 3 µm, platelet concentration /mm^3, platelet density fluid density. Simulation time - 28 s. [Figure labels: FLOW; venules; platelet aggregate.]
Growth Rate vs. Blood Velocity Experiments: Begent and Born, Nature, Vol. 227, No. 5261, pp , 1970
Second Parallel TeraGrid Paradigm Multiscale Simulation of Arterial Tree
Arterial-Tree: Cross-Site Performance (Homogeneous Network). Panels: fixed problem size; fixed workload. Three arteries; 4 million DOFs per artery. 1 CPU/node on ANL; 2 CPUs/node on NCSA/SDSC. No slow-down, full scalability. [Diagram labels: SDSC TG; ANL TG; NCSA TG.]
Arterial-Tree: Cross-Site Performance (Heterogeneous Network). PSC connects to TG via an application gateway (qsockets). Two arteries per site. PSC processor: 2 GF vs. 6 GF for IA-64. [Diagram labels: SDSC TG; NCSA TG; PSC TG.]
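The 2 GF vs. 6 GF figures suggest a simple load-balancing estimate. The sketch below is only a back-of-the-envelope model: it assumes every site carries the same work (two arteries per site, as on the slide, with equal DOFs per artery as an added assumption), that the quoted peak rates stand in for sustained rates, and that scaling within a site is perfect. The function name and the 16-processor reference count are hypothetical.

```python
import math


def balanced_processor_counts(peak_gflops, reference_site, reference_procs):
    """Processors each site needs to match the reference site's aggregate rate.

    peak_gflops maps site name -> per-processor peak rate in GFlop/s.
    Equal work per site and perfect in-site scaling are assumed.
    """
    target_rate = peak_gflops[reference_site] * reference_procs
    return {
        site: math.ceil(target_rate / rate)
        for site, rate in peak_gflops.items()
    }


# Per-processor peak rates from the slide: 6 GF (IA-64) and 2 GF (PSC).
peak = {"NCSA": 6.0, "SDSC": 6.0, "PSC": 2.0}

# Hypothetical reference: 16 IA-64 processors at NCSA.
counts = balanced_processor_counts(peak, "NCSA", 16)
print(counts)
```

In this model PSC needs three times as many processors as an IA-64 site to finish its two arteries in the same time per step.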
New Unique Capability
Potentially unlimited scalability; enabling technology
–Integrate “real and virtual” in projects like:
–Digital human, digital ocean, digital space, …
Predictability and Uncertainty
–Stochastic simulations
–Prediction vs. postdiction
–Risk-based/reliability-based design
–Sensitivity analysis: steering of experiments (e.g., DDDAS concept)
Inverse Problems
–Engineering design
–Biomedical sciences
–Geological/climate modeling
What Users Need
Debuggers for TG (a la TotalView)
New topology-aware parallel algorithms
Sustained network/cluster performance
TG visualization capability
Middleware
–Robust MPICH-G2
–Co-scheduling
–Network & Globus diagnostics
–Authentication/Security (often in conflict)
Consultants/Referees with TG expertise