1
Pattern-Forming Instabilities in the Swift-Hohenberg Model
Micah Brodsky 6.338 Spring ‘08
2
Swift-Hohenberg Model
- Scalar partial differential equation
- Can be derived as an approximation to Navier-Stokes at the onset of convection
- But used generally as a case study in pattern formation
- Has a Lyapunov functional: relaxes towards a steady state, no turbulence
- But does interesting things along the way
3
The Model: ∂u/∂t = εu - (1 + ∇²)²u + g₂u² - u³
- Expands to: (ε - 1)u - 2∇²u - ∇⁴u + g₂u² - u³
  - Term by term: linear instability (εu), high-frequency stabilization (-(1 + ∇²)²u), low-order nonlinearity (g₂u²), saturation term (-u³)
- First order in time, fourth order in space
- Linear component is spatially selective: preferred length scale
- Saturated by the cubic term
- Like a spatial version of an oscillator – the 1D steady state is exactly an oscillator
- One of the simplest possible pattern-forming equations
4
Let’s try it out… ε = 1.5, g₂ = 0
- Initialized with low-amplitude random perturbations about zero
5
What Do We Understand?
- We can easily predict simple features (sketched below)
  - Instability
  - Characteristic wavelength
- Perturbation analysis gives a bit more
  - Wavelength instabilities (e.g. zigzag and Eckhaus)
  - Higher-order wave effects (e.g. hexagons!)
- But what about high-amplitude behavior?
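The "simple features" follow from linearizing about u = 0; a quick sketch of the standard calculation (not taken from the slides):

```latex
\[
  u \propto e^{\sigma t + i k x}
  \quad\Longrightarrow\quad
  \sigma(k) = \varepsilon - \bigl(1 - k^{2}\bigr)^{2}.
\]
% Growth is fastest at k = 1 (wavelength 2*pi), and sigma(1) = eps is positive
% only for eps > 0: hence the instability threshold and the characteristic
% length scale of the patterns.
```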
6
Scales of order
- Pattern snapshots at ε = 4, ε = 0.5, and ε = 0.1 (figures)
7
Competition Among Patterns
- Quadratic term is responsible for inducing the hexagonal mesh
- ε = 0.1, g₂ = 0.5
- Initialized with low-amplitude random perturbations about zero, along with a strip of -1
8
High-Amplitude Instabilities
- Initially saturates to uniform steady-state solutions
  - 0 is linearly unstable, but ±√(ε - 1) are linearly stable (quick check below)
- But then… (last example; this is a fun one)
- ε = 3, g₂ = 0
- Initialized with low-amplitude random perturbations plus a slight positive bias, along with a strip of -1
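A quick check of those uniform states, assuming g₂ = 0 (for spatially constant u the derivative operator reduces to (1 + ∇²)²u = u):

```latex
\[
  0 = \varepsilon u - u - u^{3} = u\,(\varepsilon - 1 - u^{2})
  \quad\Longrightarrow\quad
  u = 0 \;\;\text{or}\;\; u = \pm\sqrt{\varepsilon - 1}.
\]
% Linearizing about a uniform state u_0 with a perturbation ~ e^{sigma t + i k x}:
%   sigma(k) = eps - (1 - k^2)^2 - 3 u_0^2.
% For u_0 = 0 this is positive near k = 1 whenever eps > 0 (unstable), while the
% +/- sqrt(eps - 1) branches give sigma(k) <= 3 - 2*eps, i.e. linearly stable for
% eps > 3/2, and in particular for the eps = 3 run shown here.
```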
9
Implementation
- Explicit finite-difference solver
  - I.e. u_{i+1}[x] = u_i[x] + dt · ( … u_i[x-1] - 2u_i[x] + u_i[x+1] … )
  - Numerical stability limits the time step and is the bottleneck (the ∇⁴ term makes it hit hard)
- C++ / MPI on SiCortex
  - Domain split into vertical slabs
  - Processors exchange 2-cell-wide boundaries after every time step (one step sketched below)
- I’m too dumb for spectral methods. May try implicit methods with ScaLAPACK, if I can get my head around its interface.
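A minimal sketch of what one explicit step could look like, assuming a 2D row-major grid with a two-cell ghost border (which is what the 2-cell-wide exchange would fill); the names, layout, and the direct 13-point biharmonic stencil are illustrative, not taken from the actual solver:

```cpp
// One explicit Euler step for the expanded right-hand side
//   du/dt = (eps - 1) u - 2 lap(u) - biharm(u) + g2 u^2 - u^3
// on an (nx+4) x (ny+4) row-major grid with spacing h and two ghost cells per side.
#include <vector>

void step(const std::vector<double>& u, std::vector<double>& unew,
          int nx, int ny, double h, double dt, double eps, double g2) {
    const int S = ny + 4;                               // row stride including ghosts
    auto at = [&](int x, int y) { return u[x * S + y]; };
    const double h2 = h * h, h4 = h2 * h2;

    for (int x = 2; x < nx + 2; ++x) {
        for (int y = 2; y < ny + 2; ++y) {
            const double c = at(x, y);
            // 5-point Laplacian
            const double lap = (at(x-1,y) + at(x+1,y) + at(x,y-1) + at(x,y+1) - 4.0*c) / h2;
            // 13-point biharmonic (Laplacian of the Laplacian)
            const double bih = (20.0*c
                                - 8.0*(at(x-1,y) + at(x+1,y) + at(x,y-1) + at(x,y+1))
                                + 2.0*(at(x-1,y-1) + at(x-1,y+1) + at(x+1,y-1) + at(x+1,y+1))
                                + at(x-2,y) + at(x+2,y) + at(x,y-2) + at(x,y+2)) / h4;
            const double rhs = (eps - 1.0)*c - 2.0*lap - bih + g2*c*c - c*c*c;
            unew[x * S + y] = c + dt * rhs;             // explicit Euler: stable dt shrinks like h^4
        }
    }
}
```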
10
Output
- SDL (Simple DirectMedia Layer) for real-time visualization
  - SSH into the rank-0 node; forwarding X to it is a pain, but can be done
- Raw data dump to file, imported into Matlab with fread()
11
Where’s The Bottleneck?
- Worst case communication / computation: ~4 doubles / ~66 flops
- But SiCortex has lots of bandwidth
- Most of the time is spent computing
  - Compiler effects
  - Memory, memory, memory…
12
Compiler
- -O3 goes without saying
- But wait, why is it generating such crummy machine code? (Sometimes you have to look!)
- Pointer analysis!
  - The compiler can use registers and software pipelining efficiently only if it knows it isn’t tripping over its own calculations (i.e. no aliasing)
  - Use __restrict__ pointers wherever possible: means no other pointer points to the same data (sketched below)
  - 50% FLOPS improvement on a simpler kernel (diffusion equation)
  - But… only about 10% here
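A minimal illustration of the idea on a diffusion-style kernel (the function and its arguments are hypothetical; __restrict__ is the GCC/Clang spelling of C99 restrict):

```cpp
// Promising the compiler that 'in' and 'out' never alias lets it keep loaded
// values in registers and software-pipeline the loop instead of reloading
// memory after every store.
void diffuse_step(const double* __restrict__ in,
                  double* __restrict__ out,
                  int n, double dt)
{
    for (int x = 1; x < n - 1; ++x)
        out[x] = in[x] + dt * (in[x-1] - 2.0*in[x] + in[x+1]);
}
```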
13
Memory Hierarchy
- Profile with papiex -a
- L1 cache easily overrun
  - Solution: block striping (see the sketch below)
  - Adds some overhead, but about 10% net savings on “tall” problems (why not more?)
- L2 usually okay (though not if we go 3D)
- Memory access stencil: (figure)
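Roughly what block striping could look like on top of the earlier stencil sketch; BLOCK and the update_cell helper are assumptions, not the real code:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-cell update: the stencil from the earlier sketch.
double update_cell(const std::vector<double>& u, int x, int y, int stride);

// Walk the slab in narrow column stripes so that the few rows the stencil
// touches stay resident in L1 between successive x iterations.
void step_striped(const std::vector<double>& u, std::vector<double>& unew,
                  int nx, int ny) {
    const int S = ny + 4;                        // row stride with ghosts
    const int BLOCK = 64;                        // columns per stripe (tune to L1 size)
    for (int y0 = 2; y0 < ny + 2; y0 += BLOCK) {
        const int y1 = std::min(y0 + BLOCK, ny + 2);
        for (int x = 2; x < nx + 2; ++x)
            for (int y = y0; y < y1; ++y)
                unew[x * S + y] = update_cell(u, x, y, S);
    }
}
```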
14
Communication
- SiCortex has a lot of bandwidth
  - 1 double for every 2 clock cycles (without congestion); this is plenty to keep us busy
- But communication is still up to 25% of runtime (measure with mpipex, subtract startup overhead)
- We can still gain by overlapping communication and computation (the baseline blocking exchange is sketched below)
- Or can we?
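For reference, the baseline fully blocking per-step exchange might look like this; the neighbour ranks, strides, and two-row ghost layout are assumptions carried over from the earlier sketch:

```cpp
#include <mpi.h>

// Each rank sends its two outermost owned rows to its neighbours and receives
// theirs into its ghost rows, once per time step (MPI_PROC_NULL at the ends).
void exchange_halos(double* u, int nx, int ny, int left, int right, MPI_Comm comm) {
    const int S = ny + 4;                         // row stride with ghosts
    const int count = 2 * S;                      // two ghost rows per side
    // my low edge -> left neighbour's high ghosts; right neighbour's low edge -> my high ghosts
    MPI_Sendrecv(&u[2 * S],        count, MPI_DOUBLE, left,  0,
                 &u[(nx + 2) * S], count, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    // my high edge -> right neighbour's low ghosts; left neighbour's high edge -> my low ghosts
    MPI_Sendrecv(&u[nx * S],       count, MPI_DOUBLE, right, 1,
                 &u[0],            count, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```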
15
Overlapped Communication
- Solution: compute edges first, then start non-blocking transfers (Isend, Irecv)
  - Wait for the results only when we need to compute the next edges (sketched below)
- Problem! This kills our L1 cache performance gains
  - We have to do the borders in separate stripes, or else there’s no computation to overlap when it’s most needed
  - Communication is heaviest when the middle block is narrowest
  - Can’t overlap everything then; might do better by splitting into chunks, but then we’d have startup overhead
- Net result: 5-7% gain on large problems, 13% on large, “thin” problems
- SiCortex network is just too fast: even with stupid communication, it’s not much overhead
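A sketch of that edge-first overlap under the same assumed layout as before; compute_rows is a hypothetical helper that applies the stencil update to a range of rows:

```cpp
#include <mpi.h>

// Hypothetical helper: apply the stencil update to rows [x0, x1).
void compute_rows(const double* u, double* unew, int x0, int x1, int ny);

void overlapped_step(double* u, double* unew, int nx, int ny,
                     int left, int right, MPI_Comm comm) {
    const int S = ny + 4;
    MPI_Request req[4];

    // Edges first, so their new values can travel while we do the bulk.
    compute_rows(u, unew, 2, 4, ny);              // low edge (rows 2..3)
    compute_rows(u, unew, nx, nx + 2, ny);        // high edge (rows nx..nx+1)

    MPI_Isend(&unew[2 * S],        2 * S, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Isend(&unew[nx * S],       2 * S, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Irecv(&unew[(nx + 2) * S], 2 * S, MPI_DOUBLE, right, 0, comm, &req[2]);
    MPI_Irecv(&unew[0],            2 * S, MPI_DOUBLE, left,  1, comm, &req[3]);

    compute_rows(u, unew, 4, nx, ny);             // interior overlaps with the transfers

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);     // ghosts must arrive before the next step
}
```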
16
The Real Problems
- Kernel has too many instructions!
  - Processor has poor instruction-level parallelism (ILP)
  - Not to mention the time step is *really* slow
  - Implicit / Crank–Nicolson methods? Spectral methods?
- Scalability
  - Slab slicing limits us to width / 2 processors
  - Switch to block decomposition?
- 3D
  - I really want to see 3D results. In 3D. =)
- Future work!