1
Parallelization work to be completed
1. Finalize the parallel domain decomposition scheme.
2. TIMCOM preprocessor in F95 with dynamically allocated memory (inmets, indata, bounds).
3. TIMCOM main code in F95 with dynamically allocated memory.
4. EVP solver in F95 with dynamically allocated memory.
5. Subroutines a2o/o2a with cross-CPU-core data exchange, or
6. Rewrite TIMCOM so that each CPU core can handle the northern and southern hemispheres at the same time.
7. NetCDF input and output.
2
Domain decomposition options
1) Adopt the TIMCOM layout. The a2o/o2a subroutines must be modified so that TIMCOM and ECHAM data can be exchanged across nodes. Advantage: the ghost-zone transfer volume in the y direction is half that of option 2). Disadvantage: an extra (llon*llat - 2*ng*llat) of data must be exchanged across cores.
2) Adopt the ECHAM layout. In this case every CPU core must compute ocean subdomains in both hemispheres at the same time, which requires modifying part of the TIMCOM code. Advantage: for the same number of CPUs it is faster than 1), because less data is exchanged across cores when llon > 2.
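A minimal sketch that simply evaluates the data volumes quoted above for a given local subdomain; llon, llat, and ng are assumed to be the local longitude count, latitude count, and ghost-zone width, and the ghost-zone volume formula used for option 2 is an illustrative assumption, not taken from the slides:

```fortran
program exchange_volume
  ! Hypothetical subdomain size and ghost-zone width (illustrative values only).
  implicit none
  integer, parameter :: llon = 64, llat = 32, ng = 2
  integer :: ghost_y_opt1, ghost_y_opt2, extra_opt1

  ghost_y_opt2 = 2 * ng * llon          ! assumed ghost-row volume per step, option 2
  ghost_y_opt1 = ghost_y_opt2 / 2       ! half of option 2, as stated on the slide
  extra_opt1   = llon*llat - 2*ng*llat  ! extra cross-core a2o/o2a exchange, option 1

  print '(a,i8)', 'option 1 ghost-zone volume  : ', ghost_y_opt1
  print '(a,i8)', 'option 1 extra a2o/o2a data : ', extra_opt1
  print '(a,i8)', 'option 2 ghost-zone volume  : ', ghost_y_opt2
end program exchange_volume
```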
3
[Figure: ECHAM processor decomposition with nproca = 4 and nprocb = 3. The 12 processors are laid out on a glon x glat mesh, and the numbering is mirrored about the equator (EQ), so each processor handles symmetric latitude bands in the northern and southern hemispheres.]
4
[Figure: TIMCOM local grid layout in x and y. Shown are the index bounds I0, I1, J0, J1, the boundary indices jon0, jos0, jow0, ioe0, the ghost-zone width ng, the latitude arrays YDEG/YVDEG and Y/YV (e.g. YVDEG(1) ... YVDEG(J0), YV(1) ... YV(J0)), the grid spacings DX, DY, DYV, and the domain edges X0DEG, X1DEG, Y0DEG, Y1DEG.]
6
Parallel considerations
Many settings in this version are still problematic, so it crashes almost immediately, but it is still worth trying. If possible, it is also suggested to merge the subroutines in mo_ocean that duplicate the functionality of the original TIMCOM into the standalone parallelized TIMCOM version, to make testing easier and to check that it behaves correctly. In particular, we would like to develop an ng >= 2 version of the pure ocean model in F90 with dynamically allocated memory. These tests will help us when we later merge back into ECHAM.
7
Information for the whole ECHAM domain
nlon : number of longitudes of the global domain
nlat : number of latitudes of the global domain
nlev : number of levels of the global domain
Information valid for all processes of a model instance
nproca : number of processors for the dimension that counts longitudes
nprocb : number of processors for the dimension that counts latitudes
d_nprocs : number of processors used in the model domain (nproca × nprocb)
spe, epe : index numbers of the first and last processors that handle this model domain
mapmesh(ib,ia) : array mapping from a logical 2-D mesh to the processor index numbers within the decomposition table of the global decomposition; ib = 1, nprocb and ia = 1, nproca
8
General local information
pe : processor identifier; this number is used in the MPI send and receive routines
set_b : index of the processor in the direction of longitudes; this number determines the location within the array mapmesh; processors with ascending numbers handle subdomains with increasing longitudes
set_a : index of the processor in the direction of latitudes; this number determines the location within the array mapmesh; processors with ascending numbers handle subdomains with decreasing absolute latitudes
9
Grid space decomposition
nglat, nglon : number of latitudes and longitudes in grid space handled by this processor
nglpx : number of longitudes allocated
glats(1:2), glate(1:2) : start and end values of the global latitude indices
glons(1:2), glone(1:2) : start and end values of the global longitude indices
glat(1:nglat) : global latitude index
glon(1:nglon) : offset to the global longitude index
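As a reading aid, the fields listed on these three slides can be collected into a single derived type. This is only a sketch of the layout described above, not ECHAM's actual mo_decompose declaration; names and kinds are assumptions:

```fortran
! Sketch of a decomposition record holding the quantities described above.
! This mirrors the slide text only; it is not the actual ECHAM type definition.
module decomposition_sketch
  implicit none

  type :: local_decomposition
    ! global domain
    integer :: nlon, nlat, nlev
    ! valid for all processes of a model instance
    integer :: nproca, nprocb, d_nprocs
    integer :: spe, epe
    integer, allocatable :: mapmesh(:,:)      ! (nprocb, nproca)
    ! general local information
    integer :: pe, set_a, set_b
    ! grid space decomposition
    integer :: nglat, nglon, nglpx
    integer :: glats(2), glate(2)             ! global latitude index ranges
    integer :: glons(2), glone(2)             ! global longitude index ranges
    integer, allocatable :: glat(:), glon(:)  ! per-row/column global indices
  end type local_decomposition

end module decomposition_sketch
```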
10
Variables in ECHAM's memory_g3b (e.g. sitwt, sitwu) are all local variables. They are not scattered from a main process and later collected from the individual processors; each node computes them separately. However, ECHAM's data layout still differs from TIMCOM's.
11
The Lin-Rood Finite Volume (FV) Dynamical Core: Tutorial
Christiane Jablonowski, National Center for Atmospheric Research, Boulder, Colorado
NCAR Tutorial, May 31, 2005
12
Topics that we discuss today
The Lin-Rood Finite Volume (FV) dynamical core
– History: where, when, who, ...
– Equations & some insights into the numerics
– Algorithm and code design
The grid
– Horizontal resolution
– Grid staggering: the C-D grid concept
– Vertical grid and remapping technique
Practical advice when running the FV dycore
– Namelist and netcdf variables (input & output)
– Dynamics - physics coupling
Hybrid parallelization concept
– Distributed-shared memory parallelization approach: MPI and OpenMP
Everything you would like to know
13
Who, when, where, ...
FV transport algorithm developed by S.-J. Lin and Ricky Rood (NASA GSFC) in 1996
2D shallow water model in 1997
3D FV dynamical core around 1998/1999
Until 2000: FV dycore mainly used in the data assimilation system at NASA GSFC
Also: transport scheme in 'Impact', offline tracer transport
In 2000: FV dycore was added to NCAR's CCM3.10 (now CAM3)
Today (2005): the FV dycore
– might become the default in CAM3
– is used in WACCM
– is used in the climate model at GFDL
14
Dynamical cores of General Circulation Models
[Figure: schematic splitting a GCM into its dynamics and physics components.]
FV: no explicit diffusion (besides divergence damping)
15
The NASA/NCAR finite volume dynamical core
3D hydrostatic dynamical core for climate and weather prediction:
– 2D horizontal equations are very similar to the shallow water equations
– 3rd dimension in the vertical direction is a floating Lagrangian coordinate: pure 2D transport with vertical remapping steps
Numerics: finite volume approach
– conservative and monotonic 2D transport scheme
– upwind-biased orthogonal 1D fluxes, operator splitting in 2D
– van Leer second order scheme for time-averaged numerical fluxes
– PPM third order scheme (piecewise parabolic method) for prognostic variables
– staggered grid (Arakawa D-grid for prognostic variables)
16
The 3D Lin-Rood Finite-Volume Dynamical Core
Momentum equation in vector-invariant form
Continuity equation
Thermodynamic equation, also used for tracers (replace Θ by the tracer mixing ratio)
The prognostic variables are:
– δp : pressure thickness
– Θ = T p^(-κ) : scaled potential temperature
Pressure gradient term in finite volume form
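The equations on this slide were figures in the original. The flux-form continuity and thermodynamic equations below are the standard ones of Lin and Rood (1996) and Lin (2004); the momentum equation is only written schematically in vector-invariant form, with gradients taken along the Lagrangian coordinate surfaces (the exact finite-volume pressure-gradient discretization is the subject of Lin 1997):

```latex
% Flux-form continuity and thermodynamic equations (same form for tracers q):
\frac{\partial\,\delta p}{\partial t} + \nabla\cdot\!\left(\vec{V}\,\delta p\right) = 0,
\qquad
\frac{\partial\,(\delta p\,\Theta)}{\partial t} + \nabla\cdot\!\left(\vec{V}\,\delta p\,\Theta\right) = 0,
\qquad \Theta = T\,p^{-\kappa}.

% Momentum equation in vector-invariant form (schematic):
\frac{\partial \vec{V}}{\partial t} =
  -\,\Omega\,\hat{k}\times\vec{V} \;-\; \nabla K \;-\; \nabla\Phi \;-\; \frac{1}{\rho}\nabla p,
\qquad \Omega = \zeta + f,\quad K = \tfrac{1}{2}\,\vec{V}\cdot\vec{V}.
```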
17
Finite volume principle
Continuity equation in flux form.
Integrate over one time step Δt and the 2D finite volume with area A, then rearrange:
– time-averaged numerical flux
– spatially-averaged pressure thickness
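The equations themselves were figures; a reconstruction consistent with the text on the slide is:

```latex
% Continuity equation in flux form:
\frac{\partial\,\delta p}{\partial t} + \nabla\cdot\!\left(\vec{V}\,\delta p\right) = 0 .

% Integrate over one time step \Delta t and a finite volume of area A, and rearrange:
\overline{\delta p}^{\,n+1} = \overline{\delta p}^{\,n}
  - \frac{1}{A}\int_{t}^{t+\Delta t}\!\!\int_{A}
      \nabla\cdot\!\left(\vec{V}\,\delta p\right)\,dA\,dt,
\qquad
\overline{\delta p} = \frac{1}{A}\int_{A} \delta p \; dA .
```

The overline denotes the spatially-averaged pressure thickness, and the time integral defines the time-averaged numerical flux.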
18
Finite volume principle
Apply the Gauss divergence theorem (n is the outward unit normal vector on the cell boundary), then discretize.
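Again the equation was a figure; the step the slide refers to is, in standard notation:

```latex
\int_{A} \nabla\cdot\!\left(\vec{V}\,\delta p\right)\,dA
  = \oint_{\partial A} \left(\delta p\,\vec{V}\right)\cdot\vec{n}\;dl ,
```

where n is the outward unit normal on the cell boundary ∂A.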
19
Orthogonal fluxes across cell interfaces
F_{i-1/2,j}, F_{i+1/2,j} : fluxes in the x direction across the interfaces of cell (i,j)
G_{i,j-1/2}, G_{i,j+1/2} : fluxes in the y direction
The fluxes are upwind-biased (they depend on the wind direction), and the flux form ensures mass conservation.
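A reconstruction of the discretized update the slide sketches, written here with F and G as time-averaged, upwind-biased fluxes per unit length of cell face (the limiters applied to them are described on the following slides):

```latex
\delta p^{\,n+1}_{i,j} = \delta p^{\,n}_{i,j}
  - \frac{\Delta t}{A_{i,j}}
    \Big[ \big(F_{i+1/2,j} - F_{i-1/2,j}\big)\,\Delta y
        + \big(G_{i,j+1/2} - G_{i,j-1/2}\big)\,\Delta x \Big],
\qquad A_{i,j} = \Delta x\,\Delta y .
```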
20
Quasi semi-Lagrangian approach in x direction
CFL_x = u Δt/Δx > 1 is possible: implemented as an integer shift plus a fractional flux calculation (e.g. the flux F_{i-5/2,j} is used instead of F_{i-1/2,j} in the figure)
CFL_y = v Δt/Δy < 1 is required
21
Numerical fluxes & subgrid distributions
1st order upwind – constant subgrid distribution
2nd order van Leer – linear subgrid distribution
3rd order PPM (piecewise parabolic method) – parabolic subgrid distribution
'Monotonicity' versus 'positive definite' constraints
Numerical diffusion
Explicit time stepping scheme: requires short time steps that are stable for the fastest waves (e.g. gravity waves)
CGD web page for CAM3: http://www.ccsm.ucar.edu/models/atm-cam/docs/description/
22
Subgrid distributions: constant (1st order)
[Figure: piecewise-constant reconstruction over cells x1-x4, advected by wind u.]
23
Subgrid distributions: piecewise linear (2nd order)
[Figure: piecewise-linear (van Leer) reconstruction over cells x1-x4, advected by wind u.]
See details in van Leer 1977
24
Subgrid distributions: piecewise parabolic (3rd order)
[Figure: piecewise-parabolic (PPM) reconstruction over cells x1-x4, advected by wind u.]
See details in Carpenter et al. 1990 and Colella and Woodward 1984
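For reference, the parabolic reconstruction of Colella and Woodward (1984) inside a cell with mean value q̄ and (limited) edge values q_L, q_R is:

```latex
q(\xi) = q_L + \xi\,\big(\Delta q + q_6\,(1-\xi)\big), \qquad \xi\in[0,1],
\qquad \Delta q = q_R - q_L, \qquad
q_6 = 6\left(\bar{q} - \tfrac{1}{2}\,(q_L + q_R)\right).
```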
25
Monotonicity constraint
[Figure: van Leer piecewise-linear reconstruction over cells x1-x4 with the monotonicity constraint applied; reconstructions that would create new extrema at the cell edges are not allowed.]
Prevents over- and undershoots
Adds diffusion
See details of the monotonicity constraint in van Leer 1977
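The figure itself was not recoverable. One common form of the limited slope used with van Leer-type linear reconstructions (the monotonized central limiter, shown here as a representative example rather than the exact variant used in the FV core) is:

```latex
\Delta q_i^{\,\mathrm{mono}} =
  \operatorname{sign}(\Delta q_i)\,
  \min\!\Big( |\Delta q_i|,\; 2\,|q_i - q_{i-1}|,\; 2\,|q_{i+1} - q_i| \Big),
\qquad
\Delta q_i = \tfrac{1}{2}\,(q_{i+1} - q_{i-1}),
```

applied only where (q_{i+1} - q_i)(q_i - q_{i-1}) > 0, and set to zero otherwise (local extremum).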
26
Simplified flow chart
[Flow chart: stepon calls dynpkg and physpkg. Inside dynpkg: cd_core (c_sw, Δt/2 only: compute C-grid time-mean winds; d_sw, full Δt: update all D-grid variables; subcycled), trac2d, and te_map (vertical remapping). d_p_coupling and p_d_coupling connect the dynamics to physpkg.]
27
Grid staggerings (after Arakawa)
[Figure: locations of the wind components u, v and the scalars on the Arakawa A, B, C and D grids.]
28
Regular latitude - longitude grid
Converging grid lines at the poles decrease the physical grid spacing Δx
Digital and Fourier filters remove unstable waves at high latitudes
Pole points are mass points
29
Typical horizontal resolutions
The time step is the 'physics' time step; the dynamics are subcycled using the time step Δt/nsplit, and 'nsplit' is typically 8 or 10.
CAM3: check (dtime = 1800 s due to physics?)
WACCM: check (nsplit = 4, dtime = 1800 s for 2° x 2.5°?)
Defaults:
Δlat x Δlon | Lat x Lon | Max Δx (km) | Δt (s) | ≈ spectral
4° x 5°     | 46 x 72   | 556         | 7200   | T21 (32 x 64)
2° x 2.5°   | 91 x 144  | 278         | 3600   | T42 (64 x 128)
1° x 1.25°  | 181 x 288 | 139         | 1800   | T85 (128 x 256)
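As a sanity check on the 'Max Δx' column, the grid spacing in longitude at the equator (Earth radius a ≈ 6371 km) is:

```latex
\Delta x_{\max} \approx \frac{2\pi a}{360^{\circ}}\,\Delta\lambda
  \approx 111.2\ \mathrm{km/deg}\times \Delta\lambda,
\qquad
\Delta\lambda = 5^{\circ} \Rightarrow \Delta x_{\max}\approx 556\ \mathrm{km},
\quad
\Delta\lambda = 1.25^{\circ} \Rightarrow \Delta x_{\max}\approx 139\ \mathrm{km}.
```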
30
Idealized baroclinic wave test case Jablonowski and Williamson 2005 The coarse resolution does not capture the evolution of the baroclinic wave
31
Idealized baroclinic wave test case Finer resolution: Clear intensification of the baroclinic wave
32
Idealized baroclinic wave test case Finer resolution: clear intensification of the baroclinic wave; the solution starts to converge
33
Idealized baroclinic wave test case Baroclinic wave pattern converges
34
Idealized baroclinic wave test case: Convergence of the FV dynamics
The solution starts converging at 1°
Global l2 error norms of ps
The shaded region indicates the uncertainty of the reference solution
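The global l2 error norm of the surface pressure is of the standard area-weighted form (written here from that convention; see Jablonowski and Williamson 2005 for the exact definition used):

```latex
l_2\big(p_s\big) =
\left[ \frac{\displaystyle\oint \big(p_s - p_s^{\mathrm{ref}}\big)^{2}\, dA}
            {\displaystyle\oint dA} \right]^{1/2},
```

where p_s^ref is a high-resolution reference solution and the integrals are taken over the sphere.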
35
Floating Lagrangian vertical coordinate
2D transport calculations with moving finite volumes (Lin 2004)
Layers are material surfaces, no vertical advection
Periodic re-mapping of the Lagrangian layers onto the reference grid
WACCM: 66 vertical levels with model top around 130 km
CAM3: 26 levels with model top around 3 hPa (40 km)
http://www.ccsm.ucar.edu/models/atm-cam/docs/description/
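The reference grid the layers are remapped to is CAM's hybrid pressure coordinate; with the standard CAM convention the interface pressures and layer thicknesses are:

```latex
p_{k+1/2} = A_{k+1/2}\,p_{0} + B_{k+1/2}\,p_{s},
\qquad
\delta p_{k} = p_{k+1/2} - p_{k-1/2},
```

with fixed coefficients A, B, the reference pressure p_0, and the surface pressure p_s.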
36
Physics - Dynamics coupling
Prognostic data are vertically remapped (in cd_core) before dp_coupling is called (in dynpkg)
The vertical remapping routine computes the vertical velocity and the surface pressure ps
d_p_coupling and p_d_coupling (module dp_coupling) are the interfaces to the CAM3/WACCM physics package
Copy / interpolate the data from the 'dynamics' data structure to the 'physics' data structure (chunks), A-grid
Time-split physics coupling:
– instantaneous updates of the A-grid variables
– the order of the physics parameterizations matters
– physics tendencies for the u & v updates on the D grid are collected
37
Practical tips
Namelist variables:
– What do IORD, JORD, KORD mean? IORD and JORD at the model top are different (see cd_core.F90)
– Relationship between dtime, nsplit (what happens if you don't select nsplit, or set nsplit = 0? the default is computed in the routine d_split in dynamics_var.F90), and the time interval for the physics & vertical remapping step
Input / Output:
– Initial conditions: the staggered wind components US and VS are required (D-grid)
– The wind at the poles is not predicted but derived
User's Guide: http://www.ccsm.ucar.edu/models/atm-cam/docs/usersguide/
38
Practical tips
Namelist variables: IORD, JORD, KORD determine the numerical scheme
– IORD: scheme for the flux calculations in x direction
– JORD: scheme for the flux calculations in y direction
– KORD: scheme for the vertical remapping step
Available options:
-2 : linear subgrid, van Leer, unconstrained
 1 : constant subgrid, 1st order
 2 : linear subgrid, van Leer, monotonicity constraint (van Leer 1977)
 3 : parabolic subgrid, PPM, monotonic (Colella and Woodward 1984)
 4 : parabolic subgrid, PPM, monotonic (Lin and Rood 1996, see FFSL3)
 5 : parabolic subgrid, PPM, positive definite constraint
 6 : parabolic subgrid, PPM, quasi-monotone constraint
Defaults: 4 (PPM) on the D grid (d_sw), -2 on the C grid (c_sw)
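A minimal namelist sketch of how these options might be set together with dtime and nsplit; the group name and variable names follow the slide, but they should be checked against the CAM3 User's Guide linked above before use:

```fortran
! Illustrative namelist fragment only; group and variable names as suggested
! by the slides, not verified against a specific CAM3 release.
&camexp
  dtime  = 1800      ! physics time step in seconds
  nsplit = 8         ! dynamics subcycles per physics step (0 = let the model choose)
  iord   = 4         ! PPM, monotonic (Lin and Rood 1996) for x fluxes
  jord   = 4         ! same scheme for y fluxes
  kord   = 4         ! scheme for the vertical remapping step
/
```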
39
'Hybrid' Computer Architecture
SMP: symmetric multi-processor
Hybrid parallelization technique possible:
– shared memory (OpenMP) within a node
– distributed memory approach (MPI) across nodes
Example: NCAR's Bluesky (IBM) with 8-way and 32-way nodes
40
Schematic parallelization technique
1D distributed-memory parallelization (MPI) across the latitudes:
[Figure: the latitude-longitude domain from the South Pole (SP) over the equator (Eq.) to the North Pole (NP) is split into latitude bands, one band per MPI process (Proc. 1-4), each band covering all longitudes.]
41
Schematic parallelization technique
Each MPI domain contains 'ghost cells' (halo regions): copies of the neighboring data that belong to different processors
[Figure: the latitude band of Proc. 2, covering all longitudes, extended by 3 ghost-cell rows on each side, as needed for PPM.]
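A minimal sketch of such a halo exchange in Fortran with MPI, assuming a 1D decomposition across latitudes and 3 ghost rows as on the slide; this is illustrative only and is not the CAM3/FV communication code:

```fortran
program halo_exchange_sketch
  ! Illustrative 1D latitude decomposition with ng ghost rows on each side.
  use mpi
  implicit none
  integer, parameter :: nlon = 144, nlat_loc = 16, ng = 3
  real    :: q(nlon, 1-ng:nlat_loc+ng)         ! local field plus halos
  integer :: ierr, rank, nproc, north, south
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  ! Neighbors in the latitude direction (MPI_PROC_NULL at the poles).
  south = rank - 1; if (south <  0)     south = MPI_PROC_NULL
  north = rank + 1; if (north >= nproc) north = MPI_PROC_NULL

  q = real(rank)                               ! dummy data

  ! Send my northernmost ng rows north and receive my southern halo, then
  ! send my southernmost ng rows south and receive my northern halo.
  call MPI_Sendrecv(q(:, nlat_loc-ng+1:nlat_loc), nlon*ng, MPI_REAL, north, 1, &
                    q(:, 1-ng:0),                 nlon*ng, MPI_REAL, south, 1, &
                    MPI_COMM_WORLD, status, ierr)
  call MPI_Sendrecv(q(:, 1:ng),                   nlon*ng, MPI_REAL, south, 2, &
                    q(:, nlat_loc+1:nlat_loc+ng), nlon*ng, MPI_REAL, north, 2, &
                    MPI_COMM_WORLD, status, ierr)

  call MPI_Finalize(ierr)
end program halo_exchange_sketch
```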
42
Schematic parallelization technique
Shared memory parallelization (in CAM3 most often in the vertical direction) via OpenMP compiler directives.
Typical loop:
  do k = 1, plev
    ...
  enddo
Can often be parallelized with OpenMP (check dependencies):
  !$OMP PARALLEL DO ...
  do k = 1, plev
    ...
  enddo
43
Schematic parallelization technique
Shared memory parallelization (in CAM3 most often in the vertical direction) via OpenMP compiler directives.
E.g. assume 4 parallel 'threads' and a 4-way SMP node (4 CPUs):
  !$OMP PARALLEL DO ...
  do k = 1, plev
    ...
  enddo
[Table: the k = 1, ..., plev iterations are split among the 4 CPUs, e.g. CPU 1 gets k = 1-4, CPU 2 gets k = 5-8, and so on.]
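A self-contained sketch of the pattern above (the array name and loop body are made up for illustration):

```fortran
program openmp_levels_sketch
  ! Minimal illustration of OpenMP parallelization over vertical levels,
  ! mirroring the "do k = 1, plev" pattern on the slide.
  use omp_lib
  implicit none
  integer, parameter :: plon = 144, plev = 26
  real    :: t(plon, plev)
  integer :: i, k

  !$OMP PARALLEL DO PRIVATE(i)
  do k = 1, plev
    do i = 1, plon
      t(i, k) = real(i + k)      ! placeholder work; no dependencies across k
    end do
  end do
  !$OMP END PARALLEL DO

  print *, 'threads available:', omp_get_max_threads(), '  sum =', sum(t)
end program openmp_levels_sketch
```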
44
Thank you! Any questions? Tracer transport? Fortran code ...
45
References
Carpenter, R. L., K. K. Droegemeier, P. R. Woodward, and C. E. Hane, 1990: Application of the Piecewise Parabolic Method (PPM) to Meteorological Modeling. Mon. Wea. Rev., 118, 586-612.
Colella, P., and P. R. Woodward, 1984: The piecewise parabolic method (PPM) for gas-dynamical simulations. J. Comput. Phys., 54, 174-201.
Jablonowski, C., and D. L. Williamson, 2005: A baroclinic instability test case for atmospheric model dynamical cores. Submitted to Mon. Wea. Rev.
Lin, S.-J., and R. B. Rood, 1996: Multidimensional Flux-Form Semi-Lagrangian Transport Schemes. Mon. Wea. Rev., 124, 2046-2070.
Lin, S.-J., and R. B. Rood, 1997: An explicit flux-form semi-Lagrangian shallow water model on the sphere. Quart. J. Roy. Meteor. Soc., 123, 2477-2498.
Lin, S.-J., 1997: A finite volume integration method for computing pressure gradient forces in general vertical coordinates. Quart. J. Roy. Meteor. Soc., 123, 1749-1762.
Lin, S.-J., 2004: A 'Vertically Lagrangian' Finite-Volume Dynamical Core for Global Models. Mon. Wea. Rev., 132, 2293-2307.
van Leer, B., 1977: Towards the ultimate conservative difference scheme. IV. A new approach to numerical convection. J. Comput. Phys., 23, 276-299.