Published by Bryan Lynch. Modified over 9 years ago.
1
Feb 23, 2010, Tsukuba-Edinburgh Computational Science Workshop, Edinburgh
Large-Scale Density-Functional Calculations for Nanometer-Size Si Materials
Jun-Ichi Iwata, Center for Computational Sciences, University of Tsukuba
2
Outline
・ Quantum-mechanical (first-principles) simulation in solid-state physics
・ Density-functional theory: W. Kohn (Nobel Prize in 1998)
・ Density-functional simulations for large systems
・ Real-space DFT program code for parallel computation (RSDFT)
・ Applications of RSDFT to Si nanomaterials: systems of more than 10,000 atoms
3
First-Principles Calculation in Materials Physics
・ We describe material properties from the behavior of electrons and ions: ions are treated classically, electrons quantum mechanically.
・ We solve the Schrödinger equation for the electronic ground state.
・ Density-functional theory is a powerful tool for this purpose.
4
Density-Functional Theory
P. Hohenberg and W. Kohn, Phys. Rev. 136 (1964) B864; W. Kohn and L. J. Sham, Phys. Rev. 140 (1965) A1133.
[Equations: electron density; energy functional]
・ Minimizing the energy functional gives stable atomic and electronic structures.
・ Minimizing the energy functional leads to the Kohn-Sham equation, in which the potential depends on the electron density.
・ We therefore have to solve this equation self-consistently (a nonlinear eigenvalue problem).
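The equations on this slide did not survive extraction. For orientation only, the standard textbook form of the Kohn-Sham scheme is sketched below; it is assumed here rather than copied from the slide. Hartree atomic units are used, spin degeneracy is omitted, and the symbols n, E, ψ_i, ε_i, v_ext, v_H and v_xc are conventional names chosen for this sketch.

```latex
% Electron density built from the occupied Kohn-Sham orbitals
n(\mathbf{r}) = \sum_{i}^{\mathrm{occ}} \left| \psi_i(\mathbf{r}) \right|^2

% Total-energy functional of the density
E[n] = T_s[n]
     + \int v_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r}
     + \frac{1}{2} \iint \frac{n(\mathbf{r})\, n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}'
     + E_{\mathrm{xc}}[n]

% Kohn-Sham equation: the potential terms depend on n, hence on the solutions psi_i
\left[ -\frac{1}{2}\nabla^2 + v_{\mathrm{ext}}(\mathbf{r})
       + v_{\mathrm{H}}[n](\mathbf{r}) + v_{\mathrm{xc}}[n](\mathbf{r}) \right]
\psi_i(\mathbf{r}) = \varepsilon_i\, \psi_i(\mathbf{r})
```

Because v_H and v_xc are built from the density, which is itself built from the orbitals, the equation has to be iterated to self-consistency.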
5
Performance of DFT with a simple approximation
M. T. Yin and M. L. Cohen, Phys. Rev. B 26, 5668 (1982). Exchange functional in the local-density approximation.

Si (in diamond structure)    DFT calc.    Expt.
Lattice constant (Å)         5.37         5.41
Bulk modulus (Mbar)          0.977        0.988

・ Quantitatively good results
・ Correctly describes various properties
6
Everybody wants to apply DFT to large systems
・ Usually, we treat 10- to 1000-atom systems by DFT. However, we need to treat larger systems:
  ・ to study large objects (nanostructures, proteins)
  ・ to make the atomic model more realistic
・ Proteins (cytochrome c oxidase): ~30,000 atoms
・ Nanostructures (Si pyramid): ~100,000 atoms. A. Ichimiya et al., Surf. Sci. 493, 555 (2001).
7
Real-Space DFT program code (RSDFT)
・ Solves the Kohn-Sham equation (an eigenvalue problem); computational cost ~ O(N^3)
・ Developed for parallel computers
8
Real-Space Method ( ⇔ Reciprocal-Space (Plane-Wave) Method )
Higher-order finite-difference pseudopotential method: J. R. Chelikowsky et al., Phys. Rev. B (1994)
・ Discretize: continuous space → discrete space; a function becomes a column vector
・ Laplacian → higher-order finite difference
・ Typical number of grid points: 10,000 ~ 1,000,000
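As an illustration of replacing the Laplacian by a higher-order finite difference, here is a minimal NumPy sketch. The 7-point, 6th-order central-difference weights are the standard ones; the periodic boundary conditions, the grid size, and the function name `laplacian` are assumptions made for this example and are not taken from RSDFT.

```python
import numpy as np

# Standard central-difference weights for d2/dx2 (6th-order accurate):
# weight of the point itself, then of its 1st, 2nd and 3rd neighbours.
COEFFS = np.array([-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0])


def laplacian(f, h):
    """Finite-difference Laplacian of a periodic 3D grid function f with spacing h."""
    out = 3.0 * COEFFS[0] * f  # central weight contributes once per axis
    for axis in range(3):
        for m in range(1, len(COEFFS)):
            # neighbours at distance m on both sides along this axis (periodic wrap)
            out += COEFFS[m] * (np.roll(f, m, axis=axis) + np.roll(f, -m, axis=axis))
    return out / h**2


# Check against a function whose Laplacian is known: lap(sin x sin y sin z) = -3 f
n = 32
h = 2.0 * np.pi / n
x = h * np.arange(n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
f = np.sin(X) * np.sin(Y) * np.sin(Z)
max_error = np.max(np.abs(laplacian(f, h) + 3.0 * f))
```

Each grid point only couples to a handful of neighbours, which is why the resulting Hamiltonian matrix is sparse.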
9
RSDFT: suitable for parallel first-principles calculation
・ Real-space finite difference → sparse matrix
・ FFT free (FFT is inevitable in the conventional plane-wave code)
・ The 3D grid is divided into several regions for parallel computation. [Figure: grid regions assigned to CPU0 through CPU8]
・ MPI (Message Passing Interface) library:
  ・ Higher-order finite difference in the Kohn-Sham equation: MPI_ISEND, MPI_IRECV
  ・ Integration: MPI_ALLREDUCE
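The communication pattern on this slide can be sketched without MPI by simulating the ranks in a single process. In the sketch below, copying boundary points from neighbouring pieces stands in for MPI_ISEND / MPI_IRECV, and summing the per-piece partial sums stands in for MPI_ALLREDUCE. The 1D periodic grid, the 6th-order stencil, and all function names are assumptions for illustration, not RSDFT's actual layout.

```python
import numpy as np

COEFFS = np.array([-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0])
HALO = len(COEFFS) - 1  # number of boundary points needed from each neighbour


def stencil(padded, h):
    """Apply the second-derivative stencil to the interior of a halo-padded array."""
    n = len(padded) - 2 * HALO
    out = COEFFS[0] * padded[HALO:HALO + n]
    for m in range(1, HALO + 1):
        out = out + COEFFS[m] * (padded[HALO - m:HALO - m + n]
                                 + padded[HALO + m:HALO + m + n])
    return out / h**2


def decomposed_second_derivative(f, h, n_ranks):
    """Split f among simulated ranks, exchange halos, apply the stencil locally."""
    parts = np.array_split(f, n_ranks)
    results = []
    for r, local in enumerate(parts):
        left = parts[(r - 1) % n_ranks][-HALO:]   # would arrive via MPI_IRECV
        right = parts[(r + 1) % n_ranks][:HALO]   # would arrive via MPI_IRECV
        results.append(stencil(np.concatenate([left, local, right]), h))
    return results


def global_integral(pieces, h):
    """Each simulated rank sums its own piece; the final sum plays the role of MPI_ALLREDUCE."""
    return sum(h * piece.sum() for piece in pieces)


n, n_ranks = 64, 4
h = 2.0 * np.pi / n
x = h * np.arange(n)
f = np.sin(x)
d2f = np.concatenate(decomposed_second_derivative(f, h, n_ranks))
norm = global_integral(np.array_split(f**2, n_ranks), h)  # integral of sin^2 over one period
```

Only the HALO boundary points cross between neighbouring pieces, so communication grows with the surface of each region while computation grows with its volume.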
10
Real-Space Density-Functional Theory code (RSDFT)
・ Based on the finite-difference pseudopotential method (J. R. Chelikowsky et al., PRB 1994)
・ Highly tuned for massively parallel computers
・ Computations are done on PACS-CS, a massively parallel cluster at the University of Tsukuba (theoretical peak performance = 5.6 GFLOPS/node), with our recently developed code “RSDFT”: Iwata et al., J. Comp. Phys. (2010)
Massively parallel computing: the largest system in the present study is Si10701H1996
・ Grid points = 3,402,059
・ Bands = 22,432
・ Computational time (with 1024 nodes of PACS-CS): 6781 sec × 60 iteration steps ≈ 113 hours
[Figure: convergence behavior for Si10701H1996]
11
Flow chart
Atomic structure optimization: input initial configuration of ions → calc. ionic potentials → electronic structure optimization → Hellmann-Feynman force → move ions → repeat
Electronic structure optimization must be performed in each atomic optimization step.
Electronic structure optimization (algorithm: subspace iteration method, i.e. Rayleigh-Ritz method), repeated until the convergence check answers yes:
・ Conjugate-gradient method: O(N^2)
・ Gram-Schmidt orthonormalization: O(N^3)
・ Density, potentials update: O(N)
・ Subspace diagonalization: O(N^3)
Total computational cost ~ O(N^3)
12
Algorithm 1 → Subspace Iteration Method (Rayleigh-Ritz Method)
・ Problem: an M-dimensional eigenvalue problem, of which we need the smallest N (≪ M) eigenpairs
・ Start from an initial guess
・ Wave function update: minimize the Rayleigh quotients by the conjugate-gradient method
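To make the idea of minimizing a Rayleigh quotient concrete, here is a small sketch for a single eigenvector. It is not the conjugate-gradient routine of RSDFT: it uses a locally optimal variant in which each step minimizes the Rayleigh quotient exactly within the span of the current vector, its gradient, and the previous search direction. The dense test matrix `H` and the function name `lowest_eigenpair` are hypothetical.

```python
import numpy as np


def lowest_eigenpair(H, x0, tol=1e-9, max_iter=500):
    """Minimize the Rayleigh quotient x.Hx / x.x for a symmetric matrix H."""
    x = x0 / np.linalg.norm(x0)
    p = None  # previous search direction (none on the first step)
    for _ in range(max_iter):
        Hx = H @ x
        rho = x @ Hx                 # Rayleigh quotient of the normalized x
        g = Hx - rho * x             # gradient direction (residual), orthogonal to x
        if np.linalg.norm(g) < tol:
            break
        cols = [x, g] if p is None else [x, g, p]
        Q, _ = np.linalg.qr(np.column_stack(cols))   # orthonormal basis of the small subspace
        _, v = np.linalg.eigh(Q.T @ H @ Q)           # exact minimization inside that subspace
        c = v[:, 0]                                  # coefficients of the lowest Ritz vector
        p = Q[:, 1:] @ c[1:]                         # part of the update not along x
        x = Q @ c
        x /= np.linalg.norm(x)
    return x @ H @ x, x


# Hypothetical symmetric test matrix with well-separated low eigenvalues
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 40))
H = np.diag(np.arange(1.0, 41.0)) + 0.05 * (A + A.T)
rho, x = lowest_eigenpair(H, rng.standard_normal(40))
```

Since the current vector always lies in the subspace that is searched, the Rayleigh quotient cannot increase from one step to the next.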
13
Algorithm 2
・ Gram-Schmidt orthogonalization → used as a basis set: O(MN^2)
・ Calc. matrix elements: O(MN^2)
・ Subspace diagonalization: O(N^3)
・ Ritz vectors ← initial guess for the next iteration: O(MN^2)
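The four steps above can be sketched as one loop of classical subspace iteration with a Rayleigh-Ritz step. To keep the example short, the conjugate-gradient update of the wave functions is replaced by a shifted power step, which is an assumption of this sketch and not what RSDFT does; QR factorization stands in for Gram-Schmidt. The matrix `H`, the function name, and the iteration count are hypothetical.

```python
import numpy as np


def subspace_iteration(H, n_bands, n_iter=1500, seed=0):
    """Approximate the n_bands lowest eigenpairs of a symmetric M x M matrix H."""
    M = H.shape[0]
    rng = np.random.default_rng(seed)
    psi = rng.standard_normal((M, n_bands))      # initial guess, M x N
    sigma = np.abs(H).sum(axis=1).max()          # Gershgorin upper bound on the spectrum
    for _ in range(n_iter):
        psi = sigma * psi - H @ psi              # update step (stand-in for CG)
        psi, _ = np.linalg.qr(psi)               # orthonormalization, O(M N^2)
        h_sub = psi.T @ (H @ psi)                # N x N matrix elements, O(M N^2) for sparse H
        eps, c = np.linalg.eigh(h_sub)           # subspace diagonalization, O(N^3)
        psi = psi @ c                            # Ritz vectors, O(M N^2)
    return eps, psi


# Hypothetical test problem: M = 60, N = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 60))
H = np.diag(np.arange(1.0, 61.0)) + 0.01 * (A + A.T)
eps, psi = subspace_iteration(H, n_bands=4)
```

The Ritz vectors returned by one pass are exactly the starting vectors of the next, which is the role the slide assigns to them.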
14
Gram-Schmidt orthogonalization
Algorithm of GS: part of the calculations can be performed as a matrix × matrix operation! (Active use of Level 3 BLAS in the O(N^3) computation)

Time & performance for Gram-Schmidt
                 Time (sec)    GFLOPS/node
Old algorithm    661 (710)     0.70 (0.65)
New algorithm    111 (140)     4.30 (3.50)

Theoretical peak performance = 5.6 GFLOPS/node
・ The O(N^3) part can be computed at nearly 80% of the theoretical peak performance!
→ Collaboration with computer scientists greatly improved the performance of RSDFT!
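One way to turn most of Gram-Schmidt into matrix × matrix work is to process the vectors in blocks: each new block is first projected against all previously orthonormalized vectors with two matrix products, and only the small remainder inside the block is handled vector by vector. The sketch below illustrates that idea with NumPy, where `@` on 2D arrays is typically dispatched to a Level 3 BLAS routine. It is an assumed illustration, not the algorithm used in RSDFT, and the block size is arbitrary.

```python
import numpy as np


def block_gram_schmidt(A, block=8):
    """Orthonormalize the columns of A, using matrix-matrix products between blocks."""
    Q = np.array(A, dtype=float)
    n_cols = Q.shape[1]
    for start in range(0, n_cols, block):
        end = min(start + block, n_cols)
        if start > 0:
            # matrix x matrix: remove components along all earlier columns at once
            overlap = Q[:, :start].T @ Q[:, start:end]
            Q[:, start:end] -= Q[:, :start] @ overlap
        # vector-by-vector Gram-Schmidt only inside the current block
        for j in range(start, end):
            for i in range(start, j):
                Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
            Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30))
Q = block_gram_schmidt(A, block=8)
```

With N vectors and block size b, only about a fraction b/N of the inner products is done one vector at a time; the rest goes through the two matrix products.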
15
Elapsed time for 1 step of iteration
PACS-CS (5.6 GFLOPS/node), 256 nodes → the times for the O(N^2) part and the O(N^3) part become comparable.
[Figure: elapsed time of the O(N^2) and O(N^3) parts]
16
Application 1: Nanometer-size Si quantum dots
17
Si quantum dots are promising materials for several device applications
・ Memory
・ Single-electron transistor
・ Optical device
Clarifying the relation between the “dot size” and the “band gap” is important for controlling the device properties.
Are first-principles calculations useful for such studies? → Yes, but … the system size is very large!
[Figure: a model of the Si quantum dot of 6.6 nm diameter (Si7055H1596)]
18
Band Gaps
[Figure: band gaps (eV); axis labels 300 atoms and >10,000 atoms; experimental fit curve from STS measurement, B. Zaknoon et al., Nano Lett. 8, 1689 (2008)]
The ΔSCF gap seems to be closer to the ΔKS gap …
19
Application 2: Si nanowires
20
Samsung Si nanowire devices

                  IEDM 2005       IEDM 2006
Diameter of NW    10 nm           8 nm
Gate length       30 nm           15 nm
Vdd               1.0 V (one value listed)
I_on (n)          2.64 mA/μm      1.4 mA/μm
I_on (p)          1.11 mA/μm      1.94 mA/μm
I_off (n)         3.1 nA/μm       2.0 nA/μm
I_off (p)         0.0056 mA/μm    1.0 nA/μm
21
Several sizes of Si nanowires
・ 4 nm diameter (425 atoms)
・ 10 nm diameter (2341 atoms)
・ 20 nm diameter (8941 atoms)
There may be an optimum diameter in the region of 10 nm ~ 20 nm.
22
Band Structure and DOS of SiNW (d = 1 nm)
Si21H20 (41 atoms), Eg = 2.60 eV (LDA bulk: 0.53 eV)
[Figure: band structure and DOS]
23
Band Structure and DOS of SiNW (d = 4 nm)
Si341H84 (425 atoms), Eg = 0.81 eV (LDA bulk: 0.53 eV)
[Figure: band structure and DOS]
24
Band Structure and DOS of SiNW (d = 8 nm)
Si1361H164 (1525 atoms), Eg = 0.61 eV (bulk Si: Eg = 0.53 eV)
[Figure: band structure and DOS]
25
Si nanowire with surface roughness: Si12822H1544 (14,366 atoms)
・ 10 nm diameter, 3.3 nm height, (100)
・ Grid spacing: 0.45 Å (~14 Ry)
・ # of grid points: 4,718,592
・ # of bands: 29,024
・ Memory: 1,022 GB ~ 2,044 GB
[Figure: top view and side view of Si12822H1544]
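As a rough plausibility check on the memory figure, the sketch below estimates the storage for one full set of wave functions. It assumes one double-precision real number (8 bytes) per grid point per band, which the slide does not state; under that assumption the result lands close to the lower figure quoted above, though not exactly on it.

```python
grid_points = 4_718_592   # from the slide
bands = 29_024            # from the slide
bytes_per_value = 8       # assumption: double-precision real wave functions

total_bytes = grid_points * bands * bytes_per_value
wavefunction_gib = total_bytes / 2**30   # size of one full set of wave functions in GiB
```

The storage scales as (grid points) × (bands), so it grows roughly quadratically with the number of atoms.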
26
d = 10 nm (with roughness): Si12822H1544 (14,366 atoms), Eg = 0.57 eV
PACS-CS, 1024 nodes (peak performance: 5.6 GFLOPS/node)
・ Subspace diagonalization: 4600 sec.
・ Gram-Schmidt: 2300 sec.
・ Conjugate-gradient method: 3700 sec.
・ Total energy calc.: 1200 sec.
・ Total (1 step): 12,000 sec.
[Figure: DOS of SiNW with roughness compared with DOS of bulk Si]
27
Application 3: Si divacancy
28
Structure of Si divacancy
[Figure: small yellow balls are vacancies (no atoms); green balls are Si atoms with dangling bonds]
There are two possibilities for the structure of the Si divacancy: the Resonant-Bond (RB) type and the Large-Pairing (LP) type. What is the stable structure?
・ EPR experiment (Watkins & Corbett, 1965): Large-Pairing type
・ LDA calculation (Saito & Oshiyama, 1994), model size ~ 60 atoms: the Resonant-Bond type is stable (the Large-Pairing type was not found)
・ More recent LDA calculation (Oguet et al., 1999), model size ~ 300 atoms: both the Large-Pairing and Resonant-Bond structures were found; the Large-Pairing type is the most stable (the RB type is a local minimum)
→ Model size dependence?
29
Si divacancy
[Figure: d_ac, d_ab (Å) versus model size (# of atoms) for the Large-Pairing, Resonant-Bond, and Small-Pairing structures]
・ Structures converge at the 998-atom model.
・ The LP structure appears in models of 510 atoms or larger.
・ The RB structure is the most stable, but the energy difference is very small (<10 meV).
J.-I. Iwata et al., Phys. Rev. B 77 (2008) 115208
30
Summary
・ We have developed a real-space DFT program code (RSDFT) for large systems by utilizing massively parallel computers.
・ Collaboration with computer scientists greatly improved the performance of RSDFT (especially the O(N^3)-part calculation with Level 3 BLAS).
・ By using a few hundred to ~1000 CPUs, we have achieved first-principles calculations for
  ・ Si 1000-atom systems with atomic structure optimization
  ・ self-consistent electronic structures of Si 10,000-atom systems
・ Using large atomic models eliminates the model-size dependence.
・ We have applied RSDFT to nanometer-scale Si materials (SiNW, SiQD).
・ I think RSDFT will become a useful tool for future device development.