“NanoElectronics Modeling tool – NEMO5”
Jean Michel D. Sellier
Purdue University
A) Project Overview
The project
NEMO5 is a simulator for nanoelectronic devices. It includes a variety of models (Schroedinger, NEGF, etc.) that help in the design of semiconductor devices from an atomistic perspective.
Science goals
Petascale simulations of realistic electronic structure and transport at the atomistic level.
The participants, description of team
J.M. Sellier, T. Kubis, J. Fonseca, M. Povolotskyi, G. Klimeck, and PhD students.
History
NEMO5 is the result of 18 years of research and development. Its ancestors are NEMO1D, NEMO3D, and NEMO3D-Peta.
Sponsors
Purdue University, NCN, Intel, Global Foundries, Samsung, Philips, etc.
B) Science Lesson
What does the application do, and how?
NEMO5 is, first of all, a general framework into which new solvers can be plugged. The user specifies the sequence of solver calls as well as the inputs and outputs of each solver, and can choose among several levels of parallelization. The solvers implemented so far cover full-quantum electronic structure (Schroedinger) and transport (NEGF). Preprocessing solvers, such as strain calculations, are also available, as are a Poisson solver, semi-classical models, etc. Every solver has some form of parallelization, depending on the model used. For example, the Schroedinger eigensolvers use spatial parallelization to speed up the calculations, and different kinds of eigensolvers can be used for that equation.
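The plug-in architecture described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the names SOLVERS, register, run_pipeline and the toy "strain" and "scale" solvers are not NEMO5's API, and the solvers do placeholder arithmetic rather than physics. The sketch only shows the idea that the user, not the code, fixes the order of solver calls and how each solver's outputs are wired to later solvers' inputs.

```python
from typing import Callable, Dict, List

# Hypothetical registry: solver name -> function (dict of inputs -> dict of outputs).
SOLVERS: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that plugs a new solver into the framework under `name`."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        SOLVERS[name] = fn
        return fn
    return wrap


@register("strain")
def strain_solver(inputs: dict) -> dict:
    # Placeholder, not real physics: apply a uniform strain to a lattice constant.
    return {"lattice_constant": inputs["a0"] * (1.0 + inputs["strain"])}


@register("scale")
def scale_solver(inputs: dict) -> dict:
    # Placeholder for a downstream solver that consumes an upstream result.
    return {"result": inputs["x"] * inputs["factor"]}


def run_pipeline(steps: List[dict], initial: dict) -> dict:
    """Run solvers in the user-given order.

    Each step names a solver, maps the solver's input names to keys of a
    shared data store ("inputs"), and maps its output names to store keys
    ("outputs"), so the user controls the data flow between solvers.
    """
    store = dict(initial)
    for step in steps:
        solver = SOLVERS[step["solver"]]
        args = {arg: store[key] for arg, key in step["inputs"].items()}
        results = solver(args)
        for out, key in step["outputs"].items():
            store[key] = results[out]
    return store


# Usage: a user-chosen sequence, strain first, then a consumer of its output.
steps = [
    {"solver": "strain", "inputs": {"a0": "lattice", "strain": "eps"},
     "outputs": {"lattice_constant": "strained_lattice"}},
    {"solver": "scale", "inputs": {"x": "strained_lattice", "factor": "k"},
     "outputs": {"result": "scaled"}},
]
store = run_pipeline(steps, {"lattice": 5.0, "eps": 0.02, "k": 2.0})
print(store["strained_lattice"], store["scaled"])
```

Reversing the two steps would fail with a KeyError, because the second solver's input would not exist yet; in this design the ordering is entirely the user's responsibility.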
C) Parallel Programming Model
MPI, OpenMP, Hybrid, Pthreads, etc.
MPI, PETSc, SLEPc
Languages
C++ and Python
Runtime libraries, build requirements
MPI, PETSc, SLEPc, Boost, libmesh, libvtk, etc.
Other infrastructure (python, sockets, java tools, etc.)
Python
What platforms does the application currently run on?
Purdue RCAC clusters (Rossmann, Coates, Hansen, etc.), Jaguar, and Kraken.
Current status & future plans for the programming model
The code scales very well for some purposes but still has to be optimized for several other tasks (in particular, the initialization of atom positions and bonds). Future plans: implementation and optimization of eigensolvers for the Schroedinger equation.
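With an MPI/PETSc programming model, spatial parallelization usually starts from a contiguous block-row distribution: atoms (and hence Hamiltonian rows) are numbered, and each MPI rank owns one contiguous range. The Python sketch below assumes the split that, to our understanding, PETSc applies when local sizes are left to PETSC_DECIDE (N // size rows per rank, plus one extra row for each of the first N % size ranks). The helpers ownership_ranges, owner_of and cross_rank_bonds are hypothetical. Counting bonds that straddle two ranks gives a rough measure of the communication each matrix-vector product will need.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]


def ownership_ranges(n_rows: int, n_ranks: int) -> List[Range]:
    """Split n_rows into contiguous [start, end) ranges, one per rank.

    Every rank gets n_rows // n_ranks rows and the first n_rows % n_ranks
    ranks get one extra, so local sizes differ by at most one.
    """
    base, extra = divmod(n_rows, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        local = base + (1 if rank < extra else 0)
        ranges.append((start, start + local))
        start += local
    return ranges


def owner_of(row: int, ranges: List[Range]) -> int:
    """Rank that owns `row`, found by binary search over the range starts."""
    starts = [s for s, _ in ranges]
    return bisect_right(starts, row) - 1


def cross_rank_bonds(bonds: List[Tuple[int, int]], ranges: List[Range]) -> int:
    """Number of bonds whose two atoms are owned by different ranks.

    Each such bond produces off-process Hamiltonian entries, so the vector
    elements it touches must be exchanged between ranks before each
    matrix-vector product.
    """
    return sum(owner_of(i, ranges) != owner_of(j, ranges) for i, j in bonds)


# Example: a 10-atom chain numbered along its axis, distributed over 3 ranks.
ranges = ownership_ranges(10, 3)
chain_bonds = [(i, i + 1) for i in range(9)]
print(ranges, cross_rank_bonds(chain_bonds, ranges))  # -> [(0, 4), (4, 7), (7, 10)] 2
```

Numbering atoms along a spatial direction keeps most bonds inside one rank; a random numbering of the same chain would generally produce many more cross-rank bonds and therefore more communication.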
D) Computational Methods
What algorithms and math libraries do you use?
The algorithms used in the code are mainly those available in PETSc and SLEPc. In the particular cases where those routines are not well suited, we use our own in-house algorithms built on low-level PETSc (e.g. Lanczos and Block Lanczos eigensolvers).
Current status and future plans for your computation
The code scales well, but beyond a certain number of atoms (several tens of millions) its performance starts to degrade. We would like to optimize our eigensolvers to handle even more atoms (at the petascale level).
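For readers unfamiliar with the method, here is a textbook Lanczos eigensolver in plain Python. It is a sketch, not NEMO5's PETSc-based implementation: it builds the tridiagonal matrix T from repeated matrix-vector products (with full reorthogonalization for numerical safety) and then extracts eigenvalues of T by Sturm-sequence bisection. The function names and the 6-site tight-binding chain used as a test matrix are illustrative assumptions; Block Lanczos, which iterates on blocks of vectors instead of single vectors, is not shown.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def lanczos(matvec: Callable[[Vector], Vector], v0: Vector, m: int) -> Tuple[Vector, Vector]:
    """Run up to m Lanczos steps; return the diagonal (alphas) and
    off-diagonal (betas) of the tridiagonal matrix T = Q^T A Q."""
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for step in range(m):
        q = basis[-1]
        w = matvec(q)  # the only place the matrix A is touched
        alphas.append(dot(w, q))
        # Full reorthogonalization (two Gram-Schmidt passes). In exact
        # arithmetic this equals the three-term recurrence, but it keeps the
        # Lanczos vectors orthogonal in floating point.
        for _ in range(2):
            for v in basis:
                c = dot(w, v)
                w = [wi - c * vi for wi, vi in zip(w, v)]
        if step == m - 1:
            break
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:  # invariant subspace reached: T is complete
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas


def count_below(alphas: Vector, betas: Vector, x: float) -> int:
    """Sturm count: eigenvalues of T below x = negative pivots of T - x I."""
    count, q = 0, 1.0
    for i, a in enumerate(alphas):
        q = a - x - (betas[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-300  # avoid dividing by an exact zero pivot
        if q < 0.0:
            count += 1
    return count


def tridiag_eigenvalue(alphas: Vector, betas: Vector, k: int) -> float:
    """k-th smallest eigenvalue of T (k = 0 is the lowest), by bisection."""
    m = len(alphas)
    radius = [(abs(betas[i - 1]) if i > 0 else 0.0) + (abs(betas[i]) if i < m - 1 else 0.0)
              for i in range(m)]
    lo = min(a - r for a, r in zip(alphas, radius)) - 1.0  # Gershgorin bounds
    hi = max(a + r for a, r in zip(alphas, radius)) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Example: 6-site chain, onsite energy 0, hopping -1; exact eigenvalues are
# -2 cos(k * pi / 7), k = 1..6.
def chain_matvec(x: Vector) -> Vector:
    n = len(x)
    return [-(x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


alphas, betas = lanczos(chain_matvec, [1.0 + i for i in range(6)], m=6)
lowest = tridiag_eigenvalue(alphas, betas, 0)
print(lowest, -2 * math.cos(math.pi / 7))
```

The matrix enters only through matvec, which is why matrix-vector multiplication dominates the cost of this family of eigensolvers; in practice far fewer steps than the matrix dimension are run, and only the extremal eigenvalues of T are trusted.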
E) I/O Patterns and Strategy
Input I/O and output I/O patterns
Silo, VTK, Point3D, HDF5, and in-house binary and ASCII formats.
Approximate sizes of inputs and outputs
They depend strongly on the simulation being run, typically ranging from a few kilobytes to several hundred megabytes. Output files are written at the end of every solver.
Checkpoint / Restart capabilities: what does it look like?
Available only for the Poisson solver, and very basic.
Current status and future plans for I/O
We have plenty of output formats, which are sufficient for our tasks. The checkpoint/restart capability is very basic and needs to be improved. We are currently investigating which available libraries could make this easier.
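A basic checkpoint/restart scheme for an iterative solver can be quite small. The Python sketch below is an illustration under stated assumptions, not NEMO5's Poisson checkpoint: it uses a Jacobi iteration for a 1D Poisson problem, stores the iteration count and solution as JSON (a production code would more likely use one of the binary formats listed above, such as HDF5), and the names save_checkpoint, load_checkpoint and jacobi_poisson_1d are hypothetical. The key detail is writing to a temporary file and renaming it over the old checkpoint, so a crash mid-write never destroys the last good state.

```python
import json
import os
import tempfile
from typing import List, Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Atomically replace the checkpoint at `path` with `state`.

    The state is written to a temporary file in the same directory and then
    renamed over the old checkpoint with os.replace, so an interruption while
    writing leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def jacobi_poisson_1d(f: List[float], h: float, n_iter: int, path: str,
                      every: int = 10) -> List[float]:
    """Resumable Jacobi iteration for -u'' = f with u = 0 at both ends.

    If a checkpoint exists at `path`, iteration restarts from the saved
    iteration count and solution instead of from zero.
    """
    state = load_checkpoint(path) or {"iteration": 0, "u": [0.0] * len(f)}
    u, it = state["u"], state["iteration"]
    while it < n_iter:
        # Interior update from the 3-point stencil: u_i = (u_{i-1} + u_{i+1} + h^2 f_i) / 2
        u = ([u[0]]
             + [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]) for i in range(1, len(u) - 1)]
             + [u[-1]])
        it += 1
        if it % every == 0 or it == n_iter:
            save_checkpoint(path, {"iteration": it, "u": u})
    return u
```

Calling jacobi_poisson_1d again with a larger n_iter and the same path continues from the last saved iteration. Because Python's json module writes floats in a round-trip-exact form, a resumed run reproduces the uninterrupted one exactly.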
F) Visualization and Analysis
How do you explore the data generated?
VisIt, ParaView, MATLAB, Octave, gnuplot
Do you have a visualization workflow?
It depends on the user and on what he or she wants to visualize.
Current status and future plans for your viz and analysis
We are happy with what we have right now. It is sufficient for device analysis.
G) Performance
What tools do you use now to explore performance?
gprof
What do you believe is your current bottleneck to better performance?
Matrix-vector multiplication.
What do you believe is your current bottleneck to better scaling?
Matrix-vector multiplication.
What features would you like to see in perf tools?
Ease of use and an intuitive interface; embedded visualization would be a plus.
Current status and future plans for improving performance
Faster algorithms and code optimization.
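Since matrix-vector multiplication is named as the bottleneck for both performance and scaling, it may help to see the kernel itself. The sketch below stores a sparse matrix in compressed sparse row (CSR) form, the layout behind PETSc's AIJ matrix type, and multiplies it by a vector. It is written in Python only for readability; the helper names coo_to_csr and csr_matvec and the 4-site example matrix are assumptions. Each stored nonzero is used once per product while x is read through the indirect index x[indices[k]], so the kernel does little arithmetic per byte moved and is typically limited by memory bandwidth; in a distributed run, the entries of x owned by other ranks must also be communicated first.

```python
from typing import Dict, List, Tuple


def coo_to_csr(n: int, entries: List[Tuple[int, int, float]]):
    """Convert (row, col, value) triplets of an n x n matrix to CSR arrays.

    Returns (indptr, indices, data): the nonzeros of row i are
    data[indptr[i]:indptr[i + 1]], with their column numbers in the same
    slice of `indices`. Duplicate (row, col) entries are summed.
    """
    rows: List[Dict[int, float]] = [{} for _ in range(n)]
    for r, c, v in entries:
        rows[r][c] = rows[r].get(c, 0.0) + v
    indptr, indices, data = [0], [], []
    for row in rows:
        for c in sorted(row):
            indices.append(c)
            data.append(row[c])
        indptr.append(len(indices))
    return indptr, indices, data


def csr_matvec(indptr: List[int], indices: List[int], data: List[float],
               x: List[float]) -> List[float]:
    """y = A x for A in CSR form: one pass over the nonzeros, row by row."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            # Indirect (gather) access to x: the irregular, memory-bound part.
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


# Example: 4-site chain, onsite energy 1.0, nearest-neighbor hopping -0.5.
entries = [(i, i, 1.0) for i in range(4)]
entries += [(i, i + 1, -0.5) for i in range(3)] + [(i + 1, i, -0.5) for i in range(3)]
indptr, indices, data = coo_to_csr(4, entries)
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0, 4.0])
print(indptr, y)  # -> [0, 2, 5, 8, 10] [0.0, 0.0, 0.0, 2.5]
```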
H) Tools
How do you debug your code?
gdb
What other tools do you use?
N/A
Current status and future plans for improved tool integration and support
N/A
I) Status and Scalability
How does your application scale now?
The calculations scale well thanks to PETSc. Initialization is still a problem.
Where do you want to be in a year?
Very fast eigensolvers and petascale capabilities.
What are your top 5 pains? (be specific)
1: Initialization of the Hamiltonian
2: Slow SLEPc algorithms
3: Checkpoint/restart
4: In-house eigensolvers (Lanczos and Block Lanczos)
5: Huge output files
What did you change to achieve current scalability?
SLEPc eigensolvers, in-house eigensolvers.
Current status and future plans for improving scaling
Optimization of our Lanczos eigensolvers.
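Building the atom bonds (and from them the Hamiltonian's sparsity pattern) by checking all atom pairs costs O(N^2) distance tests, which becomes prohibitive at tens of millions of atoms. A standard remedy, shown here only as an illustration and not as what NEMO5 does, is a cell list: atoms are binned into cubic cells whose edge equals the bond cutoff, so each atom only needs to be compared with atoms in its own cell and the 26 adjacent ones. The function find_bonds is hypothetical, assumes non-periodic boundaries, and is written in Python for readability.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def find_bonds(positions: Sequence[Point], cutoff: float) -> List[Tuple[int, int]]:
    """Return all pairs (i, j), i < j, with |r_i - r_j| <= cutoff.

    Atoms are binned into cubic cells whose edge is (slightly more than) the
    cutoff, so every bonded partner of an atom lies in its own cell or one of
    the 26 adjacent cells. For a bounded atom density this makes the search
    roughly linear in the number of atoms instead of quadratic.
    """
    edge = cutoff * (1.0 + 1e-9)  # small margin against rounding at cell borders
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, pos in enumerate(positions):
        cells[tuple(math.floor(c / edge) for c in pos)].append(idx)
    cutoff2 = cutoff * cutoff
    bonds = []
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbors:
                    # i < j ensures each pair is reported exactly once.
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if d2 <= cutoff2:
                            bonds.append((i, j))
    return sorted(bonds)


# Example: 3 x 3 x 3 simple cubic lattice with unit spacing; a cutoff of 1.1
# keeps only nearest neighbors (distance 1): 3 directions * 18 = 54 bonds.
lattice = [(float(x), float(y), float(z)) for x, y, z in product(range(3), repeat=3)]
print(len(find_bonds(lattice, 1.1)))  # -> 54
```

The resulting bond list gives, for each atom, the neighbors whose couplings populate its Hamiltonian rows; periodic boundaries would additionally require wrapping the cell indices.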
J) Roadmap
Where will your science take you over the next 2 years?
Bigger structures, more atoms, better scalability.
What do you hope to learn / discover?
Simulation of quantum dots at room temperature, an explanation of decoherence phenomena, electron-electron interaction (CI), and harnessing the power of single-impurity devices to build quantum bits.
What improvements will you need to make (algorithms, I/O, etc.)?
Faster and more scalable eigensolvers, implementation of models that include crystal temperature, the same input/output.
What are your plans?
Migrate the models and the eigensolver from the previous NEMO3D and NEMO3D-Peta, and offer a larger selection of scalable eigensolvers.