Published by Esther Logan. Modified over 9 years ago.
1
Generalized Atomic Systems (G.A.S.): A Tool Kit for Atomistic Simulation Data
Michael Waters, Katie Sebeck
2/20/2013
Diagram labels: G.A.S.; Read, Write, Analyze, Manipulate
2
Overview
  Traditional Workflow in Molecular Dynamics
  Defining the Problem
  An Interchangeable Approach
  Aiding Analysis
  Current Usage
3
Basics of Atomistic Simulations
Atoms in boxes
  Positions: updated by iteratively solving F=ma according to empirical force fields
  Velocity
  Type, charge, etc.
System-wide data
  Simulation box
  Number of atoms
  Temperature, energy, pair potentials…
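To make "iteratively solving F=ma" concrete, here is a minimal sketch of one widely used integrator, velocity Verlet, for a single particle in one dimension. The slides do not say which integrator or force field the authors' codes use; the harmonic spring force and all function names below are assumptions chosen only for illustration.

```python
def harmonic_force(x, k=1.0):
    """Stand-in for an empirical force field: a spring pulling toward x = 0 (assumption)."""
    return -k * x


def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt using a = F/m."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position from current v and a
        a_new = force(x) / mass              # F = ma evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with the averaged acceleration
        a = a_new
    return x, v


if __name__ == "__main__":
    x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, force=harmonic_force, dt=0.01, n_steps=1000)
    print(x, v)
```

A real molecular dynamics code applies the same update to every atom in three dimensions, with forces computed from the chosen pair and bond potentials.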
4
Molecular Dynamics Data
Input
  Initial system data: coordinates, types, charges, mass; interatomic bonds, angles
  Run-time instructions
  Interaction potential equations (form, coefficients)
Output
  System: run data (CPU rate, memory usage); system variables (pressure, stress, temperature)
  Atomic trajectories; atomic characteristics (charge, type)
  Per-atom: force, PE, KE, stress
  Processed: neighbor lists; time averaged: mean squared displacement, radial distribution function
ALL molecular dynamics data can be contained in ASCII text files
5
A Brief Guide to Atomistic File Types
pdb, xyz, mol, cfg, sfd, gro, mdl, LAMMPS read_data, ccm, xsd, cif, car…
6
Through a Traditional Workflow
Generate input file(s), Run simulation, Analyze output, Visualization/plotting analysis

Control file and structure file. Format depends on program.

Structure file:
  n=16, 500 Chains, rho=0.7918

  8000 atoms
  3 atom types
  7500 bonds
  1 bond types
  0 angles
  0 dihedrals
  0 impropers

  0 92.055 xlo xhi
  0 70.395 ylo yhi
  0 37.905 zlo zhi

  Masses

  1 14.002
  2 14.002
  3 63.54

  Atoms

  1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
  2 1 1 2.65313400000000 3.07841000000000 1.80500000000000

Control file:
  units real
  timestep 1.0
  atom_style bond
  dimension 3
  boundary p p p
  #---------------Coordinates and Bonds --------------
  lattice fcc 1.0
  region 1 block -9.025 -1.805 0 70.395 0 37.905
  #N=28
  read_data n28lat
  pair_style lj/cut 9.805
  pair_coeff 1 1 0.1431 3.923
  pair_coeff 2 2 0.1432 3.923
  pair_coeff 3 3 4.72 2.616
  pair_modify mix arithmetic
  bond_style harmonic
  bond_coeff 1 41.82 1.54
  group alkane type 1 2
  group copper type 3
  neighbor 1.0 bin
  thermo 1
  thermo_style custom step temp pe ke etotal
  #minimize 1.0e-4 1.0e-6 100 1000
  fix hope all nve
  run 100000
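As an illustration of what "format depends on program" means in practice, the sketch below reads the header of the structure file shown on this slide (a LAMMPS read_data file): the counts of atoms, bonds and types, and the box bounds. The function name `parse_header` and the returned dictionaries are assumptions made for this example, not part of LAMMPS or of the authors' tool kit.

```python
import re

# Header lines such as "8000 atoms" or "3 atom types".
COUNT_LINE = re.compile(
    r"^(\d+)\s+(atoms|bonds|angles|dihedrals|impropers|atom types|bond types)$"
)
# Box bounds such as "0 92.055 xlo xhi".
BOX_LINE = re.compile(r"^(\S+)\s+(\S+)\s+([xyz])lo\s+[xyz]hi$")

SAMPLE = """n=16, 500 Chains, rho=0.7918

8000 atoms
3 atom types
7500 bonds
1 bond types
0 angles
0 dihedrals
0 impropers

0 92.055 xlo xhi
0 70.395 ylo yhi
0 37.905 zlo zhi

Masses

1 14.002
2 14.002
3 63.54
"""


def parse_header(text):
    """Return (counts, box) parsed from the header of a read_data-style file."""
    counts, box = {}, {}
    for raw in text.splitlines()[1:]:        # the first line is a free-form title
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
            continue
        match = BOX_LINE.match(line)
        if match:
            box[match.group(3)] = (float(match.group(1)), float(match.group(2)))
            continue
        if line[0].isalpha():                # a section keyword such as "Masses" ends the header
            break
    return counts, box


if __name__ == "__main__":
    print(parse_header(SAMPLE))
```

Every other file type needs its own version of this routine, which is the redundancy the rest of the talk addresses.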
7
Through a Traditional Workflow
Information about simulation run in control file
Hardware, software version metadata formatting depends on system configuration
Produces output of overall run statistics:
  Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
  Pair time (%) = 1108.83 (31.5444)
  Bond time (%) = 78.4225 (2.231)
  Neigh time (%) = 162.274 (4.61645)
  Comm time (%) = 1270 (36.1294)
  Outpt time (%) = 523.248 (14.8856)
  Other time (%) = 372.363 (10.5931)
  Nlocal: 3344 ave 8049 max 0 min
  Histogram: 16 0 0 0 0 0 2 6 3 5
  Nghost: 7940.66 ave 15817 max 0 min
  Histogram: 8 4 4 0 0 0 0 0 8 8
  Neighs: 862976 ave 2.19776e+06 max 0 min
  Histogram: 16 0 0 0 0 2 2 6 2 4
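Run statistics like these are typically pulled out with small, format-specific parsers. The sketch below handles only the layout shown in this excerpt; other program versions may print the breakdown differently, and the function name `parse_run_statistics` is an assumption made for illustration.

```python
import re

LOOP_LINE = re.compile(
    r"^Loop time of (\S+) on (\d+) procs for (\d+) steps with (\d+) atoms$"
)
TIMING_LINE = re.compile(r"^(\w+)\s+time \(%\) = (\S+) \((\S+)\)$")

SAMPLE_LOG = """Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)
"""


def parse_run_statistics(text):
    """Return (loop, timings): loop is a dict, timings maps name -> (seconds, percent)."""
    loop, timings = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        match = LOOP_LINE.match(line)
        if match:
            loop = {
                "seconds": float(match.group(1)),
                "procs": int(match.group(2)),
                "steps": int(match.group(3)),
                "atoms": int(match.group(4)),
            }
            continue
        match = TIMING_LINE.match(line)
        if match:
            timings[match.group(1)] = (float(match.group(2)), float(match.group(3)))
    return loop, timings


if __name__ == "__main__":
    loop, timings = parse_run_statistics(SAMPLE_LOG)
    for name, (seconds, percent) in timings.items():
        print(f"{name:6s} {seconds:10.2f} s {percent:8.3f} %")
```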
8
Through a Traditional Workflow
Output files generally dictated by control file
  Final structure file
  System properties log
  Other run-time analysis outputs
HIGHLY VARIED FORMATTING!
Quantitative analysis of output by scripting, MATLAB or Excel
9
Through a Traditional Workflow
Output structure file may or may not be in a format which can be fed into visualization software
Many software options available: VMD, Avogadro, POVray, VESTA, …
Analysis output may or may not be in a format which can be parsed by plotting software
10
An Endless Series of Parsing Problems
Input file
  Convert from something you can manipulate/generate to something the code can read
Output analysis
  Typically requires writing new parsing routines
  Different codes require rewriting scripts
Visualizations
  May require extracting data from other files manually
  Most visualization code is already equipped to parse a variety of file types
11
Data from Legacy Code
Locally developed molecular dynamics code, FLX
Trying to port data into another code, LAMMPS
Ctrl+C, Ctrl+V and lots of manual editing…
Very time consuming for each file
12
Obstacles to Data Sharing and Reuse
Energy barrier of converting file formats
  Example: a file downloaded directly from the Protein Data Bank (.pdb) may not be readable by an MD code (LAMMPS)
Extracting relevant quantities from available data sets
  Parsing rules not always clear if unfamiliar with the format
  Formats not always well documented
13
Problem Statement
Too much redundant work
Too little documentation or code clarity
Too much time spent manipulating data formatting
How can we fix this?
14
Our Approach: Interchangeable Libraries
We created a General Atomic System (GAS) class
All file read functions generate a GAS object
GAS objects are accepted by:
  Write file functions
  Analysis functions
  Manipulation functions
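The pattern described on this slide can be sketched as follows: a reader builds one common object, and writers and manipulation functions accept that object regardless of where it came from. This is only an illustration of the idea, not the authors' code. The names `AtomicSystem`, `read_xyz`, `write_xyz` and `translate` are hypothetical, and the xyz layout assumed is the common one (atom count, comment line, then one "symbol x y z" line per atom).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtomicSystem:
    """Hypothetical stand-in for a common atomic-system object."""
    symbols: List[str] = field(default_factory=list)
    x: List[float] = field(default_factory=list)
    y: List[float] = field(default_factory=list)
    z: List[float] = field(default_factory=list)
    comment: str = ""


def read_xyz(text: str) -> AtomicSystem:
    """Reader: xyz text in, common object out."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    system = AtomicSystem(comment=lines[1])
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        system.symbols.append(symbol)
        system.x.append(float(x))
        system.y.append(float(y))
        system.z.append(float(z))
    return system


def write_xyz(system: AtomicSystem) -> str:
    """Writer: common object in, xyz text out."""
    rows = [str(len(system.symbols)), system.comment]
    for s, x, y, z in zip(system.symbols, system.x, system.y, system.z):
        rows.append(f"{s} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(rows) + "\n"


def translate(system: AtomicSystem, dx: float, dy: float, dz: float) -> AtomicSystem:
    """Manipulation: returns a shifted copy and never needs to know the source format."""
    return AtomicSystem(
        symbols=list(system.symbols),
        x=[v + dx for v in system.x],
        y=[v + dy for v in system.y],
        z=[v + dz for v in system.z],
        comment=system.comment,
    )


if __name__ == "__main__":
    water = read_xyz("3\nwater\nO 0.0 0.0 0.0\nH 0.76 0.59 0.0\nH -0.76 0.59 0.0\n")
    print(write_xyz(translate(water, 1.0, 0.0, 0.0)))
```

Adding a new file format then means writing one reader and one writer, while every existing analysis and manipulation function keeps working unchanged.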
15
Examining Existing Standards for Commonalities
Positions
Type
Number of atoms
16
Examining Existing Standards for Commonalities
Positions
Type
Number of atoms
17
Examining Existing Standards for Commonalities
Positions
Type
Number of atoms / end of atoms section
18
Creating a Common Data Structure
GAS class contains:
  System data
  Internal functions
Trivial ontology
  Simplicity in data structure is flexibility
  Internal functions should be as reliable as possible
  Obvious and explicit naming schemes
19
Ontological Details
GAS
  System Data
    .number_of_atoms
    .x, .y, .z
    .atomic_number
    … and many more
  Internal Functions
    .update_number_of_atoms
    .fill_id_list
    .sort_by_id
    … and many more
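A rough sketch of how a class with these attribute and method names might look is given below. Only the names `number_of_atoms`, `x`, `y`, `z`, `atomic_number`, `update_number_of_atoms`, `fill_id_list` and `sort_by_id` come from the slide. The `id` attribute and every method body are assumptions about plausible behavior, not the authors' implementation.

```python
class GAS:
    """Illustrative container using the attribute names listed on the slide."""

    def __init__(self):
        self.number_of_atoms = 0
        self.id = []             # assumed name for the per-atom id list
        self.x = []
        self.y = []
        self.z = []
        self.atomic_number = []

    def update_number_of_atoms(self):
        """Assumed behavior: recount atoms from the coordinate list."""
        self.number_of_atoms = len(self.x)

    def fill_id_list(self):
        """Assumed behavior: assign sequential ids 1..N."""
        self.update_number_of_atoms()
        self.id = list(range(1, self.number_of_atoms + 1))

    def sort_by_id(self):
        """Assumed behavior: reorder every per-atom list by ascending id."""
        order = sorted(range(len(self.id)), key=self.id.__getitem__)
        for name in ("id", "x", "y", "z", "atomic_number"):
            values = getattr(self, name)
            if len(values) == len(order):    # skip per-atom lists that were never filled
                setattr(self, name, [values[i] for i in order])


if __name__ == "__main__":
    system = GAS()
    system.x, system.y, system.z = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    system.atomic_number = [6, 1, 8]
    system.id = [3, 1, 2]
    system.sort_by_id()
    print(system.id, system.x, system.atomic_number)
```

Plain per-atom lists with explicit names keep the structure easy to inspect, in line with the "obvious and explicit naming schemes" goal of the previous slide.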
20
User Time Savings
From read_data to xyz: timing comparisons
  Manual copy-paste, eliminating excess columns: 2.15 minutes
  Calling functions, including typing out calls: 1.05 minutes
  Actual function timing: ~6 seconds
21
Aiding Analysis
With all data in standard structure:
  Write all analysis based on this format
  Input format independent
  Allows reuse of analysis functions
    Reuse begs for optimization
    Intended reuse encourages documentation
  Nested analyses now possible
Modularization saves:
  Time
  Effort
  Error
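To illustrate format-independent and nested analysis, the sketch below computes a mean squared displacement (one of the time-averaged quantities mentioned earlier in the talk) between two snapshots. It works on any object exposing `.x`, `.y` and `.z` lists, whatever file the data came from. The function names are hypothetical, and `SimpleNamespace` stands in for a real system object.

```python
from types import SimpleNamespace


def squared_displacements(start, end):
    """Per-atom squared displacement between two snapshots with matching atom order."""
    return [
        (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
        for x0, y0, z0, x1, y1, z1 in zip(start.x, start.y, start.z, end.x, end.y, end.z)
    ]


def mean_squared_displacement(start, end):
    """Nested analysis: built directly on squared_displacements."""
    values = squared_displacements(start, end)
    return sum(values) / len(values)


if __name__ == "__main__":
    frame0 = SimpleNamespace(x=[0.0, 0.0], y=[0.0, 0.0], z=[0.0, 0.0])
    frame1 = SimpleNamespace(x=[1.0, 0.0], y=[0.0, 2.0], z=[0.0, 0.0])
    print(mean_squared_displacement(frame0, frame1))
```

Because the second function only calls the first, any optimization or bug fix in the inner routine benefits every analysis built on top of it.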
22
Traditional Scripting Problems
Scripts typically used for:
  Quantitative analysis
  Modifying files to be parsed by various software
Rewriting input/output handling for each script
MATLAB, sed, awk and grep are not the friendliest or fastest parsing tools
Lack of commenting
Can only be applied to specific file types or a single file
23
Examples of Scripting
2.5 seconds
24
The Python Version…
0.4 seconds
Once a function is written, it can be called in just a few lines by ANY GAS system containing sufficient information
25
CC BY-NC-SA http://www.flickr.com/photos/katieharbath/
26
User Time Savings
Open source and custom function libraries instead of MATLAB allow for brute force parallelization and shifting of load to external resources
Faster run times: 2.5 seconds using bash versus 0.4 seconds in Python
Faster coding times
  Reuse of functions without additional modifications needed
  Eliminating redundant coding efforts
Use of common language promotes code reusability
  Writing code for “future” self as well as others
27
Ways We’re Using GAS
Polymerization
  Analyze pair-pair distances
  Alter system topology
  Automatically generate system readable file
Iterative system analysis
  Quantitative analysis of a series of files
    Radial distribution functions
    Density profile
    Bond length distributions
  Automatically generates easily parsed output files
Automatic movie rendering
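As one example of the analyses listed here, the following sketch computes a radial distribution function g(r) from plain coordinate lists. It assumes an orthorhombic periodic box and the minimum image convention, with r_max no larger than half the shortest box edge. The function name and signature are illustrative and are not taken from the authors' library.

```python
import math


def radial_distribution(x, y, z, box, r_max, dr):
    """Return g(r) as a list of bin values for bins of width dr up to r_max.

    box is (Lx, Ly, Lz) for a periodic orthorhombic cell (assumption).
    """
    n = len(x)
    n_bins = int(r_max / dr)
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b, length in ((x[i], x[j], box[0]), (y[i], y[j], box[1]), (z[i], z[j], box[2])):
                d = b - a
                d -= length * round(d / length)   # minimum image convention
                d2 += d * d
            k = int(math.sqrt(d2) / dr)
            if k < n_bins:
                counts[k] += 1                    # each pair is counted once
    rho = n / (box[0] * box[1] * box[2])          # number density
    g = []
    for k, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(2.0 * c / (n * rho * shell))     # factor 2 restores both pair orderings
    return g


if __name__ == "__main__":
    g = radial_distribution([0.0, 9.5], [0.0, 0.0], [0.0, 0.0], (10.0, 10.0, 10.0), 5.0, 0.25)
    print(g)
```

The double loop scales as N squared, so for large systems a neighbor list or cell list would be the usual next step.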
28
Automatic Movie Rendering
29
System Manipulation: Unwrapping Coordinates
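The slide does not show how the unwrapping is done. One common meaning of the term is removing the jumps that appear when an atom crosses a periodic boundary between saved frames, and the sketch below does that along a single axis. It assumes each atom moves less than half a box length between consecutive frames. The function name is hypothetical.

```python
def unwrap_axis(frames, box_length):
    """Unwrap one coordinate axis of a trajectory.

    frames: list of per-frame lists of wrapped coordinates, same atom order in each frame.
    Returns a new list of per-frame lists with periodic jumps removed.
    """
    if not frames:
        return []
    unwrapped = [list(frames[0])]
    for previous, current in zip(frames, frames[1:]):
        new_frame = []
        for last_unwrapped, p, c in zip(unwrapped[-1], previous, current):
            step = c - p
            step -= box_length * round(step / box_length)   # shortest periodic displacement
            new_frame.append(last_unwrapped + step)
        unwrapped.append(new_frame)
    return unwrapped


if __name__ == "__main__":
    wrapped = [[9.0], [9.8], [0.4], [1.0]]   # one atom crossing the boundary at 10.0
    print(unwrap_axis(wrapped, 10.0))
```

Applying the same function to the x, y and z lists of each frame gives continuous trajectories, which quantities such as mean squared displacement require.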
30
Moving Forward
More file formats
More advanced analysis methods and functions
  Density functional theory support
  Non-spherical particles
Collaboration with other groups
Better metadata integration
31
Final Thoughts
Our lives are much better
Our code is much more consistent
Future users have a hope of understanding what we did
If you want people to use it, it needs to be USEFUL and EASY