G.A.S. (Read, Write, Analyze, Manipulate): Generalized Atomic Systems, A Tool Kit for Atomistic Simulation Data. Michael Waters, Katie Sebeck. 2/20/2013.


1 G.A.S. (Read, Write, Analyze, Manipulate)
Generalized Atomic Systems: A Tool Kit for Atomistic Simulation Data
Michael Waters, Katie Sebeck
2/20/2013

2 Overview
• Traditional Workflow in Molecular Dynamics
• Defining the Problem
• An Interchangeable Approach
• Aiding Analysis
• Current Usage

3 Basics of Atomistic Simulations
• Atoms in boxes
  • Positions: updated by iteratively solving F=ma according to empirical force fields
  • Velocity
  • Type, charge, etc.
• System wide data
  • Simulation box
  • Number of atoms
  • Temperature, energy, pair potentials…
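
The update loop named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which integrator or force field is used, so a velocity-Verlet scheme and a one-dimensional harmonic spring stand in for both, and all function names are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Stand-in 'force field' (assumed): a 1D spring pulling toward x = 0."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Advance one particle by repeatedly solving F = m*a."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
    return x, v


# One particle released at x = 1 with zero velocity.
x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v * v + 0.5 * x * x  # should stay close to the initial 0.5
```

A production code applies the same update to every atom at every step, which is why positions, velocities, and per-atom properties dominate the data described on the following slides.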

4 Molecular Dynamics Data
Input
• Initial System Data
  • Coordinates, types, charges, mass
  • Interatomic bonds, angles
• Run Time Instructions
• Interaction Potential Equations (form, coefficients)
Output
• System Run Data (CPU rate, memory usage)
• System variables (pressure, stress, temperature)
• Atomic trajectories
• Atomic characteristics (charge, type)
• Per-Atom: Force, PE, KE, stress
• Processed: Neighbor lists
• Time averaged: Mean squared displacement, radial distribution function
ALL molecular dynamics data can be contained in ASCII text files

5 A Brief Guide to Atomistic File Types
• pdb, xyz, mol, cfg, sfd, gro, mdl, LAMMPS read_data, ccm, xsd, cif, car…

6 Through a Traditional Workflow
Generate input file(s) → Run simulation → Analyze output → Visualization/plotting analysis
• Input: a control file and a structure file
• Format depends on program

Structure file:

    n=16, 500 Chains, rho=0.7918
    8000 atoms
    3 atom types
    7500 bonds
    1 bond types
    0 angles
    0 dihedrals
    0 impropers
    0 92.055 xlo xhi
    0 70.395 ylo yhi
    0 37.905 zlo zhi
    Masses
    1 14.002
    2 14.002
    3 63.54
    Atoms
    1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
    2 1 1 2.65313400000000 3.07841000000000 1.80500000000000

Control file:

    units real
    timestep 1.0
    atom_style bond
    dimension 3
    boundary p p p
    #---------------Coordinates and Bonds --------------
    lattice fcc 1.0
    region 1 block -9.025 -1.805 0 70.395 0 37.905
    #N=28
    read_data n28lat
    pair_style lj/cut 9.805
    pair_coeff 1 1 0.1431 3.923
    pair_coeff 2 2 0.1432 3.923
    pair_coeff 3 3 4.72 2.616
    pair_modify mix arithmetic
    bond_style harmonic
    bond_coeff 1 41.82 1.54
    group alkane type 1 2
    group copper type 3
    neighbor 1.0 bin
    thermo 1
    thermo_style custom step temp pe ke etotal
    #minimize 1.0e-4 1.0e-6 100 1000
    fix hope all nve
    run 100000

7 Through a Traditional Workflow
Generate input file(s) → Run simulation → Analyze output → Visualization/plotting analysis
• Information about simulation run in control file
• Hardware and software version metadata; formatting depends on system configuration
• Produces output of overall run statistics

    Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
    Pair time (%) = 1108.83 (31.5444)
    Bond time (%) = 78.4225 (2.231)
    Neigh time (%) = 162.274 (4.61645)
    Comm time (%) = 1270 (36.1294)
    Outpt time (%) = 523.248 (14.8856)
    Other time (%) = 372.363 (10.5931)
    Nlocal: 3344 ave 8049 max 0 min
    Histogram: 16 0 0 0 0 0 2 6 3 5
    Nghost: 7940.66 ave 15817 max 0 min
    Histogram: 8 4 4 0 0 0 0 0 8 8
    Neighs: 862976 ave 2.19776e+06 max 0 min
    Histogram: 16 0 0 0 0 2 2 6 2 4
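
Run statistics like these are a typical example of output that needs its own parsing routine. The sketch below pulls the timing breakdown out of the log text shown on this slide using a regular expression. The function name `parse_timing` and the returned dictionary layout are assumptions made for illustration, and the pattern is only known to match the log format quoted here.

```python
import re

LOG = """\
Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)
"""

# Matches lines of the form "<Name> time (%) = <seconds> (<percent>)".
TIMING = re.compile(r"^(\w+)\s+time \(%\) = (\S+) \((\S+)\)$")


def parse_timing(log_text):
    """Return {category: (seconds, percent)} for each timing line found."""
    timing = {}
    for line in log_text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            name, seconds, percent = match.groups()
            timing[name] = (float(seconds), float(percent))
    return timing


timing = parse_timing(LOG)
total_percent = sum(percent for _, percent in timing.values())
```

The percentages should sum to roughly 100, which gives a quick sanity check that no category was missed.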

8 Through a Traditional Workflow
Generate input file(s) → Run simulation → Analyze output → Visualization/plotting analysis
• Output files generally dictated by control file
  • Final structure file
  • System properties log
  • Other run-time analysis outputs
• HIGHLY VARIED FORMATTING!
• Quantitative analysis of output by scripting, MATLAB or Excel

9 Through a Traditional Workflow
Generate input file(s) → Run simulation → Analyze output → Visualization/plotting analysis
• Output structure file may or may not be in a format which can be fed into visualization software
• Many software options available: VMD, Avogadro, POVray, VESTA, …
• Analysis output may or may not be in a format which can be parsed by plotting software

10 An Endless Series of Parsing Problems
• Input file
  • Convert from something you can manipulate/generate to something the code can read
• Output analysis
  • Typically requires writing new parsing routines
  • Different codes require re-writing scripts
• Visualizations
  • May require extracting data from other files manually
  • Most visualization code is already equipped to parse a variety of file types

11 Data from Legacy Code
• Locally developed molecular dynamics code, FLX
• Trying to port data into another code, LAMMPS
• Ctrl+C, Ctrl+V and lots of manual editing…
• Very time consuming for each file

12 Obstacles to Data Sharing and Reuse
• Energy barrier of converting file formats
  • Example: A file downloaded directly from Protein Data Bank (.pdb) may not be readable by MD code (LAMMPS)
• Extracting relevant quantities from available data sets
  • Parsing rules not always clear if unfamiliar with the format
  • Formats not always well documented

13 Problem Statement
• Too much redundant work
• Too little documentation or code clarity
• Too much time spent manipulating data formatting
• How can we fix this?

14 Our Approach: Interchangeable Libraries
• We created a General Atomic System (GAS) class
• All file read functions generate a GAS object
• GAS objects are accepted by
  • Write file functions
  • Analysis functions
  • Manipulation functions
(Diagram: G.A.S. linking Read, Write, Analyze, and Manipulate)
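
The read-to-GAS-to-write pattern on this slide might look like the following in Python. This is only a sketch: the class body and the `read_xyz` and `write_xyz` names are hypothetical stand-ins rather than the toolkit's actual API, and in-memory streams are used in place of files so the example is self-contained.

```python
import io


class GAS:
    """Hypothetical stand-in for the General Atomic System class."""

    def __init__(self):
        self.number_of_atoms = 0
        self.element = []
        self.x, self.y, self.z = [], [], []


def read_xyz(stream):
    """Read an xyz-format stream and return a GAS object."""
    gas = GAS()
    gas.number_of_atoms = int(stream.readline())
    stream.readline()  # comment line, ignored in this sketch
    for _ in range(gas.number_of_atoms):
        symbol, x, y, z = stream.readline().split()[:4]
        gas.element.append(symbol)
        gas.x.append(float(x))
        gas.y.append(float(y))
        gas.z.append(float(z))
    return gas


def write_xyz(gas, stream):
    """Write any GAS object as xyz, regardless of which reader produced it."""
    stream.write(f"{gas.number_of_atoms}\n")
    stream.write("written from a GAS object\n")
    for s, x, y, z in zip(gas.element, gas.x, gas.y, gas.z):
        stream.write(f"{s} {x:.6f} {y:.6f} {z:.6f}\n")


source = io.StringIO("2\nexample system\nO 0.0 0.0 0.0\nH 0.96 0.0 0.0\n")
system = read_xyz(source)
out = io.StringIO()
write_xyz(system, out)
```

Because every reader returns the same kind of object, supporting a new format means writing one reader and one writer rather than a converter for every pair of formats.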

15 Examining Existing Standards for Commonalities
• Positions
• Type
• Number of atoms

16 Examining Existing Standards for Commonalities
• Positions
• Type
• Number of atoms

17 Examining Existing Standards for Commonalities
• Positions
• Type
• Number of atoms / end of atoms section

18 Creating a Common Data Structure
• GAS class contains
  • System data
  • Internal functions
• Trivial ontology
  • Simplicity in data structure is flexibility
  • Internal functions should be as reliable as possible
  • Obvious and explicit naming schemes

19 Ontological Details
GAS
• System Data
  • .number_of_atoms
  • .x, .y, .z
  • .atomic_number
  • … and many more
• Internal Functions
  • .update_number_of_atoms
  • .fill_id_list
  • .sort_by_id
  • … and many more
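
As a rough illustration of this layout, the sketch below puts the attribute and method names from the slide into a small Python class. Only the names come from the slide. What each internal function actually does is not stated there, so the behavior shown (count atoms from the coordinate list, assign sequential ids starting at 1, reorder all per-atom lists by id) is an assumption, as is the `.id` attribute.

```python
class GAS:
    """Sketch of a GAS-like container; method behavior is assumed."""

    PER_ATOM = ("id", "atomic_number", "x", "y", "z")

    def __init__(self):
        self.number_of_atoms = 0
        self.id = []              # assumed attribute holding atom ids
        self.atomic_number = []
        self.x, self.y, self.z = [], [], []

    def update_number_of_atoms(self):
        """Assumed: recount atoms from the coordinate list."""
        self.number_of_atoms = len(self.x)

    def fill_id_list(self):
        """Assumed: assign sequential ids 1..N when a format has none."""
        self.id = list(range(1, len(self.x) + 1))

    def sort_by_id(self):
        """Assumed: reorder every per-atom list so ids are ascending."""
        order = sorted(range(len(self.id)), key=lambda i: self.id[i])
        for name in self.PER_ATOM:
            values = getattr(self, name)
            setattr(self, name, [values[i] for i in order])


gas = GAS()
gas.id = [3, 1, 2]
gas.atomic_number = [29, 6, 6]
gas.x, gas.y, gas.z = [2.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
gas.update_number_of_atoms()
gas.sort_by_id()
```

Keeping the per-atom data as plain parallel lists with explicit names matches the slide's emphasis on a simple data structure and obvious naming.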

20 User Time Savings
• From read_data to xyz: timing comparisons
  • Manual copy-paste, eliminating excess columns: 2.15 minutes
  • Calling functions, including typing out calls: 1.05 minutes
  • Actual function timing: ~6 seconds
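
A conversion of this kind can be sketched as follows. The example assumes the `atom_style bond` column order (atom id, molecule id, atom type, x, y, z) used in the structure file shown earlier, and the mapping from numeric atom types to element labels is a guess made purely for illustration. The function names are hypothetical and are not the toolkit's own.

```python
DATA = """\
n=16, 500 Chains, rho=0.7918

2 atoms
3 atom types

Masses

1 14.002
2 14.002
3 63.54

Atoms

1 1 2 1.805 1.805 1.805
2 1 1 2.653134 3.07841 1.805
"""


def read_data_atoms(text):
    """Collect (id, type, x, y, z) tuples from the Atoms section."""
    atoms = []
    in_atoms = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Atoms":
            in_atoms = True
            continue
        if in_atoms:
            if not fields[0].isdigit():
                break  # a new section header (e.g. Bonds) ends the atom list
            atom_id, _molecule, atom_type = (int(f) for f in fields[:3])
            x, y, z = (float(f) for f in fields[3:6])
            atoms.append((atom_id, atom_type, x, y, z))
    return atoms


def to_xyz(atoms, symbols):
    """Format atoms as xyz text, dropping the columns xyz does not use."""
    lines = [str(len(atoms)), "converted from LAMMPS read_data"]
    for _atom_id, atom_type, x, y, z in sorted(atoms):
        lines.append(f"{symbols[atom_type]} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


atoms = read_data_atoms(DATA)
# Assumed labels: the type-to-element mapping is not given in the slides.
xyz = to_xyz(atoms, symbols={1: "C", 2: "C", 3: "Cu"})
```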

21 Aiding Analysis
• With all data in standard structure:
  • Write all analysis based on this format
  • Input format independent
• Allows reuse of analysis functions
  • Reuse begs for optimization
  • Intended reuse encourages documentation
• Nested analyses now possible
• Modularization saves:
  • Time
  • Effort
  • Error

22 Traditional Scripting Problems
• Scripts typically used for:
  • Quantitative analysis
  • Modifying files to be parsed by various software
• Rewriting input/output handling for each script
• MATLAB, sed, awk and grep are not the friendliest or fastest parsing tools
• Lack of commenting
• Can only be applied to specific file types or a single file

23 Examples of Scripting
(Script screenshot; run time: 2.5 seconds)

24 The Python Version…
(Script screenshot; run time: 0.4 seconds)
• Once a function is written, it can be called in just a few lines for ANY GAS system containing sufficient information

25 (Image slide) Image credit: CC BY-NC-SA http://www.flickr.com/photos/katieharbath/

26 User Time Savings
• Open source and custom function libraries instead of MATLAB allow for brute force parallelization and shifting of load to external resources
• Faster run times:
  • 2.5 seconds using bash versus 0.4 seconds in Python
• Faster coding times
  • Reuse of functions without additional modifications needed
  • Eliminating redundant coding efforts
• Use of common language promotes code reusability
• Writing code for “future” self as well as others

27 Ways We’re Using GAS
• Polymerization
  • Analyze pair-pair distances
  • Alter system topology
  • Automatically generate system readable file
• Iterative system analysis
  • Quantitative analysis of a series of files
    • Radial distribution functions
    • Density profile
    • Bond length distributions
  • Automatically generates easily parsed output files
• Automatic movie rendering
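
Pair distance analysis is the common building block behind several of these uses, including radial distribution functions and bond length distributions. The sketch below computes all pair distances in a periodic orthorhombic box using the minimum image convention and bins them into a histogram. It works on plain coordinate lists; the function names are hypothetical, and the normalization needed to turn the counts into a true radial distribution function is deliberately left out.

```python
import math


def minimum_image(delta, length):
    """Shift a separation so it refers to the nearest periodic image."""
    return delta - length * round(delta / length)


def pair_distances(x, y, z, box):
    """Distances between all unique atom pairs in a periodic box."""
    distances = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = minimum_image(x[j] - x[i], box[0])
            dy = minimum_image(y[j] - y[i], box[1])
            dz = minimum_image(z[j] - z[i], box[2])
            distances.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return distances


def histogram(values, bin_width, n_bins):
    """Count values per bin; values beyond the last bin are ignored."""
    counts = [0] * n_bins
    for value in values:
        index = int(value / bin_width)
        if index < n_bins:
            counts[index] += 1
    return counts


# Three atoms on a line in a 10 x 10 x 10 box; the first two are
# neighbors across the periodic boundary.
x, y, z = [0.5, 9.5, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
distances = pair_distances(x, y, z, box=(10.0, 10.0, 10.0))
counts = histogram(distances, bin_width=1.0, n_bins=3)
```

The double loop scales with the square of the atom count, so a function like this is a natural candidate for the optimization that reuse encourages.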

28 Automatic Movie Rendering

29 System Manipulation: Unwrapping Coordinates
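
The slide itself shows only a figure, so the following is one common interpretation of unwrapping rather than the authors' method: rebuilding a continuous trajectory from coordinates that were wrapped back into the periodic box. The sketch assumes an atom never moves more than half a box length between consecutive frames, and the function name is hypothetical.

```python
def unwrap(wrapped, length):
    """Unwrap one coordinate of one atom across successive frames."""
    unwrapped = [wrapped[0]]
    for previous, current in zip(wrapped, wrapped[1:]):
        step = current - previous
        step -= length * round(step / length)  # undo any boundary jump
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


# An atom drifting in +x crosses the boundary of a box of length 10.
path = unwrap([9.0, 9.5, 0.0, 0.5], length=10.0)
```

Unwrapped coordinates are what a mean squared displacement calculation needs, since wrapped positions would show an artificial jump of one box length at every boundary crossing.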

30 Moving Forward
• More file formats
• More advanced analysis methods and functions
  • Density functional theory support
  • Non-spherical particles
• Collaboration with other groups
• Better metadata integration

31 Final Thoughts
• Our lives are much better
• Our code is much more consistent
• Future users have a hope of understanding what we did
• If you want people to use it, it needs to be USEFUL and EASY


