Analyzing the Results of a Simulation and Estimating Errors
Jason Cooper
Types of Error
–Big and obvious errors
–Systematic error
–Statistical (random) error
Big, Obvious Errors
Arise from gross error, often in the particle configuration.
Examine intermediate conformations (MD or MC) for obvious problems, regardless of the focus of the study.
Conformations are typically stored every 5-25 steps.
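Visual inspection of stored conformations can be complemented by a simple automated check. The sketch below is only an illustration: the function name `find_config_problems`, the coordinate layout (a list of `(x, y, z)` tuples) and the overlap threshold `min_dist` are all assumptions, and periodic boundary conditions are ignored.

```python
import math
from itertools import combinations

def find_config_problems(coords, min_dist=0.5):
    """Return a list of human-readable problems found in one stored configuration.

    coords   : list of (x, y, z) tuples, one per particle (assumed layout)
    min_dist : illustrative overlap threshold, in the same units as coords
    """
    problems = []
    # Non-finite coordinates usually mean the integration has blown up.
    for i, xyz in enumerate(coords):
        if not all(math.isfinite(c) for c in xyz):
            problems.append(f"particle {i}: non-finite coordinate {xyz}")
    # Pairs closer than min_dist suggest overlapping particles.
    # (Plain O(N^2) scan without periodic images; adequate for a spot check.)
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if all(math.isfinite(c) for c in a + b):
            d = math.dist(a, b)
            if d < min_dist:
                problems.append(f"particles {i},{j}: separation {d:.3f} < {min_dist}")
    return problems

# Example: one overlapping pair and one particle with a NaN coordinate.
frame = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (float("nan"), 0.0, 0.0)]
for line in find_config_problems(frame):
    print(line)
```

Running such a check on every stored conformation flags gross errors early, but it does not replace looking at the structures themselves.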
Systematic Error: Characterization
Results in a constant bias or skew from the expected result.
[Figure: expected distribution, biased distribution, skewed distribution]
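Systematic Error: Characterization
Calculated values for simple thermodynamic properties should be normally distributed about the mean $\langle A\rangle$ with standard deviation $\sigma$:
$$p(A) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(A - \langle A\rangle)^2}{2\sigma^2}\right]$$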
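Systematic Error: Characterization
1. Sort the M data values into k bins of approximately equal number. The expected number in bin i, which spans $[a_i, a_{i+1}]$, is given by:
$$E_i = M \int_{a_i}^{a_{i+1}} p(A)\,dA$$
where $p(A)$ is the expected normal distribution.
2. Calculate the chi-squared statistic from the observed counts $O_i$:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
($\chi^2 > 1$ indicates a poor match)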
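The two steps above can be sketched with the Python standard library. This is a minimal illustration, not a definitive test: the function name `chi_squared_normal`, the choice of 10 bins, and the synthetic samples are assumptions. Bins are chosen with equal expected probability under the fitted normal, so each has the same expected count. The second returned value divides chi-squared by the degrees of freedom (bins minus 3), a common convention that is assumed here rather than taken from the slides.

```python
import bisect
import random
from statistics import NormalDist, fmean, pstdev

def chi_squared_normal(data, n_bins=10):
    """Chi-squared comparison of `data` against a fitted normal distribution.

    Returns (chi2, chi2_per_dof). n_bins is an illustrative choice (> 3).
    """
    m = len(data)
    fit = NormalDist(fmean(data), pstdev(data))
    # Interior bin edges chosen so every bin has equal expected probability.
    edges = [fit.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed = [0] * n_bins
    for x in data:
        observed[bisect.bisect_right(edges, x)] += 1
    expected = m / n_bins  # same expected count in every bin
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    # Assumed convention: bins minus 1, minus the two fitted parameters.
    dof = n_bins - 3
    return chi2, chi2 / dof

rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # should match
skewed_sample = [rng.expovariate(1.0) for _ in range(5000)]     # should not
print("normal:", chi_squared_normal(normal_sample))
print("skewed:", chi_squared_normal(skewed_sample))
```

The skewed sample should give a far larger statistic than the normal one, which is the signature of the biased or skewed distributions that systematic error produces.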
Systematic Error: Sources
Four main sources of systematic error:
–The model (limitations of the basis set, functional, etc.)
–The algorithms used (e.g. drift in Euler integration of a differential equation)
–Numerical precision (round-off and quantization error)
–Implementation (programming error)
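The algorithmic source listed above, drift in Euler integration, is easy to reproduce. The sketch below integrates a harmonic oscillator with unit mass and unit force constant (a hypothetical test system) and reports the total energy, which should stay at 0.5. Velocity Verlet is included only as a point of comparison; it is not mentioned in the slides.

```python
def euler_energy(n_steps, dt=0.01):
    """Total energy after n_steps of explicit Euler for x'' = -x."""
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        # Both updates use the values from the start of the step.
        x, v = x + v * dt, v - x * dt
    return 0.5 * (x * x + v * v)

def verlet_energy(n_steps, dt=0.01):
    """Total energy after n_steps of velocity Verlet for x'' = -x."""
    x, v = 1.0, 0.0
    a = -x
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -x
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return 0.5 * (x * x + v * v)

for n in (0, 1000, 10000):
    print(n, euler_energy(n), verlet_energy(n))
```

For this system each explicit Euler step multiplies the energy by (1 + dt²), so the error accumulates in one direction. That makes it a systematic drift rather than a random fluctuation, whereas the velocity Verlet energy only oscillates close to its initial value.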
Systematic Error: The Fix
Systematic errors are most easily isolated when several algorithms are applied:
–to several different chemical systems,
–on several different computers,
–using several different compilers,
–etc.
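Statistical Error: Characterization
Values show a characteristic normal distribution about the set average $\langle A\rangle$, whose variance is:
$$\sigma^2(\langle A\rangle) = \frac{\sigma^2(A)}{M}$$
where M is the number of independent data values.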
Statistical Error: Relaxation Time and Statistical Inefficiency
Successive data values are highly correlated, not independent.
To find the effective M, we need to know the statistical inefficiency of the system.
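Statistical Error: Relaxation Time and Statistical Inefficiency
We begin by dividing our M sequential configurations into b blocks, each containing $n_b$ values of the property A, and averaging within each block:
$$\langle A\rangle_i = \frac{1}{n_b} \sum_{j=(i-1)n_b+1}^{i\,n_b} A_j$$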
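Statistical Error: Relaxation Time and Statistical Inefficiency
The variance of the block averages is then given by:
$$\sigma^2(\langle A\rangle_b) = \frac{1}{b} \sum_{i=1}^{b} \left(\langle A\rangle_i - \langle A\rangle_{total}\right)^2$$
where $\langle A\rangle_i$ is the average for the i-th block and $\langle A\rangle_{total}$ is the average calculated only over those values covered in the blocks.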
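Statistical Error: Relaxation Time and Statistical Inefficiency
For large $n_b$, the block averages $\langle A\rangle_i$ become uncorrelated and their variance $\sigma^2(\langle A\rangle_b)$ falls off inversely with block length:
$$\sigma^2(\langle A\rangle_b) \propto \frac{1}{n_b}$$
Next, define the statistical inefficiency s through the variance $\sigma^2(A)$ of the individual values:
$$\sigma^2(\langle A\rangle_b) = \frac{s\,\sigma^2(A)}{n_b}$$
and, finally, the b uncorrelated blocks give $\sigma^2(\langle A\rangle_{total}) = \sigma^2(\langle A\rangle_b)/b$ with $M = b\,n_b$, so that
$$\sigma^2(\langle A\rangle_{total}) = \frac{s\,\sigma^2(A)}{M}$$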
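Statistical Error: Relaxation Time and Statistical Inefficiency
We solve for s:
$$s = \lim_{n_b \to \infty} \frac{n_b\,\sigma^2(\langle A\rangle_b)}{\sigma^2(A)}$$
where $\sigma^2(\langle A\rangle_b)$ is the variance of the block averages and $\sigma^2(A)$ is the variance of the individual values. s can be visualized in two ways:
–The factor by which the variance exceeds a naïve estimate (statistical inefficiency); or
–The number of steps per block required to give uncorrelated block averages (relaxation time).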
Statistical Error: Relaxation Time and Statistical Inefficiency
In practice, s is calculated from a plot similar to the following:
[Figure]
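The points of such a plot can be generated by evaluating the block-size-dependent estimate for a range of block sizes and looking for the plateau. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the input is a synthetic first-order autoregressive series standing in for correlated simulation output.

```python
import random
from statistics import fmean, pvariance

def block_inefficiency(values, n_b):
    """Estimate s(n_b) = n_b * var(block averages) / var(values)."""
    b = len(values) // n_b            # number of complete blocks
    if b < 2:
        raise ValueError("need at least two complete blocks")
    used = values[: b * n_b]          # keep only values covered by the blocks
    block_means = [fmean(used[i * n_b:(i + 1) * n_b]) for i in range(b)]
    return n_b * pvariance(block_means) / pvariance(used)

def corrected_std_error(values, s):
    """Standard error of the mean with M replaced by the effective M / s."""
    return (s * pvariance(values) / len(values)) ** 0.5

def ar1_series(m, phi, seed=0):
    """Synthetic correlated test data: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(m):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = ar1_series(100_000, phi=0.9)
for n_b in (1, 10, 50, 250, 1000):
    print(n_b, round(block_inefficiency(series, n_b), 2))
```

For this synthetic process the limiting value is known analytically, (1 + phi) / (1 - phi) = 19 for phi = 0.9, so the printed estimates should rise from 1 and level off near that value, within statistical noise. The noise grows at large block sizes because few blocks remain.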
Statistical Error: Relaxation Time and Statistical Inefficiency
Care must be taken to avoid boundary effects:
[Figure]
Statistical Error: Application of Statistical Inefficiency to Sampling
The simulation is divided into blocks of size $n_b \ge s$.
Blocks may be sampled in one of three ways:
–Stratified systematic sampling
–Stratified random sampling
–Coarse graining
Coarse graining is most commonly applied for scalar properties; sampling is applied otherwise.
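The three options can be sketched as follows. This assumes the usual reading of the terms: systematic sampling takes the value at the same position in every block, random sampling takes one randomly chosen value per block, and coarse graining replaces each block by its average. The function names and the `offset` parameter are hypothetical.

```python
import random
from statistics import fmean

def split_blocks(values, n_b):
    """Split values into complete blocks of length n_b, dropping any remainder."""
    return [values[i:i + n_b] for i in range(0, len(values) - n_b + 1, n_b)]

def stratified_systematic(values, n_b, offset=0):
    """Take the value at the same position (offset) in every block."""
    return [block[offset] for block in split_blocks(values, n_b)]

def stratified_random(values, n_b, rng=None):
    """Take one randomly chosen value from every block."""
    rng = rng or random.Random()
    return [rng.choice(block) for block in split_blocks(values, n_b)]

def coarse_grain(values, n_b):
    """Replace every block by its average."""
    return [fmean(block) for block in split_blocks(values, n_b)]

data = list(range(10))          # stand-in for a stored property series
print(stratified_systematic(data, 3))
print(stratified_random(data, 3, random.Random(0)))
print(coarse_grain(data, 3))
```

Coarse graining only makes sense when a block average is meaningful, as for a scalar property. The two sampling variants return actual stored values, so they also work for whole configurations.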
Statistical Error: Sources
Arises from the finite nature of the simulation:
–Finite number of atoms or molecules considered
–Finite number of sequential values taken
–Finite precision retained in intermediate values
Statistical Error: The Fix
Three main approaches:
–Increase the number of atoms or molecules considered in the simulation;
–Increase the duration of the simulation (number of samples taken); or
–Reduce the statistical inefficiency of the algorithms used.