Statistical Data Analysis: Part II
1. From Trajectories to Probability Distribution Functions (PDF)
2. Sampling PDFs with Particle Dynamics
Sampling
Sampling PDFs with Particle Dynamics
The ergodic theorem highlights an equivalence (duality) between the trajectory description (dynamics) and the PDF description (statistics). The question is how to sample a PDF using trajectories, i.e. particle dynamics. This is a major theme of statistical mechanics and stochastic optimization (search in high-dimensional spaces), so we now dig a bit deeper into this matter.
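What is sampling?
A sequence of numbers $x_1, x_2, \dots$ whose frequencies reflect a given PDF $p(x)$: the value $x_i$ is drawn $N_i$ times out of a total of $N = \sum_i N_i$ draws, such that
$\lim_{N\to\infty} N_i/N = p(x_i)\,\Delta x$.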
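Here is a minimal sketch of this definition. It assumes NumPy and uses a standard Gaussian as the target $p(x)$, which is an illustrative choice. We draw $N$ samples, bin them, and compare the frequencies $N_i/N$ with $p(x_i)\,\Delta x$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, for reproducibility
N = 200_000                             # total number of draws
samples = rng.standard_normal(N)        # x_1, ..., x_N drawn from p(x) = N(0, 1)

# Bin the samples: counts[i] = N_i = number of draws falling in bin i
edges = np.linspace(-4.0, 4.0, 41)      # bin width dx = 0.2
counts, _ = np.histogram(samples, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
dx = edges[1] - edges[0]

freq = counts / N                                   # empirical frequencies N_i / N
p = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)    # target PDF at the bin centers
expected = p * dx                                   # p(x_i) * dx

print(np.max(np.abs(freq - expected)))   # small, and shrinks as N grows
```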
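Monte Carlo Sampling
1. Draw $x$ at random in $X$ space.
2. Compute $p(x)$.
3. Draw a random number $r$ uniformly in $[0, p_{\max}]$, where $p_{\max} \ge p(x)$ for all $x$.
4. If $r < p(x)$: accept $x$. Else: reject it and try again.
Repeat 1-4 until $N$ acceptances.
Q: How about efficiency? The acceptance ratio $N/N_{\text{try}}$ can be very poor (below 10%).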
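The four steps above can be sketched as follows. The sketch assumes NumPy. `rejection_sample` is a hypothetical helper name. The target is an illustrative unnormalized Gaussian on $[-5, 5]$ with $p_{\max} = 1$.

```python
import numpy as np

def rejection_sample(p, x_min, x_max, p_max, n_accept, rng):
    """Monte Carlo (rejection) sampling of an unnormalized PDF p on [x_min, x_max].
    Assumes p(x) <= p_max on the whole interval. Returns samples and number of tries."""
    accepted = []
    n_try = 0
    while len(accepted) < n_accept:
        x = rng.uniform(x_min, x_max)   # 1. draw x at random in X space
        px = p(x)                       # 2. compute p(x)
        r = rng.uniform(0.0, p_max)     # 3. draw a random r in [0, p_max]
        n_try += 1
        if r < px:                      # 4. accept if r < p(x), else try again
            accepted.append(x)
    return np.array(accepted), n_try

rng = np.random.default_rng(1)
gauss = lambda x: np.exp(-x**2 / 2)     # unnormalized Gaussian, maximum value 1
xs, n_try = rejection_sample(gauss, -5.0, 5.0, 1.0, 10_000, rng)
efficiency = len(xs) / n_try            # about sqrt(2*pi)/10 for this bounding box
print(efficiency, xs.mean(), xs.std())
```

The efficiency equals the area under $p$ divided by the area of the bounding box $[x_{\min}, x_{\max}] \times [0, p_{\max}]$. For this reason it degrades quickly when the PDF is sharply peaked or the space has many dimensions.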
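Dynamic Sampling (Trajectories)
Generate a discrete trajectory $x_k = x(t_k)$, $t_k = k\,\Delta t$, $k = 1, \dots, N$. The time $T_i$ spent “around” $x_i$ (i.e. within $[x_i, x_i + \Delta x]$) must form a sequence such that $\lim_{T\to\infty} T_i/T = p(x_i)\,\Delta x$, where $T = N\,\Delta t$ is the total time span.
Plus: all values are accepted!
Minus: the trajectory should visit the whole target space $X$.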
The Dirac Distribution
By convolution with a given function $f$, $\delta$ picks up a single value, namely:
$\int \delta(x - x_0)\, f(x)\, dx = f(x_0)$.
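Here is a quick numerical check of the sifting property. It assumes NumPy and replaces $\delta$ with a narrow normalized Gaussian $\delta_\varepsilon$; this approximation is chosen for illustration, and `delta_eps` is a hypothetical helper. As $\varepsilon \to 0$ the integral approaches $f(x_0)$.

```python
import numpy as np

def delta_eps(x, eps):
    """Narrow normalized Gaussian: a smooth stand-in for the Dirac delta."""
    return np.exp(-x**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)

x = np.linspace(-5.0, 5.0, 200_001)   # fine grid, dx = 5e-5
dx = x[1] - x[0]
f = np.cos                            # any smooth test function
x0 = 0.7

for eps in (1e-1, 1e-2, 1e-3):
    integral = np.sum(delta_eps(x - x0, eps) * f(x)) * dx   # approximates f(x0)
    print(eps, integral, f(x0))
```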
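Dirac Transformation Rules
By convolution with a given function $f$, $\delta$ picks up the values that correspond to the zeros of the Dirac argument (crossing points):
T1: $\delta(a x) = \frac{\delta(x)}{|a|}$, so that $\int \delta\big(a(x - x_0)\big)\, f(x)\, dx = \frac{f(x_0)}{|a|}$.
More complicated:
T2: $\delta\big(g(x)\big) = \sum_i \frac{\delta(x - x_i)}{|g'(x_i)|}$, where $g(x_i) = 0$.
Why? Because near each crossing point $g(x) \simeq g'(x_i)\,(x - x_i)$. T2 therefore follows from applying T1 locally with $a = g'(x_i)$.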
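Rule T2 can be checked numerically. This sketch assumes NumPy and uses a hypothetical helper `delta_eps`, a narrow normalized Gaussian standing in for $\delta$. We take $g(x) = x^2 - 1$, whose zeros $x = \pm 1$ both have $|g'| = 2$. T2 then predicts $\int \delta(x^2 - 1)\, e^x\, dx = (e + e^{-1})/2$.

```python
import numpy as np

def delta_eps(y, eps):
    """Narrow normalized Gaussian approximating delta(y)."""
    return np.exp(-y**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)

x = np.linspace(-3.0, 3.0, 600_001)       # dx = 1e-5
dx = x[1] - x[0]
g = x**2 - 1.0                            # zeros at x = -1, +1, with |g'(x_i)| = 2
f = np.exp(x)

lhs = np.sum(delta_eps(g, 1e-3) * f) * dx     # integral of delta(g(x)) f(x)
rhs = (np.exp(1.0) + np.exp(-1.0)) / 2.0      # sum_i f(x_i) / |g'(x_i)|  (rule T2)
print(lhs, rhs)
```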
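Dirac transformation rule
Given the transformation $y = g(x)$, the Dirac distribution transforms according to:
$\delta\big(y - g(x)\big) = \sum_i \frac{\delta\big(x - x_i(y)\big)}{|g'(x_i)|}$, where $g(x_i) = y$.
Namely, for the PDFs of $x$ and $y$:
$p_Y(y) = \int \delta\big(y - g(x)\big)\, p_X(x)\, dx = \sum_i \frac{p_X(x_i)}{|g'(x_i)|}$.
Useful identity (conservation of probability, for monotonic $g$): $p_Y(y)\, |dy| = p_X(x)\, |dx|$.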
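Here is an illustration of the transformation rule. It assumes NumPy, and the example is our own choice. If $x$ is standard Gaussian and $y = g(x) = x^2$, the two roots $x = \pm\sqrt{y}$ with $|g'| = 2\sqrt{y}$ give $p_Y(y) = e^{-y/2}/\sqrt{2\pi y}$. The sketch compares this prediction with a histogram of transformed samples.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)    # p_X: standard Gaussian
y = x**2                            # transformation y = g(x) = x^2

edges = np.linspace(0.5, 4.0, 36)   # stay away from the integrable 1/sqrt(y) spike at 0
counts, _ = np.histogram(y, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
p_hist = counts / (len(y) * width)  # empirical density of y

# Transformation rule: roots x = +-sqrt(y), each with |g'(x)| = 2 sqrt(y)
p_x = lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)
p_pred = 2 * p_x(np.sqrt(centers)) / (2 * np.sqrt(centers))
print(np.max(np.abs(p_hist - p_pred)))
```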
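Dynamical systems
Consider a first-order equation of motion $\dot x = F(x)$ with initial condition $x(0) = x_0$. This is a mapping from $x_0$ to $x$ parametrized by time. In one dimension it is solved by
$t = \int_{x_0}^{x} \frac{dx'}{F(x')}$.
The PDF generated by the single trajectory with initial condition $x_0$, observed over a time span $T$, is
$p(x; x_0) = \frac{1}{T} \int_0^T \delta\big(x - x(t; x_0)\big)\, dt$.
Dirac score: each crossing at $x$ scores 1. By Dirac rule T2, the zeros in $t$ are the crossing times $t_k$ with $x(t_k) = x$, which gives
$p(x; x_0) = \frac{1}{T} \sum_k \frac{1}{|\dot x(t_k)|} = \frac{1}{T} \sum_k \frac{1}{|F(x)|}$.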
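Here is a sketch of the single-trajectory Dirac score. It assumes NumPy and a hypothetical flow $\dot x = \omega + a \sin x$ on the circle $[0, 2\pi)$ with $\omega > a > 0$. Because $F$ never vanishes, the orbit is periodic with period $T_p = 2\pi/\sqrt{\omega^2 - a^2}$. Each period contributes one crossing per point, so the rule above predicts $p(x) = 1/(T_p\, F(x))$. The trajectory is integrated with forward Euler, which is an illustrative choice.

```python
import math
import numpy as np

omega, a = 1.0, 0.5                  # illustrative parameters, omega > a so F(x) > 0
F = lambda x: omega + a * math.sin(x)

dt, n_steps = 0.005, 200_000         # total span T = 1000 (about 138 periods)
x = 0.0
traj = np.empty(n_steps)
for k in range(n_steps):
    traj[k] = x
    x += dt * F(x)                   # forward-Euler step of xdot = F(x)

# Time histogram of the single trajectory, positions folded onto the circle
edges = np.linspace(0.0, 2 * np.pi, 51)
p_time, _ = np.histogram(np.mod(traj, 2 * np.pi), bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Dirac-score prediction: p(x) = 1 / (T_p * |F(x)|)
T_p = 2 * np.pi / math.sqrt(omega**2 - a**2)
p_pred = 1.0 / (T_p * (omega + a * np.sin(centers)))
print(np.max(np.abs(p_time - p_pred) / p_pred))
```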
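Dynamical systems
The corresponding PDF generated by an ensemble of initial conditions $x_0$, distributed according to $p_0(x_0)$, is:
$p(x, t) = \int \delta\big(x - x(t; x_0)\big)\, p_0(x_0)\, dx_0$.
Dirac score: each initial condition whose trajectory sits at $x$ at time $t$ scores 1. By rule T2, $p(x, t) = \sum_i p_0(x_{0,i}) \big/ \left|\partial x(t; x_0)/\partial x_0\right|_{x_{0,i}}$, where $x(t; x_{0,i}) = x$.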
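For $\dot x = -kx$ the flow map is $x(t; x_0) = x_0 e^{-kt}$, so rule T2 gives $p(x, t) = p_0(x e^{kt})\, e^{kt}$. The minimal sketch below assumes NumPy and an illustrative Gaussian ensemble of initial conditions with standard deviation 2. It compares the prediction with the histogram of the pushed-forward ensemble.

```python
import numpy as np

k, t = 1.0, 0.5                        # illustrative rate constant and observation time
rng = np.random.default_rng(3)
N = 400_000
x0 = rng.normal(0.0, 2.0, N)           # ensemble of initial conditions, p0 = N(0, 2^2)
x_t = x0 * np.exp(-k * t)              # exact flow map of xdot = -k x

# Rule T2: x0(x) = x e^{kt} and |dx/dx0| = e^{-kt}, so p(x,t) = p0(x e^{kt}) e^{kt}
p0 = lambda s: np.exp(-s**2 / 8.0) / np.sqrt(8.0 * np.pi)
edges = np.linspace(-3.0, 3.0, 61)
width = edges[1] - edges[0]
centers = 0.5 * (edges[:-1] + edges[1:])
p_pred = p0(centers * np.exp(k * t)) * np.exp(k * t)

counts, _ = np.histogram(x_t, bins=edges)
p_hist = counts / (N * width)          # ensemble density at time t
print(np.max(np.abs(p_hist - p_pred)))
```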
Dynamic sampling via stochastic dynamics
By now, we have learned how to sample using deterministic trajectories. Since initial conditions matter, we may need MANY initial conditions, or a VERY LONG single trajectory (if the system is ergodic). Both can be inefficient. How about using stochastic dynamics?
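PDF evolution
The probability of finding a particle at position $x$ at time $t$, given that it started at $x_0$ at $t = t_0$, is:
$p(x, t \mid x_0, t_0) = \delta\big(x - x(t; x_0, t_0)\big)$.
By averaging over initial conditions:
$p(x, t) = \int \delta\big(x - x(t; x_0)\big)\, p_0(x_0)\, dx_0 \equiv \langle \delta(x - x(t)) \rangle$.
What is the evolution equation of this PDF? Differentiating in time and using $\dot x(t) = F(x(t))$:
$\partial_t p = -\langle \dot x(t)\, \partial_x \delta(x - x(t)) \rangle = -\partial_x \big[F(x)\, p(x, t)\big]$.
Therefore $p$ obeys a continuity equation:
$\partial_t p + \partial_x (F p) = 0$.
Steady state: $F(x)\, p(x) = J = \text{const}$, i.e. $p(x) = J/F(x) \propto 1/|F(x)|$. This is exactly what the Dirac score predicts!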
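The steady state $F p = \text{const}$ can also be reached by integrating the continuity equation directly. The sketch below assumes NumPy, an illustrative circular flow $F(x) = \omega + a \sin x > 0$, and a first-order upwind finite-volume scheme. The upwind scheme is a standard numerical choice and is not part of the lecture. $p \propto 1/F$ is an exact fixed point both of the equation and of the scheme. The exact equation would only transport a uniform initial PDF around the circle. The scheme, by contrast, relaxes to the fixed point because of its numerical diffusion.

```python
import numpy as np

omega, a = 1.0, 0.5                        # illustrative flow F(x) = omega + a sin(x) > 0
n = 100                                    # number of cells on the circle [0, 2pi)
dx = 2 * np.pi / n
xc = (np.arange(n) + 0.5) * dx             # cell centers
F = omega + a * np.sin(xc)
dt = 0.5 * dx / F.max()                    # Courant number <= 0.5: stable upwind step

p = np.full(n, 1.0 / (2 * np.pi))          # uniform, normalized initial PDF
for _ in range(40_000):
    flux = F * p                           # upwind flux leaving each cell (F > 0)
    p = p - dt / dx * (flux - np.roll(flux, 1))   # dp/dt = -d(F p)/dx, periodic

# Steady state: F p = J = const, i.e. p proportional to 1/F
p_pred = 1.0 / F
p_pred /= p_pred.sum() * dx
print(np.max(np.abs(p - p_pred) / p_pred))
```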
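Stochastic Particle Dynamics
We add noise:
$\dot x = F(x) + \xi(t)$.
The noise obeys the fluctuation-dissipation theorem:
$\langle \xi(t) \rangle = 0$, $\langle \xi(t)\, \xi(t') \rangle = 2D\, \delta(t - t')$.
The diffusion coefficient $D$ is tied to the temperature: $D = k_B T/\gamma$, with $\gamma$ the friction coefficient, so $D = k_B T$ in units where $\gamma = 1$.
Each noise realization gives rise to its own trajectory, so there is no need to change the initial condition. The formal solution is (Stratonovich):
$x(t) = x_0 + \int_0^t F\big(x(s)\big)\, ds + \int_0^t \xi(s)\, ds$.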
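Here is a sketch of stochastic particle dynamics using the Euler-Maruyama scheme. For additive noise, the Itô and Stratonovich interpretations coincide. NumPy is assumed. `euler_maruyama` is a hypothetical helper, and the double-well force $F = x - x^3$ and all parameters are illustrative. All walkers share the same initial condition, so the noise alone generates the distinct trajectories.

```python
import numpy as np

def euler_maruyama(F, x0, D, dt, n_steps, rng):
    """Integrate xdot = F(x) + xi(t), <xi(t) xi(t')> = 2 D delta(t - t'),
    with the Euler-Maruyama scheme; x0 is an array of independent walkers."""
    x = np.array(x0, dtype=float)
    path = np.empty((n_steps + 1,) + x.shape)
    path[0] = x
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(2 * D * dt), size=x.shape)  # noise increment
        x = x + F(x) * dt + dW
        path[n + 1] = x
    return path

# Illustrative double-well force F = x - x^3 (i.e. V = x^4/4 - x^2/2).
# Every walker starts at the same point, the unstable top x0 = 0:
# only its noise realization decides into which well it falls.
rng = np.random.default_rng(4)
paths = euler_maruyama(lambda x: x - x**3, np.zeros(2000), 0.05, 0.01, 1000, rng)
final = paths[-1]
print(np.mean(final > 0))   # about half of the walkers end in the right well
```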
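Averaging over noise
One can therefore define $p(x; t)$ by integrating over the initial conditions AND/OR the noise distribution (typically Gaussian):
$p(x; t) = \langle \delta(x - x(t)) \rangle_{x_0,\, \xi}$.
Noise can reduce, or even eliminate, the need to average over initial conditions.
Differential Itô calculus: $dx = F(x)\, dt + dW$, with $\langle dW \rangle = 0$ and $\langle dW^2 \rangle = 2D\, dt$.
The mean is unaffected: $\langle dx \rangle = F(x)\, dt$.
The variance picks up a diffusive component: $\langle dx^2 \rangle - \langle dx \rangle^2 = 2D\, dt$.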
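For a constant force $F = v$, the Itô rules above integrate to a mean $x_0 + v t$ and a variance $2Dt$. Here is a minimal check, assuming NumPy and illustrative values of $v$, $D$ and $\Delta t$:

```python
import numpy as np

v, D, dt, n_steps = 0.5, 0.2, 0.01, 500     # illustrative drift F = v, diffusion D
rng = np.random.default_rng(5)
n_walkers = 20_000

x = np.zeros(n_walkers)                     # every walker starts at x0 = 0
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(2 * D * dt), n_walkers)   # <dW> = 0, <dW^2> = 2 D dt
    x += v * dt + dW                        # Ito increment dx = F dt + dW

t = n_steps * dt
print(x.mean(), v * t)      # mean: pure drift, unaffected by the noise
print(x.var(), 2 * D * t)   # variance: diffusive growth 2 D t
```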
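Fokker-Planck equation
Therefore $p$ now obeys a continuity-diffusion equation, the Fokker-Planck equation (FPE):
$\partial_t p = -\partial_x \big[F(x)\, p\big] + D\, \partial_x^2 p$.
The stationary (equilibrium) distribution follows from the zero-flux condition $F p = D\, \partial_x p$:
$p_{\text{eq}}(x) = \frac{e^{-V(x)/D}}{Z}$,
where $F(x) = -V'(x)$ and $Z = \int e^{-V(x)/D}\, dx$ is the partition function.
Each of the PDFs discussed so far can thus be linked to a stochastic dynamical process, each with its own potential $V(x)$ and diffusion coefficient $D$ (temperature)!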
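Here is a sketch of the equilibrium result. It assumes NumPy and an illustrative double-well potential $V(x) = x^4/4 - x^2/2$. Many independent Langevin walkers (Euler-Maruyama) are run until they relax. Their $\langle x^2\rangle$ is then compared with the average under $e^{-V/D}/Z$, computed by quadrature.

```python
import numpy as np

V = lambda x: x**4 / 4 - x**2 / 2          # illustrative double-well potential
F = lambda x: -(x**3 - x)                  # F = -V'(x)
D, dt, n_steps = 0.5, 0.01, 5_000          # D plays the role of temperature
rng = np.random.default_rng(6)

x = np.zeros(10_000)                       # independent walkers, all starting at x = 0
for _ in range(n_steps):
    x += F(x) * dt + rng.normal(0.0, np.sqrt(2 * D * dt), x.size)

# Equilibrium prediction p_eq = exp(-V/D)/Z, evaluated on a fine grid
grid = np.linspace(-4.0, 4.0, 8001)
w = np.exp(-V(grid) / D)
x2_eq = np.sum(grid**2 * w) / np.sum(w)    # <x^2> under p_eq (Z cancels)

print(np.mean(x**2), x2_eq)
```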
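Thermodynamic Partition Function
For a thermodynamic system, $x$ is a $6N$-dimensional vector (positions and momenta of $N$ particles), with $N$ of the order of Avogadro's number! Computing the partition function $Z = \int e^{-V(x)/k_B T}\, d^{6N}x$ is generally impossible except for Gaussian ensembles, i.e. a quadratic potential $V(x) = \frac{1}{2} x^{\mathsf T} K x$:
$Z = \frac{(2\pi k_B T)^{3N}}{\sqrt{\det K}}$.
Main analytical technique: saddle point. Expand $V$ around its local minima $x_m$ and perform the Gaussian integrals:
$Z \simeq \sum_m e^{-V(x_m)/k_B T}\, \frac{(2\pi k_B T)^{3N}}{\sqrt{\det H_m}}$, with $H_m$ the Hessian of $V$ at $x_m$.
Main numerical technique: Metropolis Monte Carlo. Perform a random walk in $x$-space. If a move lowers $V(x)$, it is accepted. If it raises $V(x)$, it is accepted with probability $\exp(-\Delta V/k_B T)$, $\Delta V > 0$. Another option is stochastic particle dynamics.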
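Here is a minimal random-walk Metropolis sketch in one dimension, assuming NumPy. `metropolis` is a hypothetical helper, and the harmonic test potential, step size and burn-in are illustrative choices. For $V = x^2/2$ at $k_B T = 1$, the chain should reproduce a Gaussian with zero mean and unit variance.

```python
import numpy as np

def metropolis(V, x0, kT, step, n_steps, rng):
    """Random-walk Metropolis sampling of p(x) proportional to exp(-V(x)/kT)."""
    x, vx = x0, V(x0)
    chain = np.empty(n_steps)
    n_acc = 0
    for n in range(n_steps):
        x_new = x + rng.uniform(-step, step)        # symmetric random-walk move
        v_new = V(x_new)
        dV = v_new - vx
        # lowering V: always accept; raising V: accept with probability exp(-dV/kT)
        if dV <= 0 or rng.random() < np.exp(-dV / kT):
            x, vx = x_new, v_new
            n_acc += 1
        chain[n] = x                                # a rejected move repeats x
    return chain, n_acc / n_steps

rng = np.random.default_rng(7)
harmonic = lambda x: 0.5 * x**2                     # Gaussian ensemble, stiffness 1
chain, acc = metropolis(harmonic, 0.0, 1.0, 2.0, 100_000, rng)
burned = chain[1_000:]                              # discard a short burn-in
print(burned.mean(), burned.var(), acc)             # mean ~ 0, variance ~ kT = 1
```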
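Molecular Dynamics
Hamiltonian equations of motion: $\dot x = \partial H/\partial p$, $\dot p = -\partial H/\partial x$.
Energy = Hamiltonian function: $E = H(x, p) = \frac{p^2}{2m} + V(x)$.
Each closed orbit in phase space $(x, p)$ corresponds to a different value of $E$.
Microcanonical: $\rho(x, p) = \frac{\delta\big(H(x, p) - E\big)}{\Omega(E)}$, with $\Omega(E) = \int \delta\big(H(x, p) - E\big)\, dx\, dp$.
Canonical: $\rho(x, p) = \frac{e^{-H(x, p)/k_B T}}{Z}$.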
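Here is a sketch of deterministic MD. It assumes NumPy and the velocity Verlet integrator, a common symplectic choice that the lecture does not prescribe; `velocity_verlet` is a hypothetical helper. For a harmonic oscillator the energy stays at its initial value up to a small bounded error. In other words, the trajectory remains on a single closed orbit of the microcanonical shell.

```python
import numpy as np

def velocity_verlet(force, x, p, m, dt, n_steps):
    """Integrate Hamilton's equations xdot = p/m, pdot = force(x), force = -dV/dx."""
    xs, ps = np.empty(n_steps + 1), np.empty(n_steps + 1)
    xs[0], ps[0] = x, p
    f = force(x)
    for n in range(n_steps):
        p_half = p + 0.5 * dt * f      # half kick
        x = x + dt * p_half / m        # drift
        f = force(x)
        p = p_half + 0.5 * dt * f      # half kick
        xs[n + 1], ps[n + 1] = x, p
    return xs, ps

m, k = 1.0, 1.0                         # illustrative mass and spring constant
xs, ps = velocity_verlet(lambda x: -k * x, 1.0, 0.0, m, 0.01, 10_000)
E = ps**2 / (2 * m) + 0.5 * k * xs**2   # H(x, p) along the orbit
print(E.min(), E.max())                 # stays very close to E = 0.5
```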
Sampling PDFs with Molecular Dynamics
If the system is ergodic, a single trajectory of span $T$ is equivalent to $M$ replicas of span $T/M$. Deterministic particle dynamics is not effective at visiting large portions of phase space, so many initial conditions may be needed…
Sampling PDFs with Langevin Dynamics
If the system is ergodic, a single trajectory of span $T$ is equivalent to $M$ replicas of span $T/M$. Stochastic Langevin dynamics is more effective at visiting larger portions of phase space.
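Here is a sketch of (underdamped) Langevin dynamics. It assumes the standard form $\dot x = p/m$, $\dot p = -V'(x) - \gamma p + \xi(t)$ with $\langle \xi(t)\xi(t')\rangle = 2\gamma m k_B T\,\delta(t - t')$. It also assumes NumPy, a harmonic $V = kx^2/2$, and a simple semi-implicit Euler discretization; production codes typically use splitting schemes such as BAOAB. Replicas started from the same phase point spread over phase space. They reproduce the canonical variances $\langle x^2\rangle = k_B T/k$ and $\langle p^2\rangle = m k_B T$.

```python
import numpy as np

m, k, gamma, kT = 1.0, 1.0, 1.0, 0.5      # illustrative mass, stiffness, friction, k_B T
dt, n_steps, n_rep = 0.01, 4_000, 5_000   # time step, steps (span T = 40), replicas
rng = np.random.default_rng(8)

x = np.full(n_rep, 1.0)                   # all replicas start at the same phase point
p = np.zeros(n_rep)
for _ in range(n_steps):
    noise = rng.normal(0.0, np.sqrt(2 * gamma * m * kT * dt), n_rep)
    p += (-k * x - gamma * p) * dt + noise   # dp = (-V'(x) - gamma p) dt + noise
    x += p / m * dt                          # dx = (p/m) dt, using the updated p

print(np.var(x), kT / k)    # canonical prediction <x^2> = k_B T / k
print(np.var(p), m * kT)    # canonical prediction <p^2> = m k_B T
```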
Which one is best suited to what? PDF versus SDE
We are presented with a dual, equivalent representation of statistical phenomena: statistical PDFs versus stochastic particle dynamics (SPD), i.e. stochastic differential equations (SDE). Which one is best suited to what?
SPD is more direct: it provides the detailed dynamics leading to the statistical distributions. However, it entails heavy statistical averaging, and capturing rare events can be excruciating.
PDFs directly give the noise-free statistical distribution. However, one must solve the FPE, which is generally less straightforward, especially as the number of dimensions increases. Beyond roughly 3 to 6 dimensions, SPD is the only option (Molecular Dynamics and/or Monte Carlo methods).
Appendix: A note on ergodicity
Ergodic systems visit their entire phase space $X$. The time $T_R$ spent in a given region $R \subset X$ is therefore proportional to the size of that region (its “measure” in mathematical language):
$\lim_{T\to\infty} T_R/T = \mu(R)/\mu(X)$.
Ergodicity is often assumed but rarely proven on rigorous grounds. A notable exception is the Sinai billiard.
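Broken Ergodicity
Ergodicity can be broken by the existence of additional invariants of motion besides energy. In the broken-ergodicity figure, the white region is dynamically inaccessible, because reaching it would imply a change in the invariant.
The ensemble average is then restricted to the accessible region. With $I(x)$ an additional invariant fixed at $I_0$:
$\langle A \rangle = \frac{\int A(x)\, \delta\big(H(x) - E\big)\, \delta\big(I(x) - I_0\big)\, dx}{\int \delta\big(H(x) - E\big)\, \delta\big(I(x) - I_0\big)\, dx}$.
Many complex systems show broken ergodicity: glasses, proteins… It is usually associated with highly corrugated free-energy landscapes. Historically, broken ergodicity was first observed in the famous Fermi-Pasta-Ulam-Tsingou problem (see pdf file).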
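Ergodic Theorem
Time average of a given observable $A(x)$: $\bar A = \lim_{T\to\infty} \frac{1}{T} \int_0^T A\big(x(t)\big)\, dt$.
Ensemble average: $\langle A \rangle = \int A(x)\, p(x)\, dx$, where the brackets denote the average over initial conditions.
Ergodic theorem: $\bar A = \langle A \rangle$.
First-order dynamical system: for $\dot x = F(x)$, the time-averaged PDF is $p(x) \propto 1/|F(x)|$, as given by the Dirac score.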
Assignments
1. Sample the Gaussian distribution with the first-order stochastic dynamics $\dot x = -kx + \xi(t)$ (Langevin equation).
2. Optional, for the enthusiast: compute the PDF generated by a harmonic oscillator. Perform an ensemble average over a set of initial conditions (positions and velocities) with the same energy $E$ (microcanonical ensemble). Hint: you must bin the $(x, p)$ phase space…
End of Lecture