Statistics
Large Systems
- Macroscopic systems involve large numbers of particles: microscopic determinism underlies macroscopic phenomena.
- The basis is in the mechanics, classical and quantum, of individual molecules; statistical thermodynamics provides the bridge between the two levels.
- Consider 1 g of He as an ideal gas: N = 1.5 × 10^23 atoms.
- Using only position and momentum, 3 + 3 = 6 coordinates per atom gives 9 × 10^23 variables in total, requiring about 4 × 10^9 PB of storage.
- Finding the total kinetic energy, K = (p_x^2 + p_y^2 + p_z^2)/2m, takes about 100 ops per collision; at 100 GFlops that is 9 × 10^14 s, so one set of collisions takes about 3 × 10^7 yr.
Ensemble
- Computing time averages for large systems is infeasible.
- Instead, imagine a large number of similar systems, prepared identically yet independent.
- This ensemble of systems can be used to derive theoretical properties of a single system.
Probability
- Probability is often stated before the fact: an a priori assertion, theoretical. Example: a 50% probability of heads on a coin toss.
- Probability can also reflect the statistics of many events: a 25% probability that 10 coins show exactly 5 heads, with fluctuations in which the heads fraction is not 50%.
- Probability can also be used after the fact to describe a measurement: an a posteriori assertion, experimental. Example: the fraction of coins that were heads in a series of samples.
Head Count
- Take a set of experimental trials:
  N = number of trials, n = number of values (bins),
  i = a specific trial (1 … N), j = a specific value (1 … n).
- Use 10 coins and 20 trials:

  trial  heads | trial  heads
    1      5   |  11      5
    2      8   |  12      1
    3      6   |  13      5
    4      5   |  14      5
    5      6   |  15      6
    6      6   |  16      6
    7      1   |  17      2
    8      5   |  18      4
    9      7   |  19      6
   10      4   |  20      6
Distribution
- Sorting trials by value forms a distribution; the distribution function f(x) counts the occurrences in each bin.
- The mean is a measure of the center of the distribution:
  mean (mathematical average): coin mean = 4.95
  median (midway value): coin median = 5
  mode (most frequent value): coin mode = 6
- [Figure: histogram of f(x) for x = 0 … 10, with f ranging from 0 to 7]
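The three centers quoted above can be checked directly from the trial table, here with a minimal sketch using Python's statistics module:

```python
import statistics

# Heads counted in 20 trials of 10 coin tosses (trial table above).
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]

mean = statistics.mean(heads)      # 4.95
median = statistics.median(heads)  # 5
mode = statistics.mode(heads)      # 6
```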
Probability Distribution
- The distribution function sums to the number of trials: Σ_x f(x) = N.
- A probability distribution p(x) = f(x)/N normalizes the distribution function; its sum is 1.
- The mean can be expressed in terms of the probability: ⟨x⟩ = Σ_x x p(x).
- [Figure: plot of P(x) for x = 0 … 10, with P ranging from 0 to 0.3]
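Normalizing the counts by N and recomputing the mean from the probabilities reproduces the same value, as a short sketch shows:

```python
from collections import Counter

# Normalize the coin-trial counts into a probability distribution.
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)

f = Counter(heads)                         # distribution function f(x)
p = {x: c / N for x, c in f.items()}       # p(x) = f(x) / N

total = sum(p.values())                    # 1.0: normalized
mean = sum(x * px for x, px in p.items())  # 4.95, same as before
```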
Subsample
- Subsamples of the data may differ in their central value.
- For the first five trials of the coin data: mean 6.0, median 6, mode 5 and 6 (not unique).
- Experimental probability depends on the sample; theoretical probability predicts the result for an infinitely large sample.
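The subsample statistics quoted above can be verified the same way, including the non-unique mode:

```python
import statistics

# Central measures of the first five trials only.
first_five = [5, 8, 6, 5, 6]

mean = statistics.mean(first_five)        # 6.0
median = statistics.median(first_five)    # 6
modes = statistics.multimode(first_five)  # [5, 6]: mode not unique
```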
Deviation
- Individual trials differ from the mean.
- The deviation is the difference of a trial from the mean, d_i = x_i − ⟨x⟩; the mean deviation is zero.
- The fluctuation is the mean of the squared deviations: ⟨d^2⟩ = ⟨(x − ⟨x⟩)^2⟩.
- The fluctuation is the variance, the standard deviation squared: ⟨d^2⟩ = σ^2.
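Both properties, the vanishing mean deviation and the variance as the mean squared deviation, can be checked on the coin data:

```python
# Deviations and variance of the coin data.
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
mean = sum(heads) / len(heads)

dev = [x - mean for x in heads]
mean_dev = sum(dev) / len(dev)                 # 0: mean deviation vanishes
variance = sum(d * d for d in dev) / len(dev)  # ~3.147
std = variance ** 0.5                          # standard deviation
```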
Correlation
- Events may not be random; they can be related to other events (with time measured by trial number).
- The correlation function measures the mean of the product of related deviations: C_τ = ⟨d_i d_(i+τ)⟩.
- The autocorrelation at zero separation is C_0.
- Different variables can also be correlated.
Independent Trials
- The autocorrelation within a sample at zero lag is the variance: coin experiment C_0 = 3.147.
- The nearest-neighbor correlation tests for randomness: coin experiment C_1 = −0.345, much less than C_0; the ratio C_1/C_0 = −0.11.
- Periodic systems have C_τ peaking at some period τ.
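A plain mean-of-products autocorrelation can be computed from the coin data as below. Note this is one common convention; the slide's quoted C_1 may use a different normalization, so only C_0 and the smallness of C_1 relative to C_0 are checked here.

```python
# Autocorrelation of the coin data (mean-of-products form).
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
mean = sum(heads) / len(heads)
d = [x - mean for x in heads]

def autocorr(tau):
    """Mean product of deviations separated by tau trials."""
    pairs = [d[i] * d[i + tau] for i in range(len(d) - tau)]
    return sum(pairs) / len(pairs)

c0 = autocorr(0)  # the variance, ~3.147
c1 = autocorr(1)  # small compared with c0: trials look independent
```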
Correlation Measure
- For independent trials the correlation function should peak strongly at τ = 0: no connection to subsequent events and no periodic behavior.
- "This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations." (nist.gov)
Continuous Distribution
- Data that is continuously distributed is treated with an integral; the probability is still normalized to 1: ∫ p(x) dx = 1.
- The mean and variance are given as moments: the first moment is the mean, ⟨x⟩ = ∫ x p(x) dx, and the second moment about the mean is the variance, σ^2 = ∫ (x − ⟨x⟩)^2 p(x) dx.
- Correlation uses a time integral: C(τ) = ⟨d(t) d(t + τ)⟩, averaged over t.
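The integral moments can be evaluated numerically for any concrete p(x). The sketch below uses the exponential p(x) = e^(−x) on x ≥ 0 purely as an illustrative choice (it is not from the slides); its norm, mean, and variance are all 1.

```python
import math

# Moments of a continuous distribution by direct numerical
# integration (simple Riemann sum); p(x) = exp(-x) is illustrative.
dx = 1e-4
xs = [i * dx for i in range(int(50 / dx))]  # integrate 0..50

norm = sum(math.exp(-x) * dx for x in xs)                   # ~1
mean = sum(x * math.exp(-x) * dx for x in xs)               # ~1
var = sum((x - mean) ** 2 * math.exp(-x) * dx for x in xs)  # ~1
```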
Joint Probability
- The probabilities of two systems may be related.
- The intersection A ∩ B indicates that both conditions are true; for independent probabilities, P(A ∩ B) = P(A)P(B).
- The union A ∪ B indicates that either condition is true: P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which reduces to P(A) + P(B) if A and B are exclusive.
- [Venn diagram: sets A and B with C = A ∩ B]
Joint Tosses
- Define two classes from the coin-toss experiment: A = {x < 5}, B = {2 < x < 8}.
- Individual probabilities are a union of discrete bins: P(A) = 0.25, P(B) = 0.80, and P(A ∪ B) = 0.95.
- Dependent sets don't follow the product rule: P(A ∩ B) = 0.10 ≠ P(A)P(B) = 0.20.

   x   P(x)
   0   0
   1   0.10
   2   0.05
   3   0
   4   0.10
   5   0.30
   6   0.35
   7   0.05
   8   0.05
   9   0
  10   0
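The union and intersection probabilities quoted above follow directly from the P(x) table, as a short sketch confirms:

```python
# Union and intersection probabilities from the P(x) table above.
P = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
     6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}

A = {x for x in P if x < 5}      # A = {x < 5}
B = {x for x in P if 2 < x < 8}  # B = {2 < x < 8}

def prob(s):
    return sum(P[x] for x in s)

pA, pB = prob(A), prob(B)        # 0.25, 0.80
p_and = prob(A & B)              # 0.10 != pA * pB: dependent sets
p_or = prob(A | B)               # 0.95 = pA + pB - p_and
```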
Conditional Probability
- The probability of an occurrence within a subset is a conditional probability, the probability with respect to that subset: P(A | B) = P(A ∩ B) / P(B).
- Using the same subsets as the coin-toss example: P(A | B) = 0.10 / 0.80 = 0.125 ≈ 0.13.
- [Venn diagram: the conditional probability restricts attention to B]
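The conditional probability works out the same way from the table:

```python
# Conditional probability from the coin-toss distribution.
P = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
     6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}

p_AB = sum(P[x] for x in P if x < 5 and 2 < x < 8)  # P(A ∩ B) = 0.10
p_B = sum(P[x] for x in P if 2 < x < 8)             # P(B) = 0.80
p_cond = p_AB / p_B                                 # 0.125
```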
Combinatorics
- The probability that n specific occurrences happen is the product of the individual probabilities, p^n; other events don't matter.
- Requiring that no events happen except the specific ones brings in a separate probability for the negative events: p^n (1 − p)^(N − n).
- An arbitrary choice of events requires permutations: the number of ways to select n arbitrary events from a pool of N identical types is C(N, n) = N! / [n! (N − n)!].
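The two steps, a specific arrangement and then the count of arrangements, can be sketched for 10 fair coin tosses:

```python
import math

# Probability of n specific successes, then of any n successes,
# for N = 10 fair coin tosses (p = 0.5).
N, n, p = 10, 5, 0.5

p_specific = p**n * (1 - p)**(N - n)  # one particular arrangement
ways = math.comb(N, n)                # 252 ways to choose which n
p_any = ways * p_specific             # 252/1024 ~ 0.246
```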
Binomial Distribution
- Treat events as a Bernoulli process with discrete trials: N separate trials, trials independent, a binary outcome for each trial, and the same probability p for all trials.
- The general form is the binomial distribution: P_N(n) = C(N, n) p^n (1 − p)^(N − n).
- The terms are the same as in the binomial expansion of (p + q)^N with q = 1 − p, so the probabilities are normalized. (mathworld.wolfram.com)
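Normalization, and the 25% figure for 5 heads out of 10 quoted earlier, can both be verified from the binomial form:

```python
import math

def binom_pmf(n, N, p):
    """Binomial probability of n successes in N trials."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 10, 0.5
pmf = [binom_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)   # 1.0: normalized, as in (p + q)**N with q = 1 - p
p5 = pmf[5]        # 252/1024 ~ 0.246: 5 heads in 10 fair tosses
```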
Mean and Standard Deviation
- The mean of the binomial distribution is ⟨n⟩ = Np.
- To derive it, consider an arbitrary x in Σ_n C(N, n) (px)^n q^(N−n) = (px + q)^N, differentiate with respect to x, and set x = 1.
- The standard deviation of the binomial distribution is σ = √(Np(1 − p)), i.e. σ^2 = Npq.
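Both results can be confirmed numerically by summing over the distribution:

```python
import math

# Numerical check of <n> = N p and sigma^2 = N p (1 - p).
N, p = 10, 0.5
pmf = [math.comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]

mean = sum(n * P for n, P in enumerate(pmf))               # N p = 5.0
var = sum((n - mean) ** 2 * P for n, P in enumerate(pmf))  # N p q = 2.5
```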
Poisson Distribution
- Many processes are marked by rare occurrences: large N, small n, small p.
- In this limit the binomial becomes the Poisson distribution: P(n) = μ^n e^(−μ) / n!, where the probability depends on only one parameter, μ = Np.
- It is normalized when summed from n = 0 to ∞.
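The rare-event limit can be seen numerically by comparing the two distributions; the values N = 1000 and p = 0.005 below are illustrative choices, not from the slides:

```python
import math

# Binomial vs Poisson in the rare-event limit (large N, small p).
N, p = 1000, 0.005
mu = N * p  # 5.0

def poisson(n):
    return mu**n * math.exp(-mu) / math.factorial(n)

def binom(n):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

# For small n the two distributions agree closely.
diffs = [abs(poisson(n) - binom(n)) for n in range(15)]
```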
Poisson Properties
- The mean and standard deviation are simply related: the mean is μ = Np and the variance is σ^2 = μ.
- Unlike the binomial distribution, the Poisson function has values for n > N.
Poisson Away From Zero
- The Poisson distribution is based on the mean μ = Np, assuming N >> 1 and N >> n.
- Now assume in addition that n >> 1 and that P(n) is appreciably nonzero only over a narrow range.
- Letting x = n − μ and using Stirling's formula, this generates a normal, or Gaussian, distribution: P(x) ≈ exp(−x^2 / 2μ) / √(2πμ).
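The approach to the Gaussian form can be checked numerically for a large mean; μ = 50 below is an illustrative choice:

```python
import math

# Poisson vs Gaussian near a large mean (mu = 50, illustrative).
mu = 50

def poisson(n):
    return mu**n * math.exp(-mu) / math.factorial(n)

def gauss(n):
    x = n - mu
    return math.exp(-x * x / (2 * mu)) / math.sqrt(2 * math.pi * mu)

# Near the mean the Poisson values approach the Gaussian form.
diffs = [abs(poisson(n) - gauss(n)) for n in range(40, 61)]
```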
Normal Distribution
- The full normal distribution separates the mean and standard-deviation parameters: P(x) = exp(−(x − μ)^2 / 2σ^2) / √(2πσ^2).
- Tables provide the integral of the distribution function.
- Useful benchmarks:
  P(|x − μ| < 1σ) = 0.683
  P(|x − μ| < 2σ) = 0.954
  P(|x − μ| < 3σ) = 0.997
- [Figure: the normal curve P(x), centered on the mean]
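The tabulated benchmarks follow from the error function, since P(|x − μ| < kσ) = erf(k/√2) for a normal distribution:

```python
import math

# Integral benchmarks for the normal distribution.
def within(k):
    """P(|x - mu| < k sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

# within(1) ~ 0.683, within(2) ~ 0.954, within(3) ~ 0.997
```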