The Mathematics for Chemists (I) (Fall Term, 2004) (Fall Term, 2005) (Fall Term, 2006) Department of Chemistry National Sun Yat-sen University 化學數學(一)



2 Chapter 6 Data Processing and Analysis. Contents (covered in Chapter 21 of the text): Significant Figures; Average, Variance, Deviation; Probability Distribution; Correlation Coefficient; Parameter Fitting; Examples.

3 Assignment P.504: 17, 19

4 Measurement (Experiment) and Deviation (a measure of error). Two sets of measurements: (a) the volume of the liquid is reported as 1.23 ± 0.01 mL and 1.17 ± 0.01 mL; (b) the mass (the reading on the balance scale) is reported as 1.778 ± 0.001 g and 1.781 ± 0.001 g. The precision of the balance reading is greater than that of the volume reading.

6 The number of significant figures that result when adding or subtracting: the result is rounded to the same number of decimal places as the term with the fewest decimal places (e.g. 1.23 + 45.6 = 46.8).

7 The number of significant figures that result when multiplying or dividing: the result keeps as many significant figures as the factor with the fewest significant figures (e.g. 1.23 × 1.1 = 1.4).
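The two rounding rules above can be sketched in a few lines of Python; the helper `round_sig` is not a standard-library function but a small illustration written for this example.

```python
from math import floor, log10

def round_sig(x, n):
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    return round(x, -int(floor(log10(abs(x)))) + (n - 1))

# Addition/subtraction: keep the fewest DECIMAL PLACES of the inputs.
# 1.23 (2 dp) + 45.6 (1 dp) = 46.83 -> report 46.8
total = round(1.23 + 45.6, 1)

# Multiplication/division: keep the fewest SIGNIFICANT FIGURES.
# 1.23 (3 sf) * 1.1 (2 sf) = 1.353 -> report 1.4
product = round_sig(1.23 * 1.1, 2)
```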

8 Accuracy and Precision. Accuracy refers to the overall measurement: a result is accurate when it is free of systematic errors, otherwise inaccurate. Precision refers to the reproducibility of single measurements: results are precise when they are free of random errors, otherwise imprecise. Example (real mass = 7.89 g): the set 7.84 g, 7.85 g, 7.83 g has higher precision; the set 7.83 g, 7.92 g, 7.93 g has higher accuracy.

9 Frequency distribution. Experimental results (50 readings):
4 6 7 3 6 3 5 7 6 6
3 4 4 5 8 6 5 2 6 5
7 7 4 4 3 0 8 5 4 4
6 2 8 2 7 6 5 5 5 4
9 5 2 2 9 3 6 5 7 4
The same results can be presented in tabular form or as a bar chart of frequency n against result x.

10 Random errors. Experimental results:
40.6 44.9 47.1 39.5 45.3 38.9 42.9 47.0 45.0 44.2
39.3 40.7 48.4 43.1 48.9 44.9 43.2 37.1 45.3 42.7
47.5 46.5 40.9 40.5 38.9 33.3 49.1 43.7 41.3
45.3 36.9 49.3 37.3 47.2 44.3 42.9 43.4 43.1 41.1
51.1 43.3 37.9 36.9 53.2 39.3 45.7 42.7 47.1 46.8
(Figures: a frequency histogram of these values over the range 32 to 54, and the corresponding cumulative-frequency graph.)

11 How many experiments should we perform? Theoretically, N should be infinite in order to achieve perfect precision and accuracy. In practice, N should be a large number to achieve high accuracy. In "acceptable" practice, N is a small number (or just 1!) when the requirement of accuracy is not strict or the instrumental precision is limited.

12 Mean, mode and median. Arithmetic mean (average): the sum of the values divided by their number. Mode: the value of the variable that has the greatest frequency (the most popular, or most probable, value); a distribution may be unimodal, bimodal, ..., multimodal. Median: the value of the variable that divides the distribution into two equal halves. (Figures: the frequency bar chart of slide 9, annotated with these measures.)
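The three measures can be computed with the standard-library `statistics` module; the data here are the first row of the slide 9 results, used purely for illustration.

```python
import statistics

# First row of the frequency-distribution data from slide 9
data = [4, 6, 7, 3, 6, 3, 5, 7, 6, 6]

mean = statistics.mean(data)      # arithmetic average
mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
```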

13 Variance and standard deviation. Range (dispersion, spread): the interval between the smallest and largest values; the interquartile range lies between the lower and upper quartiles. The mean deviation is always zero: Σ(x_i − x̄) = 0. Variance (mean of squares of deviations): σ² = (1/N) Σ(x_i − x̄)². Standard deviation (root mean square deviation): σ = √[(1/N) Σ(x_i − x̄)²].
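A sketch of these definitions on the same sample data, checking in passing that the mean deviation really does vanish:

```python
import statistics

data = [4, 6, 7, 3, 6, 3, 5, 7, 6, 6]
mean = statistics.mean(data)

# The mean deviation is always zero: positive and negative deviations cancel
mean_dev = sum(x - mean for x in data) / len(data)

# Variance: mean of the squared deviations (population form, divide by N)
var = statistics.pvariance(data)

# Standard deviation: root mean square deviation
std = statistics.pstdev(data)
```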

14 High vs. Low Variance. These graphs illustrate the notion of variance: the distribution on the left is more dispersed than the one on the right, i.e. it has a higher variance. Same total frequencies, different relative frequencies; equal zeroth moments, different second moments. Variance ~ dispersion, spread.

15 Example

16 Calculating variance

17 Some less used measures of deviation. n-th (centered) moment: μ_n = (1/N) Σ(x_i − x̄)^n. Skewness (degree of asymmetry) is built from the third moment, commonly normalized as μ₃/σ³; kurtosis (degree of "peakedness") is built from the fourth, commonly μ₄/σ⁴. Zeroth moment ~ total probability (1); first moment ~ average deviation (0); second moment ~ variance; third moment ~ skewness; fourth moment ~ kurtosis.

18 Positive vs. Negative Skewness. These graphs illustrate the notion of skewness: both PDFs have the same expectation and variance; the one on the left is positively skewed, the one on the right negatively skewed. Same total frequencies, same variances, different skewnesses: equal zeroth, first and second moments, different third moments. Low vs. High Kurtosis: the PDF on the right has higher kurtosis than the PDF on the left; it is more peaked at the center and has fatter tails. Same total frequencies, same variances, same skewnesses, different kurtoses: equal zeroth through third moments, different fourth moments. Skewness ~ asymmetry; kurtosis ~ peakedness.

19 Centered and Uncentered Moments. n-th centered moment: μ_n = (1/N) Σ(x_i − x̄)^n. n-th uncentered moment: μ'_n = (1/N) Σ x_i^n.

20 Classroom Exercise: Calculate the zeroth through fourth moments of the following test results:
4 6 7 3 6 3 5 7 3 6

21 Classroom Exercise: Calculate the zeroth through fourth moments of the following test results:
4 6 7 3 6 3 5 7 3 6
n-th moment: μ_n = (1/N) Σ(x_i − x̄)^n.
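The exercise can be checked with a short script. The skewness/kurtosis normalizations at the end (dividing by powers of σ) are one common convention, added here as an assumption beyond the slide.

```python
data = [4, 6, 7, 3, 6, 3, 5, 7, 3, 6]
N = len(data)
mean = sum(data) / N   # = 5.0

def moment(n):
    """n-th centered moment: average of (x - mean)**n."""
    return sum((x - mean) ** n for x in data) / N

m0, m1, m2, m3, m4 = (moment(n) for n in range(5))
# m0 = 1 (total probability), m1 = 0 (mean deviation),
# m2 = variance, m3 relates to skewness, m4 to kurtosis

skewness = m3 / m2 ** 1.5   # common normalization (assumed)
kurtosis = m4 / m2 ** 2     # common normalization (assumed)
```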

22 Probability distributions The binomial distribution (discrete) The Boltzmann distribution (discrete) The uniform distribution (continuous) The Gaussian distribution (continuous)

23 Expectation value (mean): ⟨x⟩ = Σ x P(x). Coin-tossing results:
N        n(H)     f(H)
256      118      0.461
4040     2068     0.512
12000    6062     0.505
24000    11942    0.498

24 The binomial distribution (Bernoulli distribution). The probability of "head up" for each toss is 1/2. The probability of m heads in N tosses = (probability of "head up")^m × (probability of "tail up")^(N−m) × the number of ways of choosing which tosses are heads: P(m) = C(N,m) (1/2)^N. More generally, for two exclusive events with probabilities p and q = 1 − p: P(m) = C(N,m) p^m q^(N−m). (Coin-tossing data as on slide 23.)
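A minimal sketch of the binomial probability, using `math.comb` for C(N,m):

```python
from math import comb

def binom_pmf(m, N, p=0.5):
    """Probability of m 'head up' outcomes in N tosses: C(N,m) p^m q^(N-m)."""
    return comb(N, m) * p ** m * (1 - p) ** (N - m)

# e.g. probability of exactly 2 heads in 4 tosses: C(4,2)/2^4 = 6/16
p2 = binom_pmf(2, 4)

# The probabilities over all possible m sum to 1
total = sum(binom_pmf(m, 4) for m in range(5))
```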

25 The distribution of molecular states. (Figure: particles distributed over a ladder of energy levels E.) These particles might be distinguishable. Distribution = population pattern.

26 Enormous possibilities! (Figure: many different ways of distributing the same particles over the energy levels.)

27 Distinguishable particles. (Figure: the particles placed in the energy levels, now labeled individually.)

28 Principle of equal a priori probabilities All possibilities for the distribution of energy are equally probable. An assumption and a good assumption.

29 They are equally probable. (Figure: several different arrangements of the particles over the levels.)

30 They are equally probable. (Figure: further arrangements of the particles over the levels.)

31 Configuration and weights. A configuration such as {5,0,0,...} lists the numbers of particles in the states.

32 Another configuration: {3,2,0,...}.

33 One configuration may have a large number of instantaneous configurations.

34 {N−2,2,0,...}: how many instantaneous configurations? N(N−1)/2.

35 For the configuration {3,4,5,6} of 18 particles, the number of instantaneous configurations is 18!/(3! 4! 5! 6!).

36 Configuration and weights. W is huge! For 20 particles with {1,0,3,5,10,1}: W = 20!/(1! 0! 3! 5! 10! 1!) ≈ 9.31×10^8. How about 10000 particles with {2000,3000,4000,1000}?
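The weight W = N!/(n₀! n₁! ...) is easy to evaluate exactly with integer arithmetic; the 20-particle example above comes out to about 9.31×10^8.

```python
from math import factorial

def weight(config):
    """W = N! / (n0! n1! ...) for a configuration {n_i}."""
    N = sum(config)
    w = factorial(N)
    for n in config:
        w //= factorial(n)   # each partial quotient is still an integer
    return w

# The slide's 20-particle example {1,0,3,5,10,1}
W = weight([1, 0, 3, 5, 10, 1])   # about 9.31e8
```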

37 Stirling's approximation: ln n! ≈ n ln n − n.
n       ln n!      Eq. (21.25)   n ln n − n   ln n! − (n ln n − n)
10      15.104     15.096        13.026       2.078
52      156.361    156.359       153.465      2.896
10^10   2.2×10^11                             12.7
10^23   5.2×10^24                             27.4
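The n = 10 row of the table can be reproduced with `math.lgamma` (which gives ln n! exactly as lgamma(n+1)); "Eq. (21.25)" is taken here to be the fuller form n ln n − n + ½ln(2πn), which matches the tabulated 15.096.

```python
from math import lgamma, log, pi

def ln_factorial(n):
    return lgamma(n + 1)                # exact ln n!

def stirling_simple(n):
    return n * log(n) - n               # n ln n - n

def stirling_full(n):
    # assumed form of Eq. (21.25)
    return n * log(n) - n + 0.5 * log(2 * pi * n)

# Reproduce the n = 10 row of the table
exact = ln_factorial(10)        # 15.104
full = stirling_full(10)        # 15.096
simple = stirling_simple(10)    # 13.026
```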

38 There is an overwhelming configuration: the weight W is sharply peaked at the most probable configuration {n_i}_max, where W = W_max.

39 The Boltzmann distribution. Which distribution is most probable? We may use the Lagrange multiplier method to find the extreme value of W subject to fixed particle number and total energy. The result is the Boltzmann distribution, n_i/N = e^(−βE_i)/Σ_j e^(−βE_j), and it can be found from thermodynamics that β = 1/kT.
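A sketch of the Boltzmann populations; the evenly spaced energy levels and kT = 1 are hypothetical values chosen only to illustrate that lower levels are more populated.

```python
from math import exp

def boltzmann_populations(energies, kT):
    """p_i = exp(-E_i/kT) / sum_j exp(-E_j/kT)."""
    weights = [exp(-E / kT) for E in energies]
    Z = sum(weights)                    # the partition function
    return [w / Z for w in weights]

# Hypothetical evenly spaced levels, kT = 1 (arbitrary units)
p = boltzmann_populations([0.0, 1.0, 2.0, 3.0], kT=1.0)
```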

40 The uniform distribution: ρ(x) = 1/(b − a) for a ≤ x ≤ b, and ρ(x) = 0 otherwise.

41 Radial distribution functions. Radial density function: P(r) = r² R(r)², where R(r) is the radial wavefunction.

42 Radial distribution functions of the hydrogenic atoms

43 Look too familiar?

44 The Gaussian distribution (the normal distribution): ρ(x) = [1/(σ√(2π))] exp[−(x − μ)²/(2σ²)]. (Figure: curves for σ = 0.5, 1.0 and 2.5, centered at x = μ.)

45 (Figure: the Gaussian distribution from μ − 3σ to μ + 3σ; about 68% of the area lies within μ ± σ, and about 16% lies beyond μ + σ.)
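The 68% and 16% figures follow from the error function: the fraction of a Gaussian within μ ± σ is erf(1/√2).

```python
from math import erf, sqrt, pi, exp

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: (1/(sigma*sqrt(2*pi))) * exp(-(x-mu)^2/(2*sigma^2))."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Fraction of the distribution within mu +/- sigma, via the error function
within_1sigma = erf(1 / sqrt(2))          # about 0.68
# Fraction beyond mu + sigma (one tail)
upper_tail = (1 - within_1sigma) / 2      # about 0.16
```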

46 Multiple variables. Covariance: σ_xy = ⟨(x − ⟨x⟩)(y − ⟨y⟩)⟩; for independent variables the covariance vanishes. Correlation coefficient: ρ = σ_xy/(σ_x σ_y), positive for positive correlation and negative for negative correlation.

47 Examples. (Figure: two scatter plots of y against x, one positively and one negatively correlated.)

48 Examples. (Figure: scatter plots of course grades, organic chemistry against inorganic chemistry and organic chemistry against math.)

49 Examples. (Figure: scatter plots of course grades, math against physical chemistry and math against inorganic chemistry.)

50 Example. Find the correlation of the following two parameters:
a: 1.2 2.1 3.3 4.1 5.5 6.2 7.2 8.3 8.9 10.1 11.1 11.6
b: 4.4 4.9 6.4 7.3 8.8 10.3 11.7 13.2 14.8 15.3 16.5 17.2
Correlation coefficient: r = σ_ab/(σ_a σ_b).
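The example can be worked through directly from the covariance definition; the two series are nearly linear in each other, so r comes out close to +1.

```python
# Data from the slide
a = [1.2, 2.1, 3.3, 4.1, 5.5, 6.2, 7.2, 8.3, 8.9, 10.1, 11.1, 11.6]
b = [4.4, 4.9, 6.4, 7.3, 8.8, 10.3, 11.7, 13.2, 14.8, 15.3, 16.5, 17.2]

def covariance(x, y):
    """sigma_xy = mean of (x - <x>)(y - <y>)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def correlation(x, y):
    """r = sigma_xy / (sigma_x * sigma_y)."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

r = correlation(a, b)   # close to +1: strong positive correlation
```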

51 Classroom Exercise. Find the correlation of the following two parameters:
x: 1 2 3 4 5 6 7 8 9 10 11 12
y: 4.4 4.9 6.4 7.3 8.8 10.3 11.7 13.2 14.8 15.3 16.5 17.2
Correlation coefficient: r = σ_xy/(σ_x σ_y).

52 Regression (fitting) How to judge the quality of a fit?

53 Simple least-squares fitting. (Figure: data points at x_1, ..., x_n with residuals ε_i measured from the fitted curve y(x).) Minimize the quantity Σ_i ε_i² = Σ_i [y_i − y(x_i)]².

54 The straight-line fit y = a + bx. Which line is the best fit? Least squares gives b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b x̄.

55 Example. Find the linear least-squares fit for the following data set:
x: 0 3 6 9 12 15 18
y: 3.3 2.5 2.3 1.7 1.4 0.5 0.2
σ = 0.25
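A sketch of the fit applied to the example's data, using the standard least-squares formulas for slope and intercept:

```python
# Data from the slide
x = [0, 3, 6, 9, 12, 15, 18]
y = [3.3, 2.5, 2.3, 1.7, 1.4, 0.5, 0.2]

def straight_line_fit(x, y):
    """Least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

a, b = straight_line_fit(x, y)   # slope is negative: y decreases with x
```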

56 Classroom Exercise. Find the straight-line fit for the following data points:
x: 1 2 3 4 5 6 7 8 9 10 11 12
y: 4.4 4.9 6.4 7.3 8.8 10.3 11.7 13.2 14.8 15.3 16.5 17.2
σ = 0.14

57 Chi-square fitting. Minimize χ² = Σ_i [y_i − f(x_i)]²/σ_i². Justification: points with smaller uncertainty σ_i are weighted more heavily, and for Gaussian errors minimizing χ² maximizes the likelihood of the data.
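A sketch of χ² and of the weighted straight-line fit that minimizes it (the closed-form expressions below are the standard weighted least-squares formulas, not taken from the slide):

```python
def chi_square(x, y, sigma, f):
    """chi^2 = sum_i [(y_i - f(x_i)) / sigma_i]^2."""
    return sum(((yi - f(xi)) / si) ** 2 for xi, yi, si in zip(x, y, sigma))

def weighted_line_fit(x, y, sigma):
    """Straight line y = a + b*x minimizing chi^2, weights w_i = 1/sigma_i^2."""
    w = [1 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    b = (S * Sxy - Sx * Sy) / delta
    a = (Sxx * Sy - Sx * Sxy) / delta
    return a, b

# With exact data y = 1 + 2x the fit recovers a = 1, b = 2
a, b = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 7], [0.1, 0.1, 0.1, 0.1])
```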

58 The straight-line fit by minimizing χ².

59 Example: Fitting an Exponential Function s(t).
t: 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
s: 3.2 2.3 1.6 1.4 1.3 1.2 1.1 1.05 1.03
σ: 0.52 0.25 0.12 0.14 0.11 0.08 0.03 0.02

60 Sample statistics. In practice the sample is necessarily finite, so the parent population or distribution can only be estimated from the statistics of the sample. The sample mean gives the best estimate of the population mean, but the variance computed with a 1/N factor underestimates the population variance by a factor of (N − 1)/N, which may be significant for small sample sizes. To correct this, introduce the sample variance s² = [1/(N − 1)] Σ(x_i − x̄)².
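The two conventions can be compared directly; the three-point sample below is hypothetical (the slide's own values are not reproduced in this transcript).

```python
import statistics

sample = [1.0, 2.0, 3.0]   # hypothetical small sample

N = len(sample)
mean = statistics.mean(sample)

# Biased estimate: divide by N (underestimates the parent variance)
var_N = sum((x - mean) ** 2 for x in sample) / N

# Corrected estimate: divide by N - 1 (same as statistics.variance)
var_N1 = sum((x - mean) ** 2 for x in sample) / (N - 1)
```

The two estimates differ exactly by the factor (N − 1)/N, here 2/3.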

61 Example. For a sample of three values, compute the mean and then the two versions of s² (dividing by N = 3 and by N − 1 = 2); the two estimates differ by the factor (N − 1)/N = 2/3.

62 Error Analysis. How close is your experimental value to the real value? t-distribution (Student's distribution): t = (x̄ − μ)/(s/√N), with N − 1 degrees of freedom.

63 t-distribution (Student's distribution). Critical values of t by degrees of freedom (N − 1) and confidence factor:
N−1    0.9     0.95    0.99    0.999
1      6.31    12.71   63.66   636.62
2      2.92    4.30    9.93    31.60
3      2.35    3.18    5.84    12.92
4      2.13    2.78    4.60    8.61
5      2.02    2.57    4.03    6.87
6      1.94    2.45    3.71    5.96
7      1.89    2.37    3.50    5.41
8      1.86    2.31    3.36    5.04
9      1.83    2.26    3.25    4.78
10     1.81    2.23    3.17    4.59
11     1.80    2.20    3.11    4.44
12     1.78    2.18    3.06    4.32
13     1.77    2.16    3.01    4.22
14     1.76    2.14    2.98    4.14
15     1.75    2.13    2.95    4.07
16     1.75    2.12    2.92    4.02
17     1.74    2.11    2.90    3.97
18     1.73    2.10    2.88    3.92
19     1.73    2.09    2.86    3.88
20     1.72    2.09    2.85    3.85

64 t-distribution (Student's distribution), continued:
21     1.72    2.08    2.83    3.82
22     1.72    2.07    2.82    3.79
23     1.71    2.07    2.82    3.77
24     1.71    2.06    2.80    3.75
25     1.71    2.06    2.79    3.73
26     1.71    2.06    2.78    3.71
27     1.70    2.05    2.77    3.69
28     1.70    2.05    2.76    3.67
29     1.70    2.05    2.76    3.66
30     1.70    2.04    2.75    3.65
40     1.68    2.02    2.70    3.55
60     1.67    2.00    2.66    3.46
120    1.66    1.98    2.62    3.37
∞      1.65    1.96    2.58    3.29

65 Error Bar: An Example. The measured values of a certain quantity in 10 repeated experiments are:
40.6 44.9 47.1 39.5 45.3 38.9 42.9 47.0 45.0 44.2
The mean is x̄ = 43.54 and the sample standard deviation is s ≈ 2.96. Looking up the t-table (N − 1 = 9) with 95% confidence (t = 2.26), the error is in the range ± t·s/√N ≈ ±2.12, i.e. the error bar should be centered at 43.54 with a half-length of about 2.1. It must be emphasized that each data point has its own error bar (although in many cases only typical or representative error bars are given). In other cases, error bars simply indicate variances.
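A sketch recomputing the mean, sample standard deviation, and the 95% confidence half-width directly from the listed data, with t* = 2.26 taken from the table (N − 1 = 9):

```python
data = [40.6, 44.9, 47.1, 39.5, 45.3, 38.9, 42.9, 47.0, 45.0, 44.2]
N = len(data)

mean = sum(data) / N                                       # 43.54
s = (sum((x - mean) ** 2 for x in data) / (N - 1)) ** 0.5  # sample std dev

t_star = 2.26                       # from the t-table, N-1 = 9, 95% confidence
half_width = t_star * s / N ** 0.5  # 95% confidence half-width, about 2.1
# The error bar is centered at `mean` and extends +/- half_width
```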

66 Hypothesis Test. There are many methods for testing a statistical hypothesis. The t-test: from the original claim (the null hypothesis) for μ, look up the critical value of t in the table, denoted t*. Make a series of N measurements, giving x̄ and s, and compute the range of t = (x̄ − μ)/(s/√N). If |t| > t*, the data support the originally claimed range of μ with a probability smaller than 5%, and the claim is rejected.

67 Hypothesis Test. A student read an article on the 13C NMR relaxation time of nanoparticles. The spectrum was a single peak. The authors reported a relaxation time of 2.45 ± 0.05 s with 95% confidence, i.e. μ in (2.45 − 0.05, 2.45 + 0.05) = (2.40, 2.50). Because the relaxation time is crucial for understanding this sample, the student made his own measurements under the same conditions and obtained:
2.25 2.30 2.35 2.32 2.34 2.22 2.28 2.19 2.23 2.27
Is the value reported in the article trustworthy? Here x̄ = 2.275, s = 0.06 and N = 10; looking up the t-table (N − 1 = 9) with 95% confidence gives t* = 2.26. Over the claimed range of μ, t = (x̄ − μ)/(s/√N) lies in the range (−6.81, −12.26), far below −2.26. It follows that the reported value of the 13C relaxation time is not trustworthy, because its probability is lower than 5%.
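The test can be reproduced from the student's data; the critical value t* = 2.26 is read from the table for 9 degrees of freedom at 95% confidence.

```python
data = [2.25, 2.30, 2.35, 2.32, 2.34, 2.22, 2.28, 2.19, 2.23, 2.27]
N = len(data)

mean = sum(data) / N                                       # 2.275
s = (sum((x - mean) ** 2 for x in data) / (N - 1)) ** 0.5  # sample std dev

mu_claimed = 2.45           # central relaxation time reported in the article
t = (mean - mu_claimed) / (s / N ** 0.5)

t_star = 2.26               # critical value, N-1 = 9, 95% confidence
reject = abs(t) > t_star    # True: the reported value is not supported
```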

68 Principal Components Analysis. Purpose: most problems in real life are complex, for example the factors affecting the economy of a region, the factors causing a certain disease, or the metabolism of a biological system. Approach 1: to gain a thorough and comprehensive understanding, we must simplify, purify, isolate, idealize, ... (the scientific, or physical, method). Approach 2: to gain a partial, more superficial understanding, we do not have to simplify, purify, isolate or idealize; instead we may take the intact sample and perform in vivo, in situ, real-time studies (the statistical method). The statistical method is often followed, or complemented, by the scientific method.

69 Principal Components Analysis. Procedure: (1) collect data from a set of samples; (2) form the covariance matrix S; (3) diagonalize S to obtain the diagonal matrix D, whose eigenvectors are the principal components (PCs); (4) compute the contribution factors; (5) compute the loading factors.
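Steps (1)–(4) can be sketched for the two-variable case, where the covariance matrix is 2×2 and can be diagonalized analytically; the data and the "contribution factor" interpretation (an eigenvalue's share of the total variance) are illustrative assumptions.

```python
from math import sqrt

def pca_2d(xs, ys):
    """PCA for two variables: build the 2x2 covariance matrix S,
    diagonalize it analytically, and return the eigenvalues (variances
    along the principal components) in descending order."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Eigenvalues of S = [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    d = sqrt(max(tr ** 2 / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + d, tr / 2 - d
    # Contribution factor of the first PC: its share of the total variance
    contribution = lam1 / (lam1 + lam2)
    return lam1, lam2, contribution

# Strongly correlated hypothetical data: the first PC dominates
lam1, lam2, c = pca_2d([1, 2, 3, 4], [2.1, 3.9, 6.0, 8.1])
```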

