The Mathematics for Chemists (I) [化學數學(一)], Department of Chemistry, National Sun Yat-sen University (Fall Term 2004, 2005, 2006)


Chapter 6: Data Processing and Analysis. Contents (covered in Chapter 21): Significant Figures; Average, Variance, and Deviation; Probability Distributions; Correlation Coefficient; Parameter Fitting; Examples.

Assignment: p. 504, problems 17 and 19.

Measurement (experiment) and deviation (a measure of error). Two sets of measurements are shown on the slide: (a) the volume of a liquid, reported with an uncertainty of ±0.01 mL, and (b) the mass read on a balance, reported in g. The precision of the balance reading is greater than that of the volume reading.


The number of significant figures that results when adding or subtracting is limited by the term with the fewest digits after the decimal point.

The number of significant figures that results when multiplying or dividing is limited by the factor with the fewest significant figures.

Accuracy and Precision. Accuracy refers to the overall measurement; precision refers to the repeatability of single measurements. Accurate (free of systematic errors) vs. inaccurate; precise (free of random errors) vs. imprecise. Example (real mass = 7.89 g): the set 7.84 g, 7.85 g, 7.83 g has the higher precision, while the set 7.83 g, 7.92 g, 7.93 g has the higher accuracy.

Frequency distribution of experimental results: result x versus frequency n, shown both in tabular form and as a bar chart.

Random errors in experimental results: a frequency histogram of value x versus frequency n, and the corresponding cumulative-frequency graph.

How many experiments should we perform? Theoretically, N should be infinite in order to achieve perfect precision and accuracy. In practice, N should be large to achieve high accuracy. In "acceptable" practice, N is a small number (or just 1!) when the accuracy requirement is not strict or the instrumental precision is limited.

Mean, mode and median. Arithmetic mean (average): $\bar{x}=\frac{1}{N}\sum_i x_i$, or from a frequency table $\bar{x}=\frac{1}{N}\sum_x n(x)\,x$. Mode: the value of the variable that has the greatest frequency (the most popular, or most probable, value); a distribution may be unimodal, bimodal, ..., multimodal. Median: the value of the variable that divides the distribution into two equal halves.

Variance and standard deviation. Range (dispersion, spread): the difference between the largest and smallest values; the lower quartile, upper quartile, and interquartile range describe the spread of the middle half of the data. The mean deviation is always zero: $\frac{1}{N}\sum_i (x_i-\bar{x})=0$. Variance (mean of the squares of the deviations): $\sigma^2=\frac{1}{N}\sum_i (x_i-\bar{x})^2$. Standard deviation (root-mean-square deviation): $\sigma=\sqrt{\frac{1}{N}\sum_i (x_i-\bar{x})^2}$.

High vs. low variance. These graphs illustrate the notion of variance: the distribution on the left is more dispersed than the one on the right, so it has the higher variance. The two have the same total frequencies but different relative frequencies, i.e. equal zeroth moments but different second moments. Variance ~ dispersion (spread).

Example

Calculating variance
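
To make the formulas above concrete, here is a minimal Python sketch that computes the mean, variance, and standard deviation from a frequency table; the (value, frequency) pairs are illustrative stand-ins, since the slide's example data were not preserved in the transcript.

```python
# Minimal sketch: mean, variance, and standard deviation from a frequency table.
# The (value, frequency) pairs below are made-up illustrative data.
data = {7.83: 3, 7.84: 5, 7.85: 4, 7.86: 2}   # value x -> frequency n

N = sum(data.values())
mean = sum(x * n for x, n in data.items()) / N
variance = sum(n * (x - mean) ** 2 for x, n in data.items()) / N   # divide by N
std_dev = variance ** 0.5

print(f"N = {N}, mean = {mean:.4f}, variance = {variance:.6f}, std dev = {std_dev:.4f}")
```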

Some less-used measures of deviation. The $n$-th centered moment is $m_n=\frac{1}{N}\sum_i (x_i-\bar{x})^n$. Skewness (degree of asymmetry) is commonly defined as $\gamma_1=m_3/\sigma^3$; kurtosis (degree of "peakedness") as $\gamma_2=m_4/\sigma^4$. The zeroth moment is the total probability (1), the first moment is the average deviation (0), the second moment is the variance, the third moment gives the skewness, and the fourth moment gives the kurtosis.

Positive vs. negative skewness. These graphs illustrate the notion of skewness: both PDFs have the same expectation and variance, but the one on the left is positively skewed and the one on the right negatively skewed. Same total frequencies and variances, different skewnesses: equal zeroth, first, and second moments, different third moments. Low vs. high kurtosis. These graphs illustrate the notion of kurtosis: the PDF on the right has higher kurtosis than the one on the left, being more peaked at the center and having fatter tails. Equal zeroth, first, second, and third moments, different fourth moments. Skewness ~ asymmetry; kurtosis ~ peakedness.

Centered and uncentered moments. The $n$-th centered moment is $m_n=\frac{1}{N}\sum_i (x_i-\bar{x})^n$; the $n$-th uncentered moment is $m_n'=\frac{1}{N}\sum_i x_i^n$.
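
A minimal Python sketch of these moment definitions, applied to an illustrative sample (the classroom-exercise data themselves were lost with the slide images):

```python
# Minimal sketch of centered vs. uncentered moments, skewness, and kurtosis.
# The sample values are illustrative, not the classroom-exercise data.
xs = [4.2, 4.5, 4.7, 4.7, 4.8, 5.0, 5.3]
N = len(xs)
mean = sum(xs) / N

def uncentered_moment(k):
    return sum(x ** k for x in xs) / N            # m'_k = <x^k>

def centered_moment(k):
    return sum((x - mean) ** k for x in xs) / N   # m_k = <(x - mean)^k>

variance = centered_moment(2)
sigma = variance ** 0.5
skewness = centered_moment(3) / sigma ** 3        # degree of asymmetry
kurtosis = centered_moment(4) / sigma ** 4        # degree of "peakedness"

print(centered_moment(0), centered_moment(1))     # 1.0 and ~0 by construction
print(variance, skewness, kurtosis)
```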

Classroom exercise: calculate the zeroth through fourth moments of the experimental results tabulated on the slide.


Probability distributions: the binomial distribution (discrete), the Boltzmann distribution (discrete), the uniform distribution (continuous), and the Gaussian distribution (continuous).

Expectation value (mean): a coin-tossing table on the slide lists the number of tosses N, the number of heads n(H), and the frequency f(H).

The binomial (Bernoulli) distribution. The probability of "head up" for each toss is 1/2. The probability of obtaining $m$ heads in $n$ tosses is $P(m)=\binom{n}{m}\left(\tfrac{1}{2}\right)^{m}\left(\tfrac{1}{2}\right)^{n-m}$: the probability of "head up" times the probability of "tail up" for the individual tosses, times the total number of outcomes (orderings) giving $m$ heads. More generally, for two exclusive events with probabilities $p$ and $q=1-p$, $P(m)=\binom{n}{m}p^{m}q^{n-m}$.
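
The following short Python sketch evaluates the binomial probabilities; the choice of n = 10 tosses with p = 1/2 is illustrative.

```python
from math import comb

# Minimal sketch of the binomial distribution P(m) = C(n, m) p^m (1-p)^(n-m).
def binomial_pmf(m, n, p=0.5):
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

n = 10
probs = [binomial_pmf(m, n) for m in range(n + 1)]
print(sum(probs))                                   # total probability = 1
print(sum(m * pm for m, pm in enumerate(probs)))    # expectation value = n*p = 5
```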

The distribution of molecular states: the slide shows particles (which might be distinguishable) placed on a ladder of energy levels E. Distribution = population pattern.

Enormous possibilities! The slide shows many different ways of distributing the particles over the energy levels.

Distinguishable particles distributed over the energy levels.

Principle of equal a priori probabilities All possibilities for the distribution of energy are equally probable. An assumption and a good assumption.

They are equally probable: each instantaneous arrangement of the particles over the energy levels shown on these slides has the same probability.

Configuration and weights: a configuration such as {5,0,0,...} lists the numbers of particles in the states.

Another configuration: {3,2,0,...}.

One configuration may correspond to a large number of instantaneous configurations.

For the configuration {N-2,2,0,...}: how many instantaneous configurations are there? N(N-1)/2.

For the configuration {3,4,5,6} (18 particles over four levels), the number of instantaneous configurations is 18!/(3! 4! 5! 6!).

Configuration and weights: W is huge! For 20 particles with the configuration {1,0,3,5,10,1}, $W=\frac{20!}{1!\,0!\,3!\,5!\,10!\,1!}\approx 9.3\times 10^{8}$. How about a system of particles with the configuration {2000, 3000, 4000, 1000}?

Stirling's approximation (Eq. 21.25): $\ln n! \approx n\ln n - n$. The table on the slide compares $\ln n!$ with $n\ln n - n$ and their difference for increasing $n$.
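
A minimal Python sketch of the configuration weight and of Stirling's approximation; the configurations are the ones quoted on the slides, and the implementation is only an illustration.

```python
from math import factorial, lgamma, log

def weight(config):
    """Exact weight W = N! / (n0! n1! ...) of a configuration."""
    N = sum(config)
    W = factorial(N)
    for n in config:
        W //= factorial(n)
    return W

def ln_weight(config, use_stirling=False):
    """ln W, exactly (via lgamma) or with Stirling's approximation ln n! ~ n ln n - n."""
    if use_stirling:
        s = lambda n: n * log(n) - n if n > 0 else 0.0
    else:
        s = lambda n: lgamma(n + 1)
    return s(sum(config)) - sum(s(n) for n in config)

print(weight([1, 0, 3, 5, 10, 1]))                             # 20 particles -> 931170240
print(ln_weight([2000, 3000, 4000, 1000]))                     # exact ln W
print(ln_weight([2000, 3000, 4000, 1000], use_stirling=True))  # Stirling estimate
```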

There is an overwhelming configuration: the weight W is sharply peaked at the most probable configuration $\{n_i\}_{\max}$.

The Boltzmann distribution. Which distribution is most probable? We may use the Lagrange multiplier method to find the extremum (maximum) of W subject to fixed particle number and total energy. The result is the Boltzmann distribution $\frac{n_i}{N}=\frac{e^{-\beta\varepsilon_i}}{\sum_j e^{-\beta\varepsilon_j}}$, and it can be found from thermodynamics that $\beta = 1/(kT)$.
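
As an illustration of the result, here is a minimal Python sketch that evaluates Boltzmann populations; the evenly spaced energy levels and the temperature of 300 K are arbitrary assumptions, not values from the lecture.

```python
from math import exp

# Minimal sketch of the Boltzmann distribution n_i/N = exp(-beta*e_i) / sum_j exp(-beta*e_j).
k_B = 1.380649e-23                             # Boltzmann constant, J/K
T = 300.0                                      # K (illustrative)
energies = [i * 1.0e-21 for i in range(5)]     # J, evenly spaced levels (illustrative)

beta = 1.0 / (k_B * T)
boltzmann_factors = [exp(-beta * e) for e in energies]
q = sum(boltzmann_factors)                     # partition function
populations = [f / q for f in boltzmann_factors]

for e, p in zip(energies, populations):
    print(f"E = {e:.2e} J  ->  fractional population {p:.3f}")
```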

The uniform distribution: $\rho(x)=\frac{1}{b-a}$ for $a\le x\le b$ and zero elsewhere.

Radial distribution functions. Radial density function: $P(r)=r^{2}R_{n,l}(r)^{2}$, where $P(r)\,\mathrm{d}r$ is the probability of finding the electron between $r$ and $r+\mathrm{d}r$ from the nucleus.

Radial distribution functions of the hydrogenic atoms

Does this look familiar?

The Gaussian distribution (the normal distribution): $\rho(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. The slide plots $\rho(x)$ from $\mu-3$ to $\mu+3$ for $\sigma = 0.5$, $1.0$, and $2.5$.

About 68% of the probability of a Gaussian distribution lies within $\mu\pm\sigma$ (roughly 16% in each tail beyond one standard deviation); the slide marks the region from $\mu-3\sigma$ to $\mu+3\sigma$.
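
A minimal Python sketch confirming these tail fractions via the error function; μ and σ are arbitrary illustrative values.

```python
from math import erf, exp, pi, sqrt

# Minimal sketch: Gaussian density and the fraction of probability within mu +/- k*sigma.
mu, sigma = 0.0, 1.0

def gaussian(x):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def fraction_within(k):
    """P(mu - k*sigma < x < mu + k*sigma) for a normal distribution."""
    return erf(k / sqrt(2))

print(gaussian(mu))            # peak height 1/(sigma*sqrt(2*pi))
print(fraction_within(1))      # ~0.683  (so ~16% in each tail)
print(fraction_within(2))      # ~0.954
print(fraction_within(3))      # ~0.997
```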

Multiple variables. Covariance: $\mathrm{cov}(x,y)=\frac{1}{N}\sum_i (x_i-\bar{x})(y_i-\bar{y})$. Correlation coefficient: $r=\frac{\mathrm{cov}(x,y)}{\sigma_x\sigma_y}$, with $r>0$ for positive correlation and $r<0$ for negative correlation. For independent variables the covariance (and hence $r$) is zero.
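
A minimal Python sketch of the covariance and correlation-coefficient formulas; the paired (x, y) values are illustrative, not the grade data plotted on the following slides.

```python
# Minimal sketch: covariance and correlation coefficient of two variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
N = len(xs)

mx = sum(xs) / N
my = sum(ys) / N
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
sx = (sum((x - mx) ** 2 for x in xs) / N) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / N) ** 0.5
r = cov / (sx * sy)     # r ~ +1: positive correlation, r ~ -1: negative, r ~ 0: uncorrelated

print(f"cov = {cov:.3f}, r = {r:.3f}")
```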

Examples: scatter plots of y versus x illustrating correlated data.

Example scatter plots of course grades: organic chemistry versus inorganic chemistry, and organic chemistry versus mathematics.

More examples: physical chemistry versus mathematics, and inorganic chemistry versus mathematics.

Example: find the correlation coefficient of the two parameters a and b tabulated on the slide.

Classroom exercise: find the correlation coefficient of the two parameters x and y tabulated on the slide.

Regression (fitting): how do we judge the quality of a fit?

Simple least-squares fitting: for data points $(x_1,y_1),\dots,(x_N,y_N)$ and a model curve $y(x)$, the residual at each point is $\varepsilon_i = y_i - y(x_i)$. Minimize the quantity $S=\sum_i \varepsilon_i^2$.

The straight-line fit $y=a+bx$: which line is the best fit? Minimizing $S$ gives $b=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}$ and $a=\bar{y}-b\bar{x}$.
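
A minimal Python sketch of the closed-form straight-line fit; the data points are made up for illustration and are not the example data from the slides.

```python
# Minimal sketch of the least-squares straight line y = a + b*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 2.9, 4.1, 4.8, 6.1]
N = len(xs)

mx = sum(xs) / N
my = sum(ys) / N
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sigma = (sum(e ** 2 for e in residuals) / (N - 2)) ** 0.5   # residual standard deviation

print(f"y = {a:.3f} + {b:.3f} x,  sigma = {sigma:.3f}")
```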

Example: find the linear least-squares fit for the data set (x, y) tabulated on the slide; σ = 0.25.

Classroom exercise: find the straight-line fit for the data points (X, Y) tabulated on the slide; σ = 0.14.

Chi-square fitting: minimize $\chi^2=\sum_i\left[\frac{y_i-y(x_i)}{\sigma_i}\right]^2$. Justification: when each $y_i$ carries a Gaussian measurement error of standard deviation $\sigma_i$, minimizing $\chi^2$ maximizes the likelihood of the data.

The straight-line fit

Example: fitting an exponential function. The slide tabulates the measured signal s(t) at times t together with the uncertainties σ.
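
A minimal sketch of such a fit using scipy.optimize.curve_fit with the measurement uncertainties as weights; the data, the uncertainties, and the model s(t) = A·exp(-t/τ) are assumptions standing in for the slide's lost table.

```python
import numpy as np
from scipy.optimize import curve_fit

# Minimal sketch of a chi-square (sigma-weighted) fit of s(t) = A * exp(-t / tau).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
s = np.array([5.1, 3.0, 1.9, 1.2, 0.75, 0.42])
sigma = np.array([0.1, 0.1, 0.08, 0.06, 0.05, 0.05])

def model(t, A, tau):
    return A * np.exp(-t / tau)

popt, pcov = curve_fit(model, t, s, p0=(5.0, 2.0), sigma=sigma, absolute_sigma=True)
A, tau = popt
chi2 = np.sum(((s - model(t, *popt)) / sigma) ** 2)

print(f"A = {A:.3f} +/- {np.sqrt(pcov[0, 0]):.3f}")
print(f"tau = {tau:.3f} +/- {np.sqrt(pcov[1, 1]):.3f}")
print(f"chi-square = {chi2:.2f} for {len(t) - 2} degrees of freedom")
```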

Sample statistics. In practice the sample is necessarily finite, so only an estimate of the parent population (distribution) can be obtained from the statistics of the sample. The sample mean is the best estimate of the population mean, but the variance computed with the factor $1/N$ underestimates the population variance by a factor of $(N-1)/N$, which may be significant for small sample sizes. To correct this, introduce the sample variance $s^2=\frac{1}{N-1}\sum_i (x_i-\bar{x})^2$.

Example: for a sample of three values, compute the mean and compare the two versions of $s^2$ (dividing by N and by N-1).
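
A minimal Python sketch contrasting the two variance estimates for a small sample; the three values are illustrative, not the slide's.

```python
# Minimal sketch: the two versions of the variance for a small sample.
xs = [2.1, 2.4, 2.7]
N = len(xs)
mean = sum(xs) / N

var_population = sum((x - mean) ** 2 for x in xs) / N         # divides by N
var_sample = sum((x - mean) ** 2 for x in xs) / (N - 1)       # divides by N-1 (corrected)

print(f"mean = {mean:.4f}")
print(f"variance (1/N)     = {var_population:.4f}")
print(f"variance (1/(N-1)) = {var_sample:.4f}")               # larger by factor N/(N-1)
```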

Error analysis: how close is your experimental value to the real value? Use the t-distribution (Student's distribution): with confidence factor $t$, the true mean lies in the range $\mu=\bar{x}\pm t\,s/\sqrt{N}$.

The t-distribution (Student's distribution) table: rows give the degrees of freedom (N-1), columns the confidence level, and the entries the confidence factor t.

t-distribution (Student's distribution), continued: the table extends to infinite degrees of freedom.

Error bar: an example. The measured values of a certain quantity from 10 repeated experiments are listed on the slide. From them we compute the mean $\bar{x}$ and the sample standard deviation $s$. Looking up the t-table (N-1 = 9) at 95% confidence gives $t = 2.26$, so the error lies in the range $\pm t\,s/\sqrt{N}$; i.e., the error bar is centered at $\bar{x}$ with a total length of $2\,t\,s/\sqrt{N}$. It must be emphasized that each data point has its own error bar (although in many cases only typical or representative error bars are given). In other cases, error bars are simply variances.
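
A minimal Python sketch of this error-bar calculation using scipy.stats for the t factor; the ten measurement values are illustrative, since the slide's data were not preserved.

```python
import numpy as np
from scipy import stats

# Minimal sketch: 95% confidence interval (error bar) for the mean of repeated measurements.
x = np.array([2.31, 2.28, 2.33, 2.30, 2.27, 2.32, 2.29, 2.31, 2.28, 2.30])
N = len(x)

mean = x.mean()
s = x.std(ddof=1)                            # sample standard deviation (divides by N-1)
t_factor = stats.t.ppf(0.975, df=N - 1)      # two-sided 95% -> 2.26 for df = 9
half_width = t_factor * s / np.sqrt(N)

print(f"mean = {mean:.4f}, s = {s:.4f}, t = {t_factor:.2f}")
print(f"95% interval: {mean:.4f} +/- {half_width:.4f} (error bar length {2*half_width:.4f})")
```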

Hypothesis test. There are many methods for testing a statistical hypothesis. t-test: from the original claim (the null hypothesis) about $\mu$, look up the critical value of t, denoted $t^*$. Make a series of N measurements, which give $\bar{x}$, $s$, and hence the range of $t=(\bar{x}-\mu)\sqrt{N}/s$ spanned by the claimed range of $\mu$. If $|t|>t^*$, the test supports the originally claimed range of $\mu$ with a probability smaller than 5%, i.e. the claim is rejected at the 95% confidence level.

Hypothesis test example. A student read an article on the ¹³C NMR relaxation time of nanoparticles; the spectrum was a single peak. The authors reported a relaxation time of 2.45 s ± 0.05 s with 95% confidence, i.e. μ in the range (2.40, 2.50). Because the relaxation time is crucial for understanding this sample, the student repeated the measurements under the same conditions and obtained $\bar{x}$ = 2.275 s, s = 0.06, N = 10. Is the value reported in the article trustworthy? Looking up the t-table (N-1 = 9) at 95% confidence gives t* = 2.26. For the claimed range of μ, the corresponding range of t is (-6.81, …), entirely below -t* = -2.26. It is thus concluded that the reported value of the ¹³C relaxation time is not trustworthy, because its probability is lower than 5%.
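
A minimal Python sketch of the corresponding one-sample t-test; the measurement array and the claimed mean below are illustrative stand-ins for the NMR data described above.

```python
import numpy as np
from scipy import stats

# Minimal sketch of a one-sample t-test against a reported (claimed) value.
x = np.array([2.28, 2.26, 2.30, 2.27, 2.25, 2.29, 2.28, 2.26, 2.31, 2.25])
mu_claimed = 2.45                                   # illustrative claimed value

N = len(x)
t_stat = (x.mean() - mu_claimed) / (x.std(ddof=1) / np.sqrt(N))
t_star = stats.t.ppf(0.975, df=N - 1)               # critical value at 95% confidence

print(f"t = {t_stat:.2f}, t* = {t_star:.2f}")
if abs(t_stat) > t_star:
    print("Reject the claimed value at the 95% confidence level.")
else:
    print("The data are consistent with the claimed value.")
```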

Principal Component Analysis. Purpose: most problems in real life are complex, for example the factors affecting the economy of a region, the factors causing a certain disease, or the metabolism of a biological system. Approach 1: to gain a thorough and comprehensive understanding we must simplify, purify, isolate, and idealize: the scientific (physical) method. Approach 2: to gain a partial, more superficial understanding we do not have to simplify, purify, isolate, or idealize; instead we may take the intact sample and perform in vivo, in situ, real-time studies: the statistical method. It is often followed or complemented by the scientific method.

Principal Component Analysis. Procedure: (1) collect data from a set of samples; (2) form the covariance matrix S; (3) diagonalize S to obtain D (the principal components); (4) compute the contribution factors; (5) compute the loading factors.
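
A minimal Python sketch of this procedure on a made-up data matrix (rows = samples, columns = measured variables); the numbers are random and only illustrate steps (2) through (5).

```python
import numpy as np

# Minimal PCA sketch: covariance matrix, diagonalization, contributions, loadings.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[:, 1] += 2.0 * data[:, 0]             # build in a correlation between two variables

# (2) covariance matrix S of the centered data
centered = data - data.mean(axis=0)
S = np.cov(centered, rowvar=False)

# (3) diagonalize S: eigenvalues = variances along the PCs, eigenvectors = loading factors
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]          # sort PCs by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# (4) contribution factors: fraction of the total variance carried by each PC
contributions = eigvals / eigvals.sum()

# (5) loading factors: columns of eigvecs
print("contribution factors:", np.round(contributions, 3))
print("loading factors (columns = PCs):")
print(np.round(eigvecs, 3))
```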