Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics. Lecture 1a: Some basic statistical concepts


1 Lecture 1a: Some basic statistical concepts
- The use and abuse of statistics
- Statistical analysis as model building
- Parameters and estimators
- Parametric versus non-parametric statistics
- Estimation techniques: least squares and maximum likelihood

2 Some opinions of statistics
“There are three types of lies: lies, damn lies, and statistics!” (Benjamin Disraeli)
“If your experiment needs statistics, you should have done a better experiment.” (Ernest Rutherford)

3 Some opinions of statistics
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.” (Sir Ronald Fisher)
“The purpose of models is not to fit the data, but to sharpen the questions.” (Samuel Karlin)

4 The uses of statistics
Description:
- Provide a data summary.
- Help discover trends and patterns.
- Evaluate magnitude and direction of experimental effects.
Design:
- Assist in the design of experiments and field studies.
- A priori decisions about usefulness of experiments.
Hypothesis-testing:
- Evaluate biological hypotheses by testing to see whether observed patterns are consistent with predictions.

5 What statistics can and can’t do
Can:
- provide objective criteria for evaluating hypotheses
- help optimize effort
- help you critically evaluate arguments
Can’t:
- tell the truth (probabilistic conclusions only!)
- compensate for poor design
- indicate biological significance: statistical significance does not mean biological significance, nor vice versa!

6 Four important questions to ask yourself before beginning any statistical analysis
- Is there any reason to believe that your observations are independent and that in fact the data represent a “random sample”? And if so, random with respect to what?
- Is it even possible to answer your question with the data you collected?
- Can the contemplated analysis even answer your question, assuming there is an answer?
- Are there alternate ways of analyzing the data?

7 The four ages of statistical man
Age | Defining characteristics | Comment
Stone | Total ignorance | Ignorance is not bliss!
Bronze | Nodding familiarity, but understanding purely superficial | Statistics a (small) sidebar to scientific investigation (see Rutherford, Ernest)
Silver | Moderate familiarity coupled with a strong desire to demonstrate same; statistical reach exceeds grasp | Overwhelming concern with statistical minutiae; scientific forest often obscured by statistical trees.
Gold | Knows when statistical issues are (and are not) important; recognizes limitations (of self and statistical science) | That to which we can/should all aspire.

8 Statistical analysis as model building
- All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA.
- “Model fitting” is then the process by which model parameters are estimated.
[Figure: two panels. Linear regression (Y versus X) and ANOVA (Y for Group 1, Group 2 and Group 3).]

9 Parameters, statistics and estimators
- Parameters characterize populations (which in general cannot be completely enumerated).
- Statistics (estimators) are quantities computed from a finite sample to estimate population parameters (e.g., the sample mean is an estimator of the population mean, and its value for a particular sample is an estimate).
- The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure.
[Figure: a sample drawn from a population.]
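As a rough illustration of the distinction between a parameter and an estimate, the sketch below simulates a large "population" and draws one finite random sample from it. The population size, the normal distribution, the seeds and all function names are assumptions made for the example, not part of the lecture.

```python
import random
import statistics

def sample_mean_estimate(population, n, seed=0):
    """Draw a simple random sample of size n and return its mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return statistics.fmean(sample)

# Hypothetical population: 10,000 simulated measurements (mean 50, sd 5).
rng = random.Random(42)
population = [rng.gauss(50.0, 5.0) for _ in range(10_000)]

mu = statistics.fmean(population)               # the parameter (normally unknown)
x_bar = sample_mean_estimate(population, 100)   # the estimate from a finite sample
print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

In practice only `x_bar` is available; the population mean is computed here solely because the population is simulated.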

10 Parametric statistical analysis
- Estimating model parameters based on a finite sample and inferring from these estimates the values of the corresponding population parameters.
- Therefore, parametric analysis requires relatively restrictive assumptions about the relationships between the sample and the population, i.e., about the distributions from which samples are drawn and the nature of the drawing (e.g., normal distributions and random sampling).
[Figure: Y versus X in the sample and in the population, with inference running from sample to population.]

11 Non-parametric statistical analysis
- Calculation of model parameters based on a finite sample, but no inference to corresponding population parameters.
- Therefore, non-parametric analysis requires relatively minimal assumptions about the relationships between the sample and the population (e.g., normal distributions of sampled variables not required).

12 Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter $\theta$ is that which minimizes the sum of squared differences between observed and predicted values:
$$SS_R = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
- Predicted values $\hat{y}_i$ are derived from some model whose parameters we wish to estimate.
[Figure: $SS_R$ plotted against $\theta$; the OLS estimate lies at the minimum.]

13 Example: LSE of the population mean
- Data consist of a set of n observations $x_1, x_2, \ldots, x_n$.
- The “model” for the i-th observation is: $x_i = \mu + \varepsilon_i$
What is the LSE of the (only) model parameter $\mu$? To obtain this estimate, choose a value for $\mu$, calculate $SS_R = \sum_{i=1}^{n} (x_i - \mu)^2$, choose another value, recalculate $SS_R$, ….
- The LSE of $\mu$ is that value which minimizes $SS_R$ …
- …which turns out to be the sample mean: $\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
[Figure: $SS_R$ plotted against $\mu$.]
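The trial-and-error procedure described on this slide can be sketched directly: try candidate values for the mean on a grid, compute the residual sum of squares for each, and keep the best. The data values, the grid step and the function names are made up for illustration.

```python
def ss_r(data, mu):
    """Residual sum of squares for the model x_i = mu + e_i."""
    return sum((x - mu) ** 2 for x in data)

def lse_mean_by_search(data, step=0.01):
    """Try candidate values of mu on a grid; keep the one with the smallest SS_R."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + k * step for k in range(n_steps + 1)]
    return min(candidates, key=lambda mu: ss_r(data, mu))

data = [4.1, 5.3, 3.8, 6.0, 5.2]   # made-up observations
best = lse_mean_by_search(data)
x_bar = sum(data) / len(data)
print(f"grid-search LSE = {best:.2f}, sample mean = {x_bar:.2f}")
```

The grid search agrees with the sample mean only to within the grid step; the closed-form result is exact.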

14 Example: LSE of model parameters in simple linear regression
- Data consist of a set of n paired observations $(x_1, y_1), \ldots, (x_n, y_n)$.
- The “model” for the i-th observation is: $y_i = \alpha + \beta x_i + \varepsilon_i$
What is the LSE of the model parameters $\alpha$ and $\beta$?
[Figure: Y versus X with the fitted line. Residual: $\varepsilon_i = y_i - (\alpha + \beta x_i)$.]
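For simple linear regression the values that minimize the residual sum of squares have a well-known closed form, so no search is needed. The sketch below assumes the usual parametrization with an intercept (called `alpha` here) and a slope (`beta`); the data are invented.

```python
def ols_line(xs, ys):
    """Closed-form least squares estimates (intercept, slope) for y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                 # slope
    alpha = y_bar - beta * x_bar     # intercept
    return alpha, beta

def ss_r(xs, ys, alpha, beta):
    """Residual sum of squares for a given intercept and slope."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]     # made-up responses
alpha, beta = ols_line(xs, ys)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, SS_R = {ss_r(xs, ys, alpha, beta):.4f}")
```

Nudging either estimate away from its fitted value and recomputing `ss_r` gives a larger residual sum of squares, which is a quick way to check the fit.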

15 Maximum likelihood estimation (MLE)
A maximum likelihood estimate (MLE) of a model parameter $\theta$ for a given distribution is that which maximizes the probability of generating the observed sample data.
- MLEs are obtained by maximizing the likelihood function $L$, which for independent observations is $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$
- …or equivalently, by minimizing the negative log likelihood function $-\log L$.
[Figure: $L$ and $-\log L$ plotted against $\theta$; the MLE lies at the maximum of $L$, which is the minimum of $-\log L$.]

16 Example: MLEs of normal distribution parameters
- Data consist of a set of n observations $x_1, x_2, \ldots, x_n$. Model: the sample comes from a normal distribution $N(\mu, \sigma^2)$, so $\mu$ and $\sigma^2$ are the model parameters we want to estimate.
- The model probability density is: $f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
- …and log likelihood is: $\log L = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$
To obtain MLE estimates for $\mu$ and $\sigma^2$, iteratively minimize $-\log L$ until convergence criteria are satisfied.
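For the normal model the maximization can also be carried out analytically, which gives a useful check on any iterative routine. A short derivation, assuming independent observations and writing $\bar{x}$ for the sample mean:

```latex
\begin{align*}
\log L(\mu, \sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
                         - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\frac{\partial \log L}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Rightarrow\; \hat{\mu} = \bar{x} \\
\frac{\partial \log L}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
\end{align*}
```

Note that the MLE of the variance divides by $n$, not $n - 1$, so it is slightly smaller than the usual unbiased sample variance.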

17 Example: MLEs of non-linear model parameters
- Data consist of a set of n paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
- Model is that the expected value of y is the sum of two exponentials: $E[y_i] = \lambda_i = \alpha e^{\beta x_i} + \gamma e^{\delta x_i}$
- The distribution of y at each x is assumed Poisson: $P(y_i \mid x_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}$
- …and log likelihood is $\log L = \sum_{i=1}^{n} \left[ y_i \log \lambda_i - \lambda_i - \log(y_i!) \right]$
To obtain MLE estimates for $\alpha$, $\beta$, $\gamma$ and $\delta$, iteratively minimize $-\log L$ until convergence criteria are satisfied.
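The quantity that an optimizer would minimize for this model can be written down in a few lines. This is only a sketch: the parametrization of the two exponentials (amplitudes `alpha`, `gamma` and rates `beta`, `delta`), the data and the two trial parameter sets are assumptions for illustration, and no optimizer is included.

```python
import math

def neg_log_likelihood(params, xs, ys):
    """Poisson negative log likelihood when E[y] is a sum of two exponentials.

    Assumed parametrization: E[y_i] = alpha*exp(beta*x_i) + gamma*exp(delta*x_i).
    """
    alpha, beta, gamma, delta = params
    nll = 0.0
    for x, y in zip(xs, ys):
        lam = alpha * math.exp(beta * x) + gamma * math.exp(delta * x)
        if lam <= 0:
            return math.inf          # a Poisson mean must be positive
        # log P(y | lam) = y*log(lam) - lam - log(y!), with log(y!) = lgamma(y + 1)
        nll -= y * math.log(lam) - lam - math.lgamma(y + 1)
    return nll

xs = [0, 1, 2, 3, 4]
ys = [12, 8, 5, 3, 2]                # made-up counts
close = neg_log_likelihood((8.0, -0.3, 4.0, -1.0), xs, ys)
far = neg_log_likelihood((1.0, -0.3, 1.0, -1.0), xs, ys)
print(f"-log L near the data = {close:.2f}, far from the data = {far:.2f}")
```

An iterative routine would repeatedly adjust the four parameters to push this value down; parameter sets whose predicted means track the observed counts give a smaller negative log likelihood.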

18 Algorithms for LSE/MLE
- All use some sort of generalized “gradient descent” method.
- If the loss function is well behaved, then estimation is relatively easy.
- However, if it is not well behaved, incorrect estimates may be obtained.
[Figure: $SS_R$ or $-\log L$ plotted against $\theta$; gradient descent moves downhill toward the LSE/MLE at the minimum.]
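A bare-bones version of gradient descent, applied to the simplest loss in this lecture (the residual sum of squares for a single mean), might look like the following. The learning rate, tolerance, starting value and data are arbitrary choices for the sketch; real estimation software uses more elaborate variants.

```python
def gradient_descent_mean(data, start=0.0, rate=0.1, tol=1e-10, max_iter=10_000):
    """Minimize SS_R(mu) = sum((x - mu)^2) by repeatedly stepping downhill."""
    n = len(data)
    mu = start
    for _ in range(max_iter):
        grad = -2.0 * sum(x - mu for x in data)   # d(SS_R)/d(mu)
        step = rate * grad / n                    # dividing by n only rescales the step
        mu -= step                                # move against the gradient
        if abs(step) < tol:                       # convergence criterion
            break
    return mu

data = [4.1, 5.3, 3.8, 6.0, 5.2]   # made-up observations
estimate = gradient_descent_mean(data)
print(f"gradient descent estimate = {estimate:.6f}")
```

Because this loss is a single smooth bowl, any starting value leads to the same answer (the sample mean). On a loss surface with several valleys the same procedure can stop in a local minimum, which is how incorrect estimates arise.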

19 Important notes
- While it is often possible to obtain estimates of model parameters using both LSE and MLE, these estimates may differ.
- Especially for non-linear models, estimation of parameters can be tricky because the loss function surfaces often have a very rugged topography (many local peaks and valleys).
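A small case where the two approaches disagree: suppose counts y are Poisson with mean b·x (a line through the origin). This model is an assumption chosen for the sketch because both estimates then have closed forms: least squares gives b = Σxy / Σx², while maximum likelihood gives b = Σy / Σx. The data are invented.

```python
def lse_slope(xs, ys):
    """Least squares estimate of b in y = b*x: minimizes sum((y - b*x)^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mle_slope_poisson(xs, ys):
    """Poisson MLE of b when E[y] = b*x: solves sum(y)/b - sum(x) = 0."""
    return sum(ys) / sum(xs)

xs = [1, 2, 3, 4, 5]
ys = [3, 3, 8, 7, 14]              # made-up counts
b_lse = lse_slope(xs, ys)
b_mle = mle_slope_poisson(xs, ys)
print(f"LSE slope = {b_lse:.4f}, MLE slope = {b_mle:.4f}")
```

The estimates differ because least squares weights all residuals equally, whereas the Poisson likelihood accounts for the variance increasing with the mean.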

