Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics, Lecture 1a: Some basic statistical concepts

Presentation transcript:

L1a.1 Lecture 1a: Some basic statistical concepts
- The use and abuse of statistics
- Statistical analysis as model building
- Parameters and estimators
- Parametric versus non-parametric statistics
- Estimation techniques: least squares and maximum likelihood

L1a.2 Some opinions of statistics
“There are three types of lies: lies, damn lies, and statistics!” (Benjamin Disraeli)
“If your experiment needs statistics, you should have done a better experiment.” (Ernest Rutherford)

L1a.3 Some opinions of statistics
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.” (Sir Ronald Fisher)
“The purpose of models is not to fit the data, but to sharpen the questions.” (Samuel Karlin)

L1a.4 The uses of statistics
Description
- Provide a data summary.
- Help discover trends and patterns.
- Evaluate magnitude and direction of experimental effects.
Design
- Assist in the design of experiments and field studies.
- A priori decisions about usefulness of experiments.
Hypothesis-testing
- Evaluate biological hypotheses by testing to see whether observed patterns are consistent with predictions.

L1a.5 What statistics can and can’t do
Can
- provide objective criteria for evaluating hypotheses
- help optimize effort
- help you critically evaluate arguments
Can’t
- tell the truth (probabilistic conclusions only!)
- compensate for poor design
- indicate biological significance: statistical significance does not mean biological significance, nor vice versa!

L1a.6 Four important questions to ask yourself before beginning any statistical analysis
- Is there any reason to believe that your observations are independent and that in fact the data represent a “random sample”? And if so, random with respect to what?
- Is it even possible to answer your question with the data you collected?
- Can the contemplated analysis even answer your question, assuming there is an answer?
- Are there alternate ways of analyzing the data?

L1a.7 The four ages of statistical man
Age | Defining characteristics | Comment
Stone | Total ignorance | Ignorance is not bliss!
Bronze | Nodding familiarity, but understanding purely superficial | Statistics a (small) sidebar to scientific investigation (see Rutherford, Ernest)
Silver | Moderate familiarity coupled with a strong desire to demonstrate same; statistical reach exceeds grasp | Overwhelming concern with statistical minutiae; scientific forest often obscured by statistical trees
Gold | Knows when statistical issues are (and are not) important; recognizes limitations (of self and statistical science) | That to which we can/should all aspire

L1a.8 Statistical analysis as model building
- All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA.
- “Model fitting” is then the process by which model parameters are estimated.
[Figure: two panels. Linear regression: Y plotted against X. ANOVA: Y for Group 1, Group 2 and Group 3.]

L1a.9 Parameters, statistics and estimators
- Parameters characterize populations (which in general cannot be completely enumerated).
- Statistics (estimators) are estimates of population parameters obtained from a finite sample (e.g., the sample mean is an estimate of the population mean).
- The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure.
[Figure: a sample drawn from a population.]
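The distinction between a parameter and an estimate can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the simulated "population" of 10,000 values, its mean and spread, the sample size of 25, and the fixed random seed. In real work the population cannot be enumerated, which is exactly why the estimate is needed.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Hypothetical population: in practice this could not be enumerated.
population = [random.gauss(50.0, 10.0) for _ in range(10_000)]
mu = sum(population) / len(population)      # the parameter (population mean)

# A finite random sample drawn without replacement from the population.
sample = random.sample(population, 25)
x_bar = sum(sample) / len(sample)           # the statistic (sample mean)

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

The two numbers are close but not identical; the sample mean is an estimate of the population mean, not the parameter itself.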

L1a.10 Parametric statistical analysis
- Estimating model parameters based on a finite sample and inferring from these estimates the values of the corresponding population parameters.
- Therefore, parametric analysis requires relatively restrictive assumptions about the relationships between the sample and the population, i.e., about the distributions from which samples are drawn and the nature of the drawing (e.g., normal distributions and random sampling).
[Figure: inference from sample to population.]

L1a.11 Non-parametric statistical analysis
- Calculation of model parameters based on a finite sample, but no inference to corresponding population parameters.
- Therefore, non-parametric analysis requires relatively minimal assumptions about the relationships between the sample and the population (e.g., normal distributions of sampled variables not required).

L1a.12 Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter $\theta$ is that which minimizes the sum of squared differences between observed and predicted values:
$$SS_R = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
- Predicted values $\hat{y}_i$ are derived from some model whose parameters we wish to estimate.
[Figure: $SS_R$ plotted against $\theta$; the OLS estimate lies at the minimum.]

L1a.13 Example: LSE of the population mean
- Data consist of a set of n observations $x_1, x_2, \ldots, x_n$.
- The “model” for the i-th observation is:
$$x_i = \mu + \varepsilon_i$$
What is the LSE of the (only) model parameter $\mu$? To obtain this estimate, choose a value for $\mu$, calculate $SS_R = \sum_{i=1}^{n} (x_i - \mu)^2$, choose another value, recalculate $SS_R$, and so on. The LSE of $\mu$ is that value which minimizes $SS_R$ ...
- ... which turns out to be the sample mean:
$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
[Figure: $SS_R$ plotted against $\mu$.]
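The trial-and-error procedure described on this slide can be sketched as a simple grid search. The data values, the grid step of 0.001, and the function name `ss_r` are all hypothetical choices for illustration; the point is only that the candidate with the smallest residual sum of squares coincides with the sample mean.

```python
# Hypothetical observations; any list of numbers would do.
data = [4.1, 5.3, 3.8, 6.0, 5.2]

def ss_r(mu, xs):
    """Residual sum of squares for the model x_i = mu + error_i."""
    return sum((x - mu) ** 2 for x in xs)

# "Choose a value, calculate SS_R, choose another value, ...":
# evaluate SS_R on a fine grid of candidates between min and max of the data.
step = 0.001
n_steps = round((max(data) - min(data)) / step)
candidates = [min(data) + k * step for k in range(n_steps + 1)]
best_mu = min(candidates, key=lambda mu: ss_r(mu, data))

sample_mean = sum(data) / len(data)
print(f"grid-search LSE = {best_mu:.3f}, sample mean = {sample_mean:.3f}")
```

A grid search is used here only because it mirrors the slide's description; for this model the minimum is available in closed form as the sample mean.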

L1a.14 Example: LSE of model parameters in simple linear regression
- Data consist of a set of n paired observations $(x_1, y_1), \ldots, (x_n, y_n)$.
- The “model” for the i-th observation is:
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
What is the LSE of the model parameters $\alpha$ and $\beta$?
[Figure: Y plotted against X with the fitted line; the residual for the i-th observation is $\varepsilon_i = y_i - (\alpha + \beta x_i)$.]
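For simple linear regression the least squares estimates do not require a search, because the minimum of the residual sum of squares has a standard closed-form solution. The sketch below uses that textbook result; it is not taken from the slide. The names `alpha` (intercept) and `beta` (slope), the function `ols_line`, and the data are assumptions for illustration.

```python
def ols_line(xs, ys):
    """Closed-form least squares estimates for y_i = alpha + beta * x_i + error_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx               # slope
    alpha = y_bar - beta * x_bar     # intercept
    return alpha, beta

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_line(xs, ys)

# Residuals and the residual sum of squares at the fitted values.
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]
ss_r = sum(e ** 2 for e in residuals)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, SS_R = {ss_r:.4f}")
```

With these data the intercept is approximately 0.05 and the slope approximately 1.99; any other pair of values gives a larger residual sum of squares.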

L1a.15 Maximum likelihood estimation (MLE)
A maximum likelihood estimate (MLE) of a model parameter $\theta$ for a given distribution is that which maximizes the probability of generating the observed sample data.
- MLEs are obtained by maximizing the likelihood function, which for independent observations with model density $f$ is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
- ... or equivalently, by minimizing the negative log likelihood function
$$-\log L(\theta) = -\sum_{i=1}^{n} \log f(x_i; \theta)$$
[Figure: $L$ and $-\log L$ plotted against $\theta$; the MLE lies at the maximum of $L$, which is the minimum of $-\log L$.]

L1a.16 Example: MLEs of normal distribution parameters
- Data consist of a set of n observations $x_1, x_2, \ldots, x_n$. Model: the sample comes from a normal distribution $N(\mu, \sigma^2)$, so $\mu$ and $\sigma^2$ are the model parameters we want to estimate.
- The model probability density is:
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
- ... and the log likelihood is:
$$\log L = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
To obtain MLEs of $\mu$ and $\sigma^2$, iteratively minimize $-\log L$ until convergence criteria are satisfied.
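For the normal model the iteration is not strictly necessary, because the minimum of the negative log likelihood is known in closed form: the sample mean, and the mean squared deviation with divisor n. The sketch below uses that standard result rather than the iterative procedure on the slide. It then checks numerically that nearby parameter values give a larger negative log likelihood. The data and the function name `neg_log_lik` are hypothetical.

```python
import math

def neg_log_lik(mu, sigma2, xs):
    """Negative log likelihood of independent N(mu, sigma2) observations."""
    n = len(xs)
    return 0.5 * n * math.log(2 * math.pi * sigma2) + \
        sum((x - mu) ** 2 for x in xs) / (2 * sigma2)

# Hypothetical observations.
data = [4.1, 5.3, 3.8, 6.0, 5.2]
n = len(data)

# Closed-form MLEs for the normal model.
mu_hat = sum(data) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # divisor n, not n - 1

best = neg_log_lik(mu_hat, sigma2_hat, data)
for d_mu, d_s2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nearby = neg_log_lik(mu_hat + d_mu, sigma2_hat + d_s2, data)
    print(f"shift ({d_mu:+.1f}, {d_s2:+.1f}): -log L rises by {nearby - best:.4f}")
```

Note that the MLE of the variance divides by n, so it differs from the usual sample variance, which divides by n - 1.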

L1a.17 Example: MLEs of non-linear model parameters
- Data consist of a set of n paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
- Model is that the expected value of y is the sum of two exponentials.
- The distribution of y at each x is assumed Poisson.
- The log likelihood follows from the Poisson probabilities of the observed y values.
To obtain MLEs of the model parameters, iteratively minimize $-\log L$ until convergence criteria are satisfied.
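As a hedged sketch of how such a log likelihood is built: the parameter names $a_1, b_1, a_2, b_2$ and the four-parameter form of the two-exponential mean are assumptions made here for illustration and may not match the slide's own notation. The Poisson probability and the resulting log likelihood are the standard forms.

```latex
\begin{align}
  % Assumed mean function: a sum of two exponentials in x
  \lambda_i &= E[y_i \mid x_i] = a_1 e^{b_1 x_i} + a_2 e^{b_2 x_i} \\
  % Poisson probability of observing y_i given mean \lambda_i
  P(y_i \mid x_i) &= \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} \\
  % Log likelihood for n independent observations
  \log L &= \sum_{i=1}^{n} \left( y_i \log \lambda_i - \lambda_i - \log y_i! \right)
\end{align}
```

Because $\lambda_i$ depends non-linearly on the parameters, setting the derivatives of $\log L$ to zero gives no closed-form solution, which is why an iterative search is needed.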

L1a.18 Algorithms for LSE/MLE
- All use some sort of generalized “gradient descent” method.
- If the loss function is well behaved, then estimation is relatively easy.
- However, if it is not well behaved, incorrect estimates may be obtained.
[Figure: gradient descent on a loss function ($SS_R$ or $-\log L$) plotted against the parameter, converging to the LSE/MLE.]
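A minimal sketch of plain gradient descent illustrates both cases. This is not the algorithm of any particular statistics package. The step size, tolerance, data, and the two-valley test function `t**4 - 3*t**2 + t` are all arbitrary choices made for illustration. On the well-behaved residual sum of squares the search recovers the sample mean; on the two-valley function the answer depends on the starting value.

```python
def gradient_descent(grad, start, rate=0.01, tol=1e-10, max_iter=100_000):
    """Repeatedly step downhill along the gradient until steps become tiny."""
    theta = start
    for _ in range(max_iter):
        step = rate * grad(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Well-behaved case: SS_R for the mean model has a single minimum.
data = [4.1, 5.3, 3.8, 6.0, 5.2]

def grad_ss_r(mu):
    """Derivative of sum((x - mu)**2) with respect to mu."""
    return -2.0 * sum(x - mu for x in data)

mu_hat = gradient_descent(grad_ss_r, start=0.0)

# Badly behaved case: a hypothetical loss with two valleys.
def loss(t):
    return t ** 4 - 3 * t ** 2 + t

def grad_loss(t):
    return 4 * t ** 3 - 6 * t + 1

from_left = gradient_descent(grad_loss, start=-2.0)
from_right = gradient_descent(grad_loss, start=2.0)

print(f"mean by gradient descent: {mu_hat:.4f}")
print(f"start -2 ends at {from_left:.4f}, loss {loss(from_left):.4f}")
print(f"start +2 ends at {from_right:.4f}, loss {loss(from_right):.4f}")
```

The run started at +2 stops in the shallower valley and reports a larger loss than the run started at -2. This is the sense in which a poorly behaved loss function can yield incorrect estimates; trying several starting values is one common safeguard.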

L1a.19 Important notes
- While it is often possible to obtain estimates of model parameters using both LSE and MLE, these estimates may differ.
- Especially for non-linear models, estimation of parameters can be tricky because the loss function surfaces often have a very rugged topography (many local peaks and valleys).
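A small worked case shows how the two approaches can disagree. The model and data are hypothetical and are not from the lecture: counts y are assumed Poisson with mean `lam * x`, a line through the origin. Minimizing the residual sum of squares gives `sum(x*y) / sum(x*x)`, whereas maximizing the Poisson likelihood gives `sum(y) / sum(x)`. Both follow from setting the relevant derivative to zero.

```python
# Hypothetical paired observations: x is a covariate, y is a count.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2, 3, 7, 8]

# Least squares estimate: minimizes sum((y - lam * x)**2).
lam_lse = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Maximum likelihood estimate under y ~ Poisson(lam * x):
# maximizes sum(y * log(lam * x) - lam * x), giving sum(y) / sum(x).
lam_mle = sum(ys) / sum(xs)

print(f"LSE of lam = {lam_lse:.4f}")
print(f"MLE of lam = {lam_mle:.4f}")
```

The estimates differ because least squares weights every squared residual equally, while the Poisson likelihood accounts for the variance increasing with the mean.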