Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 10: Comparing Models

Recap of Models
Last time we saw:
– A statistical model is a mathematical function that predicts the value of a dependent variable from the values of independent variables
– A model depends on parameters: unknown values that are properties of the population
– "Fitting a model to data" means finding the values of the parameters that make the observed values most likely

Linear Regression
One example of a model is simple linear regression:
– Predicts a dependent variable as a linear function of an independent variable
– Has two parameters: the intercept and the slope
Y_i = β_0 + β_1 × X_i + ε_i
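To make this concrete, here is a minimal Python sketch of fitting a simple linear regression by least squares; the x and y values are made up purely for illustration:

```python
# Minimal sketch: fitting a simple linear regression by least squares.
# The data below are hypothetical, purely for illustration.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2])

# linregress estimates the intercept (beta_0) and slope (beta_1),
# i.e. the parameter values that make the observed data most likely
# under the model Y_i = beta_0 + beta_1 * X_i + eps_i.
fit = stats.linregress(x, y)
print(f"intercept = {fit.intercept:.3f}, slope = {fit.slope:.3f}")
```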

Comparing Models
In the linear regression example, we also computed a p-value:
– The null hypothesis was that the slope was zero
– I.e., we compared the model Y = β_0 + β_1 × X + ε to the model Y = β_0 + ε
So we can think of this statistical test as a comparison between two models
– In fact, we can think of most (perhaps all) statistical tests as comparisons between two models

Hypothesis test of linear regression as a comparison of models

Why model comparison is not straightforward
It is not enough just to compare the residuals between the two models:
– Remember, the residuals are the error terms in the model
– A model with more parameters will always come closer to the data (the sketch below illustrates this)
– However, the confidence intervals will be wider, so the model will be less useful for predicting future values
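To see this concretely, here is a short Python sketch with made-up data: a 5th-degree polynomial always achieves a smaller residual sum of squares than a straight line on the same points, even though its extra parameters are largely fitting noise:

```python
# Sketch: a model with more parameters always fits the observed data
# at least as closely (smaller residual sum of squares), whether or
# not the extra parameters are meaningful. Data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 13)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=x.size)  # linear + noise

for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)
    print(f"degree {degree}: residual SS = {np.sum(residuals**2):.2f}")
```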

Comparing the models and R²

Hypothesis            Distance measured from   Sum of squares   Percentage of variation
Null                  Mean                     155,642          100%
Linear relationship   Straight line            63,361           40.7%
Difference            (Improvement)            92,281           59.3%

The total sum of squares of the distances of the points from the mean, i.e. the total variance, is 155,642
The total sum of squares of the residuals is 63,361
The difference between these is 92,280.93, which is 59.3% of the total variance
So the linear model results in an improvement which is 59.3% of the total variance: this is the definition of R²: R² = 0.593
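The R² calculation follows directly from these sums of squares; here is a small Python sketch using the slide's numbers:

```python
# R^2 from the sums of squares on the slide.
ss_total = 155_642      # SS of distances from the mean (null model)
ss_residual = 63_361    # SS of residuals around the regression line
ss_regression = ss_total - ss_residual   # improvement due to the line

r_squared = ss_regression / ss_total
print(f"R^2 = {r_squared:.3f}")   # 0.593: the line accounts for 59.3%
```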

Interpreting the difference in variance
With a little algebra, you can show that the difference between the total variance and the sum of the squares of the residuals is the sum of the squares of the distances between the regression line and the mean
So the regression line "accounts for 59.3% of the variance"

Computing a p-value for model comparison
To compute a p-value for the comparison of models, we look at both the sum of squares and the degrees of freedom for each model:
– The number of degrees of freedom is the number of data points minus the number of parameters in the model
– We had 13 data points, so there are 12 degrees of freedom for the null hypothesis model and 11 degrees of freedom for the linear model

Mean squares and F-ratio

Source of variation   Sum of squares   Degrees of freedom   Mean squares   F-ratio
Regression            92,281           1                    92,281         16.02
Random                63,361           11                   5,760
Total                 155,642          12

This is the same data presented in the format of an ANOVA (we will see this later)
– "Total" represents the total variation in the data
– "Random" is the variation of the data around the regression line
– "Regression" is the difference between them: the sum of squares of the distances from the regression line to the mean
The "mean squares" is the sum of squares divided by the degrees of freedom
The F-ratio is the ratio of the mean squares
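The mean squares and the F-ratio follow mechanically from the table; a small Python sketch using the slide's values:

```python
# Mean squares and F-ratio from the sums of squares and degrees of freedom.
ss_regression, df_regression = 92_281, 1
ss_residual, df_residual = 63_361, 11

ms_regression = ss_regression / df_regression   # 92,281
ms_residual = ss_residual / df_residual         # ~5,760
f_ratio = ms_regression / ms_residual
print(f"F = {f_ratio:.2f}")                     # ~16.0
```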

Computing a p-value
The null hypothesis is that the "horizontal line model" is the correct model
– i.e. the slope in the regression model is zero
If the null hypothesis were true, the F-ratio would be close to 1 (this is not obvious!)
The distribution of values of the F-ratio, assuming the null hypothesis, is a known distribution
– Called the F-distribution; it depends on two different degrees of freedom
– So a p-value can be computed
The p-value in this example is p ≈ 0.002
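A sketch of the p-value calculation, using scipy's implementation of the F-distribution with (1, 11) degrees of freedom:

```python
# p-value: probability of an F-ratio at least this large under the null.
from scipy import stats

f_ratio = 16.02
p_value = stats.f.sf(f_ratio, dfn=1, dfd=11)   # survival function = 1 - CDF
print(f"p = {p_value:.4f}")                    # ~0.002
```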

Recap
We re-examined the linear regression example and re-cast it as a comparison of statistical models
We can compute a p-value for the null hypothesis that the simpler model is "correct"
– i.e. that it is as correct as the more complex model
This is the same p-value we computed before
The R² value is the proportion of the variance "explained by" the regression
We can do the same for other statistical tests!

A t-test considered as a comparison of models
Recall the GRHL2 expression in Basal-A and Basal-B cancer cells
We can re-cast this comparison as a linear regression…
– Let x = 0 for Basal-A cells and x = 1 for Basal-B cells
Our linear model is
Expression = β_0 + β_1 × x + ε
with the null hypothesis
Expression = β_0 + ε
What is β_1?
– The slope = the increase in expression for an increase of one unit in x
– = the difference in expression between Basal-A and Basal-B cells
– = the difference in means…
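Here is a minimal Python sketch of this equivalence, using made-up expression values (the real GRHL2 data are not reproduced on the slide): the regression slope equals the difference in group means, and the regression p-value matches the equal-variance t-test:

```python
# Sketch: an equal-variance two-sample t-test is the same comparison as
# a regression on a 0/1 indicator variable. Data are hypothetical.
import numpy as np
from scipy import stats

basal_a = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4])   # hypothetical values
basal_b = np.array([3.2, 2.9, 3.8, 3.1, 3.5, 2.7])   # hypothetical values

# Classical two-sample t-test (assuming equal variances)
t_res = stats.ttest_ind(basal_a, basal_b)

# The same comparison as a regression: x = 0 for Basal A, 1 for Basal B
x = np.concatenate([np.zeros(basal_a.size), np.ones(basal_b.size)])
y = np.concatenate([basal_a, basal_b])
reg = stats.linregress(x, y)

print(f"difference in means: {basal_b.mean() - basal_a.mean():.3f}")
print(f"regression slope:    {reg.slope:.3f}")   # identical to the above
print(f"t-test p = {t_res.pvalue:.4g}, regression p = {reg.pvalue:.4g}")
```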

t-test as a comparison of models

Results of running the t-test as a comparison of models
Running the linear regression gives estimates of the intercept and the slope
The table of variances is:

Model        Sum of squares   DF   Mean squares   F-ratio
Regression                    1
Residual                      25
Total        33.753           26

(With 27 data points, the total has 26 degrees of freedom; the regression and residual rows have 1 and 25.)

Interpreting the table of variances
The total sum of squares (33.753) is the sum of squares of the differences between each value and the overall mean
– This, divided by its degrees of freedom (33.753/26 = 1.298), is the sample variance
The residual sum of squares is the sum of the squares of each expression value minus its predicted value
– The predicted value is just the mean for its basal type
– This is the "within-group" variance
The regression sum of squares is the sum of squares of the differences between the predicted values and the overall mean
– This is the sum of squares of the differences between the group means and the overall mean: one squared difference for each data point
These interpretations will be really useful to consider when we study ANOVA (see the sketch below)
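A short Python sketch of this decomposition for two groups (hypothetical data), verifying that the total sum of squares splits exactly into the between-group (regression) and within-group (residual) pieces:

```python
# Sketch: total SS = between-group (regression) SS + within-group
# (residual) SS, for two groups. Data are hypothetical.
import numpy as np

group_a = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4])
group_b = np.array([3.2, 2.9, 3.8, 3.1, 3.5, 2.7])
y = np.concatenate([group_a, group_b])

grand_mean = y.mean()
ss_total = np.sum((y - grand_mean) ** 2)

# Within-group (residual): each value minus its own group mean
ss_within = (np.sum((group_a - group_a.mean()) ** 2)
             + np.sum((group_b - group_b.mean()) ** 2))

# Between-group (regression): one squared difference per data point
ss_between = (group_a.size * (group_a.mean() - grand_mean) ** 2
              + group_b.size * (group_b.mean() - grand_mean) ** 2)

print(f"{ss_total:.3f} = {ss_between:.3f} + {ss_within:.3f}")
```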