Comparing k Populations


Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The F test – for comparing k means. Situation: We have k normal populations. Let μi and σi denote the mean and standard deviation of population i, i = 1, 2, 3, …, k. Note: we assume that the standard deviation is the same for each population: σ1 = σ2 = … = σk = σ.

We want to test H0: μ1 = μ2 = … = μk against HA: at least two of the means μi differ.

The ANOVA Table – a convenient method for displaying the calculations for the F-test.

ANOVA Table

Source    d.f.    Sum of Squares    Mean Square    F-ratio
Between   k – 1   SSBetween         MSBetween      MSB/MSW
Within    N – k   SSWithin          MSWithin
Total     N – 1   SSTotal

To compute F (and the ANOVA table entries), first compute:
1) the sample totals Ti = the sum of the observations in sample i
2) the grand total G = Σ Ti
3) ΣΣ x²ij, the sum of squares of all the observations
4) Σ T²i/ni
5) G²/N

Then
1) SSTotal = ΣΣ x²ij – G²/N
2) SSBetween = Σ T²i/ni – G²/N
3) SSWithin = SSTotal – SSBetween
4) F = MSBetween/MSWithin = [SSBetween/(k – 1)] / [SSWithin/(N – k)]
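These ANOVA computations can be sketched in Python. The data below are made up purely for illustration; scipy is used only for the F critical value:

```python
import numpy as np
from scipy.stats import f

# Hypothetical data: k = 3 samples (unequal group sizes are fine)
samples = [np.array([5.0, 7.0, 6.0, 8.0]),
           np.array([9.0, 11.0, 10.0]),
           np.array([4.0, 6.0, 5.0, 5.0, 7.0])]

k = len(samples)
N = sum(len(s) for s in samples)
T = [s.sum() for s in samples]          # sample totals T_i
G = sum(T)                              # grand total

ss_total = sum((s**2).sum() for s in samples) - G**2 / N
ss_between = sum(t**2 / len(s) for t, s in zip(T, samples)) - G**2 / N
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)       # d.f. = k - 1
ms_within = ss_within / (N - k)         # d.f. = N - k
F = ms_between / ms_within

# Reject H0 when F exceeds the upper critical value F_alpha(k-1, N-k)
F_crit = f.ppf(0.95, k - 1, N - k)
print(F > F_crit)
```

The hand formulas above (SSBetween via Σ T²i/ni – G²/N, etc.) give the same F-ratio as scipy's built-in `f_oneway`.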

The χ² test for independence

Situation: We have two categorical variables R and C. The number of categories of R is r; the number of categories of C is c. We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j (R = rows, C = columns).

Example: Both systolic blood pressure (C) and serum cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for blood pressure are: <126, 127–146, 147–166, 167+. The categories for cholesterol are: <200, 200–219, 220–259, 260+.

Table: two-way frequency

The χ² test for independence. Define Eij = expected frequency in the (i,j)th cell in the case of independence.

Justification for Eij = (RiCj)/n in the case of independence: Let pij = P[R = i, C = j]. In the case of independence, pij = P[R = i] P[C = j] = ri gj. The expected frequency in the (i,j)th cell is then n pij, which is estimated by n(Ri/n)(Cj/n) = (RiCj)/n, where Ri and Cj are the row and column totals.

Then to test H0: R and C are independent against HA: R and C are not independent, use the test statistic χ² = Σi Σj (xij – Eij)²/Eij, where Eij = expected frequency in the (i,j)th cell in the case of independence and xij = observed frequency in the (i,j)th cell.

Sampling distribution of the test statistic when H0 is true: the χ² distribution with degrees of freedom ν = (r – 1)(c – 1). Critical and acceptance region: reject H0 if χ² ≥ χ²α(ν); accept H0 if χ² < χ²α(ν).

Standardized residuals: rij = (xij – Eij)/√Eij. For the example, the degrees of freedom are ν = (r – 1)(c – 1) = 9, and the test statistic leads us to reject H0 using α = 0.05.
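The χ² test for independence can be sketched in Python. The 2×3 table below is hypothetical (the blood-pressure/cholesterol counts are not reproduced on these slides):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed counts x_ij (rows = R, columns = C)
x = np.array([[20, 30, 25],
              [30, 20, 25]], dtype=float)

n = x.sum()
R = x.sum(axis=1)                 # row totals R_i
C = x.sum(axis=0)                 # column totals C_j
E = np.outer(R, C) / n            # expected counts E_ij = R_i C_j / n

stat = ((x - E)**2 / E).sum()     # chi-square test statistic
df = (x.shape[0] - 1) * (x.shape[1] - 1)

std_resid = (x - E) / np.sqrt(E)  # standardized residuals

# Reject H0 at alpha = 0.05 when stat exceeds the critical value
crit = chi2.ppf(0.95, df)
print(stat > crit)
```

For these made-up counts the statistic does not exceed the critical value, so H0 would not be rejected; the 1237-subject example on the slides does reject H0.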

Linear Regression – Hypothesis testing and Estimation

The Least Squares Line – fitting the best straight line to “linear” data.

The equation for the least squares line is ŷ = a + bx. Let Sxy = Σ(xi – x̄)(yi – ȳ), Sxx = Σ(xi – x̄)², and Syy = Σ(yi – ȳ)².

Computing formulae: Sxx = Σx²i – (Σxi)²/n, Syy = Σy²i – (Σyi)²/n, Sxy = Σxiyi – (Σxi)(Σyi)/n.

Then the slope of the least squares line can be shown to be b = Sxy/Sxx,

and the intercept of the least squares line can be shown to be a = ȳ – b x̄.

The residual sum of squares: SSE = Σ(yi – ŷi)². Computing formula: SSE = Syy – S²xy/Sxx.

Estimating σ, the standard deviation in the regression model: s = √(SSE/(n – 2)). This estimate of σ is said to be based on n – 2 degrees of freedom.
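The computing formulae for the least squares line can be sketched in Python. The (x, y) values below are made up for illustration (they are not the fire-damage data):

```python
import numpy as np

# Hypothetical (x, y) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])
n = len(x)

# Computing formulae
Sxx = (x**2).sum() - x.sum()**2 / n
Syy = (y**2).sum() - y.sum()**2 / n
Sxy = (x * y).sum() - x.sum() * y.sum() / n

b = Sxy / Sxx                    # slope
a = y.mean() - b * x.mean()      # intercept

SSE = Syy - Sxy**2 / Sxx         # residual sum of squares
s = np.sqrt(SSE / (n - 2))       # estimate of sigma, based on n - 2 d.f.
```

The shortcut formulae agree with the mean-deviation definitions Σ(xi – x̄)², etc., and (b, a) match what `np.polyfit(x, y, 1)` returns.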

Sampling distributions of the estimators

The sampling distribution of the slope of the least squares line: it can be shown that b has a normal distribution with mean β and standard deviation σ/√Sxx.

The sampling distribution of the intercept of the least squares line: it can be shown that a has a normal distribution with mean α and standard deviation σ√(1/n + x̄²/Sxx).

(1 – α)100% confidence limits for the slope β: b ± tα/2 s/√Sxx, where tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.

(1 – α)100% confidence limits for the intercept α: a ± tα/2 s√(1/n + x̄²/Sxx), where tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.
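A minimal Python sketch of these confidence limits, again with made-up illustrative data (not the fire-damage example):

```python
import numpy as np
from scipy.stats import t

# Hypothetical (x, y) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])
n = len(x)

Sxx = (x**2).sum() - x.sum()**2 / n
Syy = (y**2).sum() - y.sum()**2 / n
Sxy = (x * y).sum() - x.sum() * y.sum() / n
b = Sxy / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt((Syy - Sxy**2 / Sxx) / (n - 2))   # estimate of sigma

alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, n - 2)          # t_{alpha/2}, n - 2 d.f.

# (1 - alpha)100% confidence limits for slope and intercept
slope_ci = (b - t_crit * s / np.sqrt(Sxx),
            b + t_crit * s / np.sqrt(Sxx))
intercept_ci = (a - t_crit * s * np.sqrt(1/n + x.mean()**2 / Sxx),
                a + t_crit * s * np.sqrt(1/n + x.mean()**2 / Sxx))
```

Note that the intercept interval uses the extra factor √(1/n + x̄²/Sxx), reflecting the larger standard deviation of a compared with b.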

Example: We are studying building fires in a city and are interested in the relationship between X = the distance between the building that sounds the alarm and the closest fire hall, and Y = the cost of the damage (in $1000s). The data were collected on n = 15 fires.

The Data

Scatter Plot

Computations

Computations Continued

Computations Continued

Computations Continued

Least Squares Line: ŷ = 4.92x + 10.28

95% confidence limits for the slope β: 4.07 to 5.77, using t.025 = 2.160, the critical value for the t-distribution with 13 degrees of freedom.

95% confidence limits for the intercept α: 7.21 to 13.35, using t.025 = 2.160, the critical value for the t-distribution with 13 degrees of freedom.
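The critical value quoted on these last two slides can be checked directly with scipy (the fire-damage data themselves are not reproduced here, so only the t value is verified):

```python
from scipy.stats import t

# t_{.025} for 13 degrees of freedom (n = 15 fires, so n - 2 = 13)
t_crit = t.ppf(1 - 0.025, 13)
print(round(t_crit, 3))   # 2.16
```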