INDEPENDENT VARIABLES AND CHI SQUARE

Independent versus Dependent Variables Given two variables X and Y, they are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. Formally, X and Y are independent if P(X | Y) = P(X) or, equivalently, P(Y | X) = P(Y). What does this mean in practice?
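
As a concrete illustration of the definition above, here is a minimal Python sketch (with a hypothetical joint probability table, not data from the slides) that checks P(X | Y) = P(X) cell by cell:

```python
# A minimal sketch with a hypothetical joint probability table (not data from
# the slides). The rows of `joint` are proportional to each other, so the
# check below confirms P(X | Y) = P(X) in every cell.

# joint[i][j] = P(X = x_i and Y = y_j)
joint = [[0.12, 0.28],   # x_1
         [0.18, 0.42]]   # x_2

p_x = [sum(row) for row in joint]            # marginal distribution of X
p_y = [sum(col) for col in zip(*joint)]      # marginal distribution of Y

for i, row in enumerate(joint):
    for j, p_xy in enumerate(row):
        p_x_given_y = p_xy / p_y[j]          # P(X = x_i | Y = y_j)
        print(f"P(x{i+1} | y{j+1}) = {p_x_given_y:.2f}   P(x{i+1}) = {p_x[i]:.2f}")
```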

Independent versus Dependent Variables Consider a contingency table whose rows are the categories x_1, …, x_h, …, x_p of X and whose columns are the categories y_1, …, y_k, …, y_q of Y. Each cell contains the joint frequency n_hk, the row totals are n_h0, the column totals are n_0k, and the grand total is n. We say that X is independent from Y if, for every h and k, n_hk = (n_h0 × n_0k) / n, that is, if every joint frequency equals the product of its row and column totals divided by the grand total.

Independent versus Dependent Variables: example 1 The following contingency table classifies an observed population (in millions) by gender (X) and health insurance coverage (Y). Are the two variables independent? That is, does health insurance coverage depend on gender? The rows of the table are Male, Female and Total; the columns are Covered by health insurance, Not covered by health insurance and Total.

Independent versus Dependent Variables From the table we verify condition 2, P(Y | X) = P(Y): the conditional distribution of health insurance coverage is the same for males and females and coincides with the marginal distribution. YES, the condition holds.

Independent versus Dependent Variables In the same way we verify condition 1, P(X | Y) = P(X): the conditional distribution of gender is the same among those covered and not covered by health insurance and coincides with the marginal distribution. YES, the condition holds, so the two variables are independent.

Independent versus Dependent Variables: example 2 Consider the example of the 420 employees. Is the variable Smoker (X) independent from the variable College Graduate (Y)? The rows of the contingency table are Smoker, Nonsmoker and Total; the columns are College Graduate, Not a College Graduate and Total.

Independent versus Dependent Variables: example 2 Checking the same conditions on this table, the conditional distributions do not coincide with the marginal distributions, so the condition for independence fails. No independence: the two variables are dependent.

Independent versus Dependent Variables Two variables are maximally dependent if the contingency table has at most one nonzero frequency in each row and in each column: every category of X occurs together with exactly one category of Y. There is a one-to-one relation between the categories of the two variables.

Chi square How can we measure the “degree” of dependence between two variables? Recall that two variables are independent if P(X | Y) = P(X) and P(Y | X) = P(Y). From these relations we get n_hk* = (n_h0 × n_0k) / n. The quantity n_hk* is called the theoretical or expected frequency (E), since it expresses the frequency of category h of X and category k of Y under the condition of independence.

Chi square The observed frequencies n_hk are indicated with (O). If the observed frequencies (O) are equal to the expected frequencies (E), the variables are independent. We can build an indicator of independence/dependence between the two variables called Chi square. The formula is Chi square = Σ over all cells of (O − E)² / E. It is evident that if Chi square is equal to 0 (O = E in every cell), the two variables are independent.
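
To make the two formulas concrete, here is a minimal Python sketch (the observed counts are hypothetical, not from the slides) that builds the expected frequencies E = (row total × column total) / n and then computes Chi square = Σ (O − E)² / E:

```python
# A minimal sketch, following the formulas above, that turns a table of
# observed frequencies O into expected frequencies E = (row total x column
# total) / n and then computes Chi square = sum of (O - E)^2 / E over all
# cells. The observed counts are hypothetical, not taken from the slides.

observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

print(expected)     # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square)   # 4.0 -> different from 0, so the sample shows some dependence
```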

Chi square: example 1 Violence and lack of discipline have become major problems in schools in the United States. A random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses is shown in the following table: Men (M): 93 In Favor (F), 70 Against (A), 12 No Opinion (N); Women (W): 87 In Favor, 32 Against, 6 No Opinion. Are the two variables gender and opinion independent?

Chi square: example 1 We add the Row Totals (Men: 175, Women: 125) and the Column Totals (In Favor: 180, Against: 102, No Opinion: 18), with grand total 300. In order to compute the chi square we have to compute the expected frequencies as follows: E = (Row Total × Column Total) / Grand Total.

Chi square: example 1 Observed frequencies O with expected frequencies E in parentheses: Men (M): 93 (105.00) In Favor, 70 (59.50) Against, 12 (10.50) No Opinion, Row Total 175; Women (W): 87 (75.00) In Favor, 32 (42.50) Against, 6 (7.50) No Opinion, Row Total 125; Column Totals: 180, 102, 18; Grand Total 300. For example, the expected frequency for Men / In Favor is (175 × 180) / 300 = 105.00.

Chi square: example 1 The value of the chi square is 8.252. It is different from 0, and hence, at first sight, we would conclude that the two variables are dependent.
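
The hand computation above can be cross-checked with standard software. The sketch below assumes the scipy library is available; chi2_contingency recomputes the expected frequencies and the chi-square statistic from the observed counts alone:

```python
# Cross-check of example 1, assuming the scipy library is available.
# chi2_contingency recomputes E and the chi-square statistic from O alone.
from scipy.stats import chi2_contingency

observed = [[93, 70, 12],   # Men:   In Favor, Against, No Opinion
            [87, 32,  6]]   # Women: In Favor, Against, No Opinion

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2_stat, 3))   # about 8.253, matching the 8.252 quoted on the slide
print(expected)              # 105.0, 59.5, 10.5 / 75.0, 42.5, 7.5 as in the table
print(dof)                   # (2 - 1) * (3 - 1) = 2 degrees of freedom
```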

Chi square: critical value However, it can happen that even if the chi square is different from 0, its value is sufficiently small to think that there is independence between the variables of interest. But which value of the chi square can be considered a critical value, so that values under it indicate independence and values over it indicate dependence between the two variables? There is no fixed critical value: it is determined case by case, depending on the data we are examining, by using the methods and principles of statistical inference.

Chi square: critical value We do not deal with the computation of the critical value here; however, it is computed by any statistical software, including Excel. Rule: 1. If the critical value > chi square, the two variables can be considered independent. 2. If the critical value < chi square, the two variables can be considered dependent, in the sense that they influence each other. In the previous example the critical value is greater than the value of the chi square (8.252), so we can say that the two variables are independent, that is, the opinion of the selected people is not influenced by gender.
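
For reference, this is how the software mentioned above obtains the critical value: it is a quantile of the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom at a chosen significance level. The sketch below assumes scipy and uses an illustrative significance level of 0.01, which is not necessarily the level used on the slide:

```python
# How software obtains the critical value: a quantile of the chi-square
# distribution with (rows - 1) * (columns - 1) degrees of freedom at a chosen
# significance level alpha. Assumes scipy; alpha = 0.01 is an illustrative
# choice, not necessarily the level used on the slide.
from scipy.stats import chi2

alpha = 0.01                      # significance level (illustrative)
dof = (2 - 1) * (3 - 1)           # 2 x 3 table from example 1
critical_value = chi2.ppf(1 - alpha, dof)
print(round(critical_value, 2))   # 9.21

# Rule from the slide: compare the critical value with the chi-square statistic.
chi_square = 8.252
print("independent" if chi_square < critical_value else "dependent")
```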

Chi square: example 2 A researcher wanted to study the relationship between gender and owning a cell phone. She took a sample of 2000 adults and obtained the information given in the following table: Men: 640 own a cell phone, 450 do not; Women: 440 own a cell phone, 470 do not. Looking at the table, can we conclude that gender and owning a cell phone are related for all adults?

Chi square: example 2 We have to compute the expected frequencies. Observed frequencies O with expected frequencies E in parentheses: Men (M): 640 (588.60) own, 450 (501.40) do not own, Row Total 1090; Women (W): 440 (491.40) own, 470 (418.60) do not own, Row Total 910; Column Totals: 1080 and 920; Grand Total 2000.

Chi square: example 2 The chi square computed from the table is about 21.45. The critical value is less than the chi square, and hence we can conclude that the two variables are dependent, that is, owning a cell phone depends on gender.
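
The same scipy cross-check (again assuming scipy is available) for the cell-phone table; correction=False reproduces the plain chi-square formula used above rather than the continuity-corrected version scipy applies to 2×2 tables by default:

```python
# The same scipy cross-check for the cell-phone table. correction=False
# reproduces the plain chi-square formula used above (scipy otherwise applies
# the Yates continuity correction to 2 x 2 tables).
from scipy.stats import chi2_contingency

observed = [[640, 450],   # Men:   own, do not own
            [440, 470]]   # Women: own, do not own

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2_stat, 2))   # about 21.45
print(expected)              # 588.6, 501.4 / 491.4, 418.6 as in the table
```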

LINEAR REGRESSION

LINEAR REGRESSION So far we have investigated the relation of independence/dependence between two variables (qualitative or quantitative). However, this kind of relation is reciprocal, in the sense that we do not know whether one variable influences the other or vice versa, nor how strong the relation is. If we want to know whether one variable influences the other and how strong this influence is, we turn to linear regression. By using regression analysis we can evaluate the magnitude of the change in one variable due to a certain change in another variable, and we can predict the value of one variable for a given value of the other variable. (Linear) regression is a statistical analysis that evaluates whether a linear relationship exists between two quantitative variables, X and Y.

SIMPLE LINEAR REGRESSION Definition. A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.

SIMPLE LINEAR REGRESSION Why is it called a “regression model” or “regression analysis”? The method was first used to examine the relationship between the heights of fathers and sons. The two were related, of course, but it was found that a tall father tended to have sons shorter than himself, while a short father tended to have sons taller than himself: the height of sons regressed toward the mean. The term “regression” is now used for many sorts of curve fitting. A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.

LINEAR REGRESSION: example 1 We want to investigate the relation between Income (in hundreds of dollars) (X) and Food Expenditure (Y) for seven households. That is, we want to investigate whether Income influences a household’s decision about Food Expenditure and how strong this influence is. The data table has one column for Income (X) and one for Food Expenditure (Y), with one row per household.

LINEAR REGRESSION: example 1 We can represent the data with a scatter plot. A scatter plot is a plot of the values of Y versus the corresponding values of X. [Figure: scatter plot of Food expenditure versus Income, with the first and seventh households labelled.]

LINEAR REGRESSION: example 1 The scatter plot seems to reveal a linear relationship between the two variables: a linear regression model might be appropriate. In the figure the points (observations) are interpolated by a linear model (a) and a nonlinear model (b). [Figure: (a) linear and (b) nonlinear curves fitted to Food Expenditure versus Income.]

LINEAR REGRESSION: the equation How can we write the linear model mathematically? y = a + b x, where y is the dependent variable, x is the independent variable, a is the constant term or y-intercept, and b is the slope.
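
A minimal Python sketch of the equation, with hypothetical values of a and b, showing how the intercept and the slope act:

```python
# A minimal sketch of y = a + b * x with hypothetical coefficients:
# a is the value of y when x = 0, b is the change in y per unit change in x.
a = 1.5    # intercept (hypothetical)
b = 0.25   # slope (hypothetical)

def predict(x):
    """Value of the dependent variable y predicted by the line."""
    return a + b * x

print(predict(0))                  # 1.5 -> the intercept
print(predict(10))                 # 4.0
print(predict(11) - predict(10))   # 0.25 -> b, the change in y per unit of x
```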

LINEAR REGRESSION: intercept a How can we represent a graphically? The intercept is the Y value of the line when X equals zero. The intercept determines the position of the line on the Y axis. [Figure: several lines with different intercepts a1, a2, a3, a4 crossing the Y axis at different heights.]

LINEAR REGRESSION: slope b How can we represent b graphically? The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. If the slope is positive, Y increases as X increases. If the slope is negative, Y decreases as X increases. [Figure: three panels showing a line with b > 0, a line with b < 0, and two lines with slopes b1 and b2 where b2 > b1.]

LINEAR REGRESSION Coming back to the example, among all the possible lines that can interpolate the points in the scatter plot, which is the “best”? Choosing the best line (the line that best describes the relation between X and Y) means finding the “best” a and the “best” b. [Figure: scatter plot of Food expenditure versus Income with several candidate lines.]
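
As a preview of how the “best” a and b are found in practice, here is a minimal Python sketch using ordinary least squares (numpy is assumed; the income and food-expenditure values below are hypothetical, not the slide’s data):

```python
# A minimal sketch of how the "best" a and b are found: ordinary least squares
# chooses the line minimizing the sum of squared vertical distances from the
# points to the line. Assumes numpy; the income and food-expenditure values
# are hypothetical, not the data table from the slides.
import numpy as np

income = np.array([35, 49, 21, 39, 15, 28, 25])   # hypothetical, hundreds of dollars
food = np.array([9, 15, 7, 11, 5, 8, 9])          # hypothetical, hundreds of dollars

b, a = np.polyfit(income, food, deg=1)   # degree-1 fit returns (slope, intercept)
print(round(a, 3), round(b, 3))          # estimated intercept and slope

# Predicted food expenditure for an income of 40 (hundred dollars):
print(round(a + b * 40, 3))
```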