Summarizing One or Two Categorical Variables & Relationships Between Categorical Variables Presentation 2.



Types of Variables
Categorical – Possible values define groups or categories, not necessarily in an apparent ordering.
    Ex. Color of M&M's; gender; Stat 200 section
Ordinal – Categorical variable whose values or categories have a natural ordering.
    Ex. Rating a roller coaster on a scale of 1-5 (1 is terrible and 5 is excellent); age groups (child, teen, adult, senior citizen); shirt sizes (S, M, L, XL)
Quantitative – Measurements or counts, recorded as numerical values.
    Ex. Height; temperature; # of red M&M's

Possible Roles Played by Variables:
Response Variables – the variables whose outcome we want to determine. These are the variables of main interest.
Explanatory Variables – the variables that partially explain the value of the response variable for an individual.

For each of the following, identify the response and the explanatory variables as well as the variable type:
1. Is there a relationship between a person's gender and their favorite kind of music?
   Response:            Explanatory:
2. Do men and women listen to the same number of hours of music?
   Response:            Explanatory:
3. Does a person's hometown influence the amount they would pay for a single CD?
   Response:            Explanatory:
4. Do people who play musical instruments rate the types of music the same?
   Response:            Explanatory:
5. Do people who have a CD burner prefer to buy or burn their CDs?
   Response:            Explanatory:

Summarizing Categorical Variables:
For one variable:
1. Numerical Summaries: counts and percents
2. Graphical Summaries: Pie Chart or Bar Graph
For two variables:
1. Numerical Summaries: 2-way tables with counts and row percents. The explanatory variable should be the row variable (first variable entered in Minitab) and the response variable should be the column variable (second variable entered in Minitab).
2. Graphical Summaries: Bar Graph
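The slides produce two-way tables in Minitab. As a rough illustration of what "counts and row percents" means, here is a minimal Python sketch. The records are hypothetical (invented for this example, not class data), and the function name `row_percents` is an assumption of this sketch. The explanatory variable is the first item of each pair (the row) and the response is the second (the column).

```python
from collections import Counter

# Hypothetical (explanatory, response) pairs, invented for illustration only.
records = [
    ("male", "rock"), ("male", "rock"), ("male", "country"), ("male", "rap"),
    ("female", "rock"), ("female", "country"), ("female", "country"),
    ("female", "country"),
]

def row_percents(records):
    """Cross-tabulate the pairs and express each cell as a percent of its row total."""
    cell_counts = Counter(records)                    # count per (row, column) cell
    row_totals = Counter(row for row, _ in records)   # count per row category
    return {
        (row, col): 100 * n / row_totals[row]
        for (row, col), n in cell_counts.items()
    }

table = row_percents(records)
for (row, col), pct in sorted(table.items()):
    print(f"{row:<8}{col:<9}{pct:5.1f}%")
```

Row percents are used because each row is one group of the explanatory variable, so the percents across a row show how the response is distributed within that group.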

Example for One Categorical Variable:
Where do Penn State alumni live? The PSU Alumni Association would like to obtain the answer to this question from all PSU alumni. They can't ask all alumni, so they take a random sample of 50 alumni from the directory. They determined the state of residence from the address. Here are the results:

State    Frequency
PA       25
NJ       10
MD       5
VA       5
OH       2
NY       2
OTHER    1
TOTAL    n=50

What do these descriptive statistics tell us?
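To turn the alumni frequencies into the percents that the one-variable summary calls for, a short Python sketch follows. The counts come from the table above; the function name `percents` is only an illustrative choice, and the course itself uses Minitab for this step.

```python
# Frequencies from the alumni example (n = 50).
counts = {"PA": 25, "NJ": 10, "MD": 5, "VA": 5, "OH": 2, "NY": 2, "OTHER": 1}

def percents(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

alumni_percents = percents(counts)
for state, pct in alumni_percents.items():
    print(f"{state:<7}{counts[state]:>3}{pct:>7.1f}%")
```

In this sample, half of the alumni (25 of 50) live in Pennsylvania.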

Example for Two Categorical Variables:
Do most college students have a credit card? A study would like to determine if the percentage of students that have at least one credit card differs based on year in school. Four different samples (Fr, So, Jr, Sr), each having 100 PSU students, were obtained. Each student was asked one question: "Do you currently have at least one credit card?"
Identify the response and the explanatory variable in this case:
Response:            Explanatory:

                Yes    No    Row total
Freshman
Sophomore
Junior
Senior
Column Total

What do these descriptive statistics tell us?

Assessing the Statistical Significance of the Relationship between Two Categorical Variables
Suppose we ask 15 randomly picked students 2 questions:
1. Do you smoke?
2. Did you have a beer last night?
We summarize the results using the Cross Tabulation function in Minitab:

Tabulated Statistics: smoke, beer
Rows: smoke   Columns: beer

        n    y    All
n
y
All

Cell Contents: Count
               % of Row

Inference about the Population!
How can we tell if there's a relationship between being a smoker and drinking beer last night?
Does the relationship seen in the sample data hold in the population represented by this sample?
Techniques used to make generalizations about the population using a sample are known as inferential statistics.
A statistically significant relationship is one that is large enough to be unlikely to have occurred in the observed sample if there is no relationship in the population.

Null and Alternative Hypotheses
Another way to express our objective is that we are deciding between two possible hypotheses about the population:
Null Hypothesis: The two variables are not related.
Alternative Hypothesis: The two variables are related.
In our example we have:
Null Hypothesis: Being a smoker and drinking beer last night are not related.
Alternative Hypothesis: Being a smoker and drinking beer last night are related.

Chi-square Statistic
We usually use the chi-square statistic to handle this type of question.
The chi-square statistic measures the strength of the evidence for an association between 2 categorical variables. A large chi-square statistic indicates there is a statistically significant relationship between the 2 variables.
How does the chi-square statistic work? It measures the difference between the observed counts and the counts that would be expected if there were no relationship (under the null hypothesis).
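The slides do not spell out the calculation, so the following Python sketch uses the standard Pearson form as an assumption: the expected count for a cell is (row total × column total) / n, and the statistic is the sum over all cells of (observed − expected)² / expected. The counts in `observed` are hypothetical, not the class data, and the function name is illustrative.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a two-way table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    return statistic

# Hypothetical counts (not the class data): rows = smoke n/y, columns = beer n/y.
observed = [[30, 20],
            [10, 40]]
chi2 = chi_square_statistic(observed)
print(round(chi2, 2))
```

When the observed counts match the expected counts exactly, every term is zero and so is the statistic; the further the table departs from "no relationship", the larger the statistic becomes.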

Chi-Square Statistic and p-value
A large chi-square statistic indicates there is a statistically significant relationship between the 2 variables. However, how large is large?
This is why we need to use the "p-value" as an indicator to tell us if the chi-square statistic is "large enough".
We can obtain the p-value in our Minitab output.
How to use the p-value?
1. The bigger the chi-square statistic is, the smaller the p-value will be.
2. Generally, when the p-value is less than 0.05 (5%), we conclude that the observed relationship is unlikely to have occurred by chance alone, and it is statistically significant.
3. Generally, when the p-value is larger than 0.05 (5%), we say the observed relationship could have occurred just by chance. Therefore, we cannot reject the null hypothesis that there is no relationship.
Example: Part 3 of the activity….
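In the course the p-value is read from Minitab output. For a rough sense of how it relates to the statistic, here is a minimal Python sketch that assumes a 2 x 2 table, so the chi-square distribution has 1 degree of freedom and the upper-tail probability can be written with the complementary error function. The function names and the default cutoff argument are illustrative assumptions; the 0.05 threshold is the one given above.

```python
import math

def p_value_2x2(chi_square):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

def is_significant(chi_square, alpha=0.05):
    """Apply the rule of thumb: significant when the p-value is below alpha."""
    return p_value_2x2(chi_square) < alpha

# A larger statistic gives a smaller p-value.
for statistic in (1.0, 3.841, 10.0):
    print(statistic, round(p_value_2x2(statistic), 4), is_significant(statistic))
```

For a 2 x 2 table, a statistic of about 3.84 corresponds to a p-value of about 0.05, which puts a number on "how large is large" at the 5% level. Tables with more rows or columns have more degrees of freedom and need a different calculation.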