Introduction to Categorical Data Analysis

Introduction to Categorical Data Analysis KENNESAW STATE UNIVERSITY STAT 8310

Introduction The ‘General Linear Model’ (AKA Normal Theory Methods) includes Linear Regression Analysis and The Analysis of Variance. These methods are appropriate for analyzing data with a quantitative (or continuous) response variable and quantitative and/or categorical explanatory variables.

Example of a Typical Regression EXAMPLE: Predicting blood pressure (measured in mmHg) from cholesterol level (measured in mg/dL) and smoking status (smoker, non-smoker). mmHg = millimeters of mercury; mg/dL = milligrams of cholesterol per deciliter.

Introduction Categorical Data Analysis (CDA) involves the analysis of data with a categorical response variable. Explanatory variables can be either categorical or quantitative.

Example of CDA EXAMPLE: Predicting the presence of heart disease (yes, no) from Cholesterol level (measured in mg/dL) & smoking status (smoker, non-smoker)

Quantitative Variables A quantitative variable measures the quantity or magnitude of a characteristic or trait possessed by an experimental unit, has well-defined units of measurement, and often answers the question ‘how much?’. It is sometimes referred to as a continuous variable.

Quantitative Variables What are some examples of quantitative explanatory variables? What are some examples of quantitative response variables?

Categorical Variables A categorical variable has a measurement scale consisting of a set of categories and places or identifies experimental units as belonging to a particular group or category. It is sometimes referred to as a qualitative or discrete variable.

Categorical Variables What are some examples of categorical explanatory variables? What are some examples of categorical response variables?

Types of Categorical Variables Dichotomous (AKA Binary): categorical variables with only 2 possible outcomes. EXAMPLE: Smoker (yes, no). Polychotomous or Polytomous: categorical variables with more than 2 possible outcomes. EXAMPLE: Race (Caucasian, African American, Hispanic, Other).

Another Dimension of Polytomous Categorical Variables Nominal: variables that merely place experimental units into unordered groups or categories. EXAMPLE: Favorite Music (classical, rock, jazz, opera, folk)

Another Dimension of Polytomous Categorical Variables Ordinal: categorical variables whose values exhibit a natural ordering. EXAMPLE: Prognosis (poor, fair, good, excellent)

Types of Variables

Summarizing Categorical Variables In CDA it is often possible to fully analyze data using a summary of the data (the raw data are frequently not needed!). Therefore, in CDA we distinguish between raw data and grouped data.

Summarizing Categorical Variables A natural way to summarize categorical variables is raw counts or frequencies. A frequency table summarizes the raw counts of 1 categorical variable. A contingency table summarizes the raw counts of 2 or more categorical variables.

Summarizing Categorical Variables Along with frequencies, we also often summarize categorical variables with proportions and percentages.

Summarizing Categorical Variables Example of some raw data: What kind of variable is Final Exam Grade?

Summarizing Categorical Variables Example of a frequency table for these data is:

Summarizing Categorical Variables 2 Example of some raw data:

Summarizing Categorical Variables 2 Example of a contingency table for these data is:

Summarizing Categorical Variables 2 Traditionally, when summarizing explanatory & response variables in a contingency table, the explanatory variables are expressed in rows, and the response variables in columns.

Summarizing Categorical Variables Graphical means for summarizing categorical variables include pie charts and bar charts.

Probability Distributions In typical linear regression, we assume that the response variable is normally distributed (for given values of the explanatory variables) and therefore rely on the normal distribution during hypothesis testing.

Probability Distributions In CDA, we use: the Binomial Distribution for dichotomous variables; the Multinomial Distribution for polytomous variables; and the Poisson Distribution for count data.

The Binomial Distribution Appropriate when there are n independent and identical trials, each with 2 possible outcomes (generically named “success” and “failure”) and the same probability of success π.

The Binomial PMF PMF = Probability Mass Function. For Y ~ Bin(n, π), it gives the probability of each outcome y: $P(Y = y) = \binom{n}{y}\pi^{y}(1-\pi)^{n-y}$, for $y = 0, 1, \dots, n$.

A Review of Combinations and Factorials nCy = n! / [y!(n − y)!] is the Binomial Coefficient; it counts the total number of ways one could obtain y successes in n trials.

A Review of Combinations and Factorials Factorials: n! is the product of all positive integers less than or equal to n, with 0! = 1 and 1! = 1. Example: 4! = 4 × 3 × 2 × 1 = 24.
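The factorial, binomial coefficient, and binomial PMF definitions above translate directly into a few lines of Python. This is a minimal sketch using only the standard library; the helper names `n_choose_y` and `binomial_pmf` are hypothetical, and `math.comb` is printed only to cross-check the factorial-based coefficient.

```python
from math import comb, factorial

def n_choose_y(n: int, y: int) -> int:
    """Binomial coefficient nCy = n! / (y! (n - y)!), built from factorials."""
    if not 0 <= y <= n:
        return 0
    return factorial(n) // (factorial(y) * factorial(n - y))

def binomial_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for Y ~ Bin(n, pi)."""
    return n_choose_y(n, y) * pi**y * (1 - pi) ** (n - y)

print(factorial(0), factorial(1), factorial(4))  # 1 1 24
print(n_choose_y(4, 2), comb(4, 2))              # 6 6
```

Summing `binomial_pmf(y, n, pi)` over y = 0, …, n should give 1 (up to floating-point error), which is a quick sanity check on any PMF implementation.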

Example Problem A coin is tossed 10 times. Let Y = the number of heads. Use statistical notation to specify the distribution of Y. Find the mean [E(Y)] and standard deviation [σ(Y)] of Y. What is P(Y = 8)?
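One way to check answers to the coin problem, assuming the coin is fair (π = 0.5, which the slide does not state explicitly), is to use E(Y) = nπ and σ(Y) = √(nπ(1 − π)) together with the binomial PMF. A short Python sketch:

```python
from math import comb, sqrt

n, pi = 10, 0.5                 # Y ~ Bin(10, 0.5), assuming a fair coin
mean = n * pi                   # E(Y) = n * pi
sd = sqrt(n * pi * (1 - pi))    # sigma(Y) = sqrt(n * pi * (1 - pi))
p8 = comb(n, 8) * pi**8 * (1 - pi) ** (n - 8)   # P(Y = 8) = 45 / 1024

print(mean, round(sd, 4), round(p8, 4))  # 5.0 1.5811 0.0439
```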

The Multinomial Distribution Used for modeling the distribution of polytomous variables
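The multinomial PMF is not written out on this slide. For reference, the standard form for n independent trials, each falling into one of c categories with probabilities π1, …, πc, is shown below; with c = 2 it reduces to the binomial PMF.

```latex
P(n_1, n_2, \dots, n_c) = \frac{n!}{n_1!\, n_2! \cdots n_c!}\,
  \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c},
\qquad \sum_{j=1}^{c} n_j = n, \quad \sum_{j=1}^{c} \pi_j = 1
```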

Example Problem Researchers categorize the outcomes from a particular cancer treatment into 3 groups (no effect, improvement, remission). Suppose (π1, π2, π3) = (.20, .70, .10). Show all possible outcomes if n = 2. Find the multinomial probability that (n1, n2, n3) = (2, 6, 1), i.e., with n = 9.
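Both parts of the treatment-outcome problem can be sketched in Python: enumerating every (n1, n2, n3) that sums to 2, then evaluating the multinomial PMF at (2, 6, 1). The helper name `multinomial_pmf` is hypothetical; the probabilities are the ones given on the slide.

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n! / (n1! ... nc!) * pi1^n1 * ... * pic^nc."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)   # exact integer division at every step
    return coef * prod(p**k for p, k in zip(probs, counts))

probs = (0.20, 0.70, 0.10)      # (no effect, improvement, remission)

# All possible outcomes (n1, n2, n3) when n = 2
outcomes = [c for c in product(range(3), repeat=3) if sum(c) == 2]
for c in outcomes:
    print(c, round(multinomial_pmf(c, probs), 4))

# Probability of (2, 6, 1): 252 * 0.2^2 * 0.7^6 * 0.1
print(round(multinomial_pmf((2, 6, 1), probs), 4))  # 0.1186
```

There are 6 possible outcomes when n = 2, and their probabilities sum to 1, as they must.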

Overview of CDA Methods Contingency Table Analysis; Logistic Regression (AKA Logit Models); Multicategory Logit Models; Loglinear Models

Contingency Table Analysis The historical method for analyzing categorical data. Involves constructing an n-way contingency table (where n = the number of categorical variables).

Contingency Table Analysis We use contingency table analysis to: identify the presence of an association (the hypothesis test of independence); and measure or gauge the strength of an association.
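The hypothesis test of independence is commonly carried out with Pearson's chi-square statistic, X² = Σ (observed − expected)² / expected, where expected = (row total × column total) / n under independence. The sketch below assumes that test (the slide does not name a specific one) and uses a small hypothetical 2 × 2 table of made-up counts in the spirit of the heart-disease example; it is for illustration only.

```python
import math

def chi_square_independence(table):
    """Pearson X^2 statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# HYPOTHETICAL counts (rows: smoker yes/no; columns: heart disease yes/no)
observed = [[30, 70],
            [20, 130]]
x2, df = chi_square_independence(observed)
print(round(x2, 3), df)  # 10.417 1

# With df = 1, X^2 is the square of a standard normal, so P(X^2 >= x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(x2 / 2)) if df == 1 else None
print(p_value)
```

Libraries may differ slightly: for example, SciPy's `scipy.stats.chi2_contingency` applies the Yates continuity correction to 2 × 2 tables by default, so its statistic will be smaller than this uncorrected one.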

Logistic Regression (AKA Logit Models) We use Logit Models to analyze data with a dichotomous response variable and one or more categorical and/or continuous explanatory variables.
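A logit model links the probability of "success" π to the explanatory variables through logit(π) = log[π / (1 − π)] = α + β1x1 + β2x2. The sketch below only illustrates that link for the heart-disease example; the coefficients `alpha`, `beta_chol`, and `beta_smoke` are HYPOTHETICAL values chosen for illustration, not estimates from any data, and no model fitting is shown.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

def inverse_logit(eta):
    """Maps a linear predictor back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# HYPOTHETICAL coefficients, not estimated from any data
alpha, beta_chol, beta_smoke = -6.0, 0.02, 0.7

def prob_heart_disease(cholesterol, smoker):
    eta = alpha + beta_chol * cholesterol + beta_smoke * smoker  # smoker: 1 = yes, 0 = no
    return inverse_logit(eta)

p1, p0 = prob_heart_disease(200, 1), prob_heart_disease(200, 0)
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(round(odds_ratio, 4), round(math.exp(beta_smoke), 4))  # both ≈ 2.0138
```

The last line shows why logit models connect to odds ratios: holding cholesterol fixed, exp(β) for smoking status is the odds ratio comparing smokers with non-smokers.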

Multicategory Logit Models We use Multicategory Logit Models to analyze data with a polytomous response variable and one or more categorical and/or continuous explanatory variables.

Loglinear Models We use Loglinear Models to analyze data with a polytomous response variable, OR with multiple response variables where the distinction between explanatory and response variables is not clear and 1 or more of those variables is polytomous. Loglinear models are often associated with the analysis of count data.

Review of 1 Proportion Hypothesis Tests MOTIVATING EXAMPLE: National data in the 1960s showed that about 44% of the adult population had never smoked cigarettes. In 1995, a national health survey interviewed a random sample of 881 adults and found that 414 had never been smokers. Has the percentage of adults who never smoked increased?

Review of 1 Proportion Hypothesis Tests STEPS: (1) Gather information; (2) Check assumptions; (3) Compute the test statistic (Tn) and obtain the p-value; (4) Make conclusions.

Review of 1 Proportion Hypothesis Tests ANSWER: There is sufficient statistical evidence to reject the null hypothesis and conclude that the proportion of adults who have never smoked has increased; z = 1.789, p = .036.
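The computation behind this answer can be sketched in Python with the standard library. It assumes H0: π = .44 versus the one-sided Ha: π > .44, uses the null standard error √(π0(1 − π0)/n), and applies n·π0 ≥ 10 and n(1 − π0) ≥ 10 as one common rule of thumb for the normal approximation (the course's exact assumption check may differ). It should reproduce z ≈ 1.789; the p-value it prints is about .037, which the slide reports as .036.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 881, 414, 0.44
p_hat = successes / n

# Assumption check (a common rule of thumb, assumed here)
assert n * p0 >= 10 and n * (1 - p0) >= 10

se0 = sqrt(p0 * (1 - p0) / n)       # standard error computed under H0
z = (p_hat - p0) / se0              # test statistic
p_value = 1 - NormalDist().cdf(z)   # one-sided: Ha is pi > 0.44
print(round(z, 3), round(p_value, 4))
```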

Review of Confidence Intervals for Proportions MOTIVATING EXAMPLE: Construct a 99% confidence interval for the true proportion of adults who have never smoked, based on this sample data.

Review of Confidence Intervals for Proportions ANSWER: We are 99% confident that the interval from .427 to .513 contains the true proportion of adults who have never smoked.
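The interval above matches the standard large-sample (Wald) interval p̂ ± z*·√(p̂(1 − p̂)/n). Assuming that is the intended method, a short Python sketch reproduces the endpoints; note that the standard error here uses p̂, not the hypothesized .44.

```python
from math import sqrt
from statistics import NormalDist

n, successes, conf = 881, 414, 0.99
p_hat = successes / n
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.576 for 99%
se = sqrt(p_hat * (1 - p_hat) / n)                  # SE based on p_hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(round(lower, 3), round(upper, 3))  # 0.427 0.513
```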


Class Activity 1 Go to the course website at: http://www.science.kennesaw.edu/~dyanosky/stat8310.html Navigate to the ‘Class Activities’ Page. Complete CA.1

Solutions to Class Activity 1 (#1) We reject the null hypothesis at the α = .05 level and conclude that the percentage of non-compliant vehicles has increased; z = 2.38, p = .009. We are 90% confident that the interval from .147 to .235 contains the true proportion of non-compliant vehicles.

Solutions to Class Activity 1 (#2) We fail to reject the null hypothesis at the α = .01 level. There is insufficient evidence to conclude that the population proportion of smokers has changed; z = -1.78, p = .075. We are 95% confident that the interval from .497 to .563 contains the true proportion of adults who currently smoke.
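The two solutions use different alternatives: #1 asks whether the percentage "has increased" (one-sided), while #2 asks whether the proportion "has changed" (two-sided). Since the activity data are not reproduced here, this sketch only recovers the reported p-values from the reported z statistics, which shows how the choice of alternative changes the calculation.

```python
from statistics import NormalDist

Z = NormalDist()                        # standard normal distribution
p_one_sided = 1 - Z.cdf(2.38)           # #1: Ha "has increased" (upper tail)
p_two_sided = 2 * Z.cdf(-abs(-1.78))    # #2: Ha "has changed" (both tails)
print(round(p_one_sided, 3), round(p_two_sided, 3))  # 0.009 0.075
```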