Data analysis: cross-tabulation GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 11.

Objectives
To introduce cross-tabulation as a method of investigating the relationship between two categorical variables
To describe the SPSS facilities for cross-tabulation
To discuss a range of simple statistics to describe the relationship between two categorical variables
To reinforce the range of SPSS skills learnt to date

Bivariate analysis
The relationship between two variables
A two-way table:
– Rows: categories of one variable
– Columns: categories of the second variable

Gender (frequency table)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Male, Female, Total); Missing (System: 6 cases, 0.4%); Total

Mode of ingestion Drug 1 (frequency table)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Swallow, Smoke, Snort, Inject, Total); Missing (System: 13 cases, 0.8%); Total
Annotation: out-of-range values (note that none of the digits is greater than 5)

Cleaning Mode1
Save a copy of the original
Recode the out-of-range values into a new value (for example, 12, 15, 23, 24, 25, 34 and 234 into the value 8)
Set the new value as a user-defined missing value (for example, 8 is declared a missing value and given the label “Out-of-range”)
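The cleaning steps above can be sketched outside SPSS as follows. This is only an illustration in plain Python: the list of out-of-range values and the code 8 come from the slide, while the sample data, the function names and the use of None for system-missing values are assumptions made for the example.

```python
# Out-of-range values and the replacement code are taken from the slide.
OUT_OF_RANGE_VALUES = {12, 15, 23, 24, 25, 34, 234}
OUT_OF_RANGE_CODE = 8
# User-defined missing values and their labels.
USER_MISSING = {OUT_OF_RANGE_CODE: "Out-of-range"}


def recode_mode1(raw_values):
    """Return a recoded copy; the original list is left untouched."""
    return [OUT_OF_RANGE_CODE if v in OUT_OF_RANGE_VALUES else v
            for v in raw_values]


def valid_values(values):
    """Drop system-missing (None) and user-defined missing values."""
    return [v for v in values if v is not None and v not in USER_MISSING]


# Made-up sample of raw Mode1 codes (None stands for system-missing).
raw_mode1 = [1, 2, 12, 4, None, 234, 3, 25, 1]
mode1_clean = recode_mode1(raw_mode1)

print(mode1_clean)                # recoded copy
print(valid_values(mode1_clean))  # values that enter the analysis
```

Keeping the raw list unchanged mirrors the first step on the slide: the recode always works on a copy.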

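Mode of ingestion Drug 1 (frequency table after cleaning)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Swallow, Smoke, Snort, Inject, Total); Missing (Out-of-range: 38 cases, 2.4%; System: 13 cases, 0.8%; Total: 51 cases, 3.2%); Total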

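Mode of ingestion Drug1 * Gender cross-tabulation (Count)
Rows: Mode of ingestion Drug1 (Swallow, Smoke, Snort, Inject, Total)
Columns: Gender (Male, Female, Total)
Joint frequencies: the interior cells
Row totals: the Total column
Column totals: the Total row
Grand total: the bottom-right cell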
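To make the parts of a cross-tabulation concrete, here is a small sketch that counts joint frequencies and derives the row totals, column totals and grand total. The records are made up for illustration and are not the training dataset; the function name `crosstab` is likewise only a label chosen for this example.

```python
from collections import Counter

# Made-up (mode of ingestion, gender) records, one per respondent.
records = [
    ("Swallow", "Male"), ("Swallow", "Female"), ("Smoke", "Male"),
    ("Smoke", "Male"), ("Snort", "Female"), ("Inject", "Male"),
    ("Swallow", "Male"), ("Smoke", "Female"),
]


def crosstab(pairs):
    """Return joint frequencies, row totals, column totals and grand total."""
    joint = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return joint, row_totals, col_totals, len(pairs)


joint, row_totals, col_totals, grand_total = crosstab(records)

modes = ["Swallow", "Smoke", "Snort", "Inject"]
genders = ["Male", "Female"]

# Print the table: joint frequencies inside, totals on the margins.
print(f"{'':10}{'Male':>8}{'Female':>8}{'Total':>8}")
for mode in modes:
    cells = "".join(f"{joint[(mode, g)]:>8}" for g in genders)
    print(f"{mode:10}{cells}{row_totals[mode]:>8}")
margins = "".join(f"{col_totals[g]:>8}" for g in genders)
print(f"{'Total':10}{margins}{grand_total:>8}")
```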

Percentages
The difference in sample size for men and women makes comparison of raw numbers difficult
Percentages facilitate comparison by standardizing the scale
There are three options for the denominator of the percentage:
– Grand total
– Row total
– Column total
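The three denominators can be compared with a short sketch. The input figures are the "% of Total" values from the Mode of ingestion Drug1 * Gender cross-tabulation in this session; the function name and the dictionary layout are assumptions made for the example. Because the inputs are already rounded, the derived row and column percentages match the session's tables closely for large cells but can differ by a point or two for small cells such as Inject.

```python
# Joint percentages (% of Total) as shown in the session's cross-tabulation.
joint_pct = {
    "Swallow": {"Male": 39.6, "Female": 12.8},
    "Smoke":   {"Male": 36.5, "Female": 5.1},
    "Snort":   {"Male": 2.9,  "Female": 1.1},
    "Inject":  {"Male": 1.3,  "Female": 0.7},
}


def percentages(table, denominator):
    """Return cell percentages using the 'total', 'row' or 'column' total."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, value in cells.items():
            col_totals[c] = col_totals.get(c, 0) + value
    grand_total = sum(row_totals.values())

    result = {}
    for r, cells in table.items():
        result[r] = {}
        for c, value in cells.items():
            if denominator == "total":
                base = grand_total
            elif denominator == "row":
                base = row_totals[r]
            elif denominator == "column":
                base = col_totals[c]
            else:
                raise ValueError(f"unknown denominator: {denominator}")
            result[r][c] = 100 * value / base
    return result


total_pct = percentages(joint_pct, "total")   # joint distribution
row_pct = percentages(joint_pct, "row")       # Gender conditional on Mode1
col_pct = percentages(joint_pct, "column")    # Mode1 conditional on Gender

for mode in joint_pct:
    print(mode,
          f"row % male = {row_pct[mode]['Male']:.1f}",
          f"column % male = {col_pct[mode]['Male']:.1f}")
```

Row percentages sum to 100 across each row and column percentages sum to 100 down each column, which is a quick check that the intended denominator was used.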

Mode of ingestion Drug1 * Gender cross-tabulation (% of Total; cell counts not shown)

                                 Gender
Mode of ingestion Drug1    Male     Female   Total
Swallow                    39.6%    12.8%    52.4%
Smoke                      36.5%     5.1%    41.6%
Snort                       2.9%     1.1%     4.0%
Inject                      1.3%     0.7%     2.0%
Total                      80.3%    19.7%   100.0%

Joint distribution of Mode1 and Gender: the interior cells
Marginal distribution of Mode1: the Total column
Marginal distribution of Gender: the Total row

Mode of ingestion Drug1 * Gender cross-tabulation (% within Mode of ingestion Drug1; cell counts not shown)

                                 Gender
Mode of ingestion Drug1    Male     Female   Total
Swallow                    75.6%    24.4%   100.0%
Smoke                      87.8%    12.2%   100.0%
Snort                      72.1%    27.9%   100.0%
Inject                     66.7%    33.3%   100.0%
Total                      80.3%    19.7%   100.0%

Each row shows the distribution of Gender conditional on Mode1

Mode of ingestion Drug1 * Gender cross-tabulation (% within Gender; cell counts not shown)

                                 Gender
Mode of ingestion Drug1    Male     Female   Total
Swallow                    49.3%    65.1%    52.4%
Smoke                      45.4%    25.8%    41.6%
Snort                       3.6%     5.7%     4.0%
Inject                      1.6%     3.4%     2.0%
Total                     100.0%   100.0%   100.0%

Each column shows the distribution of Mode1 conditional on Gender

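Choosing percentages
“Construct the proportions so that they sum to one within the categories of the explanatory variable.”
Source: (C. Marsh, Exploring Data: An Introduction to Data Analysis for Social Scientists (Cambridge, Polity Press, 1988), p )
For example, if Gender is treated as the explanatory variable, use the percentages within Gender, so that the percentages for each gender sum to 100%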

Dimensions Definitions of vertical and horizontal variables

Two-by-two tables
Tables with two rows and two columns
A range of simple descriptive statistics can be applied to two-by-two tables
It is possible to collapse larger tables to these dimensions
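Collapsing a larger table to two-by-two dimensions amounts to merging categories and adding their counts. The sketch below shows one way to do that; the counts and the grouping of Mode1 into "Swallow" versus "Other modes" are hypothetical and chosen only to illustrate the mechanics.

```python
# Hypothetical 4 x 2 table of counts (not the training dataset).
counts = {
    "Swallow": {"Male": 60, "Female": 20},
    "Smoke":   {"Male": 55, "Female": 8},
    "Snort":   {"Male": 4,  "Female": 2},
    "Inject":  {"Male": 2,  "Female": 1},
}

# Hypothetical grouping: new row label -> original row labels to merge.
groups = {
    "Swallow": ["Swallow"],
    "Other modes": ["Smoke", "Snort", "Inject"],
}


def collapse_rows(table, grouping):
    """Merge rows of a two-way table by summing counts within each group."""
    collapsed = {}
    for new_label, old_labels in grouping.items():
        merged = {}
        for old in old_labels:
            for col, count in table[old].items():
                merged[col] = merged.get(col, 0) + count
        collapsed[new_label] = merged
    return collapsed


two_by_two = collapse_rows(counts, groups)
for label, cells in two_by_two.items():
    print(label, cells)
```

The grand total is unchanged by collapsing, which is a useful check that no category was dropped or counted twice.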

Gender * White pipe cross-tabulation (% within Gender; cell counts not shown)

                White pipe
Gender    Yes      No       Total
Male      23.2%    76.8%   100.0%
Female     7.0%    93.0%   100.0%
Total     19.9%    80.1%   100.0%

Two-by-two table: Gender (Male, Female) by White pipe (Yes, No)

Relative risk
Divide the probabilities for “success”:
– For example:
P(White pipe=Yes|Gender=Male) = 0.2318
P(White pipe=Yes|Gender=Female) = 0.0701
Relative risk is 0.2318/0.0701 = 3.309
The proportion of males using white pipe was over three times that of females
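A minimal sketch of the relative-risk calculation follows. The cell counts are hypothetical: they were chosen only so that the resulting proportions agree with the percentages in the Gender * White pipe cross-tabulation (23.2% of males, 7.0% of females) and with the 1565 valid cases reported in the risk estimate. The actual counts are not preserved in this transcript.

```python
# Hypothetical counts consistent with the session's percentages and N = 1565.
# They are NOT taken from the source data.
counts = {
    "Male":   {"Yes": 290, "No": 961},
    "Female": {"Yes": 22,  "No": 292},
}


def risk(row):
    """P(White pipe = Yes | Gender) for one row of the two-by-two table."""
    return row["Yes"] / (row["Yes"] + row["No"])


def relative_risk(table, group, reference):
    """Risk in `group` divided by risk in `reference`."""
    return risk(table[group]) / risk(table[reference])


p_male = risk(counts["Male"])
p_female = risk(counts["Female"])
rr = relative_risk(counts, "Male", "Female")

print(f"P(Yes | Male)   = {p_male:.4f}")
print(f"P(Yes | Female) = {p_female:.4f}")
print(f"Relative risk   = {rr:.3f}")
```

Dividing the four-decimal proportions by hand gives a slightly different third decimal than dividing the unrounded proportions, so small discrepancies in the last digit are to be expected.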

Odds
The odds of “success” are the ratio of the probability of “success” to the probability of “failure”
For example:
– For males the odds of “success” are 0.2318/0.7682 = 0.302
– For females the odds of “success” are 0.0701/0.9299 = 0.075

Odds ratio
Divide the odds of success for males by the odds of success for females
For example: 0.302/0.075 = 4.005
The odds of taking white pipe as a male are four times those for a female
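The odds and the odds ratio can be sketched the same way. As before, the cell counts are hypothetical values chosen to be consistent with the session's percentages and its 1565 valid cases; they are not the source data. The sketch also checks the familiar shortcut that the odds ratio of a two-by-two table equals its cross-product ratio.

```python
import math

# Hypothetical counts consistent with the session's percentages and N = 1565.
# They are NOT taken from the source data.
counts = {
    "Male":   {"Yes": 290, "No": 961},
    "Female": {"Yes": 22,  "No": 292},
}


def odds(row):
    """Odds of "success": P(Yes) / P(No), which reduces to Yes / No."""
    return row["Yes"] / row["No"]


odds_male = odds(counts["Male"])
odds_female = odds(counts["Female"])
odds_ratio = odds_male / odds_female

# Cross-product ratio (a*d)/(b*c) of the two-by-two table.
cross_product = (counts["Male"]["Yes"] * counts["Female"]["No"]) / (
    counts["Male"]["No"] * counts["Female"]["Yes"]
)

print(f"Odds (Male)   = {odds_male:.3f}")
print(f"Odds (Female) = {odds_female:.3f}")
print(f"Odds ratio    = {odds_ratio:.3f}")
print("Matches cross-product ratio:", math.isclose(odds_ratio, cross_product))
```

Dividing the rounded odds (0.302 and 0.075) by hand gives a value a little above 4.005, so the ratio is best computed from unrounded figures.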

Risk estimate (SPSS output)
Columns: Value, 95% confidence interval (Lower, Upper)
Rows:
– Odds ratio for Gender (Male / Female): the odds ratio M/F
– For cohort White pipe = Yes: the relative risk of “success”
– For cohort White pipe = No: the relative risk of “failure”
– N of valid cases: 1565

Exercise 1: cross-tabulations
Create and comment on the following cross-tabulations:
– Age vs Gender
– Race vs Gender
– Education vs Gender
– Primary drugs vs Mode of ingestion
Suggest other cross-tabulations that would be useful

Exercise 2: cross-tabulation
Construct a dichotomous variable for age: Up to 24 years and Above 24 years
Construct a dichotomous variable for the primary drug of use: Alcohol and Not Alcohol
Create a cross-tabulation of the two new variables and interpret it
Generate relative risks and odds ratios and interpret them

Summary
Cross-tabulations
Joint frequencies
Marginal frequencies
Row/Column/Total percentages
Relative risk
Odds
Odds ratios