SW388R7 Data Analysis & Computers II Slide 1 Incorporating Nonmetric Data with Dummy Variables The logic of dummy-coding Dummy-coding in SPSS.


Slide 2: Dummy-coding variables
- For many of the multivariate techniques we will study, it is assumed that the independent or dependent variables in the analysis are metric variables. If we have a nonmetric, or categorical, variable, we can incorporate it into our analysis by converting the categorical variable to a set of dichotomous, dummy-coded variables.
- A dichotomous variable arguably satisfies the interval level of measurement. On some construct, one of the categories represents more or less of the construct, so the definition of ordinal data is satisfied. Moreover, since there are only two categories, there is only one interval between them, so the requirement of equal units between categories is met trivially, satisfying the definition of interval data.

Slide 3: Selecting a reference group
- To dummy-code a variable, we first identify one category or subgroup of the nonmetric variable as the reference or comparison group.
- The effects that we identify in our analysis will be differences from the reference or comparison group.
- For example, suppose that I were contrasting salaries of men and women in some group of employees and were interested in how women's salaries differed from men's salaries. Assuming there is a "gender" variable in my data set coded either "male" or "female," I would select "male" as the reference category on the nonmetric "gender" variable.

Slide 4: A two-category example - 1
- After we have identified the reference category, we create a new variable for each of the remaining categories or subgroups of the nonmetric variable.
- Thus, a nonmetric variable will be represented in the analysis by a number of new dichotomous variables equal to one less than the number of categories in the original nonmetric variable.
- In the salary example given above, with "male" selected as my reference group, the remaining group on the nonmetric gender variable is "female," so I create a new variable called "women." (I usually would name it "female," but I don't want two entities with the same name in this example.)

Slide 5: A two-category example - 2
- Finally, I code the new variable with one of two dichotomous values, usually 1 and 0.
- The new variable is assigned a 1 if the original variable indicated membership in the category represented by the new variable.
- If the subject was not a member of the category designated by the new variable, the new variable is coded 0 for that subject.
- In the example above, if a subject was in the "female" group of the "gender" variable, her code for the new "women" variable is 1. If a subject was not in the "female" group of the "gender" variable, his code for the new "women" variable is 0.
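The two-category coding can be sketched in a few lines of Python. This is an illustration only: the salary figures are invented, and the slides themselves do the coding in SPSS. The sketch also shows why the reference group matters. When salary is regressed on the single dummy variable, the intercept is the mean salary of the reference group (men) and the slope is the difference between the women's mean and the men's mean.

```python
# Hypothetical data: (gender, salary) pairs invented for illustration only.
employees = [
    ("male", 52000), ("female", 48000), ("male", 56000),
    ("female", 50000), ("male", 54000), ("female", 46000),
]

# Dummy-code gender with "male" as the reference category:
# women = 1 for the "female" group, 0 otherwise.
women = [1 if gender == "female" else 0 for gender, _ in employees]
salary = [amount for _, amount in employees]


def mean(values):
    return sum(values) / len(values)


# Least-squares fit of: salary = intercept + slope * women
x_bar, y_bar = mean(women), mean(salary)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(women, salary))
         / sum((x - x_bar) ** 2 for x in women))
intercept = y_bar - slope * x_bar

# Group means computed directly, for comparison with the fitted coefficients.
mean_men = mean([s for w, s in zip(women, salary) if w == 0])
mean_women = mean([s for w, s in zip(women, salary) if w == 1])

print("women dummy:", women)
print("intercept:", intercept, "mean for men:", mean_men)
print("slope:", slope, "women minus men:", mean_women - mean_men)
```

With these made-up numbers the intercept matches the men's mean and the slope matches the difference in means, which is the sense in which effects are "differences from the reference group."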

Slide 6: A three-category example - 1
- If the original nonmetric variable had three or more categories, we would create two or more new variables and code them with the same scheme.
- Suppose, for example, that we have a variable for political identification, named "partyid," which contains three values for "Republican," "Democrat," and "Independent." I select "Independent" as my reference category because I am interested in the effect of being a Republican or a Democrat.
- Dummy-coding requires that I create and code two new variables, one for "Republican," which I will name "Repub," and one for "Democrat," which I will name "Demo."

Slide 7: A three-category example - 2
- Each subject in the data set will be assigned a value for both the new variables, "Repub" and "Demo," using the following scheme:
  - If a subject is a "Republican" on the original "partyid" variable, they are assigned a value of 1 for the new "Repub" variable and a value of 0 for the new "Demo" variable.
  - If a subject is a "Democrat" on the original "partyid" variable, they are assigned a value of 0 for the new "Repub" variable and a value of 1 for the new "Demo" variable.
  - If a subject is an "Independent" on the original "partyid" variable, they are assigned a value of 0 for the new "Repub" variable and a value of 0 for the new "Demo" variable, because they are not Republican and they are not Democrat.
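The three-category scheme generalizes to any number of categories. The following is a minimal Python sketch, not part of the original slides; the function name `dummy_code` and its arguments are hypothetical. It creates one 0/1 variable per non-reference category, so a variable with k categories yields k - 1 dummy variables.

```python
def dummy_code(value, dummy_names, reference):
    """Return the 0/1 dummy variables for one subject.

    dummy_names maps each non-reference category to the name of its
    dummy variable; the reference category gets no variable of its own.
    """
    if value != reference and value not in dummy_names:
        raise ValueError(f"unknown category: {value!r}")
    return {name: int(value == category)
            for category, name in dummy_names.items()}


# Variable names follow the slides: "Repub" and "Demo".
partyid_dummies = {"Republican": "Repub", "Democrat": "Demo"}

for party in ["Republican", "Democrat", "Independent"]:
    print(party, dummy_code(party, partyid_dummies, reference="Independent"))
```

A subject in the reference category ("Independent") receives 0 on every dummy variable, which is what identifies the group without giving it a variable.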

Slide 8: Example in SPSS
- In GSS2000R, the variable marital contains five categories: married, widowed, divorced, separated, and never married.
- Assuming my research question dealt with marital experiences, the never married category is selected as the reference category.
- We will create four new variables to represent the other marital experiences, with each variable representing one experience. The variables will be married, widowed, divorced, and separatd (shortened to fit the 8 allowable characters for SPSS variable names).

Slide 9: Coding scheme for new variables
The coding scheme for the new variables is shown in the table below.

Original Variable Coding    Coding for New Variables
                            married  widowed  divorced  separatd
1 = married                    1        0        0         0
2 = widowed                    0        1        0         0
3 = divorced                   0        0        1         0
4 = separated                  0        0        0         1
5 = never married              0        0        0         0

Slide 10: Using Recoding in SPSS to Create New Variables
Select the Recode | Into Different Variables command from the Transform menu.

Slide 11: Creating the married variable
- First, select the variable to be dummy-coded, marital, from the list of variables and move it to the Numeric Variable -> Output Variable list box.
- Second, type in the name for the new variable and click on the Change button to replace the ? with this new variable name.

Slide 12: Assigning values to new variable
Next, click on the Old and New Values button to assign values to the new variable.

Slide 13: Preserving missing values
- First, mark the System- or user-missing option button on the Old Value panel.
- Second, mark the System-missing option button on the New Value panel.
- Third, click on the Add button to include this recoding for the variable.
If we forget to explicitly assign missing values, cases with missing data will be recoded with a 0 and become part of the reference group.

Slide 14: Coding the married category
- First, to recode the 1 = married category to the dummy variable, mark the Value option button and type a 1 in the text box on the Old Value panel.
- Second, mark the Value option button and type a 1 in the text box on the New Value panel.
- Third, click on the Add button to include this recoding for the variable.
This coding says: if they were originally in the married category for marital, they are assigned a value of 1 for the married dummy variable.

Slide 15: Coding the other categories
- First, to identify subjects in the categories other than married, mark the All other values option button on the Old Value panel.
- Second, mark the Value option button and type a 0 in the text box on the New Value panel.
- Third, click on the Add button to include this recoding for the variable.
This coding says: if they were originally NOT in the married category for marital, they are assigned a value of 0 for the married dummy variable.

Slide 16: Completing the re-coding
When we have completed the coding for the new variable, click on the Continue button.

Slide 17: Completing the married variable
Click on the OK button to create the new variable in the data editor.
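The three recoding rules entered in the dialog (missing stays missing, 1 becomes 1, all other values become 0) can be mimicked in Python to show why the missing-value rule matters. This is a rough sketch rather than SPSS behavior reproduced exactly: `None` stands in for a system- or user-missing value, and the `preserve_missing` flag is a hypothetical switch that simulates forgetting the first rule.

```python
def recode_married(marital, preserve_missing=True):
    """Recode an original marital code into the married dummy variable.

    None stands in for a missing value (an assumption of this sketch).
    """
    if marital is None:
        # With the missing-value rule, missing stays missing.
        # Without it, the case is caught by "all other values" and becomes 0.
        return None if preserve_missing else 0
    # 1 = married becomes 1; all other values become 0.
    return 1 if marital == 1 else 0


marital = [1, 5, None, 3]  # hypothetical cases, one with missing data

with_rule = [recode_married(m) for m in marital]
without_rule = [recode_married(m, preserve_missing=False) for m in marital]

print(with_rule)     # [1, 0, None, 0]
print(without_rule)  # [1, 0, 0, 0]
```

In the second list the case with missing data is indistinguishable from a subject who is not married, which is how it would slip into the comparison with the reference group.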

Slide 18: Variable and coding for widowed variable
Following the same steps, we create the dummy variable for subjects who were 2 = widowed on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 2 = widowed is translated into a 1 on the new variable.

Slide 19: Variable and coding for divorced variable
Following the same steps, we create the dummy variable for subjects who were 3 = divorced on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 3 = divorced is translated into a 1 on the new variable.

Slide 20: Variable and coding for separated variable
Following the same steps, we create the dummy variable for subjects who were 4 = separated on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 4 = separated is translated into a 1 on the new variable.
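Because the four recodes differ only in which original code is translated into a 1, they can be expressed as one loop over the codes. The sketch below is an illustrative Python equivalent, not the SPSS procedure itself. The numeric codes and variable names come from the slides, the function name is hypothetical, and `None` is again an assumed stand-in for a missing value.

```python
# Original marital codes and the dummy variable created for each.
# Code 5 (never married) is the reference category and has no variable.
DUMMY_FOR_CODE = {1: "married", 2: "widowed", 3: "divorced", 4: "separatd"}


def dummy_code_marital(marital):
    """Return all four dummy variables for one subject's marital code."""
    if marital is None:
        # Preserve missing data on every new variable.
        return {name: None for name in DUMMY_FOR_CODE.values()}
    return {name: int(marital == code)
            for code, name in DUMMY_FOR_CODE.items()}


for code in [1, 2, 3, 4, 5, None]:
    print(code, dummy_code_marital(code))
```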

Slide 21: Dummy-coded variables for married subjects
Subjects with a code value of 1 = married on the original marital variable now have a 1 for married and a 0 for the other new variables.

Slide 22: Dummy-coded variables for widowed subjects
Subjects with a code value of 2 = widowed on the original marital variable now have a 1 for widowed and a 0 for the other new variables.

Slide 23: Dummy-coded variables for divorced subjects
Subjects with a code value of 3 = divorced on the original marital variable now have a 1 for divorced and a 0 for the other new variables.

Slide 24: Dummy-coded variables for never married subjects
Subjects with a code value of 5 = never married on the original marital variable now have a 0 for all new variables. This was the reference category.
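The visual check of the data editor can also be automated. As a hedged sketch in Python, with invented cases and a hypothetical helper name, the function below confirms that each non-missing case has a 1 only on the dummy variable matching its original code, and that never married cases have a 0 on all four. It assumes valid codes 1 through 5 with no missing data.

```python
DUMMY_FOR_CODE = {1: "married", 2: "widowed", 3: "divorced", 4: "separatd"}
REFERENCE_CODE = 5  # never married


def is_consistent(marital, dummies):
    """True if the dummy values match the coding scheme for this code."""
    flagged = [name for name, value in dummies.items() if value == 1]
    if marital == REFERENCE_CODE:
        return flagged == []
    return flagged == [DUMMY_FOR_CODE[marital]]


# Hypothetical cases: (original marital code, dummy values after recoding).
cases = [
    (1, {"married": 1, "widowed": 0, "divorced": 0, "separatd": 0}),
    (5, {"married": 0, "widowed": 0, "divorced": 0, "separatd": 0}),
    # Deliberately miscoded: a divorced subject left with all zeros.
    (3, {"married": 0, "widowed": 0, "divorced": 0, "separatd": 0}),
]

problems = [i for i, (code, dummies) in enumerate(cases)
            if not is_consistent(code, dummies)]
print(problems)  # [2]
```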