Planning how to create the variables you need from the variables you have
Jane E. Miller, PhD
The Chicago Guide to Writing about Numbers, 2nd Edition.


Overview
- Why researchers sometimes need to create new variables to conduct their analysis
- Why it is important to plan ahead for how to create those new variables
- What information is required to identify the new variables needed for the research question
- How to write clear instructions on how to get from the variables you have to the variables you need

Why create new variables?
- For many statistical analyses, the variables available in the original data set are not yet in the form needed to address the research question of interest.
- Examples:
  - You want to study total family income, but the data set has separate variables measuring income components such as earned income, government benefits, and alimony.
  - You want to compare outcomes for age groups (children, working-age adults, and the elderly), but the data set reports respondents' ages in single years.

Conceptualizing the new variable should precede programming it
- Important to separate
  - Researching and planning how those variables should be defined
  - Programming the new variable in an electronic database
- Each of those tasks
  - Has its own challenging aspects
  - Uses different
    - Skills
    - Resources

Some common patterns of creating new from existing variables
- A categorical version of a continuous variable
- A simplified (collapsed) categorical variable
- A binary indicator from a continuous variable
- A new continuous variable that combines 2+ continuous variables
- A mathematical transformation of a continuous variable

A categorical version of a continuous variable
- Original variable: Age in years (continuous)
- Needed variable: Age group (categorical)
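As a rough illustration of this pattern, the sketch below classifies age in years into the three age groups mentioned earlier (children, working-age adults, the elderly). The cutoffs of 18 and 65 and the function name `age_group` are assumptions made for this example only; the slides do not specify them, and the right cutoffs depend on the research question and the literature.

```python
def age_group(age_years: float) -> str:
    """Classify age in years into an age group.

    The cutoffs (18 and 65) are hypothetical, chosen for illustration.
    """
    if age_years < 18:
        return "child"
    if age_years < 65:
        return "working-age adult"
    return "elderly"


# Every possible age maps to exactly one group.
for age in (7, 18, 64, 65, 90):
    print(age, age_group(age))
```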

A simplified (collapsed) categorical variable
- Original variable: Ten-category ethnicity variable
- Needed variable: Three-category ethnicity variable
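One way to write out a collapse like this is as an explicit lookup table from every original code to a new code. The sketch below uses placeholder codes 1 to 10 and 1 to 3; the groupings are invented for illustration, since the slides do not list the actual categories.

```python
# Hypothetical mapping: original codes 1-10 collapsed into new codes 1-3.
# The actual grouping should come from the codebook and the literature.
COLLAPSE_MAP = {
    1: 1, 2: 1, 3: 1, 4: 1,
    5: 2, 6: 2, 7: 2,
    8: 3, 9: 3, 10: 3,
}


def collapse_category(original_code: int) -> int:
    """Return the collapsed code; a KeyError flags an unmapped original value."""
    return COLLAPSE_MAP[original_code]


print(collapse_category(6))
```

Writing the mapping as a table makes it easy to check that every original value has been assigned to a new category.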

A binary indicator from a continuous variable
- Original variable: Birth weight in grams (continuous)
- Needed variable: Indicator of low birth weight status (yes or no)
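A minimal sketch of such an indicator follows. It assumes the commonly used threshold of less than 2,500 grams for low birth weight; the slide itself does not state the cutoff, so treat the constant and the function name as assumptions to be checked against the literature.

```python
LOW_BIRTH_WEIGHT_CUTOFF_G = 2500  # assumed threshold, in grams


def low_birth_weight(birth_weight_g: float) -> int:
    """Return 1 (yes) if birth weight is below the cutoff, else 0 (no)."""
    return 1 if birth_weight_g < LOW_BIRTH_WEIGHT_CUTOFF_G else 0


print(low_birth_weight(2400), low_birth_weight(3300))
```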

A new continuous variable that aggregates 2+ continuous variables

Original variable(s)                                  New variable
Separate measures of income for each family member    Total family income
Multiple attitudinal items                            A composite attitudinal scale

A new continuous variable calculated from 2+ continuous variables

Original variable(s)                                              New variable
Separate measures of county-level population and poverty rate     Number of poor persons in the county = population × % poor
Separate measures of weight (kg) and height (meters)              Body Mass Index = weight / height²
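The two calculations above can be sketched as follows. The function and parameter names are hypothetical, and the first function assumes the poverty rate is stored as a percentage (for example, 12.5 rather than 0.125), which is why it divides by 100.

```python
def number_poor(population: float, pct_poor: float) -> float:
    """Number of poor persons = population x (% poor / 100).

    Assumes pct_poor is a percentage, not a proportion.
    """
    return population * pct_poor / 100.0


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Body Mass Index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


print(number_poor(200000, 12.5))
print(round(body_mass_index(70.0, 1.75), 1))
```

Stating the units in the parameter names (`weight_kg`, `height_m`) is one way to keep the units of the original variables visible in the formula.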

A mathematical transformation of a continuous variable

Original variable(s)     New variable
Income in dollars        Logged income
Income in dollars        Income in thousands of dollars
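A small sketch of both transformations is shown below. The slide does not say which logarithm base is intended, so the natural log is an assumption here, as is the choice to return `None` (missing) for incomes of zero or less, where the log is undefined.

```python
import math
from typing import Optional


def logged_income(income_dollars: float) -> Optional[float]:
    """Natural log of income; None when the log is undefined (income <= 0)."""
    if income_dollars <= 0:
        return None
    return math.log(income_dollars)


def income_in_thousands(income_dollars: float) -> float:
    """Rescale income from dollars to thousands of dollars."""
    return income_dollars / 1000.0


print(logged_income(1000.0), income_in_thousands(45000))
```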

Planning steps for creating new variables
- Finding relevant variables in the original data set
- Becoming acquainted with the units and categories for available variables
- Consulting the published literature on the topic to see how those concepts have been measured or classified by other researchers
- Identifying pertinent formulas and thresholds
- Writing out the logic or math needed to create the new variables from existing variables

Steps toward creating a new variable
1. Identify the name(s) of the original variable(s) in the data set that contain the data needed to create the new variable.
2. For the new variable, devise
   - A name (acronym) to convey
     - Content (meaning) of the new variable
     - The dates or survey rounds when the data were collected, if pertinent
   - A label (short descriptive phrase) for the new variable
     - Mention units, if pertinent
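One possible way to record these planning decisions before any programming is a small structured record per new variable. The `NewVariablePlan` class and its field names are hypothetical; the example values reuse the course-grade variables (EXAM1, EXAM2, FINAL, TOTGRADE) from the worked example later in this presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NewVariablePlan:
    """Planning record for one new variable (hypothetical structure)."""
    source_variables: List[str]  # step 1: existing variables in the data set
    name: str                    # step 2: short name (acronym) for the new variable
    label: str                   # step 2: short descriptive phrase
    units: str = ""              # mentioned in the label or notes, if pertinent


plan = NewVariablePlan(
    source_variables=["EXAM1", "EXAM2", "FINAL"],
    name="TOTGRADE",
    label="Numeric course grade",
    units="points (0 to 100)",
)
print(plan)
```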

For new continuous variables
- Write the formula to calculate the value of the new variable from the original variables.
- Specify the units of the original variable(s) and the new variable.

Example: Calculating course grades from component test scores
- For a hypothetical college course, the overall course grade is based on three exam scores
  - Two mid-term exams (EXAM1 and EXAM2)
    - Each scored from 0 to 25 points
  - A final exam (FINAL)
    - Scored from 0 to 50 points
- For each student, the instructor wants to calculate
  - The percentage of questions s/he got correct on exam 1
  - Total numeric course grade
  - Course letter grade, based on standard grade cutoffs

Calculating percentage of exam questions correct from number of questions correct
- Logic: From the information in the data set, how does one calculate the percentage of questions correct?
- Concepts: Percentage of questions correct is number of questions correct divided by the total number of questions on the exam, multiplied by 100.
- Formula: Replace concepts with names of variables:
    PCCOREX1 = (EXAM1/25) * 100
- STEP 1: Identify existing variables, already in the data set, from which the new variable will be calculated (EXAM1).
- STEP 2: Devise a name for the new variable, not yet in the data set (PCCOREX1).
- STEP 3: Write the mathematical formula.
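The formula above translates directly into code. This is only a sketch; the function name mirrors the slide's variable name, and the constant 25 is the maximum score on exam 1 from the example.

```python
EXAM1_MAX_POINTS = 25  # exam 1 is scored from 0 to 25 points


def pccorex1(exam1: float) -> float:
    """PCCOREX1 = (EXAM1 / 25) * 100: percentage correct on exam 1."""
    return (exam1 / EXAM1_MAX_POINTS) * 100


print(pccorex1(20))
```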

Creating a variable for total numeric course grade from exam scores
- Logic: From the information in the data set, how does one calculate total numeric course grade?
- Concepts: Overall numeric course grade is the sum of the three exam scores.
- Formula: Replace concepts with names of variables:
    TOTGRADE = EXAM1 + EXAM2 + FINAL
- STEP 1: Identify existing variables, already in the data set, from which the new variable will be calculated (EXAM1, EXAM2, FINAL).
- STEP 2: Devise a name for the new variable, not yet in the data set (TOTGRADE).
- STEP 3: Write the mathematical formula.
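A sketch of the same formula in code, assuming each student's scores are stored in a dictionary keyed by the variable names from the slide (the dictionary layout is an assumption for this example):

```python
def totgrade(student: dict) -> float:
    """TOTGRADE = EXAM1 + EXAM2 + FINAL (maximum 25 + 25 + 50 = 100 points)."""
    return student["EXAM1"] + student["EXAM2"] + student["FINAL"]


student = {"EXAM1": 20, "EXAM2": 22, "FINAL": 45}
student["TOTGRADE"] = totgrade(student)
print(student["TOTGRADE"])
```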

For new categorical variables
- Write the logical steps to classify the values of the original variable into the values of the new variable.
- Show how every possible value of the original variable maps into a value of the new variable.
- List the
  - Value label (descriptive phrase) for each value (category) of the new variable;
  - Code (numeric value) that the new variable will take on for each value or set of values of the original variable.

Classifying numeric course grades into letter grade ranges

Original variable: TOTGRADE (variable label: Numeric course grade)
New variable: LETTRGRD (variable label: Final letter grade)

Values of original variable    Values (codes) of new variable    Value labels
<60                            1                                 F
60 to 69                       2                                 D
70 to 79                       3                                 C
80 to 89                       4                                 B
90 or higher                   5                                 A

- STEP 1: Identify existing variables from which the new variable will be created (TOTGRADE).
- STEP 2: Devise a name for the new variable, not yet in the data set (LETTRGRD).
- STEP 3: Write the logic for classifying the numeric scores into letter grade ranges, based on the university's standard grade cutoffs. E.g., scores below 60 are classified as an "F."
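The classification table can be sketched in code as a list of lower bounds checked from highest to lowest. One assumption is labeled here: the table lists whole-number ranges (for example, "60 to 69" and "70 to 79"), so this sketch treats each cutoff as a lower bound, meaning a fractional score such as 69.5 falls in the D range. The function name is hypothetical.

```python
# (lower bound of TOTGRADE, LETTRGRD code, value label), highest range first.
GRADE_RANGES = [
    (90, 5, "A"),
    (80, 4, "B"),
    (70, 3, "C"),
    (60, 2, "D"),
]


def letter_grade(totgrade: float) -> tuple:
    """Return (code, label) of LETTRGRD for a numeric course grade."""
    for lower_bound, code, label in GRADE_RANGES:
        if totgrade >= lower_bound:
            return (code, label)
    return (1, "F")  # scores below 60


for score in (59, 60, 79, 87, 95):
    print(score, letter_grade(score))
```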

Missing values for the new variable
- Provide instructions to ensure that cases that have missing values on the original variables will also have missing values for new variables that are based on them.
- Needed whether the new variable was created using
  - A formula
  - Classification instructions
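A minimal sketch of carrying missing values through both kinds of instructions follows. It assumes missing values are represented as `None`; real data sets often use special codes (such as 99 or -9) instead, which would need to be checked against the codebook. The function names are hypothetical.

```python
from typing import Optional


def totgrade(exam1: Optional[float], exam2: Optional[float],
             final: Optional[float]) -> Optional[float]:
    """Formula case: TOTGRADE is missing if any exam score is missing."""
    if exam1 is None or exam2 is None or final is None:
        return None
    return exam1 + exam2 + final


def lettrgrd_code(total: Optional[float]) -> Optional[int]:
    """Classification case: LETTRGRD is missing if TOTGRADE is missing."""
    if total is None:
        return None
    if total >= 90:
        return 5
    if total >= 80:
        return 4
    if total >= 70:
        return 3
    if total >= 60:
        return 2
    return 1


# A student who missed exam 2 gets a missing total and a missing letter grade,
# rather than a misleadingly low total.
print(lettrgrd_code(totgrade(20, None, 45)))
```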

Summary
- It is often necessary to create new variables to answer one's research question.
- Planning steps for creating new variables include
  - Identifying source variables available in a data set
  - Finding references about how such variables are conventionally analyzed
  - Becoming familiar with units or categories of the variables
  - Writing formulas or classification instructions to create the new variables from the original variables
  - Providing instructions about missing values for the original and new variables

Summary, cont.
- With the formulas and classification instructions for creating the new variables, one can then use a spreadsheet or statistical software to create those variables within an electronic data set.
- Separate
  - The researching and planning steps
  - The programming steps

Suggested resources
- Miller, J. E. The Chicago Guide to Writing about Numbers, 2nd Edition. University of Chicago Press, chapter 10.

Suggested practice exercises

NAME of original variable ______________________
LABEL for original variable ______________________
NAME of new variable _______________________
LABEL for new variable _______________________

Values of original variable    Values (codes) of new variable    Value labels of new variable

Instructions and a planning template can be downloaded from the supplemental online materials at

Suggested online appendixes
- How to Create the Variables You Need from the Variables You Have
  - Exercise includes
    - Step-by-step instructions
    - A template planning grid for a new categorical variable
  - Paper for instructors on how to teach the concepts and skills
- Getting to Know Your Variables
  - Exercise to familiarize researchers with the concepts, units, and categories of variables in their data set
  - Paper for instructors on how to teach the concepts and skills

Contact information
Jane E. Miller, PhD
Online materials available at