Writing with Data: Incorporating Statistics Into Causal Research Statlab Workshop Spring 2011 Brian Fried and Kevin Callender.

Outline of Workshop
Part I: Causation and Statistics
  What is Causation? Correlation?
  Why Statistics?
  Threats to Inference
Part II: Gathering and Using Data
  Gathering Data
  Managing Data
Part III: Writing with Statistics
  A General Outline, with an example

Causation vs. Correlation Causation… …correlation

Why Statistics
Probabilistic Relationships (see previous graph)
Multivariate Relationships: We can analyze the relationships between multiple variables at the same time (e.g. education, age, gender, income … -> voting).
What is a regression?
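
As one possible answer to "What is a regression?", the sketch below fits a straight line to two variables by ordinary least squares. The data (`education`, `turnout`) and the function name `simple_ols` are hypothetical and only illustrate the idea; they do not come from the workshop.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope is the covariation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of education and probability of voting.
education = [8, 10, 12, 14, 16, 18]
turnout = [0.30, 0.38, 0.45, 0.52, 0.61, 0.66]

intercept, slope = simple_ols(education, turnout)
print(f"turnout = {intercept:.3f} + {slope:.3f} * education")
```

The slope is read as the expected change in the outcome for a one-unit change in the explanatory variable; on its own it describes an association, not a causal effect.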

Threats to Inference
Endogeneity (vs. exogeneity of errors)
Autocorrelation (time series)
Homo/Heteroskedasticity
Internal vs. external validity
Probably the most important step in research design; advanced techniques can often compensate.

Part II: Data
Think about analyses early! (Ideal vs. Possible)
What’s Possible? What’s Convincing?
  Experimental Ideal
  Practical Data Limitations
  Collecting Your Own Data
  Using Other Data
Some data sources:
  Statlab Webpage
  Advisors/Professional Contacts
  Yale StatCat
  ICPSR
  Reference Librarian (Julie Linden)

(Quant.) Data Types and Uses
Dependent Variable (response, outcome, criterion)
Independent Variables (explanatory or predictor variables)
Control / Confounding Variables
Categorical and Continuous Variables
Remember: Types of variables we choose determine the statistics we use
Qualitative knowledge always helps!

Once You’ve Found or Collected Your Data
Download the data and documentation
  StatTransfer (Statlab)
Determine data file type
  Probably a text file (.txt, .dat, .raw)
Converting text & delimited files
Choose a statistical software program
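
Converting a delimited text file can often be done with a few lines of script. The sketch below, using only Python's standard `csv` module, turns tab-delimited text into comma-delimited text. The sample contents and column names are hypothetical; a real file would be opened from disk rather than held in a string.

```python
import csv
import io

# Hypothetical tab-delimited extract, standing in for a .txt or .dat file.
raw_text = "municipality\trecipients\tpoor_estimate\nA\t120\t150\nB\t80\t100\n"

def tab_to_csv(text):
    """Convert tab-delimited text to comma-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

csv_text = tab_to_csv(raw_text)
print(csv_text)
```

Checking the documentation for the delimiter and for how missing values are coded before converting avoids silent errors later.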

Managing your data
Back up all Master Data Files
Codebook
  Merging Data
  Adding variables, cases, computing new variables
Keep a roadmap
  Keep a log of all analyses with what you have done
  Save syntax files
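
Merging files and computing a new variable might look like the sketch below. The municipality codes, field names, and the helper `merge_on_key` are hypothetical. The point is that unmatched cases are reported rather than silently dropped, which is worth recording in the analysis log.

```python
# Hypothetical master files, each keyed by a made-up municipality code.
recipient_data = {
    "M001": {"recipients": 120},
    "M002": {"recipients": 80},
    "M003": {"recipients": 45},
}
poverty_data = {
    "M001": {"poor_estimate": 150},
    "M002": {"poor_estimate": 100},
}

def merge_on_key(left, right):
    """Join two dict-of-dicts on shared keys and list the keys that did not match."""
    merged, unmatched = {}, []
    for key, row in left.items():
        if key in right:
            merged[key] = {**row, **right[key]}
        else:
            unmatched.append(key)
    return merged, unmatched

merged, unmatched = merge_on_key(recipient_data, poverty_data)

# Compute a new ratio variable from two existing ones.
for row in merged.values():
    row["coverage"] = row["recipients"] / row["poor_estimate"]

print(merged)
print("Unmatched cases:", unmatched)
```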

Syntax Files
What are they? Text files used to enter commands in bulk.
Why? You will make mistakes and need to make changes.
How do I know what to write? The program’s manual provides the underlying command.

Part III: Writing
Introduction
Theory (Lit Review)
Data Description
Analysis/Results
Conclusion

Introduction
Question
  What is the question you want to answer?
  Why should we care?
Hypothesis
  Succinctly state your claim
Context & Summary

An Illustrative Example: Bolsa Familia
Motivation
Are politics becoming more programmatic in Brazil?
Is Bolsa Familia, a conditional cash transfer (CCT) program that benefits a quarter of Brazil’s population, programmatic?

An Illustrative Example: Bolsa Familia
Programa Bolsa Família – key facts
Conditional cash transfer (CCT) program, launched in October
This was not the first CCT program in Brazil; some existing programs (like Bolsa Escola) were incorporated into Bolsa Familia.
Benefits families with per capita income below US$78.
12 million poor families (almost 50 million people) currently receive support in all 5,564 Brazilian municipalities.
Size of stipend: between US$13 and US$114, depending on the family’s size and poverty level.
Average amount: US$54 per family
2009 Budget: US$10.5 billion (0.4% of Brazil’s GDP)

Theory/Lit. Review
What does existing theory say?
  What do you believe?
  Position yourself within theoretical debates.
Identify Testable Hypotheses
Choose Method Best Suited to Testing Your Hypothesis
Do you need statistics after all?
  Quantitative v. Qualitative research

An Illustrative Example: Bolsa Familia
Research Question
Do political criteria explain the variation in Bolsa Familia’s coverage across municipalities?
There are theoretical (Cox and McCubbins 1986, Dixit and Londregan 1996, Lindbeck and Weibull 1987) and empirical (Ames 1987, Levitt and Snyder 1995, Schady 2000, Dahlberg and Johansson 2002, Stokes 2004, Kitschelt 2010) reasons to believe that political spending is often targeted, especially given Brazil’s history with clientelism and pork.

An Illustrative Example: Bolsa Familia
How do politicians target?
“Core”
“Swing”
Mobilization

Descriptive Statistics
Variables
  Dependent Variable(s)
  Independent Variable(s)
  Important Control Variable(s)
Graphs
Summary Statistics on Key Variables
  Number, Mean, Minimum, Maximum, Standard Deviation
Cross-Tabs
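
The summary statistics listed above can be produced with a short helper. This is a minimal sketch using Python's standard `statistics` module; the function name `describe` and the values are hypothetical, and missing observations are assumed to be coded as `None`.

```python
from statistics import mean, stdev

def describe(values):
    """Summarize one variable: N, mean, SD, min, max, and number of missing values."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": mean(observed),
        "sd": stdev(observed),  # sample standard deviation (n - 1 denominator)
        "min": min(observed),
        "max": max(observed),
        "missing": len(values) - len(observed),
    }

# Hypothetical vote-share values; None marks a missing observation.
vote_share = [0.10, 0.20, None, 0.30, 0.40]
summary = describe(vote_share)
print(summary)
```

Reporting the number of missing cases alongside the other statistics lets readers judge how much of the sample each variable actually covers.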

An Illustrative Example: Bolsa Familia
Descriptive Statistics
Columns: Mean, Stand. Dev., Min, Max, Missing
Dependent Variable
  Coverage in
Explanatory Variables
  PT Vote Share for Deputado Federal
  PT Vote Share for President

An Illustrative Example: Bolsa Familia
Key Variables
Coverage in 2009: This continuous variable is the ratio of recipients over the number estimated to be poor in each municipality in November of
PT Voteshare for Deputado Federal: This continuous variable captures a core targeting strategy and measures average PT vote share for federal deputy across the 2002 and 2006 elections.
PT Voteshare for President: This continuous variable captures a core targeting strategy and measures average PT vote share for president across the 2002 and 2006 elections.

An Illustrative Example: Bolsa Familia
Descriptive Statistics
Columns: Mean, Stand. Dev., Min, Max, Missing
Explanatory Variables
  PT Mayor in
  Base Mayor in
  Change in Support for PT Presidential Candidate
  Close Presidential Election in

So, how do I analyze my data?
Correlational design
  Correlation allows you to quantify relationships between variables (r, r-squared)
  Correlation, partial correlation
  Regression allows you to predict scores on one variable from subjects' scores on one or more other variables
Group differences
  t-test & ANOVA
  Chi-square for categorical and frequency data
Significance v. effect size
Simulations
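
Two of the statistics named above can be computed directly from their definitions. The sketch below shows Pearson's r (and r-squared) for two continuous variables and the chi-square statistic for a table of counts. All data are hypothetical, and in practice a statistical package would also report p-values.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical continuous variables.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, r-squared = {r ** 2:.3f}")

# Hypothetical 2x2 table of counts (e.g. group by outcome).
print(f"chi-square = {chi_square([[10, 20], [20, 10]]):.3f}")
```

A statistically significant r can still be small; r-squared gives a rough sense of effect size by indicating the share of variance the two variables have in common.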

Methods of Analysis (Empirical Strategy)
We discussed this in Part I, but one generally devotes a section to explaining how one will identify a causal relationship prior to the results section.
$$\text{Coverage} = \beta_0 + \beta_1 (\text{political criteria}) + \beta_X X + e$$
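
A model of this form, with an intercept, a variable of interest, and controls, can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python. The function `ols` and all data are hypothetical; the outcome is constructed without noise from known coefficients so that the fit can be checked by eye, and this is not the workshop's estimation code.

```python
def ols(X, y):
    """Least-squares coefficients solving the normal equations (X'X) b = X'y.

    Each row of X should begin with 1.0 so that the first coefficient is the intercept.
    Raises ZeroDivisionError if the columns of X are perfectly collinear.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical rows: [intercept term, political criterion, control variable].
X = [[1.0, 0.1, 1.0], [1.0, 0.2, 0.0], [1.0, 0.3, 1.0],
     [1.0, 0.4, 0.0], [1.0, 0.5, 1.0]]
# Outcome built from known coefficients (0.9, -0.5, 0.1) with no error term.
y = [0.9 - 0.5 * row[1] + 0.1 * row[2] for row in X]

coefs = ols(X, y)
print([round(c, 3) for c in coefs])
```

With real data the error term is not zero, so a statistical package that also reports standard errors is the practical choice.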

An Illustrative Example: Bolsa Familia
Results: Explaining Coverage in 2009
Explanatory Variable                                 Regression Coefficient
Core Indicators
  PT Vote Share for Deputado Federal                 -.473***
  PT Vote Share for President                        -.0972***
  PT Mayor                                           -.0241**
  Base Mayor                                         -.0208***
Swing Indicators
  Change in Support for PT Presidential Candidate    -.175***
  Close Presidential Election

Effect of Standard Deviation Shift of Explanatory Variables on Coverage in 2009
Shift Explained by Political Criteria                Effect of Shift in Support
PT Vote Share for Deputado Federal
PT Vote Share for President
PT Mayor*
Base Mayor*
Change in Support for PT Presidential Candidate
Close Presidential Election in 2006*                 0.007
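
One common way to express an effect of this kind, assuming a linear model, is to multiply a regression coefficient by the standard deviation of its explanatory variable. The sketch below does this with a hypothetical coefficient and hypothetical data; neither is taken from the workshop's results, and the function name `sd_shift_effect` is made up for illustration.

```python
from statistics import stdev

def sd_shift_effect(coefficient, values):
    """Predicted change in the outcome when the explanatory variable rises by one SD."""
    return coefficient * stdev(values)

# Hypothetical coefficient and vote-share data, not the workshop's estimates.
hypothetical_coefficient = -0.5
hypothetical_vote_share = [0.10, 0.20, 0.30, 0.40]

effect = sd_shift_effect(hypothetical_coefficient, hypothetical_vote_share)
print(f"One-SD shift changes the outcome by {effect:.3f}")
```

Expressing effects this way puts variables measured on different scales on a comparable footing, which helps readers judge substantive rather than only statistical significance.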

Robustness Identify Threats to Inference! (Do I have any?)

Robustness Check: Relationship between Coverage in 2004 and Prior Elections
Shift Explained by Political Criteria                            Effect of Shift in Support
PT Vote Share for Deputado Federal in
PT Vote Share for President in
PT Mayor in 2000*                                                0.002
Base Mayor in 2000*                                              0.005
Change in Support for PT Presidential Candidate (1998 to 2002)
Close Presidential Election in 2002*                             -0.016

Putting Output into a Paper
Cut and Paste
Graphs
  Cut and paste into word processing document
  Save as .jpeg or .tif file
Tables
  Cut and paste
  Format in word processing document
  Import into Excel, format, and then place in Word

More Advanced Analysis
Multivariate techniques are only a start; they do help to account for confounding factors, allow for testing change over time and more complex hypotheses … (See: Tabachnick & Fidell, Using Multivariate Statistics)
1) Be honest about your abilities.
2) Ask for help.
3) Best off including techniques that you fully understand, but it may be worth learning something new!

Take Away Messages
1) Begin by thinking about what question interests you.
2) Look for data and consider appropriate methods; identify what hypotheses are actually testable.
3) Design and run analysis; keep a codebook/syntax files!
4) Back up data.
5) Ask for help, especially when choosing a method, and seek feedback on research design.
6) Research and writing are an iterative process.