Using Matching Techniques with Pooled Cross-sectional Data Paul Norris Scottish Centre for Crime and Justice Research University of Edinburgh

Slides:



Advertisements
Similar presentations
High Resolution studies
Advertisements

A Spreadsheet for Analysis of Straightforward Controlled Trials
Research Skills Workshop Designing a Project
Innovation data collection: Advice from the Oslo Manual South East Asian Regional Workshop on Science, Technology and Innovation Statistics.
Measuring the impact of intimate partner violence against women on victims and childrens well-being: An application of Matching Decomposition Techniques.
Impact analysis and counterfactuals in practise: the case of Structural Funds support for enterprise Gerhard Untiedt GEFRA-Münster,Germany Conference:
Presenter Name(s) Issue date National Student.
Autocorrelation and Heteroskedasticity
Handling attrition and non- response in longitudinal data Harvey Goldstein University of Bristol.
Analyzing Intersectionality of Inequalities in Employment
Poverty trajectories after risky life events in Germany, Spain, Denmark and the United Kingdom: a latent class approach Leen Vandecasteele Post-doctoral.
Transitions from independent to supported environments in England and Wales: examining trends and differentials using the ONS Longitudinal Study Emily.
Multilevel modelling short course
Changing Gender Differences in Old Age – Self-Reported Health in UK Yiu-tung Suen DPhil Sociology Candidate University of Oxford
Depression and work incapacity in Scotland: Evidence from the Scottish Health and British Household Panel Surveys Matt Sutton Will Whittaker Health Methodology.
Offending Crime and Justice Survey Stephen Roe Crime Surveys Programme, Home Office Tel:
Using the ESDS surveys to look at change over time Vanessa Higgins ESDS Government Centre for Census and Survey Research (CCSR) University of Manchester.
Peter Lynn ESDS Event, 12 March 2004 Weighting for Longitudinal Surveys Peter Lynn University of Essex, UK.
Melissa Davy Office for National Statistics
Longitudinal LFS Catherine Barham and Paul Smith ONS.
Something Fishy? Uncovering heterogeneity in the distribution of crime victimisation in general populations Tim Hope and Paul Norris SCCJR.
Domestic violence, sexual assault and stalking:
Employment transitions over the business cycle Mark Taylor (ISER)
1 The Social Survey ICBS Nurit Dobrin December 2010.
Postgraduate Course 7. Evidence-based management: Research designs.
(This presentation may be used for instructional purposes)
Relationships Between Two Variables: Cross-Tabulation
How would you explain the smoking paradox. Smokers fair better after an infarction in hospital than non-smokers. This apparently disagrees with the view.
Edouard Manet: The Bar at the Folies Bergere, 1882
Multiple Indicator Cluster Surveys Data Interpretation, Further Analysis and Dissemination Workshop Basic Concepts of Further Analysis.
Employment Trendswww.ilo.org/trends Theo Sparreboom Employment Trends International Labour Organization Geneva, Switzerland Working poverty in the world.
Psychology Practical (Year 2) PS2001 Correlation and other topics.
Household Projections for England Yolanda Ruiz DCLG 16 th July 2012.
Methodological issues in LS analysis of mortality and fertility by ethnic group Bola Akinwale.
Historical Changes in Stay-at-Home Mothers: 1969 to 2009 American Sociological Association Annual Meeting Atlanta, GA August 14-17, 2010 Rose M. Kreider,
Correlational and Differential Research
Cross Sectional Designs
1 Arlene Ash QMC - Third Tuesday September 21, 2010 (as amended, Sept 23) Analyzing Observational Data: Focus on Propensity Scores.
A Tale of Two Sources Bringing Together Scotland’s Crime Statistics Trish Campbell, Justice Analytical SGJusticeAnalys.
Patterns of Police Reporting Amongst Victims of Partner Abuse: Analysis of the SCJS 2008/09 Sarah MacQueen Scottish Centre for Crime and Justice Research.
Today Concepts underlying inferential statistics
Chapter 7 Correlational Research Gay, Mills, and Airasian
The Research Process. Purposes of Research  Exploration gaining some familiarity with a topic, discovering some of its main dimensions, and possibly.
Using the Health Survey for England to examine ethnic differences in obesity, diet and physical activity Vanessa Higgins & Angela Dale Centre for Census.
Measuring Output from Primary Medical Care, with Quality Adjustment Workshop on measuring Education and Health Volume Output OECD, Paris 6-7 June 2007.
Welfare Reform and Lone Parents Employment in the UK Paul Gregg and Susan Harkness.
Parents’ basic skills and children’s test scores Augustin De Coulon, Elena Meschi and Anna Vignoles.
Multiple Imputation (MI) Technique Using a Sequence of Regression Models OJOC Cohort 15 Veronika N. Stiles, BSDH University of Michigan September’2012.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
ANOVA and Linear Regression ScWk 242 – Week 13 Slides.
Sampling, sample size estimation, and randomisation
AFRICA IMPACT EVALUATION INITIATIVE, AFTRL Africa Program for Education Impact Evaluation David Evans Impact Evaluation Cluster, AFTRL Slides by Paul J.
META-ANALYSIS, RESEARCH SYNTHESES AND SYSTEMATIC REVIEWS © LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON.
FCS681 Types of Research Design: Non Experimental Hira Cho.
Inferential Statistics Introduction. If both variables are categorical, build tables... Convention: Each value of the independent (causal) variable has.
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez.
Chapter Eight: Quantitative Methods
4 basic analytical tasks in statistics: 1)Comparing scores across groups  look for differences in means 2)Cross-tabulating categoric variables  look.
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Conducting surveys and designing questionnaires. Aims Provide students with an understanding of the purposes of survey work Overview the stages involved.
Looking for statistical twins
Understanding different types and methods of research
Constructing Propensity score weighted and matched Samples Stacey L
APPROACHES TO QUANTITATIVE DATA ANALYSIS
Chapter Eight: Quantitative Methods
Cross Sectional Designs
The European Statistical Training Programme (ESTP)
Chapter: 9: Propensity scores
3 basic analytical tasks in bivariate (or multivariate) analyses:
Presentation transcript:

Using Matching Techniques with Pooled Cross-sectional Data Paul Norris Scottish Centre for Crime and Justice Research University of Edinburgh

What is Pooled Cross-sectional Survey Data? “In the repeated cross-sectional design, the researcher typically draws independent probability samples at each measurement point” (Menard, 1991, p26) - Asks comparable questions to each sample - Samples will typically contain different individuals - Each sample reflects population at the time it is drawn For more details on this type of data, and possible approaches to analysis, see Firebaugh (1997), Menard (1991) Micklewright (1994) and Ruspini (2002)

Why Use Pooled Cross-sectional Data? Repeated Cross-sectional surveys are much more common than panel based survey data Data available covering a much wider range of topics Researchers often more used to analysing cross- sectional data Cross-sectional data avoids issues such as sample attrition Can give increased sample size for cross-sectional models?

Limitations of Pooled Cross- sectional Data Does not involve following the same individuals over time Most useful for exploring aggregate level change – hard to establish intra-cohort changes Difficult to establish causal order - particularly at the individual level Questions and definitions can change over time For a discussion of the issues confronted when creating a pooled version of the General Household Survey see Uren (2006)

Studying Aggregate Trends n:1992=1013, 1995=815, 1999=746, 2003=1251 Error bars show 95% confidence intervals

Which Shifts Underpin Aggregate Change? Changes in an aggregate pattern can be attributed to two types of underlying shift:- Model Change Effects – the behaviour of individuals (with identical characteristics) changes over time Distributional Effects – the makeup of the “population” changes over time For a more complete description of these terms see Gomulka, J and Stern, N (1990) and Micklewright (1994)

Separating Distribution and Model Change Effects Estimates of distributional and model change effects can be created by considering what outcomes would occur if the behaviour from one time period was applied to the population from different time periods Build up a matrix of predicted outcomes for different behaviours and populations These figures allow us to see what would occur if population was constant and behaviour changed and vice versa For an example of such a matrix see Gomulka, J and Stern, N (1990)

Comparing Reporting to the Police in 1992 with 2002 Reporting Behaviour Mix of Crime Imagine a simple case where the change in crime reported to the police is a function of two factors: The mix of crime (Population distribution) Willingness to report different crimes (Behaviour model)

Estimating Alternative Reporting Rates 1992 Proportion of Crime Reporting Percentage Vandalism Acquisitive Violence Total The missing figures on the previous slide can be calculated by applying the reporting rates for each crime from one year to the crime mix from the other year2002 Proportion of Crime Reporting Percentage Vandalism Acquisitive Violence Total

Estimating Alternative Reporting Rates 1992 Proportion of Crime Reporting Percentage Vandalism Acquisitive Violence Total The missing figures on the previous slide can be calculated by applying the reporting rates for each crime from one year to the crime mix from the other year2002 Proportion of Crime Reporting Percentage Vandalism Acquisitive Violence Total

Updated Matrix With Estimated Reporting Rate Reporting Behaviour Mix of Crime Both the change in the mix of crime and change in reporting behaviour appear to have lowered reporting between 1992 and 2002 Relative impact of distributional and model change effects depends on which year’s data is considered

What is Propensity Score Matching? A method for identifying counterfactual cases across different samples Employs a predicted probability of group membership— e.g., 1993 SCVS verses 2003 SCVS on observed predictors, usually obtained from logistic regression to create a counterfactual group Matches together cases from the two samples which have similar predicted probabilities Once counterfactual group is constructed – outcome is compared across groups For a more complete description of propensity score matching see Sekhon (2007)

Using Propensity Score Matching to Estimate Distributional and Model Effects Reporting Behaviour Mix of Crime The estimates provided by the propensity score matching are identical to those calculated earlier. What a waste of a Thursday afternoon, or is it?

Generalising to More Factors In reality changes in reporting are likely to be a function of more than just the two factors we have considered Need to generalise the outcome matrix Reporting Behaviour PopulationDistribution Much harder to account for multiple factors in manual calculations

Factors Influencing Reporting to the Police The decision to report crime to the police is likely to be a function of many factors Type of Crime Attitude to the Police Quantity of Loss Insurance Age Gender Social Class Income Family Status Injury Relationship to Offender Perceived Threat Culpability Social Context Repeated Incident

Estimates Using “Full” Matching Reporting Behaviour PopulationDistribution Matching on crime type, gender, age, social class, ethnicity, household income, weapon used, threat used, doctor visited, insurance claimed, value of damage/theft, Injury, took place at home, tenure and marital status Change in reporting seems to be most related to distributional changes Estimates appear more consistent across behaviour/distributional mixes Change in population of crimes and victims seems to have lowered reporting rates Reporting behaviour also slipped (but non-significant)

Balanced Samples Propensity score refers to an “overall” indicator of differences between the two samples Important to check characteristics of cases are evenly distributed across samples after matching Still issues of multivariate comparability A more complete discussion of how to asses balance is given in Sekhon (2007)

Generic Matching Achieving balance can prove difficult in propensity score matching Generic matching is one possible approach to this problem Uses an evolutionary algorithm to match cases Aim is to maximise the p-value associated with the covariate which represents the greatest difference between the two samples See Sekhon (2007) "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R."

Generic Matching – Computational Issues Generic matching is very computer intensive (both cpu and memory) R routine can be used on a computer cluster Analysis based on example dataset from Sekhon (2007) contains 185 treatment cases and matches on 10 variables

Strengths of Matching for Separating Distribution and Model Change Effects Intuitively simple – what is the change in outcome if we hold population constant? Applicable to a wide range of data sources Can be implemented in most standard software packages Offers a perspective on social change over time

Weaknesses of Matching for Separating Distribution and Model Change Effects Only considers aggregate level change Success relies on matching on all relevant factors Comparability of data over time can be questioned Issues around reliability of matching:- Can be difficult to achieve accurate matching using regression based methods Generic matching can be computer intensive

Bibliography Sekhon, J (2007) "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Micklewright, J (1994) “The Analysis of Pooled Cross-sectional Data" in Dale, A and Davies, R (1994) Analyzing Social Change. Sage Publishing “The Analysis of Pooled Cross-sectional Data" Menard, S (1991) Longitudinal Research. Sage Publications Uren, Z (2006) The GHS Pseudo Cohort Dataset (GHSPCD): Introduction and Methodology [cited 01/05/2008] Gomulka, J and Stern, N (1990) “The Employment of Married Women in the UK: " in Economica, 57(226): “The Employment of Married Women in the UK: " FireBaugh, G (1997) Analyzing Repeated Surveys. Sage Publications Ruspini, E (2002) Introduction to Longitudinal Research. Routledge