Labour Market Evaluation: Theory and Practice
Seamus McGuinness, 11th November 2011

Why is Evaluation Necessary?
Evaluation assesses the extent to which policy initiatives are achieving their expected targets and goals; drawing on this, the evaluator will identify the nature of any shortfalls in either programme delivery or the stated objectives. Value for money from the perspective of the taxpayer is also likely to prove a dominant feature of any evaluation. Evaluation fulfils a vital policy-challenge role within society, helping to ensure that policy is evidence-based and that ineffective programmes are modified or closed.

Challenges Facing Evaluators
Lack of an evaluation culture: policy makers may view evaluation as a threat and actively seek a less rigorous form of assessment. Stemming from this, little consideration is often given to evaluation at the programme design and implementation stage (often leaving no viable control group against which to assess the counterfactual). Data constraints: a lack of available and "linkable" administrative datasets.

What Are the Most Common Forms of Labour Market Evaluation?
Labour economists generally tend to focus on impact evaluation (is the programme achieving its desired impacts?). Process evaluation (is the programme being delivered as intended?) is less common. However, in practice most impact evaluations will also consider the efficiency of programme delivery and implementation. The bulk of impact evaluations focus on labour market programmes designed to improve outcomes related to employment, earnings and labour market participation.

Measuring a Programme's Impact
This is not at all straightforward: there have been instances where different researchers have arrived at very different conclusions regarding a programme's impact. We basically need to know what would have happened to individuals had the programme not been in place, i.e. we attempt to measure the counterfactual. Various methods are used for estimating the counterfactual; however, they all generally rely on measuring the difference in outcomes between people participating in the programme (the treatment group) and those eligible for the programme but not participating in it (the control group).
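In standard potential-outcomes notation (a textbook formulation, not spelled out on the slide), the quantity of interest is the average treatment effect on the treated:

\[
ATT = E[\,Y_1 - Y_0 \mid D = 1\,] = E[\,Y_1 \mid D = 1\,] - E[\,Y_0 \mid D = 1\,],
\]

where \(D = 1\) denotes participation and \(Y_1, Y_0\) are outcomes with and without the programme. The second term, \(E[Y_0 \mid D = 1]\), is the unobservable counterfactual; the control group's mean outcome stands in for it, which is valid only if the two groups are otherwise comparable.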

The Selection Problem
Comparing a treatment and control group is not straightforward, as substantial differences may exist between the two groups that must be factored out, since assignment to either is rarely random. Such differences can also arise as a consequence of ineffective control group construction. Non-random selection refers to the possibility that (a) programme administrators engage in "picking winners" in order to ensure the programme's success, or (b) more capable individuals are more likely to put themselves forward for intervention. Failure to account for this will result in a serious over-estimate of the programme's effectiveness.

Ineffective Control Group Construction
In evaluating the National Employment Action Plan (NEAP) in 2005, Indecon consultants compared a treatment group of 1,000 NEAP claimants (by definition first-time claimants) with a control group of 225 unemployed (non-NEAP) individuals taken from the ECHPS, 58% of whom were already long-term unemployed at the initial point of observation. By definition, none of the NEAP treatment group will have been long-term unemployed. Indecon then compared the unemployment rates of the control and treatment groups 24 months down the line and concluded that the treatment group fared much better and that the NEAP programme was, therefore, effective. Does this represent a like-for-like comparison?

Methods Used for Overcoming the Selection Problem
Difference-in-difference estimator: a two-period estimator that requires the treatment to be introduced in the second time period. More powerful than it may seem, as it will eradicate non-random selection based on unobserved attributes (picking winners etc.). Matching estimators: attempt to match control and treatment group members on observable characteristics (education, age, labour market history etc.) to ensure a like-for-like comparison (consider the earlier NEAP example), but may still be prone to unobserved influences. Other methods do exist, such as the use of controlled experiments, but these are rarely seen in the context of labour market evaluation.

Difference in Difference
Period 1: outcome Y (say earnings) is determined by observable characteristics X (age, education, labour market experience etc.) and unobservable factors I that do not change over time (innate ability, motivation etc.). Period 2: the outcome is determined as in period 1, but a labour market training programme (a treatment T) is now present. By differencing across the same individuals in the two periods we can both isolate T and remove the impact of time-invariant (and often unobserved) factors.
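In equations, the logic runs as follows (a minimal sketch; the notation is assumed rather than taken from the slide):

\[
Y_{i1} = \beta X_{i1} + \gamma I_i + \varepsilon_{i1}, \qquad
Y_{i2} = \beta X_{i2} + \gamma I_i + \delta T_i + \varepsilon_{i2}
\]
\[
\Delta Y_i = Y_{i2} - Y_{i1} = \beta \Delta X_i + \delta T_i + \Delta \varepsilon_i
\]

The time-invariant unobservable \(I_i\) drops out of the differenced equation, so \(\delta\), the coefficient on treatment, is purged of selection on fixed unobserved attributes.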


Example of a Difference-in-Difference Approach
Say we plan to introduce a new unemployment activation measure in June 2013 in County Dublin. Our control group would be the rest of the country, which does not receive the measure (until perhaps 2014). We would estimate a model comparing exits from unemployment in Dublin with the rest of Ireland over both periods. The extent of any change in the margin of difference in Dublin exit rates (relative to the rest of Ireland) over the two periods is interpreted as the impact of the programme.

Model Estimation
dt = dummy variable for the treatment group (Dublin area); it will pick up any differences between the treatment and control groups prior to the policy change. T is a dummy variable for time period 2 and measures the extent to which the value of y rose or fell in period 2 independent of anything else. dt*T equals 1 for individuals in the treatment group receiving the intervention in the second period; it is therefore the measure of the impact of the policy.
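Written as a single regression (a standard formulation consistent with the terms defined above):

\[
y_{it} = \beta_0 + \beta_1\, dt_i + \beta_2\, T_t + \beta_3\, (dt_i \times T_t) + \varepsilon_{it},
\]

where \(\beta_3\), the coefficient on the interaction term, is the difference-in-differences estimate of the policy impact.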

Difference in Difference
A really powerful tool for eradicating unobserved bias ("picking winners", self-selection etc.) that requires relatively little data. It does, however, require that policy be implemented in a rolled-out fashion, e.g. across regions over time, which is not always appreciated by policy makers. Is it sufficient to deal with selection bias on observables?
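As a concrete illustration of the Dublin example, a minimal estimation sketch in Python using statsmodels (all file, column and variable names are hypothetical):

```python
# Minimal difference-in-differences sketch (illustrative only).
# Assumes a long-format DataFrame with one row per person-period and
# hypothetical columns: 'exited' (1 = exited unemployment), 'dublin'
# (1 = treatment region), 'post' (1 = period after June 2013) and
# 'region' (a region identifier for clustering).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("live_register_panel.csv")  # hypothetical input file

# The coefficient on dublin:post is the diff-in-diff estimate of the
# activation measure's impact on exit rates.
model = smf.ols("exited ~ dublin + post + dublin:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["region"]}
)
print(model.summary())
```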

Propensity Score Matching
This technique allows us to deal explicitly with differences in the characteristic make-up of the control and treatment groups that have the potential to bias our estimate of the programme impact. For example, say we have an active labour market programme aimed at reducing unemployment and the control group contains a higher proportion of long-term unemployed. Failure to control for this will upwardly bias the estimated programme impact, as the control group, almost by definition, has a lower likelihood of labour market success even before any programme impacts begin. Chances are that if you compare the proportions of both groups in employment at a future point in time, in the absence of any labour market programme, the treatment group will have performed better. Thus the problem we must confront is that the estimated programme impact may simply be driven up by, or entirely attributable to, differences in the characteristic make-up of our control and treatment groups.

What Does PSM Do?
It is a method that allows us to match the treatment and control groups on the basis of observable characteristics to ensure we are making a like-for-like comparison. After matching has been completed, we simply compare the mean outcomes (e.g. employment rates) of the control and treatment groups to see which is highest.

How Do We Match?
We estimate a probit (1,0) model of treatment-group membership. This identifies the main characteristics that separate the control group from the treatment group. Every member of the control and treatment groups is then given a probability of being assigned to the treatment group based on their characteristics. Each member of the treatment group is then "matched" with a member of the control group with a similar probability score; it can be shown that matching on the probability score is equivalent to matching on the actual characteristics. This process ensures that the treatment and control groups are similar in terms of their observable characteristics.
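A stylised sketch of these steps under stated assumptions (column names are hypothetical; one-to-one nearest-neighbour matching with replacement is only one of several matching schemes):

```python
# Propensity score matching sketch (illustrative only; column names
# such as 'treated' and 'employed_12m' are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("neap_sample.csv")          # hypothetical input file
covars = ["age", "education_yrs", "male"]    # hypothetical observables

# Step 1: probit of treatment-group membership on observables;
# the fitted probabilities are the propensity scores.
X = sm.add_constant(df[covars])
pscore = np.asarray(sm.Probit(df["treated"], X).fit(disp=0).predict(X))

# Step 2: one-to-one nearest-neighbour matching (with replacement)
# on the propensity score.
mask = df["treated"].to_numpy() == 1
ps_t, ps_c = pscore[mask], pscore[~mask]
y_t = df.loc[mask, "employed_12m"].to_numpy()
y_c = df.loc[~mask, "employed_12m"].to_numpy()
nn = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)

# Step 3: the average treatment effect on the treated is the mean
# outcome gap across matched pairs.
att = (y_t - y_c[nn]).mean()
print(f"Estimated ATT: {att:.3f}")
```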

Matching
Again, clearly a powerful tool and the most effective for tackling the sample selection problem. It requires a lot of data, along with additional checks to ensure that matching was successful and that all observable differences between the control and treatment groups were eradicated (see the sketch below). It does not deal with unobserved bias.
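One common post-matching check, continuing the hypothetical names (df, covars, mask, nn) from the matching sketch above, is the standardised mean difference of each covariate before and after matching; values near zero after matching suggest the observable gap has been closed:

```python
# Post-matching balance check (continues the matching sketch above).
def smd(x_treat, x_control):
    """Standardised mean difference: mean gap over pooled std. dev."""
    pooled_sd = np.sqrt((x_treat.var() + x_control.var()) / 2.0)
    return (x_treat.mean() - x_control.mean()) / pooled_sd

for c in covars:
    before = smd(df.loc[mask, c], df.loc[~mask, c])
    after = smd(df.loc[mask, c], df.loc[~mask, c].iloc[nn])
    print(f"{c}: SMD before = {before:.2f}, after matching = {after:.2f}")
```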

Carrots without Sticks: An Evaluation of Active Labour Market Policy in Ireland
Seamus McGuinness, Philip O'Connell & Elish Kelly

Overview
This study assesses the effectiveness of the job search assistance (JSA) component of the National Employment Action Plan (NEAP), Ireland's principal tool for activating unemployed individuals back into the labour market. Under the NEAP, individuals registering for unemployment benefit are "automatically" referred to FÁS for an interview after 13 weeks on the system. The FÁS interview is aimed at helping claimants back into work through advice and placement, and at referring others for further training. Individuals with previous exposure to the NEAP, i.e. those with a previous history of unemployment, are excluded and will not be referred to FÁS a second time. The NEAP was internationally distinctive in that it was characterised by an almost complete absence of monitoring and sanctions; unusually, it did not appear to hinge on the principle of mutual obligation.

Evaluation Objectives
To assess the extent to which individuals participating in the NEAP were more likely to find employment relative to non-participants, and the extent to which individuals in receipt of both interview and training had enhanced employment prospects relative to those in receipt of an interview only (the impact of training). We are going to focus on the effectiveness of the referral and interview process.

Problem 1: No Control Group?
Selection under the NEAP is automated and universal. If all claimants are automatically sent for interview at week 13 of their claim, how can we construct a counterfactual? (Remember, the counterfactual assesses what happens to individuals in the absence of the programme.) The only people not exposed to the programme are those already in employment by week 13, which rules out difference-in-difference for a start. The problem illustrates very clearly that the need for proper evaluation was not a major consideration in the programme's design or implementation.

What Can We Do?
The only option is to exploit the fact that individuals with previous exposure to the NEAP cannot access it again (a totally counter-intuitive rule, as it meant those most in need of support were excluded from the outset). We take as an initial control group individuals whose previous exposure to the NEAP came more than two years prior to the study and whose contact was limited to a FÁS interview. Given the time lapse and changing macroeconomic conditions, any advice received by the control group should have declined in relevance, allowing some assessment of the impact of the programme. Still, even if the above were true, we are left with a selection problem: prior to the study, all of the control group will have had a previous unemployment spell of at least 13 weeks, whereas none of the treatment group will. This difference cannot be eradicated by matching, and our estimates are unlikely to be free of bias.

Construction of the Evaluation Data
The dataset for the NEAP evaluation was constructed by linking:
- the weekly population of Live Register claimants;
- the weekly population of Live Register claimant closure files;
- profiling questionnaire information for the claimant population (issued June to September 2006);
- the Live Register claimant population (September 2006 to June 2008);
- FÁS event histories.

New Control Group Found?
On linking the data, we found that around 25% of new claimants were not being referred by the DSP to FÁS after 13 weeks of unemployment, despite these individuals having no previous exposure to the NEAP. We needed to establish what was going on: were we missing something in terms of the referral process and, if not, what factors were driving the omission and were they random? A list containing the PPS numbers of our potential new control group was sent to the DSP for validation.

Validation Checks
The DSP confirmed that these individuals had fallen through the net; no concrete explanation was found. Most likely, individuals were not referred when the number of referrals in a DSP office exceeded the slots in the local FÁS office, and were subsequently overlooked when slots became available. So even before beginning, we had uncovered major problems with programme processes: 25% of potential claimants excluded and a further 25% missed. A clear example of how process evaluation becomes a component of an impact evaluation.

The Control Group: A Natural Experiment?

Why 20 Weeks and Not 13?
NEAP activation is dependent on the individual being on the register for at least 13 weeks. To ensure a like-for-like comparison, we should ensure that our control groups have also been on the register for a similar period. Given that backlogs are likely to exist in large organisations, we raise the threshold to 20 weeks to ensure that individuals are not assigned to a control group simply because they have not yet received an interview.

Data and Methods
In terms of econometrics, we estimate probit and matching models augmented by additional checks for unobserved heterogeneity bias. All models contain a wide range of controls for educational attainment, health, location attributes, access to transport, age, marital status, labour market history etc., which were available to us as a consequence of the profiling data.

How random are our control groups? Is there a selection problem?

Employment history by group (Total Sample, Treatment Group, Control Group I, Control Group II): shares employed in the last month, the last year, the last 5 years, over 6 years ago, and never employed. (The figures themselves did not survive in the transcript.)

What Are the Descriptives Telling Us?
The treatment group and control group I look very similar, which suggests that the "process" that generated control group I was random in nature. There are more substantial differences between the treatment group and control group II, in that the latter tends to be more disadvantaged in terms of observable characteristics. There is potential for selection bias here.

Kaplan-Meier Survival Estimate (figure)
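The curves themselves do not survive in the transcript; for reference, survival curves of this kind could be drawn with the lifelines package (a sketch; column names are hypothetical):

```python
# Kaplan-Meier survival curves by treatment status (illustrative;
# 'weeks_on_register' and 'exited' are hypothetical column names).
import pandas as pd
from lifelines import KaplanMeierFitter

df = pd.read_csv("neap_sample.csv")  # hypothetical input file
kmf = KaplanMeierFitter()
ax = None
for label, grp in df.groupby("treated"):
    kmf.fit(grp["weeks_on_register"], event_observed=grp["exited"],
            label=f"treated={label}")
    ax = kmf.plot_survival_function(ax=ax)  # overlay the curves
ax.figure.savefig("km_survival.png")
```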

Regular Probit
These models give us an initial estimate, which may or may not be biased, of the effectiveness of the NEAP. We want to see that the data are sensible and that relationships move in the expected direction; this is important both for the probit estimates themselves and for the reliability of any subsequent matching, and it provides assurance that there is nothing odd happening in our data. Note: the models measure the impact of the variables on the claimant's probability of exiting the Live Register before 12 months.
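A sketch of such a probit in statsmodels (formula and variable names are hypothetical stand-ins for the profiling controls described above), reporting average marginal effects so that coefficients can be read as changes in the exit probability:

```python
# Probit of exiting the Live Register within 12 months (illustrative).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("neap_sample.csv")  # hypothetical input file
res = smf.probit(
    "exited_12m ~ treated + male + C(age_band) + C(education) + C(health)",
    data=df,
).fit(disp=0)

# Average marginal effects: read each as the change in the probability
# of exiting within 12 months associated with that regressor.
print(res.get_margeff().summary())
```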

Probit estimates (standard errors in parentheses; here and in the two continuation tables, cells shown as "…" were lost in transcription):

Variable | Model 1: Both CGs | Model 2: Control Group I | Model 3: Control Group II
NEAP Intervention: FÁS Referral plus Interview | -0.07*** (0.013) | -0.16*** (0.017) | 0.02 (0.001)
Personal and Family Characteristics:
Male | 0.06*** (0.014) | 0.08*** (0.016) | 0.06*** (0.015)
Age (reference category: youngest age band; band labels lost)
Age band (i) | …*** (0.018) | …** (0.023) | … (0.020)
Age band (ii) | …*** (0.020) | -0.09*** (0.026) | -0.10*** (0.022)
Age band (iii) | …*** (0.023) | -0.08*** (0.029) | … (0.025)
Age 55+ Years | -0.22*** (0.022) | -0.26*** (0.026) | -0.18*** (0.025)
Health (reference category: Bad/Very Bad Health)
Very Good Health | 0.15** (0.065) | … (0.082) | … (0.070)
Good Health | 0.11* (0.067) | … (0.084) | … (0.073)
Marital Status (reference category: Single)
Married | … (0.021) | …* (0.025) | -0.02 (0.023)
Cohabits | … (0.030) | … (0.035) | … (0.032)
Separated/Divorced | -0.06** (0.030) | -0.07** (0.037) | -0.06* (0.032)
… (label lost) | … (0.075) | … (0.083) | … (0.084)
Children | -0.04*** (0.010) | … (0.012) | … (0.011)

Probit estimates (continued):

Variable | Model 1: Both CGs | Model 2: Control Group I | Model 3: Control Group II
Spousal Earnings (reference category: None)
Spouse Earnings €… (lowest band, range lost) | …*** (0.036) | 0.13*** (0.041) | 0.16*** (0.040)
Spouse Earnings €251-€… | … (0.090) | … (0.099) | … (0.094)
Spouse Earnings €351 and Above | -0.05** (0.023) | -0.06** (0.027) | -0.05* (0.025)
Human Capital Characteristics:
Education (reference category: Primary or Less)
Junior Certificate | … (0.021) | … (0.026) | … (0.022)
Leaving Certificate | 0.05** (0.021) | 0.10*** (0.026) | 0.04* (0.023)
Third-level | 0.15*** (0.023) | 0.17*** (0.028) | 0.14*** (0.025)
Apprenticeship | …* (0.018) | … (0.022) | … (0.020)
Literacy/Numeracy Problems | -0.06*** (0.024) | -0.06** (0.030) | … (0.025)
English Proficiency | … (0.035) | … (0.040) | … (0.038)
Employment/Unemployment/Benefit History:
Employment History (reference category: Never Employed)
Employed in Last Month | 0.08** (0.040) | 0.09* (0.049) | 0.10** (0.043)
Employed in Last Year | …* (0.042) | … (0.051) | … (0.046)
Employed in Last 5 Years | … (0.042) | … (0.053) | … (0.047)

Probit estimates (continued):

Variable | Model 1: Both CGs | Model 2: Control Group I | Model 3: Control Group II
Job Duration (reference category: Never Employed)
Job Duration Less than a Month | 0.09* (0.046) | 0.12** (0.057) | 0.05 (0.049)
Job Duration 1-6 Months | 0.11*** (0.038) | 0.16*** (0.048) | 0.10** (0.041)
Job Duration 6-12 Months | 0.09** (0.040) | 0.13*** (0.051) | 0.06 (0.043)
Job Duration 1-2 Years | 0.07* (0.041) | 0.14*** (0.051) | 0.04 (0.043)
Job Duration 2+ Years | … (0.038) | … (0.048) | … (0.040)
Would Move for a Job | 0.05*** (0.013) | … (0.016) | … (0.014)
Social Welfare Payment Type (reference category: Jobseeker's Benefit)
Jobseeker's Allowance | -0.18*** (0.015) | … (0.018) | -0.17*** (0.016)
Signing on the Live Register for 12 Months Plus | -0.19*** (0.017) | -0.18*** (0.034) | -0.13*** (0.019)
On CE Scheme for 12 Months Plus | -0.14*** (0.046) | -0.22*** (0.063) | -0.12** (0.048)
Geographic Location Information:
Location (reference category: Rural)
Village | … (0.021) | … (0.025) | … (0.023)
Town | … (0.021) | … (0.024) | … (0.022)
Large Town/City | … (0.021) | … (0.025) | … (0.022)

Selection Bias (see handout)

Checking our Assumptions - I

Checking our Assumptions - II

Summary and Conclusions - I
There is strong and consistent evidence that JSA delivered under the NEAP was highly ineffective and actively reduced transitions off the Live Register into employment. Two possibilities arise: (i) claimants received poor advice, or (ii) claimants relaxed the intensity of their job search on learning of the absence of monitoring and sanctions. The advice explanation is not supported by the results, as we would expect the negative impact to fall away in medium-term models as claimants adjusted their behaviour.

Summary and Conclusions - II
We conclude that participants attending the interview quickly learnt that their prior fears with respect to the extent of job search monitoring and sanctions were unjustified, and consequently lowered their job-search activity levels. Note: the analysis was found to be robust to the influences of both sample selection and unobserved heterogeneity, and strong negative JSA effects were also generated using other estimation techniques (a Cox proportional hazard model).

How Reliable Are Our Results?
We controlled for a wide range of observables, implying that unobserved factors should be less of a concern, and sensitivity tests seemed to confirm this. We also had a highly representative control group. Still, the PSM framework, while allowing us to test the sensitivity of estimates to unobserved bias, does not eradicate it. We are seeing the increased use of combined PSM and diff-in-diff methods to ensure that evaluation estimates are free from both selection bias (on observables) and unobserved bias (picking winners etc.); see the sketch below.
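In outline, the combined estimator applies the diff-in-diff over matched pairs (a standard formulation, not detailed in the slides):

\[
\widehat{ATT} = \frac{1}{N_T} \sum_{i \in T} \Big[ (Y_{i2} - Y_{i1}) - (Y_{m(i)2} - Y_{m(i)1}) \Big],
\]

where \(m(i)\) is the control matched to treated individual \(i\) on the propensity score: the matching removes selection on observables, while the differencing removes time-invariant unobserved differences.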