
1 Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs

2 Goals of Meeting
Review design:
--research questions and conceptual framework
--choice of comparison group
--data sources and outcome measures
--methods
--possible options for timing and sample selection
Respond to comments
Discuss proposed options and possible modifications to options

3 Goals of LRPs and of Evaluation
To increase number of individuals conducting research in certain fields, NIH implemented 5 extramural loan repayment programs (LRPs):
Clinical (began in 2002)
Clinical for those from disadvantaged backgrounds (2001)
Pediatric (2003)
Health disparities (2001)
Contraception and infertility (1997)
Evaluation objective: assess whether programs are achieving their goals of recruiting and retaining researchers in these fields

4 Evaluation Research Questions
Do LRPs have a “recruitment effect” -- increase number of individuals who begin research careers in the designated LRP field?
Do LRPs have a “retention effect” -- increase length of time individuals conduct research in LRP field, or in any biomedical field?
Do LRPs have a “productivity effect” -- make awardees more successful than they would have been without the program?

5 Conceptual framework for how LRPs might affect outcomes
Extramural LRPs might affect:
Recruitment into research field, if individuals know about, and are motivated by, LRPs prior to choosing to pursue research career
Research retention in LRP field (or in any field), by relieving financial pressures that could otherwise cause individuals to leave research for higher-paying positions
Research productivity, by enabling individuals to devote more time and focus to research

6 Choosing a comparison group
To determine what would have happened to extramural LRP awardees absent the program, we need a comparison group.
Should comparison group be “external” (outside the applicant pool) or “internal” (from the applicant pool)?

7 Why an external comparison group is not feasible
The comparison group would need to be broadly defined because LRP applicants come from such a wide variety of backgrounds.
Recruitment might be measured by comparing all doctoral degree recipients who were barely eligible to those who were barely ineligible according to debt-to-salary ratio. But:
--For MDs, the sample size needed would be enormous, since the proportion of MDs conducting research in a particular field is so small.
--For PhDs, sample sizes in available data sources are not large enough to detect even the maximum possible impact of the extramural LRPs.
Retention might be measured with an external comparison group, but matching the diverse backgrounds and work experiences of LRP participants would be difficult.

8 Attractive Features / Possible Concerns of Internal Comparison Group
Attractive Features
All applicants were interested in LRP, and awardees / non-awardees have similar characteristics.
Administrative data available for full sample.
Possible Concerns
Selection Bias: Can we control for likelihood that funded applicants are more promising researchers than non-funded applicants?
Will sample sizes be large enough to detect program effects?
Could recruitment be measured, since all applicants must have positions in field to be eligible?

9 Overcoming Selection Bias
If scoring process is known and measured, we can use statistical models to obtain unbiased program effects. Regression discontinuity design can be used if a score cut-off point or range is used to make funding decisions.
LRP scoring process is suitable for regression discontinuity design because:
Applicants are scored on the basis of their research potential according to standardized criteria.
ICs seem to fund all those above a funding cut-off point or range. (Sometimes ICs go strictly by score in determining who to fund; other times, ICs look at all scores close to “payline” and may choose applicants with lower scores who are doing research in areas of particular interest.)
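The sketch below illustrates the basic discontinuity estimate with entirely hypothetical data and variable names (a review score, a payline of 60, a binary follow-up outcome); it is a minimal example of the design, not the evaluation's actual model or data.

```python
# Illustrative sharp regression discontinuity sketch (hypothetical data and names).
# Each applicant has a review score, a known payline (cutoff), and a binary outcome
# such as "still conducting research in the LRP field at follow-up".
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 800
score = rng.uniform(0, 100, n)             # hypothetical review score
cutoff = 60.0                              # hypothetical payline
funded = (score >= cutoff).astype(int)     # sharp funding rule assumed for illustration
# Simulated outcome: baseline trend in score plus a 10-point jump at the cutoff
outcome = (rng.uniform(0, 1, n) < 0.30 + 0.003 * score + 0.10 * funded).astype(int)

df = pd.DataFrame({"outcome": outcome, "funded": funded,
                   "score_c": score - cutoff})   # center score at the cutoff

# Local linear specification: separate slopes on each side of the cutoff,
# restricted to a bandwidth around the payline.
bandwidth = 20
local = df[df["score_c"].abs() <= bandwidth]
model = smf.ols("outcome ~ funded + score_c + funded:score_c",
                data=local).fit(cov_type="HC1")
print(model.params["funded"])   # estimated discontinuity = program effect at the cutoff
```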

10 Hypothetical Effect of Extramural LRP on Length of Time in Research Career

11 What size program effects could be detected with available sample sizes?
Sample sizes large enough to detect whether 5 LRPs collectively had effect of 10 percentage points
An effect of 15 percentage points could be detected for certain subgroups
Effects of 10 to 20 percentage points could be detected for the larger LRPs (and would be able to report outcomes for all applicants in each LRP)
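For context, a back-of-the-envelope version of the power calculation behind figures like these is sketched below, assuming a simple two-group comparison of a binary outcome with a 50% baseline rate and placeholder sample sizes; the counts are illustrative, not the evaluation's actual cohorts.

```python
# Illustrative minimum detectable effect (MDE) calculation for a binary outcome,
# using placeholder sample sizes and a 50% baseline rate (the worst case for variance).
import numpy as np
from scipy import stats

def mde_proportion(n_treat, n_comp, p0=0.5, alpha=0.05, power=0.80):
    """Approximate MDE, in percentage points, for a difference in proportions."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = stats.norm.ppf(power)
    se = np.sqrt(p0 * (1 - p0) * (1 / n_treat + 1 / n_comp))
    return (z_alpha + z_beta) * se * 100      # in percentage points

# Hypothetical pooled sample: 600 funded vs. 300 non-funded applicants
print(round(mde_proportion(600, 300), 1))     # about 10 percentage points under these assumptions
```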

12 Measuring Recruitment Effects
Recruitment effect is difficult to measure through comparison to non-funded applicants, because all applicants must already have research funding in the relevant field before applying
But, retrospective survey could gauge:
-- whether applicants knew about LRP before taking research position
-- extent to which LRP influenced decision
-- how they gauged chances of receiving award

13 Outcome Measures
Ideally, we could measure LRPs’ effect on whether individuals:
Conducted research in LRP field (and persistence)
Conducted research in any field (and persistence)
Devoted > 50% of time to research in LRP field or any field
Obtained an NIH R01 grant
Were PIs on NIH grant or any grant
Had NIH research funding or any research funding
Applied for NIH funding
Had tenured academic position
Conducted research in nonprofit or government setting
Were peer reviewers for NIH
Were peer reviewers for journals
Had publications
Had their work cited

14 Data Sources
Applicant data from OLRS
Publications databases such as PubMed
Funding databases, such as NIH’s IMPAC-II
Proposed survey of past applicants

15 Reasons for Proposed Survey
Critical information (e.g., field of research, non-NIH funding, being part of a research team) not readily available from secondary sources.
Ability to track publications without name-matching.
Data can be collected sooner (do not have to wait for publication time delays).
Response bias can be gauged by comparing program effects on certain outcomes (such as whether PI on NIH grant) for full sample to sample of survey respondents.
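As a rough illustration of that last check, the sketch below estimates the same discontinuity effect on an administratively observed outcome (a hypothetical "PI on NIH grant" flag) once for all applicants and once for survey respondents only; the data, response rate, and column names are invented for the example.

```python
# Illustrative nonresponse-bias check (hypothetical data): estimate the same
# discontinuity effect twice -- for all applicants and for survey respondents only --
# and compare the two estimates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 900
score = rng.uniform(0, 100, n)
funded = (score >= 60).astype(int)                      # hypothetical payline
responded = rng.uniform(0, 1, n) < 0.7                  # ~70% survey response assumed
pi_nih = (rng.uniform(0, 1, n) < 0.20 + 0.002 * score + 0.08 * funded).astype(int)

df = pd.DataFrame({"pi_nih": pi_nih, "funded": funded,
                   "score_c": score - 60, "responded": responded})

spec = "pi_nih ~ funded + score_c + funded:score_c"
full = smf.ols(spec, data=df).fit(cov_type="HC1")
resp = smf.ols(spec, data=df[df["responded"]]).fit(cov_type="HC1")

# A large gap between the two "funded" coefficients would signal response bias.
print(full.params["funded"], resp.params["funded"])
```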

16 Methods
Regression discontinuity design will be used to obtain unbiased program effects.
Primary analysis would estimate combined effects for all LRPs pooled together.
Model would control for differences between ICs and LRPs (such as scoring patterns or applicant characteristics).
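One way such a pooled specification might look, again with hypothetical data: the discontinuity terms from the earlier sketch plus indicator variables for LRP and IC to absorb differences in scoring patterns and applicant mix. The column names and categories are illustrative, not the evaluation's final model.

```python
# Illustrative pooled regression discontinuity specification (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1200
score = rng.uniform(0, 100, n)
funded = (score >= 60).astype(int)                       # hypothetical common payline
lrp = rng.choice(["clinical", "pediatric", "disparities"], n)
ic = rng.choice(["IC_A", "IC_B", "IC_C"], n)             # placeholder IC labels
outcome = (rng.uniform(0, 1, n) < 0.30 + 0.003 * score + 0.10 * funded).astype(int)

df = pd.DataFrame({"outcome": outcome, "funded": funded,
                   "score_c": score - 60, "lrp": lrp, "ic": ic})

# Pooled specification: discontinuity terms plus LRP and IC indicators.
pooled = smf.ols("outcome ~ funded + score_c + funded:score_c + C(lrp) + C(ic)",
                 data=df).fit(cov_type="HC1")
print(pooled.params["funded"])   # pooled program effect at the cutoff
```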

17 Possible Subgroups
Each of the larger LRPs
MDs vs. PhDs
Those who received NIH funding vs. those who did not
Those who had higher vs. lower debt levels
Those who received their degree recently vs. longer ago

18 Options for Timing of Data Collection
Need to strike a balance between providing information on research careers (which could take years) and providing timely information to policymakers.
Propose measuring early outcomes 4 to 5 years from time of application.
Possibly measure long-term outcomes 7 to 9 years after application.

19 Sample Selection
Propose to include only 2003 and/or 2004 cohorts, since number of non-funded applicants in 2001 and 2002 was so small.
Sample size?
The larger the sample, the more likely we will detect program effects if they exist.
Large sample is particularly important for measuring effects of LRPs separately or for other subgroups.
But, collecting data on large sample will be more costly.

20 Option 1: Include all individuals from 2003 and 2004 cohorts
Able to detect smallest program impacts (about 9 percentage points for survey respondents)
Most costly because it has largest sample
Including 2004 pool means that data collection and analysis would occur later than in Option 2

21 Option 2: Include only the 2003 cohort
Less costly than Option 1, involving half the sample
Data collection occurs a year earlier than in Option 1 or 3
Minimum detectable effects are the largest among the 3 options (10 to 13 percentage points), which reduces ability to measure subgroup impacts

22 Option 3: Clinical LRP only
Middle of the three options in terms of sample size and minimum detectable effects (9 to 11 percentage points)
Would only have results for one LRP
Data would be collected a year later than for Option 2

23 Recommendations and Issues for Consideration
If OLRS desires separate estimates for the large LRPs and for subgroups, implement Option 1
If subgroups are not a priority and/or if timeliness is a priority, implement Option 2
Option 3 is suitable if OLRS wants to detect relatively small program impacts but is concerned about the cost of surveying the full sample from all LRPs
OLRS needs to consider how small an effect it needs to be able to detect. (Would a 7-percentage-point effect be so small that the program would not be considered cost-effective?)

