The LEADS Database at ICPSR: Identifying Important Social Science Studies for Archiving
Presentation prepared for the 2006 Annual Meeting of IASSIST
Friday, May 26, 2006
LEADS at ICPSR
- We would like to know the “universe” of social science data that have been collected
Identification
- Data-PASS and ICPSR would like to know how much social science data are “at risk” of being lost or have already been lost
Appraisal
- We would also like to know which “at risk” social science data are important enough to be archived
Or… What have we failed to catch? How big is the “one” that got away?
What is LEADS?
- LEADS is a database of records containing information about scientific studies that may have produced social science data
- LEADS contains descriptive information about the scientific studies that have been identified
- LEADS also contains information that can be used to determine the “fit” and “value” of a scientific study
- LEADS keeps a record of all human (staff) decisions that have been made about the fit and value of a scientific study
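As a rough illustration of the record structure described on this slide, the sketch below models one study with its descriptive fields and an append-only log of staff decisions about fit and value. Every class and field name here is a hypothetical stand-in; the slides do not describe the actual LEADS schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Decision:
    """One staff judgment about a study (hypothetical structure)."""
    staff_member: str
    decided_on: date
    fit: str    # note on whether the study is in scope
    value: str  # note on the study's value for archiving


@dataclass
class StudyRecord:
    """One candidate study; all field names are illustrative assumptions."""
    title: str
    source: str  # e.g. "NSF", "NIH", "nomination"
    abstract: Optional[str] = None
    decisions: List[Decision] = field(default_factory=list)

    def record_decision(self, decision: Decision) -> None:
        # Append rather than overwrite, so earlier decisions stay on file.
        self.decisions.append(decision)

    def latest_decision(self) -> Optional[Decision]:
        # Most recent decision, or None if the record is unscreened.
        return self.decisions[-1] if self.decisions else None
```

Keeping decisions in a list rather than a single field is one simple way to retain the full history of human judgments that the slide mentions.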
Sources of Records in LEADS
- NSF research grant awards downloaded from nsf.gov
- NIH research grant awards downloaded from CRISP
- Prospective searches of topical areas/journals
- Researcher nominations (self or other)
NSF Grant Awards in LEADS (pre-screening)
- LEADS contains 17,194 awards made by NSF
- LEADS spans 30 years of awards to 2005
- LEADS spans 53 NSF organizations that award grants
- Of the 53 organizations, the 4 with the most records screened (each contributing 1,000+ records) were:
  - SES: Social and Economic Sciences
  - BCS: Behavioral and Cognitive Sciences
  - DMS: Mathematical Sciences
  - IOB: Integrative Organismal Biology
Total # NSF Grant Awards by Year
Screening Criteria
- Social science and/or behavioral science
- Original or primary data collection proposed, including assembling a database from existing (archival) sources
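The two criteria above can be read as a simple conjunction, sketched below. This assumes both criteria must hold for an award to pass, and that the individual judgments (the boolean inputs) are made by a human coder reading the abstract; the function and parameter names are hypothetical.

```python
def passes_screening(is_social_or_behavioral: bool,
                     proposes_primary_collection: bool,
                     assembles_database_from_archives: bool = False) -> bool:
    """Combine a coder's judgments into a single screening outcome.

    Assumption: an award passes only if it is social/behavioral science
    AND proposes original data, where assembling a database from existing
    (archival) sources counts as original data collection.
    """
    proposes_original_data = (proposes_primary_collection
                              or assembles_database_from_archives)
    return is_social_or_behavioral and proposes_original_data
```

For example, a social science award that only builds a database from archival sources would pass under this reading, while a secondary analysis of existing data would not.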
Activity in NSF Grants (n=17,194)
Type of Activity Proposed                   %       N
Not Social Science                          47.9    8,237
Training/Workshop/Conf
Social Science Primary Data Collection      13.6    2,336
Secondary Analysis
Primary & Secondary (combination)
No Data Collection or Analysis
No Abstract                                 15.5    2,664
Flagged & Other                             13.0    2,232
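The percentages that survive on this slide follow directly from the counts and the total of 17,194 awards. The short sketch below recomputes them as a consistency check; it covers only the four categories whose counts appear above, and the variable and function names are illustrative.

```python
TOTAL_AWARDS = 17_194

# Only the categories whose counts are legible on the slide.
counts = {
    "Not Social Science": 8_237,
    "Social Science Primary Data Collection": 2_336,
    "No Abstract": 2_664,
    "Flagged & Other": 2_232,
}


def percent_of_total(n: int, total: int = TOTAL_AWARDS) -> float:
    """Share of all screened awards, rounded to one decimal place."""
    return round(100 * n / total, 1)


percentages = {name: percent_of_total(n) for name, n in counts.items()}
```

With these inputs the recomputed shares are 47.9, 13.6, 15.5, and 13.0, matching the slide.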
Types of Research Activity NSF has Awarded by Year
***Abstracts become widely available 1987+***
Most Prevalent Social Science Primary Data Collection Awards by NSF Organization
NSF Organization                          % of total for division    n of awards
Antarctic Sciences Div
Arctic Science Div
Behavioral & Cognitive Sciences
Social & Economic Sciences
Research, Evaluation & Communication
Information & Intelligent Systems
Other Fields Coded During Screening
- Topic/Discipline
- Data Collection Methodology
- Sampling Characteristics
Topic/Discipline in NSF Awards for Primary Social Science Data Collection
[Chart: # of NSF Awards by topic/discipline]
An additional 1,594 records were coded “General Social Science”
Type of Data Collection Method/Design in NSF Awards for Primary Social Science Data Collection
[Chart: # of NSF Awards by data collection method/design]
NSF Awards for Social Science Primary Data: Proposed Sampling Method
Sampling Method                      Percent of Total    N
Probability Sample Proposed
Non-Probability Sample Proposed      1.0                 23
Not Specified/Missing                93.7                2,190
NSF Awards for Social Science Primary Data: Type of Sampling Frame Proposed
Sampling Frame                    Percent of Total    N
U.S. - National
U.S. - Regional
International - Including U.S.
International - Excluding U.S.
Not Specified/Missing
NSF Awards for Social Science Primary Data: Proposed Sample Size
Sample Size               Percent    N
1, <
Not Specified/Missing     85.3       1,994
NSF Awards for Social Science Primary Data: Race/Ethnic Distribution of Sample
                           Percent    N
Multiple Races
Single Race Study
Not Specified/Missing      88.6       2,069
Any Whites                 2.1        50
Any African Americans      4.2        97
Any Latinos                3.0        70
Any Asians                 3.9        90
Any Other Non-Whites       1.7        39
Gender Distribution Sampled, When Known (n=164)
Children/Adult Sampled, When Known (n=235)
Following Up: Prospects for Data Archiving
- N=2,336 Primary Social Science Data Collection Awards
- N=201 Combined Data Collection Activity and Secondary Data, Social Science Research
Steps:
- Select ~10-20 records per week
- Generate updated contact information for PI
- Determine if “obviously” archived already (ICPSR, Roper, Odum, Murray, Sociometrics, Google)
- Review related citations
- Review other NSF awards made to PI
- Contact PI (Data Produced? Data Archived? Data Still Available?)
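To give a sense of scale for the follow-up workload, the sketch below estimates how many weeks it would take to work through the records at the stated pace of roughly 10 to 20 per week. It assumes, as an illustration only, that both groups of awards listed on this slide are followed up and that the pace stays constant; the names are hypothetical.

```python
import math

PRIMARY_AWARDS = 2_336   # primary social science data collection awards
COMBINED_AWARDS = 201    # combined primary and secondary awards


def weeks_needed(n_records: int, per_week: int) -> int:
    """Whole weeks required to review n_records at a fixed weekly pace."""
    return math.ceil(n_records / per_week)


total_records = PRIMARY_AWARDS + COMBINED_AWARDS   # 2,537 records
fastest = weeks_needed(total_records, 20)          # at 20 records per week
slowest = weeks_needed(total_records, 10)          # at 10 records per week
```

Under these assumptions the follow-up would take between 127 and 254 weeks, which is one way to see why the size and scope of the project is a practical constraint.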
Other Qualitative Fields in LEADS
- Description of how the collection fits within the scope of important social science studies
- Description of the value of the study for archiving
- Priority ranking
- Citations
- PI communication
Problems archiving studies…
- PI unsure where data are stored
- Data are in an old format that we may or may not be able to recover
- Physical condition (storage media or documentation) has deteriorated
- Paper copy documentation only, incomplete documentation
- No English-language documentation
NIH Records in LEADS
- We screened NIH awards for (1) social science/behavioral, (2) original data, and (3) quantitative
- All NIH Institutes ( )
- NICHD, NIA, NIMH, NINR, AHRQ, NIAAA, NIDA, Clinical Center, NIDCD, FIC, NCI, NHLBI, NIDDK (all years)
- 172,196: total # of awards screened
- 6,381: selected awards
Challenges & Limitations
- Size and scope of this project
- Need for PI cooperation
- Screening error rate has not been quantified
- Addressing the ambiguous records
- Collaborative projects and continuation projects have not been eliminated
Conclusions
- NIH & NSF award databases are a valuable source of information about studies “at risk” of being lost
- PI grant abstracts vary widely in the amount of detail they give about research aims & methodology
- Preliminary results suggest that few studies have been archived, although the rate is higher for NSF
- The large number of unarchived studies requires us to use appraisal methods to determine a particular study’s value for archiving
People Working on LEADS
NSF
- Darrell Donakowski
- Lisa Quist
- Jared Lyle
- Tannaz Sabet
NIH
- Russ Hathaway
- Felicia LeClere
- Brian Madden
- James McNally
- JoAnne O’Rourke
- Kelly Zidar