Social Science Data Management & Curation Jared Lyle January 13, 2014.

New Data

Safety Pilot Project
2800 cars, trucks, and buses with Vehicle Awareness Devices, sensors, and video
~1 petabyte of data per year

MET Longitudinal Database
Scale
–2 academic years
–6 large school districts
–6 grade levels
–3,000 teachers
–44,500 students
–24,000 videos
–22,500 observation sessions
–900+ observers trained by ETS to score videos
–~12GB of quantitative data
–~10TB of video

New Incentives & Discussions

Berman, F., and V. Cerf, Who Will Pay for Public Access to Research Data? Science, (6146): p

“There’s an attitude in the profession that collecting data is for lesser people. That it’s like janitor work; it would dirty our hands. There’s social climbing in academia. So if you write a paper computing an index, that seems low-prestige, so you don’t want to do that. …some of the best theorizing comes after collecting data because then you become aware of another reality.”
-Robert Shiller (2013 Nobel Laureate, Economic Science)

Challenges remain

“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.” Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.”

Vines et al. Current Biology 24, 94–97, January 6, Image:

Griswold et al. (2013) See also:

Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?”
See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.”
See also: Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.”

Data Management & Curation

About ICPSR
Founded in 1962 as a consortium of 21 universities to share the National Election Survey
Today: 700+ members around the world
Data dissemination for more than 20 federal and non-government sponsors
600,000+ visitors per year

Examples of popular data
General Social Surveys, [Cumulative File]
National Longitudinal Study of Adolescent Health (Add Health),
Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2012
Drug Abuse Warning Network (DAWN), 2011
National Survey on Drug Use and Health, 2012
American National Election Study,
Collaborative Psychiatric Epidemiology Surveys (CPES), [United States]

What we do
Acquire, curate and archive social science data
Distribute data to researchers
Preserve data for future generations
Provide training in quantitative methods
Archive size
8,600+ data collections, over 60,000 data sets
Grows by 300+ collections a year

Unique capabilities
Curated data with rich metadata
Digital preservation
Bibliography and citation
Confidential data
Training
Community

Data Management & Curation

Quality

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

Data

Documentation

Variable-level Details
National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook.

Processing History

practice-co-published-with-ads/

Confidentiality

Sharing confidential data
Safe data: Modify the data to reduce the risk of re-identification
Safe places: Physical isolation and secure technologies
Safe people: Training and data use agreements

Safe Data
Suppressing unique cases
Grouping values (e.g., 13-29=1, 30-49=2)
Top-coding (e.g., >1,000=1,000)
Aggregating geographic areas
Swapping values
Sampling within a larger data collection
Adding “noise”
Replacing real data with synthetic data
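Several of the Safe Data techniques listed above can be illustrated with a short Python sketch. It is not ICPSR's procedure: the records, the field names, the catch-all age band, and the noise scale are all invented for illustration, and only grouping, top-coding, suppression of unique cases, and added noise are shown.

```python
import random
from collections import Counter


def group_age(age):
    """Grouping values: recode exact age into the bands from the slide (13-29=1, 30-49=2)."""
    if 13 <= age <= 29:
        return 1
    if 30 <= age <= 49:
        return 2
    return 3  # hypothetical catch-all band; the slide gives only the first two


def top_code(value, cap=1000):
    """Top-coding: any value above the cap is reported as the cap (>1,000 = 1,000)."""
    return min(value, cap)


def suppress_unique(records, keys):
    """Suppressing unique cases: drop records whose key combination occurs only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] > 1]


def add_noise(value, scale, rng):
    """Adding noise: perturb a numeric value with zero-mean Gaussian noise."""
    return value + rng.gauss(0, scale)


# Hypothetical respondent records, invented for illustration only.
raw = [
    {"age": 17, "income": 400, "county": "A"},
    {"age": 25, "income": 2500, "county": "A"},
    {"age": 34, "income": 900, "county": "B"},
    {"age": 41, "income": 1200, "county": "B"},
    {"age": 72, "income": 800, "county": "C"},
]

# Group ages and top-code incomes before checking for unique combinations.
masked = [
    {
        "age_group": group_age(r["age"]),
        "income": top_code(r["income"]),
        "county": r["county"],
    }
    for r in raw
]

# The single respondent in county "C" is unique on (age_group, county) and is dropped.
released = suppress_unique(masked, keys=("age_group", "county"))

# Optional perturbation step; the scale of 25.0 is an arbitrary assumption.
rng = random.Random(42)
noisy_income = [add_noise(r["income"], 25.0, rng) for r in released]
```

The order matters in this sketch: values are coarsened first, and uniqueness is checked on the coarsened keys, so fewer records need to be suppressed than if uniqueness were judged on the raw values.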

Further Resources: Safe Data
Statistical Policy Working Paper 22 - Report on Statistical Disclosure Limitation Methodology
The American Statistical Association, Committee on Privacy and Confidentiality - Methods for Reducing Disclosure Risks When Sharing Data
ICPSR's Confidentiality and Privacy web page confidentiality/

Safe Places
Secure Deposit Form
Secure Processing Environment (SDE)
Data protection plans
Virtual data enclave
Physical enclave

The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.

Further Resources: Safe Places
ICPSR “Instructions for Preparing the Data Protection Plan” all.pdf
“Introducing ICPSR’s Virtual Data Enclave (VDE)” icpsrs-virtual-data-enclave.html
ICPSR Physical Data Enclave access/restricted/enclave.html

Safe People
Staff training
Data use agreements
–Responsible Use Statement
–Research plan
–IRB approval
–Data protection plan
–Behavior rules
–Security pledge
–Institutional signature

Further Resources: Safe People
Example NAHDAP Restricted Data Use Agreement
NAHDAP “Restricted-Use Data Deposit and Dissemination Procedures” DAP-RestrictedDataProcedures.pdf
“Navigating Your IRB to Share Restricted Data” Webinar

Preservation

Digital Preservation has a unique set of requirements:
–Persistence
–Reliability
–Scalability
–Preserving bits as well as the meaning
–Cost
Source: Yakel, 2012
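"Preserving bits" is commonly checked with fixity information: a checksum is recorded when a file is deposited and recomputed later to detect silent change. The sketch below shows the idea with Python's standard hashlib module. It is an illustration only, not a description of ICPSR's system; the file name and contents are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest):
    """True if the file's current digest matches the digest recorded at deposit."""
    return sha256_of(path) == expected_digest


# Demonstration on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as workdir:
    data_file = os.path.join(workdir, "survey.csv")
    with open(data_file, "wb") as handle:
        handle.write(b"id,age_group\n1,2\n")

    recorded = sha256_of(data_file)               # stored alongside the deposit
    intact = verify_fixity(data_file, recorded)   # file unchanged since deposit

    with open(data_file, "ab") as handle:         # simulate corruption or tampering
        handle.write(b"x")
    damaged = verify_fixity(data_file, recorded)  # digest no longer matches
```

A matching checksum covers only the bits. Preserving the meaning still depends on documentation and on formats that remain readable.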

Digital Preservation Challenges
Vulnerabilities of digital information
–Neglect
–System Failure
–Intentional Destruction
–Errors (Human and System-Induced)
–Inter-dependencies (hardware, software, OS)
–Context dependencies
–Technology Obsolescence
–Heterogeneity
Source: Yakel, 2012

Digital Preservation Challenges
Sustainability
–Repositories
–File formats
–Processes
–Expertise
Source: Yakel, 2012

Digital Preservation Policies
Digital Preservation Policy Framework
–OAIS compliance; organizational capacity; technology and security
Access Policy Framework
–Access levels; authorization/authentication rules
Collection Development Policy
–Selection and appraisal criteria; areas of emphasis
Disaster Planning Policy Framework
–Business continuity, communications, disaster recovery

Repository Assessments
TRAC/ISO
Data Seal of Approval
World Data System

Attribution

Title
Author
Date
Version
Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name (URN), or Handle System)
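The five elements above can be assembled into a citation string. The following Python sketch shows one possible arrangement. The ordering and punctuation are assumptions, not a prescribed ICPSR style, and the author, title, and identifier are placeholders rather than a real dataset.

```python
def format_data_citation(author, date, title, version, identifier):
    """Join the five citation elements from the slide into a single string.

    The element order and punctuation here are illustrative assumptions.
    """
    elements = {
        "author": author,
        "date": date,
        "title": title,
        "version": version,
        "identifier": identifier,
    }
    missing = [name for name, value in elements.items() if not str(value).strip()]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    return f"{author} ({date}). {title}. Version {version}. {identifier}"


# Placeholder values, invented for illustration only.
citation = format_data_citation(
    author="Doe, Jane",
    date="2014",
    title="Example Household Survey",
    version="1",
    identifier="doi:10.0000/example",
)
```

Rejecting incomplete input is a deliberate choice in this sketch: a citation without a version or persistent identifier may not point to the exact data that was analyzed.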

Example

Access

ICPSR’s Guidelines for OSTP Data Access Plan Page See also:

Tools

Manage and Curate to Share

See: Summer Program Course: Curating and Managing Research Data for Re-Use

Thank you!

Comparing variables across studies