Published by Morgan Malone. Modified over 8 years ago.
1
Social Science Data Management & Curation Jared Lyle January 13, 2014
2
New Data
3
http://www.flickr.com/photos/intersectionconsulting/7537238368/in/photostream/
4
http://www.census.gov/
5
Safety Pilot Project
2800 cars, trucks, and buses with Vehicle Awareness Devices, sensors, and video
~1 petabyte of data per year
http://www.annarbor.com/assets_c/2012/05/safety-pilot-2-thumb-590x393-112884.jpg
http://www.safetypilot.us/
6
MET Longitudinal Database
Scale
–2 academic years
–6 large school districts
–6 grade levels
–3,000 teachers
–44,500 students
–24,000 videos
–22,500 observation sessions
–900+ observers trained by ETS to score videos
–~12 GB of quantitative data
–~10 TB of video
http://www.icpsr.umich.edu/icpsrweb/METLDB/
7
New Incentives & Discussions
8
http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research
9
Berman, F., and V. Cerf, Who Will Pay for Public Access to Research Data? Science, 2013. 341(6146): p. 616-617.
10
http://www.nytimes.com/2013/02/26/opinion/we-paid-for-the-scientific-research-so-lets-see-it.html
11
“There’s an attitude in the profession that collecting data is for lesser people. That it’s like janitor work; it would dirty our hands. There’s social climbing in academia. So if you write a paper computing an index, that seems low-prestige, so you don’t want to do that. …some of the best theorizing comes after collecting data because then you become aware of another reality.”
-Robert Shiller (2013 Nobel Laureate, Economic Science)
http://www.nytimes.com/2013/10/20/business/robert-shiller-a-skeptic-and-a-nobel-winner.html
12
Challenges remain
13
“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.” Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf
14
Vines et al. Current Biology 24, 94–97, January 6, 2014 http://dx.doi.org/10.1016/j.cub.2013.11.014
Image: http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf
15
Griswold et al. (2013) http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf
See also: http://blogs.scientificamerican.com/absolutely-maybe/2013/09/10/opening-a-can-of-data-sharing-worms/
16
Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009
See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf
See also: Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” http://hdl.handle.net/2027.42/78307
17
Data Management & Curation http://www.icpsr.umich.edu/datamanagement/
18
http://www.icpsr.umich.edu
19
About ICPSR
Founded in 1962 as a consortium of 21 universities to share the National Election Survey
Today: 700+ members around the world
Data dissemination for more than 20 federal and non-government sponsors
600,000+ visitors per year
21
Examples of popular data
General Social Surveys, 1972-2012 [Cumulative File]
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008
Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2012
Drug Abuse Warning Network (DAWN), 2011
National Survey on Drug Use and Health, 2012
American National Election Study, 1948-2008
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States]
22
What we do
Acquire, curate, and archive social science data
Distribute data to researchers
Preserve data for future generations
Provide training in quantitative methods
Archive size
8,600+ data collections, over 60,000 data sets
Grows by 300+ collections a year
23
Unique capabilities
Curated data with rich metadata
Digital preservation
Bibliography and citation
Confidential data
Training
Community
24
Data Management & Curation http://www.icpsr.umich.edu/datamanagement/
25
Quality
26
A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.
27
Do no harm.
28
http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf
30
Data
31
http://www.guardian.co.uk/science/grrlscientist/2012/mar/29/1
36
Documentation http://dx.doi.org/10.3886/ICPSR31521.v1
37
Variable-level Details
National Longitudinal Study of Adolescent Health (Add Health), 1994-1995, Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html
38
Processing History
39
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
40
http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
41
http://www.tdar.org/news/2013/05/announcing-caring-for-digital-data-in-archaeology-a-guide-to-good-practice-co-published-with-ads/
42
Confidentiality
43
Sharing confidential data
Safe data: Modify the data to reduce the risk of re-identification
Safe places: Physical isolation and secure technologies
Safe people: Training and data use agreements
44
Safe Data
Suppressing unique cases
Grouping values (e.g., 13-29=1, 30-49=2)
Top-coding (e.g., >1,000=1,000)
Aggregating geographic areas
Swapping values
Sampling within a larger data collection
Adding “noise”
Replacing real data with synthetic data
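Three of the techniques listed on this slide (grouping values, top-coding, and suppressing unique cases) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample records, and the choice of which variables count as identifying are all hypothetical and are not taken from ICPSR's own procedures. The age bands and the 1,000 cap follow the examples on the slide.

```python
from collections import Counter

def group_age(age):
    """Recode age into the bands shown on the slide: 13-29 -> 1, 30-49 -> 2."""
    if 13 <= age <= 29:
        return 1
    if 30 <= age <= 49:
        return 2
    return None  # ages outside the illustrated bands are left uncoded here

def top_code(value, cap=1000):
    """Top-code: any value above the cap is reported as the cap."""
    return min(value, cap)

def suppress_unique(records, keys):
    """Drop records whose combination of `keys` occurs only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] > 1]

# Hypothetical respondents, invented for this example.
records = [
    {"age": 17, "income": 800, "county": "A"},
    {"age": 25, "income": 2500, "county": "A"},
    {"age": 41, "income": 950, "county": "B"},
]

# Apply grouping and top-coding, then suppress cases that remain unique.
safe = [
    {
        "age_group": group_age(r["age"]),
        "income": top_code(r["income"]),
        "county": r["county"],
    }
    for r in records
]
released = suppress_unique(safe, keys=("age_group", "county"))
```

In this toy data the third respondent is the only person in age group 2 in county B, so that record is suppressed, while the second respondent's income is reported as 1,000. Real disclosure review weighs many more variables and usually combines several of the listed methods.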
45
Further Resources: Safe Data
Statistical Policy Working Paper 22 - Report on Statistical Disclosure Limitation Methodology http://www.fcsm.gov/working-papers/spwp22.html
The American Statistical Association, Committee on Privacy and Confidentiality - Methods for Reducing Disclosure Risks When Sharing Data http://www.amstat.org/committees/pc/SDL.html
ICPSR's Confidentiality and Privacy web page http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/confidentiality/
46
Safe Places
Secure Deposit Form
Secure Data Environment (SDE)
Data protection plans
Virtual data enclave
Physical enclave
47
The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.
48
Further Resources: Safe Places
ICPSR “Instructions for Preparing the Data Protection Plan” http://www.icpsr.umich.edu/files/ICPSR/access/restricted/all.pdf
“Introducing ICPSR’s Virtual Data Enclave (VDE)” http://techaticpsr.blogspot.nl/2012/09/introducing-icpsrs-virtual-data-enclave.html
ICPSR Physical Data Enclave http://www.icpsr.umich.edu/icpsrweb/content/ICPSR/access/restricted/enclave.html
49
Safe People
Staff training
Data use agreements
–Responsible Use Statement
–Research plan
–IRB approval
–Data protection plan
–Behavior rules
–Security pledge
–Institutional signature
50
Further Resources: Safe People
Example NAHDAP Restricted Data Use Agreement http://www.icpsr.umich.edu/files/NAHDAP/GenericRDAAgreement.pdf
NAHDAP “Restricted-Use Data Deposit and Dissemination Procedures” http://www.icpsr.umich.edu/files/NAHDAP/NAHDAP-RestrictedDataProcedures.pdf
“Navigating Your IRB to Share Restricted Data” Webinar http://bit.ly/Vi3RXd
51
Preservation
52
http://www.flickr.com/photos/blude/2665906010/
54
Digital Preservation has a unique set of requirements:
–Persistence
–Reliability
–Scalability
–Preserving bits as well as the meaning
–Cost
Source: Yakel, 2012
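One common way to check the "preserving bits" requirement is a fixity check: record a checksum when a file is deposited, then recompute it later and compare. The sketch below is a minimal illustration and not a description of ICPSR's actual workflow; the function names, the file name, and the choice of SHA-256 are assumptions made for the example. It says nothing about preserving meaning, which depends on documentation and format migration.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_digest):
    """True if the file's current digest matches the recorded one."""
    return sha256_of(path) == expected_digest

# Demonstration on a temporary, hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "survey.csv")
    with open(path, "wb") as fh:
        fh.write(b"id,age\n1,34\n")

    recorded = sha256_of(path)        # stored at deposit time
    intact = verify(path, recorded)   # later audit: bits unchanged

    with open(path, "ab") as fh:      # simulate an unintended change
        fh.write(b"2,51\n")
    changed = not verify(path, recorded)
```

Here `intact` is true before the file is modified and `changed` is true afterwards, which is the signal an audit would use to restore the file from another copy.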
55
Digital Preservation Challenges
Vulnerabilities of digital information
–Neglect
–System Failure
–Intentional Destruction
–Errors (Human and System-Induced)
–Inter-dependencies (hardware, software, OS)
–Context dependencies
–Technology Obsolescence
–Heterogeneity
Source: Yakel, 2012
56
Digital Preservation Challenges
Sustainability
–Repositories
–File formats
–Processes
–Expertise
Source: Yakel, 2012
57
Digital Preservation Policies
Digital Preservation Policy Framework
–OAIS compliance; organizational capacity; technology and security
Access Policy Framework
–Access levels; authorization/authentication rules
Collection Development Policy
–Selection and appraisal criteria; areas of emphasis
Disaster Planning Policy Framework
–Business continuity, communications, disaster recovery
www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/policies/
58
Repository Assessments
TRAC/ISO 16363
Data Seal of Approval
World Data System
59
Attribution
61
Title
Author
Date
Version
Persistent identifier (such as the Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle System)
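The five citation elements listed on this slide can be treated as a small record that is checked for completeness before a citation is rendered. The following is a sketch only: the class name, the rendering order, and the sample values (including the placeholder identifier) are hypothetical and do not represent ICPSR's citation style.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    """The five elements named on the slide."""
    title: str
    author: str
    date: str
    version: str
    identifier: str  # persistent identifier, e.g. a DOI, URN, or Handle

    def missing(self):
        """Names of any elements that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self):
        """Render one possible citation string (the ordering is an assumption)."""
        return (
            f"{self.author} ({self.date}). {self.title}. "
            f"{self.version}. {self.identifier}"
        )

# Hypothetical values, invented for illustration.
citation = DataCitation(
    title="Example Survey of Households",
    author="Doe, Jane",
    date="2014",
    version="Version 1",
    identifier="doi:10.0000/example",
)
text = citation.render()
gaps = citation.missing()
```

Checking `missing()` before publishing is a cheap way to catch a citation that lacks a version or persistent identifier, the two elements most often omitted when data are cited informally.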
62
Example
63
http://www.icpsr.umich.edu/icpsrweb/ICPSR/citations/
65
Access
66
ICPSR’s Guidelines for OSTP Data Access Plan Page http://icpsr.umich.edu/datamanagement/ostp.html
See also: http://youtu.be/sWnMFEKmfnE
67
http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/
68
Tools http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/tools.html
69
http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/ Manage and Curate to Share
70
Summer Program Course: Curating and Managing Research Data for Re-Use
See: http://www.icpsr.umich.edu/icpsrweb/sumprog/courses/0149
71
http://www.flickr.com/photos/dolescum/5964978575/
72
Thank you! lyle@umich.edu
74
Comparing variables across studies