Social Science Data Management & Curation Jared Lyle January 13, 2014.


1 Social Science Data Management & Curation Jared Lyle January 13, 2014

2 New Data

3 http://www.flickr.com/photos/intersectionconsulting/7537238368/in/photostream/

4 http://www.census.gov/

5 Safety Pilot Project: 2,800 cars, trucks, and buses with Vehicle Awareness Devices, sensors, and video; ~1 petabyte of data per year. http://www.annarbor.com/assets_c/2012/05/safety-pilot-2-thumb-590x393-112884.jpg http://www.safetypilot.us/

6 MET Longitudinal Database Scale
– 2 academic years
– 6 large school districts
– 6 grade levels
– 3,000 teachers
– 44,500 students
– 24,000 videos
– 22,500 observation sessions
– 900+ observers trained by ETS to score videos
– ~12 GB of quantitative data
– ~10 TB of video
http://www.icpsr.umich.edu/icpsrweb/METLDB/

7 New Incentives & Discussions

8 http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research

9 Berman, F., and V. Cerf, Who Will Pay for Public Access to Research Data? Science, 2013. 341(6146): p. 616-617.

10 http://www.nytimes.com/2013/02/26/opinion/we-paid-for-the-scientific-research-so-lets-see-it.html

11 “There’s an attitude in the profession that collecting data is for lesser people. That it’s like janitor work; it would dirty our hands. There’s social climbing in academia. So if you write a paper computing an index, that seems low-prestige, so you don’t want to do that. …some of the best theorizing comes after collecting data because then you become aware of another reality.” -Robert Shiller (2013 Nobel Laureate, Economic Science) http://www.nytimes.com/2013/10/20/business/robert-shiller-a-skeptic-and-a-nobel-winner.html

12 Challenges remain

13 “It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.” Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf

14 Vines et al. Current Biology 24, 94–97, January 6, 2014 http://dx.doi.org/10.1016/j.cub.2013.11.014 Image: http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf

15 Griswold et al. (2013) http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf See also: http://blogs.scientificamerican.com/absolutely-maybe/2013/09/10/opening-a-can-of-data-sharing-worms/

16 Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009
See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf
See also: Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” http://hdl.handle.net/2027.42/78307

17 Data Management & Curation http://www.icpsr.umich.edu/datamanagement/

18 http://www.icpsr.umich.edu

19 About ICPSR
– Founded in 1962 as a consortium of 21 universities to share the National Election Survey
– Today: 700+ members around the world
– Data dissemination for more than 20 federal and non-government sponsors
– 600,000+ visitors per year

20

21 Examples of popular data
– General Social Surveys, 1972-2012 [Cumulative File]
– National Longitudinal Study of Adolescent Health (Add Health), 1994-2008
– Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2012
– Drug Abuse Warning Network (DAWN), 2011
– National Survey on Drug Use and Health, 2012
– American National Election Study, 1948-2008
– Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States]

22 What we do
– Acquire, curate and archive social science data
– Distribute data to researchers
– Preserve data for future generations
– Provide training in quantitative methods
Archive size
– 8,600+ data collections, over 60,000 data sets
– Grows by 300+ collections a year

23 Unique capabilities
– Curated data with rich metadata
– Digital preservation
– Bibliography and citation
– Confidential data
– Training
– Community

24 Data Management & Curation http://www.icpsr.umich.edu/datamanagement/

25 Quality

26 A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

27 Do no harm.

28 http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

29

30 Data

31 http://www.guardian.co.uk/science/grrlscientist/2012/mar/29/1

32

33

34

35

36 Documentation http://dx.doi.org/10.3886/ICPSR31521.v1

37 Variable-level Details: National Longitudinal Study of Adolescent Health (Add Health), 1994-1995 (National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook). http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

38 Processing History

39 http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

40 http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

41 http://www.tdar.org/news/2013/05/announcing-caring-for-digital-data-in-archaeology-a-guide-to-good-practice-co-published-with-ads/

42 Confidentiality

43 Sharing confidential data
– Safe data: Modify the data to reduce the risk of re-identification
– Safe places: Physical isolation and secure technologies
– Safe people: Training and data use agreements

44 Safe Data
– Suppressing unique cases
– Grouping values (e.g., 13-29=1, 30-49=2)
– Top-coding (e.g., >1,000=1,000)
– Aggregating geographic areas
– Swapping values
– Sampling within a larger data collection
– Adding “noise”
– Replacing real data with synthetic data
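A minimal Python sketch of three of the Safe Data techniques listed on slide 44: grouping values, top-coding, and suppressing unique cases. The age bands and the 1,000 cap come from the slide's own examples; the function names, the field names, and the sample respondents are hypothetical and only illustrate the idea. This is not ICPSR's actual procedure.

```python
from collections import Counter


def group_age(age):
    """Grouping: recode an exact age into the slide's bands (13-29 -> 1, 30-49 -> 2)."""
    if 13 <= age <= 29:
        return 1
    if 30 <= age <= 49:
        return 2
    return None  # outside the example bands; a real recode would cover every value


def top_code(value, cap=1000):
    """Top-coding: report any value above the cap as the cap itself (>1,000 = 1,000)."""
    return min(value, cap)


def suppress_unique(records, keys):
    """Suppression: drop records whose combination of key values occurs only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] > 1]


# Hypothetical respondents, invented for illustration only.
respondents = [
    {"age": 17, "income": 250},
    {"age": 45, "income": 5200},
    {"age": 29, "income": 1000},
    {"age": 31, "income": 980},
]

masked = [
    {"age_group": group_age(r["age"]), "income": top_code(r["income"])}
    for r in respondents
]
```

Each function reduces detail in one variable; in practice the choice of bands, caps, and key variables depends on the re-identification risk in the specific data collection.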

45 Further Resources: Safe Data
– Statistical Policy Working Paper 22 - Report on Statistical Disclosure Limitation Methodology: http://www.fcsm.gov/working-papers/spwp22.html
– The American Statistical Association, Committee on Privacy and Confidentiality - Methods for Reducing Disclosure Risks When Sharing Data: http://www.amstat.org/committees/pc/SDL.html
– ICPSR's Confidentiality and Privacy web page: http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/confidentiality/

46 Safe Places
– Secure Deposit Form
– Secure Processing Environment (SDE)
– Data protection plans
– Virtual data enclave
– Physical enclave

47 The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.

48 Further Resources: Safe Places
– ICPSR “Instructions for Preparing the Data Protection Plan”: http://www.icpsr.umich.edu/files/ICPSR/access/restricted/all.pdf
– “Introducing ICPSR’s Virtual Data Enclave (VDE)”: http://techaticpsr.blogspot.nl/2012/09/introducing-icpsrs-virtual-data-enclave.html
– ICPSR Physical Data Enclave: http://www.icpsr.umich.edu/icpsrweb/content/ICPSR/access/restricted/enclave.html

49 Safe People
– Staff training
– Data use agreements
  – Responsible Use Statement
  – Research plan
  – IRB approval
  – Data protection plan
  – Behavior rules
  – Security pledge
  – Institutional signature

50 Further Resources: Safe People
– Example NAHDAP Restricted Data Use Agreement: http://www.icpsr.umich.edu/files/NAHDAP/GenericRDAAgreement.pdf
– NAHDAP “Restricted-Use Data Deposit and Dissemination Procedures”: http://www.icpsr.umich.edu/files/NAHDAP/NAHDAP-RestrictedDataProcedures.pdf
– “Navigating Your IRB to Share Restricted Data” Webinar: http://bit.ly/Vi3RXd

51 Preservation

52 http://www.flickr.com/photos/blude/2665906010/

53

54 Digital Preservation has a unique set of requirements:
– Persistence
– Reliability
– Scalability
– Preserving bits as well as the meaning
– Cost
Source: Yakel, 2012
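The "preserving bits" requirement on slide 54 can be illustrated with a fixity check: record a checksum when a file is deposited and recompute it later to detect silent change. The slides do not name a specific technique, so the use of SHA-256 here is an assumption, and the function names and sample bytes are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """Fixity check: True if the content still matches the recorded digest."""
    return checksum(data) == recorded


# Hypothetical file content at deposit time, invented for illustration.
original = b"respondent_id,age_group\n1,2\n"
recorded = checksum(original)

# The same file after a single value was altered.
corrupted = b"respondent_id,age_group\n1,3\n"

print(verify(original, recorded))
print(verify(corrupted, recorded))
```

A check like this covers only the bits. Preserving the meaning still depends on documentation such as codebooks and processing history.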

55 Digital Preservation Challenges: Vulnerabilities of digital information
– Neglect
– System failure
– Intentional destruction
– Errors (human and system-induced)
– Inter-dependencies (hardware, software, OS)
– Context dependencies
– Technology obsolescence
– Heterogeneity
Source: Yakel, 2012

56 Digital Preservation Challenges: Sustainability
– Repositories
– File formats
– Processes
– Expertise
Source: Yakel, 2012

57 Digital Preservation Policies
– Digital Preservation Policy Framework: OAIS compliance; organizational capacity; technology and security
– Access Policy Framework: Access levels; authorization/authentication rules
– Collection Development Policy: Selection and appraisal criteria; areas of emphasis
– Disaster Planning Policy Framework: Business continuity, communications, disaster recovery
www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/policies/

58 Repository Assessments
– TRAC/ISO 16363
– Data Seal of Approval
– World Data System

59 Attribution

60

61
– Title
– Author
– Date
– Version
– Persistent identifier (such as the Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle System)
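The five elements on slide 61 can be sketched as a small record that assembles a data citation. The element names come from the slide; the class name, the ordering and punctuation of the formatted string, and all sample values (including the identifier) are hypothetical placeholders, not a prescribed ICPSR citation style.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The five citation elements listed on the slide."""
    title: str
    author: str
    date: str
    version: str
    persistent_id: str  # e.g. a DOI, URN, or Handle

    def format(self) -> str:
        """Join the elements into one citation string (illustrative ordering only)."""
        return (
            f"{self.author}. {self.title}. {self.version}. "
            f"{self.date}. {self.persistent_id}"
        )


# Placeholder values; the identifier below is not a real DOI.
citation = DataCitation(
    title="Example Survey, 2010",
    author="Doe, Jane",
    date="2014-01-13",
    version="Version 1",
    persistent_id="doi:10.0000/EXAMPLE.v1",
)

print(citation.format())
```

Making all five fields required means a citation cannot be built with the version or persistent identifier left out.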

62 Example

63 http://www.icpsr.umich.edu/icpsrweb/ICPSR/citations/

64

65 Access

66 http://icpsr.umich.edu/datamanagement/ostp.html ICPSR’s Guidelines for OSTP Data Access Plan Page See also: http://youtu.be/sWnMFEKmfnE

67 http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/

68 Tools http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/tools.html

69 http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/ Manage and Curate to Share

70 Summer Program Course: Curating and Managing Research Data for Re-Use. See: http://www.icpsr.umich.edu/icpsrweb/sumprog/courses/0149

71 http://www.flickr.com/photos/dolescum/5964978575/

72 Thank you! lyle@umich.edu

73

74 Comparing variables across studies

