Data management aspects in the social sciences

Presentation on theme: "Data management aspects in the social sciences"— Presentation transcript:

1 Data management aspects in the social sciences
I’m going to present some examples of social science data and some characteristics of the field.
Marjan Grootveld, DANS. Also presenting slides by Marion Wittenberg and Peter Doorn, DANS.
Workshop on Active DMPs, Geneva, June 2016

2 On the agenda
DANS services; social science traits; example datasets; data management training; my personal concerns.
Disclaimers: it’s just a selection, and I’m not even a social scientist myself.

3 DANS
First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive 1989. Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005. Mission: promote and provide permanent access to digital research information.

4 Data Archiving in Humanities and Social Sciences
Data collection and data processing led to awareness of the value of preserving data for re-use: for validating the results of earlier research, for comparative analysis, and for secondary analysis (answering new research questions with existing data).
Emergence of data archives: text archives for linguistics and literary studies; university repositories; general data sharing facilities; social science data archives; historical data archives; archaeology data archives.
The core and the start of the DANS EASY archive is the Steinmetz social science archive. IPUMS (Integrated Public Use Microdata Series) provides census microdata for social and economic research.
Timeline labels (1960s to 2010s), in order: ICPSR, ZA, UKDA; Steinmetz; Oxford Text Archive; NHDA, HDS, IPUMS; ADS, EDNA; Dataverse, Zenodo, Figshare, B2Suite.

5 Core online services
DataverseNL: for short- and mid-term storage. EASY: certified long-term Electronic Archiving System for self-deposit. NARCIS: gateway to scholarly information in the Netherlands.

6 Data access by discipline in DANS archive
The bulk of our holdings consists of archaeological data. In recent years these data have been opened up, but a part is still available only to fellow archaeologists, to prevent treasure hunting. The motto has always been ‘Open when possible, restricted when necessary’, and ‘open’ entailed registration, so that we could collect management information. However, since last year data producers can deposit data with a CC0 license, and you can download these data without being logged in to the system. We have asked all depositors whose data are Open under the original Open regime whether we may change the regime to CC0. The answers received so far are rather positive, and once we have received all responses, we’ll change these settings. The share of dark green will then be somewhat larger.
* Without archaeology

7 Datasets in DANS archive according to size
The long tail of research data.
By far the largest part of the data in our holdings is small to very small, depending on what you are used to yourself. This size is typical of social science and humanities studies. So much for our online services. DANS also provides training and consultancy, not just about the process of archiving data, but actually more about the stages preceding and following that: research data management.

8 RDM support: DANS DMP brochure
A simple and tangible result is our brochure on data management planning. The brochure is based on consultancy projects that we have carried out for the Dutch research funders NWO and ZonMw, on several other DMP templates, and obviously on what we as a sustainable archive have learned from our customers, e.g. by advising researchers during the data ingest stage. To conclude this part, let me mention one of our collaborations: RDNL material?set_language=en

9 Research Data Netherlands
Collaboration of DANS, 4TU.ResearchData and SURFsara to promote sustained access to and responsible re-use of digital research data. Essentials 4 Data Support.
RDNL consists of three national partners who provide long-term data archives and other services. We see ourselves as back offices in the data landscape and offer, for instance, data training for the support staff in the front offices.

10 Large players in Social Science data
DANS is also the Dutch member of the CESSDA RI. The primary goal of CESSDA is to facilitate and promote more and wider use of high-quality data in social, economic and political research and, in turn, to improve our understanding of ongoing societal processes.

11 Borgman: Data Scholarship in the Social Sciences
‘The social studies encompass research on human behavior in the past, present, and future’ (p. 125)
‘The social sciences articulate their research methods more explicitly than do most fields’ (p. 126)
‘...characterized more by shared knowledge than by shared technical infrastructures’ (p. 157)
‘diffuse data sources, fuzzy boundaries between fields, political sensitivity of topics, and the array of stakeholders’ (p. 160)
Christine L. Borgman (Information Studies at UCLA): Big Data, Little Data, No Data: Scholarship in the Networked World. MIT Press, 2015.

12 Social science traits (over-generalised!)
Quantitative research, e.g. surveys (lots of variables, so a codebook is needed), and qualitative research, e.g. interviews and observations.
May involve individual people, which brings ethical issues, informed consent forms, and sensitive or anonymised data.
Often longitudinal research (e.g. the start of the International Social Survey Programme (ISSP) was in 1972).
Mixed attitude towards sharing and reusing data, e.g.: political scientists are used to sharing data; economists often explore private third-party data (which cannot be released or archived afterwards); sociotechnical researchers cannot release or reproduce all materials (lab journals remain property of the lab) (Borgman, p. 149); for psychologists, research methodology may have more value than the data.
Recent NL tendency (Oldenburg): publication packages along with the publication: data + statistical syntax queries.
Beau Oldenburg: Integriteit en duurzaamheid in het digitale tijdperk. White paper, DANS (in Dutch).

13 Example dataset 1
Size: 5 MB. DC metadata: information to find the data. Information on how to use the data is still needed, so many social science researchers and repositories use the rich DDI metadata…

14 DDI - Data Documentation Initiative http://www.ddialliance.org/
International standard for describing data from the social, behavioral, and economic sciences. Documenting data with DDI facilitates interpretation and understanding, both by humans and by computers. Two versions: Codebook and Lifecycle. See also

15 DDI-Codebook
DDI-Codebook is a light-weight version of the standard, intended primarily to document simple survey data. To make DDI codebooks you can use the NESSTAR publisher, which rewrites data information in SPSS to the DDI format. Example: DANS NESSTAR server.
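To give an idea of what such a codebook holds, here is a minimal sketch in Python that writes one variable with its value labels as XML. The element names (codeBook, dataDscr, var, labl, catgry, catValu) follow DDI-Codebook naming as I understand it, but the namespace and all study-level elements are left out, so the result is not schema-valid DDI. The survey variable is invented for illustration, and this is not how NESSTAR publisher works internally.

```python
import xml.etree.ElementTree as ET


def build_codebook(variables):
    """Build a simplified, DDI-Codebook-style XML tree.

    `variables` maps a variable name to a tuple of
    (question label, {numeric code: value label}).
    Assumption: namespace and study description are omitted for brevity.
    """
    code_book = ET.Element("codeBook")
    data_dscr = ET.SubElement(code_book, "dataDscr")
    for name, (label, categories) in variables.items():
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label
        for code, value_label in categories.items():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = str(code)
            ET.SubElement(catgry, "labl").text = value_label
    return code_book


# Hypothetical survey variable, invented for this example.
example = {
    "trust": ("Trust in parliament", {1: "Low", 2: "Medium", 3: "High"}),
}
xml_text = ET.tostring(build_codebook(example), encoding="unicode")
print(xml_text)
```

The point of the structure is that every code in the data file has a documented meaning that both people and software can read.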

16 Example 2: inspect survey outcomes online
This is what NESSTAR looks like. It was developed by the Norwegian Social Science Data Archive within an FP6 project, and it is, for instance, very popular within the CESSDA community. On the right-hand side you see that you can drill down to the values and value labels that were assigned in DDI. The percentages are computed by NESSTAR from the data; they are not part of the metadata, as you will understand.
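The split between metadata and computed figures can be sketched in a few lines of Python. The value labels play the role of the metadata, the list of responses plays the role of the data, and the percentages are derived on the fly. The responses and labels are invented, and this is only an illustration of the idea, not NESSTAR's own code.

```python
from collections import Counter


def labelled_percentages(responses, value_labels):
    """Return {value label: percentage} computed from coded responses.

    The labels come from the metadata; the percentages come from the data.
    Codes without a label are reported as 'unlabelled (<code>)'.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    return {
        value_labels.get(code, f"unlabelled ({code})"): round(100 * n / total, 1)
        for code, n in sorted(counts.items())
    }


# Hypothetical coded answers and their value labels.
responses = [1, 2, 2, 3, 3, 3, 2, 1, 3, 3]
value_labels = {1: "Low", 2: "Medium", 3: "High"}
table = labelled_percentages(responses, value_labels)
print(table)
```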

17 DDI-Lifecycle
DDI-Lifecycle is designed to document and manage data across the entire life cycle, from conceptualisation to data publication, analysis and beyond. E.g. Survey Data Netherlands

18 Researchers can make their own selection of the data and download it, to analyse it with their own software. Initiatives are underway to develop question tools, but even with them (or especially with third-party tools) one needs methodological knowledge. DDI Codebook is easy to use. For an individual researcher the learning curve and the time investment may be considerable, but for a project and a team the advantages of this investment are clear: it’s easier to share the data, to publish them online, and also to publish the metadata in a human-readable format. For DDI, especially for the Lifecycle version, a data manager is recommended.

19 Ex. 4: Interview project inspired DMP training
600 interviews in the DANS archive. Use case in the Essentials 4 Data Support training: The What, Why and How of Data Management Planning.

20 DMP and data organisation assignments
Design a data organisation for the Veterans project (folder structure, file naming convention, …)
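One possible outcome of such an assignment, sketched in Python. Everything here is hypothetical: the folder names, the "vet" prefix and the order of the name parts are just one convention a team could agree on, not the one used for the actual Veterans project.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Hypothetical folder structure: one folder per kind of material.
FOLDERS = {
    "consent": "00_admin/consent_forms",
    "audio": "01_raw/audio",
    "transcript": "02_processed/transcripts",
}

# Names look like: vet_<kind>_<interview no>_<YYYYMMDD>_v<version>.<ext>
NAME_PATTERN = re.compile(
    r"^vet_(consent|audio|transcript)_\d{3}_\d{8}_v\d{2}\.[a-z0-9]+$"
)


def file_path(kind, interview_no, recorded_on, version, ext):
    """Return the relative path for a file under the assumed convention."""
    if kind not in FOLDERS:
        raise ValueError(f"unknown kind of material: {kind}")
    name = (
        f"vet_{kind}_{interview_no:03d}_"
        f"{recorded_on:%Y%m%d}_v{version:02d}.{ext}"
    )
    return PurePosixPath(FOLDERS[kind]) / name


path = file_path("transcript", 42, date(2016, 6, 28), 1, "txt")
print(path)
print(bool(NAME_PATTERN.match(path.name)))
```

Zero-padded numbers and ISO-style dates keep the files sorted sensibly in any file browser, and the pattern makes it possible to check existing files against the convention.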

21 Outcome of the assignments
Writing the DMP is always a real confidence booster. Discussing the data organisation for 10 minutes already gives a lot of insight. A dataset contains more than the data… A common assumption is that ALL files are either Open or Restricted. (It is relevant for H2020 practice to address different subsets in the DMP.) Participants realise that planning RDM is teamwork.
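The point about subsets can be made concrete with a small Python sketch: record an access level per file rather than per dataset, and group the files for the DMP. The file names and the two access levels are invented for illustration.

```python
from collections import defaultdict

# Assumed access levels; an archive may offer more than these two.
ACCESS_LEVELS = ("open", "restricted")


def group_by_access(files):
    """Group file paths by access level.

    `files` maps a file path to its access level. Unknown levels raise
    an error, so that no file slips through without a decision.
    """
    groups = defaultdict(list)
    for path, level in files.items():
        if level not in ACCESS_LEVELS:
            raise ValueError(f"{path}: unknown access level {level!r}")
        groups[level].append(path)
    return {level: sorted(paths) for level, paths in groups.items()}


# Hypothetical contents of an interview dataset.
files = {
    "codebook.pdf": "open",
    "transcripts/interview_001.txt": "restricted",
    "consent/form_001.pdf": "restricted",
}
subsets = group_by_access(files)
print(subsets)
```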

22 Data Availability policy
Diagram: stakeholders in RDM. Labels: Institution, RDM policy, Facilities, €$£, Research funders, Publishers, Data Availability policy, Commercial partners.

23 pecuniae investigationis curatore sed vitae facimus
Non pecuniae investigationis curatore sed vitae facimus programmas datorum procurationis (Not for the research funder but for life we make data management plans) Image by Chrause via wikimedia.org/wiki/File%3ANon_scolae.jpg

24 On a personal note
In social sciences, with many long-tail data sets and small teams, using a simple and generic DMP template is a huge step forward. But to align with e-humanities, text and data mining etc.: Funders should require that (medium to) large projects comply with standards. Data management is all in a day’s work. Planning is more important than the plan, and it is a team activity. There are only so many wheels that can be re-invented on a public budget.

25 http://bit.ly/28OfLIK Upcoming webinar: How to write a DMP?
July 7, CEST

26 On a personal note In social sciences, with many long-tail data sets and small teams, using a simple and generic DMP template is a huge step forward. But to align with e-humanities, text and data mining etc.: Funders should require that (mid to) large projects comply with standards. Data management is all in a day’s work. Planning is more important than the plan, and it is a team activity. - DANS archive

