Key data management issues for social science research
Data management for ESRC Research Centres and Programmes
3 November 2009, London
Louise Corti, UK Data Archive, University of Essex
Overview
Why share and how to share
Challenges in re-using and sharing data
Key areas in data management and sharing
Data management policies
Role of the UKDA in supporting researchers
New initiatives
Why share and how to share
OECD message: widespread data sharing will enable researchers, empower citizens and confer tremendous scientific, economic and social benefits
sharing data:
– facilitates research, often beyond the scope of the original research
– demonstrates continued usage of data
– encourages scientific inquiry
– avoids duplicate data collection
– provides resources for education and training
many research councils are now committed to a long-term strategy for data resource provision and for supporting UK researchers
most research data can be shared with other researchers!
Principles of Data Sharing Policies
publicly funded research data are a valuable, long-term resource
researchers must collect data in such a way as to ensure longer-term sharing
researchers must document data in such a way as to ensure longer-term usability
to ensure maximum research exploitation, data must be managed effectively from day 1
researchers require support for data management through the life of the project
data must be made available by researchers for long-term archiving and dissemination to Research Council supported data centres
Data Management Policies
Mandatory:
– ESRC Data Policy
– NERC Data Policy
– RELU Data Management Policy
– MRC Data Sharing and Preservation Policy
Advised and encouraged:
– BBSRC Data Policy
– Wellcome Trust Policy on Data Management and Sharing
– British Academy
Successful data sharing policies
Funder commitment:
– contractual obligations
– funded support and archiving infrastructure
– instilling positive attitudes towards sharing data
– encouraging researchers to liaise with data experts throughout the lifecycle of data creation
– enabling capacity to support everyone who requires support
– encouraging deposit of ethically and legally shareable, high-quality data and documentation
– peer reviewers to advise on the long-term value of research data
– policing and penalties for "defaulters"
Successful data sharing policies
Support centre/archive:
– good partnership and communication with funding bodies
– regular updates about new data creation activities
– Data Management Plans and consent forms to review
– building and providing tools for research groups to share informally
For award holders:
– data recognised as a valid academic output (ISDN?)
– recognition given for re-using data
– realistic data management plans
VALUE OF SHARING
How can research data be used?
description and context
comparative research, restudy or follow-up study
re-analysis/secondary analysis
research design and methodological advancement
replication/validation of published work
teaching and learning
Where can you find research data?
Researchers, research groups and organisations
– universities, public and private sectors
– older material often held by default, as legacy or as a record, in a cupboard or on old digital media
University archives
– strong social science research tradition
– already significant collections of older research materials
Libraries and museums
– strong historical, ethnographic, oral and local history collections
Data archives and digital libraries
– proactive acquisition of digital research resources
Accessing local collections
If someone (internal or external) wanted to use your Centre's data:
– where would they start and who would they ask?
– is there a historical record of research, data and outputs?
– would you have support staff to deal with "users"?
– would you be able to find older data (e.g. if the PI has left)?
some centres are digitising materials and records, but it is very expensive and money is scarce
very few would offer an access service, let alone an online one
access would probably rely on the PI making time to collaborate
manage data well and archive them, and the burden is diminished
CHALLENGES OF RE-USING DATA
Why are qualitative data not shared much?
cultural reasons: sharing is not practised by the typical qualitative researcher
it has never been a well-documented research method
no standardised methods of data description
few dedicated archives
no real dissemination infrastructure
no co-ordinated resource discovery
issues about the nature of the 'special relationship' between ethnographer and participant
'Difficulties' cited in re-using data
constraints of informed consent and ethics
ownership of qualitative data: whose are they?
the implicit nature of qualitative data collection and analysis: context and reflexivity
lack of time to get fully acquainted with research materials created by someone else
insecurity about exposing one's own research practice, or the threat of misinterpretation
'no data to suit my needs': lack of publicly available research data
Ethical and consent considerations
archived data should always conform to ethical and legal guidelines on preserving anonymity where this has been requested by informants or guaranteed to them
consent and agreements for sharing CAN be obtained at the time, during the research process/fieldwork, and afterwards
various additional strategies for sharing:
– editing the original data
– restricting access/vetting
– user undertakings with legal back-up
Are these data really yours?
copyright needs examining: a speaker in an interview owns their words, the researcher does not; the researcher owns only the recording or transcription…
actually, the funder owns copyright too, as does your employer
if your data have any monetary 'value', your employer will be the first to claim them!
publicly funded data should be shared
I wasn't there but…
yes, of course being present during fieldwork adds a richness over and above the raw data
but context can be provided at many levels:
– audio-visual record and full transcription
– description of setting, observations
– details of methods, sampling, analysis
– relevant macro-level details (period, events etc.)
it is good to interview the principal investigator and research team
if historians can make use of partial sources that are centuries old, why can't sociologists?
…historians have done this for centuries!
CHALLENGES OF SHARING
Challenges
cultural practice
knowledge and expertise
reasons and incentives: REF
support
Today's need for speed…
if data are to be shared they need to be:
– easily accessible: 'take away'
– rapidly accessible: quickly
– free, or at least affordable
– well documented
– supported by humans beyond the life of the current research
the research lifecycle has shortened significantly and research is far more competitive
archiving is just an extra burden!
How can we help you to share data?
we offer best practice strategies and methods for creating a shareable dataset:
– managing your own data across the research lifecycle
– selecting data worth keeping
– describing data
– "processing" data
– storing data
– providing ethical and legal access to data
based on what we already do and have done for some 40 years
earlier this year we released a new suite of guidance as web pages and a brochure
we offer a bespoke advisory service and training
Key areas for managing data
how to share data
consent, confidentiality and ethics
copyright and other rights
data description and metadata
data formats and software
data storage, back-up and security
most problematic areas:
– definitions and legal aspects
– gaining consent
– consent forms
– disclosure
– data sharing and confidentiality
ESDS: examples of practical advice
ensure good housekeeping
use model consent forms/wording
anonymise data appropriately
use the model transcript template
use the model Excel sheet for metadata capture
ensure sufficient context is described
use suitable data formats, e.g. .rtf for text; .wav or .mp3 for audio
establish version control for group access
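A minimal sketch of one way to anonymise interview transcripts consistently: identifying strings are swapped for bracketed pseudonyms in a single pass, and a count of each replacement is returned so that an anonymisation log can be kept (stored securely, apart from the shared data). This is an illustration in Python, not an ESDS tool; the function name `pseudonymise` and the example names are hypothetical, and real anonymisation still needs human judgement about indirect identifiers such as places, occupations and events.

```python
import re
from collections import Counter


def pseudonymise(text: str, replacements: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Replace identifying strings with bracketed pseudonyms (hypothetical helper).

    Returns the edited text plus a count of each replacement made, which can be
    kept as a separate, securely stored anonymisation log.
    """
    if not replacements:
        return text, {}
    # Longest strings first, so "Anne Smith" is matched before "Anne"
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    counts = Counter()

    def swap(match: re.Match) -> str:
        counts[match.group(0)] += 1
        return "[" + replacements[match.group(0)] + "]"

    # A single pass, so an inserted pseudonym is never itself re-replaced
    return pattern.sub(swap, text), dict(counts)


transcript = "Anne Smith said Anne's son works at Colchester Hospital."
pseudonyms = {"Anne Smith": "Jane", "Anne": "Jane", "Colchester Hospital": "a local hospital"}
anonymised, log = pseudonymise(transcript, pseudonyms)
print(anonymised)  # [Jane] said [Jane]'s son works at [a local hospital].
print(log)
```

Using one combined pattern rather than a loop of separate substitutions avoids a pseudonym being caught by a later replacement, and the bracketed form makes edits visible to later users of the transcript.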
[Slides: screenshots of the ESDS website, including its process for sharing data]
Our work at the UK Data Archive
Research Data Management Support Services (RDMSS) work:
– supporting ESDS and RELU award holders
– UK Data Archive: generic advice, collaboration and worldwide training
Researcher Development Initiative (RDI) bid: training and capacity building
JISC/ESRC support for ESRC high-investment activities, e.g. Centres and Programmes:
– Data Management Planning
– training
– looking for guinea pigs as case studies
What data are "worth" keeping?
rich data that are well documented: a string of yeses and nos would be… dull and unacceptable
format, usability and condition of material
data that have analytic potential beyond the original investigation (depth; large-scale; longitudinal)
relative importance or impact of the study, e.g. it had a major influence in its field and/or represents the working life of a significant researcher
copyright and confidentiality issues
complementary to existing data holdings (series)
What is not accepted?
data that do not have adequate consent for sharing
data that are semi-structured without richness
data that have no documented methodology
all 'rejections' are now offered to the new, forthcoming UKDA-store, UKDA's self-archiving system (FEDORA)
researchers who have NOT sought consent to share, where it was felt possible, are sent a warning letter and referred to the Research Council
How do we archive data?
specialist metadata: DDI versus ISAD(G)
data are 'processed' at the study and file/interview/object levels:
– error checking/validation of collection contents
– checking that consent and confidentiality agreements are met
– basic reformatting of text
– possibly anonymisation
– creation of data context: digital user guides, variable and data lists
– access conditions agreed and applied
– data mounted on the download system
published ESDS guide to data processing techniques
Simples ;-)
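The first processing steps (validating collection contents and checking consent) lend themselves to simple automated checks before any manual work begins. A minimal sketch in Python, assuming a hypothetical per-file record with a consent flag and a hypothetical list of preferred formats drawn from the example formats given earlier (.rtf, .wav, .mp3); it is not the Archive's actual processing software.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Assumed preferred formats (hypothetical), drawn from the example formats given earlier
PREFERRED_FORMATS = {".rtf", ".wav", ".mp3"}


@dataclass
class DepositedFile:
    """One file/interview/object in a deposited collection (hypothetical schema)."""
    filename: str
    consent_to_share: bool


def check_collection(files: list[DepositedFile], listed: list[str]) -> list[str]:
    """Return problems to resolve before processing; an empty list means all checks passed."""
    problems = []
    present = {f.filename for f in files}
    # Contents check: compare the depositor's file list with what actually arrived
    for name in sorted(set(listed) - present):
        problems.append(f"{name}: listed but missing from deposit")
    for name in sorted(present - set(listed)):
        problems.append(f"{name}: deposited but not in file list")
    for f in files:
        # Format check: flag files outside the preferred list for possible conversion
        if PurePath(f.filename).suffix.lower() not in PREFERRED_FORMATS:
            problems.append(f"{f.filename}: format not in preferred list")
        # Consent check: material without recorded consent to share needs follow-up
        if not f.consent_to_share:
            problems.append(f"{f.filename}: no consent to share recorded")
    return problems


deposit = [
    DepositedFile("int01.rtf", True),
    DepositedFile("int02.doc", True),
    DepositedFile("int03.wav", False),
]
for problem in check_collection(deposit, ["int01.rtf", "int02.doc", "int03.wav", "int04.rtf"]):
    print(problem)
```

A flagged format is only a prompt for conversion, not a rejection; the consent flag would in practice come from checking the signed forms against each interview.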
The paper mountain
most new text-based collections are born digital, but much older data are in paper format
is it worth digitising paper?
– scan and OCR samples of key data
– scan as image files to enable faster throughput
– selectively digitise 'highlights'
what about audio?
– sound bites can be digitised from analogue sources
Centralised services
centralised archives have existing infrastructures and offer online accessibility; they require:
– a technical infrastructure
– an access control system
– researcher liaison staff
– acquisition, data preparation, user support and promotions staff
– resource discovery tools
– promotional and communications devices
Who are the staff at the UK Data Archive?
most staff are now multi-skilled and can handle both qualitative and quantitative data
this is important, as much of the data coming in is from mixed-methods studies
quali-centric staff tend to be wary of numeric data, which makes them less flexible
I believe that all staff should be able to handle all data types ("bilingual") and metadata ("trilingual"); otherwise synergies are less effective
technical skills are important but can more easily be bought in
Recent RC initiatives
Archiving and Sharing Demonstrator scheme (QUADS) (ESRC)
– develop and promote innovative methodological approaches to the archiving, sharing, re-use and secondary analysis of qualitative research and data
– develop a range of new models for increasing access to qualitative data resources, and for extending the reach and impact of qualitative studies
– disseminate good practice in qualitative data sharing and research archiving
RELU-DSS (ESRC, NERC, BBSRC)
– set up to help oversee and implement the Programme's Data Management Policy and Data Management Plan (builds on existing ESRC and NERC mandatory data policies)
– provides a support service for RELU researchers and staff to gain information and guidance on issues surrounding longer-term data sharing and preservation
More RC initiatives
Timescapes (ESRC)
– national qualitative longitudinal study
– archiving funded and built in from the start
– repository built for "in-house" data sharing
– aims to share more widely later
Research Methods Programme (RMP)
– ReStore: a sustainable repository of online research methods resources for RMP outputs
Technology Enhanced Learning (TEL) (ESRC & EPSRC)
– using new technologies to help develop learners' skills of enquiry, analysis, synthesis, knowledge construction and collaboration
– ENSEMBLE: a case-based archiving and semantic web project
Recent JISC and other initiatives
Digital Curation Centre (JISC)
– Data Audit Framework
– Data Management Plans
Research Information Network (RIN) (HEFCE, RC & Libraries)
– case studies on data sharing and on the "profession"
JISC Research Data Management Programme
– addressing strategic requirements for UK HE to improve its data management capability and to better understand how this may be achieved
– helping to establish the foundations for the UK research data infrastructure
UK Research Data Service (UKRDS) (HEFCE & JISC)
– Pathfinder case studies on data sharing in four universities
European picture [map of national data services; legend: mature national service, pilot service, feasibility study, keen but…, very quiet]
Contacts
RDMSS
UK Data Archive
University of Essex
Colchester, Essex