

Wagging the Long Tail of Research Data
Kathleen Shearer
Executive Director, COAR
Co-chair, RDA Long Tail for Research Data Interest Group
Co-chair, RDA Libraries and Research Data (soon to be IG)
Project Coordinator, Project ARC, a library-based initiative to develop a national network for RDM in Canada

About the Confederation of Open Access Repositories (COAR)
- Over 100 institutional members from around the world, on five continents
- Mission: to create a global network of open access repositories in support of research
- Community of practice and an international voice for the OA repository community
- Major issue is interoperability (repository-repository AND repository-other systems)
- To date, mainly focused on the institutional role in managing and providing open access to publications
- These services are evolving/expanding to include the management of research data

“Big data” is all the rage

(Image from Chuck Humphrey: OpenAIRE-COAR Conference, Athens 2014)
But the majority of datasets produced through research are part of the “Long Tail of Research Data”

Characteristics of Long Tail Research Data
Head                        Tail
Homogeneous                 Heterogeneous
Large                       Small
Common standards            Unique standards
Integrated                  Not integrated
Central curation            Individual curation
Disciplinary repositories   Institutional, general or no repositories
Adapted from: Shedding Light on the Dark Data in the Long Tail of Science by P. Bryan Heidorn, 2008

Long Tail of Research Data: small (…sometimes)
A 2011 survey by Science found that 48.3% of respondents were working with datasets smaller than 1 GB, and that over half of those polled store their data only in their laboratories.
Science 11 February 2011: Vol. 331 no pp DOI: /science

Long Tail of Research Data: heterogeneous
- A review undertaken by Cornell University of over 200 data “packages” (files related to arXiv papers) deposited into the Cornell Data Conservancy found 42 different file extensions for 1837 files across six disciplines.
- The Dryad Repository, a curated, general-purpose repository that collects and provides access to data underlying scientific publications, reports a huge diversity of formats including Excel, CSV, images, video, audio, HTML and XML, as well as “many uncommon and annoying formats”. The average size of the data packages it collects is ~50 MB.
- According to the European Commission (EC) document Research Data e-Infrastructures: Framework for Action in H2020, “diversity is likely to remain a dominant feature of research data – diversity of formats, types, vocabularies, and computational requirements – but also of the people and communities that generate and use the data.” http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf

Long Tail of Research Data: institutional, general, domain or (often) no repositories
Science 11 February 2011: Vol. 331 no pp DOI: /science

Long Tail of Research Data: some of the challenges
Data quality
- Determining quality and value of datasets
- Standards, metadata and norms differ significantly across disciplines
Discoverability
- Diverse datasets are less discoverable because they are not found in a “go to” domain repository
Incentives
- Why should researchers deposit their data?
Business case
- Why should organizations invest in the management of this data?

Long Tail of Research Data Interest Group
Accepted as an RDA Interest Group in Summer 2013
Over 90 members from around the world
Objectives
- To better understand the long tail
- To address some of the challenges involved in managing diverse datasets
- To share current practices, and develop best practices, for managing diverse data
- To work towards greater interoperability across repositories

Long Tail of Research Data Interest Group
Activities to date
- Survey of discovery metadata
- Discussion of strategies for improving discoverability of datasets
(All information is available on the interest group’s website)
Future activities
- Evidence to incentivize researchers to deposit
- Creating environments to make it easier for researchers to deposit their data
- Sharing practices about discovery
- Interoperability across repositories
- Preservation planning

Survey of Current Practices for Discovery of Research Data

Survey of Current Practices for Discovery Metadata
- Purpose: to better understand current practices in terms of discovery metadata
- Respondents: any repository collecting long tail data
- Undertaken from February 15 to March 7, 2014
- Respondents recruited via the RDA mailing list and other research data listservs
- Over 60 responses, but only 30 full responses
- OBVIOUSLY not a representative sample, but an indication of which way the wind is blowing

Location of repository

What are the descriptive metadata standards used?
Repositories using a single schema
- Dublin Core (9)
- DataCite (3)
- DDI Study-level metadata cf supra.
- ISO19115 (Geographic Information Metadata)
- MARC21
- MODS metadata
- RIF-CS
Repositories using more than one schema
- DataCite and Dublin Core (3)
- Dublin Core, Darwin Core, Prism
- Dublin Core, EDM, ESE, QDC
- Dublin Core, MARC21
- dc, dcterms, geo/wgs84, FOAF, own extension ontology
- MODS & DataCite Metadata Schema
- Organic.Edunet IEEE LOM
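Since Dublin Core is the schema most respondents reported, a minimal sketch of a Dublin Core description of a dataset may help make the idea of "discovery metadata" concrete. The example below uses the Dublin Core element namespace and Python's standard library. The `record` wrapper element, the `to_dublin_core` helper, and every value in `example` (including the DOI) are hypothetical and chosen only for illustration; they do not come from the survey.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(dataset):
    """Build an XML record with one dc:* element per value (hypothetical helper)."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for element, value in dataset.items():
        # Dublin Core elements are repeatable, so accept a single value or a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = str(v)
    return root


# Hypothetical dataset description; none of these values are from the talk.
example = {
    "title": "Stream temperature readings (hypothetical)",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "subject": "hydrology",
    "date": "2014",
    "type": "Dataset",
    "format": "text/csv",
    "identifier": "https://doi.org/10.1234/example",
}

record = to_dublin_core(example)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record this generic is enough to find a dataset by title or creator, but it carries none of the collection characteristics that a discipline-specific schema would add.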

In your opinion, is the metadata used in the repository sufficient to ensure discoverability of the datasets?
88% said yes, but…
- “Broadly speaking, and at a very high level, yes. If someone is looking for the data that supports a specific study, it is likely they will find it. However, if someone is looking for data with specific collection characteristics or other particularities then the metadata requires further enhancement.”
- “We aim to index metadata to aid discovery only. Metadata required to explore / reuse data will be stored with the data as a (non-indexed) object or stored in a separate, searchable database which links to the individual data objects in the repository (which may be at a sub-collection level). Data will also be found as the DOI will be included in publications related to the dataset.”

In your opinion, is the metadata used in the repository sufficient to ensure discoverability of the datasets?
88% said yes, but…
- “Data are discoverable within the repository because of limited repository scale, but once harvested and made available to search alongside tens of thousands of other datasets, the metadata are insufficient.”
- “Precision is low because natural language metadata queries tend to entrain marginally relevant data sets due to weak associations in project descriptions and other broad fields.”
- “Fine for basic discoverability - richer discipline metadata would be nice but probably not feasible at this point.”
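The respondent's remark about low precision can be illustrated with a small worked example. Precision is the share of retrieved records that are actually relevant. In the sketch below, the four records, their relevance labels, and the naive substring search are all invented for illustration; real repository search engines work differently, so this only shows the effect of matching on broad free-text fields.

```python
# Hypothetical catalogue; "relevant" marks whether the record truly holds
# soil moisture data (labels are assumptions made for this illustration).
records = [
    {"title": "Soil moisture measurements",
     "description": "Field sensors, 2012.", "relevant": True},
    {"title": "Bird census counts",
     "description": "Project also funded soil moisture work.", "relevant": False},
    {"title": "Soil moisture model output",
     "description": "Simulated values.", "relevant": True},
    {"title": "Interview transcripts",
     "description": "Farmers discuss soil moisture and yields.", "relevant": False},
]


def search(query, fields):
    """Naive case-insensitive substring match over the chosen fields."""
    q = query.lower()
    return [r for r in records if any(q in r[f].lower() for f in fields)]


def precision(hits):
    """Fraction of retrieved records that are relevant."""
    if not hits:
        return 0.0
    return sum(r["relevant"] for r in hits) / len(hits)


broad = search("soil moisture", ["title", "description"])
narrow = search("soil moisture", ["title"])

print(len(broad), precision(broad))    # 4 0.5
print(len(narrow), precision(narrow))  # 2 1.0
```

Searching the broad description field doubles the number of hits but halves precision, because two records mention the topic only in passing.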

But we know most people use Google as their discovery tool

Strategies for improving the discoverability of datasets
- Linking data to publications
- Data citation: DOIs
- Build a discovery layer that further describes data (landing pages)
- Attach or link to Data Management Plans (DMPs)
- Enable machine readability
- Dataset registries
- Data repository registries
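As one way to picture the data citation strategy, the sketch below assembles a citation string that ends in a resolvable DOI link. The pattern "Creator (PublicationYear): Title. Publisher. Identifier" follows the citation format recommended in DataCite's documentation, as far as I recall it; the `format_citation` helper and all example values, including the DOI, are hypothetical.

```python
def format_citation(creators, year, title, publisher, doi):
    """Return 'Creator (Year): Title. Publisher. Identifier' (hypothetical helper)."""
    names = "; ".join(creators)
    # https://doi.org/ is the standard DOI resolver prefix.
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# All values below are invented for illustration.
citation = format_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2014,
    title="Stream temperature readings",
    publisher="Example University Repository",
    doi="10.1234/example",
)
print(citation)
```

Placing a string like this in the reference list of a related publication gives readers, and search engines that index the paper, a direct path to the dataset's landing page.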

Some concluding comments
- There is a growing interest in the management of long tail research data, and institutions are recognizing they have a responsibility to manage research data
- Institutions can offer sustainable, long-term solutions
- We already have a lot of expertise with metadata, preservation, and collaboration
- But we need to work closely with data creators who have the disciplinary knowledge
- We have a lot to learn from the disciplinary communities about managing data
- We should heed the lessons learned from academic publishing (i.e. be wary of artificial measures of quality and impact)