Increase discovery of your institution’s research through SHARE

Kelly Thompson, Metadata Analyst Librarian, University of Minnesota; SHARE Curation Associate. Library Technology Conference, March 16, 2017. Hello and welcome to this session, entitled “Increase discovery of your institution’s research through SHARE.” I’m Kelly Thompson. I’m a Metadata Analyst Librarian at the University of Minnesota Libraries, Twin Cities campus, where I was lucky to join a department called Data Management and Access this past September. I’m here today in a slightly different role, however. I’m also a member of the pilot cohort of SHARE Curation Associates, and it’s in that capacity that I’ll be speaking to you today about SHARE and how it might fit into your work.

Outline Conceptual foundations Practice-based: What is SHARE
Current features of SHARE Developing features of SHARE How can you participate? Question time To start off, I’d like to give you a brief roadmap of what I’ll be presenting today. This presentation will have two main parts: the first more formal, where I will lay some of the conceptual groundwork for this project, and the second more informal, with some discussion of the practical aspects of SHARE. In the first part, I’ll give you some general background on what SHARE is and where it fits into the current scholarly communication landscape. In the second part I’ll discuss current features of SHARE, and also a little bit about some developing features. In the practice-based portion, I’ll show you some examples of how other librarians are leveraging SHARE, and give you some ideas on how your institution can participate. We’ll have some time at the end for questions, so if something pops up, make sure to jot it down for that time.

SHARE SHARE is a free, open data set of research and scholarly activities across the research life-cycle. So, what is SHARE… SHARE is a free Open Data set Of research and scholarly activities Across the research life-cycle. Now, that’s a lot, so let me break that out piece by piece.

SHARE SHARE is a free, open data set of research and scholarly activities across the research life-cycle. SHARE is a data set composed of metadata contributed by about 150 different contributors, and that number is growing. It’s metadata only, so there are no digital objects stored or captured, simply information about the research or scholarly activity. And, because it’s an aggregated data set, it allows for collocation in searching of a variety of repositories that would otherwise be siloed.

SHARE SHARE is a free, open data set of research and scholarly activities across the research life-cycle. SHARE is free: it costs nothing, and you don’t need to buy a subscription. And it’s open. You are free to use the data without having to license it. All of the software and infrastructure are built with code that is freely distributed on GitHub. This is important because it means you don’t have to pay to access or use the data, and you can use it for activities that might otherwise be restricted under the license agreements we as libraries typically sign with vendors, such as agreements that you won’t use a vendor’s data for data mining or similar purposes. SHARE doesn’t have any of those restrictions, so if you think of an incredible use for it, you are free to make your geeky data dreams come true.

Who is SHARE Making SHARE free and open was a very intentional choice.
SHARE is a partnership between the Association of Research Libraries, or ARL, and the Center for Open Science, or COS. The SHARE initiative was founded in 2013 by the Association of American Universities (AAU) and the Association of Public and Land-grant Universities (APLU), both of which continue to have ex officio representation on the project’s Advisory Board. The project is also underwritten in part by generous funding from the Institute of Museum and Library Services (IMLS) and the Alfred P. Sloan Foundation. All of these organizations explicitly value developments in the library, information, and science fields that benefit as many stakeholders as possible, and making SHARE free and open lets it better support that value.

SHARE SHARE is a free, open data set of research and scholarly activities across the research life-cycle. And, finally, SHARE is a free, open data set of research and scholarly activities across the research life-cycle.

Research Lifecycle The research lifecycle can be abstracted in many ways; this is the model that the Open Science Framework uses. It starts with some kind of intellectual input, a problem, or a documented gap in understanding. A researcher synthesizes this information, hypothesizes about possible answers to the research question, and designs some type of study to test these hypotheses. The researcher goes through the steps of carrying out the study (acquiring materials, which could be data or physical materials; collecting data; storing that data somewhere; and eventually analyzing that data). The researcher interprets the findings from their study and, as is our current custom, writes them up in a report of some sort, which is published or shared in some way. This research output is consumed by other researchers, and ignites further inquiry through the questions it generates or leaves unanswered, and the cycle starts again. Now, in traditional academic culture, this stage at 11:00 there, the part where you publish the report, has historically been given the most weight in evaluating whether or not a researcher has been successful in their trip around the cycle. This is what has driven tenure, metrics like impact factors, and so on all these years. But in the modern research landscape, this is not the case anymore.

Collecting the Research Lifecycle
There are outputs at every stage of the lifecycle. Presentations, study pre-registrations, grant proposals, data management plans, research protocols, databases and spreadsheets, code that was used to process and analyze data, figures, graphs, and charts, author manuscripts, pre-prints, peer-reviewer comments, articles, book chapters, posters, conference presentations, class lectures, etcetera. SHARE seeks to improve the discoverability of research across the lifecycle, not just when a final version of an article is published. This is tricky, because in our current scholarly communication landscape, each type of output tends to live in a different repository from the others. By aggregating data from all of these disparate sources, SHARE seeks to put these important pieces back together. So, why is this important? Why isn’t the journal article good enough anymore? Well, we know that there has been a lot of buzz about reproducibility in science. Can I take the same data you used to come to your conclusions, and analyze it, and get the same outcome? In some cases, papers have been retracted because the researchers couldn’t produce the supporting data to back up their claims. Because most disciplines do not publish “negative results,” or results of studies that do not confirm a hypothesis, there is pressure at the end of the research lifecycle to extrapolate something from the data, so that the expensive investment (in time and resources) was not a “waste”. And, in an environment where most journals are closed, toll-access publications, many people cannot access the published final article, leaving much of the research corpus inaccessible and slowing down the pace of research and new breakthroughs. So, these extra materials help to give context to the research. They support the interest in replicating studies, and in performing new studies and analyses using existing data sets.
To collect these materials and make them accessible is to act on the notion that failed or negative experimental results might save someone time or, for funders, resources, if a study doesn’t have to be repeated simply because no one published a journal article about it. This is what is fueling the adoption of open science practices, publications, and tools within the research community. It’s also helping to build the understanding that the published report is both a summary of the science and the end of the research process, and that there is a fundamentally unnecessary inefficiency in waiting for that published report to engage potential collaborators or advance a finding.

Everett Opie The New Yorker Collection/The Cartoon Bank, 1976
This is a cartoon that Jeff Spies, from the Center for Open Science, likes to share. The caption reads, “I see by the current issue of Lab News, Ridgeway, that you’ve been working for the last twenty years on the same problem I’ve been working for the last twenty years.” Now this is from 1976, so, before Al Gore invented the internet as we know it [joke]. How much better are we doing now? I worked in scientific research before I went into libraries, so I understand the fear that your research will get “scooped”. But I think that, by re-designing the way the research lifecycle is documented, this whole anxiety loses a bit of its teeth, since you can document and establish your ideas and your intellectual pursuit at the earliest stages of the research cycle. An emerging practice is study registration. This is where you register your research question, methods, and the statistical analyses you plan to perform before collecting any data. This also allows for peer review of your research methods before you go to the trouble and expense of conducting any experiments. ClinicalTrials.gov is an example of a system collecting these kinds of outputs. When this kind of openness about workflows is integrated throughout the research process, it is sometimes called “Open Notebook Science”, where the entire research process is completely transparent (or as much as it can be in cases where there is sensitive data involved in the research). So, what tools and what infrastructure do researchers have to overcome this, frankly, inefficient situation?

Open science and open scholarship
One tool that SHARE developers and the SHARE curation associates have been using heavily is the Open Science Framework, which I’ll talk a little bit about later in this presentation, but for now I will just mention that it is a system to support the entire research lifecycle, from pre-registering your study idea and methodology, to providing one interface that embeds and links to all of your project documents, whether they’re in Google Docs, on GitHub, uploaded to the OSF, or in your project page’s wiki. Data repositories are proliferating and seeing broad adoption as researchers work to comply with federal funder mandates, and discipline-specific pre-print servers such as MLA Commons CORE now supplement institutional repositories as places where communities of scholars can deposit their working papers, and versions of their manuscripts they have publisher permission to share, even if the final publisher version is not open access. There has been a lot of discussion of open peer review, which I don’t think anyone can currently agree on a definition of, but which has been used to mean everything from peer review where the identities of the reviewers are not secret, to having peer reviews published alongside the manuscript versions themselves, to post-publication peer review where the article is published first and reviewed after.

The Lifecycle Approach
“Rather than focusing on acquiring the products of scholarship, the library is now an engaged agent supporting and embedded within the processes of scholarship.” --21st Century Library Collections: Calibration of Investment and Collective Action (ARL 2012) So, why is SHARE, and why are libraries for that matter, interested in the full research lifecycle, not just data and publications, which are typically the focus of mandates and policy? And, truthfully, when the data and the publications themselves are hard enough to tackle? For transparency, reproducibility, and reusability. To tell a full story about scholarship, and about the range of potential contributions to scholarship, since a project might include people who create different aspects of the work. The article is a late indicator of a person’s work.

Mission and Values SHARE’s mission is to maximize research impact by making research widely accessible, discoverable, and reusable. SHARE is developing services to gather and freely share information about research and scholarly activities across their life cycle. Making research and scholarship freely and openly available encourages innovation and increases the diversity of innovators. Okay. So now that we’ve talked about the landscape, and the conceptual foundations of open science, we can talk about SHARE specifically as a tool.

How does it work?

SHARE dataset 150+ data sources Registries (e.g. CrossRef, DataCite)
Disciplinary repositories and preprint services Data repositories Institutional repositories Agency repositories (e.g. DOE SciTech Connect) As of Monday, there were 150 data sources which had contributed to SHARE. (This number is continuously growing, and it is updated next to the search bar on whatever version of SHARE you are searching.) This is about 17.8 million metadata records. These contributors cover a wide swath of research lifecycle outputs. There are persistent identifier registries, such as CrossRef and DataCite, which mint and manage DOIs. There are disciplinary repositories and preprint services (such as arXiv; RePEc, or Research Papers in Economics; and PubMed Central), data repositories (Dryad), and institutional repositories from dozens of universities and research organizations (public, governmental, and private). Everyone from Harvard to the Department of Energy to

Metadata records by type
Data Set Patent Poster Presentation Publication (Article, Book, Conference Paper, Dissertation, Preprint, Project, Registration, Report, Thesis, Working Paper) Repository Retraction Software

Institutional Dashboard
API Feeds Application Components ... Integrations, Applications, Widgets, Other Solutions... Institutional Dashboard Discovery ... SHARE Notify

SHARE API powers new discovery services
This is a prototype. There will be additional polish to the interface. UCSD and SHARE UI developer Manual clean-up & enhancement of the data was necessary. Hired grad students to manually collect bib/lab/people data.

Developing features of SHARE
Enhanced data model Institutional dashboard Aggregating more [varied] sources Curation Associates work (including interface for curation)

SHARE Curation Associates
Program Goal: to support associates to leverage curation expertise to enhance the SHARE dataset, and to lead projects that provide benefits locally Assessment Clean-up Alignment with SHARE dataset as a whole

34 professional librarians in inaugural pilot
Information extraction through application programming interfaces (APIs) and OAI-PMH feeds; building content harvesters; basic programming in Python; methods and tools to automate data cleaning and metadata enhancements (using programming scripts and/or OpenRefine). All tools used will be open source. I was working at Iowa State University when I was selected to participate in the SHARE curation associates program, so I am represented on this map by the purple dot in the middle of Iowa, instead of a purple dot in the middle of Minnesota.
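To give a feel for the kind of scripted data cleaning covered in the training, here is a minimal Python sketch. It is not SHARE code. It is a simplified take on key-collision clustering, in the spirit of OpenRefine's "fingerprint" method, and the publisher strings are made up for illustration.

```python
import re
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical strings collide."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().strip()
    # Replace ASCII punctuation with spaces, then split on whitespace.
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    # Sort the unique tokens so word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical publisher strings, as they might appear across harvested records.
publishers = [
    "University of Minnesota",
    "university of minnesota.",
    "Minnesota, University of",
    "Iowa State University",
]
clusters = cluster(publishers)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Each cluster is a candidate for review, not an automatic merge: a person still decides which variant is the preferred form.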

Curation program Local data curation Project teams
Training, education, and outreach Dataset curation (forthcoming) Components Local data curation OR Project teams

Open Science Framework https://osf.io
Increasingly, we are looking at this opportunity for integration using the OSF, which can work with both funders and institutions to provide a platform that can integrate with and provide computational services, archival services, curation opportunities and preservation. There is very exciting work going on at Notre Dame.

Local enhancement activities
First 6 months: Metadata review Gap analysis Digital preservation review Draft plan Upcoming: Implement plan

? Metadata quality Title Title Author Author 1; Author 2 Author 1
Volume Pages Explanation of why the bepress OAI-PMH endpoint isn’t a good data source for SHARE even though your repository metadata might be very rich. [Crosswalking 101] Coverage
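The slide above appears to show a rich record losing its second author, and its volume and pages, on the way out of the repository. As a rough sketch of why a crosswalk can do that, here is a small Python example. The field names, both crosswalks, and the sample record are hypothetical; this does not describe how any particular repository platform or SHARE actually maps fields.

```python
# Hypothetical rich repository record; field names are illustrative only.
rich_record = {
    "title": "An Example Article",
    "authors": ["Author 1", "Author 2"],
    "volume": "12",
    "pages": "34-56",
}

# A deliberately lossy crosswalk: target field -> function of the source record.
LOSSY_CROSSWALK = {
    "title": lambda rec: rec.get("title"),
    # Only the first author survives, and volume/pages have no target field.
    "creator": lambda rec: (rec.get("authors") or [None])[0],
}

# A more careful crosswalk (also hypothetical) that keeps every value.
CAREFUL_CROSSWALK = {
    "title": lambda rec: rec.get("title"),
    "creator": lambda rec: rec.get("authors"),
    "volume": lambda rec: rec.get("volume"),
    "pages": lambda rec: rec.get("pages"),
}


def apply_crosswalk(record, crosswalk):
    """Build the exported record, skipping empty values."""
    exported = {}
    for field, extract in crosswalk.items():
        value = extract(record)
        if value not in (None, "", []):
            exported[field] = value
    return exported


def atoms(record):
    """Flatten a record into the set of its individual string values."""
    values = set()
    for value in record.values():
        if isinstance(value, list):
            values.update(value)
        elif value:
            values.add(value)
    return values


def lost_values(source, exported):
    """Return the source values that appear nowhere in the exported record."""
    return sorted(atoms(source) - atoms(exported))


lossy = apply_crosswalk(rich_record, LOSSY_CROSSWALK)
careful = apply_crosswalk(rich_record, CAREFUL_CROSSWALK)
print("Lost by lossy crosswalk:", lost_values(rich_record, lossy))
print("Lost by careful crosswalk:", lost_values(rich_record, careful))
```

A comparison like this, run over a sample of records, is one simple way to do the gap analysis step: it shows which rich local fields never reach the aggregator.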

Project groups Potential data sources from re3data.org repositories
Populating an open access institutional repository with SHARE data Graduate student researcher profiles etc.

How can you participate in SHARE?

Register your repository!
1. Scroll down, click on “Source Registration”
2. Create an OSF account
3. Fill out the form
4. a. If your repository has an OAI-PMH endpoint, the SHARE team will take care of everything from there!
   b. Otherwise they will work with you on what you’ll need to submit.
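If you are not sure what your OAI-PMH endpoint exposes, it can help to look at it the way a harvester would before you register. Below is a minimal Python sketch, using only the standard library, that builds a ListRecords request URL and pulls identifiers, titles, and creators out of an oai_dc response. The base URL and the sample XML are placeholders invented for illustration, and this is not the SHARE harvester itself. It parses a canned response rather than making a network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and unqualified Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [el.text for el in record.iterfind(".//dc:title", NS)],
            "creators": [el.text for el in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Placeholder response, shaped like an oai_dc ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Dataset</dc:title>
          <dc:creator>Author 1</dc:creator>
          <dc:creator>Author 2</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

# Hypothetical endpoint; substitute your repository's OAI-PMH base URL.
url = list_records_url("https://repository.example.edu/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A real harvest would also follow the resumptionToken that OAI-PMH uses for paging through large result sets, which this sketch leaves out.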

Use the API! https://osf.io/bygau
Getting started documentation, sample scripts! More documentation on the SHARE website.
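For a sense of what a first script against the API might look like, here is a hedged Python sketch. The endpoint URL, the Elasticsearch-style query body, and the shape of the sample response are all assumptions made for illustration; please check them against the getting started documentation linked above before relying on them. The example only builds the request and parses a made-up response, so it runs without a network connection.

```python
import json
from urllib.request import Request

# ASSUMPTION: search endpoint URL; confirm it in the SHARE API documentation.
SEARCH_URL = "https://share.osf.io/api/v2/search/creativeworks/_search"


def build_search_request(text, size=10, url=SEARCH_URL):
    """Build (but do not send) a POST request carrying a full-text query."""
    # ASSUMPTION: the API accepts an Elasticsearch-style query body.
    body = {"query": {"query_string": {"query": text}}, "size": size}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_titles(response_json):
    """Pull titles out of an Elasticsearch-style response, if present."""
    hits = response_json.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}).get("title") for hit in hits]


# Made-up response used to exercise the parser offline.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [{"_source": {"title": "An Example Preprint"}}],
    }
}

request = build_search_request("open science", size=5)
print(request.get_method(), request.full_url)
print(extract_titles(sample_response))
```

To actually run the search, you would pass the request to `urllib.request.urlopen` and decode the JSON that comes back.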

Get more information Get involved! www.share-research.org/updates

Question time

Thanks to Cynthia Hudson-Vitale for creating some of the slides used in this presentation! And to all the SHARE folks including Jeff Spies, Judy Ruttenberg, Rick Johnson, Erin Braswell, and Amy Eshgh.
