Preservation and Curation of University Research Data: Curation Pilots for UC San Diego
David Minor, Director for Digital Preservation Initiatives; Interim Head, Research Data Curation Program

Data Curation on campus
Data Curation is part of a campus-wide suite of Research Cyberinfrastructure (RCI) services:
– Data Curation
– Centralized Storage
– High-Speed Networking
– Computing
– Colocation

Data Curation defined in planning documents
“The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Higher levels of curation will also involve maintaining links with annotation and other published materials.”
Source: Philip Lord, Alison Macdonald, Liz Lyon, and David Giaretta, “From Data Deluge to Data Curation”
Curation stages: appraisal, accession, arrangement, description, storage, preservation, access
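The seven curation stages named at the end of this slide (appraisal through access) can be read as an ordered workflow. As a minimal sketch, assuming a strictly linear sequence (real curation often loops back, for example when a dynamic dataset is re-described), the stages could be modeled in Python as below. `CurationStage`, `next_stage`, and `remaining_stages` are hypothetical names for illustration, not part of any UC San Diego tool.

```python
from enum import IntEnum
from typing import List, Optional


class CurationStage(IntEnum):
    """Curation stages in the order listed on the slide (hypothetical model)."""
    APPRAISAL = 1
    ACCESSION = 2
    ARRANGEMENT = 3
    DESCRIPTION = 4
    STORAGE = 5
    PRESERVATION = 6
    ACCESS = 7


def next_stage(stage: CurationStage) -> Optional[CurationStage]:
    """Return the stage after `stage`, or None once access is reached.

    Assumes a strictly linear workflow; illustration only.
    """
    if stage is CurationStage.ACCESS:
        return None
    return CurationStage(stage + 1)


def remaining_stages(stage: CurationStage) -> List[CurationStage]:
    """List every stage still to be completed after `stage`, in order."""
    return [s for s in CurationStage if s > stage]
```

For example, `next_stage(CurationStage.DESCRIPTION)` returns `CurationStage.STORAGE`. Using an ordered enum keeps the sequence explicit and makes "how far along is this dataset?" a simple comparison.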

Data Curation Pilots
– Two-year pilot process with selected researchers (started September 2011)
– Targeted domains representative of the campus
– Explicitly required researcher participation

Curation Pilot Story
The curation pilot goals:
– Investigate what it means to make a variety of research data discoverable and reusable
– Investigate current UC San Diego tools for accomplishing this work
– Learn how researchers, information technologists, and librarians work together with data
– Recommend production services
– Develop budget and cost models

The pilot participants …

The Brain Observatory
Preserve and curate the digital version of the brain of patient H.M., the most studied neuropsychological patient in modern medicine.

NSF OpenTopography Facility
OpenTopography facilitates community access to high-resolution, Earth science-oriented topography data and related tools and resources.

Levantine Archaeology Laboratory
Focuses on archaeological investigations concerning the evolution of societies in the southern Levant from the Neolithic to the Islamic periods.

Scripps Institution of Oceanography Geological Collections
The Sediment Core collection contains samples collected from as early as … The Cored Sediment Collection is a growing archive of sea-floor samples and associated data supporting a wide variety of scientific research.

The Laboratory for Computational Astrophysics
Dedicated to advancing the state of the art of astrophysical simulation through the development and dissemination of community codes, and through large-scale simulations of astrophysical and cosmological systems.

What does discoverable and reusable look like?

Discovery and Access Resources
The UC San Diego Library’s Digital Asset Management System (DAMS) is a digital object and linked data access and discovery tool. In existence for nearly ten years, its newest version (informed by RCI Curation) has been extended to support complex research data.
The California Digital Library’s Online Archive of California (OAC) provides free public access to detailed descriptions of primary resource collections maintained by contributing institutions throughout California, including the 10 University of California (UC) campuses. The OAC contains more than 20,000 online collection guides and 220,000 digital images and documents.

Discovery and Access Resources: Data Flow
[Diagram: data flow among the Researcher, SDSC Storage, DAMS, Chronopolis, and OAC]

UCSD Library DAMS

As other collections are added, they will be listed here. Cross-collection discoverability is key. Complex research collections will be “mixed in” with regular digital collections.

Complex components and imagery; full metadata records



UCSD DAMS Infrastructure for Research Data

Online Archive of California

The “same data” presented in a different format.

OpenTopography

We’re storing all metadata for semantic completeness.
[Screenshot callouts: DOI; data download]

Findings

Overarching statements
– Considering the data lifecycle holistically is key to the interaction of RCI services.
– Curation is not solely a technology enterprise: it is human-driven, and judgment is needed.
– There are social aspects to sharing data outside the research group.

Finding 1
Researchers see the data lifecycle as a single workflow. All of the researchers asked some version of each of these questions:
– Where do I put my data?
– How do I get my data there?
– How can I access and use my data?
– How do I analyze and visualize my data?
– How do I share my data with other people?
– How do I display my data?
– How do I reference my data?
– Who’s going to keep my data after my grant funding ends?
The needs articulated in these questions cross all of the RCI service boundaries. Typically, however, these services are not offered in an integrated way.

Finding 2
It is expensive to do curation “after the fact,” i.e., at the end of the data lifecycle. Most of the work involved helping the data owners organize and annotate massive amounts of historical data. Analyzing and organizing data during the collection/creation phase makes subsequent use and sharing much more effective and efficient for the researchers. Researchers told Research Data Curation Program (RDCP) staff, “I wish we had talked to you sooner.”

Finding 3
There are no standard definitions of “dataset” or “object.”
– What are the units of organization?
– What does “collection” mean?
– What are the boundaries of a dataset?
– What are the discrete units of discussion?
Data types, notions of a collection, and how data are organized are not consistent across disciplines.

Finding 4
Researchers want tools and best practices to help them manage their data. Consistent with Findings 1–3, researchers said they wanted tools to help with data collection, organization, and storage. We heard this regularly from the researchers, including at a cross-pilot gathering.

Finding 5
There is a lack of clarity about:
– which data are appropriate for long-term stewardship
– who is responsible for this stewardship
– who pays to store data for the long term (10+ years)
Who is the steward: the researcher? The university? Who decides?

Next Steps

To Do for the Curation Program
– Move from a boutique, one-off service to a more scalable series of processes.
– Work with additional researchers in the same domains.
– Work with new domains.
– Broaden the lifecycle management mindset on campus.

Recommendation: Curation Program
Consultation services:
– Metadata
– Grant support, offered in conjunction with the other RCI teams as part of the data lifecycle service
– “Matchmaking” with a data repository
Other curation services:
– Getting DOIs for data reference
– Chronopolis preservation
– Full curation process (what we did in the pilots)
– Research and development
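The recommended service of getting DOIs for data reference answers the researchers’ question “How do I reference my data?” As a minimal sketch, the Python below builds a citation string in the general Creator (Year): Title. Publisher. Identifier shape used for data citation, with the DOI written as a resolvable https://doi.org/ link. The `DatasetRecord` fields, the sample creators, title, publisher, and the DOI itself are hypothetical placeholders, and the DOI check is a loose syntax test, not a full validator.

```python
import re
from dataclasses import dataclass
from typing import List

# Loose DOI syntax check (assumption): "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a curated dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Return the resolvable https://doi.org/ form of a DOI."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def format_citation(record: DatasetRecord) -> str:
    """Build a citation of the form: Creators (Year): Title. Publisher. URL."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


if __name__ == "__main__":
    # All values below are placeholders, not a real registered dataset.
    example = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2013,
        title="Example Dataset",
        publisher="Example Repository",
        doi="10.5072/example-dataset",
    )
    print(format_citation(example))
```

Keeping the DOI in the citation as a full URL means a reader can follow it directly, while the persistent identifier stays valid even if the dataset moves between storage systems.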

Recommendation (above our pay grade)
The campus should create a Data Lifecycle Advisory Council that includes campus administrators, researchers, and librarians. The Council should be tasked with advising the campus on:
– What data the university should steward for the long term
– Who should pay at each stage of the data lifecycle
– How intellectual property rights should be determined

Thank you!
David Minor