Building a CMMI Data Infrastructure
DAY 2 – Breakout Summaries
Arlington, VA
February 6-7, 2017
Sustaining Repositories
- Define the key benefits / value to users
  - Visualizations, citations, curation, etc.
- Describe the organizational entity ideal to manage the data repository and the governance structure
  - Non-profit, university, etc.
  - Board of directors, volunteer steering committee, government group
- Explain the funding model to sustain operations
  - Individual: pay to put data in, pay to take data out, per use, subscription
  - Organization: long-term funding?
Incentivizing Data Sharing
- Need to develop, implement, and refine rules aimed at enforcing data sharing on federally funded work wherever possible, using a combination of requirements from federal agencies, requirements from publishers/editors, and community (e.g., professional societies) developed and agreed practices.
- Need to develop, implement, and refine incentives aimed at encouraging data sharing that might include improved productivity, enhanced data longevity and utility, more citations compared to traditional publishing, rewards/recognition from peers, and more funding from federal agencies.
- Portfolio of case studies illustrating the benefits of data sharing.
Innovative Data Creation and Data Fusion Approaches
- The group's main discussion was on new research topics and approaches needed to tackle the many challenges with emerging forms of data, and with building repositories and data services that use such data.
- New approaches to doing science (crowdsourcing, grand challenges, multidisciplinary research) for problem understanding, generating solutions, and knowledge discovery
- Research and applications around capturing and discovering emerging forms of data, particularly geospatial data, and the privacy and information security issues around sharing such data
Metadata, Vocabulary, & Workflow Tools for Discovery
- Address the need for a commonly used vocabulary, and describe an approach for building and evolving terms as well as their relationships.
- Incorporate electronic lab notebooks with complementary tools to capture the full workflow and associate it with data and publications.
- Employ approaches to search and query for specific parameters across multiple distributed data sources.
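As a rough illustration of the vocabulary and search points above, the following minimal Python sketch shows how a shared vocabulary might let one query find the same parameter across several repositories that label it differently. Everything in it is a hypothetical assumption made for illustration: the synonym table, the "broader term" relationships, the `Record` and `Repository` types, the repository names, and the sample values do not come from the workshop.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: each synonym maps to one canonical term.
SYNONYMS = {
    "youngs modulus": "elastic_modulus",
    "modulus of elasticity": "elastic_modulus",
    "elastic modulus": "elastic_modulus",
    "compressive strength": "compressive_strength",
}

# Hypothetical "broader term" relationships between canonical terms.
BROADER = {
    "elastic_modulus": "mechanical_property",
    "compressive_strength": "mechanical_property",
}


def canonical(term):
    """Normalize a free-text parameter label to its canonical vocabulary term."""
    key = term.strip().lower().replace("'", "")
    return SYNONYMS.get(key, key.replace(" ", "_"))


@dataclass
class Record:
    dataset_id: str
    parameter: str  # label as written by the contributing repository
    value: float
    unit: str


@dataclass
class Repository:
    name: str
    records: list = field(default_factory=list)


def federated_search(repositories, term, include_narrower=False):
    """Return (repository name, record) pairs whose parameter matches the term.

    With include_narrower=True, records whose canonical term has the
    queried term as its broader term also match.
    """
    target = canonical(term)
    hits = []
    for repo in repositories:
        for rec in repo.records:
            c = canonical(rec.parameter)
            if c == target or (include_narrower and BROADER.get(c) == target):
                hits.append((repo.name, rec))
    return hits


# Made-up sample data for two hypothetical repositories.
repo_a = Repository("repo_a", [
    Record("a-1", "Young's modulus", 200.0, "GPa"),
    Record("a-2", "Compressive strength", 40.0, "MPa"),
])
repo_b = Repository("repo_b", [
    Record("b-7", "modulus of elasticity", 30.0, "GPa"),
])

exact_hits = federated_search([repo_a, repo_b], "elastic modulus")
broad_hits = federated_search(
    [repo_a, repo_b], "mechanical property", include_narrower=True
)
```

In this sketch the vocabulary lives in two small dictionaries so that terms and relationships can be added or revised without touching the search logic; a real deployment would more likely query remote services and a community-maintained vocabulary rather than in-memory lists.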
Using Data Management Plans and existing NSF data centers
- Can (or should) existing NSF data centers, or other data repositories, be used in this regard?
  - There are possibilities, but they are currently limited
  - There is not really an NSF-based center solution for materials science
  - Perhaps something like EarthCube is the solution here
  - The NHERI-based center (DesignSafe), working with XSEDE (UT-TACC), is a potential solution
  - Addresses some data issues (storage and work space; potentially (?) addressing integration, metadata standards, confidentiality, proprietary data)
  - May not be a solution for all areas of infrastructure
- Are such data centers a necessary condition for formulating a reasonable data management plan?
  - No, but they would clearly be helpful
  - Range of possibilities:
    - Repository function
    - Data preparation and processing for the community
    - Data integration and linkage