DSET Overview
WAG Meeting, Aug. 10, 2017
Matt Mayernik (mayernik@ucar.edu)
https://ncar.ucar.edu/data-stewardship-engineering-team-dset
DSET Vision
Satisfy user needs by developing complete, organization-wide data discovery and access for the scientific research community.
data.ncar.edu / data.ucar.edu: a single front door to ALL community data.
Working Definition
Data: Digital assets intended for scientific community use, including files and metadata, publications, reports, images, software (visualization, analysis, model codes), and related data services.
Why do we say “science”? Our best chance for success is to limit the scope and satisfy our “primary” users and their needs. That is not to say that development stops there. Better serving the public is important for NCAR/UCAR; this is a future objective. We are just trying to be practical and not overreach at the start.
Services include: helping with NSF Data Management Plans (DMPs), EOL project planning, supporting scientific inquiry, consulting (“help me find this”), and things like the “Climate Data Guide”.
Why is this important?
Users and colleagues “in the know” are well served.
A new and diverse community of users is easily frustrated.
Many services are sprinkled across the organization.
Services are not well coordinated, and there is no overarching consulting.
Current websites are not comprehensive.
We need to respond to new requirements from funding agencies and scientific journals.
DSET Membership
Lab Representatives:
Rebecca Centeno Elliott (HAO, rce@ucar.edu)
Linda Cully (EOL, cully@ucar.edu)
Louisa Emmons (ACOM, emmons@ucar.edu)
Abby Jaye (MMM, jaye@ucar.edu)
Don Kolinski (HAO, kolinski@ucar.edu)
Ryan May (Unidata/UCP, rmay@ucar.edu)
Matt Mayernik (Lib/UCP, mayernik@ucar.edu)
Tor Mohling (RAL, tor@ucar.edu)
Eric Nienhouse (CISL, ejn@ucar.edu)
David Schneider (CGD, dschneid@ucar.edu)
Steven Worley (CISL, worley@ucar.edu)
Dan Ziskin (ACOM, ziskin@ucar.edu)
Leadership Team:
Liaison to NCAR Executive Committee: Bill Mahoney (RAL)
DSET Chair: Steven Worley
Organizing Committee: Steven Worley, Linda Cully, Abby Jaye, Don Kolinski, Matt Mayernik, Eric Nienhouse
DSET Guiding Principles
1) Cross-organizational participation: include all NCAR laboratories in the process, and keep a close relationship with the NCAR executive leadership.
2) Science- and user-centric development: ensure that the committee includes strong representation from scientists and technical experts.
3) Document our processes: keep records of activities, findings, and decisions, for our own benefit and to share with other organizations.
Funding for DSET Activities
Support for DSET:
Meeting attendance for members
Metadata Facilitator, Software Engineer (Don Stott, EOL)
Scientific Data Management Development Team (CISL SAGE)
One-time lab funds for metadata development
Data Stewardship Coordinator/Liaison (Sophie Hou, CISL)
DSET Accomplishments (FY15-FY17)
Organized the DSET team
Inventory of digital assets
Metadata evaluation and development
Developed a cross-cutting Search and Discovery system (in beta now)
Developing requirements toward a new data repository
Digital Asset Services Hub (DASH) - www2.cisl.ucar.edu/dash
Provided by the Data Stewardship Engineering Team (DSET) Initiative
DASH Metadata: ISO standard for the NCAR dialect; NMDEdit metadata tool; bulk metadata ingest; lab web accessible folders (WAF) on GitHub; CKAN metadata harvesting; metadata validation
DASH Consulting: Data Management Plans (preparation help, samples); Digital Object Identifiers (DOIs); training seminars; in-person assistance
DASH Search: built on CKAN; driven by metadata; user interface (UI) for cross-organization asset search and discovery; Application Programming Interface (API) for external metadata sharing
DASH Repository: under development; trustworthy requirements (NCAR governance, operational procedures, technical features, user and provider functions)
Dimmed text indicates future features.
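Because DASH Search is built on CKAN and exposes an API for external metadata sharing, a client can query it through CKAN's standard Action API. The sketch below only builds a `package_search` request URL and summarizes a decoded response; it sends nothing over the network. Assumptions: the base URL is hypothetical (data.ucar.edu appears in the DSET vision, but serving the CKAN API at that host is not confirmed here), and the facet and filter fields (`organization`, `res_format`) are CKAN defaults that DASH may index differently.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: whether the DASH Search CKAN API lives here is an assumption.
DASH_BASE_URL = "https://data.ucar.edu"


def build_package_search_url(base_url, query, filters=None, facet_fields=None, rows=10):
    """Build a CKAN Action API package_search URL (no request is sent).

    query        -- free-text search string (CKAN's "q" parameter)
    filters      -- dict of field -> value, combined into a Solr filter query ("fq")
    facet_fields -- fields to facet on; CKAN expects them as a JSON-encoded list
    rows         -- maximum number of datasets to return
    """
    params = {"q": query, "rows": rows}
    if filters:
        # Each filter becomes field:"value"; all filters must match (AND).
        params["fq"] = " AND ".join(f'{field}:"{value}"' for field, value in filters.items())
    if facet_fields:
        params["facet.field"] = json.dumps(facet_fields)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def summarize_results(response):
    """Return (total count, dataset titles) from a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = response["result"]
    return result["count"], [pkg.get("title", "") for pkg in result["results"]]


# Illustrative values only; these are not real DASH datasets.
url = build_package_search_url(
    DASH_BASE_URL, "ozone",
    filters={"res_format": "NetCDF"},
    facet_fields=["organization", "res_format"],
    rows=5,
)
sample = {"success": True, "result": {"count": 2, "results": [
    {"title": "Example dataset A"}, {"title": "Example dataset B"}]}}
print(summarize_results(sample))  # (2, ['Example dataset A', 'Example dataset B'])
```

Building the URL separately from fetching it keeps the example testable offline; any HTTP client could then retrieve the URL and pass the decoded JSON to `summarize_results`.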
DASH Consulting https://www2.cisl.ucar.edu/dash
DASH Metadata
Define an NCAR metadata dialect: not a new local standard, but a blend of two existing standards that serve two distinct purposes:
DataCite: enables DOI assignment, publication, and general discovery
ISO 19115: enables faceted (detailed) data discovery; widely used in the geosciences
Challenge: having standards doesn't mean everybody interprets or implements them in the same way.
DASH Metadata
Result: two separate metadata categories
Minimum Required Metadata: basic elements to support simple discovery, and enough to register a DOI
Enhanced Metadata: will support more system features, e.g., more detailed searching and browsing
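The minimum/enhanced split can be made concrete as a completeness check that decides whether a record reaches the minimum tier. This is a minimal sketch that assumes the minimum set corresponds to the six mandatory properties of the DataCite Metadata Schema (4.x); the actual NCAR dialect element list is not given on this slide, and the flat dictionary layout and the function names `missing_minimum_fields` and `meets_minimum` are hypothetical.

```python
# Assumption: the NCAR "Minimum Required Metadata" set is approximated by the six
# mandatory DataCite Metadata Schema (4.x) properties. The flat dictionary keys
# below are a hypothetical representation, not an NCAR or DataCite file format.
DATACITE_MANDATORY = (
    "identifier",       # e.g. a DOI
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)

# Values treated as "not provided".
EMPTY_VALUES = (None, "", [], {})


def missing_minimum_fields(record):
    """Return the mandatory fields that are absent or empty, in schema order."""
    return [field for field in DATACITE_MANDATORY if record.get(field) in EMPTY_VALUES]


def meets_minimum(record):
    """True when every assumed mandatory field is present and non-empty."""
    return not missing_minimum_fields(record)


example = {
    "identifier": "10.xxxx/example",   # placeholder DOI, not a real identifier
    "creators": ["Doe, Jane"],
    "titles": ["Example dataset"],
    "publisher": "",
    "publicationYear": 2017,
    "resourceType": "Dataset",
}
print(missing_minimum_fields(example))  # ['publisher']
```

A validator like this would only gate the minimum tier; checks for enhanced (ISO 19115-based) elements would be layered on top and could report gaps without blocking DOI registration.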
DASH Search (beta)
Developing the DASH Search
How do you establish what the system must do, i.e., its capabilities?
Defining the system features:
User stories that imply requirements
Personas: data providers and data consumers
Syncing user needs with system and software requirements
Ranked System Features
1) Free text search
2) Search result ranking
3) Human consulting
4) Asset attribution
5) Faceted search
6) Use metrics
7) Data format translation
8) Visual data asset browse
9) General purpose storage
10) Real time data infrastructure
11) Natural term definitions
12) Long term preservation
13) Asset self archiving
14) Native metadata translation
15) Access control
16) Directed download workflows
17) Metadata sharing
18) Relationship display
This list evolves based on DSET discussions and engineering work.
DASH Search - Architecture
[Architecture diagram] Components shown:
Metadata entry: NMDEdit (now), Web Tool (future)
GitHub Metadata Repository (old ISO)
GitHub Metadata Repository (new ISO):
https://github.com/NCAR/dset-web-accessible-folder-iso19115-3-dev
https://github.com/NCAR/dset-web-accessible-folder-iso19115-3-prod
NCAR repositories: OpenSky, EOL, RAL, CGD, etc.
Search and Discovery (CKAN)
DASH Search Technology Considerations
Metadata schema support and flexibility
Usable by the community of users (personas)
Contributor community for open source technology
Active support organization and documentation
Sustainable development cost
Sustainable operational cost
Peer organization use
Technology permanence and longevity
DASH Repository – Currently Developing Requirements
Establishing a trusted repository: Governance, Collections, Scope, Acquisition, Workflows, Lifecycle, Storage, Monitoring
WAG Participation
Web technology best practices (portals, GitHub, analytics, etc.)
Usability testing / interface feedback
Repository use cases