Data Discovery Paradigms Interest Group Report on Activities and Outputs Anita de Waard, Siri Jodha Singh Khalsa Fotis Psomopoulis Mingfang Wu.

Slides:



Advertisements
Similar presentations
Geoscience Information Network Stephen M Richard Arizona Geological Survey National Geothermal Data System.
Advertisements

SIF Status to ADC Co-Chairs
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
IBE312: Ch15 Building an IA Team & Ch16 Tools & Software 2013.
Internal Auditing and Outsourcing
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
What is Business Analysis Planning & Monitoring?
W w w. i l u m i n a – d l i b. o r g iLumina: A Digital Library of Educational Resources for Science & Mathematics National Science Digital Library All-Projects.
RDA Data Foundation and Terminology (DFT) IG: Introduction Prepared for RDA Plenary San Diego, March 9, 2015 Gary Berg-Cross, Raphael Ritz, Co-Chairs DFT.
Challenges of Discovery Tools Challenges of Discovery Tools Shelly Shen-Aridor Younes & Soraya Nazarian Library Haifa university, Israel Session
OpenURL Link Resolvers 101
Search Update April 1-3, 2009 Joshua Ganderson Laura Baalman.
Topic Rathachai Chawuthai Information Management CSIM / AIT Review Draft/Issued document 0.1.
UK Repository Search Project Phase II Project Overview Phil Cross Vic Lyte September 2006.
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
FEA DRM Management Strategy Presented by : Mary McCaffery, US EPA.
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
A Resource Discovery Service for the Library of Texas Requirements, Architecture, and Interoperability Testing William E. Moen, Ph.D. Principal Investigator.
OOI-CYBERINFRASTRUCTURE OOI Cyberinfrastructure Education and Public Awareness Plan Cyberinfrastructure Design Workshop October 17-19, 2007 University.
Collaborative Portal for the IOOS Super-Regional Modeling Testbed Sara Graves, Manil Maskey, Ken Keiser Information Technology & Systems Center University.
1 Jennifer Bowen University of Rochester Chicago, IL May 9, 2007 Working Group on the Future of Bibliographic Control Second.
NeOn Components for Ontology Sharing and Reuse Mathieu d’Aquin (and the NeOn Consortium) KMi, the Open Univeristy, UK
Santi Thompson - Metadata Coordinator Annie Wu - Head, Metadata and Bibliographic Services 2013 TCDL Conference Austin, TX.
ESO and the CMR Life Cycle Process Winter ESIP, Jan 2015 ESDIS Standards Office (ESO) Yonsook Enloe Allan Doyle Helen Conover.
The Earth Information Exchange. Portal Structure Portal Functions/Capabilities Portal Content ESIP Portal and Geospatial One-Stop ESIP Portal and NOAA.
ICSU-WDS & RDA Data Publication Services WG. 2 Linking Research Data and the Literature: why? Why link? 1.Increase visibility & discoverability of research.
Global Water Information Interest Group meeting RDA 7 th Plenary, 1 st March 2016, Tokyo Global Water Information Interest Group Welcome to the inaugural.
1 The XMSF Profile Overlay to the FEDEP Dr. Katherine L. Morse, SAIC Mr. Robert Lutz, JHU APL
WP 2: Ontology & Metadata Models ITD
RDA Plenary 5 Big Data (Analytics) IG Session
Overview of WGs, IGs and BoFs
Summon® 2.0 Discovery Reinvented
Towards REF 2020 What we know and think we know about the next Research Excellence Framework Dr. Tim Brooks, Research Policy & REF Manager, RDCS Anglia.
The Components of Information Systems
User Characterization in Search Personalization
TRSS Terminology Registry Scoping Study
Usage scenarios, User Interface & tools
eInfraCentral Portal User requirements and features
Susanna-Assunta Sansone, Rebecca Lawrence and Simon Hodson
Susanna-Assunta Sansone, Rebecca Lawrence and Simon Hodson
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
knowledge organization for a food secure world
Object oriented system development life cycle
Toward FAIR Semantic Resources
Maintaining the Clinical & Nonclinical Study Data Reviewer’s Guides
The Components of Information Systems
Pilot project training
WGISS-WGCV Joint Session
FDA Objectives and Implementation Planning
ESS roadmap on Linked Open Data State of play
Exploring Scholarly Data with Rexplore
EOSC services architecture
The JISC IE Metadata Schema Registry
Linked Open Data Project
From Observational Data to Information (OD2I IG )
Archives and Records Professionals for Research Data IG
Research Data Alliance (RDA) 9th WG/IG Collaboration Meeting: Repository Platforms for Research Data (RPRD) Interest Group 13nd June 2018 Co-Chairs:
Repository Platforms for Research Data Interest Group: Requirements, Gaps, Capabilities, and Progress Robert R. Downs1, 1 NASA.
Module 8 ACTION PLANS.
United Nations Statistics Division
Introduction to the MIABIS SOP Working Group
Interlinking standards, repositories and policies
Bird of Feather Session
MSDI training courses feedback MSDIWG10 March 2019 Busan
Information System Building Blocks
Data Discovery Paradigms
Lab 2: Information Retrieval
Data Discovery Paradigms Interest Group 4 April, 2019 RDA 13th Plenary Meeting, Philadelphia Siri Jodha Singh Khalsa Fotis Psomopoulos Mingfang Wu.
Presentation transcript:

Data Discovery Paradigms Interest Group Report on Activities and Outputs Anita de Waard, Siri Jodha Singh Khalsa Fotis Psomopoulis Mingfang Wu

Purpose and planned outcomes/aims Explore common elements and shared issues faced by those who search for data, and who build systems supporting search Engage with broadest possible spectrum of stakeholders to investigate, report, and make recommendations regarding these issues User scenarios or use cases the IG wishes to address: Data search engines are interested in developing components and practices to connect resources and results Data repositories are interested in improving and expanding search on their platforms Users are interested in better interfaces and fewer places to look for data Data creators are interested in a shared set of data metrics for all search engines Data search builders are interested in sharing knowledge and tools about ranking, relevance and content enrichment.  

Accomplishments to date Four Task Forces formed Use Cases, Prototyping Tools and Test Collections Best Practices for Making Data Findable Relevancy Ranking Metadata Enrichment Three manuscripts generated, intended for publication One submitted, one ready, one in progress Use Cases, Prototyping Tools and Test Collections: Identify a set of common data search use cases, leading to a set of requirements Meant to be useable by all data discovery services Best Practices for Making Data Findable Three key perspectives: Data Provider, Data Seeker, Data Repositories Relevancy Ranking: Choose appropriate technologies for search functionality Sharing experiences with relevancy ranking Metadata Enrichment: Survey current methods and tools for auto- generating metadata Document value of enriched metadata for improving search and automated workflows

Issues, challenges, problems Members: 100; Doers: 8 – 10 3 – 6 doers per task force Interactions with other groups minimal Also, no time to reach out to other groups

No modifications needed Are these issues sufficient to require modification of the outcome, schedules, scope? No modifications needed The TFs managed to generate outputs within the 18 mo. WG timeframe Congratulations, your Developing Shared Paradigms for Data Discovery IG charter has been reviewed and approved by TAB, and is being passed along to the RDA Council for final endorsement!

Plan for completion/progress for the coming 6-12 months Metadata enrichment TF reinvigorated Recent CFP elicited encouraging responses Plan to develop survey on current practices and tools in use Spinning up TF on optimizing discoverability by commercial search engines Other New activities suggested in session at RDA P10: Cataloging and Analysing Common Data Discovery APIs; Data Discovery for Institutional Repositories - test recommendations, explore new insights gained through using new discovery technologies; Analysis of Search logs, possible follow-up activity for the existing relevancy ranking task force; Collection and Analysis of Data Needs - identify what people usually want to find;

Relation to/coordination with other WG/IGs Although there are obvious connections with metadata, research data collections, domain focused interoperability, etc. we have had little direct interactions Participating in joint session at P11 w/ IG Agricultural Data IG ELIXIR Bridging Force WG BioSharing Registry IG Repository Platforms for Research Data

Fit with the RDA mission Discovery is central to mission of making it easier for researchers to share data Data producers, data repositories, infrastructure developers all have roles

Supporting Material

Use Cases Users want better interfaces and fewer places to look for data. Data creators need guidance on improving the findability of their data. Builders of data search engines are interested in sharing knowledge and tools to improve search services.

Ranked requirements from Use Cases REQ1: Indication of data availability REQ2: Connection of data with person, institution, pub, citations, grants REQ3: Fully annotated data REQ4: Filtering of data based on multiple fields at the same time REQ5: Cross-referencing of data REQ6: Visual analytics / inspection of data / thumbnail preview REQ7: Sharing data in a collaborative environment REQ8: Accompanying educational / training material REQ9: Functionality similar to that of other established academic portals

Ten Recommendations for Data Repositories REC 1: Provide a range of query interfaces to different various data search styles. REC 2: Provide multiple access points to find data (e.g keyword, browse, facets). REC 3: Make it easier to judge relevance, accessibility and reusability. REC 4: Make individual metadata records readable and analysable. REC 5: Be able to output bibliographic references. REC 6: Provide feedback about data usage statistics. REC 7: Be consistent with other repositories. REC 8: Identify and aggregate records that describe the same data object. REC 9: Make records easily indexed and searchable by major web search engines. REC 10: Follow API search standards and community adopted vocabularies.

Combined outputs Requirements derived from use cases Recommendations for data repositories Combined into manuscript submitted to Library & Information Science Research

Best Practices for Data Seekers Ten Simple Rules for Finding Research Data Manuscript submitted to PLOS Authors: Kathleen Gregory, Siri Jodha Khalsa, Bill Michener, Fotis Psomopoulos, Anita de Waard, and Mingfang Wu

Ten Simple Rules Think about the data you need and why you need them. Select the most appropriate resource. Construct your query. Make the repository work for you. Refine your search. Assess data relevance and fitness-for-use. Save your search and data source details. Look for data services, not just data. Monitor the latest published data. Give back.

Mapping between the REQ, the REC and the Ten Rules

Relevancy Ranking Survey - Objectives Help data repositories choose appropriate technologies when implementing or improving search functionality. Capture the aspirations, successes and challenges. Provide a forum for sharing experiences with relevancy ranking.

Survey Design (33 Questions) Repository characteristics (5) System configurations (7) Evaluation methods and benchmarks (10) Methods used to boost searchability to web search engines (2) Other technologies or system configurations (5) Wish list for future activities for the RDA relevance task force (2)

What’s Next? New activities suggested in session at RDA P10: Cataloging and Analysing Common Data Discovery APIs; Data Discovery for Institutional Repositories - test recommendations, explore new insights gained through using new discovery technologies; Analysis of Search logs, possible follow-up activity for the existing relevancy ranking task force; Collection and Analysis of Data Needs - identify what people usually want to find; Making research data more discoverable by search engines  Presentation 2 November by Natasha Noy of Google Research Task Force in process of forming - Granularity, domain-specific cross-domain issues - De-duplication of search results - Using upper-level ontologies - Search personalisation