Data Discovery Paradigms RDA Interest Group
Goals of the DDPIG
Founding Co-Chairs:
• Siri Jodha Khalsa, Univ. of Colorado
• Anita de Waard, Elsevier
Goal: Identify, study, and make recommendations concerning issues related to improving data discovery
Stakeholders: Data producers, data repositories, data seekers
Activities
• 23 topics were identified at the kickoff meeting at RDA P8; 74 people signed up for the group
• These topics were later refined and voted on, leading to 5 top picks:
  1. Best practices for making data findable
  2. Use cases, prototyping tools and test collections
  3. Metadata enrichment
  4. Cataloging common APIs
  5. Relevancy ranking
• Task forces were formed and leads identified:
  - Task forces 1, 2 and 5 got to work immediately
  - Leads of 3 and 4 have been slower to start
• Two very productive TF leads were asked to become co-chairs:
  - Mingfang Wu, Australian National Data Service
  - Fotis Psomopoulos, Aristotle University of Thessaloniki
IG Session at RDA P9
• Attendance: ~40
• The first three task forces presented their progress and proposed next steps
• A Metadata Enrichment Task Force was formed with new leads
• Agreed follow-up actions leading to P10:
  - Relevancy ranking: send out questionnaire; collect and prioritise collaborative projects; decide on a platform for the testbed
  - Use cases: rank use cases; rewrite the document; provide examples of platforms; write the final report
  - Best practices: further edit the documents; combine them into a white paper; submit for publication
  - Metadata enrichment: start regular telecons to plan next steps
Best Practices for Making Data Findable
Co-Leads: Anita de Waard, Jeffrey Grethe, William Michener, Mingfang Wu
Members: 26
Scope: Explore current practices for making data findable and recommend best practices to the data community
Activity to Date:
• Drafted 3 documents:
  - Best practices for Data Producers
  - Best practices for Data Repositories
  - Best practices for Data Seekers
• Plan to submit to a journal for publication
Use Cases, Prototyping Tools and Test Collections
Leads: Anita de Waard, Antica Culina, Fotis Psomopoulos, Jens Klump, Mingfang Wu
Members: 15
Scope: Identify key requirements evident across data discovery use cases from various scientific domains
Activity to Date: Collected >60 use cases in the form of:
• “As a” (i.e. role)
• “Theme” (i.e. scientific domain/discipline)
• “I want” (i.e. requirement, missing feature, supported function)
• “So that” (i.e. what can be accomplished when the user need has been addressed)
• “Comments”
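The five-column use-case template above lends itself to a structured record, for instance when loading the collected use cases into a common database and comparing requirements across domains. The following is a minimal sketch under that assumption. The class and function names (`UseCase`, `as_story`, `group_by_theme`) are hypothetical, and the Task Force's actual use cases were collected in a Google Spreadsheet, not in code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One data discovery use case in the five-column template (hypothetical class)."""
    as_a: str           # role, e.g. "researcher"
    theme: str          # scientific domain/discipline
    i_want: str         # requirement, missing feature, or supported function
    so_that: str        # what can be accomplished once the need is addressed
    comments: str = ""  # free-text remarks

    def as_story(self) -> str:
        """Render the use case as a single user-story sentence."""
        return f"As a {self.as_a} ({self.theme}), I want {self.i_want} so that {self.so_that}."


def group_by_theme(cases: list[UseCase]) -> dict[str, list[UseCase]]:
    """Group use cases by domain so that recurring requirements can be compared across disciplines."""
    groups: dict[str, list[UseCase]] = defaultdict(list)
    for case in cases:
        groups[case.theme].append(case)
    return dict(groups)
```

A made-up example: `UseCase("researcher", "ecology", "to filter datasets by spatial extent", "I can find data for my study area").as_story()` produces one readable sentence per spreadsheet row. Making the record frozen keeps collected use cases immutable once they have been imported.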
Relevancy Ranking
Leads: Peter Cotroneo, Mingfang Wu, Siri Jodha Khalsa
Members: 11
Scope:
• Help with selection of appropriate technologies for improving search functionality
• Provide a means or forum for sharing experiences/tools/test collections related to relevancy ranking
• Work with the data search community to explore realistic yet reliable ways for data repositories to carry out relevancy ranking comparison and evaluation tasks
Activity to Date: Preparation of a survey on relevancy ranking systems, to be sent to a large list of repositories
Metadata Enrichment
Leads: Beth Huffer, Ilya Zaslavsky
Members: TBD
Activity to Date: Two telecons since P9 to discuss scoping
Earlier Metadata enrichment leads: Bill Michener (dropped out) / Margaret Spyker (never responded?)
Outreach to Other RDA Groups
Prior to P9, emails were sent to these RDA groups inviting feedback on interim outputs from the first 3 task forces:
active-data-management-plans, data-versioning, domain-repositories, education-and-training-handling-research-data, health-data, libraries-research-data, metadata, national-data-services, pid, preservation-e-infrastructure, rdacodata-materials-data-infrastructure-interoperability, rdawds-certification-digital-repositories, rdawdsx-publishing-data, repository-platforms-research-data
Email "Research Data Alliance DDPIG Interim Outputs for review and comment" (sent March 23):
We wish to share with you the draft outputs created by three of the Task Force teams of the RDA Data Discovery Paradigms Interest Group. We think one or more of these outputs are relevant to the work your IG is doing. Your thoughts and feedback on the three interim documents will be greatly appreciated:
• Relevancy Ranking Task Force. See, in particular, items 2 and 3 under "Progress"
• Use Cases, Prototyping Tools and Test Collections Task Force. See, in particular, the Google Spreadsheet of captured use cases under "Deliverables"
• Best Practices for Making Data Findable Task Force. See, in particular, the three draft best practices documents under "Deliverables"
We will also be discussing these outputs at the 9th RDA Plenary in Barcelona, which will take place Tuesday, April 5, 16:00 - 17:30 https://www.rd-alliance.org/ig-data-discovery-paradigms-rda-9th-plenary-meeting. We hope you can join us there!
Result: 1 reply, an invitation to participate in the geospatial IG
Potentially Fruitful Collaborations
• Sharing of approaches for improving findability among domain repositories
• Contributing data discovery use cases to a common database of use cases
• Providing a testbed for experimentation with retrieval/ranking algorithms. We have offers and suggestions from:
  - US NDS
  - ANDS
  - Elsevier's AWS EC2
Notes:
Certainly NDS Labs is an option. Peter also suggested Elsevier can provide AWS EC2 instances for a relevancy testbed. The Elsevier team could probably clone the machines that they used during the recent bioCADDIE Challenge.
I will check with our ANDS service director to see if ANDS could provide a corpus of all the metadata records published to Research Data Australia, and ask other repositories in the survey if they could do the same.
This will be a good topic for the Plenary: there are testbeds, tech stacks and corpora available, and hopefully we will also have a list of relevancy activities from the survey. We can invite people to participate in activities such as tweaking ranking algorithms and parameters, building test collections (developing search topics + relevance assessments), and evaluation.
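A test collection of the kind described above typically pairs search topics with relevance judgments, against which a ranking run can be scored. The following is a minimal sketch, assuming binary judgments keyed by topic id. It uses precision@k and mean average precision, two standard TREC-style IR metrics, and is not a method chosen by the Task Force. All names (`Qrels`, `Run`, `precision_at_k`, `average_precision`, `mean_average_precision`) are hypothetical.

```python
"""Minimal sketch of TREC-style evaluation of ranked search runs against a test collection.

Assumptions (hypothetical, for illustration only): relevance judgments are binary,
and both judgments and rankings are keyed by search-topic id.
"""

Qrels = dict[str, set[str]]   # topic id -> ids of records judged relevant
Run = dict[str, list[str]]    # topic id -> record ids in ranked order


def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved records that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision at each rank holding a relevant record, divided by the number
    of relevant records (relevant records that are never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run: Run, qrels: Qrels) -> float:
    """Average AP over all judged topics; topics missing from the run score 0."""
    if not qrels:
        return 0.0
    return sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)
```

In a shared testbed, each participating ranking configuration would produce its own `Run`, and all runs would be scored against the same `Qrels`. This keeps comparisons between algorithms and parameter settings on an equal footing.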