Dataset Search 2018.10.11 王夏霞
Background
Thousands of data repositories on the web
Scientists and data journalists live and breathe data
Need: easy access to datasets
Aim: visualization
Dataset search engines and interactive search:
  Datahub, data.gov, … …
  Google Dataset Search (Google, 2018.09): “Google Scholar for Data”
  LODAtlas (project-team ILDA at Inria, CNRS and Université Paris-Sud)
“Google Scholar for data” https://toolbox.google.com/datasetsearch
Architecture
Diagram stages: provided by datasets’ providers; processing and enriching
Technologies
Backend:
1, Using structured metadata from data providers: open standards (schema.org, W3C DCAT, JSON-LD, etc.)
2, Connecting replicas of datasets
3, Reconciling to the Google Knowledge Graph
4, Linking to other Google resources
Frontend:
Search and ranking of results (Google Web ranking + additional signals)
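A minimal sketch of backend steps 1 and 2 above, assuming a provider describes each dataset with a schema.org `Dataset` record serialized as JSON-LD. The helper names (`dataset_jsonld`, `group_replicas`), the example URLs, and the rule "replicas share a `sameAs` target" are illustrative assumptions; the real pipeline is not described in these slides and likely uses further signals.

```python
import json


def dataset_jsonld(name, description, url, same_as=None, keywords=()):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }
    if same_as:
        # sameAs points at the canonical copy of this dataset.
        record["sameAs"] = same_as
    return record


def group_replicas(records):
    """Group record URLs by canonical URL (assumption: sameAs, else own url)."""
    groups = {}
    for record in records:
        canonical = record.get("sameAs", record["url"])
        groups.setdefault(canonical, []).append(record["url"])
    return groups


# Hypothetical provider pages: one original and one mirror.
original = dataset_jsonld(
    "Example rainfall data",
    "Hypothetical daily rainfall measurements.",
    "https://example.org/data/rainfall",
    keywords=["rainfall", "weather"],
)
mirror = dataset_jsonld(
    "Example rainfall data (mirror)",
    "Hypothetical daily rainfall measurements.",
    "https://mirror.example.net/rainfall",
    same_as="https://example.org/data/rainfall",
)

# This JSON is what a provider would embed in its page as JSON-LD.
print(json.dumps(original, indent=2))

replicas = group_replicas([original, mirror])
print(replicas)
```

Both URLs end up in one group keyed by the canonical page, which is the property a search engine needs in order to show replicas as a single result.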
Weaknesses
1, Relies on metadata and providers: the “not showing up” problem
2, Data citations are approximate: a good model is lacking
3, Ranking algorithms need to be improved
4, Lack of visualization, e.g. snippets
……
Another Dataset Search Engine: LODAtlas
http://lodatlas.lri.fr
Project-team ILDA at Inria, CNRS and Université Paris-Sud
Achievements
Two ways to browse datasets:
  keyword/URI search
  faceted navigation
Refining the results interactively:
  charts and timelines
Visual summaries of a dataset’s contents:
  dataset’s ID card
  interactive RDF summary visualization
  vocabularies
  analytics tab
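The two browsing modes above (keyword search and faceted navigation) can be sketched as follows. This is only an illustration of the general technique, not LODAtlas code: the catalog entries, the field names (`format`, `license`, `year`), and the helpers `search` and `facet_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical catalog entries; field names are assumptions for illustration.
DATASETS = [
    {"id": "ds1", "title": "City air quality", "format": "RDF",
     "license": "CC-BY", "year": 2016},
    {"id": "ds2", "title": "Air traffic statistics", "format": "CSV",
     "license": "CC0", "year": 2017},
    {"id": "ds3", "title": "River water quality", "format": "RDF",
     "license": "CC0", "year": 2017},
]


def search(datasets, keyword="", **facets):
    """Keep datasets whose title contains the keyword and match every facet."""
    keyword = keyword.lower()
    return [
        d for d in datasets
        if keyword in d["title"].lower()
        and all(d.get(field) == value for field, value in facets.items())
    ]


def facet_counts(datasets, facet):
    """Count how many datasets fall under each value of one facet."""
    return Counter(d[facet] for d in datasets)


# Keyword search first, then refine the hits with facet values.
hits = search(DATASETS, "quality")
refined = search(DATASETS, "quality", format="RDF", year=2017)

# Facet counts over the current hits are what charts or timelines would plot.
counts = facet_counts(hits, "license")

print([d["id"] for d in hits])
print([d["id"] for d in refined])
print(dict(counts))
```

Recomputing the facet counts on the current result set after each refinement is what lets the interface show how many datasets remain under each facet value.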
Examples of Use
1, Performing advanced searches: combination of metadata & contents
2, Monitoring datasets, e.g. DBpedia
3, Spotting noteworthy events
4, Comparing & contrasting the contents: RDFQuotients-based visual summaries
Architecture
Diagram components: Frontend; Backend; Database; Metadata; Extract classes…; Summaries; Java App
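As a rough illustration of the "Extract classes…" component, the sketch below counts class instances and property usage in a small set of RDF triples. The slides only say that this processing is done by a Java app; the Python code, the `summarize` helper, and the sample triples (including the `ex:` identifiers) are hypothetical and stand in for the general idea of deriving content statistics from a dataset.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:acme", RDF_TYPE, "foaf:Organization"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]


def summarize(triples):
    """Count instances per class and uses per non-type property."""
    classes = Counter(o for _, p, o in triples if p == RDF_TYPE)
    properties = Counter(p for _, p, _ in triples if p != RDF_TYPE)
    return {"classes": dict(classes), "properties": dict(properties)}


summary = summarize(TRIPLES)
print(summary)
```

Statistics of this kind could be stored in the database next to the catalog metadata, so that searches can combine both.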
Workflow
Weaknesses
1, Each submission is manually checked prior to inclusion
2, Some entries might be missing information
……
Future Work
1, Considering additional catalogs
2, Showing partial views of vocabulary definitions based on solutions
……
References
Google Dataset Search:
https://ai.googleblog.com/2018/09/building-google-dataset-search-and.html
https://www.blog.google/products/search/making-it-easier-discover-datasets/
LODAtlas:
https://link.springer.com/content/pdf/10.1007%2F978-3-030-00668-6_9.pdf