Journal of Web Semantics 55 (2019)


1 Characterising Dataset Search – an Analysis of Search Logs and Data Requests
Journal of Web Semantics 55 (2019)

2 Introduction
Background of Dataset Search
Metadata has to be generated property by property, which represents a cost for data publishers.
Current open data portal solutions base their metadata search on indexing free-text descriptions of datasets and applying document modelling and search techniques.
Goal
To advance the understanding of which properties of a dataset description matter most to data consumers, by analysing how people search for data on current portals.
Reduces the time and effort
Advanced search functionalities

3 Contribution
A systematic study of the patterns and specific attributes that data consumers use to search for data, and of how this compares with general web search.

4 Related Work
Web Search
General Web Search
Vertical Search (e.g. search, people search in Facebook)
Dataset Search
Relatively unexplored area compared to document search.
Many portals are based on CKAN -> Solr -> Lucene -> TF-IDF, but in structured documents the main topic or the key concepts might be mentioned only once.
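The point about TF-IDF and structured documents can be illustrated with a small sketch. It uses a simplified textbook TF-IDF (term frequency times log inverse document frequency), which is an assumption for illustration and not the scoring formula that Lucene or Solr actually apply. The three dataset descriptions are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical dataset descriptions (not taken from the paper).
# The first one is "about" air quality, but names its topic only once,
# while column-like words such as "value" and "unit" are repeated.
descriptions = [
    "air quality readings station id value unit value unit value unit",
    "bus timetable route id stop id stop name",
    "school list name address postcode",
]

weights = tf_idf(descriptions)
first = weights[0]

# Rank the terms of the first description by weight, highest first.
ranking = sorted(first, key=first.get, reverse=True)
print(ranking)
```

In this toy corpus the repeated structural terms outrank "air" and "quality", which is the weakness the slide attributes to applying document-oriented ranking to structured data.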

5 Related Work
Metrics for general web search query log analysis:
Query Length and Distribution
Query Types Classification
User and Session Statistics
Query Structure
Topics
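As a rough illustration of the first metric, query length and its distribution can be computed from a list of logged query strings. This is a minimal sketch that assumes length is measured in whitespace-separated words; the function name and the sample queries are hypothetical and not taken from the paper's logs.

```python
from collections import Counter
from statistics import mean, median

def query_length_stats(queries):
    """Summarise query length, counted in whitespace-separated words."""
    # Skip empty or whitespace-only entries, which carry no query terms.
    lengths = [len(q.split()) for q in queries if q.strip()]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        # Maps a length (in words) to the number of queries of that length.
        "distribution": dict(sorted(Counter(lengths).items())),
    }

# Hypothetical log entries for illustration only.
sample_queries = [
    "population",
    "crime statistics",
    "house prices london 2015",
    "   ",
]

stats = query_length_stats(sample_queries)
print(stats)
```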

6 Methodology
Three types of data were used in the experiments:
Internal search logs: queries entered directly into the search box of a data portal.
External search logs: queries issued through a general web search engine that led to a page of the data portal.
Data requests: descriptions of information needs submitted by users of a data portal in order to obtain a specific dataset that they usually could not find.

7 Findings - Users
Location
Devices: an overwhelming majority are desktop computers (85% on average).
Time of access: users are mostly active on weekdays; activity at weekends is approximately a half to a third of weekday activity.
Channels: the majority of users reach the portals through the result page of a web search engine.

8 Findings - Users
Browsers: the share of IE users is almost 10% higher than in general web browser usage.
New and returning users: returning users view more pages on average and engage in longer sessions.
Search exits and refinements: much higher than those of another government website.

9 Findings – Internal & External Queries
Query length
Internal queries
External queries

10 Findings – Internal & External Queries
Query types

11 Findings – Internal & External Queries
Query topics

12 Findings – Data Requests
Data Attributes
Geospatial information (77.5% of requests)
Temporal information (44% of requests)
Restriction (26.5% of requests)
Granularity (24.5% of requests)

13 Findings – Data Requests
Request Context
Representation and structure
Expected outcome
Rationale
Quality

14 Conclusion
Dataset queries are generally short.
Dataset search seems to occur mostly in a work-related environment.
Dataset queries issued directly to data portals differ in topic, length and structure from dataset queries issued to web search engines.
Data requests describe the data through boundaries and restrictions on location, time period, specific data type and/or specific granularity.
The highest-priority properties for describing datasets are temporal and geospatial coverage, at varying levels of granularity.
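To make the last point concrete, the sketch below shows one way a dataset description could carry temporal and geospatial coverage with a granularity for each, and how a search could filter on them. The class, its field names and the sample catalogue are hypothetical and are not a metadata schema proposed in the paper.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetDescription:
    # All field names are assumptions made for this illustration.
    title: str
    spatial_coverage: str      # place the data covers, e.g. "London"
    spatial_granularity: str   # e.g. "ward", "city", "country"
    start: date                # first day covered by the data
    end: date                  # last day covered by the data
    temporal_granularity: str  # e.g. "daily", "monthly", "yearly"

def matches(ds, place, year):
    """True if the dataset covers the place and any part of the given year."""
    covers_place = ds.spatial_coverage.lower() == place.lower()
    covers_year = ds.start.year <= year <= ds.end.year
    return covers_place and covers_year

# Hypothetical catalogue entries for illustration only.
catalogue = [
    DatasetDescription("House prices", "London", "ward",
                       date(2010, 1, 1), date(2016, 12, 31), "monthly"),
    DatasetDescription("Bus punctuality", "Leeds", "city",
                       date(2014, 1, 1), date(2015, 12, 31), "yearly"),
]

results = [ds.title for ds in catalogue if matches(ds, "london", 2015)]
print(results)
```

Exact string matching on place names is a deliberate simplification; a real portal would more likely compare geographic areas or identifiers.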
