Journal of Web Semantics 55 (2019)

Characterising Dataset Search – an Analysis of Search Logs and Data Requests

Introduction
Background of dataset search: Metadata has to be generated on a property-by-property basis, which represents a cost for data publishers. Current open data portal solutions base their metadata search on indexing free-text descriptions of datasets and applying document modelling and search techniques.
Goal: To advance towards an understanding of the most important properties of a dataset description from the point of view of data consumers, by analysing how people search for data on current portals.
Expected benefits: reduced time and effort; advanced search functionalities.

Contribution: A systematic study of the patterns and specific attributes that data consumers use to search for data, and of how dataset search compares with general web search.

Related Work
Web search: general web search; vertical search (e.g. e-mail search, people search in Facebook).
Dataset search: a relatively unexplored area compared to document search. Many portals are based on CKAN -> Solr -> Lucene -> TF-IDF, but in structured documents the main topic or the key concepts might be mentioned only once.
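The TF-IDF concern can be illustrated with a minimal sketch. It uses plain textbook TF-IDF, not Lucene's actual scoring formula, and the three dataset descriptions are hypothetical examples, not data from the study. It shows that a key concept mentioned once ("unemployment") receives exactly the same weight as an incidental word mentioned once ("by"), so term frequency alone does not single out the main topic.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical dataset descriptions (assumption, for illustration only).
docs = [
    "unemployment figures by region table csv file csv download",
    "road traffic counts csv file csv download",
    "school census table csv file csv download",
]
weights = tf_idf(docs)

# The key concept and an incidental word both occur once, in one document.
print(weights[0]["unemployment"], weights[0]["by"])
# A format word that occurs in every description gets weight zero.
print(weights[0]["csv"])
```

In this toy collection the inverse document frequency suppresses boilerplate such as "csv", but it cannot tell the topical term apart from other rare words, which is the limitation the slide points to for structured descriptions.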

Related Work
Metrics for general web search query log analysis: query length and distribution; query type classification; user and session statistics; query structure; topics.
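As a rough illustration of the first metric, the sketch below computes the mean query length and the length distribution, measured in whitespace-separated words. The function name `query_length_stats` and the sample queries are hypothetical; the study's own tooling is not described in these slides.

```python
from collections import Counter
from statistics import mean


def query_length_stats(queries):
    """Compute mean length and length distribution (in words) of a query log."""
    # Skip empty or whitespace-only entries, which carry no query terms.
    lengths = [len(q.split()) for q in queries if q.strip()]
    return {
        "mean_length": mean(lengths),
        "distribution": dict(sorted(Counter(lengths).items())),
    }


# Hypothetical queries (assumption, not taken from the analysed logs).
queries = ["population", "road traffic", "house prices 2015", "crime", "  "]
stats = query_length_stats(queries)
print(stats)
```

The same per-query lengths can be computed separately for internal and external logs to compare the two distributions.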

Methodology
Three types of data were used in the experiments:
Internal search logs: queries issued directly to the internal search capability of a data portal, through its search box.
External search logs: queries issued through a general web search engine that led to a page of the data portal.
Data requests: representations of information needs submitted by users of a data portal in order to get a specific dataset that they usually could not find.

Findings - Users
Location
Devices: an overwhelming majority are desktop computers (85% on average).
Time of access: users are mostly active on weekdays; weekend activity is approximately a half to a third of weekday activity.
Channels: the majority of users reach portals through the result page of a web search engine.

Findings - Users
Browsers: the share of IE users is almost 10% higher than in general web browser usage.
New and returning users: returning users view more pages on average and engage in longer sessions.
Search exits and refinements: much higher than those of another government website.

Findings – Internal & External Queries Query length Internal queries External queries

Findings – Internal & External Queries Query types

Findings – Internal & External Queries Query topics

Findings – Data Requests
Data attributes (share of data requests mentioning each):
Geospatial information: 77.5%
Temporal information: 44%
Restriction: 26.5%
Granularity: 24.5%
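The sketch below shows how per-attribute shares like those above can be computed once each request has been tagged. The slides do not describe how the requests were annotated, so the keyword rules in `RULES`, the function `attribute_shares`, and the sample requests are all hypothetical and deliberately naive; they stand in for whatever annotation procedure was actually used.

```python
import re

# Hypothetical, deliberately naive detection rules (assumption).
RULES = {
    "temporal": re.compile(
        r"\b(19|20)\d{2}\b|\b(annual|monthly|weekly|daily|month|year)\b", re.I
    ),
    "geospatial": re.compile(r"\b(postcode|county|city|region|ward)\b", re.I),
    "granularity": re.compile(
        r"\bby (year|month|postcode|ward|region)\b|\bper (year|month|ward)\b", re.I
    ),
}


def attribute_shares(requests):
    """Return the percentage of requests that mention each attribute."""
    counts = {name: 0 for name in RULES}
    for text in requests:
        for name, pattern in RULES.items():
            if pattern.search(text):
                counts[name] += 1
    return {name: 100 * c / len(requests) for name, c in counts.items()}


# Hypothetical data requests (assumption, not from the study).
requests = [
    "Number of schools by ward in 2014",
    "Monthly road traffic counts for each county",
    "List of licensed taxi operators",
    "Crime figures per month for the city centre",
]
shares = attribute_shares(requests)
print(shares)
```

Because a single request can mention several attributes, the shares are not expected to sum to 100%, which matches the figures reported on the slide.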

Findings – Data Requests
Request context: representation and structure; expected outcome; rationale; quality.

Conclusion
Dataset queries are generally short.
Dataset search seems to occur mostly in a work-related environment.
There is a difference in topics, length and structure between dataset queries issued directly to data portals and dataset queries issued to web search engines.
Data requests describe the data by using boundaries and restrictions on location, temporality, specific data type and/or specific granularity.
The highest-priority properties for describing datasets are temporal and geospatial coverage, with varying levels of granularity.