Wolf Siberski: What do you mean? – Determining the Intent of Keyword Queries on Structured Data

Slide 2: Overview
■ Motivation
■ Approaches in keyword search on structured data
■ QUICK – Query Intent Construction for Keywords
  ■ User interaction
  ■ Algorithm
  ■ Evaluation
■ Conclusion

Slide 3: The Information Search Process
■ What is my search objective? What exactly do I want to know?
■ How do I express my search request?
■ Which result satisfies my information need?
(Sutcliffe/Ennis: Towards a cognitive theory of information retrieval)

Slide 4: IMDB Example – Keyword search
Information need: In which movies did Brad Pitt and Angelina Jolie both act? Have they been working together?
Keyword query entered in the IMDb search box: Brad Pitt Angelina Jolie

Slide 5: IMDB Example – Database search
Information need: In which movies did Brad Pitt and Angelina Jolie both act? Are they working together, too?
    SELECT M.Title, M.Year
    FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
    WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
      AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
      AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
Result:
    M.Title                       M.Year
    101 Biggest Celebrity Oops    2004
    Mr. & Mrs. Smith              2005
    Stars on Trial                2005
    The 72nd Academy Awards       2000
    …
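To make the contrast with the keyword query concrete, the SQL above can be run against a tiny in-memory SQLite database. The schema follows the table and column names used in the query; the rows are hypothetical toy data (one movie with both actors, one with only Brad Pitt), not the real IMDB content.

```python
import sqlite3

# In-memory database with the three tables the query refers to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie  (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Actor  (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);
-- Hypothetical toy rows, for illustration only.
INSERT INTO Actor  VALUES (1, 'Brad Pitt'), (2, 'Angelina Jolie');
INSERT INTO Movie  VALUES (10, 'Mr. & Mrs. Smith', 2005), (11, 'Hypothetical Pitt-only Movie', 2000);
INSERT INTO ActsIn VALUES (1, 10), (2, 10), (1, 11);
""")

# The query from the slide: the Actor and ActsIn tables are each joined twice,
# once per actor, and both ActsIn rows must point at the same movie.
QUERY = """
SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Mr. & Mrs. Smith', 2005)]
```

The self-join is exactly the part a keyword user cannot express: "both acted in" becomes two join paths that meet at one movie.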

Slide 6: Context
■ Trend: general information captured as structured data (DBpedia, Linked Data, etc.)
■ Limited support for complex information needs
  ■ Keywords: limited expressivity, but user-friendly
  ■ Structured queries: high expressivity, but difficult to master
→ New ways to access this data are required

Slide 7: IR on Structured Data (Incomplete)
■ Not a new idea (Universal Relation, 1984)
1. Relevance notion for structured data
  ■ Extract data subgraphs (tuple joins) matching the query
  ■ Rank results according to a relevance score
  ■ BANKS, DISCOVER, SPARK, EASE, etc.
  ■ Can serve the 'head' of the user distribution, but not the long tail
  ■ Low quality of relevance judgements [Coffman/Weaver, CIKM10]
2. Form builder
  ■ Enable visual construction of user-defined query forms
  ■ Requires exploration of the database schema
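The relevance-based approach (item 1) can be sketched as: enumerate joined tuple sets, keep those that contain every keyword, and rank them. The scoring below (fewer joined tuples rank higher) is a simplified heuristic in the spirit of these systems, not the actual formula of BANKS, DISCOVER, SPARK or EASE; the candidate results are hypothetical.

```python
def rank_joined_tuples(candidates: list[list[str]], keywords: list[str]) -> list[list[str]]:
    """candidates: joined-tuple results, each given as the text of its tuples.
    Keeps results containing all keywords (AND semantics) and ranks smaller
    joins first, as a stand-in for a real relevance score."""
    kws = [k.lower() for k in keywords]
    scored = []
    for joined in candidates:
        text = " ".join(joined).lower()
        if all(k in text for k in kws):
            scored.append((1.0 / len(joined), joined))  # fewer tuples -> higher score
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [joined for _, joined in scored]

# Hypothetical candidates: an actor-movie-actor join, a join missing one actor,
# and a single tuple that happens to mention both names.
c1 = ["Brad Pitt", "Mr. & Mrs. Smith", "Angelina Jolie"]
c2 = ["Brad Pitt", "Some Other Movie"]
c3 = ["Documentary about Brad Pitt and Angelina Jolie"]
print(rank_joined_tuples([c1, c2, c3], ["Pitt", "Jolie"]))
```

The single tuple that merely mentions both names outranks the actual co-starring join, which illustrates why such relevance heuristics can misjudge the user's intent.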

Slide 8: QUICK – Keyword Search on Databases
■ User starts with a keyword search
■ QUICK guides the user through the query construction process
■ Combines
  ■ the ease of use of keyword search
  ■ the expressivity of database queries
G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier,

Slide 9: QUICK Search Process
■ The user enters keywords, e.g. "Brad Pitt Angelina Jolie"
■ QUICK computes the possible query intentions and the selection options, e.g. Is "Brad" part of a movie title? Is "Brad" part of an actor name? …
■ The user selects the intended interpretation, e.g. "Brad" is part of an actor name; QUICK computes the refined interpretation and new selection options
■ The user selects the intended query, e.g. "Find movies where both Brad Pitt and Angelina Jolie are actors"
■ QUICK computes the results and the user evaluates them:
    M.Title              M.Year
    101 Biggest Ce…      2004
    Mr. & Mrs. Smith     2005
    Stars on Trial       2005
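The selection step can be sketched as filtering a set of candidate interpretations. Here an interpretation is assumed to map each keyword to the attribute it matches; the names `selection_options` and `refine` and the candidates are hypothetical, not QUICK's actual data structures.

```python
from collections import Counter

# Hypothetical representation: keyword -> schema attribute it is assumed to match.
Interpretation = dict[str, str]

def selection_options(candidates: list[Interpretation]) -> list[tuple[str, str]]:
    """(keyword, attribute) choices that hold for some but not all candidates,
    i.e. the questions whose answer actually narrows the candidate set."""
    counts = Counter(pair for c in candidates for pair in c.items())
    return sorted(pair for pair, n in counts.items() if n < len(candidates))

def refine(candidates: list[Interpretation], choice: tuple[str, str]) -> list[Interpretation]:
    """Keep only the interpretations consistent with the user's selection."""
    keyword, attribute = choice
    return [c for c in candidates if c.get(keyword) == attribute]

candidates = [
    {"Brad": "Actor.name", "Jolie": "Actor.name"},
    {"Brad": "Movie.title", "Jolie": "Actor.name"},
    {"Brad": "Movie.title", "Jolie": "Movie.title"},
]
print(selection_options(candidates))
remaining = refine(candidates, ("Brad", "Actor.name"))  # "Brad" is part of an actor name
print(remaining)  # [{'Brad': 'Actor.name', 'Jolie': 'Actor.name'}]
```

Options that hold for every remaining candidate are omitted, since answering them would not change anything.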

Slide 10: QUICK – Concepts
■ RDF Schema
■ Query Template
  ■ Query pattern on the schema
  ■ Contains only free variables
■ Semantic Query
  ■ Interpretation of a keyword query
  ■ Produced from a query template by binding keywords
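The template/semantic-query distinction can be sketched as follows: a template is a set of triple patterns with free variables, and binding keywords to some of those variables yields one semantic query. The `imdb:` predicates, the class names and the SPARQL rendering (prefix declarations omitted) are illustrative assumptions, not QUICK's internal representation.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]

@dataclass(frozen=True)
class QueryTemplate:
    """Query pattern on the schema: triple patterns whose nodes are free variables."""
    patterns: tuple[Triple, ...]

    def variables(self) -> set[str]:
        return {t for triple in self.patterns for t in triple if t.startswith("?")}

@dataclass(frozen=True)
class SemanticQuery:
    """One interpretation of a keyword query: a template plus keyword bindings."""
    template: QueryTemplate
    bindings: tuple[tuple[str, str], ...]  # (variable, keyword) pairs

    def to_sparql(self) -> str:
        lines = [f"{s} {p} {o} ." for s, p, o in self.template.patterns]
        lines += [f'FILTER(CONTAINS(LCASE({v}), "{k.lower()}"))' for v, k in self.bindings]
        return "SELECT * WHERE {\n  " + "\n  ".join(lines) + "\n}"

def bind(template: QueryTemplate, bindings: dict[str, str]) -> SemanticQuery:
    """Produce a semantic query by binding keywords to free variables of the template."""
    unknown = set(bindings) - template.variables()
    if unknown:
        raise ValueError(f"not variables of the template: {sorted(unknown)}")
    return SemanticQuery(template, tuple(sorted(bindings.items())))

# Hypothetical template: two actors who act in the same movie.
co_actors = QueryTemplate((
    ("?a1", "imdb:actsIn", "?m"), ("?a1", "imdb:name", "?n1"),
    ("?a2", "imdb:actsIn", "?m"), ("?a2", "imdb:name", "?n2"),
    ("?m", "imdb:title", "?t"),
))
query = bind(co_actors, {"?n1": "Brad Pitt", "?n2": "Angelina Jolie"})
print(query.to_sparql())
```

The same template bound differently (for example "Brad" to `?t`) gives a different semantic query, which is why one keyword query has many possible interpretations.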

Slide 11: Query Guide
■ Query Hierarchy
  ■ Semantic queries ordered by the sub-query relationship
■ Query Guide
  ■ Graph including paths to all possible queries
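A minimal sketch of the query hierarchy, assuming each semantic query is simplified to a set of atomic constraints and that "p is a sub-query of q" means p's constraints are a proper subset of q's. QUICK defines the relationship on graph patterns, so this is only an approximation; the constraint strings are hypothetical.

```python
def direct_refinements(queries: list[frozenset[str]]) -> list[tuple[frozenset[str], frozenset[str]]]:
    """Edges (p, q) of the hierarchy: p is a proper sub-query of q and no
    other query lies strictly between them (the covering relation)."""
    edges = []
    for q in queries:
        for p in queries:
            if p < q and not any(p < r < q for r in queries):
                edges.append((p, q))
    return edges

a = frozenset({"Actor.name~brad"})
b = a | {"Actor.name~jolie"}
c = b | {"Movie.title~smith"}
for parent, child in direct_refinements([a, b, c]):
    print(sorted(parent), "->", sorted(child))
```

Only direct edges are kept, so a and c are connected through b rather than directly; paths in this graph are what a query guide can walk along.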

Slide 12: QUICK Example: Construction Options

Slide 13: QUICK Example: Query List

Slide 14: QUICK Example: Results

Slide 15: Query Guide Construction – Offline Stage
■ Generate all query templates
  ■ Start with one-variable queries
  ■ Produce all possible combinations
  ■ Repeat until the maximum join path length is reached
■ Build an inverted index
  ■ Terms → attributes
  ■ Enables fast keyword-to-query mapping at runtime
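Both offline steps can be sketched in a few lines. The schema graph is hypothetical, and templates are simplified to linear join paths over classes (real templates can be trees, and (Actor, Movie) and (Movie, Actor) describe the same join here, so a real implementation would deduplicate).

```python
from collections import defaultdict

# Hypothetical class-level schema graph: each class lists the classes it joins with.
SCHEMA = {
    "Actor": ["Movie"],
    "Movie": ["Actor", "Director"],
    "Director": ["Movie"],
}

def generate_templates(schema: dict[str, list[str]], max_joins: int) -> list[tuple[str, ...]]:
    """Start with one-variable templates and extend each by one join per round."""
    templates = [(cls,) for cls in schema]
    frontier = list(templates)
    for _ in range(max_joins):
        frontier = [path + (nb,) for path in frontier for nb in schema[path[-1]]]
        templates.extend(frontier)
    return templates

def build_inverted_index(values: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each lower-cased term to the attributes whose values contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for attribute, value in values:
        for term in value.lower().split():
            index[term].add(attribute)
    return dict(index)

templates = generate_templates(SCHEMA, max_joins=2)
index = build_inverted_index([
    ("Actor.name", "Brad Pitt"),
    ("Actor.name", "Angelina Jolie"),
    ("Movie.title", "Mr. & Mrs. Smith"),
])
print(len(templates))  # 13
print(index["brad"])   # {'Actor.name'}
```

At query time, looking up each keyword in the index yields the attributes it can bind to, which restricts the templates that need to be considered.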

Slide 16: Query Guide Construction – Online Stage
■ Identify possible queries (leaves of the query guide)
■ Extract the partial query graph from the template graph
■ Problem: the query space can be very large → find a minimal query guide
  ■ Cost function: # of steps + # of inspected suggestions
  ■ Minimal guide: smallest maximum cost
  ■ Depth/width tradeoff: too flat vs. too deep; optimum: ln(n) split
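The depth/width tradeoff can be made concrete by evaluating the cost function on simple guides. This sketch assumes the worst case in which the user inspects every option shown at each step, so each level costs one step plus the number of options; `uniform_guide` is a hypothetical helper that builds a complete guide.

```python
def worst_case_cost(node) -> int:
    """A node is a leaf query (str) or a list of child options. Returns the
    maximum over all leaves of (#steps + #inspected suggestions) on the path."""
    if isinstance(node, str):
        return 0
    return 1 + len(node) + max(worst_case_cost(child) for child in node)

def uniform_guide(branching: int, depth: int, prefix: str = "q"):
    """Hypothetical helper: complete guide with branching**depth leaf queries."""
    if depth == 0:
        return prefix
    return [uniform_guide(branching, depth - 1, f"{prefix}.{i}") for i in range(branching)]

# Three guides over the same 16 candidate queries.
for k, d in [(16, 1), (2, 4), (4, 2)]:
    print(f"branching={k:2d} depth={d}: worst-case cost {worst_case_cost(uniform_guide(k, d))}")
```

With 16 candidate queries the flat guide costs 17, the binary guide 12 and the 4-way guide 10: a single long list is too flat, while many yes/no steps add up, so an intermediate split wins.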

Slide 17: Greedy Query Guide Construction
■ Finding the minimal guide is NP-hard
■ Use an approach similar to the set cover approximation
  ■ Determine nodes (= refinement options) top-down
  ■ Greedily select the node leading to the lowest cost
    – Cost estimation: minimally incurred cost
  ■ Repeat until all nodes are covered
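The set cover analogy can be sketched with the classic greedy approximation: at one guide node, repeatedly pick the refinement option that reaches the most still-uncovered candidate queries. Using "most newly covered queries" as the selection criterion is an assumption standing in for the slide's estimated minimal cost; the option and query names are hypothetical.

```python
def greedy_refinement_options(queries: set[str], options: dict[str, set[str]]) -> list[str]:
    """Pick options until every candidate query is reachable, each time taking
    the option that covers the most still-uncovered queries (greedy set cover)."""
    uncovered = set(queries)
    chosen = []
    while uncovered:
        best = max(options, key=lambda o: len(options[o] & uncovered))
        gain = options[best] & uncovered
        if not gain:
            raise ValueError(f"queries not reachable by any option: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen

queries = {"q1", "q2", "q3", "q4", "q5", "q6"}
options = {  # hypothetical refinement options and the queries each one leads to
    "A": {"q1", "q2", "q3", "q4"},
    "B": {"q3", "q4", "q5"},
    "C": {"q5", "q6"},
    "D": {"q6"},
}
print(greedy_refinement_options(queries, options))  # ['A', 'C']
```

Greedy set cover is only a logarithmic-factor approximation, which matches the slide's motivation: the exact minimal guide is NP-hard to find.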

Slide 18: Evaluation – Experiment Settings
■ IMDB database
  ■ Semantic Web representation
■ Queries from the AOL query log
  ■ Selection criteria
    – Movie-related
    – 2–5 keywords
    – Refers to at least 2 entities
  ■ Manual assessment of query intention
■ Search process
  ■ Manual input of keywords
  ■ Selection of the correct option according to the query intention

Slide 19: Evaluation – Guide Quality
■ The intended construction option is usually among the top 3
■ Usually 3–5 clicks are needed to construct a query
■ Effective also for large query spaces

Slide 20: Conclusion
■ Query construction with QUICK
  ■ Highly effective construction process
  ■ All intentions can be constructed
  ■ No query language or schema knowledge required
■ Further directions
  ■ Combine with relevance heuristics (IQ^P)
  ■ More flexible user interaction
    – Use facets for keyword bindings
    – Better multi-term support
  ■ Optimized query guide generation
    – Exploit entity notion (QUnits)
    – Progressive query guide creation
  ■ Connect to QbE/query form creation

Slide 21: Evaluation – Performance
    No. of terms | Initialization time (ms) | Response time (ms)
    …            | …,797                    | 1,035
    >4           | 31,838                   | 3,290
    All          | 3,…                      | …
■ Initialization takes too much time for long queries
  ■ RDF store as bottleneck (creation of query hierarchy)
■ After initialization, response time is acceptable

Slide 22: Optimizations
■ Identification of semantic queries
  ■ Index template subsets by attribute to enable fast filtering of queries without results
  ■ Enable fast disjunction of template subsets (e.g., 'and' on bitsets)
■ QCG generation
  ■ Parallel subquery computation
  ■ Caching of frequent subqueries
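Indexing template subsets by attribute as bitsets could look like the sketch below: bit t of an attribute's bitset is set if template t contains that attribute. How the slide combines subsets is assumed here: OR over the attributes a single keyword may match, then AND across keywords, so only templates able to bind every keyword survive. Function names and data are hypothetical.

```python
from collections import defaultdict

def build_attribute_bitsets(template_attrs: list[set[str]]) -> dict[str, int]:
    """Bit t of bitsets[attr] is set iff template t contains attribute attr."""
    bitsets: dict[str, int] = defaultdict(int)
    for tid, attrs in enumerate(template_attrs):
        for attr in attrs:
            bitsets[attr] |= 1 << tid
    return dict(bitsets)

def candidate_templates(bitsets: dict[str, int], keyword_attrs: list[set[str]], n_templates: int) -> list[int]:
    """Templates that can bind every keyword: OR over the attributes one
    keyword may match, then AND across all keywords."""
    result = (1 << n_templates) - 1  # start with all templates
    for attrs in keyword_attrs:
        matches = 0
        for attr in attrs:
            matches |= bitsets.get(attr, 0)
        result &= matches
    return [t for t in range(n_templates) if result >> t & 1]

templates = [{"Actor.name"}, {"Movie.title"}, {"Actor.name", "Movie.title"}]
bitsets = build_attribute_bitsets(templates)
# "brad" matches only Actor.name, "smith" only Movie.title (e.g. from the inverted index).
print(candidate_templates(bitsets, [{"Actor.name"}, {"Movie.title"}], len(templates)))  # [2]
```

Each filter step is a single integer operation per attribute, which is what makes bitsets attractive when there are many templates.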

Slide 23: Misc Ideas
■ Use Google's KDD annotated Named Entity Recognition test set (Piggyback,

Slide 24: Cross Connections
■ Thomas Gottron: traditional features (e.g. TF) are not useful for very short texts
■ Hinrich Schütze: entity-related queries are often ambiguous
■ Michael Granitzer: cycle of refinement/exploration
■ Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster