Published by Maud Sutton. Modified over 9 years ago.
1
What do you mean? – Determining the Intent of Keyword Queries on Structured Data
Wolf Siberski
2
Overview
■ Motivation
■ Approaches in keyword search on structured data
■ QUICK – Query Intent Construction for Keywords
■ User interaction
■ Algorithm
■ Evaluation
■ Conclusion
3
The Information Search Process
■ What is my search objective?
■ What exactly do I want to know?
■ How do I express my search request?
■ Which result satisfies my information need?
Source: Sutcliffe/Ennis: Towards a cognitive theory of information retrieval
4
IMDB Example – Keyword search
■ In which movies did they both act?
■ Have they been working together?
[Figure: Brad Pitt and Angelina Jolie; keyword query "Brad Pitt Angelina Jolie" entered on IMDb]
5
IMDB Example – Database search
■ In which movies did they both act?
■ Are they working together, too?

    SELECT M.Title, M.Year
    FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
    WHERE A1.Name = 'Brad Pitt'
      AND A2.Name = 'Angelina Jolie'
      AND R1.ActorId = A1.Id
      AND R2.ActorId = A2.Id
      AND R1.MovieId = R2.MovieId
      AND M.Id = R1.MovieId

M.Title                      M.Year
101 Biggest Celebrity Oops   2004
Mr. & Mrs. Smith             2005
Stars on Trial               2005
The 72nd Academy Awards      2000
…
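The query on this slide can be tried out on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module. The table layout is inferred from the column names used in the query, and the three sample movies are illustrative rows, not the IMDB data set behind the slide.

```python
import sqlite3

# In-memory toy database; schema inferred from the query on the slide (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie  (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
    CREATE TABLE Actor  (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);

    -- Illustrative sample rows only.
    INSERT INTO Movie VALUES (1, 'Mr. & Mrs. Smith', 2005),
                             (2, 'Fight Club', 1999),
                             (3, 'Salt', 2010);
    INSERT INTO Actor VALUES (1, 'Brad Pitt'), (2, 'Angelina Jolie');
    INSERT INTO ActsIn VALUES (1, 1), (2, 1), (1, 2), (2, 3);
""")

# The structured query from the slide: movies in which both actors appear.
QUERY = """
    SELECT M.Title, M.Year
    FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
    WHERE A1.Name = 'Brad Pitt'
      AND A2.Name = 'Angelina Jolie'
      AND R1.ActorId = A1.Id
      AND R2.ActorId = A2.Id
      AND R1.MovieId = R2.MovieId
      AND M.Id = R1.MovieId
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Mr. & Mrs. Smith', 2005)]
```

Only the movie linked to both actors through ActsIn is returned. This is the self-join that a plain keyword query cannot express.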
6
Context
■ Trend: general information captured as structured data (DBpedia, LinkedData, etc.)
■ Limited support for complex information needs
■ Keywords: limited expressivity, but user-friendly
■ Structured queries: high expressivity, but difficult to master
New ways to access this data are required
7
IR on Structured Data (Incomplete)
■ Not a new idea (Universal Relation, 1984)
1. Relevance notion for structured data
  ■ Extract data subgraphs (tuple joins) matching the query
  ■ Rank results according to relevance score
  ■ BANKS, DISCOVER, SPARK, EASE, etc.
  ■ Can serve the 'head' of the user distribution, but not the long tail
  ■ Low quality of relevance judgements [Coffman/Weaver, CIKM10]
2. Form builder
  ■ Enable visual construction of user-defined query forms
  ■ Requires exploration of database schema
8
QUICK – Keyword Search on Databases
■ User starts with keyword search
■ QUICK guides user through query construction process
■ Combines
  ■ Ease-of-use of keyword search
  ■ Expressivity of database queries
G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009. http://dx.doi.org/10.1016/j.websem.2009.07.005
9
QUICK Search Process
[Figure: interaction loop between user and QUICK]
■ User: enters keywords (example: Brad Pitt Angelina Jolie)
■ QUICK: computes possible query intentions and selection options (example: Is "Brad" part of a movie title? Is "Brad" part of an actor name? …)
■ User: selects intended interpretation (example: "Brad" is part of an actor name); QUICK returns the refined interpretation with new selection options
■ User: selects intended query (example: Find movies where both Brad Pitt and Angelina Jolie are actors)
■ QUICK: computes results
■ User: evaluates results

M.Title             M.Year
101 Biggest Ce…     2004
Mr. & Mrs. Smith    2005
Stars on Trial      2005
10
QUICK – Concepts
■ RDF Schema
■ Query Template
  ■ Query pattern on the schema
  ■ Contains only free variables
■ Semantic Query
  ■ Interpretation of a keyword query
  ■ Produced from query template by binding keywords
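To make the step from query template to semantic query concrete, here is a minimal sketch. It assumes a strongly simplified model in which a template is just a named tuple of attribute slots; the real QUICK templates are graph patterns over an RDF schema. The class QueryTemplate, the function semantic_queries, and the attribute names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class QueryTemplate:
    """Simplified template: a name plus the attribute slots keywords may bind to."""
    name: str
    attributes: tuple


def semantic_queries(template, keywords):
    """Enumerate every way of binding each keyword to one attribute slot."""
    for slots in product(template.attributes, repeat=len(keywords)):
        yield tuple(zip(keywords, slots))


# Hypothetical template joining movies and actors.
movie_actor = QueryTemplate("movie-actor", ("Movie.title", "Actor.name"))

interpretations = list(semantic_queries(movie_actor, ["brad", "smith"]))
for binding in interpretations:
    print(binding)
```

Two keywords and two slots give four candidate interpretations, one of which binds "brad" to Actor.name and "smith" to Movie.title. In practice only bindings whose keyword actually occurs in the attribute would be kept.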
11
QUICK – Concepts (continued)
■ Query Hierarchy
  ■ Semantic queries ordered by sub-query relationship
■ Query Guide
  ■ Graph including paths to all possible queries
[Figure: Query Guide]
12
QUICK Example: Construction Options
13
QUICK Example: Query List
14
QUICK Example: Results
15
Query Guide Construction – Offline Stage
■ Generate all Query Templates
  ■ Start with one-variable queries
  ■ Produce all possible combinations
  ■ Repeat until max. join path length reached
■ Build Inverted Index
  ■ Terms -> Attributes
  ■ Enables fast keyword-query mapping at runtime
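The term-to-attribute index from the offline stage can be sketched in a few lines. This is an illustration under assumptions, not the QUICK implementation: the data is a hand-made list of (attribute, value) pairs, tokenization is a simple lowercase split on non-alphanumeric characters, and the names build_index and candidate_attributes are hypothetical.

```python
import re
from collections import defaultdict

# Toy data: (attribute, literal value) pairs, for illustration only.
DATA = [
    ("Actor.name", "Brad Pitt"),
    ("Actor.name", "Angelina Jolie"),
    ("Movie.title", "Mr. & Mrs. Smith"),
    ("Movie.title", "Brad's Status"),
]


def build_index(data):
    """Map each lowercase term to the set of attributes whose values contain it."""
    index = defaultdict(set)
    for attribute, value in data:
        for term in re.findall(r"[a-z0-9]+", value.lower()):
            index[term].add(attribute)
    return index


index = build_index(DATA)


def candidate_attributes(keywords):
    """Look up, per keyword, the attributes it could be bound to."""
    return {k: sorted(index.get(k.lower(), ())) for k in keywords}


candidates = candidate_attributes(["Brad", "Jolie"])
print(candidates)
# {'Brad': ['Actor.name', 'Movie.title'], 'Jolie': ['Actor.name']}
```

At runtime a lookup like this yields the ambiguity the user is later asked to resolve: "Brad" may be part of an actor name or of a movie title.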
16
Query Guide Construction – Online Stage
■ Identify possible queries (leaves of query guide)
■ Extract partial query graph from template graph
■ Problem: query space can be very large
Find minimal query guide
■ Cost function: # of steps + # of inspected suggestions
■ Minimal guide: smallest maximum cost
■ Depth/width tradeoff: a guide can be too flat or too deep
■ Optimum: ln(n) split
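The depth/width tradeoff can be illustrated with a small calculation. The sketch below assumes a perfectly balanced guide over n queries in which every step offers b suggestions and, in the worst case, the user inspects all b of them. This model and the function names are assumptions made for illustration. It does not reproduce the derivation behind the ln(n) optimum stated on the slide; it only shows why neither extreme is good.

```python
def depth_for(n, b):
    """Smallest depth d such that b**d >= n, using integer arithmetic only."""
    depth, reach = 0, 1
    while reach < n:
        reach *= b
        depth += 1
    return depth


def worst_case_cost(n, b):
    """Cost = number of steps + number of inspected suggestions (b per step)."""
    d = depth_for(n, b)
    return d + d * b


n = 1000
costs = {b: worst_case_cost(n, b) for b in (2, 4, 10, 32, 1000)}
print(costs)  # {2: 30, 4: 25, 10: 33, 32: 66, 1000: 1001}
```

Under these assumptions a completely flat guide (b = 1000) costs 1001, a binary guide costs 30, and an intermediate branching factor does best.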
17
Greedy Query Guide Construction
■ Finding Minimal Guide: NP-Hard
■ Use approach similar to set cover approximation
■ Determine nodes (= refinement options) top-down
■ Greedily select node leading to the lowest cost
  – Cost estimation: minimally incurred cost
■ Repeat until all nodes are covered
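For reference, the classic greedy set cover approximation that the slide alludes to looks as follows. This is the textbook algorithm, not QUICK's guide construction: QUICK selects by estimated interaction cost, whereas this sketch simply picks the option that covers the most still-uncovered queries. The query identifiers and option labels are invented for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Ties are broken by dictionary insertion order.
        name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        gained = subsets[name] & uncovered
        if not gained:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(name)
        uncovered -= gained
    return chosen


# Hypothetical refinement options and the candidate queries each one covers.
queries = {"q1", "q2", "q3", "q4", "q5"}
options = {
    "brad is actor name": {"q1", "q2", "q3"},
    "brad is movie title": {"q4"},
    "jolie is actor name": {"q3", "q4", "q5"},
    "pitt is actor name": {"q1", "q2"},
}

cover = greedy_set_cover(queries, options)
print(cover)  # ['brad is actor name', 'jolie is actor name']
```

Two options suffice to reach every candidate query here. Replacing the coverage count with a cost estimate would move the sketch closer to the selection rule described on the slide.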
18
Evaluation – Experiment Settings
■ IMDB database
  ■ Semantic Web representation
■ Queries from AOL query log
  ■ Selection criteria
    – Movie-related
    – 2-5 keywords
    – Refers to at least 2 entities
  ■ Manual assessment of query intention
■ Search process
  ■ Manual input of keywords
  ■ Selection of correct option according to query intention
19
Evaluation – Guide Quality
■ Intended construction option usually among top 3
■ Usually 3-5 clicks needed to construct query
■ Effective also for large query spaces
20
Conclusion
■ Query construction with QUICK
  ■ Highly effective construction process
  ■ All intentions can be constructed
  ■ No query language or schema knowledge required
■ Further directions
  ■ Combine with relevance heuristics (IQ^P)
  ■ More flexible user interaction
    – Use facets for keyword bindings
    – Better multi-term support
  ■ Optimized query guide generation
    – Exploit entity notion (QUnits)
    – Progressive query guide creation
  ■ Connect to QbE/Query Form Creation
21
Evaluation – Performance

No. of terms   Initialization time (ms)   Response time (ms)
2              98                         2
3              993                        19
4              16,797                     1,035
>4             31,838                     3,290
All            3,659                      314

■ Initialization takes too much time for long queries
■ RDF store as bottleneck (creation of query hierarchy)
■ After initialization, response time is acceptable
22
Optimizations
■ Identification of semantic queries
  ■ Index template subsets by attribute to enable fast filtering of queries without results
  ■ Enable fast intersection of template subsets (e.g., 'and' on bitsets)
■ QCG generation
  ■ Parallel subquery computation
  ■ Caching of frequent subqueries
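The bitset idea can be sketched with plain Python integers: each attribute gets one bit per template, and combining attributes with a bitwise AND leaves only the templates that contain all of them. The template names, attribute names, and bit assignments below are hypothetical.

```python
# Hypothetical templates; bit i of a mask refers to TEMPLATES[i].
TEMPLATES = ["T0", "T1", "T2", "T3"]

# For each attribute, the set of templates containing it, encoded as a bitset.
ATTRIBUTE_BITS = {
    "Actor.name": 0b1110,     # T1, T2, T3
    "Movie.title": 0b0111,    # T0, T1, T2
    "Director.name": 0b1001,  # T0, T3
}


def templates_with_all(attributes):
    """Return the templates that contain every given attribute (bitwise AND)."""
    bits = (1 << len(TEMPLATES)) - 1  # start with all templates
    for attribute in attributes:
        bits &= ATTRIBUTE_BITS.get(attribute, 0)
    return [t for i, t in enumerate(TEMPLATES) if (bits >> i) & 1]


both = templates_with_all(["Actor.name", "Movie.title"])
none = templates_with_all(["Actor.name", "Movie.title", "Director.name"])
print(both)  # ['T1', 'T2']
print(none)  # []
```

An empty result means no template can host all keyword bindings, so the corresponding queries can be discarded without touching the RDF store.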
23
Misc Ideas
■ Use Google's KDD annotated Named Entity Recognition test set (Piggyback, http://sites.google.com/site/massiciara/)
24
Cross Connections
■ Thomas Gottron: Traditional features (e.g. TF) not useful for very short text
■ Hinrich Schütze: entity-related queries often ambiguous
■ Michael Granitzer: cycle of refinement/exploration
■ Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster