Published by Owen Glenn. Modified over 9 years ago.
1
Ranking Objects Based on Relationships: Computing Top-K over Aggregation
SIGMOD 2006
Kaushik Chakrabarti et al.
2
Outline
Motivation
The framework and problem definition
Proposed solution
Discussions
Experiments
3
Warm-up discussion
We basically know how web search engines work.
  – Web crawlers collect web-page information, which is then indexed and ranked.
How do we define searching in a relational database?
  – Free-form search vs. SQL + predicates?
  – What is the expected outcome?
  – How do we rank results?
4
Motivation
Searching over a relational database
  – Information is scattered across different relations
5
Motivation
Full-text search and aggregation are already supported by RDBMSs
  – What else do we need in order to search well?
6
Related work
Information retrieval (full-text search)
Research in text databases
Exploring a database via foreign key-primary key links
  – DBXplorer (ICDE 2002)
  – BANKS (ICDE 2002)
  – DISCOVER (VLDB 2002)
What is the related work missing?
  – Target objects may not contain the keywords themselves
  – No scoring function for query results
  – No use of aggregates to combine search results for multiple keywords
7
Contributions
Introduce an interesting problem domain
Define "Object Finder" (OF) queries
Propose scoring functions
Propose a solution for processing OF queries
  – Returns the top K ranked results
  – Efficient, with an early termination property
8
System Overview
9
Scoring functions
Scoring matrices and their row and column marginals
10
Scoring semantics
All query keywords present in each document
  – can be too restrictive
All query keywords present in the set of related documents
  – cannot use MIN as the row-marginal scoring function
Pseudo-document approach
  – enlarges the search space
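To make the row-marginal versus column-marginal distinction concrete, here is a minimal sketch. It assumes a scoring matrix with one row per related document and one column per query keyword; the function names, the choice of SUM and MIN, and the sample numbers are illustrative assumptions, not taken from the slides or the paper.

```python
def row_marginal_score(matrix, combine, aggregate):
    """Combine each document's keyword scores first, then aggregate over documents."""
    return aggregate(combine(row) for row in matrix)


def column_marginal_score(matrix, aggregate, combine):
    """Aggregate each keyword's scores over all documents first, then combine."""
    columns = zip(*matrix)  # transpose: one tuple per keyword
    return combine(aggregate(col) for col in columns)


# Hypothetical target object with two related documents and two query keywords.
# Document 1 mentions only keyword 1; document 2 mentions only keyword 2.
matrix = [
    [0.5, 0.0],
    [0.0, 0.25],
]

# MIN per row zeroes out every document that lacks a keyword, so the object scores 0.
row_score = row_marginal_score(matrix, combine=min, aggregate=sum)

# SUM per column, then MIN, rewards objects whose documents jointly cover all keywords.
col_score = column_marginal_score(matrix, aggregate=sum, combine=min)

print(row_score, col_score)
```

In this toy case the row-marginal score with MIN is zero even though the two documents together cover both keywords, which is one way to read the remark that MIN cannot serve as the row-marginal function under the "set of related documents" semantics.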
11
Problem definition Object finder problem:
12
Process an OF query as a top-K query
A top-K query incorporates ranking.
Results are totally ordered if we process a strong top-K query.
A good algorithm can use early termination to avoid processing results that are not in the top K.
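A minimal sketch of early termination, under a stated assumption: each candidate comes with an upper bound on its exact score, and computing the exact score is the expensive step. The function name and the sample data are hypothetical; this is a generic illustration, not the algorithm from the paper.

```python
import heapq


def top_k_early_termination(candidates, exact_score, k):
    """Return the top-k (score, object) pairs and how many exact scores were computed.

    candidates: iterable of (object_id, upper_bound), with upper_bound >= exact score.
    exact_score: function mapping object_id to its exact score (assumed expensive).
    """
    heap = []  # min-heap holding the best k (score, object_id) pairs seen so far
    evaluated = 0
    # Visit candidates from the most promising upper bound downwards.
    for obj, upper in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Once k results are held and no remaining candidate can beat the
        # current k-th best score, the rest can be skipped entirely.
        if len(heap) == k and upper <= heap[0][0]:
            break
        score = exact_score(obj)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, obj))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, obj))
    return sorted(heap, reverse=True), evaluated


exact = {"a": 9, "b": 7, "c": 3, "d": 1}
bounds = [("a", 10), ("b", 8), ("c", 5), ("d", 2)]
result, evaluated = top_k_early_termination(bounds, exact.get, k=2)
print(result, evaluated)
```

Here only two of the four candidates need an exact score: after "a" and "b" are scored, the upper bound of "c" is already below the second-best score.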
13
Top-K query processing
General framework: "Supporting Ad-hoc Ranking Aggregates", SIGMOD 2006 (presented in May)
SELECT ga_1, ..., ga_n, F      -- groups
FROM R1, ..., Rh               -- source relations
WHERE c1 AND ... AND cl        -- join conditions
GROUP BY ga_1, ..., ga_n       -- group definition
ORDER BY F                     -- ordering function (an aggregate)
LIMIT k                        -- top-k setting
14
Top-K query processing
For an OF query, it is:
SELECT TOId, TOValue, score(TOId)
FROM TargetTable T, R, L1, ..., LN
WHERE R.TOId = T.TOId AND R.DocId = Li.DocId (i = 1..N)
GROUP BY TOId, TOValue
ORDER BY score(TOId)
LIMIT k
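The query shape above can be tried out on a toy database. The sketch below uses Python's built-in sqlite3 module with two keyword lists (N = 2). The column layouts, the sample rows, and the choice of SUM(L1.Score + L2.Score) as the score are assumptions made for illustration; the slides do not specify them. Note that the inner joins drop any document missing from one of the lists.

```python
import sqlite3


def run_of_query(k):
    """Run a toy OF-style top-k query and return (TOId, TOValue, total_score) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TargetTable (TOId INTEGER PRIMARY KEY, TOValue TEXT);
        CREATE TABLE R  (TOId INTEGER, DocId INTEGER);   -- object-document relationships
        CREATE TABLE L1 (DocId INTEGER, Score REAL);     -- documents matching keyword 1
        CREATE TABLE L2 (DocId INTEGER, Score REAL);     -- documents matching keyword 2
        INSERT INTO TargetTable VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO R  VALUES (1, 10), (1, 11), (2, 11), (2, 12);
        INSERT INTO L1 VALUES (10, 0.5), (11, 0.25), (12, 0.5);
        INSERT INTO L2 VALUES (10, 0.5), (11, 0.25);
    """)
    rows = conn.execute(
        """
        SELECT T.TOId, T.TOValue, SUM(L1.Score + L2.Score) AS total_score
        FROM TargetTable T
        JOIN R  ON R.TOId = T.TOId
        JOIN L1 ON R.DocId = L1.DocId
        JOIN L2 ON R.DocId = L2.DocId
        GROUP BY T.TOId, T.TOValue
        ORDER BY total_score DESC
        LIMIT ?
        """,
        (k,),
    ).fetchall()
    conn.close()
    return rows


top2 = run_of_query(2)
print(top2)
```

A plain SQL engine evaluates the full join and aggregation before applying LIMIT, which is the baseline that a dedicated top-K algorithm with early termination tries to beat.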
15
My work is done (please try to recall my last talk)
16
Algorithm: Generate-Prune
Phase I: Compute top-K candidates
17
Algorithm Overview
18
Algorithm
Phase II: Compute exact top-K
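The two phases can be illustrated with a generic bounds-based sketch. It assumes that the candidate phase leaves each object with a lower and an upper bound on its score; the bound representation, function names, and numbers are hypothetical and are not the paper's actual data structures.

```python
def prune_candidates(bounds, k):
    """Keep only candidates that could still be in the top k.

    bounds: dict mapping object_id -> (lower_bound, upper_bound) on its score.
    """
    if len(bounds) <= k:
        return set(bounds)
    # The k-th largest lower bound is a score that at least k objects reach.
    lowers = sorted((lower for lower, _ in bounds.values()), reverse=True)
    threshold = lowers[k - 1]
    # A candidate whose upper bound is below the threshold cannot make the top k.
    return {obj for obj, (_, upper) in bounds.items() if upper >= threshold}


def exact_top_k(bounds, exact_score, k):
    """Compute exact scores only for surviving candidates, then rank them."""
    survivors = prune_candidates(bounds, k)
    scored = sorted(((exact_score(obj), obj) for obj in survivors), reverse=True)
    return scored[:k]


# Hypothetical output of the candidate-generation phase.
bounds = {"a": (8, 10), "b": (6, 9), "c": (1, 5), "d": (0, 7)}
exact = {"a": 9, "b": 7, "c": 3, "d": 6}

survivors = prune_candidates(bounds, k=2)
final = exact_top_k(bounds, exact.get, k=2)
print(sorted(survivors), final)
```

With k = 2 the threshold is 6, so "c" is pruned without ever computing its exact score, while "d" survives pruning but loses in the exact ranking.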
19
Discussions
In this work
  – Choice of aggregation function
  – Ranking functions in general
  – What do you think of this work?
Not limited to this work
  – Impact of a more complicated schema
  – Impact of query selectivity
20
Experiment results
Faster than SQL
Faster than Generate-Only
Robust to the number of keywords and selections
Intuitive results
21
Experiments: Faster than SQL
22
Experiments: Faster than Generate-Only
23
Experiments: Robust to the number of keywords and selections
24
Thank you. Questions to discuss?