Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.

Outline Motivation The framework and problem definition Proposed solution Discussions Experiments

Heating up discussion We basically know how web search engines work. –Having web crawlers collecting web-page information, index and rank them. How do we define searching in a relational database –Free-style search v.s. SQL + predicates ? –What’s the expected outcome? –How do we rank results?

Motivation Searching over a relational database –information scattered in different relations

Motivation Full text search, aggregation already supported by RDBMS –What else do we need in order to perform good searching?

Related work Information Retrieval (full text searching) Researches in Text Databases Explore database via foreign key-primary key –DBExplorer (ICDE 2002) –BANKS (ICDE 2002) –DISCOVER (VLDB 2002) What are related work missing –Target objects don’t contain keywords –Lack of scoring function for query results –Not utilizing aggregates to put together search results for multiple keywords

Contributions Introduce an interesting problem domain Define “Object Finder” (OF) queries Propose scoring functions Propose a solution to process OF query –Return top K ranked results –Efficient early termination property

System Overview

Scoring functions Scoring Matrixes and row- column- marginal's

Scoring semantics All Query Keywords Present in each document –can be too restrictive All Query Keywords Present in Set of Related Documents –can not use MIN as row-marginal scoring Pseudo-document Approach: –enlarged searching space

Problem definition Object finder problem:

Process OF query as Top-K query Top-K query incorporates ranking. Results are total ordered if we process strong top-K A good algorithm can utilize early termination to avoid processing of results that are not in top-K

Top-K query processing General framework: Supporting Ad-hoc Ranking Aggregates SIGMOD 2006 ( presented in May) *SELECT* ga_1,..ga_n, F ----groups *FROM* R1,...,Rh ----source rel *WHERE* c1 AND... cl ----join cond. *GROUP BY* ga_1,...ga_n----group def. *ORDER BY* F ----ordering func. an aggregate *LIMIT* k ----Top-k setting

Top-K query processing For OF query, it is select TOId, TOValue, score(TOId) from TargetTable T, R, L1,...,LN where R.TOId = T.TOID and R.DocId=Li.DocID (i=1..N) group by TOId, TOValue order By score(TOId) limit k

My work is done (please try to recall my last talk)

Algorithm : Generate-Prune Phrase I : Compute top-K candidates

Algorithm Overview

Algorithm Phrase II Compute exact top-K

Discussions In this work –Choice of aggregation function –ranking function in general –How do you think of this work Not limited –Impact of more complicated schema –Impact of selectivity of the query

Experiment Results Faster than SQL Faster than Generate-Only Robust to # of keywords and selections Intuitive Results

Experiments Faster than SQL

Experiments Faster than Generate-Only

Experiments Robust to # of keywords and selections

Thank you Questions to discuss?

Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.

Similar presentations

Presentation on theme: "Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.

Similar presentations

Presentation on theme: "Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al."— Presentation transcript:

Similar presentations

About project

Feedback