ADDING STRUCTURE TO TOP-K: FROM ITEMS TO EXPANSIONS
Date :
Source : CIKM'11
Speaker : I-Chih Chiu
Advisor : Dr. Jia-Ling Koh
INDEX
Introduction
Problem Definition
Basic Algorithm
Semantic Optimization
Experiments
Conclusion
INTRODUCTION
Keyword-based search interfaces are extremely popular.
INTRODUCTION
Google search
Query → What's the weather today?
Results include 'what', 'weather', 'today'; they lack semantics.
Del.icio.us
Search results → presented through a faceted interface.
Expansions → limited to a fixed set of tags.
INTRODUCTION
Motivated by these drawbacks of current search result interfaces, the work considers a search scenario in which each item is annotated with a set of keywords.
It does not need to assume the existence of a pre-defined categorical hierarchy.
The goal is to automatically group query result items into different expansions of the query, each corresponding to a subset of keywords.
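PROBLEM DEFINITION
$t_i.a_j$: the value of attribute $a_j$ of item $t_i$, normalized to [0,1].
[Example table: four items with attributes Author (0.3) and Click (0.6) and their utilities $u(t_i)$. Most cell values were lost in extraction; the last row ends with … × 0.4 = 0.51.]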
PROBLEM DEFINITION
Group items into different expansions of Q and return high-quality expansions.
An expansion is a subset of keywords $e \subseteq K - Q$, where K is the set of all keywords.
[Figure: subset-of relationship for $K - Q = \{k_1, k_2, k_3, k_4\}$.]
DETERMINING IMPORTANCE OF AN EXPANSION
Item | $S_{k_1}$ | $S_{k_1,k_2}$ | $S_{k_2,k_3}$
$t_1$ ($k_1$) | 0.4 | X | X
$t_2$ ($k_1,k_2$) | 0.6 | 0.5 | X
$t_3$ ($k_3$) | X | X | 0.6
Importance of expansion e: $g(S_e)$
NAÏVE ALGORITHM
TopExp-Naïve algorithm
Access items in non-increasing order of their attribute values, visiting the attribute lists in round-robin fashion.
For each matching item accessed, enumerate all possible expansions and update their lower-bound and upper-bound utility values.
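
The enumeration step can be sketched as follows. This is a minimal illustration, not the TopExp-Naïve implementation: it assumes an item matches an expansion e when it carries every keyword of Q and every keyword of e, and it leaves out the lower-bound and upper-bound updates, whose formulas the slides do not give. The function names and the item dictionary layout are hypothetical.

```python
from itertools import combinations


def expansions_of(item_keywords, query):
    """Yield every non-empty subset of the item's keywords outside the query."""
    extra = sorted(set(item_keywords) - set(query))
    for size in range(1, len(extra) + 1):
        for combo in combinations(extra, size):
            yield frozenset(combo)


def group_by_expansion(items, query):
    """Map each expansion to the ids of the matching items that carry it.

    items: dict mapping item id -> set of keywords (assumed layout).
    """
    groups = {}
    for item_id, keywords in items.items():
        if not set(query) <= set(keywords):
            continue  # the item does not match the query itself
        for e in expansions_of(keywords, query):
            groups.setdefault(e, set()).add(item_id)
    return groups
```

An item with n keywords outside the query contributes 2^n − 1 expansions, which is why enumerating all of them for every accessed item becomes expensive.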
IMPROVED ALGORITHM
TopExp-Lazy algorithm
Access items in non-increasing order of their attribute values.
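IMPROVED ALGORITHM
To count how many expansions correspond to the same set of items, use the classical inclusion-exclusion principle.
Number of expansions for a keyword set e: $2^{|e|} - \mathit{count} - 1$, where $\mathit{count}$ is increased by $2^{|e'|} - 1$ for each previously counted subset $e' \subset e$.
E.g. $e = \{k_1, k_2, k_3\}$ → $2^{|e|} = 8$; $e' = \{k_1, k_2\}, \{k_3\}$ → $\mathit{count} = 3 + 1 = 4$; $8 - 4 - 1 = 3$ ($\{k_1, k_2, k_3\}$, $\{k_1, k_3\}$ and $\{k_2, k_3\}$).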
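
The count can be checked with a short sketch. The slide's update rule adds $2^{|e'|} - 1$ per subset, which is exact when the subsets e' are disjoint, as in the example. The version below is an assumption-laden generalization, not code from the paper: it applies inclusion-exclusion over intersections so that overlapping subsets are not double-counted. The function name is hypothetical.

```python
from itertools import combinations


def count_new_expansions(e, seen):
    """Count non-empty subsets of e that are not contained in any set in `seen`."""
    e = frozenset(e)
    seen = [frozenset(s) & e for s in seen]
    covered = 0
    # Inclusion-exclusion over every non-empty group of previously seen sets.
    for r in range(1, len(seen) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(seen, r):
            common = frozenset.intersection(*group)
            covered += sign * (2 ** len(common) - 1)
    return (2 ** len(e) - 1) - covered


# The slide's example: e = {k1, k2, k3}, e' = {k1, k2} and {k3}.
print(count_new_expansions({"k1", "k2", "k3"}, [{"k1", "k2"}, {"k3"}]))  # 3
```

The loop is exponential in the number of seen sets, so it is only meant to make the counting argument concrete.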
WEIGHTING EXPANSIONS
PATH EXCLUSION BASED ALGORITHM
Assume all weights are equal to 1.
[Figure: graph G with $H_1$ and $H_2$.]
PATH EXCLUSION BASED ALGORITHM
Top-PEkExp algorithm
Generate necessary expansions using TopExp-Lazy;
$R_G$ ← GreedyMWIS(L);
$E_{topk}$ ← k expansions in L which have the largest upper bound utilities;
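
The slide does not spell out GreedyMWIS. Reading MWIS as maximum weight independent set, one common greedy heuristic is sketched below: repeatedly pick the vertex with the best weight-to-degree ratio and discard its neighbours. This is a generic heuristic shown for illustration, not necessarily the paper's procedure. The graph encoding, with vertices standing for expansions and edges joining expansions that may not be returned together, is an assumption, as is the name `greedy_mwis`.

```python
def greedy_mwis(weights, edges):
    """Greedy heuristic for a maximum weight independent set.

    weights: dict mapping vertex -> non-negative weight (assumed input format).
    edges: iterable of (u, v) pairs between conflicting vertices.
    Returns the chosen vertices in selection order.
    """
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    remaining = set(weights)
    chosen = []
    while remaining:
        # Prefer heavy vertices with few remaining neighbours;
        # the string form of the vertex breaks ties deterministically.
        best = max(
            remaining,
            key=lambda v: (weights[v] / (len(neighbors[v] & remaining) + 1), str(v)),
        )
        chosen.append(best)
        # Drop the chosen vertex and everything adjacent to it.
        remaining -= neighbors[best] | {best}
    return chosen
```

The result is always an independent set, but the heuristic gives no guarantee of optimality.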
EXPERIMENTS
Synthetic datasets
Generated 5 synthetic datasets with size from 8000 to [...]
Evaluated: efficiency, scalability, memory saving.
Real datasets
The ACM Digital Library.
Demonstrates the quality of the expansions returned.
EXPERIMENTS
Fixed N = 10 and k = 10.
EXPERIMENTS
Fixed number of items = 10000, N = 10.
EXPERIMENTS
Fixed number of items = 10000, k = 10.
EXPERIMENTS
Queries : "xml", "histogram", "privacy"
Attributes : the average author publication number; the citation count
Keywords : the title; the keywords list; the abstract
CONCLUSION
They studied the problem of how to better present search/query results to users.
They proposed several efficient algorithms that compute the top-k expansions.
The experiments not only demonstrated the performance of the proposed algorithms but also validated the quality of the returned expansions through a study on a real dataset.