Published by Amy Russell. Modified over 9 years ago.
1
NRA: top-k query processing with No Random Access (only sequential access)
Algorithm:
1) scan index lists in parallel;
2) consider dj at position posi in Li;
3) E(dj) := E(dj) ∪ {i}; highi := si(q,dj);
4) bestscore(dj) := aggr{x1, ..., xm} with xi := si(q,dj) for i ∈ E(dj), highi for i ∉ E(dj);
5) worstscore(dj) := aggr{x1, ..., xm} with xi := si(q,dj) for i ∈ E(dj), 0 for i ∉ E(dj);
6) top-k := k docs with largest worstscore;
7) threshold := max bestscore{d | d not in top-k};
8) if min worstscore among top-k ≥ threshold then exit;
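The eight steps above can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: it assumes summation as the aggregation function aggr, reads one position from every list per round, and recomputes all bounds each round instead of maintaining them incrementally. The function name `nra_top_k` and the tiny input are hypothetical.

```python
from collections import defaultdict

def nra_top_k(lists, k):
    """lists: m index lists of (doc, score) pairs, each sorted by descending score.
    Returns the top-k as (doc, worstscore, bestscore) tuples. Assumes aggr = sum."""
    m = len(lists)
    seen = defaultdict(dict)   # doc -> {i: s_i(q, doc)}; the keys play the role of E(doc)
    high = [0.0] * m           # high_i: last score read from list i
    worst, best, top = {}, {}, []
    for pos in range(max(len(lst) for lst in lists)):
        # Steps 1-3: one sorted access per list, update E(d) and high_i.
        for i, lst in enumerate(lists):
            if pos < len(lst):
                doc, score = lst[pos]
                seen[doc][i] = score
                high[i] = score
            else:
                high[i] = 0.0  # list exhausted: nothing more can be gained from it
        # Steps 4-5: worst- and best-case aggregated score of every seen doc.
        for doc, scores in seen.items():
            worst[doc] = sum(scores.values())
            best[doc] = worst[doc] + sum(high[i] for i in range(m) if i not in scores)
        # Step 6: current top-k by worstscore.
        ranked = sorted(seen, key=lambda d: worst[d], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        # Step 7: best score of any doc outside the top-k (an unseen doc has sum(high)).
        threshold = max([sum(high)] + [best[d] for d in rest])
        # Step 8: stop once nobody outside the top-k can overtake its weakest member.
        if len(top) == k and worst[top[-1]] >= threshold:
            break
    return [(d, worst[d], best[d]) for d in top]

# Hypothetical two-list input: the scan stops after two rounds, before "c" is read.
example = [
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    [("b", 0.7), ("a", 0.6), ("c", 0.2)],
]
print(nra_top_k(example, k=1))
```

Recomputing every bound per round keeps the sketch short; a real implementation would update only the documents touched in the round, which is part of the bookkeeping overhead associated with NRA.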
2
NRA (example, k = 2)
Index lists, sorted by descending score:
List 1: item 25 0.6, item 78 0.5, item 83 0.4, item 17 0.3, item 21 0.2, item 91 0.1, item 44 0.1
List 2: item 17 0.6, item 38 0.6, item 14 0.6, item 5 0.6, item 83 0.5, item 21 0.3
List 3: item 83 0.9, item 17 0.7, item 61 0.3, item 81 0.2, item 65 0.1, item 10 0.1
After reading position 1 of each list:
Candidates [worst score, best score]: item 83 [0.9, 2.1], item 17 [0.6, 2.1], item 25 [0.6, 2.1]
Min top-2 score: 0.6
Threshold (max best score of unseen tuples): 0.6 + 0.6 + 0.9 = 2.1
Pruning candidates: a candidate stays in the queue only while min top-2 < best score of candidate
Stopping condition: threshold < min top-2? No (2.1 > 0.6).
3
NRA (example, continued)
Index lists: the same three lists as on slide 2.
After reading position 2 of each list:
Candidates [worst score, best score]: item 17 [1.3, 1.8], item 83 [0.9, 2.0], item 25 [0.6, 1.9], item 38 [0.6, 1.8], item 78 [0.5, 1.8]
Min top-2 score: 0.9
Threshold (max best score of unseen tuples): 0.5 + 0.6 + 0.7 = 1.8
Pruning candidates: a candidate stays in the queue only while min top-2 < best score of candidate
Stopping condition: threshold < min top-2? No (1.8 > 0.9).
4
NRA (example, continued)
Index lists: the same three lists as on slide 2.
After reading position 3 of each list:
Candidates [worst score, best score]: item 83 [1.3, 1.9], item 17 [1.3, 1.7], item 25 [0.6, 1.5], item 78 [0.5, 1.4]
Min top-2 score: 1.3
Threshold (max best score of unseen tuples): 0.4 + 0.6 + 0.3 = 1.3
Pruning candidates: a candidate stays in the queue only while min top-2 < best score of candidate
Stopping condition: threshold < min top-2? The threshold has dropped to the min top-2 score, so no more new items can get into the top-2, but extra candidates are left in the queue.
5
NRA (example, continued)
Index lists: the same three lists as on slide 2.
After reading position 4 of each list:
Candidates [worst score, best score]: item 17 1.6 (final score), item 83 [1.3, 1.9], item 25 [0.6, 1.4]
Min top-2 score: 1.3
Threshold (max best score of unseen tuples): 0.3 + 0.6 + 0.2 = 1.1
Pruning candidates: a candidate stays in the queue only while min top-2 < best score of candidate
Stopping condition: threshold < min top-2? Yes (1.1 < 1.3), so no more new items can get into the top-2, but extra candidates are left in the queue.
6
NRA (example, continued)
Index lists: the same three lists as on slide 2.
After reading position 5 of each list:
Candidates (final scores): item 83 1.8, item 17 1.6
Min top-2 score: 1.6
Threshold (max best score of unseen tuples): 0.2 + 0.5 + 0.1 = 0.8
Pruning candidates: a candidate stays in the queue only while min top-2 < best score of candidate; no candidate outside the top-2 is left.
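The bounds quoted in the example can be checked with a short script. It is only a sketch: the assignment of entries to List 1, List 2 and List 3 is reconstructed from the flattened slide text (read row by row, with the trailing item 44 assumed to belong to List 1), the aggregation is assumed to be a sum, and `state_at_depth` is a hypothetical helper name.

```python
# Index lists as reconstructed from the slides (layout is an assumption, see above).
LISTS = [
    [(25, 0.6), (78, 0.5), (83, 0.4), (17, 0.3), (21, 0.2), (91, 0.1), (44, 0.1)],
    [(17, 0.6), (38, 0.6), (14, 0.6), (5, 0.6), (83, 0.5), (21, 0.3)],
    [(83, 0.9), (17, 0.7), (61, 0.3), (81, 0.2), (65, 0.1), (10, 0.1)],
]

def state_at_depth(lists, depth, k=2):
    """State after reading `depth` positions of every list.
    Returns (min top-k worst score, threshold for unseen items, {item: (worst, best)})."""
    high = [lst[depth - 1][1] for lst in lists]   # last score read from each list
    seen = {}                                     # item -> {list index: score}
    for i, lst in enumerate(lists):
        for item, score in lst[:depth]:
            seen.setdefault(item, {})[i] = score
    bounds = {}
    for item, scores in seen.items():
        worst = sum(scores.values())
        # Unseen dimensions contribute at most the current high score of that list.
        best = worst + sum(h for i, h in enumerate(high) if i not in scores)
        bounds[item] = (round(worst, 2), round(best, 2))
    min_topk = sorted((w for w, _ in bounds.values()), reverse=True)[k - 1]
    return min_topk, round(sum(high), 2), bounds

for depth in range(1, 6):
    min_topk, threshold, _ = state_at_depth(LISTS, depth)
    print(f"depth {depth}: min top-2 = {min_topk}, threshold = {threshold}")
# depth 1: min top-2 = 0.6, threshold = 2.1
# depth 2: min top-2 = 0.9, threshold = 1.8
# depth 3: min top-2 = 1.3, threshold = 1.3
# depth 4: min top-2 = 1.3, threshold = 1.1
# depth 5: min top-2 = 1.6, threshold = 0.8
```

Scores are rounded to two decimals only to keep floating-point noise out of the printed values.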
7
NRA
NRA performs only sorted accesses (SA), i.e. No Random Access.
Random access (RA): looks up the actual (final) score of an item; costlier than SA (100 – 100,000 times), with cR/cS := (cost of RA)/(cost of SA); often very useful.
CA (Combined Algorithm) (Fagin et al., 2001): one RA after every cR/cS SAs, so that total cost of SA ~ total cost of RA.
Measure of effectiveness (access cost): #SA + cR/cS × #RA
Full-merge: compute scores for all items, followed by a partial sort; simple and efficient; an important baseline for any top-k algorithm.
Problems with NRA, CA: high bookkeeping overhead; for “high” values of k, even the gain in access cost is not significant.
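As a rough illustration of the access-cost measure and of the full-merge baseline, here is a minimal sketch. It assumes summation as the aggregation function; `access_cost` and `full_merge_top_k` are hypothetical names, and the cost ratio used below is just one value from the range quoted on the slide.

```python
import heapq
from collections import defaultdict

def access_cost(num_sa, num_ra, cr_over_cs):
    """Access cost in units of one sorted access: #SA + (cR/cS) * #RA."""
    return num_sa + cr_over_cs * num_ra

def full_merge_top_k(lists, k):
    """Full-merge baseline: aggregate every entry of every list, then partial sort.
    Returns (top-k as (item, total score) pairs, number of sorted accesses)."""
    totals = defaultdict(float)
    num_sa = 0
    for lst in lists:
        for item, score in lst:
            totals[item] += score   # sum aggregation (an assumption of this sketch)
            num_sa += 1
    top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
    return top, num_sa

# Hypothetical input: two short index lists.
lists = [[("a", 0.9), ("b", 0.5)], [("b", 0.7), ("a", 0.6)]]
top, num_sa = full_merge_top_k(lists, k=1)
print(top[0][0], num_sa)           # a 4
print(access_cost(num_sa, 0, 100)) # 4, since full-merge needs no random accesses
print(access_cost(1000, 5, 100))   # 1500
```

Full-merge reads every entry exactly once, so its cost is simply the total length of the lists, which is what makes it a convenient baseline for comparing NRA and CA.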
8
References
IO-Top-k: Index-access Optimized Top-k Query Processing. Debapriyo Majumdar, Max-Planck-Institut für Informatik, Saarbrücken, Germany. Joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
Top-k Query Evaluation with Probabilistic Guarantees. Martin Theobald, Gerhard Weikum, Ralf Schenkel.