Published by Jeremy Francis. Modified over 9 years ago.
1
Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela
{mgoncalves,mvidal}@usb.ve
2
Motivating Example
«There are two open faculty positions.»
«Candidates will be evaluated in terms of degree, publications, and experience.»
«Criteria to select the best candidates: highest academic degree, maximum number of publications, and maximum years of experience.»
«Ties will be broken by using the GPA.»
Solutions: Skyline and Top-k
3
Motivation
Query: Candidates with the best academic degree, number of publications, and experience.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4

Answer: No candidate is better than all the others in every criterion simultaneously.
4
Skyline
Query: Select the candidates with the best degree, number of publications, and experience.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4

User criteria (equally important!), combined in a multicriteria function:
– Degree: maximum
– Publications: maximum
– Experience: maximum
Skyline selects candidates 1, 2, 3, and 4; i.e., the multiple criteria induce a partial order, and ties need to be broken.
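The dominance test behind the Skyline can be illustrated with a minimal sketch over the slide's data. The numeric ranking of degrees (`DEGREE_RANK`) and the helper names `criteria`, `dominates`, and `skyline` are assumptions made for illustration; the slides only say that a higher degree is better. The quadratic loop is the naive definition, not one of the algorithms discussed later.

```python
# Candidate data transcribed from the slide: id -> (degree, publications, experience, GPA).
CANDIDATES = {
    1: ("Post Dr", 9, 2, 3.75),
    2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),
    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),
    6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}

# Assumed ordering of degrees (higher is better); not stated numerically on the slides.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}


def criteria(cid):
    """Return the multi-criteria vector (degree rank, publications, experience)."""
    degree, pubs, exp, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    """True if a is at least as good as b in every criterion and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(ids):
    """Naive quadratic skyline: keep the ids that no other id dominates."""
    ids = list(ids)
    return [c for c in ids
            if not any(dominates(criteria(o), criteria(c)) for o in ids if o != c)]


print(skyline(CANDIDATES))  # [1, 2, 3, 4]
```

Candidates 5, 6, and 7 drop out because candidate 4 is at least as good as each of them in all three criteria and strictly better in at least one.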
5
Top-k
Query: Select the two candidates with the best GPA.

Id  Degree   Publications  Experience  GPA
5   BEng     7             3           4.5
2   Post Dr  10            1           4
7   BEng     5             1           4
1   Post Dr  9             2           3.75
3   PhD      12            2           3.75
4   MsC      13            4           3.6
6   BEng     6             3           3.5

User criteria (score function!):
– GPA: maximum
Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit.
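A score-based Top-k over the same data reduces to a sort on the score function. In this small sketch the helper name `top_k` and the tie-break by smaller id are assumptions; candidates 2 and 7 share a GPA of 4, and the smaller-id rule happens to reproduce the slide's answer.

```python
# GPA per candidate id, transcribed from the slide.
GPA = {1: 3.75, 2: 4.0, 3: 3.75, 4: 3.6, 5: 4.5, 6: 3.5, 7: 4.0}


def top_k(scores, k):
    """Return the k ids with the highest score; ties go to the smaller id (assumption)."""
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:k]


print(top_k(GPA, 2))  # [5, 2]
```

Candidate 5 ranks first here even though it holds the lowest degree in the table, which is the weakness the slide points out.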
6
Preference-Based Queries
Query: Select the two candidates with the highest GPA among the candidates with the best degree, number of publications, and experience.
Cases:
– Skyline produces the candidates with the best degree, number of publications, and experience: it selects four candidates on equal terms.
  – The Skyline may be very large, so post-processing over the Skyline is required to select k.
– Top-k identifies the two candidates with the best GPA: it selects two candidates with a good GPA.
  – False answers
  – Loss of results
So... a combined approach is required!
7
Top-k Skyline
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4

Answer: The two candidates with the highest value of the score function among the candidates preselected by the multicriteria function.
Top-k Skyline selects candidates 1 and 2, which have the highest GPAs among the candidates with similar academic records.
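The combined query can be sketched naively as "compute the full Skyline, then rank it by GPA". This is only the definition made executable, not any of the indexed algorithms in the talk; `DEGREE_RANK`, the helper names, and the smaller-id tie-break (candidates 1 and 3 both have a GPA of 3.75) are assumptions.

```python
# id -> (degree, publications, experience, GPA), transcribed from the slide.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),     6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}  # assumed ordering


def vector(cid):
    """Multi-criteria vector m(o): (degree rank, publications, experience)."""
    degree, pubs, exp, _gpa = ROWS[cid]
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    """a dominates b: no worse in every criterion, strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline(k):
    """Naive evaluation: full Skyline first, then the k best by the score function (GPA)."""
    ids = list(ROWS)
    sky = [c for c in ids
           if not any(dominates(vector(o), vector(c)) for o in ids if o != c)]
    return sorted(sky, key=lambda c: (-ROWS[c][3], c))[:k]


print(top_k_skyline(2))  # [2, 1]
```

The sketch evaluates the multi-criteria function on every candidate, including those that can never be part of the answer; avoiding that work is the point of the indexed approaches.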
8
Outline
– Related Work
– Our Approach
– Top-k Skyline Evaluation
– Experimental Study
– Conclusions and Future Work
9
Related Work
– Multi-criteria-based approaches (poor ranking capabilities): Skyline. Algorithms: BNL, SFS, LESS. Answers can be huge!
– Score-based approaches (high ranking capabilities): Top-k. Algorithms: MPro, Upper, TA, FA, NRA. Answers may be incomplete.
– Combined approaches: Top-k Skyline. Algorithms: BMORTKS, BDTKS. Metrics: Skyline Frequency.
Neither Skyline nor Top-k provides both high expressivity and high ranking capabilities.
Existing Top-k Skyline techniques build the Skyline completely.
Techniques to efficiently evaluate ranking approaches are required.
10
Our Challenge
Efficient implementation of the Top-k Skyline operator: build the Top-k Skyline set while minimizing non-necessary probes.
A probe p of the multi-criteria function m or the score function f is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4

Non-necessary probes (evaluations of the multi-criteria or score function)!
Goal: Identify only the elements of the Skyline that belong to the answer.
11
Top-k Skyline Evaluation
Indexed solutions:
– BDTKS (Basic Distributed Top-k Skyline)
– BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
– TKSI (Top-K SkyIndex)
12
Top-k Skyline Evaluation: BDTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
Indexes 1 to 3, one per criterion, each sorted from best to worst:

Degree index     Publications index     Experience index
Id  Degree       Id  Publications       Id  Experience
1   Post Dr      4   13                 4   4
2   Post Dr      3   12                 5   3
3   PhD          2   10                 6   3
4   MsC          1   9                  1   2
5   BEng         5   7                  3   2
6   BEng         6   6                  2   1
7   BEng         7   5                  7   1

The indexes are scanned until the final object is found.
13
Top-k Skyline Evaluation: BDTKS (continued)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6

Partial scanning of the database (the scan stops once the final object is found).
But BDTKS builds the Skyline completely.
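The scan-until-final-object idea can be sketched as follows. This is a hedged reading of the slides, not the authors' implementation: it assumes that the final object is the first object returned by every index under round-robin sorted access, that objects tying with the final object in some index must also be read, and that GPA ties are broken by smaller id. `DEGREE_RANK` and all function names are hypothetical.

```python
# id -> (degree, publications, experience, GPA), transcribed from the slides.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),     6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}  # assumed ordering


def vector(cid):
    degree, pubs, exp, _gpa = ROWS[cid]
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def sorted_index(dim):
    """Ids ordered from best to worst in one criterion (ties by id)."""
    return sorted(ROWS, key=lambda cid: (-vector(cid)[dim], cid))


def bdtks_sketch(k):
    indexes = [sorted_index(d) for d in range(3)]
    seen_in = {cid: set() for cid in ROWS}  # which indexes have returned each id
    seen, final = [], None
    # Phase 1: round-robin sorted access until one object was returned by every index.
    for pos in range(len(ROWS)):
        for d, index in enumerate(indexes):
            cid = index[pos]
            if not seen_in[cid]:
                seen.append(cid)
            seen_in[cid].add(d)
            if len(seen_in[cid]) == len(indexes):
                final = cid
                break
        if final is not None:
            break
    # Phase 2: also read objects that tie with the final object in some index,
    # so that every unread object is strictly worse than it in all criteria.
    for d, index in enumerate(indexes):
        for cid in index:
            if vector(cid)[d] < vector(final)[d]:
                break
            if not seen_in[cid]:
                seen.append(cid)
            seen_in[cid].add(d)
    # Phase 3: the complete Skyline is built over the objects read so far.
    sky = [c for c in seen
           if not any(dominates(vector(o), vector(c)) for o in seen if o != c)]
    # Phase 4: rank the Skyline by GPA and keep the k best.
    top = sorted(sky, key=lambda c: (-ROWS[c][3], c))[:k]
    return top, sorted(sky), sorted(seen), final


print(bdtks_sketch(2))  # ([2, 1], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6], 4)
```

Under these assumptions candidate 4 is the final object and candidate 7 is never read, yet all four Skyline members are still computed before the two best by GPA are selected.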
14
Top-k Skyline Evaluation: BMORTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
Indexes 1 to 3, one per criterion, each sorted from best to worst:

Degree index     Publications index     Experience index
Id  Degree       Id  Publications       Id  Experience
1   Post Dr      4   13                 4   4
2   Post Dr      3   12                 5   3
3   PhD          2   10                 6   3
4   MsC          1   9                  1   2
5   BEng         5   7                  3   2
6   BEng         6   6                  2   1
7   BEng         7   5                  7   1

Virtual object (last score seen in each index) as the scan advances:
(PostDr, ?, ?) → (PostDr, 13, ?) → (PostDr, 13, 4) → (PostDr, 12, 4) → (PostDr, 12, 3) → (PhD, 12, 3) → (PhD, 10, 3) → (MsC, 10, 3) → (MsC, 9, 3)
15
Top-k Skyline Evaluation: BMORTKS (continued)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.

Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6

Partial scanning of the database (until a seen object dominates the virtual object).
But BMORTKS also builds the Skyline completely.
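The virtual-object stopping rule can be sketched in the same style. The sketch assumes round-robin sorted access, a virtual object made of the last value read from each index, and a stop check after every single access; the slide's animation may check at a different moment, so the last virtual object reached here can differ from the last one listed on the slide. `DEGREE_RANK` and the function names are hypothetical, and this is not the authors' code.

```python
# id -> (degree, publications, experience, GPA), transcribed from the slides.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),     6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}  # assumed ordering


def vector(cid):
    degree, pubs, exp, _gpa = ROWS[cid]
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def sorted_index(dim):
    """Ids ordered from best to worst in one criterion (ties by id)."""
    return sorted(ROWS, key=lambda cid: (-vector(cid)[dim], cid))


def bmortks_sketch(k):
    indexes = [sorted_index(d) for d in range(3)]
    virtual = [None, None, None]  # last value read from each index
    seen, stop = [], False
    for pos in range(len(ROWS)):
        for d, index in enumerate(indexes):
            cid = index[pos]
            virtual[d] = vector(cid)[d]
            if cid not in seen:
                seen.append(cid)
            # Unread objects cannot beat the virtual object in any criterion, so
            # once a read object dominates it, every unread object is dominated too.
            if None not in virtual and any(
                    dominates(vector(s), tuple(virtual)) for s in seen):
                stop = True
                break
        if stop:
            break
    # The complete Skyline is still built over everything read so far.
    sky = [c for c in seen
           if not any(dominates(vector(o), vector(c)) for o in seen if o != c)]
    top = sorted(sky, key=lambda c: (-ROWS[c][3], c))[:k]
    return top, sorted(sky), sorted(seen), tuple(virtual)


print(bmortks_sketch(2))  # ([2, 1], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6], (2, 10, 3))
```

With a per-access check the scan stops at the virtual object (MsC, 10, 3), which candidate 4 dominates. Every dominance test against the virtual object is an extra evaluation of the multi-criteria function on an object that is not in the database.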
16
Top-k Skyline Evaluation: TKSI (Top-K SkyIndex)
Indexes 1 to 4, one per criterion plus one on the score function, each sorted from best to worst:

GPA index     Degree index     Publications index     Experience index
Id  GPA       Id  Degree       Id  Publications       Id  Experience
5   4.5       1   Post Dr      4   13                 4   4
2   4         2   Post Dr      3   12                 5   3
7   4         3   PhD          2   10                 6   3
1   3.75      4   MsC          1   9                  1   2
3   3.75      5   BEng         5   7                  3   2
4   3.6       6   BEng         6   6                  2   1
6   3.5       7   BEng         7   5                  7   1

Partial scanning of the database (until k incomparable objects are found).
TKSI builds the Skyline only partially and minimizes the non-necessary probes.
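One way to read "scan until k incomparable objects are found" is sketched below: walk the GPA index from the top and, for each object, look for a dominator only among the objects ranked at least as high in one criterion index, since any dominator must appear there. This is a simplified interpretation of the slide, not the published TKSI algorithm; choosing the shortest index prefix, the tie-breaks by id, `DEGREE_RANK`, and all function names are assumptions.

```python
from itertools import takewhile

# id -> (degree, publications, experience, GPA), transcribed from the slides.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),     6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}  # assumed ordering


def vector(cid):
    degree, pubs, exp, _gpa = ROWS[cid]
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def sorted_index(key):
    """Ids ordered from best to worst under the given key (ties by id)."""
    return sorted(ROWS, key=lambda cid: (-key(cid), cid))


def tksi_sketch(k):
    criteria_indexes = [sorted_index(lambda c, d=d: vector(c)[d]) for d in range(3)]
    gpa_index = sorted_index(lambda c: ROWS[c][3])
    result, scanned = [], []
    for cid in gpa_index:  # best GPA first
        if len(result) == k:
            break
        scanned.append(cid)
        v = vector(cid)
        # A dominator is at least as good in every criterion, so it lies in the
        # prefix of each criterion index; checking the shortest prefix is enough.
        prefixes = [
            [o for o in takewhile(lambda o, d=d: vector(o)[d] >= v[d], index)
             if o != cid]
            for d, index in enumerate(criteria_indexes)
        ]
        candidates = min(prefixes, key=len)
        if not any(dominates(vector(o), v) for o in candidates):
            result.append(cid)  # non-dominated, hence a Skyline member
    return result, scanned


print(tksi_sketch(2))  # ([2, 1], [5, 2, 7, 1])
```

In this run only four of the seven candidates are taken from the GPA index, and candidates 3 and 4 are never confirmed as Skyline members, which matches the slide's claim that the Skyline is built only partially.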
17
Experimental Study
Dataset and queries:
– 100,000 random data items. Value domain: floats between 0 and 1. Data distributions: uniform, Gaussian, and mixed.
– Sixty random queries. The number of multi-criteria dimensions ranges from 2 to 6.
Platform:
– SunFire V440, SunOS 5.10, two SPARCv9 processors at 1,281 MHz, 16 GB of RAM, and four 73 GB Ultra320 SCSI disks.
– Java 1.5 and Oracle 9i.
18
Experimental Study: Average Skyline Size & Probes

Data distribution  Average Skyline size (60 queries)
Uniform            2405
Gaussian           2477
Mixed              2539

Skyline size can be up to 2.6% of the input data!

Algorithm  Probes
BDTKS      23,749,796
BMORTKS    27,201,877

Probes on the virtual object increase the number of probes of the multi-criteria function!
19
Experimental Study: BDTKS and TKSI
BDTKS executes fewer probes and requires less evaluation time than BMORTKS.
For small k, TKSI outperforms BDTKS!
20
Conclusions and Future Work
– TKSI builds the Skyline only until it has computed the k objects.
– Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
– In the future, we plan to extend TKSI to Web data sources and to incorporate TKSI into an existing DBMS.
21
Thanks! Q&A