Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela
Page 2 Motivating Example
«There are two open faculty positions»
«Candidates will be evaluated in terms of: degree, publications, experience»
«Criteria to select the best candidates: highest academic degree, maximum number of publications, and maximum years of experience»
«Ties will be broken by using the GPA»
Solutions: Skyline and Top-k
Page 3 Motivation
Query: Candidates with the best academic degree, number of publications, and experience.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
Answer: None of the candidates is better in all criteria simultaneously.
Page 4 Skyline
Query: Select the candidates with the best degree, number of publications, and experience.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
User criteria (equally important!), combined in a multicriteria function:
– Degree: maximum
– Publications: maximum
– Experience: maximum
Skyline selects candidates 1, 2, 3, and 4; that is, the multiple criteria induce only a partial order, and ties need to be broken.
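The dominance test behind the Skyline can be sketched in a few lines of Python. The rows below are hypothetical, since the slide's cell values are not available; they are chosen only so that the result matches the slide (candidates 1, 2, 3, and 4). The names `DEGREE_RANK`, `criteria`, `dominates`, and `skyline` are illustrative, not taken from the talk.

```python
# Hypothetical data: invented values, chosen to reproduce the slide's result.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}

CANDIDATES = {
    # id: (degree, publications, experience, gpa)
    1: ("Post Dr", 9, 2, 3.9),
    2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.5),
    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 2, 4, 4.5),
    6: ("BEng", 1, 3, 3.0),
    7: ("BEng", 1, 1, 2.5),
}


def criteria(row):
    """Map a row to the vector of criteria that are all maximized."""
    degree, publications, experience, _gpa = row
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates):
    """Return the ids of the candidates that no other candidate dominates."""
    vectors = {cid: criteria(row) for cid, row in candidates.items()}
    return sorted(
        cid
        for cid, v in vectors.items()
        if not any(dominates(w, v) for other, w in vectors.items() if other != cid)
    )


print(skyline(CANDIDATES))  # [1, 2, 3, 4]
```

This quadratic check compares every pair of candidates; it only illustrates the partial order, not an efficient Skyline algorithm.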
Page 5 Top-k
Query: Select the two candidates with the best GPA.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
User criteria (score function!):
– GPA: maximum
Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit.
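A Top-k query over a single score function is a plain selection of the k largest scores. The sketch below uses hypothetical GPA values (the slide's values are not available), chosen so that candidates 5 and 2 come out on top as stated; `top_k` is an illustrative helper.

```python
import heapq

# Hypothetical GPA per candidate id (invented values).
GPA = {1: 3.9, 2: 4.0, 3: 3.5, 4: 3.6, 5: 4.5, 6: 3.0, 7: 2.5}


def top_k(scores, k):
    """Return the k ids with the highest score, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


print(top_k(GPA, 2))  # [5, 2]
```

The score function ignores degree, publications, and experience entirely, which is why the result can include a candidate with weak academic merit.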
Page 6 Preference-based Queries
Query: Select the two candidates with the highest GPA among the candidates with the best degree, number of publications, and experience.
Cases:
– Skyline produces the candidates with the best degree, number of publications, and experience.
– The Skyline may be very large, so post-processing over the Skyline is required to select k.
– Top-k identifies the two candidates with the best GPA, at the risk of false answers and loss of results.
Top-k selects two candidates with a good GPA; Skyline selects four candidates on equal terms.
So… a combined approach is required!
Page 7 Top-k Skyline
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
Answer: The two candidates with the highest value of the score function among the candidates preselected by the multicriteria function.
Top-k Skyline selects candidates 1 and 2, which have the highest GPAs among the candidates with comparable academic records.
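The semantics of the combined query can be sketched naively: materialize the whole Skyline first, then keep the k members with the highest score. The data is hypothetical (invented so that the answer is candidates 1 and 2, as on the slide), and the function names are illustrative. Building the full Skyline first is exactly the cost the rest of the talk tries to avoid.

```python
import heapq

# Hypothetical data: invented values, chosen to reproduce the slide's result.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}
CANDIDATES = {
    # id: (degree, publications, experience, gpa)
    1: ("Post Dr", 9, 2, 3.9),
    2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.5),
    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 2, 4, 4.5),
    6: ("BEng", 1, 3, 3.0),
    7: ("BEng", 1, 1, 2.5),
}


def vector(cid):
    """Multicriteria vector (all components maximized) of a candidate."""
    degree, publications, experience, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(ids):
    ids = list(ids)
    return [
        c for c in ids
        if not any(dominates(vector(o), vector(c)) for o in ids if o != c)
    ]


def naive_top_k_skyline(k):
    """Full Skyline first, then the k Skyline members with the highest GPA."""
    return heapq.nlargest(k, skyline(CANDIDATES), key=lambda c: CANDIDATES[c][3])


print(naive_top_k_skyline(2))  # [2, 1]
```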
Page 8 Outline
– Related Work
– Our Approach
– Top-k Skyline Evaluation
– Experimental Study
– Conclusions and Future Work
Page 9 Related Work
[Diagram: approaches arranged from poor to high ranking capabilities]
– Multi-criteria-based approaches (Skyline): BNL, SFS, LESS. Answers can be huge!
– Score-based approaches (Top-k): MPro, Upper, TA, FA, NRA. Answers may be incomplete.
– Combined approaches (Top-k Skyline): BMORTKS, BDTKS. Metrics: Skyline Frequency.
Neither Skyline nor Top-k provides both high expressivity and high ranking capabilities.
Existing Top-k Skyline techniques build the Skyline completely.
Techniques to evaluate ranking approaches efficiently are required.
Page 10 Our Challenge
Efficient implementation of the Top-k Skyline operator: build the Top-k Skyline set while minimizing the non-necessary probes.
A probe p of the multicriteria function m or the score function f is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.
[Candidates table with columns Id, Degree, Publications, Experience, GPA, marking the non-necessary probes (evaluations of the multicriteria or score function); most cell values were lost in extraction]
Goal: Identify only the elements of the Skyline that belong to the answer.
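The definition of a necessary probe can be made concrete with a small bookkeeping sketch. It assumes a probe is recorded as a pair (function name, object id); that representation, the trace, and the helper `classify_probes` are hypothetical and serve only as illustration.

```python
def classify_probes(probes, answer):
    """Split probes into necessary ones (on answer objects) and the rest."""
    necessary = [(fn, oid) for fn, oid in probes if oid in answer]
    non_necessary = [(fn, oid) for fn, oid in probes if oid not in answer]
    return necessary, non_necessary


# Hypothetical trace of a naive plan over seven candidates:
# m is evaluated on every object, then f on the four Skyline members.
probes = [("m", oid) for oid in range(1, 8)] + [("f", oid) for oid in (1, 2, 3, 4)]
answer = {1, 2}  # the Top-k Skyline for k = 2

necessary, non_necessary = classify_probes(probes, answer)
print(len(necessary), len(non_necessary))  # 4 7
```

Under this reading, only the evaluations of m and f on candidates 1 and 2 are necessary; every other evaluation is work an ideal algorithm would skip.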
Page 11 Top-k Skyline Evaluation
Indexed solutions:
– BDTKS (Basic Distributed Top-k Skyline)
– BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
– TKSI (Top-k SkyIndex)
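Page 12 Top-k Skyline Evaluation
BDTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Figure: three indexes over the candidates: Index 1 (Id, Degree), Index 2 (Id, Publications), Index 3 (Id, Experience). Index 1 lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the values of Index 2 and Index 3 were lost in extraction. The final object is marked.]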
Page 13 Top-k Skyline Evaluation
BDTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
Partial scan of the database (until the final object is found).
But BDTKS builds the Skyline completely.
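The final-object stopping rule can be sketched as follows. This is a simplified illustration, not the authors' BDTKS code: it assumes round-robin sorted access over the three indexes, stops as soon as one object has been read from all of them, and assumes no two candidates share identical values on every criterion. Data and names are hypothetical.

```python
import heapq

# Hypothetical data: invented values, not the slide's table.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}
CANDIDATES = {
    1: ("Post Dr", 9, 2, 3.9), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.5), 4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 2, 4, 4.5), 6: ("BEng", 1, 3, 3.0), 7: ("BEng", 1, 1, 2.5),
}


def vector(cid):
    degree, publications, experience, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def build_indexes():
    """One id list per criterion, best value first (ties broken by id)."""
    return [sorted(CANDIDATES, key=lambda c, d=d: (-vector(c)[d], c)) for d in range(3)]


def scan_until_final_object(indexes):
    """Round-robin sorted access; stop when one object was read from every index."""
    seen_in = {}  # id -> set of index numbers in which the id was read
    for depth in range(len(indexes[0])):
        for i, index in enumerate(indexes):
            cid = index[depth]
            seen_in.setdefault(cid, set()).add(i)
            if len(seen_in[cid]) == len(indexes):
                return cid, set(seen_in)
    return None, set(seen_in)


def top_k_skyline_final_object(k):
    final, seen = scan_until_final_object(build_indexes())
    # Every unread object is no better than the final object on any criterion,
    # so the complete Skyline is contained in the objects read so far.
    sky = [c for c in seen
           if not any(dominates(vector(o), vector(c)) for o in seen if o != c)]
    return final, heapq.nlargest(k, sky, key=lambda c: CANDIDATES[c][3])


print(top_k_skyline_final_object(2))  # (4, [2, 1])
```

On this data candidate 7 is never read, yet all four Skyline members are still computed before the two best by GPA are picked, which mirrors the slide's remark that the Skyline is built completely.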
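Page 14 Top-k Skyline Evaluation
BMORTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Figure: three indexes over the candidates: Index 1 (Id, Degree), Index 2 (Id, Publications), Index 3 (Id, Experience). Index 1 lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the values of Index 2 and Index 3 were lost in extraction.]
Virtual object (last score seen in each index), values in the order extracted from the slide: (PostDr, ?, ?); (PostDr, 13, 4); (PostDr, 13, ?); (PostDr, 12, 4); (PhD, 12, 3); (PostDr, 12, 3); (PostDr, 13, 4); (PhD, 10, 3); (MsC, 10, 3); (MsC, 9, 3)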
Page 15 Top-k Skyline Evaluation
BMORTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Candidates table with columns Id, Degree, Publications, Experience, GPA; most cell values were lost in extraction]
Partial scan of the database (until a seen object dominates the final object).
But BMORTKS also builds the Skyline completely.
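A virtual-object stopping rule can be sketched in the same style. This is a simplified illustration rather than the authors' BMORTKS code: it assumes the virtual object holds the last value read from each index and that scanning stops once some object already read dominates it. Data and names are hypothetical.

```python
import heapq

# Hypothetical data: invented values, not the slide's table.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}
CANDIDATES = {
    1: ("Post Dr", 9, 2, 3.9), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.5), 4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 2, 4, 4.5), 6: ("BEng", 1, 3, 3.0), 7: ("BEng", 1, 1, 2.5),
}


def vector(cid):
    degree, publications, experience, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def build_indexes():
    """One id list per criterion, best value first (ties broken by id)."""
    return [sorted(CANDIDATES, key=lambda c, d=d: (-vector(c)[d], c)) for d in range(3)]


def scan_until_virtual_dominated(indexes):
    """Round-robin sorted access; stop when a read object dominates the virtual one."""
    virtual = [None] * len(indexes)  # last value read from each index
    seen = []
    for depth in range(len(indexes[0])):
        for i, index in enumerate(indexes):
            cid = index[depth]
            virtual[i] = vector(cid)[i]
            if cid not in seen:
                seen.append(cid)
            if None in virtual:
                continue  # virtual object is still incomplete
            for s in seen:
                # Each comparison against the virtual object is an extra probe.
                if dominates(vector(s), tuple(virtual)):
                    return s, tuple(virtual), seen
    return None, tuple(virtual), seen


def top_k_skyline_virtual(k):
    _stopper, _virtual, seen = scan_until_virtual_dominated(build_indexes())
    sky = [c for c in seen
           if not any(dominates(vector(o), vector(c)) for o in seen if o != c)]
    return heapq.nlargest(k, sky, key=lambda c: CANDIDATES[c][3])


print(scan_until_virtual_dominated(build_indexes())[:2])  # (4, (2, 10, 3))
print(top_k_skyline_virtual(2))  # [2, 1]
```

Any unread object is no better than the virtual object on every criterion, so once a read object dominates the virtual object, it also dominates everything unread.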
Page 16 Top-k Skyline Evaluation
TKSI (Top-k SkyIndex)
[Figure: four indexes over the candidates: Index 1 (Id, Degree), Index 2 (Id, Publications), Index 3 (Id, Experience), Index 4 (Id, GPA). Index 1 lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the other values were lost in extraction.]
Partial scan of the database (until k incomparable objects are found).
TKSI builds the Skyline only partially and minimizes the non-necessary probes.
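The idea of stopping after k incomparable objects can be sketched as follows. This is a simplified reading of the slide, not the authors' TKSI implementation: it assumes candidates are visited in descending GPA order through the score index and that each one is checked for dominance against the prefix of the degree index whose degree is at least as high. Data and names are hypothetical.

```python
# Hypothetical data: invented values, not the slide's table.
DEGREE_RANK = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}
CANDIDATES = {
    1: ("Post Dr", 9, 2, 3.9), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.5), 4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 2, 4, 4.5), 6: ("BEng", 1, 3, 3.0), 7: ("BEng", 1, 1, 2.5),
}


def vector(cid):
    degree, publications, experience, _gpa = CANDIDATES[cid]
    return (DEGREE_RANK[degree], publications, experience)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline_by_score_index(k):
    """Visit objects by descending score; keep the first k non-dominated ones."""
    score_index = sorted(CANDIDATES, key=lambda c: -CANDIDATES[c][3])
    degree_index = sorted(CANDIDATES, key=lambda c: -vector(c)[0])
    answer, comparisons = [], 0
    for t in score_index:
        dominated = False
        for other in degree_index:
            if vector(other)[0] < vector(t)[0]:
                break  # objects with a lower degree cannot dominate t
            if other == t:
                continue
            comparisons += 1
            if dominates(vector(other), vector(t)):
                dominated = True
                break
        if not dominated:
            answer.append(t)
            if len(answer) == k:
                break  # k Skyline members found: the rest is never touched
    return answer, comparisons


print(top_k_skyline_by_score_index(2))  # ([2, 1], 6)
```

On this data candidate 5 has the best GPA but is rejected because candidate 4 dominates it; candidates 2 and 1 are then accepted, and candidates 3, 4, 6, and 7 are never taken from the score index.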
Page 17 Experimental Study
Dataset and queries
– Random data. Value domain: floats between 0 and 1. Data distribution: Uniform, Gaussian, and Mixed.
– Sixty random queries. The number of multi-criteria dimensions ranges from 2 to 6.
Platform
– SunFire V440, SunOS 5.10, two processors Sparcv9 of MHZ, 16 GB of RAM, and four Ultra320 SCSI disks of 73 GB.
– Java 1.5 and Oracle 9i.
Page 18 Experimental Study
Average Skyline Size & Probes

Data Distribution    Average Skyline Size (60 queries)
Uniform              2405
Gaussian             2477
Mixed                2539

Skyline size can be up to 2.6% of the input data!

Algorithm    Probes
BDTKS        23,749,796
BMORTKS      27,201,877

Probes on the virtual object increase the number of probes of the multi-criteria function!
Page 19 Experimental Study
BDTKS and TKSI
BDTKS executes fewer probes and requires less evaluation time than BMORTKS.
For small k, TKSI outperforms BDTKS!
Page 20 Conclusions and Future Work
TKSI builds the Skyline only until it has computed the k objects.
Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
In the future, we plan to extend TKSI to Web data sources and to incorporate TKSI into an existing DBMS.
Thanks! Q&A