Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries. Marlene Goncalves and María-Esther Vidal, Universidad Simón Bolívar

Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela

Page 2: Motivating Example
• «There are two open faculty positions.»
• «Candidates will be evaluated in terms of degree, publications, and experience.»
• «Criteria to select the best candidates: highest academic degree, maximum number of publications, and maximum years of experience.»
• «Ties will be broken using the GPA.»
Solutions: Skyline and Top-k

Page 3: Motivation
[Table: candidates with columns Id, Degree, Publications, Experience, GPA. Degrees shown: Post Dr, Post Dr, PhD, MsC, BEng, BEng, BEng. The other cell values did not survive extraction.]
Query: Candidates with the best academic degree, number of publications, and experience.
Answer: None of the candidates is better in all criteria simultaneously.

Page 4: Skyline
Query: Select the candidates with the best degree, number of publications, and experience.
[Table: candidates with columns Id, Degree, Publications, Experience, GPA. Cell values did not survive extraction.]
User criteria (equally important), combined in a multicriteria function: Degree (maximum), Publications (maximum), Experience (maximum).
Skyline selects candidates 1, 2, 3, and 4; that is, the multiple criteria induce only a partial order, and ties need to be broken.
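The dominance test behind the Skyline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The numeric degree ranks and all candidate values are hypothetical (the slide's table is not recoverable); they were chosen only so that the result agrees with the slide, which reports candidates 1, 2, 3, and 4.

```python
# Hypothetical candidates: id -> (degree_rank, publications, experience).
# Degree ranks (BEng=1 < MsC=2 < PhD=3 < PostDr=4) and all numbers are
# illustrative assumptions, not the values from the original slide table.
CANDIDATES = {
    1: (4, 13, 4),
    2: (4, 10, 5),
    3: (3, 12, 6),
    4: (2, 9, 7),
    5: (1, 8, 2),
    6: (1, 5, 3),
    7: (1, 6, 1),
}


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(objects):
    """Return the ids of objects that no other object dominates (naive O(n^2) scan)."""
    return sorted(
        oid
        for oid, vals in objects.items()
        if not any(dominates(other, vals) for other in objects.values())
    )


print(skyline(CANDIDATES))  # [1, 2, 3, 4]
```

Because every criterion is maximized and none is weighted above the others, two candidates that each win on a different criterion are incomparable, which is why the Skyline can hold several candidates at once.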

Page 5: Top-k
Query: Select the two candidates with the best GPA.
[Table: candidates with columns Id, Degree, Publications, Experience, GPA. Cell values did not survive extraction.]
User criteria (score function): GPA (maximum).
Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit.
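A plain Top-k query ranks by the score function alone. The sketch below uses hypothetical GPA values (the slide's figures are lost), chosen so that the output agrees with the slide's answer of candidates 5 and 2.

```python
import heapq

# Hypothetical GPA per candidate id; illustrative values only.
GPA = {1: 3.6, 2: 3.8, 3: 3.5, 4: 3.4, 5: 3.9, 6: 3.0, 7: 2.9}


def top_k(scores, k):
    """Return the k ids with the highest score, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


print(top_k(GPA, 2))  # [5, 2]
```

Nothing in this ranking looks at degree, publications, or experience, which is how a candidate with a high GPA but a weaker record can reach the answer.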

Page 6: Preference-Based Queries
• Select the two candidates with the highest GPA among the candidates with the best degree, number of publications, and experience.
– Skyline produces the candidates with the best degree, number of publications, and experience (here, four candidates on equal terms). The Skyline may be very large, so post-processing over the Skyline is required to select k.
– Top-k identifies the two candidates with the best GPA (here, two candidates with a good GPA), which can produce false answers and lose results.
So a combined approach is required!

Page 7: Top-k Skyline
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Table: candidates with columns Id, Degree, Publications, Experience, GPA. Cell values did not survive extraction.]
Answer: The two candidates with the highest score-function value among the candidates preselected by the multicriteria function.
Top-k Skyline selects candidates 1 and 2, which have the highest GPAs among the candidates with similar academic records.
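The query semantics can be written down directly as "compute the Skyline, then rank it by the score function". The sketch below does exactly that; it is the straightforward post-processing formulation, not the indexed algorithm proposed in the talk. All candidate values are hypothetical and were chosen to reproduce the slide's answer (candidates 1 and 2).

```python
# Hypothetical data: id -> (degree_rank, publications, experience) and id -> GPA.
# Degree ranks (BEng=1 < MsC=2 < PhD=3 < PostDr=4) are an assumption.
CRITERIA = {1: (4, 13, 4), 2: (4, 10, 5), 3: (3, 12, 6), 4: (2, 9, 7),
            5: (1, 8, 2), 6: (1, 5, 3), 7: (1, 6, 1)}
GPA = {1: 3.6, 2: 3.8, 3: 3.5, 4: 3.4, 5: 3.9, 6: 3.0, 7: 2.9}


def dominates(a, b):
    """a is no worse than b in every criterion and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline(criteria, score, k):
    """Materialize the whole Skyline, then keep the k members with the best score."""
    sky = [o for o, v in criteria.items()
           if not any(dominates(w, v) for w in criteria.values())]
    return sorted(sky, key=lambda o: -score[o])[:k]


print(top_k_skyline(CRITERIA, GPA, 2))  # [2, 1]
```

Candidate 5 has the best GPA in this data but is dominated, so it never reaches the ranking step. The cost of this formulation is that the full Skyline is built even when k is small.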

Page 8: Outline
• Related Work
• Our Approach
• Top-k Skyline Evaluation
• Experimental Study
• Conclusions and Future Work

Page 9: Related Work
– Multicriteria-based approaches (Skyline): BNL, SFS, LESS. Poor ranking capabilities; answers can be huge!
– Score-based approaches (Top-k): MPro, Upper, TA, FA, NRA. High ranking capabilities; answers may be incomplete.
– Combined approaches (Top-k Skyline): BMORTKS, BDTKS. Metrics: Skyline Frequency.
Neither Skyline nor Top-k provides both high expressivity and high ranking capabilities. Existing Top-k Skyline techniques build the Skyline completely. Techniques to evaluate ranking approaches efficiently are required.

Page 10: Our Challenge
Efficient implementation of the Top-k Skyline operator: build the Top-k Skyline set while minimizing non-necessary probes.
• A probe p of the multicriteria function m or the score function f is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.
[Table: candidates with columns Id, Degree, Publications, Experience, GPA, with the non-necessary probes (evaluations of the multicriteria or score function) marked. Cell values did not survive extraction.]
Goal: Identify only those elements of the Skyline that belong to the answer.
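The definition of a necessary probe can be turned into a simple counter. The sketch below is only an illustration: the probe log is hypothetical (one probe per candidate), and `split_probes` is a helper name introduced here, not something from the talk. The answer set {1, 2} is the one stated on the slides.

```python
def split_probes(probed_ids, answer_ids):
    """Split a probe log into (necessary, non_necessary) counts.

    A probe is necessary iff the probed object belongs to the Top-k Skyline.
    """
    answer = set(answer_ids)
    necessary = sum(1 for oid in probed_ids if oid in answer)
    return necessary, len(probed_ids) - necessary


# Hypothetical probe log: the function is evaluated once on each of 7 candidates.
probe_log = [1, 2, 3, 4, 5, 6, 7]
# Answer stated on the slides: the Top-k Skyline holds candidates 1 and 2.
print(split_probes(probe_log, [1, 2]))  # (2, 5)
```

Under this measure, an algorithm that evaluates every candidate spends most of its probes on objects outside the answer.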

Page 11: Top-k Skyline Evaluation
• Indexed solutions
– BDTKS (Basic Distributed Top-k Skyline)
– BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
– TKSI (Top-k SkyIndex)

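Page 12: Top-k Skyline Evaluation (BDTKS)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Figure: three sorted indexes (Index 1 to Index 3) on Degree, Publications, and Experience. The Degree index lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the Publications and Experience values did not survive extraction. One entry is marked "Final object!".]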

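Page 13: Top-k Skyline Evaluation (BDTKS)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Table: the scanned candidates with columns Id, Degree, Publications, Experience, GPA. Rows shown include candidates 1 (Post Dr), 2 (Post Dr), a PhD candidate, and 4 (MsC); the remaining cell values did not survive extraction.]
Partial scanning of the database (until the final object is found). However, BDTKS builds the Skyline completely.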
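The scan-until-final-object idea can be sketched as follows. This is a simplified reading of the slides, not the authors' code: it assumes round-robin sorted access over one index per criterion, treats the first object seen in every index as the final object, and keeps reading entries tied with that object's values. The function name `bdtks_sketch` and all data values are hypothetical.

```python
# Hypothetical data: id -> (degree_rank, publications, experience) and id -> GPA.
CRITERIA = {1: (4, 13, 4), 2: (4, 10, 5), 3: (3, 12, 6), 4: (2, 9, 7),
            5: (1, 8, 2), 6: (1, 5, 3), 7: (1, 6, 1)}
GPA = {1: 3.6, 2: 3.8, 3: 3.5, 4: 3.4, 5: 3.9, 6: 3.0, 7: 2.9}


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def bdtks_sketch(objects, score, k):
    dims = len(next(iter(objects.values())))
    n = len(objects)
    # One sorted index per criterion (ids ordered best value first).
    indexes = [sorted(objects, key=lambda o, d=d: -objects[o][d]) for d in range(dims)]
    seen = [set() for _ in range(dims)]
    pos, final = 0, None
    # Round-robin sorted access until some object has been seen in every index.
    while final is None and pos < n:
        for d in range(dims):
            seen[d].add(indexes[d][pos])
        pos += 1
        common = set.intersection(*seen)
        if common:
            final = min(common)
    # Keep reading entries tied with the final object's value in each index,
    # so that every unseen object is strictly worse than it everywhere.
    for d in range(dims):
        p = pos
        while p < n and objects[indexes[d][p]][d] == objects[final][d]:
            seen[d].add(indexes[d][p])
            p += 1
    candidates = set.union(*seen)
    # Full Skyline of the scanned candidates, then rank by the score function.
    sky = [o for o in candidates
           if not any(dominates(objects[c], objects[o]) for c in candidates)]
    answer = sorted(sky, key=lambda o: -score[o])[:k]
    return answer, sorted(candidates)


print(bdtks_sketch(CRITERIA, GPA, 2))  # ([2, 1], [1, 2, 3, 4])
```

On this data the scan stops after three entries per index and never touches candidates 5, 6, and 7, yet all four Skyline members are still materialized before the two best GPAs are picked.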

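Page 14: Top-k Skyline Evaluation (BMORTKS)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Figure: three sorted indexes (Index 1 to Index 3) on Degree, Publications, and Experience. The Degree index lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the Publications and Experience values did not survive extraction.]
Virtual object (last score seen in each index), as listed on the slide: (PostDr, ?, ?), (PostDr, 13, 4), (PostDr, 13, ?), (PostDr, 12, 4), (PhD, 12, 3), (PostDr, 12, 3), (PostDr, 13, 4), (PhD, 10, 3), (MsC, 10, 3), (MsC, 9, 3).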

Page 15: Top-k Skyline Evaluation (BMORTKS)
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
[Table: the scanned candidates with columns Id, Degree, Publications, Experience, GPA. Rows shown include candidates 1 (Post Dr), 2 (Post Dr), a PhD candidate, and 4 (MsC); the remaining cell values did not survive extraction.]
Partial scanning of the database (until a seen object dominates the final object). However, BMORTKS also builds the Skyline completely.
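The virtual-object stopping rule can be sketched like this. It is an assumption-laden illustration rather than the authors' implementation: it reads the indexes round-robin, fetches each object's full criteria on access, keeps a virtual object made of the last value seen in each index, and stops once some seen object dominates that virtual object. The name `bmortks_sketch` and all data values are hypothetical.

```python
# Hypothetical data: id -> (degree_rank, publications, experience) and id -> GPA.
CRITERIA = {1: (4, 13, 4), 2: (4, 10, 5), 3: (3, 12, 6), 4: (2, 9, 7),
            5: (1, 8, 2), 6: (1, 5, 3), 7: (1, 6, 1)}
GPA = {1: 3.6, 2: 3.8, 3: 3.5, 4: 3.4, 5: 3.9, 6: 3.0, 7: 2.9}


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def bmortks_sketch(objects, score, k):
    dims = len(next(iter(objects.values())))
    indexes = [sorted(objects, key=lambda o, d=d: -objects[o][d]) for d in range(dims)]
    # Virtual object: last value seen in each index (infinity until first read).
    virtual = [float("inf")] * dims
    seen = {}
    accesses = 0
    stop = False
    for pos in range(len(objects)):
        for d in range(dims):
            oid = indexes[d][pos]
            seen[oid] = objects[oid]        # fetch the object's full criteria
            virtual[d] = objects[oid][d]    # update the virtual object
            accesses += 1
            # Unseen objects are no better than the virtual object anywhere, so
            # a seen object that dominates it dominates every unseen object too.
            if any(dominates(v, virtual) for v in seen.values()):
                stop = True
                break
        if stop:
            break
    sky = [o for o in seen if not any(dominates(c, seen[o]) for c in seen.values())]
    answer = sorted(sky, key=lambda o: -score[o])[:k]
    return answer, accesses


print(bmortks_sketch(CRITERIA, GPA, 2))  # ([2, 1], 8)
```

Each access triggers a dominance check against the virtual object, which is extra multicriteria work on top of the checks among real objects.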

Page 16: Top-k Skyline Evaluation (TKSI, Top-k SkyIndex)
[Figure: four sorted indexes (Index 1 to Index 4), one each on GPA, Degree, Publications, and Experience. The Degree index lists 1 Post Dr, 2 Post Dr, 3 PhD, 4 MsC, 5 BEng, 6 BEng, 7 BEng; the other values did not survive extraction.]
Partial scanning of the database (until k incomparable objects are found). TKSI builds the Skyline only partially and minimizes the non-necessary probes.
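One way to read the TKSI idea is: walk the score index from the best GPA down, test each object for dominance, and stop as soon as k non-dominated objects are found. The sketch below follows that reading. How the real algorithm uses the criterion indexes to limit probes is not given in the transcript, so the prefix-based dominance check here is an assumption, as are the name `tksi_sketch` and all data values; GPA ties are not handled.

```python
# Hypothetical data: id -> (degree_rank, publications, experience) and id -> GPA.
CRITERIA = {1: (4, 13, 4), 2: (4, 10, 5), 3: (3, 12, 6), 4: (2, 9, 7),
            5: (1, 8, 2), 6: (1, 5, 3), 7: (1, 6, 1)}
GPA = {1: 3.6, 2: 3.8, 3: 3.5, 4: 3.4, 5: 3.9, 6: 3.0, 7: 2.9}


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def tksi_sketch(objects, score, k):
    dims = len(next(iter(objects.values())))
    indexes = [sorted(objects, key=lambda o, d=d: -objects[o][d]) for d in range(dims)]
    answer, scanned = [], 0
    # Sorted access on the score index: best score first.
    for oid in sorted(objects, key=lambda o: -score[o]):
        scanned += 1
        vals = objects[oid]
        # A dominator is at least as good in every criterion, so it must lie in
        # the prefix of each criterion index down to oid's own value.
        prefixes = []
        for d in range(dims):
            prefix = []
            for other in indexes[d]:
                if objects[other][d] < vals[d]:
                    break
                prefix.append(other)
            prefixes.append(prefix)
        shortest = min(prefixes, key=len)   # probe as few objects as possible
        if not any(dominates(objects[o], vals) for o in shortest):
            answer.append(oid)
            if len(answer) == k:
                break
    return answer, scanned


print(tksi_sketch(CRITERIA, GPA, 2))  # ([2, 1], 3)
```

On this data only three entries of the score index are read: candidate 5 is rejected as dominated, and candidates 2 and 1 fill the answer without candidates 3 and 4 ever being confirmed as Skyline members.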

Page 17: Experimental Study
• Dataset and queries
– Random data. Value domain: floats between 0 and 1. Data distribution: uniform, Gaussian, and mixed.
– Sixty random queries. The number of multicriteria dimensions ranges from 2 to 6.
• Platform
– SunFire V440, SunOS 5.10, two Sparcv9 processors (MHz value not given), 16 GB of RAM, and four 73 GB Ultra320 SCSI disks.
– Java 1.5 and Oracle 9i.

Page 18: Experimental Study
• Average Skyline size and probes

Data distribution    Average Skyline size (60 queries)
Uniform              2405
Gaussian             2477
Mixed                2539

Skyline size can be up to 2.6% of the input data!

Algorithm    Probes
BDTKS        23,749,796
BMORTKS      27,201,877

Probes on the virtual object increase the number of multicriteria-function probes!

Page 19: Experimental Study
• BDTKS and TKSI
BDTKS executes fewer probes and requires less evaluation time than BMORTKS. For small k, TKSI outperforms BDTKS!

Page 20: Conclusions and Future Work
• TKSI builds the Skyline only until it has computed the k objects.
• Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
• In the future, we plan to extend TKSI to Web data sources and to incorporate TKSI into an existing DBMS.

Thanks! Q&A