Published by Claire Underwood. Modified over 8 years ago.
1
Presented by: Dardan Xhymshiti Fall 2015
2
Type: Research paper
Venue: International Conference on Very Large Data Bases
Authors:
Yoonjae Park, Seoul National University, Seoul, Korea, yjpark@kdd.snu.ac.kr
Jun-Ki Min, Korea Univ. of Tech. & Edu, Cheonan, Korea, jkmin@kut.ac.kr
Kyuseok Shim, Seoul National University, Seoul, Korea, shim@kdd.snu.ac.kr
3
MapReduce: a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Skyline operator: used in a query to filter the results from a database so that only the objects not dominated by any other object are kept. An object dominates another if it is at least as good in every dimension and strictly better in at least one.
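The two definitions above can be illustrated together with a minimal sketch. It is not the algorithm from the paper: it simulates map and reduce steps in plain Python for the ordinary (certain-data) skyline, assuming smaller values are better in every dimension. The names `dominates`, `skyline`, `map_phase`, `run` and the `hotels` data are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere
    (assumption: smaller values are better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive quadratic skyline: keep the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def map_phase(points, num_partitions):
    """Map step: assign each point to a partition key (round-robin here)."""
    for i, p in enumerate(points):
        yield i % num_partitions, p


def run(points, num_partitions=2):
    # Shuffle step: group the mapped points by partition key.
    groups = defaultdict(list)
    for key, p in map_phase(points, num_partitions):
        groups[key].append(p)
    # Reduce step: each partition computes its local skyline independently.
    local = [p for part in groups.values() for p in skyline(part)]
    # Merge step: every global skyline point survives in some local skyline,
    # so the skyline of the local results is the global skyline.
    return sorted(skyline(local))


# Hypothetical data: (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9), (55, 8)]
print(run(hotels))  # [(50, 8), (60, 3), (80, 2)]
```

The merge step works because a point that is dominated inside its own partition is also dominated globally, so discarding it locally never loses a skyline point.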
4
Applications that produce large volumes of uncertain data: social networks, data integration, sensor data management.
Sources of data uncertainty: data randomness, data incompleteness, limitations of measuring equipment.
5
There is a need for advanced analysis queries, such as the skyline, over big uncertain data.
9
The skyline probability of an instance is the probability that it appears in a possible world and is not dominated by any instance of the other objects in that possible world. The skyline probability of an object is the sum of the skyline probabilities of all its instances. Similarly, for the continuous model, the skyline probability of an object is defined using its uncertainty region and pdf.
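For the discrete model, the definition above can be sketched as follows. This is a minimal serial illustration, not the paper's MapReduce algorithm. It assumes that objects are independent of each other, that each object appears as at most one of its instances in a possible world, and that smaller values are better in every dimension. Under those assumptions, the skyline probability of an instance is its own existence probability multiplied, for each other object, by the probability that the object does not appear as an instance dominating it. The function names and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere
    (assumption: smaller values are better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_probability(obj_id, instance, objects):
    """P(instance exists) * product over other objects of
    (1 - total probability of that object's instances dominating this one)."""
    point, prob = instance
    result = prob
    for other_id, instances in objects.items():
        if other_id == obj_id:
            continue  # instances of the same object never co-occur
        dominating_mass = sum(p for q, p in instances if dominates(q, point))
        result *= 1.0 - dominating_mass
    return result


def object_skyline_probability(obj_id, objects):
    """Sum of the skyline probabilities of all instances of the object."""
    return sum(
        instance_skyline_probability(obj_id, inst, objects)
        for inst in objects[obj_id]
    )


# Hypothetical uncertain objects: each is a list of (point, existence probability).
objects = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 0.5), ((4, 1), 0.5)],
}

print(object_skyline_probability("A", objects))  # 0.75
print(object_skyline_probability("B", objects))  # 1.0
```

In this example, instance (3, 3) of A is dominated by instance (2, 2) of B, which exists with probability 0.5, so (3, 3) contributes only 0.5 × 0.5 = 0.25 to A's skyline probability.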
12
Proposes parallel algorithms using MapReduce to process probabilistic skyline queries over uncertain data modeled by both discrete and continuous models.
Introduces three filtering methods to identify probabilistic non-skyline objects in advance.
Develops PS-QP-MR, an algorithm with a single MapReduce phase.
Presents PS-QPF-MR, which enhances PS-QP-MR by additionally applying the three filtering methods.
Presents the brute-force algorithms PS-BR-MR and PS-BRF-MR, which partition the data randomly; PS-BRF-MR additionally applies the filtering methods.
13
Several algorithms have been proposed for skyline queries, including Nearest Neighbor (NN) by Kossmann. Papadias improved the NN algorithm by using a branch-and-bound strategy.
Techniques have also been proposed for processing queries over uncertain data, such as probabilistic top-k.
Serial algorithms for probabilistic skyline processing over uncertain data have been introduced as well.