Published by Jade Harvey. Modified over 8 years ago.
1
Presented by: Omar Alqahtani Spring 2016
2
Authors: Publication: ICDE 2015 Type: Research Paper
3
Data exploration platforms help users discover interesting objects within large volumes of scientific and business data. Data diversification is similar to top-k and skyline queries, but what is it? It extracts from a query result a small set of non-redundant points that are diverse among themselves according to some distance measure. The current approach is process-first-diversity-next. Drawback? Motivation: the need to efficiently provide users with effective insights during data exploration.
4
Progressive Data Diversification (pDiverse) scheme. The main idea is to detect and prune the data points in the query result that cannot be included in the final diverse set. Partial distance computation reduces the CPU and I/O cost incurred during query diversification. The Progressive Greedy (pGreedy) heuristic forms the core of the pDiverse scheme. pGreedy is extended to work with column stores. An integrated model combines the range query with diversification. pDiverse is optimized with novel techniques for ordering dimensions and approximating diversity.
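The slide does not spell out the pruning rule, so the following is only a minimal sketch of the general idea behind partial distance computation, not the pDiverse method itself. It assumes Euclidean distance and a hypothetical helper `far_apart`: squared differences are accumulated one dimension (column) at a time, and since the partial sum can only grow, the loop can stop as soon as it exceeds the threshold without reading the remaining dimensions.

```python
def far_apart(p, q, threshold):
    """Return (is_far, dims_read) for two equal-length points.

    is_far is True when the Euclidean distance between p and q exceeds
    `threshold`. The partial sum of squared differences is a lower bound
    on the full squared distance, so we can stop early once it passes
    threshold squared. `far_apart` is a hypothetical name for illustration.
    """
    limit = threshold * threshold
    partial = 0.0
    for dims_read, (a, b) in enumerate(zip(p, q), start=1):
        partial += (a - b) ** 2
        if partial > limit:
            # Remaining dimensions cannot reduce the distance: stop here.
            return True, dims_read
    return False, len(p)
```

In a column store, each dimension skipped this way is a column that does not have to be fetched for that pair of points, which is where the I/O saving would come from.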
5
Diversification mostly falls into three categories: content-based, novelty-based, and semantic-coverage-based.
Formal definition:
The problem is NP-hard, so greedy heuristics are the most widely used.
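As an illustration of what a greedy diversification heuristic looks like, here is a minimal sketch of the classic max-min greedy: repeatedly pick the point whose nearest already-selected point is farthest away. This is a generic baseline, not the pGreedy heuristic from the paper. The function name `greedy_diverse`, the use of Euclidean distance, and seeding with the first point are all assumptions made for the example.

```python
import math

def greedy_diverse(points, k):
    """Pick up to k indices of mutually distant points (max-min greedy)."""
    if k <= 0 or not points:
        return []
    selected = [0]  # arbitrary seed: the first point
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[0]) for p in points]
    min_dist[0] = -1.0  # mark selected points so they are never re-picked
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, p in enumerate(points):
            # Selected points stay at -1.0 because min() keeps the marker.
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected
```

For example, with the points `[(0, 0), (0.1, 0), (10, 0), (5, 5)]` and `k = 3`, the near-duplicate `(0.1, 0)` is the one left out.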
6
Presented by: Omar Alqahtani Spring 2016
7
Authors: Publication: ICDE 2015 Type: Research Paper
8
Query execution performance of database systems depends heavily on query optimization decisions. Finding the best possible plan usually requires a cost model to estimate the performance of the viable alternatives. Cost models rely on statistics about the data. But? As a result, commercial DBMSs often assume uniform data distributions and attribute value independence, which is rarely the case in reality. The outcome: suboptimal plans and subpar performance.
10
They define robustness in the context of query processing as the ability of a system to cope efficiently with unexpected and adverse conditions and to deliver near-optimal performance for all query inputs.
11
Based on: Understanding the data distribution is a continuous process, and that understanding may develop throughout the execution of a query plan, so a single execution strategy might not be optimal over the entire data set. They propose: a new class of morphable operators that continuously and seamlessly adjust their execution strategy as the understanding of the data evolves. The Smooth Scan operator morphs between an index look-up and a full table scan, achieving near-optimal performance regardless of the operator's selectivity and without relying on existing data statistics.
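To make the idea of an operator that changes strategy mid-query concrete, here is a deliberately simplified toy, not the Smooth Scan operator itself. The real operator is described as morphing continuously, whereas this sketch makes a single hard switch. It assumes an in-memory table (`rows`, where the row id is the list position), a sorted secondary index of `(value, row_id)` pairs, and a hypothetical `switch_after` parameter: the scan starts with index look-ups and, once more matches than expected have been seen, finishes with one sequential pass that skips rows already returned.

```python
import bisect

def adaptive_range_scan(rows, index, lo, hi, switch_after):
    """Return (matching_row_ids, final_mode) for lo <= value <= hi.

    Toy illustration only: starts in "index" mode and falls back to a
    sequential "scan" after `switch_after` matches have been produced.
    """
    emitted, seen = [], set()
    mode = "index"
    # Row ids are >= 0, so (lo, -1) sorts before every entry with value lo.
    pos = bisect.bisect_left(index, (lo, -1))
    while pos < len(index) and index[pos][0] <= hi:
        if len(emitted) >= switch_after:
            mode = "scan"  # selectivity is higher than hoped: change strategy
            break
        row_id = index[pos][1]
        emitted.append(row_id)
        seen.add(row_id)
        pos += 1
    if mode == "scan":
        # One sequential pass over the table, skipping rows already returned.
        for row_id, value in enumerate(rows):
            if row_id not in seen and lo <= value <= hi:
                emitted.append(row_id)
    return emitted, mode
```

Both modes return the same set of rows; only the access pattern differs, which is the property a morphing operator has to preserve.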
12
Some works address the problem at the optimizer level, but in dynamic environments they bring only partial benefits, because the environment keeps changing even after optimization. Orthogonal approaches provide run-time adaptivity; however, they lack flexibility at the level of access paths and remain sensitive to the accuracy of statistics.
13
Presented by: Zohreh Raghebi Spring 2016
14
Authors: Publication: ICDE 2015 Type: Research Paper
15
Rapid growth of event-based social network services such as Meetup and Plancast. They connect people through events, allow users to form online groups, and let users publish and announce events to other group members.
16
1) Which groups would a particular user like to join?
2) Which tags might a group choose when constructing its profile?
3) Who will attend an upcoming event?
Goal: design recommendation systems for three specific tasks: groups to users, tags to groups, and events to users.
17
[1] proposed a factorization model that exploits social and location features for event-based group recommendation. [2] introduced a topic model to solve the tag recommendation problem for groups. [3] used a simple graph-based approach to recommend users for an event, performing information diffusion over the user network. What is lacking: a general solution.
18
Model the interactions between multiple entities: users, events, groups, and tags. Analyze the data to extract useful temporal patterns of user behavior. Convert the recommendation problem into a node proximity calculation problem.
19
Evaluating node proximity: the heterogeneous graph contains multiple types of entities that influence each other via different types of interactions. The importance of these influences must be balanced for the proximity calculation, and it may vary from one recommendation problem to another.
20
Random Walk with Restart (RWR) is used to calculate node proximity for recommendations. RWR is built on a univariate Markov chain for homogeneous graphs. As a generalization, a multivariate Markov chain (MMC) models the random walk process in a heterogeneous graph. An MMC can explicitly model the influences between different entities.
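For reference, the basic RWR on a homogeneous graph can be sketched as a power iteration: at each step the walker follows a random outgoing edge with probability `1 - restart` and jumps back to the seed node otherwise. The scores it converges to serve as proximity to the seed. This sketch covers only the univariate case; the MMC generalization and its learned weights are not shown. The function name, the adjacency-dict representation, and the default restart probability of 0.15 are assumptions for the example.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Approximate RWR proximity scores of every node to `seed`.

    adj maps each node to a list of its neighbours; every neighbour
    must also appear as a key of adj.
    """
    score = {v: (1.0 if v == seed else 0.0) for v in adj}
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            moving = (1.0 - restart) * score[u]
            if nbrs:
                share = moving / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                nxt[seed] += moving  # dead end: walker returns to the seed
        nxt[seed] += restart  # total mass is 1, so the restart mass is `restart`
        score = nxt
    return score
```

On a path graph a - b - c - d with seed a, the scores of b, c, and d decrease with distance from the seed, which is the behavior that makes RWR usable as a proximity measure.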
21
Existing MMC-based methods require the influence weights between different types of entities to be set manually. Multiple types of entities exist. A learning scheme tries to find the optimal set of weights.
22
A general model to handle multiple recommendation problems in an event-based social network. To avoid manual parameter assignment, a learning framework is proposed to find appropriate parameters for the model. The values of the learned parameters indicate the importance of different types of entities in different recommendation tasks, giving a better understanding of user behavior in an event-based social network.
23
Presented by: Zohreh Raghebi Spring 2016
24
Authors: Publication: ICDE 2015 Type: Research Paper
25
Knowledge is represented as a graph, with uncertainty in the presence of each edge. Uncertain graphs have been used extensively for communication networks, social networks, and protein interaction networks.
26
Identification of dense substructures within a graph. A clique is a completely connected subgraph. A maximal clique is a clique that is not contained within any other clique. Enumerating all maximal cliques has applications in finding overlapping communities in social networks, finding overlapping multiple protein complexes, and analyzing email networks.
27
A clique in an uncertain graph is a set of vertices that has a high probability of being a completely connected subgraph. Applications: finding such sets of vertices helps unearth robust communities within an uncertain graph, for example a group of proteins in which each protein is likely to interact with every other protein.
28
A set of vertices U is an α-maximal clique if (1) U is a clique with probability at least α, and (2) there does not exist a vertex set S such that U ⊂ S and S is a clique with probability at least α. When α = 1, this reduces to the notion of a maximal clique in a deterministic graph.
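The definition can be checked directly on a small graph. The sketch below assumes that edges exist independently, so the probability that a vertex set is a clique is the product of its pairwise edge probabilities (consistent with the product-of-edge-probabilities factor used by the algorithm). The function names and the dict-of-edges representation, with keys `(u, v)` where `u < v`, are hypothetical choices for the example.

```python
from itertools import combinations

def clique_probability(vertices, edge_prob):
    """Probability that `vertices` form a clique, assuming independent edges.

    A missing edge has probability 0; a single vertex has probability 1.
    """
    prob = 1.0
    for u, v in combinations(sorted(vertices), 2):
        prob *= edge_prob.get((u, v), 0.0)
    return prob

def is_alpha_maximal(U, all_vertices, edge_prob, alpha):
    """Check both conditions of the α-maximal clique definition."""
    U = set(U)
    if clique_probability(U, edge_prob) < alpha:
        return False
    # Adding vertices never raises the probability, so it is enough to
    # test supersets that add exactly one vertex.
    return all(
        clique_probability(U | {v}, edge_prob) < alpha
        for v in set(all_vertices) - U
    )
```

For a triangle whose three edges each have probability 0.9, the clique probability is about 0.729, so the triangle is an α-clique for α = 0.5 but not for α = 0.8; in the latter case each of its edges is α-maximal on its own.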
29
The problem of finding reliable subgraphs means finding subgraphs that are connected with high probability. In contrast, the interest here is in subgraphs that are not just connected but fully connected with high probability. Another problem is enumerating the k cliques with the highest probability of existence. The focus here is on enumerating all α-maximal cliques in a graph.
30
Let f(n, α) be the maximum number of α-maximal cliques. Proofs…
31
The algorithm uses depth-first search (DFS) with backtracking. It starts with a set of vertices C that is an α-clique and incrementally adds vertices to C while retaining the property that C is an α-clique. It backtracks to explore other possible vertices until all possible search paths have been explored.
32
First, to avoid checking from scratch whether a new vertex v can be used to extend C, consider only those vertices that are already connected to every vertex within C. This leads to incrementally tracking the vertices that can still be used to extend C.
33
Second, not all vertices that extend C into a clique preserve the property of C being an α-clique. Adding a new vertex v to C decreases the clique probability by a factor equal to the product of the edge probabilities between v and every vertex in C. This factor is maintained incrementally for each vertex v.
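The DFS and the two optimizations described on the last three slides can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact algorithm: edges are assumed independent, vertices are the integers 0..n-1, edge probabilities are stored under keys `(u, v)` with `u < v`, and all names are hypothetical. The dict `ext` holds, for every vertex that can still extend C, its incrementally maintained factor; C is grown in increasing vertex order so that each α-clique is visited once, and C is reported when no vertex at all can extend it.

```python
def enumerate_alpha_maximal_cliques(n, edge_prob, alpha):
    """Return all α-maximal cliques of an uncertain graph as a list of sets."""
    results = []

    def p(u, v):
        return edge_prob.get((min(u, v), max(u, v)), 0.0)

    def dfs(C, q, ext):
        # q: clique probability of C.
        # ext[v]: product of edge probabilities between v and every vertex
        #         in C, kept only for v with q * ext[v] >= alpha.
        if C and not ext:
            results.append(set(C))  # nothing can extend C: α-maximal
            return
        last = C[-1] if C else -1
        for u in sorted(ext):
            if u <= last:
                continue  # grow C in increasing order to avoid duplicates
            new_q = q * ext[u]
            new_ext = {}
            for v, f in ext.items():
                if v == u:
                    continue
                g = f * p(u, v)  # update v's factor incrementally
                if g > 0.0 and new_q * g >= alpha:
                    new_ext[v] = g
            dfs(C + [u], new_q, new_ext)  # returning here is the backtrack step

    dfs([], 1.0, {v: 1.0 for v in range(n)})
    return results
```

A vertex dropped from `ext` is never reconsidered deeper in the search, because adding more vertices to C can only lower the clique probability further.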