Presented by: Omar Alqahtani, Spring 2016
Authors:
Publication: ICDE 2015
Type: Research Paper

• Data exploration platforms help users discover interesting objects within large volumes of scientific and business data.
• Similar to top-k and skyline queries, but what is it?
• Data diversification extracts from a query result a small set of non-redundant points that are diverse among themselves according to some distance measure.
• The current approach is process-first-diversity-next. Drawback?
• Motivation: the need to efficiently provide users with effective insights during data exploration.

• Progressive Data Diversification (pDiverse) scheme.
• The main idea is to detect and prune the data points in the query result that cannot be included in the final diverse set.
• Partial distance computation reduces the CPU and I/O cost incurred during query diversification.
• Also:
• Progressive Greedy (pGreedy) heuristic, which forms the core of the pDiverse scheme.
• Extending pGreedy to work with column stores.
• An integrated model that combines range queries with diversification.
• Optimizing pDiverse with novel techniques for ordering dimensions and approximating diversity.
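The benefit of partial distance computation can be illustrated with a generic early-terminating distance routine. This is only a minimal sketch and not the pDiverse implementation: the function name `partial_sq_distance` and the use of a squared-distance threshold are assumptions made for illustration. The squared Euclidean distance is accumulated one dimension (column) at a time, and reading stops as soon as the running sum already answers the question "is this point at least this far away?".

```python
from typing import Sequence, Tuple


def partial_sq_distance(
    p: Sequence[float], q: Sequence[float], threshold_sq: float
) -> Tuple[float, int]:
    """Accumulate squared Euclidean distance one dimension at a time.

    Stops as soon as the running sum reaches threshold_sq, because the
    remaining dimensions can only increase the sum.
    Returns (partial sum, number of dimensions actually read).
    """
    total = 0.0
    dims_read = 0
    for a, b in zip(p, q):
        total += (a - b) ** 2
        dims_read += 1
        if total >= threshold_sq:
            break  # decision already known; skip the remaining columns
    return total, dims_read


# Hypothetical 4-dimensional points, threshold distance 3 (squared: 9).
far = partial_sq_distance([0, 0, 0, 0], [5, 1, 1, 1], threshold_sq=9.0)
near = partial_sq_distance([0, 0, 0, 0], [1, 1, 1, 1], threshold_sq=9.0)
print(far, near)
```

In a column store each skipped dimension is a column that never has to be fetched, which is where the I/O saving would come from under this reading of the slide.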

• Broadly, there are three categories of diversification: content based, novelty based, and semantic coverage based.
• Formal definition:
• It is an NP-hard problem, so greedy heuristics are the most widely used.
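A common greedy heuristic of this kind can be sketched as follows. This is a generic max-min greedy selection, not the paper's pGreedy: the function name `greedy_diverse`, the choice of Euclidean distance, and seeding with the first point are all assumptions. At each step it adds the point whose distance to its nearest already-selected point is largest.

```python
import math
from typing import List, Sequence


def greedy_diverse(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of up to k points chosen by greedy max-min selection."""
    n = len(points)
    if k <= 0 or n == 0:
        return []
    selected = [0]  # assumption: seed with the first point
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(points[0], p) for p in points]
    while len(selected) < min(k, n):
        chosen = set(selected)
        candidates = [i for i in range(n) if i not in chosen]
        # pick the candidate farthest from everything selected so far
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[nxt], points[i]))
    return selected


# Hypothetical query result with five 2-D points.
result = [(0, 0), (1, 0), (10, 0), (0, 10), (5, 5)]
print(greedy_diverse(result, 3))
```

Every iteration computes full distances from the new point to all others, which is the cost that pruning and partial distance computation aim to reduce.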

Presented by: Omar Alqahtani Spring 2016

Authors:
Publication: ICDE 2015
Type: Research Paper

• Query execution performance of database systems depends heavily on query optimization decisions.
• Choosing the best possible plan usually requires a cost model to estimate the performance of viable alternatives.
• Cost models rely on statistics about the data. But?
• As a result, commercial DBMSs often assume uniform data distributions and attribute value independence, which is rarely the case in reality.
• Suboptimal plans
• Subpar performance

• They define robustness in the context of query processing as: the ability of a system to efficiently cope with unexpected and adverse conditions, and deliver near-optimal performance for all query inputs.

Based on:
• Understanding the data distribution is a continuous process.
• The distribution may also evolve throughout the execution of a query plan.
• A single execution strategy might therefore not be optimal over the entire data set.
They propose:
• A new class of morphable operators that continuously and seamlessly adjust their execution strategy as the understanding of the data evolves.
• The Smooth Scan operator, which morphs between an index look-up and a full table scan, and which:
• achieves near-optimal performance regardless of the operator's selectivity
• is oblivious to the existing data statistics.

• Some works address the problem at the optimizer level, but:
• in dynamic environments they bring only partial benefits, as the environment keeps changing even after optimization.
• Orthogonal approaches focus on run-time adaptivity; however:
• they lack flexibility at the level of access paths
• they remain sensitive to the accuracy of statistics.

Presented by: Zohreh Raghebi Spring 2016

Authors:
Publication: ICDE 2015
Type: Research Paper

• Rapid growth of event-based social network services, such as Meetup and Plancast
• They connect people through events
• They allow users to form online groups
• Users publish and announce events to other group members

• 1) Which groups would a particular user like to join?
• 2) Which tags might a group choose when constructing its profile?
• 3) Who will attend an upcoming event?
• Goal: design recommendation systems for three specific tasks: groups to users, tags to groups, events to users

• [1] proposed a factorization model that exploits social and location features for event-based group recommendation
• [2] introduced a topic model to solve the tag recommendation problem for groups
• [3] used a simple graph-based approach to recommend users for an event; it performs information diffusion over the user network
• Lack of a general solution

• Model the interactions between multiple entities: users, events, groups, and tags
• Analyze the data to extract useful temporal patterns of user behavior
• Convert the recommendation problem into a node proximity calculation problem

• To evaluate node proximity:
• A heterogeneous graph contains multiple types of entities
• They influence each other via different types of interactions
• The importance of these influences has to be balanced for proximity calculation
• Their importance may vary from one recommendation problem to another

• Random Walk with Restart (RWR) is used to calculate node proximity for recommendations
• RWR is developed on a univariate Markov chain for homogeneous graphs
• As a generalization, a multivariate Markov chain (MMC) models the random walk process in a heterogeneous graph
• MMC can explicitly model the influences between different entities
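For reference, plain RWR on a homogeneous graph can be sketched as a short power iteration. This is the standard univariate version only, not the MMC-based model from the paper; the function name, the restart probability of 0.15, the fixed iteration count, and the toy graph are all assumptions. At each step the walker restarts at the seed with probability `restart` and otherwise moves to a uniformly chosen neighbor; the stationary scores serve as proximity to the seed.

```python
from typing import Dict, List


def random_walk_with_restart(
    adj: Dict[str, List[str]], seed: str, restart: float = 0.15, iters: int = 100
) -> Dict[str, float]:
    """Approximate RWR proximity scores of every node with respect to seed."""
    score = {v: (1.0 if v == seed else 0.0) for v in adj}
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            walk_mass = (1.0 - restart) * score[u]
            if not nbrs:
                nxt[seed] += walk_mass  # dangling node: send the walker home
                continue
            share = walk_mass / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[seed] += restart  # restart mass (scores always sum to 1)
        score = nxt
    return score


# Hypothetical toy graph: users u1, u2; groups g1, g2; event e1.
graph = {
    "u1": ["g1", "e1"],
    "g1": ["u1", "u2"],
    "e1": ["u1"],
    "u2": ["g1", "g2"],
    "g2": ["u2"],
}
scores = random_walk_with_restart(graph, seed="u1")
ranked_groups = sorted(["g1", "g2"], key=lambda g: scores[g], reverse=True)
print(ranked_groups)
```

Note that this sketch treats users, groups, and events as one node type with equal edge weights, which is exactly the limitation the heterogeneous MMC formulation is meant to address.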

• Existing MMC-based methods require the influence weights between different types of entities to be set manually
• Multiple types of entities exist
• A learning scheme tries to find the optimal set of weights

• A general model to handle multiple recommendation problems in an event-based social network
• To avoid manual parameter assignment, a learning framework is proposed to find appropriate parameters for the model
• The values of the learned parameters indicate the importance of different types of entities in different recommendation tasks
• Better understanding of user behavior in an event-based social network

Presented by: Zohreh Raghebi Spring 2016

Authors:
Publication: ICDE 2015
Type: Research Paper

• Knowledge is represented as a graph
• There is uncertainty in the presence of each edge in the graph
• Uncertain graphs have been used extensively: communication networks, social networks, protein interaction networks

• Identification of dense substructures within a graph
• Clique: a completely connected subgraph
• Maximal clique: a clique that is not contained within any other clique
• Applications of enumerating all maximal cliques: finding overlapping communities in social networks, finding multiple overlapping protein complexes, analysis of networks

• Clique in an uncertain graph: a set of vertices that has a high probability of being a completely connected subgraph
• Applications:
• Finding such sets of vertices helps to unearth robust communities within an uncertain graph
• Finding a group of proteins such that each protein is likely to interact with every other protein

• A set of vertices U is an α-maximal clique if:
• U is a clique with probability at least α, and
• there does not exist a vertex set S such that U ⊂ S and S is a clique with probability at least α
• When α = 1, we have the notion of a maximal clique in a deterministic graph
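The definition above can be checked directly on a small example. The sketch below assumes that edges exist independently of each other, so the probability that U is a clique is the product of the probabilities of all edges between pairs of vertices in U; this independence assumption, the function names, and the dictionary layout (keys are sorted vertex pairs) are not taken from the slides. Because adding a vertex can never increase the clique probability, testing single-vertex extensions is enough to decide maximality.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

EdgeProb = Dict[Tuple[str, str], float]


def clique_probability(vertices: Iterable[str], edge_prob: EdgeProb) -> float:
    """Probability that `vertices` form a clique, assuming independent edges."""
    prob = 1.0
    for u, v in combinations(sorted(vertices), 2):
        prob *= edge_prob.get((u, v), 0.0)  # missing edge has probability 0
    return prob


def is_alpha_maximal(
    clique: Set[str], all_vertices: Set[str], edge_prob: EdgeProb, alpha: float
) -> bool:
    """True if `clique` is an alpha-clique that no single vertex can extend."""
    if clique_probability(clique, edge_prob) < alpha:
        return False
    return all(
        clique_probability(clique | {w}, edge_prob) < alpha
        for w in all_vertices - clique
    )


# Hypothetical uncertain triangle.
probs = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.5}
nodes = {"a", "b", "c"}
print(clique_probability(nodes, probs))            # about 0.36
print(is_alpha_maximal({"a", "b"}, nodes, probs, 0.5))
print(is_alpha_maximal({"a", "b"}, nodes, probs, 0.3))
```

With α = 0.5 the pair {a, b} is α-maximal because the full triangle has probability of only about 0.36; lowering α to 0.3 makes the triangle itself an α-clique, so {a, b} is no longer maximal.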

• The problem of finding reliable subgraphs: finding subgraphs that are connected with high probability
• In contrast, the interest here is in subgraphs that are not just connected but fully connected with high probability
• Enumerating the k cliques with the highest probability of existence
• Focus: enumerating all α-maximal cliques in a graph

• Let f(n, α) be the maximum number of α-maximal cliques
• Proofs…

• Uses depth-first search (DFS) with backtracking
• Starts with a set of vertices C that is an α-clique
• Incrementally adds vertices to C while retaining the property that C is an α-clique
• The algorithm backtracks to explore other possible vertices until all possible search paths have been explored

• First, to avoid checking from scratch whether a new vertex v can be used to extend C:
• Consider only those vertices that are already connected to every vertex in C
• This leads to incrementally tracking the vertices that can still be used to extend C

• Second, not all vertices that extend C into a clique preserve the property of C being an α-clique.
• Adding a new vertex v to C decreases the clique probability by a factor equal to the product of the edge probabilities between v and every vertex in C.
• This factor is maintained incrementally for each vertex v
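The two ideas from the last slides (tracking the vertices that can still extend C, and maintaining each vertex's probability factor incrementally) can be combined in a small DFS sketch. This is an illustration written from the slide description, not the authors' code: the function names, the vertex ordering used to avoid duplicates, and the "excluded" set used to test maximality are assumptions, as are edge independence and 0 < α ≤ 1.

```python
from typing import Dict, List, Tuple

EdgeProb = Dict[Tuple[str, str], float]


def enumerate_alpha_maximal_cliques(
    vertices: List[str], edge_prob: EdgeProb, alpha: float
) -> List[List[str]]:
    """List all alpha-maximal cliques by DFS with backtracking."""
    if not vertices:
        return []

    def p(u: str, v: str) -> float:
        return edge_prob.get((u, v) if u <= v else (v, u), 0.0)

    def survivors(pool: Dict[str, float], v: str, q_new: float) -> Dict[str, float]:
        # Update each factor with the edge to v; keep w only if C+{v,w}
        # would still be an alpha-clique.
        kept = {}
        for w, r in pool.items():
            r_new = r * p(v, w)
            if q_new * r_new >= alpha:
                kept[w] = r_new
        return kept

    results: List[List[str]] = []

    def extend(clique, q, cand, excl):
        # cand / excl map vertex -> product of edge probabilities to `clique`.
        # cand: vertices ordered after the clique; excl: already-explored ones.
        if not cand and not excl:
            results.append(list(clique))  # nothing can extend it: maximal
            return
        done: Dict[str, float] = {}
        for v in sorted(cand):
            q_new = q * cand[v]  # clique probability after adding v
            later = {w: r for w, r in cand.items() if w > v}
            earlier = {**excl, **done}
            extend(clique + [v], q_new,
                   survivors(later, v, q_new), survivors(earlier, v, q_new))
            done[v] = cand[v]  # backtrack: v is now an explored vertex

    extend([], 1.0, {v: 1.0 for v in vertices}, {})
    return results


# Hypothetical uncertain triangle.
probs = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.5}
print(enumerate_alpha_maximal_cliques(["a", "b", "c"], probs, 0.4))
```

A vertex with a missing edge to C gets factor 0 and is dropped from both pools, which corresponds to the first optimization; the per-vertex factor `r` avoids recomputing products over all of C, which corresponds to the second.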
