Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University.

Slides:



Advertisements
Similar presentations
Md. Mahbub Hasan University of California, Riverside.
Advertisements

Davide Mottin, Senjuti Basu Roy, Alice Marascu, Yannis Velegrakis, Themis Palpanas, Gautam Das A Probabilistic Optimization Framework for the Empty-Answer.
Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Yinghui Wu, LFCS DB talk Database Group Meeting Talk Yinghui Wu 10/11/ Simulation Revised for Graph Pattern Matching.
Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.
Solving Problem by Searching
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
DISCOVER: Keyword Search in Relational Databases Vagelis Hristidis University of California, San Diego Yannis Papakonstantinou University of California,
WSCD INTRODUCTION  Query suggestion has often been described as the process of making a user query resemble more closely the documents it is expected.
A Fairy Tale of Greedy Algorithms Yuli Ye Joint work with Allan Borodin, University of Toronto.
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Efficient Informative Sensing using Multiple Robots
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
1 Efficient planning of informative paths for multiple robots Amarjeet Singh *, Andreas Krause +, Carlos Guestrin +, William J. Kaiser *, Maxim Batalin.
1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
33 rd International Conference on Very Large Data Bases, Sep. 2007, Vienna Towards Graph Containment Search and Indexing Chen Chen 1, Xifeng Yan 2, Philip.
Towards Scalable Critical Alert Mining Bo Zong 1 with Yinghui Wu 1, Jie Song 2, Ambuj K. Singh 1, Hasan Cam 3, Jiawei Han 4, and Xifeng Yan 1 1 UCSB, 2.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
A Framework For Community Identification in Dynamic Social Networks Chayant Tantipathananandh Tanya Berger-Wolf David Kempe Presented by Victor Lee.
Query Planning for Searching Inter- Dependent Deep-Web Databases Fan Wang 1, Gagan Agrawal 1, Ruoming Jin 2 1 Department of Computer.
Tag-based Social Interest Discovery
Viral Marketing for Dedicated Customers Presented by: Cheng Long 25 August, 2012.
Subgraph Containment Search Dayu Yuan The Pennsylvania State University 1© Dayu Yuan9/7/2015.
Graph Indexing: A Frequent Structure­ based Approach Authors:Xifeng Yan†, Philip S‡. Yu, Jiawei Han†
Exemplar Queries: Knowledge Exploration using Information Graphs Davide Mottin, University of Trento August 20, RMIT University, Melbourne Department.
A Polynomial Time Approximation Scheme For Timing Constrained Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, Charles J. Alpert** *Dept of Electrical.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
Complexity of algorithms Algorithms can be classified by the amount of time they need to complete compared to their input size. There is a wide variety:
Diversified Top-k Graph Pattern Matching 1 Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang.
Mohamed Hefeeda 1 School of Computing Science Simon Fraser University, Canada Optimal Partitioning of Fine-Grained Scalable Video Streams Mohamed Hefeeda.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
On Graph Query Optimization in Large Networks Alice Leung ICS 624 4/14/2011.
Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),
Xiangnan Kong,Philip S. Yu Department of Computer Science University of Illinois at Chicago KDD 2010.
A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Xiangnan Kong,Philip S. Yu Multi-Label Feature Selection for Graph Classification Department of Computer Science University of Illinois at Chicago.
Towards Efficient Large-Scale VPN Monitoring and Diagnosis under Operational Constraints Yao Zhao, Zhaosheng Zhu, Yan Chen, Northwestern University Dan.
Randomized Composable Core-sets for Submodular Maximization Morteza Zadimoghaddam and Vahab Mirrokni Google Research New York.
Efficient Processing of Top-k Spatial Preference Queries
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Marina Drosou, Evaggelia Pitoura Computer Science Department
CS6321 Query Optimization Over Web Services Utkarsh Kamesh Jennifer Rajeev Shrivastava Munagala Wisdom Motwani Presented By Ajay Kumar Sarda.
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
Mining Top-K Large Structural Patterns in a Massive Network Feida Zhu 1, Qiang Qu 2, David Lo 1, Xifeng Yan 3, Jiawei Han 4, and Philip S. Yu 5 1 Singapore.
AnHai Doan & Alon Halevy Department of Computer Science & Engineering University of Washington Efficiently Ordering Query Plans for Data Integration.
Computer Science Background for Biologists CSC 487/687 Computing for Bioinformatics Fall 2005.
Vasilis Syrgkanis Cornell University
Graph Indexing: A Frequent Structure-­based Approach 指導老師:曾新穆 教授 組員:李彥寬、洪世敏、丁鏘巽、 黃冠霖、詹博丞 日期: 2013/11/ /11/141.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 Link Privacy in Social Networks Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar CIKM’08 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun Date: 2009/3/30.
1 Substructure Similarity Search in Graph Databases R 陳芃安.
Gspan: Graph-based Substructure Pattern Mining
Lecture 3: Uninformed Search
Inferring Networks of Diffusion and Influence
Outline Introduction State-of-the-art solutions
Data Driven Resource Allocation for Distributed Learning
Spatio-temporal Pattern Queries
Algorithm design and Analysis
Structured Learning of Two-Level Dynamic Rankings
Conflict-Aware Event-Participant Arrangement
Finding Subgraphs with Maximum Total Density and Limited Overlap
Efficient Processing of Top-k Spatial Preference Queries
Presentation transcript:

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University of Trento Francesco Bonchi, Yahoo Labs - Francesco Gullo, Yahoo Labs

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 2 Issues with Pattern Search O OHO S Query 510 matches Too many matches Results are not grouped

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 3 Solution: Discovering Specializations O OHO S 510 matches O OHOH O S CH 3 O OHO S SH O OHO S H3CH3C O O S CH 3 O OHOH O S H S 448 matches46 matches114 matches 382 matches 46 matches Specializations

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 4 Applications Finding groups of molecules having a particular reagent Analyze a set of proteins to find diseases Workflow optimization Anomaly detection in a network 3D shape search with similar properties

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 5 Dealing with Specializations in Web and Relational Data Faceted Search present aspects of the results [Roy08] Query reformulation Modify some of the query conditions In structured databases [Mishra09] In web search [Dang10] First Study of Problem on GRAPHS

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 6 Graph Query Reformulation Results Query Specializations: query supergraphs … Exponential number of specializations

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 7 Challenges The number of reformulation is exponential Quantify the interestingness of a reformulation Finding query specializations is NP-complete

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 8 A Naïve Approach: k-most frequent super-patterns Query 480 matches 450 matches 100 matches Super Patterns 30 matches 420 matches Until k patterns are found: - Retrieve the most frequent super-pattern Until k patterns are found: - Retrieve the most frequent super-pattern Frequent ≠ Interesting !

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 9 Our Approach Graph Query Reformulation with Diversity Finds k meaningful specializations efficiently

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 10 Finding Meaningful Specializations Results Query Diversity Find k meaningful specializations: 1.Span all the results 2.Present different aspects of the results ? Find k meaningful specializations: 1.Span all the results 2.Present different aspects of the results ?

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 11 Diversity Matters Results Query Objective function f(Q) λ = 1 Non optimal: f({Q 1 ’,Q 2 ’}) = 7 Optimal: f({Q 3 ’,Q 4 ’}) = 8 λ = 1 Non optimal: f({Q 1 ’,Q 2 ’}) = 7 Optimal: f({Q 3 ’,Q 4 ’}) = 8

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 12 Problem Graph Query Reformulation with Diversity 12 Theorem (NP-hardness) The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard Theorem (NP-hardness) The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 13 Solution: Greedy Algorithm 13 Greedy While k-specializations are not found 1.Find the specialization leading to the maximum increment of the objective function (marginal gain) 2.Add the specialization to the results Theorem The algorithm is a ½-approximation Theorem The algorithm is a ½-approximation Finding the maximum gain is #P-complete [Valiant79] Solution Fast_MMPG: Branch and bound algorithm to efficiently find the specialization with the maximum marginal gain

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 14 The multiplicity vector Results Output set of specializations

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 15 Upper bound on the Marginal gain Lemma The marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far. Lemma The marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far. Upper bound : is the value of the objective function considering only results with multiplicity Theorem

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 16 Upper bound Results Output set of specializations 12111

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 17 Until the reformulation with the maximum upper bound and marginal gain is not found 1.Expand the reformulation with the max upper bound 2.Prune Reformulations with marginal gain smaller than the upper bound so far Until the reformulation with the maximum upper bound and marginal gain is not found 1.Expand the reformulation with the max upper bound 2.Prune Reformulations with marginal gain smaller than the upper bound so far The Fast_MMPG Algorithm upper bound marginal gain

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 18 Experimental Setup 18 Datasets: AIDS: 10k chemical compounds Financial: 17k transaction workflows Web: 13k interactions with a recommender system Baseline algorithms: k-freq: returns top-k frequent supergraphs of a query LIndex: informative patterns index Experiments: Time and objective function value varying k, query size, λ Anecdotal Scalability

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 19 Time Comparison Number of specializations 1.k-freq runs only slighly faster 2.Time increases linearly in k 3.Fast_MMPG has real-time performance Query size 1.Fast_MMPG comparable to k- freq 2.Time decreases with query size (less reformulations) number of reformulations (k) query size

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 20 Objective function gain 20 Analysis 1.Lambda correctly moves the objective function towards diversity 2.k-freq only captures coverage

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 21 Qualitative evaluation k-freq Fast_MMPG C O O OH C O CH 3 C O Fe C O NH 2 C O CH 3 C O C O C C O CH 2 C C O NH 2 C O CH 2 C NH Query Analysis k-freq finds specialization of the same superquery Fast_MMPG returns reformulations with more diversified structures

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 22 Conclusions First study of the problem of query reformulation in graph databases Principled objective function optimizing coverage and diversity Algorithmic solutions with quality guarantees and real time responses

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 23 Questions? Thank you!