Substructure Similarity Search in Graph Databases, R95922022 陳芃安

2/25 References: X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. SIGMOD 2005. J. W. Raymond and P. Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 2002.

3/25 Motivation: graph databases such as ChemIDplus and KEGG require searching for topological structures. For similarity search over complex structures, exact matching is often too restrictive, and users cannot effectively refine their queries by hand.

4/25 An example of similarity search

5/25 Similarity measurement: a structure-based similarity measure compares the topology of two graphs; it is costly to compute but more accurate. Maximum common subgraph (MCS): computing it is an NP-complete problem. For a query graph Q and a graph G in the database, let P be the MCS of Q and G. Then |E(P)|, the number of edges in P, can serve as a similarity measure; so can the number of edges that must be deleted from Q, |E(Q)| - |E(P)|.
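A brute-force sketch of the MCS-based measure, usable only on toy graphs because the problem is NP-complete. It assumes unlabeled, undirected simple graphs (real chemical graphs carry atom and bond labels) and allows the common subgraph to be disconnected; the helper names `embeds` and `mcs_edge_count` are hypothetical.

```python
from itertools import combinations, permutations


def embeds(edges, g_edges):
    """True if the given edges map into G under some injective vertex mapping."""
    nodes = sorted({v for e in edges for v in e})
    g_nodes = sorted({v for e in g_edges for v in e})
    g_set = {frozenset(e) for e in g_edges}
    for image in permutations(g_nodes, len(nodes)):
        mapping = dict(zip(nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in g_set for u, v in edges):
            return True
    return False


def mcs_edge_count(q_edges, g_edges):
    """|E(P)|: size of the largest edge subset of Q that also occurs in G."""
    for size in range(len(q_edges), 0, -1):
        if any(embeds(subset, g_edges) for subset in combinations(q_edges, size)):
            return size
    return 0


Q = [(0, 1), (1, 2), (2, 0)]      # query: a triangle
G = [("a", "b"), ("b", "c")]      # database graph: the path a-b-c
common = mcs_edge_count(Q, G)     # 2
relaxation = len(Q) - common      # 1 edge must be deleted from Q
print(common, relaxation)
```

Trying edge subsets from largest to smallest means the first hit is the maximum, which is exactly why the cost explodes on realistic graphs.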

6/25 Graph Similarity Filtering (Grafil): a feature-based filtering algorithm. The query graph is represented as a set of features, and edge deletions are translated into feature misses. Graphs are filtered by the maximum allowed number of feature misses, so Grafil does not need to compute the similarity between the query graph and every graph in the database.

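7/25 Structural Filtering: transform edge deletions into misses of indexed features. Feature occurrence counts in the query Q and in Q1, Q2, Q3 (Q with one edge deleted): fa = 1, 1, 0, 0; fb = 2, 0, 1, 1; fc = 4, 3, 2, 2. Deleting one edge therefore removes at most four occurrences of these features.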
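The four-miss bound can be checked directly against the table. This minimal sketch assumes a miss is counted per feature as max(0, u - v), where u is the count in Q and v the count in the relaxed query; `feature_misses` is a hypothetical helper name.

```python
# Feature occurrence counts copied from the slide's table.
QUERY = {"fa": 1, "fb": 2, "fc": 4}
RELAXED = {  # Q with one edge deleted, three different ways
    "Q1": {"fa": 1, "fb": 0, "fc": 3},
    "Q2": {"fa": 0, "fb": 1, "fc": 2},
    "Q3": {"fa": 0, "fb": 1, "fc": 2},
}


def feature_misses(query, target):
    """Query feature occurrences absent from target: sum of max(0, u - v)."""
    return sum(max(0, u - target.get(f, 0)) for f, u in query.items())


misses = {name: feature_misses(QUERY, counts) for name, counts in RELAXED.items()}
print(misses)                # {'Q1': 3, 'Q2': 4, 'Q3': 4}
print(max(misses.values()))  # 4
```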

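8/25 Feature-Graph Matrix: build a feature-graph matrix with one column per graph in the database, recording how many times each feature occurs in that graph. This matrix is easy to maintain. Feature occurrence counts in Ga, Gb, Gc, Gd: fa = 0, 1, 0, 0; fb = 0, 0, 1, 0; fc = 2, 3, 4, 4. Ga can be pruned: it has only 2 feature occurrences, so compared with Q (fa = 1, fb = 2, fc = 4) it misses 1 + 2 + 2 = 5 occurrences, more than the 4 allowed.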
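A sketch of the filtering step on the slide's numbers, assuming the same per-feature miss count max(0, u - v) and an allowed bound of 4 misses. The names `MATRIX`, `D_MAX`, and `filter_graphs` are hypothetical.

```python
QUERY = {"fa": 1, "fb": 2, "fc": 4}
# Feature-graph matrix from the slide, stored column by column (one dict per graph).
MATRIX = {
    "Ga": {"fa": 0, "fb": 0, "fc": 2},
    "Gb": {"fa": 1, "fb": 0, "fc": 3},
    "Gc": {"fa": 0, "fb": 1, "fc": 4},
    "Gd": {"fa": 0, "fb": 0, "fc": 4},
}
D_MAX = 4  # maximum feature misses allowed for one edge deletion


def feature_misses(query, column):
    """Occurrences of query features that the graph's column does not cover."""
    return sum(max(0, u - column.get(f, 0)) for f, u in query.items())


def filter_graphs(query, matrix, d_max):
    """Keep only graphs whose feature misses stay within the allowed bound."""
    return [g for g, col in matrix.items() if feature_misses(query, col) <= d_max]


print(filter_graphs(QUERY, MATRIX, D_MAX))  # ['Gb', 'Gc', 'Gd']
```

Only counts are compared, so no structural matching is needed at this stage; the survivors still require verification.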

9/25 Observations on the feature-graph matrix: feature-based filtering involves no structural similarity checking; we only need to compute an upper bound on the number of feature misses for the query graph.

10/25 Framework: feature miss estimation, index construction, query processing, and query relaxation.

11/25 Index construction: select a set of features and build the feature-graph matrix for the database (counts in Ga, Gb, Gc, Gd: fa = 0, 1, 0, 0; fb = 0, 0, 1, 0; fc = 2, 3, 4, 4).

12/25 Feature miss estimation (1): we build an edge-feature matrix for the query graph, with one row per query edge (e1, e2, e3) and one column per feature occurrence (fa, fb(1), fb(2), fc(1), fc(2), fc(3), fc(4)). An entry is 1 if the edge belongs to that feature occurrence.
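The matrix entries did not survive on this slide, so the sketch below uses hypothetical feature occurrences, each given as the set of query edges it uses. `edge_feature_matrix` and `OCCURRENCES` are illustrative names, not from the source.

```python
def edge_feature_matrix(edges, occurrences):
    """Row per query edge, column per feature occurrence; 1 if the edge is in it."""
    columns = list(occurrences)
    matrix = {e: [1 if e in occurrences[c] else 0 for c in columns] for e in edges}
    return columns, matrix


# Hypothetical occurrences (sets of query edges), named as on the slide.
OCCURRENCES = {
    "fa": {"e1", "e2"},
    "fb(1)": {"e1"},
    "fb(2)": {"e2", "e3"},
    "fc(1)": {"e1"},
    "fc(2)": {"e2"},
    "fc(3)": {"e2"},
    "fc(4)": {"e3"},
}
columns, matrix = edge_feature_matrix(["e1", "e2", "e3"], OCCURRENCES)
for edge, row in matrix.items():
    print(edge, row)
```

Deleting an edge destroys every occurrence whose column has a 1 in that edge's row, which is what makes the matrix useful for estimating misses.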

13/25 Feature miss estimation (2): given a query graph Q and a set of features contained in Q, if at most k edges are deleted, what is the maximum number of feature occurrences that can be missed? It equals the maximum number of columns in the edge-feature matrix that k rows can hit. This is the set k-cover problem, which is NP-complete.
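For small queries the set k-cover question can simply be answered by trying every choice of k rows. This exhaustive sketch uses a hypothetical edge-feature matrix in which each edge maps to the set of occurrences (columns) it hits; `max_feature_misses` is an illustrative name.

```python
from itertools import combinations

# Hypothetical edge-feature matrix: each query edge maps to the occurrences it lies in.
ROWS = {
    "e1": {"fa", "fb(1)", "fc(1)", "fc(2)"},
    "e2": {"fb(2)", "fc(2)", "fc(3)"},
    "e3": {"fb(2)", "fc(4)"},
}


def max_feature_misses(rows, k):
    """Exact set k-cover by brute force: most columns any k rows can hit."""
    best = 0
    for chosen in combinations(rows, min(k, len(rows))):
        hit = set().union(*(rows[e] for e in chosen))
        best = max(best, len(hit))
    return best


for k in (1, 2, 3):
    print(k, max_feature_misses(ROWS, k))  # 1 4, then 2 6, then 3 7
```

The number of row combinations grows combinatorially with the query size, which motivates the greedy approximation.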

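14/25 A greedy algorithm for the set k-cover problem: repeatedly choose the row that hits the most columns not yet hit. It is a 1.6-approximation: the greedy result is at least (1 - 1/e) of the optimum, so multiplying it by e/(e - 1) ≈ 1.58 gives an upper bound on the feature misses.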
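A sketch of the greedy estimate turned into an upper bound, on a hypothetical matrix. It assumes the "1.6" factor is e/(e - 1); the final cap at the number of columns is an obvious extra refinement added here, and `greedy_upper_bound` is an illustrative name.

```python
import math

# Hypothetical edge-feature matrix: each query edge maps to the occurrences it lies in.
ROWS = {
    "e1": {"fa", "fb(1)", "fc(1)", "fc(2)"},
    "e2": {"fb(2)", "fc(2)", "fc(3)"},
    "e3": {"fb(2)", "fc(4)"},
}


def greedy_cover(rows, k):
    """Greedy: repeatedly take the row that hits the most not-yet-hit columns."""
    covered = set()
    remaining = dict(rows)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda e: len(remaining[e] - covered))
        covered |= remaining.pop(best)
    return len(covered)


def greedy_upper_bound(rows, k):
    """Greedy >= (1 - 1/e) * OPT, hence OPT <= greedy * e / (e - 1)."""
    bound = math.floor(greedy_cover(rows, k) * math.e / (math.e - 1))
    total_columns = len(set().union(*rows.values()))
    return min(bound, total_columns)  # can never miss more columns than exist


print(greedy_cover(ROWS, 2), greedy_upper_bound(ROWS, 2))  # 6 7
```

Rounding down is safe because the optimum is an integer no larger than the scaled value.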

15/25 An example of the greedy algorithm: with k = 2, the first row chosen hits 4 columns and the second hits 2 more, so Ans = 4 + 2.

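16/25 Improving the greedy algorithm: we can combine brute force with branch and bound. Select a row by the greedy rule; the optimal solution either includes this row or does not, so recurse on both cases. When the recursion gets too deep, fall back to the greedy method to solve the remaining subproblem directly.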
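A sketch of this recursion on a hypothetical matrix. It branches on the greedy row and, once a hypothetical `max_depth` is reached, bounds the rest with the greedy gain scaled by e/(e - 1). It only branches; a full branch and bound would also prune branches that cannot beat the best bound found so far.

```python
import math

# Hypothetical edge-feature matrix: each query edge maps to the occurrences it lies in.
ROWS = {
    "e1": {"fa", "fb(1)", "fc(1)", "fc(2)"},
    "e2": {"fb(2)", "fc(2)", "fc(3)"},
    "e3": {"fb(2)", "fc(4)"},
}


def greedy_gain(rows, k, covered):
    """Greedy count of new columns that k more rows can hit beyond `covered`."""
    newly = set()
    remaining = dict(rows)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda e: len(remaining[e] - covered - newly))
        newly |= remaining.pop(best) - covered
    return len(newly)


def upper_bound(rows, k, covered=frozenset(), depth=0, max_depth=2):
    """Upper bound on columns hit by k rows: branch on the greedy row, then fall back."""
    if k == 0 or not rows:
        return len(covered)
    if depth == max_depth:
        # Recursion too deep: bound the rest with the scaled greedy estimate.
        gain = greedy_gain(rows, k, covered)
        return len(covered) + math.floor(gain * math.e / (math.e - 1))
    pick = max(rows, key=lambda e: len(rows[e] - covered))
    rest = {e: cols for e, cols in rows.items() if e != pick}
    with_pick = upper_bound(rest, k - 1, covered | rows[pick], depth + 1, max_depth)
    without_pick = upper_bound(rest, k, covered, depth + 1, max_depth)
    return max(with_pick, without_pick)


print(upper_bound(ROWS, 2, max_depth=0))   # 9: pure scaled greedy
print(upper_bound(ROWS, 2, max_depth=1))   # 7
print(upper_bound(ROWS, 2, max_depth=10))  # 6: exact, rows exhausted
```

Deeper recursion costs more time but tightens the bound toward the exact answer.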

17/25 Algorithm 2 for the set k-cover problem

18/25 Next step: so far we have discussed only how to estimate feature misses for a given feature set. Given many features, how do we select a good feature set? Should we use all features together in a single filter?

19/25 Filter graphs by the feature misses

20/25 Feature set selection: should we use all features together in a single filter? No! Doing so causes feature conjugation. Here d_max denotes the value returned by feature miss estimation.
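One way to see why a single filter over all features can be weaker than several smaller ones is the hypothetical case below; it is an illustration built for this note, not the paper's definition of feature conjugation. With one edge deletion allowed, every edge destroys one fa occurrence, but e1 also destroys all four fb occurrences, so the combined d_max is large while the fa-only bound stays at 1.

```python
from itertools import combinations


def max_misses(rows, k, columns):
    """Exact set k-cover restricted to `columns`: most of them k deletions can hit."""
    best = 0
    for chosen in combinations(rows, min(k, len(rows))):
        hit = set().union(*(rows[e] for e in chosen)) & columns
        best = max(best, len(hit))
    return best


# Hypothetical query: each edge lies in one fa occurrence; e1 also lies in all fb ones.
ROWS = {
    "e1": {"a1", "b1", "b2", "b3", "b4"},
    "e2": {"a2"},
    "e3": {"a3"},
}
GROUP_A = {"a1", "a2", "a3"}
GROUP_B = {"b1", "b2", "b3", "b4"}
K = 1

# Hypothetical target graph G: lacks two fa occurrences, has every fb occurrence.
misses_a, misses_b = 2, 0

d_all = max_misses(ROWS, K, GROUP_A | GROUP_B)  # 5
d_a = max_misses(ROWS, K, GROUP_A)              # 1
d_b = max_misses(ROWS, K, GROUP_B)              # 4

single_filter_keeps = misses_a + misses_b <= d_all        # True: 2 <= 5
split_filters_keep = misses_a <= d_a and misses_b <= d_b  # False: 2 > 1
print(single_filter_keeps, split_filters_keep)
```

G cannot match within one deletion, since one deletion removes at most one fa occurrence; only the per-group filter detects this.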

21/25 Selectivity: given a graph database D, a query graph Q, and a feature f, the selectivity of f is
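The formula is missing from this slide. The sketch below assumes, as we read the Grafil paper, that the selectivity of f is its average miss count over the database, i.e. the mean of max(0, u - v_G) with u the count of f in Q and v_G its count in graph G. It reuses the counts from the feature-graph matrix slide; `selectivity` is an illustrative name.

```python
QUERY = {"fa": 1, "fb": 2, "fc": 4}
# Feature counts per database graph, from the feature-graph matrix slide.
DATABASE = {
    "Ga": {"fa": 0, "fb": 0, "fc": 2},
    "Gb": {"fa": 1, "fb": 0, "fc": 3},
    "Gc": {"fa": 0, "fb": 1, "fc": 4},
    "Gd": {"fa": 0, "fb": 0, "fc": 4},
}


def selectivity(feature, query, database):
    """Assumed definition: average of max(0, u - v_G) over all graphs G."""
    u = query[feature]
    misses = [max(0, u - counts.get(feature, 0)) for counts in database.values()]
    return sum(misses) / len(misses)


for f in QUERY:
    print(f, selectivity(f, QUERY, DATABASE))  # fa 0.75, fb 1.75, fc 0.75
```

Under this reading, a larger value means the feature is missing from more of the database, so it separates candidates more sharply.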

22/25 Hierarchical agglomerative clustering: repeatedly merge the two closest clusters, measured by selectivity δ, into a single cluster. The new selectivity of a cluster is

23/25 Result

24/25 More details. Where do the features come from? Paths, motifs, and discriminative frequent structures (X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. SIGMOD 2004. M. Kuramochi and G. Karypis. Frequent subgraph discovery. ICDM 2001). How is the maximum common subgraph computed? See J. W. Raymond and P. Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 2002.

25/25 Thanks! Any questions?