Graph Indexing: A Frequent Structure-based Approach
Advisor: Prof. 曾新穆
Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞
Date: 2013/11/14

Outline
Ch1 Introduction
Ch2 Preliminaries
Ch3 Frequent Fragment
Ch4 Discriminative Fragment
Ch5 gIndex
Ch6 Experimental Result
Improvement
Maintenance

Ch1 Introduction

Building a graph index: a path-based index is inefficient, because graphs break down into too many paths.

A graph-based index is better suited: indexing structures directly narrows the slide's example down to only one result.
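
The contrast between the two slides can be made concrete with a small Python sketch. It assumes a toy representation (a vertex-label dict plus a set of undirected edges) and a brute-force subgraph test that is only viable for tiny graphs; the helper names make_graph, simple_paths, label_paths and is_subgraph are hypothetical. It shows that the number of paths grows quickly (a 5-vertex complete graph already has 325 directed simple paths of 0 to 4 edges) and that a path filter can accept a graph lacking the query's structure: a chain of four carbons contains every label path of a three-carbon ring.

```python
from itertools import permutations


def make_graph(labels, pairs):
    """Toy labeled graph (assumed format): {vertex: label} plus undirected edges."""
    return dict(enumerate(labels)), {frozenset(p) for p in pairs}


def simple_paths(g, max_len):
    """Yield every directed simple path (vertex list) with at most max_len edges."""
    labels, edges = g
    adj = {v: set() for v in labels}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    stack = [[v] for v in labels]
    while stack:
        path = stack.pop()
        yield path
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    stack.append(path + [nxt])


def label_paths(g, max_len):
    """Path-based features: label sequences of simple paths, direction-independent."""
    labels, _ = g
    seqs = (tuple(labels[v] for v in p) for p in simple_paths(g, max_len))
    return {min(s, s[::-1]) for s in seqs}


def is_subgraph(p, g):
    """Brute-force (non-induced) subgraph isomorphism; tiny graphs only."""
    p_labels, p_edges = p
    g_labels, g_edges = g
    pv = list(p_labels)
    for image in permutations(g_labels, len(pv)):
        m = dict(zip(pv, image))
        if all(p_labels[v] == g_labels[m[v]] for v in pv) and all(
            frozenset(m[x] for x in e) in g_edges for e in p_edges
        ):
            return True
    return False


triangle = make_graph("CCC", [(0, 1), (1, 2), (0, 2)])
chain = make_graph("CCCC", [(0, 1), (1, 2), (2, 3)])
# The path filter accepts the chain, yet the chain contains no triangle.
print(label_paths(triangle, 2) <= label_paths(chain, 2))  # True
print(is_subgraph(triangle, chain))                        # False
```

The chain is a false positive for the path-based filter, which is exactly the candidate that a structure-based index avoids.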

Ch2 Preliminaries

Ch3 Frequent Fragment

With minSup = 2, every fragment that appears in at least two graphs is frequent and is indexed.
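
The indexing rule above can be sketched in Python as follows. This is a minimal illustration, not gIndex's mining procedure: it assumes the candidate fragments are already given (a real system would mine them), uses a hypothetical toy graph format, and tests containment by brute force. Each fragment's id list is the set of graphs containing it, and only fragments whose support reaches minSup are kept.

```python
from itertools import permutations


def make_graph(labels, pairs):
    """Toy labeled graph (assumed format): {vertex: label} plus undirected edges."""
    return dict(enumerate(labels)), {frozenset(p) for p in pairs}


def is_subgraph(p, g):
    """Brute-force (non-induced) subgraph isomorphism; tiny graphs only."""
    p_labels, p_edges = p
    g_labels, g_edges = g
    pv = list(p_labels)
    for image in permutations(g_labels, len(pv)):
        m = dict(zip(pv, image))
        if all(p_labels[v] == g_labels[m[v]] for v in pv) and all(
            frozenset(m[x] for x in e) in g_edges for e in p_edges
        ):
            return True
    return False


def build_index(fragments, database, min_sup):
    """Map each frequent fragment name to its id list (ids of graphs containing it)."""
    index = {}
    for name, frag in fragments.items():
        ids = {gid for gid, g in enumerate(database) if is_subgraph(frag, g)}
        if len(ids) >= min_sup:  # support = size of the id list
            index[name] = ids
    return index


database = [
    make_graph("CCO", [(0, 1), (1, 2)]),  # graph 0: C-C-O
    make_graph("CCN", [(0, 1), (1, 2)]),  # graph 1: C-C-N
    make_graph("CO", [(0, 1)]),           # graph 2: C-O
]
fragments = {
    "C-C": make_graph("CC", [(0, 1)]),
    "C-O": make_graph("CO", [(0, 1)]),
    "C-N": make_graph("CN", [(0, 1)]),
    "C-C-O": make_graph("CCO", [(0, 1), (1, 2)]),
}
print(build_index(fragments, database, min_sup=2))
```

With minSup = 2, only C-C (graphs 0 and 1) and C-O (graphs 0 and 2) survive; C-N and C-C-O each occur in a single graph.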

If the query Q is frequent, Q itself is indexed, so its answers can be read directly from the index.

What if the query Q is not frequent?

Then find the frequent subgraphs of Q, which are indexed.
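
The two query cases above can be sketched as a small Python function. It assumes fragments and the query are identified by hypothetical canonical keys, and that the caller already knows which fragments the query contains. If Q is indexed, its id list is the exact answer. Otherwise every graph containing Q must contain each indexed subgraph of Q, so intersecting their id lists gives a candidate set that still has to be verified.

```python
def candidate_set(query_key, query_fragments, index, num_graphs):
    """Return (candidates, exact).

    query_key: canonical key of the query graph (hypothetical).
    query_fragments: keys of the fragments contained in the query.
    index: {fragment key: set of graph ids} for the indexed fragments.
    """
    if query_key in index:
        # Q itself is indexed: its id list is already the exact answer set.
        return set(index[query_key]), True
    candidates = set(range(num_graphs))
    for f in query_fragments:
        if f in index:
            # A graph containing Q must also contain every subgraph f of Q.
            candidates &= index[f]
    # The survivors still need a subgraph isomorphism test (verification).
    return candidates, False


index = {"C-C": {0, 1}, "C-O": {0, 2}}
print(candidate_set("C-C-O", ["C-C", "C-O"], index, 3))  # ({0}, False)
print(candidate_set("C-C", ["C-C"], index, 3))           # ({0, 1}, True)
```

When no subgraph of Q is indexed, the function falls back to all graphs, which is the worst case that a good choice of indexed fragments tries to avoid.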

Ch4 Discriminative Fragment

Ch5 gIndex

5.1 Discriminative fragment selection
5.2 Index construction
5.3 Search

5.1 Discriminative fragment selection

5.2 Index construction
5.2.1 Graph Sequentialization
5.2.2 gIndex Tree
5.2.3 Remark on gIndex Tree Size
5.2.4 gIndex Tree Implementation

5.2.1 Graph Sequentialization
A graph can be turned into a sequence either through its adjacency matrices or through a DFS code.
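
Sequentialization needs a canonical form, so that isomorphic graphs map to the same sequence. The Python sketch below uses the simpler adjacency-matrix route: it tries every vertex ordering and keeps the lexicographically smallest (labels, upper-triangle bits) tuple. This is a swapped-in illustration, not the minimum DFS code itself, and its factorial cost limits it to very small fragments. The function name canonical_code is hypothetical.

```python
from itertools import permutations


def canonical_code(labels, edges):
    """Smallest adjacency-matrix code over all vertex orderings.

    labels: list of vertex labels; edges: iterable of (u, v) pairs.
    """
    n = len(labels)
    edge_set = {frozenset(e) for e in edges}
    best = None
    for order in permutations(range(n)):
        # Vertex labels in this order, then the upper triangle of the matrix.
        code = tuple(labels[v] for v in order) + tuple(
            int(frozenset((order[i], order[j])) in edge_set)
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best is None or code < best:
            best = code
    return best


# C-C-O and O-C-C are the same path, numbered differently; C-O-C is not.
print(canonical_code(["C", "C", "O"], [(0, 1), (1, 2)]))
print(canonical_code(["O", "C", "C"], [(0, 1), (1, 2)]))
```

Both calls print the same tuple, so the two numberings collapse to one key that can be stored in an index.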

5.2.2 gIndex Tree
(Figure: gIndex tree whose nodes are fragments such as C-C, C-C-C, C-C-C-C, C-C-C-C-C, …)
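
Because each fragment is sequentialized, fragments that share a prefix (C-C, C-C-C, C-C-C-C, …) can share a path in a prefix tree. The Python sketch below assumes each fragment is a tuple of edge tokens (here DFS-code-style edges (i, j, label_i, edge_label, label_j), a hypothetical encoding) and that only some nodes carry an id list; the class names are hypothetical.

```python
class GIndexNode:
    def __init__(self):
        self.children = {}          # next edge token -> child node
        self.discriminative = False  # True if this fragment has an id list
        self.ids = None             # graph ids containing the fragment


class GIndexTree:
    """Prefix tree over sequentialized fragments (illustrative sketch)."""

    def __init__(self):
        self.root = GIndexNode()

    def insert(self, code, ids=None):
        node = self.root
        for token in code:
            node = node.children.setdefault(token, GIndexNode())
        if ids is not None:
            node.discriminative = True
            node.ids = set(ids)
        return node

    def find(self, code):
        node = self.root
        for token in code:
            node = node.children.get(token)
            if node is None:
                return None  # fragment not in the tree
        return node


CC1 = (0, 1, "C", "-", "C")
CC2 = (1, 2, "C", "-", "C")
CC3 = (2, 3, "C", "-", "C")
tree = GIndexTree()
tree.insert((CC1,), {0, 1, 2})       # C-C
tree.insert((CC1, CC2), {0, 1})      # C-C-C
tree.insert((CC1, CC2, CC3))         # C-C-C-C, stored without an id list
print(tree.find((CC1, CC2)).ids)     # {0, 1}
```

Shared prefixes are stored once, and a failed lookup stops at the first missing token.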

5.2.3 Remark on gIndex Tree Size

5.2.4 gIndex Tree Implementation

5.3 Search
5.3.1 Apriori Pruning
5.3.2 Maximum Discriminative Fragments

5.3.1 Apriori Pruning
If a fragment is not in the gIndex tree, none of its supergraphs needs to be checked. A hash table H is used to support this Apriori pruning.
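
The pruning rule can be sketched in Python as a level-by-level enumeration of the query's connected fragments. Assumptions: a fragment is represented as a frozenset of query-edge positions, and in_index is a hypothetical callback standing in for a gIndex tree lookup. The dict H plays the role of the hash table: it records which fragments were already examined, and a candidate is dropped without a lookup when one of its one-edge-smaller sub-fragments is already known to be missing from the index.

```python
def connected(frag, edges):
    """True if the query edges in frag form one connected piece."""
    ids = list(frag)
    seen, stack = {ids[0]}, [ids[0]]
    while stack:
        e = stack.pop()
        for f in ids:
            if f not in seen and set(edges[e]) & set(edges[f]):
                seen.add(f)
                stack.append(f)
    return len(seen) == len(ids)


def indexed_fragments(edges, in_index):
    """Return (indexed fragments of the query, number of index lookups)."""
    H = {}  # fragment -> found in the index?
    level, lookups = [], 0
    for i in range(len(edges)):
        frag = frozenset([i])
        H[frag] = in_index(frag)
        lookups += 1
        if H[frag]:
            level.append(frag)
    while level:
        nxt = []
        for frag in level:  # only fragments found in the index are extended
            for i in range(len(edges)):
                cand = frag | {i}
                if cand in H or not connected(cand, edges):
                    continue
                if any(H.get(cand - {j}) is False for j in cand):
                    H[cand] = False  # Apriori pruning: no lookup needed
                    continue
                H[cand] = in_index(cand)
                lookups += 1
                if H[cand]:
                    nxt.append(cand)
        level = nxt
    return {f for f, ok in H.items() if ok}, lookups


# Query: path a-b-c-d with edges 0, 1, 2; edges {1, 2} together are not indexed.
query_edges = [("a", "b"), ("b", "c"), ("c", "d")]
indexed = {frozenset({0}), frozenset({1}), frozenset({2}), frozenset({0, 1})}
found, lookups = indexed_fragments(query_edges, indexed.__contains__)
print(lookups)  # 5: the whole query {0, 1, 2} is pruned via H, never looked up
```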

5.3.2 Maximum Discriminative Fragments
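
The idea behind using only maximum fragments can be shown with a short Python sketch. If f is a subgraph of f', every graph containing f' also contains f, so the id list of f' is a subset of the id list of f and intersecting with the smaller fragment changes nothing. The sketch assumes, as a simplification, that fragments are frozensets of query-edge positions, so that the subgraph relation becomes set containment; the function names are hypothetical.

```python
def maximal_fragments(fragments):
    """Keep fragments not strictly contained in another fragment of the query."""
    return [f for f in fragments if not any(f < g for g in fragments)]


def candidates(fragments, id_lists, num_graphs):
    """Intersect only the id lists of the maximal fragments."""
    result = set(range(num_graphs))
    for f in maximal_fragments(fragments):
        result &= id_lists[f]
    return result


frags = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
id_lists = {
    frozenset({0}): {0, 1, 2, 3},
    frozenset({1}): {0, 1, 2},
    frozenset({0, 1}): {0, 1},  # subset of both id lists above
}
print(candidates(frags, id_lists, 4))  # {0, 1}
```

One intersection gives the same candidate set as intersecting all three id lists.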

Ch6 Experimental Result

Experimental Result
The performance of gIndex is compared with that of GraphGrep, a path-based approach. Two kinds of datasets are used in the experiments:
- one real dataset
- a series of synthetic datasets

Dataset
The real dataset is the AIDS antiviral screen dataset of chemical compounds; it contains 43,905 classified chemical molecules. The synthetic data generator, provided by Kuramochi et al., lets the user specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of the seed graphs (I), and the number of distinct labels (L).

Experiment Background
The experiments are performed on a 1.5 GHz Intel PC with 1 GB of memory running RedHat 8.0. Both GraphGrep and gIndex are compiled with gcc/g++.

AIDS Antiviral Screen Dataset

Experimental Result
The index of gIndex is at least 10 times smaller than that of GraphGrep. This shows two salient properties of gIndex: its index is small, and its size is stable.

Experimental Result
The size of the candidate answer set Cq is |Cq|. AVG(|Dq|) is a lower bound of AVG(|Cq|), because every actual answer survives filtering (Dq ⊆ Cq). An algorithm that achieves this lower bound matches the queries against the graph dataset exactly.
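
The lower-bound argument can be checked with a few lines of Python. The sketch assumes per-query results are available as (Cq, Dq) pairs of graph-id sets; the function name and the sample numbers are illustrative, not experimental data. Since Dq ⊆ Cq for every query, the average candidate-set size can never drop below the average answer-set size.

```python
def answer_set_stats(results):
    """results: list of (candidates Cq, answers Dq) pairs, one per query."""
    for cand, ans in results:
        if not ans <= cand:
            raise ValueError("an answer was filtered out: the filter is unsound")
    n = len(results)
    avg_c = sum(len(c) for c, _ in results) / n
    avg_d = sum(len(d) for _, d in results) / n
    return avg_c, avg_d


# Illustrative values: one query with false positives, one filtered exactly.
results = [({1, 2, 3, 4}, {1, 2}), ({5, 6}, {5, 6})]
print(answer_set_stats(results))  # (3.0, 2.0)
```

The gap between the two averages measures how many candidates still have to be rejected by verification.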

Experimental Result: Q4
Queries in Q4 are more likely to be path-structured. (Queries with smaller answer sets)

Experimental Result
(Queries with larger answer sets)

Experimental Result

Experimental Result
The scalability of gIndex

Synthetic Dataset

Experimental Result
The synthetic dataset has 10,000 graphs and uses 1,000 seed fragments with 50 distinct labels. On average, each graph has 20 edges and each seed fragment has 10 edges.

Experimental Result

Improvement
Size-increasing support constraint
Relationship between minSup and the number of candidates:
- Large minSup -> fewer frequent fragments, so a weaker pruning effect
- Small minSup -> fewer candidates, but the index size increases dramatically
Therefore, a different minSup should be used for each fragment size.
Inner support
The previous idea does not take multiple embeddings of a feature in one graph into account. Inner support is the number of embeddings of a subgraph in a graph. It helps remove many impossible candidates within an id list, but the size of the id lists doubles.
Advantage of statistics
For a large graph database, index generation is time-consuming. Instead, the index can be constructed from a sample of the data.
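
Two of the improvements can be sketched in Python. The slides only say that minSup should depend on fragment size, so the linear schedule below (minSup 1 for single edges, rising to theta times the number of graphs at max_size edges) is a hypothetical choice for illustration. The inner-support filter rests on a simple argument: if a graph contains the query, each embedding of a feature f in the query maps to a distinct embedding of f in that graph, so graphs with fewer embeddings of f than the query can be removed from f's id list. The id-list-with-counts format is an assumption.

```python
def size_increasing_min_sup(size, max_size, theta, num_graphs):
    """Hypothetical linear schedule: minSup grows with fragment size (in edges)."""
    if size <= 1:
        return 1  # keep every small fragment that occurs at all
    top = theta * num_graphs
    frac = (min(size, max_size) - 1) / (max_size - 1)
    return max(1, round(1 + frac * (top - 1)))


def filter_by_inner_support(counts, query_count):
    """counts: {graph id: embeddings of feature f in that graph} (assumed format).

    A graph can contain the query only if it holds at least as many
    embeddings of f as the query itself does.
    """
    return {gid for gid, cnt in counts.items() if cnt >= query_count}


print([size_increasing_min_sup(s, 10, 0.1, 1000) for s in (1, 10)])  # [1, 100]
print(filter_by_inner_support({0: 1, 1: 3, 2: 2}, query_count=2))   # {1, 2}
```

Storing a count next to each graph id is what makes the id lists larger, which matches the trade-off noted on the slide.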

Maintenance
A small number of insertions or deletions affects only the id lists.
As the number of insertions grows, the size of the candidate sets indicates the quality of the current gIndex.
As the number of deletions grows, could some id lists become empty?
How can the quality of gIndex be kept after many changes?
Can gIndex be adjusted according to the trend of the latest queries?
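
The id-list-only updates described above can be sketched as a small Python class. It assumes the set of indexed features is fixed and that, for an inserted graph, the caller already knows which features it contains (a hypothetical precomputed input). Insertions and deletions touch only id lists, and empty_features reports lists that deletions have emptied.

```python
class IdListIndex:
    """Feature -> id list map with in-place maintenance (illustrative sketch)."""

    def __init__(self, id_lists):
        self.id_lists = {f: set(ids) for f, ids in id_lists.items()}

    def insert(self, gid, contained_features):
        # Only existing features are updated; the feature set itself is unchanged.
        for f in contained_features:
            if f in self.id_lists:
                self.id_lists[f].add(gid)

    def delete(self, gid):
        for ids in self.id_lists.values():
            ids.discard(gid)

    def empty_features(self):
        return sorted(f for f, ids in self.id_lists.items() if not ids)


idx = IdListIndex({"C-C": {0, 1}, "C-O": {0}})
idx.insert(2, ["C-C", "N-N"])  # "N-N" is not indexed, so it is ignored
idx.delete(0)
print(idx.empty_features())  # ['C-O']
```

Because new features are never added, a growing candidate-set size is the signal that the index should eventually be rebuilt.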

Thank you for your attention!
Questions?