Graph Indexing: A Frequent Structure-based Approach
Advisor: Prof. 曾新穆
Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞
Date: 2013/11/14

Outline
Ch1 Introduction
Ch2 Preliminaries
Ch3 Frequent Fragment
Ch4 Discriminative Fragment
Ch5 gIndex
Ch6 Experimental Result
Improvement
Maintenance

Ch1 Introduction

Ch1 Introduction
Build graph index: a path-based index is inefficient. Too many paths.

Ch1 Introduction
Build graph index: a graph-based index is suitable. Only one result.

Ch2 Preliminaries

Ch3 Frequent Fragment

Ch3 Frequent Fragment
minSup: 2. Fragments that meet minSup are indexed.

Ch3 Frequent Fragment
If query Q is frequent, we can easily find Q in the index.

Ch3 Frequent Fragment
What if query Q is not frequent?

Ch3 Frequent Fragment
Find the frequent subgraphs of Q!
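The lookup described on the Ch3 slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: fragments are assumed to be already extracted from each graph and named by hypothetical canonical strings such as "C-C-C", so no subgraph isomorphism test is shown, and `build_index` and `candidates` are illustrative names.

```python
def build_index(graph_fragments, min_sup):
    """Map each frequent fragment to its id list (ids of graphs containing it).

    graph_fragments: dict of graph id -> set of fragment codes found in that
    graph (assumed precomputed; the codes are hypothetical canonical strings).
    """
    id_lists = {}
    for gid, fragments in graph_fragments.items():
        for frag in fragments:
            id_lists.setdefault(frag, set()).add(gid)
    # Keep only fragments whose support (id list length) reaches min_sup.
    return {f: ids for f, ids in id_lists.items() if len(ids) >= min_sup}


def candidates(index, query_fragments, all_ids):
    """Intersect the id lists of every indexed fragment contained in the query."""
    result = set(all_ids)
    for frag in query_fragments:
        if frag in index:
            result &= index[frag]
    return result


# Toy database: graph id -> fragments it contains (made-up data).
db = {
    1: {"C-C", "C-C-C", "C-N", "C-C-C-N"},
    2: {"C-C", "C-C-C"},
    3: {"C-C", "C-N", "C-O"},
}
index = build_index(db, min_sup=2)

# The query "C-C-C-N" has support 1, so it is not indexed,
# but its subgraphs "C-C", "C-C-C" and "C-N" are.
query_fragments = {"C-C", "C-C-C", "C-N", "C-C-C-N"}
result = candidates(index, query_fragments, db.keys())
print(result)
```

Each candidate would still have to be verified against the query with a subgraph isomorphism test, which is omitted here.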

Ch4 Discriminative Fragment

Ch5 gIndex

Ch5 gIndex
5.1 Discriminative fragment selection
5.2 Index construction
5.3 Search

5.1 Discriminative fragment selection

5.2 Index construction
5.2.1 Graph Sequentialization
5.2.2 gIndex Tree
5.2.3 Remark on gIndex Tree Size
5.2.4 gIndex Tree Implementation

5.2.1 Graph Sequentialization
Adjacency matrices
DFS code
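As a rough illustration of the two sequentializations named on this slide, the sketch below builds an adjacency matrix and one DFS code for a small vertex-labeled graph. Several things here are assumptions made for the example: edge labels are omitted, the function names are hypothetical, and the code returned is the one produced by a single traversal from a chosen start vertex, not the minimum (canonical) DFS code that a real index would need.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 matrix for an undirected graph with n vertices."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1
    return m


def dfs_code(labels, edges, start=0):
    """One DFS code: a list of (i, j, label_i, label_j) entries, where i and j
    are DFS discovery indices. Backward edges of a vertex are emitted before
    its forward edges."""
    adj = {v: [] for v in range(len(labels))}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    order = {start: 0}      # vertex -> discovery index
    seen_edges = set()
    code = []

    def visit(v):
        # Backward edges: to already discovered vertices, lowest index first.
        for w in sorted((w for w in adj[v] if w in order), key=order.get):
            edge = frozenset((v, w))
            if edge not in seen_edges:
                seen_edges.add(edge)
                code.append((order[v], order[w], labels[v], labels[w]))
        # Forward edges: discover new vertices and recurse.
        for w in adj[v]:
            if w not in order:
                order[w] = len(order)
                seen_edges.add(frozenset((v, w)))
                code.append((order[v], order[w], labels[v], labels[w]))
                visit(w)

    visit(start)
    return code


# A triangle with vertex labels C, C, N (made-up example).
labels = ["C", "C", "N"]
edges = [(0, 1), (1, 2), (2, 0)]
matrix = adjacency_matrix(len(labels), edges)
code = dfs_code(labels, edges)
print(matrix)
print(code)
```

Different start vertices or neighbor orders give different codes for the same graph, which is why a canonical choice among them is needed before the code can serve as a lookup key.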

5.2.2 gIndex Tree
[Figure: tree of fragments growing from C-C to C-C-C, C-C-C-C, C-C-C-C-C, …]

5.2.3 Remark on gIndex Tree Size
[Figure labels: 0, 1, 2, …, K-1]

5.2.4 gIndex Tree Implementation

5.3 Search
5.3.1 Apriori Pruning
5.3.2 Maximum Discriminative Fragments

5.3.1 Apriori Pruning
If a fragment is not in the gIndex tree, we need not check its supergraphs.
A hash table H is used to facilitate Apriori pruning.
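The pruning rule above can be sketched on a deliberately simplified model. The assumptions are loud ones: fragments are treated as label chains (tuples such as `("C", "C", "N")`) instead of general graphs, a chain and its reverse are not unified, the hash table H is a Python set that is assumed to contain every prefix of each fragment it holds, and `indexed_fragments` is a hypothetical name.

```python
def indexed_fragments(query, H):
    """Enumerate chain fragments of the query that appear in H.

    Growth of a fragment stops as soon as it is missing from H, because every
    longer extension is a super-fragment and cannot be in H either.
    Returns the fragments found and the number of hash lookups performed.
    """
    found = set()
    lookups = 0
    for start in range(len(query) - 1):
        for end in range(start + 2, len(query) + 1):
            frag = tuple(query[start:end])
            lookups += 1
            if frag not in H:
                break  # Apriori pruning: skip all extensions of frag
            found.add(frag)
    return found, lookups


# Hypothetical contents of H and a hypothetical chain-shaped query.
H = {("C", "C"), ("C", "C", "C"), ("C", "N")}
query = ["C", "C", "C", "N", "O"]
found, lookups = indexed_fragments(query, H)
print(found, lookups)
```

For this five-label query there are 10 chain fragments with at least two labels, and the pruned enumeration performs 8 lookups; the saving grows with query size when many fragments are missing from H.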

5.3.2 Maximum Discriminative Fragments

Ch6 Experimental Result

Experimental Result
The performance of gIndex is compared with that of GraphGrep, a path-based approach.
Two kinds of datasets are used in the experiments:
- one real dataset
- a series of synthetic datasets

Dataset
The real dataset is an AIDS antiviral screen dataset of chemical compounds; it contains 43,905 classified chemical molecules.
The synthetic data generator was provided by Kuramochi et al. It allows the user to specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of seed graphs (I), and the number of distinct labels (L).

Experiment Background
The experiments are performed on a 1.5 GHz Intel PC with 1 GB of memory running RedHat 8.0.
Both GraphGrep and gIndex are compiled with gcc/g++.

AIDS Antiviral Screen Dataset

Experimental Result
The index size of gIndex is at least 10 times smaller than that of GraphGrep.
Two salient properties of gIndex: its index size is small and stable.

Experimental Result
|Cq|: the size of the candidate answer set Cq.
AVG(|Dq|): the lower bound of AVG(|Cq|), where Dq is the actual answer set of query q.
An algorithm achieving this lower bound matches the queries in the graph dataset precisely.
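The relationship between the two averages can be checked on a toy example. The candidate and answer sets below are made-up values, not results from the experiments, and `average_size` is a hypothetical helper. Since every true answer must survive filtering, each Dq is a subset of its Cq, which is why AVG(|Dq|) bounds AVG(|Cq|) from below.

```python
def average_size(sets):
    """Average cardinality over a list of sets, one set per query."""
    return sum(len(s) for s in sets) / len(sets)


# One entry per query: candidate set Cq and actual answer set Dq (made-up ids).
candidate_sets = [{1, 2, 3}, {2, 4}, {5}]
answer_sets = [{1, 3}, {2}, {5}]

# Filtering never discards a true answer, so Dq is always a subset of Cq.
all_subsets = all(d <= c for d, c in zip(answer_sets, candidate_sets))

avg_c = average_size(candidate_sets)
avg_d = average_size(answer_sets)
print(all_subsets, avg_c, avg_d)
```

The closer AVG(|Cq|) is to AVG(|Dq|), the fewer graphs have to go through the expensive verification step.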

Experimental Result
Q4: queries in Q4 are more likely to be path-structured.
(Query answer set size: smaller)

Experimental Result
(Query answer set size: larger)

Experimental Result

Experimental Result
The scalability of gIndex

Synthetic Dataset

Experimental Result
The synthetic dataset has 10,000 graphs and uses 1,000 seed fragments with 50 distinct labels. On average, each graph has 20 edges and each seed fragment has 10 edges.

Experimental Result

Improvement
Size-increasing support constraint
Relationship between minSup and the number of candidates:
- Large minSup -> fewer frequent fragments and a weaker pruning effect
- Small minSup -> fewer candidates, but the index size increases dramatically
So we must adopt a different minSup for each fragment size.

Inner Support
The previous idea doesn't take multiple embeddings of a feature in one graph into consideration.
Inner support: the number of embeddings of a subgraph in a graph.
It helps remove many impossible candidates within an id list, but the size of the id lists doubles.

Advantage of statistics
For a large graph database, index generation is time-consuming.
Instead, we can construct the index from a sample of the data.
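The size-increasing support constraint can be sketched as a threshold function that grows with fragment size. The linear form of `psi` below, its parameters `max_size` and `max_sup`, and the helper `is_frequent` are assumptions chosen for illustration; the slides only require that larger fragments face a higher minSup.

```python
def psi(size, max_size, max_sup):
    """Hypothetical size-increasing support threshold.

    Grows linearly with fragment size and reaches max_sup at max_size.
    Integer ceiling is used, and the threshold is never below 1.
    """
    return max(1, (max_sup * size + max_size - 1) // max_size)


def is_frequent(support, size, max_size=10, max_sup=100):
    """A fragment is kept when its support reaches the threshold for its size."""
    return support >= psi(size, max_size, max_sup)


thresholds = [psi(size, 10, 100) for size in range(1, 11)]
print(thresholds)

# The same support of 20 is enough for a small fragment but not for a larger one.
small_ok = is_frequent(support=20, size=2)
large_ok = is_frequent(support=20, size=5)
print(small_ok, large_ok)
```

With a low threshold for small fragments, the index keeps the small, widely shared structures that almost every query contains, while the high threshold for large fragments keeps the index from growing dramatically.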

Maintenance
A small number of insertions or deletions affects only the id lists.
As the number of insertions grows, the size of the candidate sets indicates the quality of the current gIndex.
As the number of deletions grows, might some id lists become empty?
How can we keep the quality of gIndex after a lot of changes?
Can we adjust gIndex according to the trend of the latest queries?
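The first point, that small updates touch only the id lists, can be sketched as follows. The index is modeled as a plain dictionary from hypothetical fragment codes to id sets, the fragments of a new graph are assumed to be already extracted, and all function names are illustrative.

```python
def insert_graph(index, gid, fragments):
    """Add gid to the id list of every indexed fragment the new graph contains.
    Fragments that are not indexed are ignored; the fragment set stays fixed."""
    for frag in fragments:
        if frag in index:
            index[frag].add(gid)


def delete_graph(index, gid):
    """Remove gid from every id list."""
    for ids in index.values():
        ids.discard(gid)


def empty_fragments(index):
    """Indexed fragments whose id list has become empty after deletions."""
    return sorted(f for f, ids in index.items() if not ids)


index = {"C-C": {1, 2}, "C-N": {2}}
insert_graph(index, 3, {"C-C", "C-O"})  # "C-O" is not indexed, so it is skipped
delete_graph(index, 2)
stale = empty_fragments(index)
print(index, stale)
```

In this toy run the fragment "C-N" ends up with an empty id list, which is the situation the slide asks about: the set of indexed fragments no longer reflects the data, and at some point the index would need to be rebuilt or adjusted.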

Thank you for your attention! Questions?
