Graph Indexing: A Frequent Structure-based Approach
Advisor: Prof. 曾新穆
Members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞
Date: 2013/11/14
Outline
- Ch1 Introduction
- Ch2 Preliminaries
- Ch3 Frequent Fragment
- Ch4 Discriminative Fragment
- Ch5 gIndex
- Ch6 Experimental Result
- Improvement
- Maintenance
Ch1 Introduction
Ch1 Introduction Build graph index: a path-based index is inefficient (too many paths).
Ch1 Introduction Build graph index: a graph-based index is suitable (only one result).
Ch2 Preliminaries
Ch3 Frequent Fragment
Ch3 Frequent Fragment Example with minSup = 2: fragments that reach minSup are indexed.
Ch3 Frequent Fragment If query Q is frequent, Q itself is indexed and can be found easily.
Ch3 Frequent Fragment What if query Q is not frequent?
Ch3 Frequent Fragment Find the frequent subgraphs of Q!
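The idea of answering an infrequent query through its frequent subgraphs can be sketched as an intersection of id lists. This is a minimal sketch, not the authors' implementation: the toy `index` dictionary, the string fragment labels, and the function name `candidate_set` are all assumptions made for illustration, and enumerating the fragments of Q is taken as given.

```python
from typing import Dict, Iterable, Set


def candidate_set(index: Dict[str, Set[int]],
                  query_fragments: Iterable[str],
                  all_ids: Set[int]) -> Set[int]:
    """Intersect the id lists of every indexed fragment found in the query.

    `index` maps a fragment label (hypothetical string form) to the ids of
    the graphs that contain it. Fragments that are not indexed are skipped,
    so they give no pruning.
    """
    candidates = set(all_ids)
    for fragment in query_fragments:
        id_list = index.get(fragment)
        if id_list is not None:
            candidates &= id_list
    return candidates


# Toy data (assumed): three indexed fragments over graphs 1..4.
index = {"C-C": {1, 2, 3}, "C-C-C": {1, 2}, "C-N": {2, 3}}
result = candidate_set(index, ["C-C", "C-C-C", "C-N", "C-O"], {1, 2, 3, 4})
```

Every graph left in `result` still has to be verified against Q with a subgraph isomorphism test; the intersection only narrows the candidates.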
Ch4 Discriminative Fragment
Ch5 gIndex
Ch5 gIndex
- 5.1 Discriminative fragment selection
- 5.2 Index construction
- 5.3 Search
5.1 Discriminative fragment selection
5.2 Index construction
- 5.2.1 Graph Sequentialization
- 5.2.2 gIndex Tree
- 5.2.3 Remark on gIndex Tree Size
- 5.2.4 gIndex Tree Implementation
5.2.1 Graph Sequentialization
- Adjacency matrices
- DFS code
5.2.2 gIndex Tree
[Figure: tree of fragments ordered by size, e.g. C-C, C-C-C, C-C-C-C, C-C-C-C-C, …]
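A tree of fragments like this can be pictured as a prefix tree over edge sequences. The sketch below is only an illustration under stated assumptions: fragments are given as tuples of edge labels (a stand-in for a real sequentialized code), and the class names `Node` and `FragmentTree` are hypothetical, not taken from the slides.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class Node:
    """One tree node; it stands for the fragment spelled by the path to it."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        # Set only when the fragment is selected as an index feature.
        self.id_list: Optional[Set[int]] = None


class FragmentTree:
    """Prefix tree keyed by edge sequences (assumed fragment encoding)."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, code: Tuple[str, ...], id_list: Iterable[int]) -> None:
        node = self.root
        for edge in code:
            # Intermediate nodes are created without an id list.
            node = node.children.setdefault(edge, Node())
        node.id_list = set(id_list)

    def lookup(self, code: Tuple[str, ...]) -> Optional[Set[int]]:
        """Return the id list, or None if the fragment is absent or is only
        an intermediate node that carries no id list."""
        node = self.root
        for edge in code:
            child = node.children.get(edge)
            if child is None:
                return None
            node = child
        return node.id_list


tree = FragmentTree()
tree.insert(("C-C",), {1, 2, 3})
tree.insert(("C-C", "C-C", "C-C"), {1})
```

Here the two-edge fragment exists only as an intermediate node, so it costs one tree node but stores no id list.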
5.2.3 Remark on gIndex Tree Size
5.2.4 gIndex Tree Implementation
5.3 Search
- Apriori Pruning
- Maximum Discriminative Fragments
5.3.1 Apriori Pruning
- If a fragment is not in the gIndex tree, we need not check its supergraphs any more.
- A hash table H is used to facilitate the Apriori pruning.
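The pruning rule can be sketched as a level-wise enumeration of the query's fragments that stops growing any fragment missing from H. This is a simplified sketch under loud assumptions: query vertices have unique names, so a sorted tuple of edges serves as the fragment key in place of a canonical code, and `H` is a plain Python set. The function name `indexed_fragments` is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]
Key = Tuple[Edge, ...]


def fragment_key(fragment: Iterable[Edge]) -> Key:
    """Assumed stand-in for a canonical code: the sorted edge tuple."""
    return tuple(sorted(fragment))


def indexed_fragments(query_edges: List[Edge], H: Set[Key]) -> Set[FrozenSet[Edge]]:
    """Enumerate connected fragments of the query that appear in H.

    A fragment that is not in H is never extended, so none of its
    supergraphs is generated from it (Apriori pruning).
    """
    level = {frozenset([e]) for e in query_edges if fragment_key([e]) in H}
    found = set(level)
    while level:
        next_level = set()
        for fragment in level:
            vertices = {v for edge in fragment for v in edge}
            for edge in query_edges:
                # Grow only by an edge that touches the fragment.
                if edge in fragment or not (set(edge) & vertices):
                    continue
                grown = fragment | {edge}
                if fragment_key(grown) in H:
                    next_level.add(grown)
        found |= next_level
        level = next_level
    return found


# Toy query path a-b-c-d; edge (c, d) and everything containing it is not in H.
query = [("a", "b"), ("b", "c"), ("c", "d")]
H = {(("a", "b"),), (("b", "c"),), (("a", "b"), ("b", "c"))}
hits = indexed_fragments(query, H)
```

Because `(c, d)` fails the lookup at size one, no two-edge or three-edge fragment is ever grown from it.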
5.3.2 Maximum Discriminative Fragments
Ch6 Experimental Result
Experimental Result
- The performance of gIndex is compared with that of GraphGrep, a path-based approach.
- Two kinds of datasets are used in the experiments:
  - one real dataset
  - a series of synthetic datasets
Dataset
- The real dataset is an AIDS antiviral screen dataset containing 43,905 classified chemical molecules.
- The synthetic data generator was provided by Kuramochi et al. It allows the user to specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of seed graphs (I), and the number of distinct labels (L).
Experiment Background
- Experiments are performed on a 1.5 GHz, 1 GB-memory Intel PC running RedHat 8.0.
- Both GraphGrep and gIndex are compiled with gcc/g++.
AIDS Antiviral Screen Dataset
Experimental Result
- The index size of gIndex is at least 10 times smaller than that of GraphGrep.
- Two salient properties of gIndex: its index size is small and stable.
Experimental Result
- |Cq|: the size of the candidate answer set Cq.
- AVG(|Dq|): the lower bound of AVG(|Cq|), where Dq is the actual answer set of query q.
- An algorithm achieving this lower bound matches the queries in the graph dataset precisely.
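The relation between the two averages can be illustrated with a few lines of code. This is a sketch on made-up toy data, not results from the experiments; the function name `average_sizes` and the dictionary layout are assumptions.

```python
from typing import Dict, Set, Tuple


def average_sizes(candidates: Dict[str, Set[int]],
                  answers: Dict[str, Set[int]]) -> Tuple[float, float]:
    """Return (AVG(|Cq|), AVG(|Dq|)) over a batch of queries.

    Every answer set must be contained in its candidate set, which is
    why AVG(|Dq|) can never exceed AVG(|Cq|).
    """
    if candidates.keys() != answers.keys():
        raise ValueError("both mappings must cover the same queries")
    for q, answer in answers.items():
        if not answer <= candidates[q]:
            raise ValueError(f"candidate set of {q} misses a true answer")
    n = len(answers)
    avg_c = sum(len(c) for c in candidates.values()) / n
    avg_d = sum(len(d) for d in answers.values()) / n
    return avg_c, avg_d


# Toy data (assumed): q2 is filtered precisely, q1 keeps two false positives.
toy_candidates = {"q1": {1, 2, 3}, "q2": {4}}
toy_answers = {"q1": {1}, "q2": {4}}
avg_c, avg_d = average_sizes(toy_candidates, toy_answers)
```

The closer `avg_c` gets to `avg_d`, the fewer graphs have to be checked in the verification step.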
Experimental Result Queries in Q4 are more likely to be path-structured. (Smaller query answer set size)
Experimental Result (Larger query answer set size)
Experimental Result The scalability of gIndex
Synthetic Dataset
Experimental Result The synthetic dataset has 10,000 graphs and uses 1,000 seed fragments with 50 distinct labels. On average, each graph has 20 edges and each seed fragment has 10 edges.
Improvement
Size-increasing support constraint
- Relationship between minSup and the number of candidates:
  - Large minSup -> fewer frequent fragments and less pruning effect
  - Small minSup -> fewer candidates, but the index size increases dramatically
- So we must adopt a different minSup for each fragment size.
Inner support
- The previous idea does not take multiple embeddings of a feature in one graph into consideration.
- Inner support: the number of embeddings of a subgraph in a graph.
- It helps remove many impossible candidates within an id list, but the size of the id lists doubles.
Advantage of statistics
- For a large graph database, index generation is time-consuming.
- Instead, we can construct the index from a sample of the data.
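The first two ideas above can be sketched in a few lines. Both functions are illustrative assumptions: the linear ramp in `psi` is just one possible nondecreasing threshold (the slides do not give its shape), and the id list is modeled as a dictionary from graph id to embedding count.

```python
from typing import Dict, Set


def psi(size: int, max_size: int, max_sup: int) -> int:
    """Hypothetical size-increasing threshold: a linear ramp up to max_sup.

    Small fragments get a low minSup, large fragments a high one.
    """
    size = min(size, max_size)
    return max(1, -(-max_sup * size // max_size))  # ceiling division


def is_frequent(support: int, size: int, max_size: int, max_sup: int) -> bool:
    """A fragment counts as frequent if it meets the minSup of its size."""
    return support >= psi(size, max_size, max_sup)


def filter_by_inner_support(id_list: Dict[int, int],
                            query_inner_support: int) -> Set[int]:
    """Keep graphs with at least as many embeddings of the feature as the
    query has. `id_list` maps graph id -> embedding count (assumed layout).
    """
    return {gid for gid, count in id_list.items()
            if count >= query_inner_support}


thresholds = [psi(size, max_size=10, max_sup=100) for size in range(1, 11)]
kept = filter_by_inner_support({1: 1, 2: 3, 3: 2}, query_inner_support=2)
```

Storing one count next to each graph id is what doubles the size of the id lists.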
Maintenance
- A small number of insertions/deletions affects only the id lists.
- As the number of insertions increases, the size of the candidate sets indicates the quality of the current gIndex.
- As the number of deletions increases, might some id lists become empty?
- How can the quality of gIndex be kept after many changes?
- Can we adjust gIndex according to the trend of the latest queries?
Thank you for your attention! Questions?