Published by Clifton Daniels. Modified over 9 years ago.
Graph Indexing: A Frequent Structure-based Approach. Advisor: Prof. 曾新穆. Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞. Date: 2013/11/14
Outline: Ch1 Introduction; Ch2 Preliminaries; Ch3 Frequent Fragment; Ch4 Discriminative Fragment; Ch5 gIndex; Ch6 Experimental Result; Improvement; Maintenance
Ch1 Introduction
Ch1 Introduction: Build graph index. A path-based index is inefficient (figure note: too many paths).
Ch1 Introduction: Build graph index. A graph-based index is suitable (figure note: only one result).
Ch2 Preliminaries
Ch3 Frequent Fragment: minSup = 2 (figure label: indexed).
Ch3 Frequent Fragment: If query Q is frequent, Q is indexed and we can find it easily.
Ch3 Frequent Fragment: What if query Q is not frequent?
Ch3 Frequent Fragment: Find the frequent subgraphs of Q!
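The lookup described on these slides can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: it assumes fragments have already been reduced to canonical strings and that each indexed fragment maps to the id list of the graphs containing it. The names `index` and `candidate_set` and the toy fragment strings are hypothetical. If Q itself is indexed, its id list is returned directly; otherwise the id lists of Q's indexed subgraphs are intersected, giving a candidate set that still has to be verified.

```python
from functools import reduce

def candidate_set(query_fragments, index, all_graph_ids):
    """Intersect the id lists of the query's indexed fragments.

    query_fragments: canonical strings of the query's subgraphs
                     (assumed to be enumerated elsewhere).
    index:           fragment string -> set of graph ids containing it.
    """
    id_lists = [index[f] for f in query_fragments if f in index]
    if not id_lists:
        # No indexed fragment occurs in the query, so nothing can be pruned.
        return set(all_graph_ids)
    return reduce(set.intersection, id_lists)

# Toy index (hypothetical data): fragment -> graphs that contain it.
index = {
    "C-C": {1, 2, 3, 4},
    "C-N": {2, 3, 4},
    "C-C-N": {2, 4},
}
all_ids = {1, 2, 3, 4, 5}

# Frequent query: it is indexed, so its own id list comes back.
frequent_answer = candidate_set(["C-C-N"], index, all_ids)

# Infrequent query "C-C-N-O": only some of its subgraphs are indexed.
candidates = candidate_set(["C-C", "C-N", "C-C-N", "C-C-N-O"], index, all_ids)
```

The candidate set for the infrequent query is only an upper bound on the answer; each candidate graph would still need a subgraph isomorphism test against Q.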
Ch4 Discriminative Fragment
Ch5 gIndex: 5.1 Discriminative fragment selection; 5.2 Index construction; 5.3 Search
5.1 Discriminative fragment selection
5.2 Index construction: 5.2.1 Graph Sequentialization; 5.2.2 gIndex Tree; 5.2.3 Remark on gIndex Tree Size; 5.2.4 gIndex Tree Implementation
5.2.1 Graph Sequentialization: adjacency matrices; DFS code
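As a rough illustration of sequentialization, a small labelled graph can be turned into a string by flattening its adjacency matrix, and the smallest such string over all vertex orderings can serve as a canonical form. This brute-force sketch is an assumption-laden toy: the function `canonical_code` is hypothetical, it assumes single-character vertex labels and unlabelled undirected edges, and it is exponential in the number of vertices. It is not the DFS coding named on the slide.

```python
from itertools import permutations

def canonical_code(labels, edges):
    """Smallest flattened adjacency-matrix string over all vertex orders.

    labels: list of single-character vertex labels.
    edges:  set of undirected (u, v) vertex-index pairs.
    Diagonal cells hold vertex labels; off-diagonal cells hold "1" or "0".
    """
    n = len(labels)
    adjacent = {frozenset(e) for e in edges}
    best = None
    for order in permutations(range(n)):
        cells = []
        for i in range(n):
            for j in range(i + 1):  # lower triangle suffices (undirected)
                if i == j:
                    cells.append(labels[order[i]])
                else:
                    pair = frozenset((order[i], order[j]))
                    cells.append("1" if pair in adjacent else "0")
        code = "".join(cells)
        if best is None or code < best:
            best = code
    return best

# Two vertex numberings of the same C-C-N path get the same code.
g1 = canonical_code(["C", "C", "N"], {(0, 1), (1, 2)})
g2 = canonical_code(["N", "C", "C"], {(0, 1), (1, 2)})
# A different graph (C-N-C) gets a different code.
g3 = canonical_code(["C", "N", "C"], {(0, 1), (1, 2)})
```

Because isomorphic graphs map to the same string, such a code can be used as a hash key or as a label in a prefix tree.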
5.2.2 gIndex Tree [Figure: tree of fragments with nodes such as C-C, C-C-C, C-C-C-C, C-C-C-C-C, ...]
5.2.3 Remark on gIndex Tree Size [Figure: tree levels 0, 1, 2, …, K-1]
5.2.4 gIndex Tree Implementation
5.3 Search: 5.3.1 Apriori Pruning; 5.3.2 Maximum Discriminative Fragments
5.3.1 Apriori Pruning: If a fragment is not in the gIndex tree, we need not check its supergraphs. A hash table H is used to facilitate the Apriori pruning.
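The pruning rule can be sketched as follows. This is a simplified illustration under loud assumptions: vertex names are taken to be unique, so a fragment can be identified by its set of edges (a real index would key H by a canonical code instead), and the names `indexed_fragments`, `H`, and `lookups` are hypothetical. Fragments of the query are grown one edge at a time, and a fragment that is missing from H is never extended.

```python
def indexed_fragments(query_edges, H):
    """Return the query's connected fragments found in H, plus the number
    of hash-table lookups performed.

    query_edges: list of (u, v) edges of the query graph.
    H:           set of frozensets of edges (stand-in for the hash table).
    """
    found = set()
    seen = set()
    lookups = 0
    stack = [frozenset([e]) for e in query_edges]
    while stack:
        frag = stack.pop()
        if frag in seen:
            continue
        seen.add(frag)
        lookups += 1
        if frag not in H:
            # Apriori pruning: do not grow a fragment that is not indexed.
            continue
        found.add(frag)
        vertices = {v for edge in frag for v in edge}
        for e in query_edges:
            # Extend only with edges that touch the fragment (stay connected).
            if e not in frag and (e[0] in vertices or e[1] in vertices):
                stack.append(frag | {e})
    return found, lookups

# Query: the path a-b-c-d-e (hypothetical example).
query = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
H = {
    frozenset([("a", "b")]),
    frozenset([("b", "c")]),
    frozenset([("a", "b"), ("b", "c")]),
}
found, lookups = indexed_fragments(query, H)
```

In this toy query there are 10 connected fragments, but fragments such as c-d-e are never looked up because their sub-fragment c-d is already missing from H.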
5.3.2 Maximum Discriminative Fragments
Ch6 Experimental Result
Experimental Result: The performance of gIndex is compared with that of GraphGrep, a path-based approach. Two kinds of datasets are used in the experiments: one real dataset and a series of synthetic datasets.
Dataset: The real dataset is an AIDS antiviral screen dataset of chemical compounds; it contains 43,905 classified chemical molecules. The synthetic data generator was provided by Kuramochi et al. It allows the user to specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of seed graphs (I), and the number of distinct labels (L).
Experiment Background: Experiments are performed on a 1.5 GHz Intel PC with 1 GB of memory running RedHat 8.0. Both GraphGrep and gIndex are compiled with gcc/g++.
AIDS Antiviral Screen Dataset
Experimental Result: The index size of gIndex is at least 10 times smaller than that of GraphGrep. Two salient properties of gIndex: its index size is small and stable.
Experimental Result: |Cq| is the size of the candidate answer set Cq. AVG(|Dq|) is the lower bound of AVG(|Cq|), where Dq is the set of graphs that actually contain q. An algorithm achieving this lower bound matches the queries in the graph dataset precisely.
Experimental Result, Q4: queries in Q4 are more likely to be path-structured (smaller query answer set size).
Experimental Result (larger query answer set size).
Experimental Result: The scalability of gIndex
Synthetic Dataset
Experimental Result: The synthetic dataset has 10,000 graphs and uses 1,000 seed fragments with 50 distinct labels. On average, each graph has 20 edges and each seed fragment has 10 edges.
Improvement
Size-increasing support constraint. Relationship between minSup and the number of candidates: a large minSup gives fewer frequent fragments and a weaker pruning effect; a small minSup gives fewer candidates, but the index size increases dramatically. So we must adopt a different minSup for each fragment size.
Inner support. The previous idea does not take multiple embeddings of a feature in one graph into consideration. Inner support is the number of embeddings of a subgraph in a graph. It helps remove many impossible candidates within an id list, but the size of the id lists doubles.
Advantage of statistics. For a large graph database, index generation is time-consuming. Instead, we can construct the index from a sample of the data.
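A size-increasing support constraint can be sketched as a threshold function that grows with fragment size. The particular function `psi` below, its parameters (`max_size`, `theta`, `num_graphs`), and the toy fragment data are assumptions chosen for illustration; the slides do not state which function is used.

```python
import math

def psi(size, max_size, theta, num_graphs):
    """Hypothetical size-increasing support threshold.

    Very small fragments need support 1; larger ones need a threshold
    that grows with the square root of their relative size.
    """
    if size < 4:
        return 1
    return math.sqrt(size / max_size) * theta * num_graphs

def select_frequent(fragments, max_size, theta, num_graphs):
    """Keep fragments whose support meets the threshold for their size.

    fragments: name -> (size in edges, support).
    """
    return {
        name
        for name, (size, support) in fragments.items()
        if support >= psi(size, max_size, theta, num_graphs)
    }

# Toy data (hypothetical): 1,000 graphs, fragments of up to 10 edges.
fragments = {
    "f2": (2, 3),     # tiny fragment, threshold is 1
    "f4a": (4, 50),   # threshold is about 63
    "f4b": (4, 70),
    "f10": (10, 90),  # threshold is 100
}
selected = select_frequent(fragments, max_size=10, theta=0.1, num_graphs=1000)
```

With a threshold of this shape, small fragments are almost always indexed while large fragments must be genuinely common, which keeps the index from growing dramatically.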
Maintenance
A small number of insertions/deletions affects only the id lists. When the number of insertions increases, the size of the candidate sets indicates the quality of the current gIndex. When the number of deletions increases, could some id lists become empty? How do we keep the quality of gIndex after a lot of changes? Can we adjust gIndex according to the trend of the latest queries?
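The first maintenance point, that a few insertions or deletions touch only the id lists, can be sketched as follows. This is a minimal illustration assuming the index is a mapping from fragment strings to sets of graph ids and that the fragments contained in a new graph are computed elsewhere; the function names are hypothetical.

```python
def insert_graph(index, graph_id, contained_fragments):
    """Add graph_id to the id list of every indexed fragment it contains.

    The set of indexed fragments itself is left unchanged.
    """
    for frag in contained_fragments:
        if frag in index:
            index[frag].add(graph_id)

def delete_graph(index, graph_id):
    """Remove graph_id from every id list."""
    for ids in index.values():
        ids.discard(graph_id)

def empty_id_lists(index):
    """Fragments whose id list has become empty after deletions."""
    return sorted(frag for frag, ids in index.items() if not ids)

# Toy index (hypothetical data).
index = {"C-C": {1, 2}, "C-N": {2}}

insert_graph(index, 3, ["C-C", "C-O"])  # "C-O" is not indexed, so it is ignored
delete_graph(index, 2)
stale = empty_id_lists(index)
```

After the deletion, the id list of "C-N" is empty, which is the situation the slide asks about; deciding when to rebuild or re-select fragments is left open here, as it is on the slide.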
Thank you for your attention! Questions?