1
Constructing Signature Graphs for Signature Files
Dr. Yangjun Chen
Dept. Applied Computer Science
University of Winnipeg, Canada
2
- Motivation
- Signature Files as Indexes
- Signature Graph
  - Signature Graph and its Construction
  - Searching a Signature Graph
- Maintenance of Signature Graph
- Summary and Future Work
3
Motivation
- Establish indexes to speed up query evaluation: B+-trees, inverted files, signature files
- Signature files: simple and easy to maintain
- Signature graphs: less time for searching
4
Signature Files as Indexes
Definition: A signature for a key word or an attribute value is a hash-coded bit string.
Signature construction
- Important parameters:
  m: number of 1s in the bit string
  F: length of the bit string
  D: size of a block (or average number of key words of an element)
- Optimal choice of the parameters: $F \ln 2 = m D$
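As a rough illustration of the relation $F \ln 2 = m D$, the sketch below solves it for m once F and D are fixed. The function name `optimal_weight` and the sample values F = 512, D = 40 are illustrative assumptions, not taken from the slides.

```python
import math

def optimal_weight(F: int, D: int) -> float:
    """Solve F * ln 2 = m * D for m (hypothetical helper name)."""
    return F * math.log(2) / D

# Sample values chosen only for illustration.
m = optimal_weight(F=512, D=40)
print(round(m, 2))
```

In practice m has to be an integer, so the computed value would be rounded to a nearby whole number.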
5
Example (constructing a signature for a word with m = 4 and F = 12):
word: “database”
letter triplets: dat, ata, tab, aba, bas, ase
H(dat) = 5, H(ata) = 1, H(tab) = 8, H(aba) = 1, H(bas) = 10, H(ase) = 8
signature: 100 010 010 100
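The example can be replayed with a short sketch. The hash function H is not specified on the slide, so the code below simply takes the six hash values from the slide as a lookup table. The helper names `triplets` and `word_signature` are hypothetical, and bit positions are assumed to be 1-based and counted from the left, which is the reading that reproduces the signature shown.

```python
def triplets(word):
    """All consecutive letter triplets of a word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]

def word_signature(positions, F):
    """Set the given 1-based bit positions (from the left) in an F-bit string."""
    bits = ["0"] * F
    for pos in positions:
        bits[pos - 1] = "1"
    return "".join(bits)

# Hash values copied from the slide; H itself is not given there.
H = {"dat": 5, "ata": 1, "tab": 8, "aba": 1, "bas": 10, "ase": 8}

sig = word_signature([H[t] for t in triplets("database")], F=12)
print(sig)  # 100010010100
```

Two pairs of triplets collide (positions 1 and 8), which is why six triplets yield exactly m = 4 set bits here.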
6
Signature Files as Indexes
text: … SGML … database … information …

word signatures:
  SGML          010000100110
  database      100010010100
  information   010100011000
  object signature (OS): 110110111110

queries, query signatures and results:
  SGML          010000100110   match with OS
  XML           011000100100   no match with OS
  informatik    110100100000   false drop
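The slide's numbers can be checked with a minimal sketch. It assumes that the object signature is the bitwise OR of the word signatures and that a query matches when every 1 in the query signature is also set in the object signature. The function names `superimpose` and `matches` are hypothetical.

```python
def superimpose(signatures):
    """Bitwise OR of equal-length bit strings."""
    width = len(signatures[0])
    value = 0
    for s in signatures:
        value |= int(s, 2)
    return format(value, f"0{width}b")

def matches(query_sig, object_sig):
    """True if every 1 bit of the query is also set in the object signature."""
    q, o = int(query_sig, 2), int(object_sig, 2)
    return q & o == q

words = {"SGML": "010000100110",
         "database": "100010010100",
         "information": "010100011000"}
OS = superimpose(list(words.values()))
print(OS)                           # 110110111110
print(matches("010000100110", OS))  # SGML: True
print(matches("011000100100", OS))  # XML: False
print(matches("110100100000", OS))  # informatik: True, although the word is absent
```

The last query is the false drop: its signature is covered by OS even though "informatik" does not occur in the text, so a signature match still has to be confirmed against the text itself.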
7
Example:
relation (attributes: name, sex, ...): John, male, ...
query: John male
query signature: 1010 0101
8
Signature Graph
Consider a signature $s_i$ of length $F$. We denote it as $s_i = s_i[1]\,s_i[2] \ldots s_i[F]$, where each $s_i[j] \in \{0, 1\}$ ($j = 1, \ldots, F$). We also use $s_i(j_1, \ldots, j_h)$ to denote a sequence of pairs w.r.t. $s_i$: $(j_1, s_i[j_1])(j_2, s_i[j_2]) \ldots (j_h, s_i[j_h])$, where $1 \le j_k \le F$ for $k \in \{1, \ldots, h\}$.
Definition (signature identifier): Let $S = s_1.s_2.\ldots.s_n$ denote a signature file. Consider $s_i$ ($1 \le i \le n$). If there exists a sequence $j_1, \ldots, j_h$ such that for any $k \ne i$ ($1 \le k \le n$) we have $s_i(j_1, \ldots, j_h) \ne s_k(j_1, \ldots, j_h)$, then we say that $s_i(j_1, \ldots, j_h)$ identifies the signature $s_i$, or that $s_i(j_1, \ldots, j_h)$ is an identifier of $s_i$.
9
Example:
$s_8(5, 1, 4) = (5, 1)(1, 1)(4, 0)$
(* For any $i \ne 8$ we have $s_i(5, 1, 4) \ne s_8(5, 1, 4)$. For instance, $s_5(5, 1, 4) = (5, 0)(1, 0)(4, 1) \ne s_8(5, 1, 4)$, $s_2(5, 1, 4) = (5, 1)(1, 1)(4, 1) \ne s_8(5, 1, 4)$, and so on. *)
$s_1(5, 4, 1) = (5, 0)(4, 1)(1, 1)$
(* For any $i \ne 1$ we have $s_i(5, 4, 1) \ne s_1(5, 4, 1)$. *)
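The identifier property can be checked mechanically. The sketch below uses the eight signatures listed on the signature graph slide; the helper names `project` and `is_identifier` are hypothetical, and bit positions are taken as 1-based from the left.

```python
# Signature file from the signature graph slide (spaces removed).
S = {1: "10110110", 2: "10111001", 3: "10100111", 4: "01110110",
     5: "01110101", 6: "01011100", 7: "11100100", 8: "10101011"}

def project(sig, positions):
    """s(j1, ..., jh): the sequence of (position, bit) pairs."""
    return tuple((j, int(sig[j - 1])) for j in positions)

def is_identifier(i, positions, sigs):
    """True if no other signature agrees with s_i on all given positions."""
    target = project(sigs[i], positions)
    return all(project(sigs[k], positions) != target for k in sigs if k != i)

print(project(S[8], (5, 1, 4)))        # ((5, 1), (1, 1), (4, 0))
print(is_identifier(8, (5, 1, 4), S))  # True
print(is_identifier(1, (5, 4, 1), S))  # True
print(is_identifier(8, (5,), S))       # False: s2 and s6 also have bit 5 set
```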
10
Signature Graph
Definition (signature graph): A signature graph G for a signature file $S = s_1.s_2.\ldots.s_n$, where $s_i \ne s_j$ for $i \ne j$ and $|s_k| = F$ for $k = 1, \ldots, n$, is a graph $G = (V, E)$ such that
1. Each node $v \in V$ is of the form (p, skip), where p is a pointer to a signature s in S, and skip is a non-negative integer i. If i > 0, the i-th bit of the query signature $s_q$ will be checked when searching. If i = 0, s will be compared with $s_q$.
2. Let $e = (u, v) \in E$. Then e is labeled with 0 or 1 and skip(u) > 0. Let skip(u) = i. If e is labeled with 0, the i-th bit of the signature pointed to by p(v) is 0. If e is labeled with 1, the i-th bit of the signature pointed to by p(v) is 1. A node v with skip(v) = 0 does not have any children.
11
[Figure: signature graph for the signature file below. Node labels (pointer, skip): (p2, 5), (p3, 4), (p4, 1), (p1, 0), (p5, 3), (p6, 1), (p7, 2), (p8, 4); each edge is labeled 0 or 1.]
Signature file:
S1: 1011 0110
S2: 1011 1001
S3: 1010 0111
S4: 0111 0110
S5: 0111 0101
S6: 0101 1100
S7: 1110 0100
S8: 1010 1011
12
Construction of signature graph:
[Figure: the signature graph after each of the steps Insert $s_1$, Insert $s_2$, Insert $s_3$ and Insert $s_4$.]
13
[Figure: the signature graph after each of the steps Insert $s_5$, Insert $s_6$, Insert $s_7$ and Insert $s_8$.]
14
Signature Graph
Searching a signature graph
Let $s_q(i)$ denote the i-th bit of the query signature $s_q$. During the traversal of a signature graph, inexact matching can be done as follows:
(i) Let v be the node encountered and $s_q(i)$ be the bit to be checked.
(ii) If $s_q(i) = 1$, we move to the right child of v.
(iii) If $s_q(i) = 0$, both the right and the left child of v will be visited.
(iv) A search along a path stops when it reaches a node without any child node, or a node that is encountered for the second time.
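Rules (i) to (iv) can be sketched on a small hand-built graph. Everything about the graph below is an assumption made for illustration: it covers only $s_1$ to $s_4$, its edges are one reading of the construction figures (the right child is taken to be the edge labeled 1, and several edges loop back to their own node), and the signature compared at a stopping node is assumed to be the one that node points to. The names `Node`, `covers` and `search` are hypothetical.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    sig: str               # signature the node points to
    skip: int              # 1-based bit to test; 0 means "compare now"
    child0: Optional[str]  # target of the edge labeled 0
    child1: Optional[str]  # target of the edge labeled 1

# Assumed graph for s1..s4; not copied from the slides.
GRAPH = {
    "p2": Node("10111001", 5, "p3", "p2"),
    "p3": Node("10100111", 4, "p3", "p4"),
    "p4": Node("01110110", 1, "p4", "p1"),
    "p1": Node("10110110", 0, None, None),
}

def covers(sig, sq):
    """True if sig has a 1 wherever the query signature sq has a 1."""
    return all(s == "1" for s, q in zip(sig, sq) if q == "1")

def search(graph, root, sq):
    hits = set()
    stack = [(root, frozenset())]  # (node name, nodes already seen on this path)
    while stack:
        name, seen = stack.pop()
        node = graph[name]
        if node.skip == 0 or name in seen:  # rule (iv): stop and compare
            if covers(node.sig, sq):
                hits.add(name)
            continue
        seen = seen | {name}
        if sq[node.skip - 1] == "0":        # rule (iii): also follow the 0 edge
            stack.append((node.child0, seen))
        stack.append((node.child1, seen))   # rules (ii) and (iii): follow the 1 edge
    return sorted(hits)

print(search(GRAPH, "p2", "10100000"))  # ['p1', 'p2', 'p3']
print(search(GRAPH, "p2", "00001000"))  # ['p2']
```

With the second query only one edge is followed, since bit 5 of the query is 1; this is where the saving over a sequential scan of the signature file comes from.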
15
Signature Graph
[Figure: the signature graph with one node labeled "marked".]
16
Maintenance of Signature Graph
- Insertion of a signature s into G: same as the construction of a signature graph.
- Deletion of a signature s from G:
(i) Search G from the root until a node v is encountered that is marked or has skip(v) = 0.
(ii) If skip(v) = 0, compare p(v) and s. If s matches p(v) exactly, do the following; otherwise, nothing is done. Let $v_1 \to \ldots \to v_{k-1} \to v_k \to v$ be the path explored. Let $u_1$ be another child of $v_k$ (not on the path). Remove $v_{k-1} \to v_k$, $v_k \to u_1$ and $v$, and generate a new edge $v_{k-1} \to u_1$. skip($v_k$) := 0.
17
Maintenance of Signature Graph
- Deletion of a signature s from G (continued):
(iii) If skip(v) $\ne$ 0, compare p(v's father) and s. If s matches p(v's father) exactly, do the following; otherwise, nothing is done. Let $v_1 \to \ldots \to v_{k-1} \to v_k \to v$ be the path explored.
If $v_k \ne v$, replace p(v) with p($v_k$). Let $u_1$ be another child of $v_k$ (not on the path). Let $u_2$ be another parent of $v_k$ (not on the path). Replace $v_{k-1} \to v_k$ with $v_{k-1} \to u_1$, and replace $v_k \to v$ with $u_2 \to v$. Remove $v_k$. Note that $u_2$ can be found by searching G from $v_k$ with the target signature being p($v_k$).
If $v_k = v$, replace $v_k \to v_k$ with $v_{k-1} \to u_1$. Remove $v_k$.
18
Maintenance of Signature Graph
Illustration for (ii)
[Figure: the path $v_1, \ldots, v_{k-1}, v_k, v$ with neighboring nodes $u_1$ and $u_2$, before and after the deletion; the removed part is labeled "To be removed".]
19
Example: remove p1
[Figure: the signature graph before and after removing p1.]
20
Maintenance of Signature Graph
Illustration for (iii)
[Figure: the path $v_1, \ldots, v_{k-1}, v_k, v$ with neighboring nodes $u_1$ and $u_2$, before and after the deletion; the removed part is labeled "To be removed".]
21
Example: remove p8
[Figure: the signature graph before and after removing p8.]
22
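Illustration for (iii)
[Figure: the path $v_1, \ldots, v_{k-1}, v$ with neighboring node $u_1$, before and after the deletion; the removed part is labeled "To be removed".]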
23
Example: remove p7
[Figure: the signature graph before and after removing p7.]
24
Summary and Future Work
- Signature and signature file
- Signature graph
  - Construction of a signature graph
  - Search of a signature graph
  - Maintenance of a signature graph
- Future work: apply signature techniques to the evaluation of path-oriented queries in document databases.