
Constructing Signature Graphs for Signature Files Dr. Yangjun Chen Dept. Applied Computer Science University of Winnipeg Canada.


1 Constructing Signature Graphs for Signature Files Dr. Yangjun Chen Dept. Applied Computer Science University of Winnipeg Canada

2 Outline: Motivation. Signature Files as Indexes. Signature Graph: its construction, and searching a signature graph. Maintenance of Signature Graph. Summary and Future Work.

3 Motivation: Establish indexes to speed up query evaluation: B+-trees, inverted files, signature files. Signature files: simple and easy to maintain. Signature graphs: less time for searching.

4 Signature Files as Indexes. Definition: A signature for a key word or an attribute value is a hash-coded bit string. Signature construction: the important parameters are m (number of 1s in the bit string), F (length of the bit string), and D (size of a block, or the average number of key words in an element). Optimal choice of the parameters: F × ln 2 = m × D.
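
The parameter relation F × ln 2 = m × D can be solved for any one parameter given the other two. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and do not come from the slides.

```python
import math

def optimal_ones(F: int, D: int) -> float:
    """Solve F * ln 2 = m * D for m (number of 1s per word signature)."""
    return F * math.log(2) / D

def optimal_length(m: int, D: int) -> float:
    """Solve F * ln 2 = m * D for F (signature length in bits)."""
    return m * D / math.log(2)

# Illustrative values only: 4 ones per word, 3 key words per block.
print(optimal_length(4, 3))
print(optimal_ones(12, 3))
```

In practice m and F are integers, so the computed values would be rounded.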

5 Example (constructing a signature for a word with m = 4 and F = 12): "database" ⇒ letter triplets: dat, ata, tab, aba, bas, ase ⇒ H(dat) = 5, H(ata) = 1, H(tab) = 8, H(aba) = 1, H(bas) = 10, H(ase) = 8 ⇒ 100 010 010 100 (bits 1, 5, 8, and 10 are set)
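
The example above can be reproduced with a few lines of Python. This is a minimal sketch: the slide does not define the hash function H, so the six hash values are copied from the slide into a lookup table, and the function names are hypothetical.

```python
def triplets(word: str) -> list[str]:
    """All overlapping three-letter substrings of the word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]

# Hash values taken from the slide; H itself is not specified there.
H = {"dat": 5, "ata": 1, "tab": 8, "aba": 1, "bas": 10, "ase": 8}

def word_signature(word: str, F: int = 12, h: dict = H) -> str:
    """Set bit h[t] (1-based, counted from the left) for every triplet t."""
    bits = ["0"] * F
    for t in triplets(word):
        bits[h[t] - 1] = "1"
    return "".join(bits)

print(word_signature("database"))  # 100010010100
```

Two pairs of triplets collide (ata/aba and tab/ase), which is why six triplets yield exactly m = 4 one-bits here.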

6 Signature Files as Indexes
text: … SGML … database … information …
word signatures:
  SGML                   010 000 100 110
  database               100 010 010 100
  information            010 100 011 000
  object signature (OS)  110 110 111 110
queries, query signatures, and results:
  SGML        010 000 100 110  match with OS
  XML         011 000 100 100  no match with OS
  informatik  110 100 100 000  false drop
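
The table above can be checked mechanically: the object signature is the bitwise OR of the word signatures, and a query matches when every 1-bit of the query signature is also set in the object signature. The following is a small sketch under that reading; the function names are hypothetical.

```python
def superimpose(signatures) -> int:
    """Bitwise OR of all word signatures (given as bit strings)."""
    obj = 0
    for s in signatures:
        obj |= int(s, 2)
    return obj

def may_contain(object_sig: int, query_sig: str) -> bool:
    """True if every 1-bit of the query is also set in the object signature."""
    q = int(query_sig, 2)
    return (object_sig & q) == q

words = {
    "SGML": "010000100110",
    "database": "100010010100",
    "information": "010100011000",
}
OS = superimpose(words.values())
print(format(OS, "012b"))               # 110110111110

print(may_contain(OS, "010000100110"))  # SGML: True
print(may_contain(OS, "011000100100"))  # XML: False
print(may_contain(OS, "110100100000"))  # informatik: True, a false drop
```

A positive answer only means the word may occur; "informatik" passes the test although it is not in the text, so matches still have to be verified against the data.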

7 Example: relation with attributes (name, sex, ...) containing the tuple (John, male, ...); query: John ∧ male; query signature: 1010 0101

8 Signature Graph. Consider a signature s_i of length F. We denote it as s_i = s_i[1]s_i[2]...s_i[F], where each s_i[j] ∈ {0, 1} (j = 1, ..., F). We also use s_i(j_1, ..., j_h) to denote a sequence of pairs w.r.t. s_i: (j_1, s_i[j_1])(j_2, s_i[j_2])...(j_h, s_i[j_h]), where 1 ≤ j_k ≤ F for k ∈ {1, ..., h}. Definition (signature identifier): Let S = s_1.s_2. ... .s_n denote a signature file. Consider s_i (1 ≤ i ≤ n). If there exists a sequence j_1, ..., j_h such that for any k ≠ i (1 ≤ k ≤ n) we have s_i(j_1, ..., j_h) ≠ s_k(j_1, ..., j_h), then we say that s_i(j_1, ..., j_h) identifies the signature s_i, or that s_i(j_1, ..., j_h) is an identifier of s_i.

9 Example (using the signatures s_1, ..., s_8 listed on slide 11): s_8(5, 1, 4) = (5, 1)(1, 1)(4, 0). (*For any i ≠ 8 we have s_i(5, 1, 4) ≠ s_8(5, 1, 4). For instance, s_5(5, 1, 4) = (5, 0)(1, 0)(4, 1) ≠ s_8(5, 1, 4), s_2(5, 1, 4) = (5, 1)(1, 1)(4, 1) ≠ s_8(5, 1, 4), and so on.*) s_1(5, 4, 1) = (5, 0)(4, 1)(1, 1). (*For any i ≠ 1 we have s_i(5, 4, 1) ≠ s_1(5, 4, 1).*)
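
The identifier property can be verified by brute force over the eight signatures listed on slide 11. This is only an illustrative check; the helper names `project` and `identifies` are hypothetical.

```python
# The eight signatures from slide 11, keyed by their index.
S = {
    1: "10110110", 2: "10111001", 3: "10100111", 4: "01110110",
    5: "01110101", 6: "01011100", 7: "11100100", 8: "10101011",
}

def project(sig: str, positions) -> tuple:
    """Return the pairs (j, sig[j]) for the given 1-based bit positions."""
    return tuple((j, int(sig[j - 1])) for j in positions)

def identifies(i: int, positions, sigs: dict = S) -> bool:
    """True if the projection of signature i differs from that of every other signature."""
    target = project(sigs[i], positions)
    return all(project(s, positions) != target
               for k, s in sigs.items() if k != i)

print(project(S[8], (5, 1, 4)))   # ((5, 1), (1, 1), (4, 0))
print(identifies(8, (5, 1, 4)))   # True
print(identifies(1, (5, 4, 1)))   # True
print(identifies(8, (5, 1)))      # False: s_2 agrees with s_8 on bits 5 and 1
```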

10 Signature Graph. Definition (signature graph): A signature graph G for a signature file S = s_1.s_2. ... .s_n, where s_i ≠ s_j for i ≠ j and |s_k| = F for k = 1, ..., n, is a graph G = (V, E) such that: 1. Each node v ∈ V is of the form (p, skip), where p is a pointer to a signature s in S, and skip is a non-negative integer i. If i > 0, the i-th bit of the query signature s_q is checked when searching. If i = 0, s is compared with s_q. 2. Let e = (u, v) ∈ E. Then e is labeled with 0 or 1 and skip(u) > 0. Let skip(u) = i. If e is labeled with 0, the i-th bit of the signature pointed to by p(v) is 0. If e is labeled with 1, the i-th bit of the signature pointed to by p(v) is 1. A node v with skip(v) = 0 does not have any children.

11 [Figure: signature graph for the signature file below. Node labels (pointer, skip): (p2, 5), (p3, 4), (p4, 1), (p1, 0), (p5, 3), (p6, 1), (p7, 2), (p8, 4); edges are labeled 0 or 1.] S1: 1011 0110, S2: 1011 1001, S3: 1010 0111, S4: 0111 0110, S5: 0111 0101, S6: 0101 1100, S7: 1110 0100, S8: 1010 1011

12 Construction of signature graph: [Figure: the graph after each insertion; node labels (pointer, skip), edges labeled 0 or 1.] Insert s1: (p1, 0). Insert s2: (p2, 5), (p1, 0). Insert s3: (p2, 5), (p3, 4), (p1, 0). Insert s4: (p2, 5), (p3, 4), (p4, 1), (p1, 0).

13 Construction of signature graph (continued): [Figure: the graph after each insertion; node labels (pointer, skip), edges labeled 0 or 1.] Insert s5: adds (p5, 3). Insert s6: adds (p6, 1). Insert s7: adds (p7, 2). Insert s8: adds (p8, 4).

14 Signature Graph: searching a signature graph. Let s_q(i) denote the i-th bit of the query signature s_q. During the traversal of a signature graph, the inexact matching can be done as follows: (i) Let v be the node encountered and s_q(i) be the bit to be checked. (ii) If s_q(i) = 1, we move to the right child of v (the child reached by the edge labeled 1). (iii) If s_q(i) = 0, both the right and the left child of v are visited. (iv) A search along a path stops when a node without any child is reached or a node is encountered for the second time.
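
The traversal rules can be illustrated on a simplified structure. The sketch below is an assumption-laden stand-in, not the signature graph of the slides: it builds a plain binary tree over distinct signatures (internal nodes test one bit, leaves hold one signature), so there are no pointers in internal nodes and no back edges, and the "encountered for the second time" stop condition never applies. All names (`Node`, `build`, `search`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    bit: int = 0                     # 1-based bit to test; 0 marks a leaf
    sig: Optional[str] = None        # signature stored at a leaf
    left: Optional["Node"] = None    # edge labeled 0
    right: Optional["Node"] = None   # edge labeled 1

def build(sigs: List[str], bit: int = 1) -> Node:
    """Split distinct, equal-length signatures on the first bit that separates them."""
    if len(sigs) == 1:
        return Node(sig=sigs[0])
    zeros = [s for s in sigs if s[bit - 1] == "0"]
    ones = [s for s in sigs if s[bit - 1] == "1"]
    if not zeros or not ones:        # this bit does not separate anything
        return build(sigs, bit + 1)
    return Node(bit=bit, left=build(zeros, bit + 1), right=build(ones, bit + 1))

def search(node: Node, query: str) -> List[str]:
    """Inexact matching: find signatures that have a 1 wherever the query has a 1."""
    if node.bit == 0:                # leaf: compare the whole signature
        ok = all(s == "1" for s, q in zip(node.sig, query) if q == "1")
        return [node.sig] if ok else []
    hits = search(node.right, query)             # the 1-edge is always followed
    if query[node.bit - 1] == "0":               # query bit 0: follow the 0-edge too
        hits = search(node.left, query) + hits
    return hits

signatures = ["10110110", "10111001", "10100111", "01110110",
              "01110101", "01011100", "11100100", "10101011"]
root = build(signatures)
print(sorted(search(root, "00001000")))  # signatures with bit 5 set
```

When a query bit is 1, the whole subtree under the 0-edge is skipped, which is where the saving over a sequential scan of the signature file comes from.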

15 Signature Graph [Figure: the signature graph from slide 11, with the nodes encountered during a search marked.]

16 Maintenance of Signature Graph. Insertion of a signature s into G: same as the construction of a signature graph. Deletion of a signature s from G: (i) Search G from the root until a node v is encountered that is marked or has skip(v) = 0. (ii) If skip(v) = 0, compare the signature pointed to by p(v) with s. If s matches it exactly, do the following; otherwise, do nothing. Let v_1 → ... → v_{k-1} → v_k → v be the path explored. Let u_1 be the other child of v_k (not on the path). Remove v_{k-1} → v_k, v_k → u_1 and v, and generate a new edge v_{k-1} → u_1. skip(v_k) := 0.

17 Maintenance of Signature Graph. Deletion of a signature s from G (continued): (iii) If skip(v) ≠ 0, compare the signature pointed to by p(v's father) with s. If s matches it exactly, do the following; otherwise, do nothing. Let v_1 → ... → v_{k-1} → v_k → v be the path explored. If v_k ≠ v, replace p(v) with p(v_k). Let u_1 be the other child of v_k (not on the path). Let u_2 be the other parent of v_k (not on the path). Replace v_{k-1} → v_k with v_k → u_1, and replace v_k → v with u_2 → v. Remove v_k. Note that u_2 can be found by searching G from v_k with the target signature being the one pointed to by p(v_k). If v_k ≡ v, replace v_k → v_k with v_{k-1} → u_1. Remove v_k.

18 Maintenance of Signature Graph: illustration for (ii) [Figure: the path v_1 → ... → v_{k-1} → v_k → v with the nodes u_1 and u_2, before and after the deletion; the part to be removed is marked.]

19 Example: remove p1 [Figure: the signature graph before and after removing p1.]

20 Maintenance of Signature Graph: illustration for (iii) [Figure: the path v_1 → ... → v_{k-1} → v_k → v with the nodes u_1 and u_2, before and after the deletion; the part to be removed is marked.]

21 Example: remove p8 [Figure: the signature graph before and after removing p8.]

22 Illustration for (iii) [Figure: the path v_1 → ... → v_{k-1} → v with the node u_1, before and after the deletion; the part to be removed is marked.]

23 Example: remove p7 [Figure: the signature graph before and after removing p7.]

24 Summary and Future Work. Summary: signature and signature file; signature graph (construction of a signature graph, search of a signature graph, maintenance of a signature graph). Future work: apply signature techniques to the evaluation of path-oriented queries in document databases.

