1
Updating SF-Tree
Speaker: Ho Wai Shing
2
Contents
Introduction and summary of SF-Tree
Possible SF-Tree update techniques and their limitations
Future work
3
Introduction
SF-Tree stands for Signature File Tree.
It was originally designed for storing XML SPE selectivity.
It can be generalized to store an "object-to-count" mapping.
4
SF-Tree Basics
Divide objects into groups with the same (or similar) counts.
Finding the count of an object is equivalent to finding the group that contains the object.
Signature files are used to summarize the groups.
The signature files are organized as a tree.
5
Signature Files
Are bit vectors.
Computed by hashing objects into bit positions and setting those bits.
The existence of an object can be checked by testing its hashed bit positions.
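A minimal sketch of such a signature file may help make the idea concrete. The class name `SignatureFile`, the salted SHA-256 hashing scheme, and the optional `positions` table are assumptions made for illustration; the talk does not specify a hash function. The fixed table replays the talk's own 10-bit, m = 3 example, reading the bit numbers as 0-based indices (also an assumption).

```python
import hashlib


class SignatureFile:
    """A bit-vector signature: each object sets m hashed bit positions."""

    def __init__(self, num_bits=10, m=3, positions=None):
        self.num_bits = num_bits
        self.m = m
        self.bits = [0] * num_bits
        # Optional fixed object -> positions table (hypothetical helper),
        # used only to replay the example from the talk.
        self._fixed = positions or {}

    def positions(self, obj):
        if obj in self._fixed:
            return self._fixed[obj]
        # Assumed scheme: m salted SHA-256 digests reduced modulo num_bits.
        return [
            int(hashlib.sha256(f"{i}:{obj}".encode()).hexdigest(), 16) % self.num_bits
            for i in range(self.m)
        ]

    def insert(self, obj):
        for p in self.positions(obj):
            self.bits[p] = 1

    def contains(self, obj):
        # True may be a false drop; False is definite while bits are never reset.
        return all(self.bits[p] for p in self.positions(obj))


# Bit positions taken from the talk's example.
SLIDE_POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}

sf = SignatureFile(num_bits=10, m=3, positions=SLIDE_POSITIONS)
sf.insert("//name")
sf.insert("//buyer")
print(sf.bits)  # [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
```

Because only bits are stored, the original objects cannot be recovered from the signature, which is why a lookup can report an object that was never inserted.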
6
Signature Files
e.g., the signature file F has 10 bits and each object sets m = 3 bits
"//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9
7
SF-Tree
8
SF-Tree
9
Introduction
Advantages:
Time efficient: independent of database size
Space efficient: independent of original object size
Accurate: has a statistical accuracy guarantee on the returned counts
Flexible: parameters can be tuned to trade off space, accuracy, and speed
10
Introduction
Disadvantages:
Not 100% accurate, since the original objects are discarded
All information must be available before building a Shannon-Fano SF-Tree
Updates may reduce its accuracy
11
Introduction
Possible applications:
Support count storage in stream data mining
Data cube storage
Periodic reconstruction is not always feasible (especially for data streams), so SF-Tree needs a good update strategy.
12
Updating SF-Tree
Basic idea: when the count of an object changes, the object moves from one group to another,
i.e., the object is deleted from one group and inserted into another group.
13
Updating SF-Tree
Problems:
No algorithm for deleting objects from signature files
No algorithm for creating a new group in an SF-Tree
No way to compute new signatures for the existing objects
14
Updating SF-Tree
Solutions to the problems
For deletion:
  Counter-based signature files
  "No-deletion" scheme
  Look-ahead retrieval
For insertion:
  Dynamically expanding signature files
  Precomputed signature files
  Negative signature files
15
Counter-Based Signature Files
Motivation:
Deleting an object involves resetting some bits.
Resetting bits may cause "false negatives" (objects that are present in the group can no longer be retrieved).
16
Counter-Based Signature Files
e.g.,
"//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9
When we remove "//buyer", "//name" can't be retrieved (since bit 2 = 0).
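The false negative in this example can be reproduced in a few lines. This is only a sketch: the helper names (`insert`, `contains`, `naive_delete`) are hypothetical, and the bit numbers from the example are read as 0-based indices into a 10-bit vector, which is an assumption.

```python
# Bit positions from the example, read as 0-based indices (an assumption).
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}
bits = [0] * 10


def insert(obj):
    for p in POSITIONS[obj]:
        bits[p] = 1


def contains(obj):
    return all(bits[p] == 1 for p in POSITIONS[obj])


def naive_delete(obj):
    # Resetting bits ignores that other objects may share them.
    for p in POSITIONS[obj]:
        bits[p] = 0


insert("//name")
insert("//buyer")
naive_delete("//buyer")

name_found = contains("//name")
print(name_found)  # False: bit 2 was shared and has been reset
```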
17
Counter-Based Signature Files
Use counters instead of bits in the signature file.
e.g.,
"//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9
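As a rough sketch of the counter-based variant, each position holds a count of how many inserted objects hashed to it; insertion increments and deletion decrements. The function names and the 0-based reading of the bit numbers are assumptions, and counter width (and therefore overflow) is not modelled here because Python integers are unbounded.

```python
# Positions from the example, read as 0-based indices (an assumption).
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}
counters = [0] * 10


def insert(obj):
    for p in POSITIONS[obj]:
        counters[p] += 1


def delete(obj):
    # Only safe if obj really was inserted; deleting an object that merely
    # looked present (a false drop) would corrupt the counters.
    for p in POSITIONS[obj]:
        counters[p] -= 1


def contains(obj):
    return all(counters[p] > 0 for p in POSITIONS[obj])


insert("//name")
insert("//buyer")
delete("//buyer")

print(counters)            # [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(contains("//name"))  # True: the shared position 2 still has count 1
```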
18
Counter-Based Signature Files
Advantages:
Deleting an object that is in F won't cause "false negatives".
Disadvantages:
Space requirement increases.
Deleting an object that only appears to be present (a false drop) still causes trouble.
Counters may overflow.
19
"No-Deletion" Scheme Motivation:
Deletion causes trouble (false negatives).
There is no way to avoid it completely.
False negatives may cause big errors in counts.
20
"No-Deletion" Scheme
Do not reset the bits when deleting an object (i.e., no deletion).
For retrieval, return the count of the matching group with the largest count.
Advantages:
Completely removes the trouble caused by deletion (no more false negatives).
21
"No-Deletion" Scheme Disadvantages:
May reduce the accuracy of the signature (if the signature size is unchanged).
May increase the space requirement significantly (if the accuracy is maintained).
Applies only to applications whose counts are monotonically increasing (or decreasing).
22
"No-Deletion" Scheme Still useful:
For updates between periodic reconstructions, since retrieval time is unchanged (for positive queries).
More space efficient than counters if updates are infrequent (since bit-based signature files can be used).
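The "no-deletion" scheme can be sketched on a deliberately simplified model in which the tree is flattened to one signature per count group. This flat `groups` dictionary, the function names `set_count` and `get_count`, and the 0-based bit positions are all assumptions for illustration, not the SF-Tree layout from the talk. Counts are assumed to be monotonically increasing.

```python
NUM_BITS = 10
# Positions from the talk's example, read as 0-based indices (an assumption).
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}

groups = {}  # count -> bit-vector signature of the group with that count


def set_count(obj, count):
    """Insert obj into the group for `count`; never touch its old group."""
    bits = groups.setdefault(count, [0] * NUM_BITS)
    for p in POSITIONS[obj]:
        bits[p] = 1


def get_count(obj):
    """Return the largest count whose group signature matches obj, else None."""
    matches = [
        count
        for count, bits in groups.items()
        if all(bits[p] for p in POSITIONS[obj])
    ]
    return max(matches) if matches else None


set_count("//name", 1)
set_count("//buyer", 1)
set_count("//buyer", 2)  # count increases; the stale entry in group 1 stays

print(get_count("//buyer"))  # 2: both groups match, the largest count wins
print(get_count("//name"))   # 1
```

The stale bits left behind in the old group are what reduce signature accuracy over time unless the signature is given more space.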
23
Look-Ahead Retrieval Motivation:
Reduces errors due to false negatives.
Retrieval stops only when two consecutive levels of signature files do not contain the query.
Re-insert the object if we think that it is a false negative.
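The stopping rule can be illustrated on a toy model in which the path through the tree is reduced to a plain list of signatures, one per level. This chain model, the function name `lookahead_retrieve`, the choice to return the index of the deepest matching level, and the 0-based bit positions are assumptions made for this sketch; the actual SF-Tree traversal is not specified in this transcript.

```python
# Positions from the talk's earlier example, read as 0-based indices (assumed).
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}


def contains(bits, obj):
    return all(bits[p] for p in POSITIONS[obj])


def insert(bits, obj):
    for p in POSITIONS[obj]:
        bits[p] = 1


def lookahead_retrieve(levels, obj):
    """Return the index of the deepest level judged to contain obj, or -1."""
    deepest = -1
    i = 0
    while i < len(levels):
        if contains(levels[i], obj):
            deepest = i
            i += 1
        elif i + 1 < len(levels) and contains(levels[i + 1], obj):
            # A miss followed by a hit: treat the miss as a false negative
            # and re-insert the object at the missed level (self-healing).
            insert(levels[i], obj)
            deepest = i + 1
            i += 2
        else:
            break  # two consecutive misses, or a miss at the last level
    return deepest


levels = [[0] * 10 for _ in range(3)]
for level in levels:
    insert(level, "//name")
levels[1][2] = 0  # simulate damage from an earlier bit-reset deletion

depth = lookahead_retrieve(levels, "//name")
print(depth)                          # 2
print(contains(levels[1], "//name"))  # True: level 1 was healed
```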
24
Look-Ahead Retrieval e.g.,
25
Look-Ahead Retrieval Advantages:
Reduces the probability that the SF-Tree is affected by false negatives.
The self-healing property removes some false negatives.
26
Look-Ahead Retrieval Disadvantages:
Slower (more signatures to be examined).
False drop rate increases:
  more 1s are created in the self-healing process
  one less level acts as a safeguard against false drops
27
Insertion
Involves 2 parts:
  insertion within a signature file
  insertion of new signature files into the SF-Tree
A signature file has a capacity under an error bound:
more objects mean more 1s, and more 1s mean a higher false drop rate.
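The link between load and false drop rate can be made concrete with the usual estimate for hashed bit-vector signatures. The formula below assumes each of the m positions per object is chosen independently and uniformly, which the talk does not state; the function names are hypothetical.

```python
def false_drop_probability(num_bits, m, n):
    """Estimated chance that an absent object finds all m of its bits set
    after n insertions, assuming independent uniform hashing."""
    p_bit_set = 1.0 - (1.0 - 1.0 / num_bits) ** (m * n)
    return p_bit_set ** m


def capacity(num_bits, m, error_bound):
    """Largest n whose estimated false drop probability stays within the bound."""
    if not 0.0 < error_bound < 1.0:
        raise ValueError("error_bound must be strictly between 0 and 1")
    n = 0
    while false_drop_probability(num_bits, m, n + 1) <= error_bound:
        n += 1
    return n


for n in (1, 2, 3):
    print(n, round(false_drop_probability(10, 3, n), 4))

print(capacity(10, 3, 0.2))  # 2
```

Under this estimate the 10-bit, m = 3 signature from the earlier example holds only two objects before the false drop probability exceeds 0.2.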
28
Dynamically Expanding Signature Files
To maintain the error (false drop probability) bound, the size of a signature file must be able to grow.
However, the original objects are discarded, so the signatures of previously inserted objects cannot be recomputed.
Instead, add new signatures to represent new objects.
29
Dynamically Expanding Signature Files
e.g., create a new signature when the current one is full, doubling the size at each creation
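A possible sketch of this doubling strategy keeps a list of signatures and opens a new one, twice as large, whenever the current one reaches its capacity. The class name, the salted SHA-256 hashing, and the capacity rule (one object per 8 bits) are assumptions; in practice the capacity would follow from the chosen error bound.

```python
import hashlib


def bit_positions(obj, num_bits, m=3):
    # Assumed scheme: m salted SHA-256 digests reduced modulo num_bits.
    return [
        int(hashlib.sha256(f"{i}:{obj}".encode()).hexdigest(), 16) % num_bits
        for i in range(m)
    ]


class ExpandingSignatureFile:
    def __init__(self, initial_bits=16, m=3):
        self.m = m
        self.signatures = [[0] * initial_bits]
        self.loads = [0]  # objects inserted into each signature

    @staticmethod
    def _capacity(bits):
        return max(1, len(bits) // 8)  # hypothetical capacity rule

    def insert(self, obj):
        if self.loads[-1] >= self._capacity(self.signatures[-1]):
            # Old signatures cannot be rebuilt, so append one of double size.
            self.signatures.append([0] * (2 * len(self.signatures[-1])))
            self.loads.append(0)
        current = self.signatures[-1]
        for p in bit_positions(obj, len(current), self.m):
            current[p] = 1
        self.loads[-1] += 1

    def contains(self, obj):
        # The object may live in any signature, so every one is checked.
        return any(
            all(sig[p] for p in bit_positions(obj, len(sig), self.m))
            for sig in self.signatures
        )


esf = ExpandingSignatureFile(initial_bits=16)
objects = [f"//tag{i}" for i in range(7)]
for obj in objects:
    esf.insert(obj)

print([len(sig) for sig in esf.signatures])  # [16, 32, 64]
```

Each lookup has to test every signature in the list, so retrieval cost grows with the number of expansions.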
30
Dynamically Expanding Signature Files
Advantages:
The signature files can now store an arbitrary number of objects.
Disadvantages:
Less space efficient.
31
Precomputed Signature Files
The previous method solves insertion within a signature file.
The next two methods concentrate on inserting new count groups.
32
Precomputed Signature Files
Problem: adding a new count group (i.e., leaf node) may involve adding new internal nodes e.g.,
33
Precomputed Signature Files
Solution: the signature file is precomputed, i.e.,
34
Negative Signature Files
An alternative solution to the previous problem.
A "negative signature file" stores the deleted objects.
35
Summary
For deletion:
  Counter-based signature files
  "No-deletion" scheme
  Look-ahead retrieval
For insertion:
  Dynamically expanding signature files
  Precomputed signature files
  Negative signature files
36
Future Work
Implement and check the performance of the different update strategies.
Identify the requirements of various applications (e.g., support counting, data cube storage) and choose a suitable SF-Tree strategy for each.