P-Tree Implementation
Anne Denton

So far: Logical Definition
- Cf. Dr. Perrizo’s slides
- The logical definition
  - defines the node information
  - leaves the representation of the structure open
- A wide variety of implementations has been tried

Tree Representation Options
- Pointers
- Tree-walks
  - Depth-first
  - Breadth-first
- Node addresses (P-trees: qids)
Note: Any one of these tree representations makes the tree lossless.

Issues
- Storage requirements
- Suitability for distributed processing (e.g., avoiding pointer swizzling)
- Ease of access to particular nodes
Main issue: the data structure must optimize ANDing speed at each node.

Main Desired Property
- ANDing through bit-vector operations should yield
  - the new node information
  - the new structural information
- Why? Parallelism: up to 32 or 64 bits are processed in parallel on a single processor

QID-based P-Vector Representation (Example: P1V)

qid       P1V
[ ]       1001
[01]      0010
[10]      1101
[01.00]   1110
[01.11]   0010
[10.10]   1101

- Node information is stored as a bit-vector
- Structural information: a traditional relation of degree 2
- The address (qid) is the key

Can We Convert Addresses to Bit-Vectors?

qid       P1V    PMV
[ ]       1001   0110
[01]      0010   1001
[10]      1101   0010
[01.00]   1110
[01.11]   0010
[10.10]   1101

- Standard conversion to bit-vectors
- We know this: it is the PMV!
- Claim: the qid is now redundant

Does this Define Structure?
- Yes!
- Concept: similar to depth-first search
  - The mixed vector specifies which children exist
- Slight modification: store all children of one node sequentially
  - Reason: the address can be computed through counts on the mixed vector

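The claim that the qid is redundant can be checked with a small sketch. The exact storage order is not spelled out on the slide, so the layout below is an assumption: the root comes first, the mixed children of a node form one contiguous block, and blocks are laid out in depth-first order. The function name `rebuild_qids`, the use of strings for vectors, and the all-zero mixed vector for lowest-level nodes are also illustrative choices.

```python
def rebuild_qids(pmvs):
    """Recover every node's qid from the mixed vectors alone.

    Assumed layout (hypothetical): root first; a node's mixed children sit in
    one contiguous block; blocks are laid out in depth-first order.
    Lowest-level nodes carry an all-zero mixed vector. Fanout is 4 here,
    so each qid component is printed with 2 bits.
    """
    qids = [None] * len(pmvs)
    qids[0] = ()       # the root has the empty qid
    next_free = 1      # index where the next block of children starts

    def visit(i):
        nonlocal next_free
        # Positions of the 1 bits name the children that exist.
        children = [c for c, bit in enumerate(pmvs[i]) if bit == "1"]
        first = next_free
        next_free += len(children)
        for k, c in enumerate(children):
            qids[first + k] = qids[i] + (c,)
        for k in range(len(children)):
            visit(first + k)

    visit(0)
    return [".".join(format(c, "02b") for c in q) for q in qids]


pmvs = ["0110", "1001", "0010", "0000", "0000", "0000"]
print(rebuild_qids(pmvs))
# ['', '01', '10', '01.00', '01.11', '10.10']
```

With the mixed vectors of the example, the walk reproduces the six qids of the relation without ever reading a stored address.
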
Representation of Standard Example
P-Tree ANDing
- Start at the root
- Pursue new (potentially) mixed children
- Deriving the new mixed (m) and pure1 (u) vectors from the operands' vectors m_i and u_i:
  - u is the AND of all u_i
  - m is (the AND of all (m_i OR u_i)) AND NOT u
- Cannot be done with either u or m alone

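The two formulas can be sketched with integer bit operations. This is a minimal illustration, assuming a fanout of 4 and vectors held as Python ints; the function name `and_node` and the second operand's vectors are made up for the example.

```python
def and_node(operands, width=4):
    """Combine the (u_i, m_i) vector pairs of one node across several P-trees.

    u = AND of all u_i
    m = (AND of all (m_i OR u_i)) AND NOT u
    """
    mask = (1 << width) - 1
    u = mask
    not_pure0 = mask               # children that are pure1 or mixed in every operand
    for u_i, m_i in operands:
        u &= u_i
        not_pure0 &= (m_i | u_i)
    m = not_pure0 & ~u & mask      # drop the children that turned out pure1
    return u, m


# First operand: root of the example tree. Second operand: hypothetical.
u, m = and_node([(0b1001, 0b0110), (0b1100, 0b0011)])
print(format(u, "04b"), format(m, "04b"))  # 1000 0111
```

A child marked in m is only potentially mixed: the AND of two mixed children can still come out pure0, which is found by pursuing that child.
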
Fast Counting Using Table Look-up
- How many bits are set in 01100110?
- The look-up table stores “4” for index 102 (01100110 in binary)
- Works for sequences of up to 8 bits

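A minimal sketch of such a table, assuming Python: 256 entries cover every 8-bit sequence, and longer vectors are counted one byte at a time. The names `BIT_COUNT` and `count_bits` are illustrative.

```python
# One entry per 8-bit value: the number of 1 bits in that value.
BIT_COUNT = [bin(i).count("1") for i in range(256)]


def count_bits(value):
    """Count the 1 bits of a non-negative int, one table look-up per byte."""
    total = 0
    while value:
        total += BIT_COUNT[value & 0xFF]
        value >>= 8
    return total


print(BIT_COUNT[102])                    # 4, since 102 is 01100110 in binary
print(count_bits(0b0110011001100110))    # 8
```
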
Finding the Next 1
- Which is the first bit set in 01100110?
- The look-up table stores “1” for index 102
- Works for sequences of up to 8 bits

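The same idea can be sketched for locating the first set bit. The slide does not say from which end positions are counted; this sketch assumes 0-based positions counted from the left of an 8-bit sequence, which gives 1 for index 102. The names `FIRST_ONE` and `first_one`, and the use of -1 for "no bit set", are illustrative.

```python
# Entry i holds the position (0-based, from the left of 8 bits) of the first
# 1 bit in i; -1 marks the all-zero byte.
FIRST_ONE = [-1] + [8 - i.bit_length() for i in range(1, 256)]


def first_one(value, nbytes):
    """Position of the first 1 bit in an nbytes-long vector, or -1 if none."""
    for k in range(nbytes):
        byte = (value >> (8 * (nbytes - 1 - k))) & 0xFF
        if byte:
            return 8 * k + FIRST_ONE[byte]
    return -1


print(FIRST_ONE[102])         # 1
print(first_one(0x0066, 2))   # 9
```
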
Finding a Child
- Assume the children are stored in sequence
- For a given mixed vector, where is the child with index 5 (part of the qid)?
- Count the children that precede it in the mixed vector
- The storage location is calculated with one table look-up

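A sketch of this calculation, assuming an 8-bit mixed vector with child positions counted from the left. The vector used here is hypothetical, as are the names `BIT_COUNT` and `child_offset`.

```python
# Bit-count table for all 8-bit values.
BIT_COUNT = [bin(i).count("1") for i in range(256)]


def child_offset(mixed, index, width=8):
    """Offset of child `index` inside its parent's block of stored children.

    Returns None if the child is not marked in the mixed vector.
    Assumes width <= 8 so that a single table look-up suffices.
    """
    if not (mixed >> (width - 1 - index)) & 1:
        return None
    # Shifting away the bits from position `index` onward leaves the prefix;
    # its bit count is the number of children stored before this one.
    return BIT_COUNT[mixed >> (width - index)]


mixed = 0b01100110                # hypothetical vector: children 1, 2, 5, 6 exist
print(child_offset(mixed, 5))     # 2
```

Adding this offset to the start of the parent's block of children gives the storage location of the child.
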
Potential Problems
- Eliminating large subtrees is slow
- Speeding up the AND: introduce an additional access structure
  - Array indices serve as pointers
  - Note: no lowest level is needed, because children are stored adjacently
  - Reduces storage by about 1/fanout (e.g., 1/16)
- The access structure does not need to be stored (the P-tree is lossless without it)

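One way such an access structure could look is an array that records, for each node, the index of its first stored child. The sketch below rebuilds it from the mixed vectors alone, which is why it need not be stored. The layout is an assumption (root first, each node's mixed children in one contiguous block, blocks in depth-first order), and the name `build_first_child` and the -1 marker are illustrative.

```python
def build_first_child(pmvs):
    """For every node, the array index of its first stored child, or -1.

    Assumed layout (hypothetical): root first; a node's mixed children form
    one contiguous block; blocks are laid out in depth-first order.
    Mixed vectors are given as strings of '0' and '1'.
    """
    first_child = [-1] * len(pmvs)
    next_free = 1                      # index of the next unused slot

    def visit(i):
        nonlocal next_free
        n = pmvs[i].count("1")         # number of children that exist
        if n == 0:
            return
        first_child[i] = next_free
        next_free += n
        for k in range(n):
            visit(first_child[i] + k)

    visit(0)
    return first_child


pmvs = ["0110", "1001", "0010", "0000", "0000", "0000"]
print(build_first_child(pmvs))
# [1, 3, 5, -1, -1, -1]
```

Only the three non-leaf nodes receive an entry here, so the array can stop before the lowest level.
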
Full Example
Summary
- P1V: node values are stored as bit-vectors
- Now: the tree structure is stored as bit-vectors as well
- Benefits: several fast bit-vector algorithms can be used
- Description of structure: modified depth-first tree-walk
- An additional access structure is efficient