1 Efficient Subtyping Tests with PQ-Encoding Jan Vitek University of Purdue work of: Yoav Zibin and Yossi Gil Technion — Israel Institute of Technology.

Presentation transcript:

1 Efficient Subtyping Tests with PQ-Encoding Jan Vitek University of Purdue work of: Yoav Zibin and Yossi Gil Technion — Israel Institute of Technology

2 Outline: Subtyping tests; Previous work; The PQ Permutation Tree and PQ encoding; Results; Conclusions & Future Research

3 Subtyping tests. Is Sylvester a Mammal? Catch: Sylvester is a Feline, and a Feline is a Mammal. Given a hierarchy (T, ≺): T is a set of types, |T| = n; ≺ is a partial order over T (reflexive, transitive and anti-symmetric), called the subtype relation. Encode the hierarchy so that the query a ≺ b can be answered efficiently. [Figure: Mammal with subtypes Feline and Canine]

4 Efficiency Metrics. Encoding of a hierarchy: a data structure representing the hierarchy which supports subtyping tests. Metrics: (1) Test time: answer whether a ≺ b quickly, preferably in constant time. (2) Space: achieve the smallest encoding length, measured in the average number of bits per type. (3) Encoding creation time. The problem is most interesting for multiple inheritance hierarchies.

5 Obvious encodings. Binary matrix (BM): optimal for arbitrary hierarchies; test time is constant; for n=5500 the BM size is 3.8MB. Closure-encoding: stores the ancestor lists; uses M log n space, but test time is O(log n), where M is the number of both direct and indirect inheritance relations. DAG-encoding: stores the parent lists only; uses m log n space, but test time is O(n), where m is the number of direct inheritance relations.
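The three obvious encodings can be sketched on a toy hierarchy as follows. This is an illustrative Python sketch, not code from the talk: the `PARENTS` map and all function names are assumptions. The last function only reproduces the arithmetic behind the BM size quoted above (one bit per ordered pair of types).

```python
from bisect import bisect_left

# Hypothetical toy hierarchy: each type maps to its direct parents.
PARENTS = {
    "Mammal": [],
    "Feline": ["Mammal"],
    "Canine": ["Mammal"],
    "Sylvester": ["Feline"],
}
IDS = {t: i for i, t in enumerate(PARENTS)}


def ancestors(t):
    """Reflexive set of ancestors of t, found by walking parent links."""
    seen, stack = {t}, [t]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Binary matrix: one bit per ordered pair of types.
BM = [[b in ancestors(a) for b in PARENTS] for a in PARENTS]

# Closure encoding: a sorted list of ancestor ids per type.
CLOSURE = {a: sorted(IDS[b] for b in ancestors(a)) for a in PARENTS}


def bm_test(a, b):
    """Constant-time lookup in the matrix."""
    return BM[IDS[a]][IDS[b]]


def closure_test(a, b):
    """Binary search in the sorted ancestor list: O(log n)."""
    row = CLOSURE[a]
    i = bisect_left(row, IDS[b])
    return i < len(row) and row[i] == IDS[b]


def dag_test(a, b):
    """Only parent lists are stored, so the test walks the DAG: O(n)."""
    return b in ancestors(a)


def bm_size_bytes(n):
    """Size in bytes of an n-by-n bit matrix."""
    return n * n // 8
```

With these definitions, `bm_size_bytes(5500)` evaluates to 3,781,250 bytes, which is the 3.8MB figure on the slide.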

6 Previous Work. Constant-time encodings for tree hierarchies (single inheritance): relative numbering [Schubert ’83]; Cohen's algorithm [Cohen ’91]. Constant-time encodings for general hierarchies (multiple inheritance): Packed Encoding (PE), a generalization of Cohen's algorithm [Krall, Vitek and Horspool ’97] (best time results). Non-constant-time encodings for general hierarchies: bit-vectors [Krall, Vitek and Horspool ’97a] (best space results). And many more, e.g., range-compression, modulation, sparse-terms, and representation using a union of interval orders.

7 Relative numbering (for trees only). Apply postorder numbering. The ordinal of b in the postorder is denoted r_b. All descendants of b have consecutive numbers; this interval is denoted [l_b, r_b]. a ≺ b ⇔ l_b ≤ r_a ≤ r_b
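A minimal Python sketch of relative numbering on a toy tree. The `CHILDREN` map and the function names are illustrative assumptions, not taken from the talk.

```python
def relative_numbering(children, root):
    """Return (l, r): r[b] is the postorder ordinal of b and l[b] is the
    smallest ordinal in the subtree rooted at b."""
    l, r = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        l[node] = counter + 1          # first ordinal handed out in this subtree
        for child in children.get(node, ()):
            visit(child)
        counter += 1
        r[node] = counter              # the node itself is numbered last

    visit(root)
    return l, r


# Hypothetical single-inheritance hierarchy: parent -> list of children.
CHILDREN = {"Mammal": ["Feline", "Canine"], "Feline": ["Sylvester"]}
L, R = relative_numbering(CHILDREN, "Mammal")


def is_subtype(a, b):
    """a is a subtype of b iff r_a falls inside the interval [l_b, r_b]."""
    return L[b] <= R[a] <= R[b]
```

The test needs only two comparisons once the three numbers are loaded, which is why it runs in constant time.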

8 Packed encoding (PE). Partition the hierarchy into the smallest number of slices. Two types in a slice do not have a common descendant. Finding the minimum is NP-complete; a good heuristic is given by Vitek et al. a ≺ b ⇔ r_a[s_b] = id_b
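The slicing rule and the test can be sketched in Python as below. Note the assumption: slices are assigned by a naive first-fit pass, which is a stand-in for the heuristic of Vitek et al. and is not claimed to match it. The hierarchy and all names are hypothetical.

```python
# Hypothetical multiple-inheritance hierarchy: type -> direct parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["A"], "E": ["A"],
    "D": ["B", "C"], "F": ["E"],
}


def ancestors_of(parents, t):
    """Reflexive set of ancestors of t."""
    seen, stack = {t}, [t]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def packed_encoding(parents):
    anc = {t: ancestors_of(parents, t) for t in parents}
    desc = {t: {a for a in parents if t in anc[a]} for t in parents}
    slices, slice_of, ident = [], {}, {}
    for t in parents:
        # First fit: the first slice whose members share no descendant with t.
        for s, members in enumerate(slices):
            if all(desc[t].isdisjoint(desc[m]) for m in members):
                break
        else:
            s = len(slices)
            slices.append([])
        slices[s].append(t)
        slice_of[t] = s
        ident[t] = len(slices[s])      # ids start at 1; 0 means "no ancestor here"
    # r_a[s] holds the id of the unique ancestor of a in slice s, if any.
    row = {a: [0] * len(slices) for a in parents}
    for a in parents:
        for b in anc[a]:
            row[a][slice_of[b]] = ident[b]
    return slice_of, ident, row


SLICE_OF, IDENT, ROW = packed_encoding(PARENTS)


def is_subtype(a, b):
    """a is a subtype of b iff r_a[s_b] equals id_b."""
    return ROW[a][SLICE_OF[b]] == IDENT[b]
```

The array entry is well defined because two ancestors of the same type can never share a slice: they would have that type as a common descendant.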

9 Our Technique: PQ encoding (PQE). Combine the ideas of relative numbering with slicing as used in Packed Encoding. Partition the nodes into slices. Each slice S_i has an ordering π_i of all nodes in the hierarchy. Slicing property: the descendants of each node b ∈ S_i are consecutive in π_i.

10 Visualizing PQ Encoding

11 Pseudo code for subtyping test
Procedure IsSub(A, B) // return true if A ≺ B
  c ← slice_of(B)
  id ← array_A[c]
  [l, r] ← interval of descendants of B
  if (id ∈ [l, r]) return true else return false
End
The above can be encoded in 4-5 machine instructions
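The procedure above can be tried out on a hand-built encoding. The sketch below uses a hypothetical diamond hierarchy (B and C extend A, D extends both B and C), for which the single ordering (B, D, C, A) already keeps every descendant set consecutive. The table names are assumptions made for illustration.

```python
# One slice with ordering pi = (B, D, C, A); ordinals start at 1.
PI = ["B", "D", "C", "A"]
SLICE_OF = {t: 0 for t in PI}

# ID_ARRAY[a][s] is the ordinal of a in the ordering of slice s.
ID_ARRAY = {t: [PI.index(t) + 1] for t in PI}

# INTERVAL[b] is the range of ordinals occupied by the descendants of b
# in the ordering of b's own slice.
INTERVAL = {"A": (1, 4), "B": (1, 2), "C": (2, 3), "D": (2, 2)}


def is_sub(a, b):
    """Return True if a is a subtype of b."""
    c = SLICE_OF[b]                # slice of the candidate supertype
    ident = ID_ARRAY[a][c]         # ordinal of a in that slice's ordering
    lo, hi = INTERVAL[b]           # interval of descendants of b
    return lo <= ident <= hi


# Ground truth for the diamond, used only to check the sketch.
SUPERTYPES = {"A": {"A"}, "B": {"A", "B"}, "C": {"A", "C"},
              "D": {"A", "B", "C", "D"}}
```

Relative numbering cannot handle this hierarchy because D has two parents, yet one slice is enough here.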

12 Finding a Good PQ Encoding? Main objective: minimize the number of slices. Each slice adds an entry to each one of the arrays. The main difficulty: the slicing property, i.e., that there is a consecutive ordering of all descendants of nodes in a slice. Each node in a slice imposes a constraint on the ordering. Tool: PQ-trees – a data structure which saves all the orderings which satisfy a set of such constraints.

13 PQ-trees. Invented by Booth and Lueker, 1976. Used to test for the consecutive 1's property in binary matrices of size r × s, in time O(k+r+s), where k is the number of 1's in the matrix. It is called a PQ-tree since it has nodes of two kinds, P-nodes and Q-nodes. Enabled the first linear-time algorithm for recognizing interval graphs (using the maximal cliques matrix). Also used to recognize (doubly) convex bipartite graphs. Later used for other graph-theoretical problems: on-line planarity testing, maximum planar embeddings. A PQ-tree τ represents a set of orderings, denoted consistent(τ).

14 Constructing a PQ-tree. U is the set of all nodes. A constraint is a set I ⊆ U whose members must appear together. Let Σ ⊆ 2^U be a set of constraints. Let Π(Σ) be the collection of all orderings of U which satisfy all the constraints in Σ. Theorem (Booth-Lueker, 1976): for every Σ there exists a PQ-tree τ, and for every τ there exists Σ, such that Π(Σ) = consistent(τ). Generating τ from Σ: τ ← τ_u, where τ_u is the universal PQ-tree; then τ ← reduce(τ, I) for every I ∈ Σ. reduce conducts a bottom-up traversal, at each step applying one of the eleven standard PQ-tree transformations.

15 Creation algorithm
1. k ← 1; S[k] ← τ_u // τ_u is the universal PQ-tree
2. For all a ∈ T do // Find a PQ-tree consistent with type a
3.   For s = 1,...,k do
4.     reduce(S[s], descendants(a))
5.     exit loop if reduce succeeded
6.   s_a ← s
7.   If s = k then // Start a new slice
8.     k ← k+1; S[k] ← τ_u
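The control flow of the creation algorithm can be sketched in Python for very small hierarchies. Important assumption: the PQ-tree `reduce` step is replaced here by a brute-force search over all permutations, which is exponential and only meant to illustrate the slicing loop. The hierarchy and every name are hypothetical.

```python
from itertools import permutations

# Hypothetical hierarchy: a, b and c each inherit from two of three roots.
PARENTS = {
    "P1": [], "P2": [], "P3": [],
    "a": ["P1", "P3"], "b": ["P1", "P2"], "c": ["P2", "P3"],
}
TYPES = list(PARENTS)


def ancestors(t):
    """Reflexive set of ancestors of t."""
    seen, stack = {t}, [t]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


DESC = {b: frozenset(a for a in TYPES if b in ancestors(a)) for b in TYPES}


def find_ordering(constraints):
    """Brute-force stand-in for reduce: return an ordering of all types in
    which every constraint set is consecutive, or None if none exists."""
    for perm in permutations(TYPES):
        pos = {t: i for i, t in enumerate(perm)}
        if all(max(pos[t] for t in c) - min(pos[t] for t in c) + 1 == len(c)
               for c in constraints):
            return perm
    return None


def create_encoding():
    slices = [[]]                  # constraints per slice; the last is a spare
    slice_of = {}
    for a in TYPES:
        for s in range(len(slices)):
            if find_ordering(slices[s] + [DESC[a]]) is not None:
                slices[s].append(DESC[a])
                slice_of[a] = s
                break
        if slice_of[a] == len(slices) - 1:   # spare slice used: start a new one
            slices.append([])
    slices.pop()                   # drop the unused spare slice
    orderings = [find_ordering(c) for c in slices]
    ids = {a: [pi.index(a) + 1 for pi in orderings] for a in TYPES}
    interval = {}
    for b in TYPES:
        positions = [ids[d][slice_of[b]] for d in DESC[b]]
        interval[b] = (min(positions), max(positions))
    return slice_of, ids, interval


SLICE_OF, IDS, INTERVAL = create_encoding()


def is_sub(a, b):
    lo, hi = INTERVAL[b]
    return lo <= IDS[a][SLICE_OF[b]] <= hi
```

In this example the descendant sets of P1, P2 and P3 pairwise overlap but have no common element, so no single ordering makes all three consecutive and a second slice is opened for P3.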

16 Data-set. 13 non-tree hierarchies used in real-life programs, with 66 to 5,438 types each (over 18,500 types in total). PQ works so nicely because even dense MI hierarchies are tree-like in many ways: the average number of parents is always less than 2; the average number of ancestors can be high (30 in Self); the height is similar to that of a balanced binary tree. Hierarchies can be broken into a core plus bottom trees. A type is in the core if it has a descendant with more than one parent. The median core size is 21%.

17 Optimizations. Improving all 3 metrics: test time, space, creation time. Not graph theoretic. Encoding the core, and adding the bottom-trees later. Specialization. Length optimization and pseudo arrays. Heterogeneous encoding. Inlining. Coalescing: this optimization sometimes reduces space, albeit at the cost of increased test time; the new encoding is called CPQE.

18 Results (Space Metric) Encoding length of different algorithms CPQE and BPE are variants of PQE and PE, respectively.

19 Conclusions & Future Research PQE improves encoding length, creation time and test time of NHE (details in the paper) The CPQE variant, tailored for object layout like the one in C++, further reduces the encoding length. Future work Incremental encoding

20 The END

21 PQ-trees cont. A PQ-tree has three kinds of nodes: a leaf, which represents a member of a given set U; a Q-node, which represents the constraint that all of its children must occur in the order they occur in the tree or in reverse order; a P-node, which specifies that its children must occur together, but in any order. [Figure labels: consistent(τ), frontier(τ)]
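The meaning of the three node kinds can be illustrated with a small Python sketch that enumerates the orderings a PQ-tree represents. The nested-tuple representation and the function names are assumptions for illustration; this does not implement the reduce operation.

```python
from itertools import permutations, product

# A leaf is a string; an inner node is ("P", [children]) or ("Q", [children]).


def frontier(node):
    """Leaves of the tree read from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in frontier(child)]


def consistent(node):
    """Set of all leaf orderings obtained by permuting the children of
    P-nodes arbitrarily and reversing the children of Q-nodes."""
    if isinstance(node, str):
        return {(node,)}
    kind, children = node
    if kind == "P":
        arrangements = list(permutations(children))
    else:  # Q-node: given order or its reverse
        arrangements = [tuple(children), tuple(reversed(children))]
    result = set()
    for arrangement in arrangements:
        for parts in product(*(consistent(child) for child in arrangement)):
            result.add(tuple(leaf for part in parts for leaf in part))
    return result


# Hypothetical example: a P-node over leaf "a" and a Q-node over b, c, d.
TREE = ("P", ["a", ("Q", ["b", "c", "d"])])
```

For `TREE`, the Q-node keeps b, c, d adjacent in that order or reversed, and the P-node lets "a" go before or after the block, which gives four orderings in total.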

22 This interval is denoted [l_b, r_b]. The ordinal of a in π_i is denoted id_a[i]. Thus, a ≺ b ⇔ l_b ≤ id_a[s_b] ≤ r_b. [Comparison table residue: relative numbering uses the postorder; PQE uses a ≺ b ⇔ l_b ≤ id_a[s_b] ≤ r_b]

23 Previous work - Summary
Encoding | Test time | Encoding length
Relative numbering | O(1) | log n
Cohen's algorithm | O(1) | (|≺| log n)/n
BM | O(1) | n
Closure | O(log n) | (|≺| log n)/n
DAG | O(n) | (|≺_d| log n)/n
PE | O(1) | ≥ Closure
Bit-vectors | ≈O(1) | ?
Range-compression | ? | ≈O(1)
Notes: relative numbering and Cohen's algorithm work only for SI; BM, Closure and DAG are the obvious encodings; the remaining encodings need to be compared on the data-set.

24 Bit-vectors. Embeds the hierarchy in the lattice of subsets of {1...k}; each subset is represented as a bit-vector. It is NP-hard to find the minimal k; the best heuristic is NHE. a ≺ b ⇔ vec_b ∧ vec_a = vec_b. [Figure example: subsets {1,2,3,6} and {1,2,3}]
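A short Python sketch of the bit-vector test, using integers as bit-vectors. The embedding built by `naive_vectors` gives every type its own bit plus the bits of all its ancestors; this is a deliberately naive assumption that uses k = n bits and is not the NHE heuristic. All names are illustrative.

```python
def to_vec(bits):
    """Encode a subset of {1..k} as an integer bit-vector."""
    vec = 0
    for i in bits:
        vec |= 1 << (i - 1)
    return vec


def is_subtype(vec_a, vec_b):
    """a is a subtype of b iff every bit of b is also set in a."""
    return (vec_b & vec_a) == vec_b


def naive_vectors(parents):
    """One private bit per type, plus all bits inherited from ancestors."""
    bit = {t: 1 << i for i, t in enumerate(parents)}
    vec = {}

    def build(t):
        if t not in vec:
            v = bit[t]
            for p in parents[t]:
                v |= build(p)
            vec[t] = v
        return vec[t]

    for t in parents:
        build(t)
    return vec


# Hypothetical diamond hierarchy: type -> direct parents.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
VEC = naive_vectors(PARENTS)
```

With the subsets shown on the slide, `is_subtype(to_vec({1, 2, 3, 6}), to_vec({1, 2, 3}))` holds, while the reverse test does not.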


26 Definitions. ≺_d is the transitive reduction of ≺, and ≺ is the transitive closure of ≺_d. Formally, a ≺_d b iff a ≺ b, a ≠ b, and there is no c such that a ≺ c ≺ b with c ≠ a and c ≠ b. Also: ancestors(a) ≡ {b ∈ T | a ≺ b}, descendants(a) ≡ {b ∈ T | b ≺ a}, parents(a) ≡ {b ∈ T | a ≺_d b}, children(a) ≡ {b ∈ T | b ≺_d a}, roots ≡ {a ∈ T | parents(a) = ∅}, leaves ≡ {a ∈ T | children(a) = ∅}, level(a) ≡ 1 + max{level(b) | b ∈ parents(a)}. Single inheritance (SI) vs. multiple inheritance (MI): in SI, |parents(a)| ≤ 1 for each a ∈ T.
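These definitions translate almost directly into Python set comprehensions. The sketch below is illustrative only: the input map `DECLARED` (which deliberately contains one redundant edge) and the function names are assumptions, and the maximum over an empty set is taken as 0 so that roots get level 1.

```python
# Hypothetical declared supertypes; the edge D -> A is redundant.
DECLARED = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C", "A"]}


def closure(declared):
    """ancestors(a) for every a: the reflexive, transitive closure."""
    anc = {}

    def visit(a):
        if a not in anc:
            anc[a] = {a}
            for p in declared[a]:
                anc[a] |= visit(p)
        return anc[a]

    for a in declared:
        visit(a)
    return anc


def reduction(anc):
    """parents(a) for every a: strict ancestors b with no type c strictly
    between a and b."""
    parents = {}
    for a in anc:
        strict = anc[a] - {a}
        parents[a] = {b for b in strict
                      if not any(c != b and b in anc[c] for c in strict)}
    return parents


ANCESTORS = closure(DECLARED)
PARENTS = reduction(ANCESTORS)
DESCENDANTS = {a: {b for b in ANCESTORS if a in ANCESTORS[b]} for a in ANCESTORS}
CHILDREN = {a: {b for b in PARENTS if a in PARENTS[b]} for a in PARENTS}
ROOTS = {a for a in PARENTS if not PARENTS[a]}
LEAVES = {a for a in CHILDREN if not CHILDREN[a]}


def level(a):
    """1 + the largest level among the parents of a (0 if there are none)."""
    return 1 + max((level(b) for b in PARENTS[a]), default=0)
```

The redundant edge from D to A disappears in `PARENTS`, since B (and C) lies strictly between D and A.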

27 Cohen's algorithm. Partition the hierarchy into levels. a ≺ b ⇔ l_b ≤ l_a and r_a[l_b] = id_b, where l_b is level(b) and id_b is a unique identifier within the level.
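A minimal Python sketch of this test for a single-inheritance hierarchy. The `PARENT` map and function names are illustrative assumptions; each type stores one identifier per level along its path from the root.

```python
# Hypothetical single-inheritance hierarchy: type -> parent (None for a root).
PARENT = {"Mammal": None, "Feline": "Mammal", "Canine": "Mammal",
          "Sylvester": "Feline"}


def cohen_encoding(parent):
    level, ident, row, per_level = {}, {}, {}, {}

    def encode(t):
        if t in level:
            return
        p = parent[t]
        if p is None:
            level[t], prefix = 1, []
        else:
            encode(p)
            level[t], prefix = level[p] + 1, row[p]
        # Identifier unique within the level: a running counter per level.
        per_level[level[t]] = per_level.get(level[t], 0) + 1
        ident[t] = per_level[level[t]]
        # r_t: the ids of the ancestors of t, indexed by level (1-based).
        row[t] = prefix + [ident[t]]

    for t in parent:
        encode(t)
    return level, ident, row


LEVEL, IDENT, ROW = cohen_encoding(PARENT)


def is_subtype(a, b):
    """a is a subtype of b iff l_b <= l_a and r_a[l_b] == id_b."""
    return LEVEL[b] <= LEVEL[a] and ROW[a][LEVEL[b] - 1] == IDENT[b]
```

The level comparison guards the array access, so the lookup never reads past the end of the shorter array of a type that sits higher in the tree.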

28 Range compression. Apply postorder numbering on some spanning forest. a ≺ b ⇔ l_b[i] ≤ id_a ≤ r_b[i] for some i. [Figure labels: {2,5,6}, {1,2,3}]
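One way to read this slide is sketched below in Python. Assumptions: the spanning forest keeps each type's first listed parent, and the descendant ids of every type are merged into maximal runs of consecutive numbers. The hierarchy and names are hypothetical, and the original scheme may choose the forest differently.

```python
# Hypothetical hierarchy: type -> direct parents (first parent = forest edge).
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}


def ancestors(t):
    """Reflexive set of ancestors of t."""
    seen, stack = {t}, [t]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def range_compression(parents):
    # Spanning forest: keep only the edge to the first listed parent.
    children = {t: [] for t in parents}
    roots = []
    for t, ps in parents.items():
        if ps:
            children[ps[0]].append(t)
        else:
            roots.append(t)

    ident = {}

    def visit(t):                  # postorder numbering of the forest
        for c in children[t]:
            visit(c)
        ident[t] = len(ident) + 1

    for r in roots:
        visit(r)

    # Merge the ids of each type's descendants into maximal intervals.
    ranges = {}
    for b in parents:
        ids = sorted(ident[a] for a in parents if b in ancestors(a))
        merged = []
        for i in ids:
            if merged and merged[-1][1] == i - 1:
                merged[-1][1] = i
            else:
                merged.append([i, i])
        ranges[b] = merged
    return ident, ranges


IDENT, RANGES = range_compression(PARENTS)


def is_subtype(a, b):
    """a is a subtype of b iff id_a lies in one of the intervals of b."""
    return any(lo <= IDENT[a] <= hi for lo, hi in RANGES[b])
```

In this example C needs two intervals because its descendant D is numbered inside the subtree of B in the chosen forest.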

29 Optimizations. Creation time: encoding the core, and inserting the bottom-trees later. Encoding length: length optimization reduces the range needed for the ids, so all slices (except the first) use only a single byte; heterogeneous encoding uses the BM representation for slices whose size is smaller than 8. Specialization: emitting values which depend only on the supertype, e.g., l_b and r_b, into the test code; this also improves test time (saves load instructions).

30 Inlining optimization. Uses the freedom the compiler has in placing the runtime representation of the types. The first slice is inlined: instead of using id_a[1] we use the pointer to the runtime representation. Reduces the encoding length by 16 bits. Saves one load if the supertype is from the first slice. The first slice constitutes 90% of the types. Using this technique in relative numbering reduces the encoding length to zero.

31 Coalesced PQ-encoding (CPQE). When C++ had only SI, the runtime information was stored before the VTBL. In MI there could be many VTBLs; implementers can either duplicate or share. Sharing is done by another level of indirection. In CPQE, types can share their id array. Since the first slice was inlined, some arrays can be coalesced. The number of distinct arrays is always lower than the size of the core.

32 Results cont. Encoding creation time in milliseconds: (C)PQE on a 266 MHz Pentium II; NHE on a 500 MHz Alpha; (B)PE on a 750 MHz Pentium III, user time in Linux.

33 2-Dim encoding. Idea: embed the hierarchy in the plane. If not possible, use multiple slices. a ≺ b ⇔ X_a[s_b] ≥ X_b[s_b] and Y_a[s_b] ≥ Y_b[s_b]. [Figure: 2-Dim encoding using one slice]

34 Encoding creation. A slice S has a pseudo 2-dimensional embedding if we can embed the hierarchy so that queries a ≺ b, b ∈ S, are answered correctly. Theorem: a slice S has a pseudo 2-dimensional embedding iff dim(H_S) = 2.