Finding a Minimal Tree Pattern Under Neighborhood Constraints. Benny Kimelfeld (IBM Research – Almaden) and Yehoshua Sagiv (The Hebrew University of Jerusalem), 2011.


1 Finding a Minimal Tree Pattern Under Neighborhood Constraints. Benny Kimelfeld (IBM Research – Almaden), Yehoshua Sagiv (The Hebrew University of Jerusalem). PODS 2011, 2011 ACM SIGMOD/PODS Conference, Athens, Greece.

2 DB (Graph) Search. Systems: XKeyword [Balmin et al.] (XML); [Achiezra, K, S] (XML); BANKS [Bhalotia et al.] (rel.); DBXplorer [Agrawal et al.]; BLINKS [He et al.]; SPARK [Luo et al.]; DISCOVER [Hristidis et al.]; RDF search [Tran et al.]; … Example keyword searches: "widom ullman", "carrey williams", "eu brussels".

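3 DB Search: 2 Approaches. (1) Run a graph algorithm (e.g., top-k Steiner trees) on the data graph of the DB to produce search results. (2) Use the schema to generate queries (query 1, query 2, query 3, …) and evaluate them with the query engine (SQL, XQuery, …) over the DB to produce search results.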

4 Aided Query Formulation. Examples: ExQueX [K et al.], QUICK [Zenz et al.].

5 Building Integration Forms. Concepts are given as keywords. Example: Q System [Talukdar et al.].

6 The Problem (Informal). Input: bags of labels. Schema: employee(eid, name, type, dept), dept(id, name, head).
{ Mary, dept } (Mary's department): SELECT * FROM employee e, dept d WHERE e.name='Mary' AND e.dept=d.id
{ Mary, manager } (managers named Mary): SELECT * FROM employee e WHERE e.name='Mary' AND e.type='manager'
{ Mary, manager } (Mary's manager): SELECT * FROM employee e1, employee e2, dept d WHERE e1.name='Mary' AND e1.dept=d.id AND d.head=e2.eid AND e2.type='manager'
Other bags of labels: { dept, dept }, { Mary, dept, dept }, { Jacob, Mary, manager }, …

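7 And in XML... Patterns are extracted from a DTD.
{ facility, headq } gives department[branch/facility][headq] (a department node with a branch child that has a facility child, and a headq child).
{ facility, facility } gives department[branch/facility][branch/facility] (a department node with two branch children, each with a facility child).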

8 Techniques for Query Extraction. Two families: incremental query construction (candidate networks) and subtree enumeration on the schema graph. Systems: Discover/XKeyword [Balmin et al.], SPARK [Luo et al.], QUICK [Zenz et al.], DBXplorer [Agrawal et al.], ExQueX [K et al.], Q System [Talukdar et al.]. Common traits: acyclic queries (tree patterns); ranking dominated by size: smaller = better.

9 Example of Query Extraction.
employee
eid  name   type     dept
e22  Marie  manager  a
e68  Jacob  regular  b
dept
id  name   head
a   db     e22
b   sales  e78
[Figure: candidate tree patterns built from the labels employee, regular, manager and dept.]

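10 Neighborhood Constraints. Each label restricts the labels of its neighbors. Possible neighbors of employee: regular, manager, dept, with cardinality bounds (#≤1, #=1). Possible neighbors of dept: employee (#=1). [Figure: the same constraints written as a regular expression over regular, manager and dept; token order garbled in the source: regular ? | ( manager * ) ) dept,, ( dept]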

11 Efficiency Matters! Speed is important since patterns are typically extracted following a user query. Without taking constraints into account, a larger K is needed! Existing solutions usually experiment on tiny schemas; an exception is the Q System [Talukdar et al., VLDB'08], with 408 relations and 1366 table references. Schemas can be much larger, e.g., SAP: "…company data model with over 13,000 database tables. To manage the meta data (e.g., types and interrelationships) of these tables…" (Kemper et al., "Performance tuning for SAP R/3", IEEE Data Eng. Bull., 1999).

12 Techniques for Query Extraction. Incremental query construction (candidate networks) and subtree enumeration on the schema graph: Discover/XKeyword [Balmin et al.], SPARK [Luo et al.], QUICK [Zenz et al.], DBXplorer [Agrawal et al.], ExQueX [K et al.], Q System [Talukdar et al.]. Limitations: no guarantee on the running time; can construct exponentially many intermediate, partial patterns before finding even 1 tree pattern; repeated labels are not allowed; no neighborhood constraints. [Figure: tree pattern over employee, regular, manager, dept, employee.]

13 In the Paper (Finding a Minimal Tree Pattern Under Neighborhood Constraints). 1st provably efficient algorithms allowing expressive constraints (and repeated labels).
Future work: extending to top-k
–In the paper: subtle issues in defining top-k
Existing techniques for top-k repeatedly solve top-1, w/ additional constraints
–Shortest simple paths [Yen, 1971]
–Smallest Steiner trees [K & S, 2006]
–Most probable answers over Markov sequences [K & Ré, 2010]
–…

14 Abstraction: Schema (undirected). A schema consists of a finite set of labels (label 1, label 2, ..., label n) and neighborhood constraints, one per label. Each constraint is a (possibly infinite) set of bags of labels, given in some rep. language. For example: label 1 has bag-of-labels 1, 2, 3, ...; label 2 has bag-of-labels 17, 18, 19, ...; label n has bag-of-labels 54, 55, 56, ...

15 Problem Definition (undirected model). Input: a schema S (weights allowed) and a bag Λ of labels, e.g., { Mary, manager }, { country, country, border }. Goal: a minimal tree T s.t. T contains Λ and T is consistent with S. Consistency: ∀ nodes v ∈ T: labels(nbrs(v)) ⊆ some bag-of-labels i among those of label(v). For example, a node labeled label 2 must have its neighbors' labels contained in one of bag-of-labels 17, 18, 19, ...
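The two conditions of the problem definition can be checked mechanically for a given tree. The following is a minimal sketch, assuming each constraint is an explicit finite list of bags (the slides allow possibly infinite sets in some rep. language). The function names and the contents of `SCHEMA` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def is_sub_bag(small, big):
    """True if bag `small` is contained in bag `big`, counting multiplicities."""
    return all(big[label] >= n for label, n in small.items())

def contains_bag(node_labels, wanted):
    """True if the labels of the tree's nodes cover the bag `wanted` (Λ)."""
    return is_sub_bag(Counter(wanted), Counter(node_labels.values()))

def is_consistent(node_labels, edges, schema):
    """For every node v, labels(nbrs(v)) must fit in some bag allowed for label(v)."""
    nbrs = {v: [] for v in node_labels}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    for v, label in node_labels.items():
        seen = Counter(node_labels[w] for w in nbrs[v])
        allowed = schema.get(label, [])
        if not any(is_sub_bag(seen, Counter(bag)) for bag in allowed):
            return False
    return True

# Hypothetical schema (an assumption for illustration): label -> allowed neighbor bags.
SCHEMA = {
    "employee": [["regular", "dept"], ["manager", "dept", "dept"]],
    "dept": [["employee", "employee"]],
    "regular": [["employee"]],
    "manager": [["employee"]],
}

# Tree shaped like the "Mary's manager" query: employee - dept - employee - manager.
GOOD_LABELS = {1: "employee", 2: "dept", 3: "employee", 4: "manager"}
GOOD_EDGES = [(1, 2), (2, 3), (3, 4)]

# An employee that is both regular and manager violates the hypothetical schema.
BAD_LABELS = {1: "employee", 2: "regular", 3: "manager"}
BAD_EDGES = [(1, 2), (1, 3)]
```

Here `is_consistent(GOOD_LABELS, GOOD_EDGES, SCHEMA)` holds while the second tree is rejected. This only verifies a candidate tree; finding a minimal one is the hard part addressed in the talk.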

16 Simple Example. [Figure: constraints over the labels journal, conference, title and author (bounds #=1, #≤1), and candidate tree patterns over publication, conference, author, PODS and VLDB.]

17 Complexity Measures. Input: Λ (bag of labels) and S (schema; constraints in a rep. language). Output: min T s.t. Λ ⊆ T & T is consistent w/ S. The problem is NP-hard, already under trivial rep. languages (even to approx.). But Λ is typically tiny: it is derived from a user-phrased (search) query. Fixed |Λ|: "efficient" = e.g., O(|S|^|Λ|), possible under a very general condition on the rep. language (not discussed, see paper). Even better: FPT, i.e., Fixed-Parameter Tractable: "efficient" = e.g., O(2^|Λ| · |S|^2) (next slides).

18 Rep: Mutual-Exclusion Graphs. [Figure: mux graph for publication over the labels journal, conference, conf-date, edition, title and author, with bounds #≤5, #=1, #≤1; the graph is an interval graph.] Theorem: not FPT (under standard assumptions; FPT reduction from parameterized Independent-Set, which is W[1]-hard [Abrahamson et al., 93]). Theorem: FPT for classes of mux graphs: disjoint cliques; more generally, interval graphs; more generally, circular-arc graphs.
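One plausible reading of a mutual-exclusion (mux) graph constraint is sketched below: a bag of neighbor labels is acceptable if every label stays within its upper bound and no two labels joined by a mux edge occur together. This semantics, the name `allowed_by_mux`, and the bounds and edges for publication are assumptions for illustration; lower bounds such as #=1 are ignored because the consistency condition of the talk is containment in an allowed bag.

```python
from collections import Counter

def allowed_by_mux(bag, upper, mux_edges):
    """Check a bag of neighbor labels against upper bounds and mux edges."""
    counts = Counter(bag)
    for label, n in counts.items():
        if label not in upper:      # label is not a possible neighbor at all
            return False
        if n > upper[label]:        # too many neighbors with this label
            return False
    present = set(counts)
    # No two mutually exclusive labels may both appear in the bag.
    return not any(a in present and b in present for a, b in mux_edges)

# Hypothetical upper bounds for the neighbors of "publication" (assumed values).
UPPER = {"title": 1, "author": 5, "journal": 1,
         "conference": 1, "conf-date": 1, "edition": 1}

# Hypothetical mux edges: a publication appears in a journal, or in a
# conference (with a date), or as an edition, but not in two of these.
MUX = [("journal", "conference"), ("journal", "conf-date"),
       ("journal", "edition"), ("conference", "edition"),
       ("conf-date", "edition")]
```

With these assumed values, `["title", "author", "author", "journal"]` is accepted, while `["journal", "conference"]` and `["title", "title"]` are rejected.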

19 Rep: Regular Expressions. Syntax: e := label | ε | e* | e? | e, e | e|e. Example (publication): title, author*, (journal | (conference, conf-date) | edition). Theorem: FPT.
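To make the regular-expression representation concrete, the sketch below tests whether a bag of labels is contained in the bag of labels of some word matched by an expression of the grammar above. It is a brute-force illustration that tries all splits of the bag, exponential in the bag size, and is not the algorithm of the paper. The tuple-based AST and the names `covers`, `bag`, `PUB` are assumptions.

```python
from functools import lru_cache

# AST constructors for: e := label | ε | e* | e? | e, e | e|e
def label(name): return ("label", name)
def eps(): return ("eps",)
def star(e): return ("star", e)
def opt(e): return ("opt", e)
def seq(e1, e2): return ("seq", e1, e2)
def alt(e1, e2): return ("alt", e1, e2)

def bag(*labels):
    """Canonical (sorted tuple) form of a bag, usable as a cache key."""
    return tuple(sorted(labels))

def splits(b):
    """Yield every way to split the sorted tuple b into two sub-bags."""
    n = len(b)
    for mask in range(1 << n):
        left = tuple(b[i] for i in range(n) if (mask >> i) & 1)
        right = tuple(b[i] for i in range(n) if ((mask >> i) & 1) == 0)
        yield left, right

@lru_cache(maxsize=None)
def covers(e, b):
    """True if some word matched by e contains the bag b among its labels."""
    if not b:
        return True  # every expression of this grammar matches some word
    kind = e[0]
    if kind == "eps":
        return False
    if kind == "label":
        return b == (e[1],)
    if kind == "opt":
        return covers(e[1], b)
    if kind == "alt":
        return covers(e[1], b) or covers(e[2], b)
    if kind == "seq":
        return any(covers(e[1], l) and covers(e[2], r) for l, r in splits(b))
    if kind == "star":
        # Peel off a nonempty part for one iteration; the rest is strictly smaller.
        return any(bool(l) and covers(e[1], l) and covers(e, r)
                   for l, r in splits(b))
    raise ValueError(f"unknown expression kind: {kind}")

# title, author*, (journal | (conference, conf-date) | edition)
PUB = seq(label("title"),
          seq(star(label("author")),
              alt(label("journal"),
                  alt(seq(label("conference"), label("conf-date")),
                      label("edition")))))
```

For the publication expression, `covers(PUB, bag("title", "author", "author", "conference"))` holds, whereas a bag containing both journal and conference, or two titles, is not covered.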

20 Proof Strategy (Algorithm). Input: Λ (bag of labels) and S (schema). Output: min T s.t. Λ ⊆ T & T is consistent w/ S. The algorithm is a (nontrivial) adaptation of Dreyfus & Wagner's algorithm for Steiner trees, by reduction to labeled bag cover: a generalized minimum set cover with bags instead of sets, where the cover needs to satisfy a neighborhood constraint. Labeled bag cover is solved by dynamic prog. for: disjoint-cliques mux graphs; circular-arc graphs; regular expressions; works for interval graphs. Thanks: Ryan & Virginia Williams.
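For orientation, here is a minimal sketch of the classic Dreyfus & Wagner dynamic program for minimum-weight Steiner trees in an undirected weighted graph. It is the textbook algorithm only; the adaptation in the paper, which handles labels and neighborhood constraints through labeled bag cover, is not reproduced here. The function name and input encoding (nodes `0..n-1`, edges as `(u, v, weight)` triples) are assumptions.

```python
import math

def steiner_tree_cost(n, edges, terminals):
    """Minimum total weight of a tree connecting all terminals."""
    if not terminals:
        return 0
    INF = math.inf
    # All-pairs shortest paths (Floyd-Warshall).
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    t = len(terminals)
    full = (1 << t) - 1
    # dp[mask][v]: cheapest tree connecting the terminals in mask and node v.
    dp = [[INF] * n for _ in range(1 << t)]
    for i, term in enumerate(terminals):
        dp[1 << i] = dist[term][:]

    for mask in range(1, full + 1):
        if (mask & (mask - 1)) == 0:
            continue  # single terminal: already initialised
        # Step 1: join two smaller trees that meet at the same node v.
        merged = [INF] * n
        sub = (mask - 1) & mask
        while sub:
            other = mask ^ sub
            for v in range(n):
                cost = dp[sub][v] + dp[other][v]
                if cost < merged[v]:
                    merged[v] = cost
            sub = (sub - 1) & mask
        # Step 2: attach node v to the join point u by a shortest path.
        for v in range(n):
            dp[mask][v] = min(merged[u] + dist[u][v] for u in range(n))

    return min(dp[full])
```

With t terminals and n nodes, this runs in O(3^t · n + 2^t · n^2 + n^3) time, which is exponential only in the number of terminals; this mirrors the FPT goal of the talk, where the parameter is |Λ|.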

21 Summary
Plethora of tools extract minimal tree patterns from the schema (relational / XML / …)
–Existing solutions: exptime and/or no repeated labels; not allowing even basic neighborhood constraints
Simple & general abstraction of the problem, allowing neighborhood constraints
Algorithms
–In particular, our FPT algorithms find a min pattern under 2 constraint languages: regexp & mutual exclusion
Next step: based on this work, top-k
Questions?

