Finding a Minimal Tree Pattern Under Neighborhood Constraints. Benny Kimelfeld (IBM Research – Almaden), Yehoshua Sagiv (The Hebrew University of Jerusalem). 2011.

Finding a Minimal Tree Pattern Under Neighborhood Constraints
Benny Kimelfeld (IBM Research – Almaden), Yehoshua Sagiv (The Hebrew University of Jerusalem)
PODS 2011, 2011 ACM SIGMOD/PODS Conference, Athens, Greece

2 DB (Graph) Search
Systems: XKeyword [Balmin et al.] (XML); [Achiezra, K, S] (XML); BANKS [Bhalotia et al.] (rel.); DBXplorer [Agrawal et al.]; BLINKS [He et al.]; SPARK [Luo et al.]; DISCOVER [Hristidis et al.]; RDF search [Tran et al.]; …
Example keyword searches: "widom ullman", "carrey williams", "eu brussels"

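3 DB Search: 2 Approaches
Approach 1: a graph algorithm (e.g., top-k Steiner trees) over the data graph of the DB produces the search results.
Approach 2: query generation over the schema produces queries (query 1, query 2, query 3, in SQL, XQuery, …); a query engine evaluates them over the DB to produce the search results.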

4 Aided Query Formulation
ExQueX [K et al.]; QUICK [Zenz et al.]

5 Building Integration Forms
Q System [Talukdar et al.]: concepts (as keywords)

6 The Problem (Informal)
schema: employee(eid, name, type, dept); dept(id, name, head)
input: bags of labels, e.g., { Mary, dept }, { Mary, manager }, { dept, dept }, { Mary, dept, dept }, { Jacob, Mary, manager }, …
{ Mary, dept } (Mary's department): SELECT * FROM employee e, dept d WHERE e.name='Mary' AND e.dept=d.id
{ Mary, manager } (managers named Mary): SELECT * FROM employee e WHERE e.name='Mary' AND e.type='manager'
{ Mary, manager } (Mary's manager): SELECT * FROM employee e1, employee e2, dept d WHERE e1.name='Mary' AND e1.dept=d.id AND d.head=e2.eid AND e2.type='manager'

7 And in XML (DTD)
{ facility, headq }: department[branch/facility][headq] (pattern nodes: department, branch, facility, headq)
{ facility, facility }: department[branch/facility][branch/facility] (pattern nodes: department, branch, facility, branch, facility)

8 Techniques for Query Extraction
Techniques: incremental query construction (candidate networks); subtree enumeration on the schema graph
Systems: Discover/XKeyword [Balmin et al.]; SPARK [Luo et al.]; QUICK [Zenz et al.]; DBXplorer [Agrawal et al.]; ExQueX [K et al.]; Q System [Talukdar et al.]
Acyclic queries: tree patterns
Ranking dominated by size: smaller = better

9 Example of Query Extraction
employee
  eid   name    type      dept
  e22   Marie   manager   a
  e68   Jacob   regular   b
dept
  id    name    head
  a     db      e22
  b     sales   e78
[Figure: schema graph over the labels employee, dept, regular and manager, followed by tree patterns extracted from it]

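10 Neighborhood Constraints
possible neighbors of employee: regular, manager, dept (cardinality annotations in the figure: #≤1, #=1)
possible neighbors of dept: employee (cardinality annotation in the figure: #=1)
regular-expression form of the employee constraint (token order lost in extraction): regular ? | ( manager * ) ) dept,, ( dept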

11 Efficiency Matters!
Speed is important since patterns are typically extracted following a user query…
w/o taking constraints into account, larger K needed!
Existing solutions usually experiment on tiny schemas; an exception: Q System [Talukdar et al., VLDB'08], with 408 relations and 1366 table references
Schemas can be much larger, e.g., SAP: "…company data model with over 13,000 database tables. To manage the meta data (e.g., types and interrelationships) of these tables…" (Kemper et al., "Performance tuning for SAP R/3", IEEE Data Eng. Bull.)

12 Techniques for Query Extraction
Techniques: incremental query construction (candidate networks); subtree enumeration on the schema graph
Systems: Discover/XKeyword [Balmin et al.]; SPARK [Luo et al.]; QUICK [Zenz et al.]; DBXplorer [Agrawal et al.]; ExQueX [K et al.]; Q System [Talukdar et al.]
Limitations:
 – No guarantee on the running time
 – Can construct exponentially many intermediate, partial patterns before finding even 1 tree pattern
 – Repeated labels are not allowed
 – No neighborhood constraints
[Figure: pattern over the labels employee, regular, manager, dept, employee]

13 In the Paper
1st provably efficient algorithms allowing expressive constraints (and repeated labels)
Future work: extending to top-k
 – In the paper: subtle issues in defining top-k
Existing techniques for top-k repeatedly solve top-1, w/ additional constraints
 – Shortest simple paths [Yen, 1971]
 – Smallest Steiner trees [K & S, 2006]
 – Most probable answers over Markov sequences [K & Ré, 2010]
 – …

14 Abstraction: Schema (undirected)
finite set of labels: label 1, label 2, ..., label n
neighborhood constraints: one per label; each constraint is a (possibly infinite) set of bags, in some rep. language
[Figure: label 1 with bag-of-labels 1, 2, 3, …; label 2 with bag-of-labels 17, 18, 19, …; label n with bag-of-labels 54, 55, 56, …]

15 Problem Definition (undirected model)
input: schema S (labels with their sets of bags, as on the previous slide; weights allowed); bag Λ of labels, e.g., { Mary, manager }, { country, country, border }
goal: minimal tree T, s.t. T contains Λ and T is consistent with S
consistency: ∀ nodes v ∈ T: labels(nbrs(v)) ⊆ some bag-of-labels i among those of label(v) (e.g., for a node v with label 2: bag-of-labels 17, 18, 19, …)
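The consistency condition on this slide can be sketched in a few lines of Python, assuming each constraint is given as an explicit, finite list of bags (the slides allow possibly infinite sets in some representation language). The names `is_sub_bag`, `is_consistent`, `contains_bag` and the tiny `SCHEMA` are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter

def is_sub_bag(small, big):
    """True if every label occurs in `small` at most as often as in `big`."""
    return all(big[label] >= count for label, count in small.items())

def is_consistent(labels, edges, schema):
    """labels: node -> label; edges: (u, v) pairs of an undirected tree;
    schema: label -> list of allowed neighbor bags (Counters)."""
    nbr_bags = {v: Counter() for v in labels}
    for u, v in edges:
        nbr_bags[u][labels[v]] += 1
        nbr_bags[v][labels[u]] += 1
    # Each node's neighbor bag must fit inside some bag allowed for its label.
    return all(
        any(is_sub_bag(nbr_bags[v], bag) for bag in schema[labels[v]])
        for v in labels
    )

def contains_bag(labels, wanted):
    """True if the tree's node labels contain the bag `wanted` (Λ ⊆ T)."""
    return is_sub_bag(Counter(wanted), Counter(labels.values()))

# Hypothetical schema: an employee is next to one dept and is either
# regular or a manager; a dept may be next to up to two employees.
SCHEMA = {
    "employee": [Counter(dept=1, regular=1), Counter(dept=1, manager=1)],
    "dept": [Counter(employee=2)],
    "regular": [Counter(employee=1)],
    "manager": [Counter(employee=1)],
}

GOOD = ({1: "employee", 2: "manager", 3: "dept"}, [(1, 2), (1, 3)])
BAD = ({1: "employee", 2: "manager", 3: "regular"}, [(1, 2), (1, 3)])

print(is_consistent(*GOOD, SCHEMA), is_consistent(*BAD, SCHEMA))
```

This only verifies a given tree; finding a minimal one is the hard part addressed on the following slides.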

16 Simple Example
[Figure: a schema over the labels publication, journal, conference, title, author, PODS and VLDB, with cardinality annotations #=1 and #≤1, followed by several candidate trees over publication, conference, author, PODS and VLDB]

17 Complexity Measures
input: Λ (bag of labels); S (schema, constraints in rep. language)
output: min T s.t. Λ ⊆ T & T is consistent w/ S
NP-hard, already under trivial rep. languages… (even to approx.)
But Λ is typically tiny! (derived from a user-phrased (search) query)
fixed |Λ|: "efficient" = e.g., O(|S|^|Λ|); possible under a very general condition on the rep. language (not discussed, see paper)
even better: FPT, i.e., Fixed-Parameter Tractable: "efficient" = e.g., O(2^|Λ| · |S|^2) (next slides)

18 Rep: Mutual-Exclusion Graphs
[Figure: mux graph for publication over the labels journal, conference, conf-date, edition, title and author, with cardinality annotations #≤5, #=1, #≤1, #≤1; marked "interval graph"]
Theorem: not FPT (under standard assumptions; FPT reduction from parameterized Independent-Set, which is W[1]-hard [Abrahamson et al., 93])
Theorem: FPT for classes of mux graphs: disjoint cliques; more generally, interval graphs; more generally, circular-arc graphs
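A minimal sketch of checking one neighbor bag against a mutual-exclusion graph follows. It assumes a particular reading of the slide: an edge between two labels means they cannot both occur among the neighbors, and each label carries an upper bound on its count (a "#=1" annotation is treated as "at most 1", since patterns are partial). The function name `allowed_by_mux`, the bounds and the edge set are hypothetical and only loosely modeled on the publication figure.

```python
from collections import Counter
from itertools import combinations

def allowed_by_mux(bag, max_count, mux_edges):
    """True if `bag` respects the per-label bounds and no two labels
    joined by a mutual-exclusion edge occur together."""
    bag = Counter(bag)
    present = [label for label, count in bag.items() if count > 0]
    # Labels missing from max_count are treated as not allowed at all.
    if any(bag[label] > max_count.get(label, 0) for label in present):
        return False
    conflicts = {frozenset(edge) for edge in mux_edges}
    return all(frozenset(pair) not in conflicts
               for pair in combinations(present, 2))

# Hypothetical constraint for the neighbors of a publication node.
PUB_MAX = {"title": 1, "author": 5, "journal": 1,
           "conference": 1, "conf-date": 1, "edition": 1}
PUB_MUX = [("journal", "conference"), ("journal", "edition"),
           ("conference", "edition"), ("journal", "conf-date"),
           ("edition", "conf-date")]

print(allowed_by_mux(["title", "author", "author", "journal"], PUB_MAX, PUB_MUX))
print(allowed_by_mux(["journal", "conference"], PUB_MAX, PUB_MUX))
```

Checking a single bag is easy; the hardness stated on the slide concerns finding a minimal tree when the mux graphs are arbitrary.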

19 Rep: Regular Expressions
publication: title, author*, (journal | (conference, conf-date) | edition)
e := label | ε | e* | e? | e, e | e|e
Theorem: FPT
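To make the regular-expression constraints concrete, here is a brute-force sketch that tests whether a bag of neighbor labels is a sub-bag of the label bag of some word of an expression, ignoring order (the ⊆ condition of the problem definition). It enumerates splits of the bag, so it is exponential in the bag size and is not the paper's FPT algorithm. The tuple encoding of expressions and the names `bag`, `splits`, `covers` and `PUB` are assumptions made for illustration.

```python
from functools import lru_cache

def bag(*labels):
    """Canonical hashable form of a bag: a sorted tuple of labels."""
    return tuple(sorted(labels))

def splits(b):
    """All ways to split bag b into two (sorted) sub-bags."""
    n = len(b)
    for mask in range(1 << n):
        left = tuple(b[i] for i in range(n) if (mask >> i) & 1)
        right = tuple(b[i] for i in range(n) if not (mask >> i) & 1)
        yield left, right

@lru_cache(maxsize=None)
def covers(expr, b):
    """True if b is a sub-bag of the label bag of some word of expr.
    Expressions: ("label", x), ("eps",), ("star", e), ("opt", e),
    ("cat", e1, e2), ("alt", e1, e2)."""
    kind = expr[0]
    if kind == "eps":
        return not b
    if kind == "label":
        return b in ((), (expr[1],))
    if kind == "opt":
        return not b or covers(expr[1], b)
    if kind == "alt":
        return covers(expr[1], b) or covers(expr[2], b)
    if kind == "cat":
        return any(covers(expr[1], l) and covers(expr[2], r)
                   for l, r in splits(b))
    if kind == "star":
        # Peel off a nonempty part matched by one iteration, then recurse.
        return not b or any(l and covers(expr[1], l) and covers(expr, r)
                            for l, r in splits(b))
    raise ValueError(f"unknown expression kind: {kind}")

# title, author*, (journal | (conference, conf-date) | edition)
PUB = ("cat", ("label", "title"),
       ("cat", ("star", ("label", "author")),
        ("alt", ("label", "journal"),
         ("alt", ("cat", ("label", "conference"), ("label", "conf-date")),
          ("label", "edition")))))

print(covers(PUB, bag("title", "author", "author", "journal")))
print(covers(PUB, bag("journal", "conference")))
```

The split-based rule for concatenation is sound here because every expression in this grammar denotes a nonempty language.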

20 Proof Strategy (Algorithm)
input: Λ (bag of labels); S (schema)
output: min T s.t. Λ ⊆ T & T is consistent w/ S
(nontrivial) adaptation of Dreyfus & Wagner's algorithm for Steiner trees
reduction to labeled bag cover: generalized minimum set cover (bags instead of sets; cover needs to satisfy a neighborhood constraint)
dynamic prog. for labeled bag cover: disjoint-cliques mux graphs; circular-arc graphs; regular expressions
(note in figure: works for interval graphs)
Thanks: Ryan & Virginia Williams
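For orientation, the classic Dreyfus and Wagner dynamic program for plain Steiner trees (without labels or neighborhood constraints) can be sketched as below. This is the textbook algorithm that the slide says is adapted, not the paper's adaptation itself; the function name `steiner_tree_weight` and the sample graph are hypothetical.

```python
def steiner_tree_weight(n, edges, terminals):
    """Weight of a minimum Steiner tree connecting `terminals`.
    n: number of nodes (0..n-1); edges: (u, v, w) undirected, w >= 0."""
    INF = float("inf")
    # All-pairs shortest paths (Floyd-Warshall).
    dist = [[INF] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    t = len(terminals)
    full = (1 << t) - 1
    # dp[mask][v]: min weight of a tree spanning terminals in mask plus v.
    dp = [[INF] * n for _ in range(1 << t)]
    for i, term in enumerate(terminals):
        for v in range(n):
            dp[1 << i][v] = dist[term][v]
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue  # singleton masks are already initialised
        # Step 1: merge two subtrees that meet at the same node v.
        merged = [INF] * n
        for v in range(n):
            sub = (mask - 1) & mask
            while sub:
                cand = dp[sub][v] + dp[mask ^ sub][v]
                if cand < merged[v]:
                    merged[v] = cand
                sub = (sub - 1) & mask
        # Step 2: attach a shortest path from the merge node u to v.
        for v in range(n):
            dp[mask][v] = min(merged[u] + dist[u][v] for u in range(n))
    return min(dp[full]) if t else 0

# Hypothetical graph: hub 0 joined to 1, 2, 3 (weight 1 each),
# plus direct edges 1-2 and 2-3 (weight 2 each).
EDGES = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 2), (2, 3, 2)]
print(steiner_tree_weight(4, EDGES, [1, 2, 3]))
```

The number of subsets examined is exponential only in the number of terminals, which mirrors the slides' point that Λ is typically tiny while the schema S can be large.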

21 Summary
Plethora of tools extract minimal tree patterns from the schema (relational / XML / …)
 – Existing solutions: exptime and/or no repeated labels; not allowing even basic neighborhood constraints
Simple & general abstraction of the problem, allowing neighborhood constraints
Algorithms
 – In particular, our FPT algorithms find a min pattern under 2 constraint languages: regexp & mutual exclusion
Next step: based on this work, top-k
Questions?