CIKM 2005 1 Finding and Approximating Top-k Answers in Keyword Proximity Search Benny Kimelfeld Yehoshua Sagiv Benny Kimelfeld and Yehoshua Sagiv The Selim.

Presentation transcript:

Finding and Approximating Top-k Answers in Keyword Proximity Search
Benny Kimelfeld and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science
The Hebrew University of Jerusalem

Finding and Approximating Top-k Answers in Keyword Proximity Search (PODS'06)

Keyword Proximity Search (KPS): A Paradigm for Data Extraction
Data have varying degrees of structure
– Relational databases, XML, Web sites
Queries are sets of keywords
– No structural constraints
The goal: extract meaningful parts of the data w.r.t. the keywords

Querying Structure & Content by Keywords
Keywords appear in different parts of the data
Answers show occurrences of keywords, as well as the associations among these occurrences
Proximity of the keywords in the answer indicates a close (strong) semantic association among them
Example search (from the slide figure): Vardi Databases …

Past Work on KPS (Keyword Proximity Search)
DataSpot (SIGMOD 1998)
Information Units (WWW 2001)
BANKS (ICDE 2002, VLDB 2005)
DISCOVER (VLDB 2002)
DBXplorer (ICDE 2002)
XKeyword (ICDE 2003)
…

The Goal of this Paper
Devise efficient algorithms for finding high-quality answers in keyword proximity search

Contents
Introduction
Formal Setting
The Main Results
Enumerating in the Exact Order
Enumerating in an Approximate Order
Conclusion and Future Work

Section: Formal Setting

Data Graphs
Structural and keyword nodes
Edges may have weights
– Weak relationships are penalized by high weights

Queries
Queries are sets of keywords from the data graph
Example: Q = {Summers, Cohen, coffee}

Query Answers
An answer is a directed subtree of the data graph that
– Contains all keywords of the query
– Has no redundant edges (and nodes)
The keywords of the query are the leaves
The root has two or more children
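The answer conditions on this slide can be checked mechanically. The following is a minimal sketch, assuming a rooted tree encoded as a dictionary from each node to the list of its children and a query with at least two keywords; the function name `is_answer` and the encoding are hypothetical and are not taken from the slides.

```python
from typing import Dict, Hashable, List, Set

Node = Hashable


def is_answer(children: Dict[Node, List[Node]], root: Node, keywords: Set[Node]) -> bool:
    """Check the slide's conditions for a rooted directed tree.

    `children` maps a node to the list of its children (assumed encoding).
    Only nodes reachable from `root` are considered.
    """
    seen = set()
    leaves = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            return False  # a node reached twice, so this is not a tree
        seen.add(node)
        kids = children.get(node, [])
        if not kids:
            leaves.add(node)
        stack.extend(kids)
    # The keywords are exactly the leaves, and the root has two or more children.
    return leaves == set(keywords) and len(children.get(root, [])) >= 2


query = {"Summers", "Cohen", "coffee"}
tree = {"r": ["a", "b"], "a": ["Summers"], "b": ["Cohen", "coffee"]}
print(is_answer(tree, "r", query))
```

A tree whose root has a single child, or that ends in a non-keyword leaf, fails the check because it contains a redundant edge.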

Ranking: Inversely Proportional to Weight
$\mathrm{rank}(A) = (\mathrm{weight}(A))^{-1}$
Smaller subtrees represent closer associations
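A small sketch of the ranking formula, assuming the weight of an answer is the sum of its edge weights and that all weights are positive (both are assumptions made here for illustration). The `(u, v, w)` edge encoding and the helper names are hypothetical. Sorting an explicit list is only practical for tiny examples.

```python
def weight(answer_edges):
    """Total weight of an answer given as (u, v, w) edge triples (assumed encoding)."""
    return sum(w for (_u, _v, w) in answer_edges)


def rank(answer_edges):
    """rank(A) = 1 / weight(A); assumes a positive total weight."""
    return 1.0 / weight(answer_edges)


def top_k(answers, k):
    """Brute-force top-k: the k answers of smallest weight, i.e. highest rank."""
    return sorted(answers, key=weight)[:k]


small = [("r", "Summers", 1), ("r", "Cohen", 1)]
large = [("r", "a", 2), ("a", "Summers", 1), ("r", "Cohen", 1)]
print(rank(small), rank(large))
```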

Enumerating in Exact (Ranked) Order
[Figure: a sequence of answer trees with nodes labeled A, B, C]
If one answer precedes another in the enumeration, then weight(first) ≤ weight(second)
The first k answers of the enumeration are the top-k answers

Enumerating in a C-Approximate Order
[Figure: a sequence of answer trees with nodes labeled A, B, C]
If one answer precedes another in the enumeration, then weight(first) ≤ C · weight(second)
The first k answers are a C-approximation of the top-k answers (Fagin et al., PODS'01)
C may be a function of G and Q
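The order condition can be tested on a list of weights. This sketch assumes the reading "whenever one answer precedes another, its weight is at most C times the weight of the later one"; that reading is inferred from the slide's "If … Then … ≤ C" fragment, since the figure itself is not in the transcript. The function name is hypothetical.

```python
def is_c_approximate_order(weights, c):
    """True if every earlier weight is at most c times every later weight."""
    max_earlier = None
    for w in weights:
        # It suffices to compare the largest earlier weight with the current one.
        if max_earlier is not None and max_earlier > c * w:
            return False
        max_earlier = w if max_earlier is None else max(max_earlier, w)
    return True


print(is_c_approximate_order([2, 1, 3], 2))
print(is_c_approximate_order([2, 1, 3], 1))
```

With C = 1 the check reduces to the exact (ranked) order.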

Polynomial Delay
Yardstick of efficiency: polynomial delay
– Polynomial time between generating successive answers
There are exponentially many answers even for 2 keywords, so it is inefficient to generate all answers and then sort

Section: The Main Results

Top Answers are Steiner Trees
Finding the top answer in KPS (a.k.a. the Steiner-tree problem) is intractable
– Therefore, one cannot enumerate all answers in ranked order with polynomial delay
However, the top answer can be found efficiently under data complexity
– That is, the number of keywords is fixed
Approximations can be found efficiently under query-and-data complexity
– There is a lot of work on Steiner-tree approximations

So What Can Be Done?
Can answers of KPS be enumerated in the exact order with polynomial delay, under data complexity?
Can approximations of Steiner trees be used for efficiently enumerating in an approximate order (while preserving the approximation ratio)?

Our Results
Theorem 1: Under data complexity, answers of KPS can be enumerated in the exact order with polynomial delay

Our Results (cont'd)
Theorem 2: Under query-and-data complexity, given an efficient C-approximation for finding Steiner trees, one can enumerate with polynomial delay in a (C+1)-approximate order

The Meaning of the Results
KPS is tractable under data complexity
Under query-and-data complexity, an efficient enumeration in an approximate order can be done with almost the same ratios as Steiner trees
All results on Steiner trees can be applied to KPS
Existing approaches to KPS are heuristics
– Exponential delay in the worst case
– No provable nontrivial approximation ratios
From a theoretical point of view, using heuristics is not the only option

Section: Enumerating in the Exact Order

Lawler's Method
We use the technique of Lawler (1972), which is an iterative method for finding the top-k answers
Each iteration generates the next answer by finding the top answer under constraints
Lawler's method is designed for general (discrete) optimization problems
When applying it to a specific problem, one needs to deal with the following two issues

Two Problems to Solve
1. What exactly are the constraints? (That is, how can we apply Lawler's method so that the constraints make it possible to find top answers efficiently?)
2. How can we efficiently find the top answer under constraints?

Solving the First Problem
Constraints are subtrees of the graph
– Pairwise node disjoint
– Their leaves are exactly the keywords of the query
An answer satisfies the constraints if it contains all the subtrees (i.e., it is a supertree)

Two Problems to Solve (One Left)
1. What exactly are the constraints? (That is, how can we apply Lawler's method in a way that the constraints enable finding the top answer efficiently?)
2. How can we efficiently find the top answer under constraints?

Formulation of the Second Problem
Input: constraints (node-disjoint subtrees, keywords as leaves)
Objective: a minimal answer satisfying the constraints (i.e., containing all the subtrees)
Next, an algorithm that solves "almost" this problem, namely:
(Almost the same) Objective: a minimal supertree satisfying the constraints

Finding a Minimal Supertree
Input: G, 𝒯 (constraints, i.e., subtrees)
1. Collapse each of the subtrees of 𝒯 into a node
2. Find a Steiner tree T of the collapsed subtrees
3. Restore the collapsed subtrees in T
(more details in the proceedings…)

This is not Enough!
Input: constraints (node-disjoint subtrees, keywords as leaves)
Objective: a minimal answer satisfying the constraints (i.e., containing all the subtrees)
(Almost the same) Objective: a minimal supertree satisfying the constraints
Not the same!

Query Answers Revisited
An answer is a directed subtree of the data graph that
– Contains all keywords of the query
– Has no redundant edges (and nodes)
Keywords are the leaves
The root has two or more children

An Example
[Figure: the minimal supertree satisfying the constraints, next to the minimal answer satisfying the constraints]
In the minimal supertree, one edge is redundant, but it cannot be removed since it is a constraint!
The minimal answer can be completely different from the minimal supertree
Furthermore, there can be no answer even if there is a supertree

What if We Remove Edges of Constraints?
What if we first generate a minimal supertree and, if the root has only one child, just remove it (repeating until an answer is obtained)?
The constraints are violated, leading to a failure of Lawler's method! That is,
– Some answers will be duplicated
– While other answers will not be generated at all

Our Approach
[Figure with labels: Constraints, Min. Supertree, Transform, New constraints, Answer]
The root of this subtree has more than one child and it must be the root of the answer

This Process is Repeated
[Figure: Constraints, Min. Supertree, Transform, repeated several times]
Up to $2^{\#\mathrm{keywords}}$ times (fixed & usually fewer)
The best is the final answer

About the Transformation
The details of the exact transformation and the proof of correctness are intricate
All can be found in the proceedings…
This concludes the algorithm for enumerating in the exact order

A Different View: Chain of Reductions
Enumerating answers in ranked order
→ (adapting Lawler's method) Finding the top answer under constraints
→ (transformation of constraints) Finding minimal supertrees
→ (collapse and restore) Finding Steiner trees

Section: Enumerating in an Approximate Order

Modifying the Chain of Reductions
Enumeration in an approximate order
→ Finding approximate answers under constraints
→ Finding approximations of minimal supertrees
→ Finding approximations of Steiner trees
Labels on the reduction steps in the slide figure: "Similar" and "Completely different!"

Exact Order Revisited
[Figure: Constraints, Min. Supertree, Transform, repeated several times]
Up to $2^{\#\mathrm{keywords}}$ repetitions: we cannot allow it under query-and-data complexity!

The Algorithm
From the constraints, two subtrees are computed:
– A C-approximation of the minimal supertree (collapse and restore): ≤ C times the optimum
– A minimal answer for 3 or fewer constraints (the algorithm for the exact order): ≤ 1 times the optimum

Combine the Subtrees
The combined subgraph contains an answer: ≤ (C+1) times the optimum
– The C-approximation of the minimal supertree contributes ≤ C times the optimum
– The minimal answer for 3 or fewer constraints contributes ≤ 1 times the optimum
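One way to read the (C+1) bound is as the sum of the two bounds on the slide. The symbols below are notation introduced here, not taken from the slides: $T_1$ is the C-approximate supertree, $T_2$ is the minimal answer for the reduced constraints, $w$ is the total edge weight, and $\mathrm{OPT}$ is the optimum that both bounds refer to. The sketch assumes nonnegative edge weights, so that any answer contained in the combined subgraph weighs no more than the subgraph itself.

```latex
w(T_1 \cup T_2) \;\le\; w(T_1) + w(T_2)
              \;\le\; C \cdot \mathrm{OPT} + 1 \cdot \mathrm{OPT}
              \;=\; (C+1) \cdot \mathrm{OPT}
```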

Section: Conclusion and Future Work

Keyword Proximity Search
A common paradigm for keyword search over structured databases
In the formal model:
– Data are directed and weighted graphs
– Queries are sets of keywords (i.e., nodes) from the data graph
– Query answers are non-redundant subtrees containing the keywords of the query
The goal is to find the top-k answers, where the rank is inversely proportional to the weight
A stronger goal: enumeration with polynomial delay

Our Results
Under data complexity, answers can be enumerated in the exact ranked order with polynomial delay
Under query-and-data complexity, every efficient C-approximation to the Steiner-tree problem yields an algorithm for enumerating answers with polynomial delay in a (C+1)-approximate order

Our Chain of Reductions
Enumerating answers in sorted order
→ (Lawler's approach) Finding the top answer under constraints
→ (the intricate part …) Finding minimal supertrees
→ (subtree collapse/restore) Finding Steiner trees

Other Variants of KPS
Our algorithms can be adapted to other popular variants of KPS

Undirected Variant
Answers are undirected trees

Strong Variant
Answers are undirected trees and keywords are leaves

Open Problems
Can we improve the space efficiency of our algorithms?
Some ranking functions (e.g., height) are easier than weight when looking for the top answer (no constraints), but
– The chain of reductions doesn't work
– The complexity of finding the top answer under constraints is unknown
Can our results hold for richer queries that also have structural constraints?

Implementation Considerations
Bottlenecks: Steiner-tree algorithms and approximations
Thin graphs allow in-memory execution of our algorithms, even for large XML documents (e.g., DBLP)
New and intuitive ranking functions that are easier to implement efficiently

Related Work: Order vs. Efficiency
[Figure: a scale running from "more desirable" to "more efficient" (queries have a fixed size), with the levels Exact Order, Approximate Order, Heuristic Order (no approximation guaranteed), and No Order; markers: "This work" and "Past work"]

Thank you. Questions?

Illustration of Lawler's Method

Lawler's Method (1972)

Find the Top Answer
In principle, at this point we should find the second-best answer
But instead…

Partition the Remaining Answers
Each partition is defined by a distinct set of constraints

Find the Top of each Set

Find the Second Answer
The second answer is the best among all the top answers in the partitions

Further Divide the Chosen Partition

And so on…

Adapting Lawler's Method

Our Constraints
Inclusion constraints
– Node-disjoint subtrees of the data graph
– All the leaves are keywords
– An answer must contain all the subtrees
Exclusion constraints
– Edges of the data graph
– An answer must not contain any of the edges

Partitioning a Partition (cont.)
Let A be the top answer of a partition with inclusion constraints I and exclusion constraints E, and let $\mathrm{edges}(A) \setminus I = \{e_1, \dots, e_k\}$. The partition is divided into sub-partitions with top answers $A_0, \dots, A_{k-1}$:
$A_0$: inclusion $I$, exclusion $E \cup \{e_1\}$
$A_1$: inclusion $I \cup \{e_1\}$, exclusion $E \cup \{e_2\}$
$A_2$: inclusion $I \cup \{e_1, e_2\}$, exclusion $E \cup \{e_3\}$
$A_3$: inclusion $I \cup \{e_1, e_2, e_3\}$, exclusion $E \cup \{e_4\}$
…
$A_{k-1}$: inclusion $I \cup \{e_1, \dots, e_{k-1}\}$, exclusion $E \cup \{e_k\}$
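The partitioning scheme can be tried out on a toy problem. The sketch below is not the KPS algorithm: it swaps in a deliberately simple optimization problem (choose exactly `m` items of minimum total weight) so that "the top answer under inclusion and exclusion constraints" is trivial to compute. The function names, the item encoding, and the toy problem itself are assumptions made for illustration.

```python
import heapq
from itertools import count


def best_under_constraints(weights, m, include, exclude):
    """Cheapest m-subset that contains `include` and avoids `exclude`, or None."""
    free = sorted((i for i in weights if i not in include and i not in exclude),
                  key=lambda i: (weights[i], i))
    need = m - len(include)
    if need < 0 or need > len(free):
        return None
    return frozenset(include) | frozenset(free[:need])


def lawler_enumerate(weights, m):
    """Yield all m-subsets of the items in nondecreasing order of total weight."""
    def total(answer):
        return sum(weights[i] for i in answer)

    tie = count()  # unique tie-breaker, so the heap never compares sets
    heap = []
    first = best_under_constraints(weights, m, frozenset(), frozenset())
    if first is not None:
        heapq.heappush(heap, (total(first), next(tie), first, frozenset(), frozenset()))
    while heap:
        _, _, answer, include, exclude = heapq.heappop(heap)
        yield answer
        # Split the rest of this partition: the i-th part keeps e_1..e_{i-1}
        # as inclusion constraints and adds e_i as an exclusion constraint.
        included = set(include)
        for e in sorted(answer - include):
            sub_include = frozenset(included)
            sub_exclude = exclude | {e}
            top = best_under_constraints(weights, m, sub_include, sub_exclude)
            if top is not None:
                heapq.heappush(heap, (total(top), next(tie), top, sub_include, sub_exclude))
            included.add(e)


items = {"a": 1, "b": 2, "c": 4, "d": 8}
for subset in lawler_enumerate(items, 2):
    print(sorted(subset), sum(items[i] for i in subset))
```

Each emitted answer triggers at most `m` constrained optimizations, so the delay between successive answers is governed by the cost of finding the top answer under constraints.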

Generating Constraints (intuition)
Constraints (subtrees/edges) are obtained from existing constraints of the current partition and the top answer

Collapsing Subtrees

Collapsing a Subtree
Remove all edges and internal nodes
– Only the root is left
Remove incoming edges of internal nodes
Add outgoing edges to the root
– An edge that emanates from an internal node becomes an outgoing edge of the root

More Details
When adding an outgoing edge (r,u) to the root, the weight of (r,u) is the minimal weight among all the edges from the collapsed subtree to u
When restoring a subtree, each outgoing edge (r,u) of the root is replaced with an (arbitrary) original edge from the restored subtree to u, with the same weight
Incoming edges of internal nodes of the subtree are never restored
– Such edges cannot participate in G-supertrees
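The collapse and restore steps can be sketched on a directed graph stored as a dictionary from `(u, v)` pairs to weights. This is a hedged illustration only: the encoding, the function names, and the choice to merge every non-root node of the subtree into the root (following "only the root is left") are assumptions, and the proceedings should be consulted for the exact construction.

```python
def collapse(edges, root, subtree_nodes):
    """Collapse a subtree into its root.

    edges: dict mapping (u, v) -> weight (assumed graph encoding).
    Returns (new_edges, origin); origin maps each outgoing root edge (root, v)
    to one original minimum-weight edge, so it can be restored later.
    """
    inside = set(subtree_nodes)
    removed = inside - {root}
    new_edges = {}
    origin = {}
    for (u, v), w in edges.items():
        if v in removed:
            continue  # subtree edges and incoming edges of non-root subtree nodes
        if u in inside and v == root:
            continue  # would become a self-loop on the root
        if u in inside:
            # An edge leaving the subtree becomes an outgoing edge of the root,
            # keeping the minimal weight among all such edges to v.
            key = (root, v)
            if key not in new_edges or w < new_edges[key]:
                new_edges[key] = w
                origin[key] = (u, v)
        else:
            new_edges[(u, v)] = w
    return new_edges, origin


def restore_edge(edge, origin):
    """Map an edge of the collapsed graph back to an original edge."""
    return origin.get(edge, edge)


graph = {("r", "a"): 1, ("a", "k1"): 1, ("a", "x"): 5, ("r", "x"): 7,
         ("y", "a"): 2, ("y", "r"): 3, ("x", "k2"): 1}
collapsed, origin = collapse(graph, "r", {"r", "a", "k1"})
print(collapsed)
print(restore_edge(("r", "x"), origin))
```

In this example the edge `("y", "a")` disappears and is never restored, matching the last point on the slide.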