1 QSX: Querying Social Graphs Querying big graphs Parallel query processing Boundedly evaluable queries Query-preserving graph compression Query answering.

Presentation transcript:

1 QSX: Querying Social Graphs
Querying big graphs
◦ Parallel query processing
◦ Boundedly evaluable queries
◦ Query-preserving graph compression
◦ Query answering using views
◦ Bounded incremental query evaluation

2 How to make big graphs small
Input: A class of queries
Question: Can we effectively find, given a query Q from the class and any (possibly big) graph G, a small graph G_Q such that Q(G) = Q(G_Q)?
Effective methods for making big graphs small:
◦ Distributed query processing
◦ Boundedly evaluable graph queries
◦ Query preserving graph compression
◦ Query answering using views
◦ Bounded incremental evaluation
[Figure: Q(G) = Q(G_Q), where G_Q is much smaller than G]

3 Graph pattern matching by graph simulation
Input: A directed graph G, and a graph pattern Q
Output: the maximum simulation relation R
Maximum simulation relation: always exists and is unique
◦ If a match relation exists, then there exists a maximum one
◦ Otherwise, it is the empty set, which is still maximum
Complexity: O((|V| + |V_Q|)(|E| + |E_Q|))
The output is a unique relation, possibly of size |Q||V|
Using views? Incremental?
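The maximum simulation relation can be computed as a fixpoint. The sketch below is a naive refinement loop, not the algorithm with the complexity bound quoted on the slide; the function name `max_simulation` and the dictionary-based graph encoding are assumptions made for illustration.

```python
def max_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Naive fixpoint for the maximum simulation relation.

    q_nodes / g_nodes: dict mapping node -> label (hypothetical encoding).
    q_edges / g_edges: iterable of (source, target) pairs.
    Returns a set of (pattern node, graph node) pairs, or the empty set
    if some pattern node has no match.
    """
    g_succ = {v: set() for v in g_nodes}
    for v, w in g_edges:
        g_succ[v].add(w)
    q_succ = {u: set() for u in q_nodes}
    for u, w in q_edges:
        q_succ[u].add(w)

    # Start from all label-compatible pairs.
    sim = {u: {v for v, lab in g_nodes.items() if lab == q_nodes[u]}
           for u in q_nodes}

    # Remove v from sim[u] while some pattern edge (u, u2) has no
    # counterpart (v, v2) in G with v2 in sim[u2].
    changed = True
    while changed:
        changed = False
        for u in q_nodes:
            for v in list(sim[u]):
                if any(not (g_succ[v] & sim[u2]) for u2 in q_succ[u]):
                    sim[u].discard(v)
                    changed = True

    if any(not matches for matches in sim.values()):
        return set()
    return {(u, v) for u, matches in sim.items() for v in matches}
```

Each pass only removes pairs, so the loop terminates, and what remains is the largest relation satisfying the simulation condition.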

4 Graph pattern matching using views

5 Answering queries using views
The cost of query processing: f(|G|, |Q|)
Can we compute Q(G) without accessing G, i.e., independent of |G|?
Query answering using views: given a query Q in a language L and a set V of views, find another query Q' such that
◦ Q and Q' are equivalent: for any G, Q(G) = Q'(G)
◦ Q' only accesses V(G)
Answering graph pattern queries on big social graphs:
◦ V(G) is often much smaller than G (4% to 12% on real-life data)
◦ Regardless of how big G is, the cost is "independent" of G: the complexity is no longer a function of |G|
[Figure: Q(G) computed on G versus Q'(V(G)) computed on the views]

6 Querying collaborative network
[Figure: a collaborative pattern over project manager, developer and customer nodes; two queries (query 1, query 2) used as views; and a collaborative (chat) network with nodes PM 1, PM 2, customer 1 ... customer n, developer 1 ... developer k and tester. Matching the pattern directly on the network is expensive.]
Source: Detecting Coordination Problems in Collaborative Software Development Environments, Amrit Chintan et al, Information System management, 2010

7 Answering query using views
[Figure: a query Q over a database D yields the query result Q(D); a query A over the database views V(D) yields the query result A(V)]
[Figure: timeline of query answering using views, with labels relational algebra, 2002, XPath, 2007, XML, 2006, tree pattern query, 1998, regular path queries, RDF/SPARQL, and graph pattern query (simulation)]
When possible? What to choose? How to evaluate?
A classical technique, but still in its infancy for graphs

8 When a pattern can be matched using views
Pattern containment: a characterization
A necessary and sufficient condition

9 Pattern containment
[Figure: a query over customer, developer and project manager; View 1 over customer, developer and project manager; View 2 over customer and developer]
View extensions:
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
Query result:
◦ (project manager, developer): (PM 1, developer 2)
◦ (project manager, customer): (PM 1, customer 2)
◦ (developer, customer): (developer 2, customer 2)
◦ (customer, developer): (customer 2, developer 2)
How to determine the existence of λ?

10 Determining pattern containment
NP-complete for relational conjunctive queries, undecidable for relational algebra
A practical characterization: patterns are small in practice

11 Pattern containment: example
[Figure: View 1 (customer, developer, project manager) and View 2 (customer, developer) are matched against the query, which is treated as a "data graph"; the view matches define the mapping λ]
V: the set of views; Q: query
Query containment: given Q and Q', determine whether for any graph G, Q(G) is contained in Q'(G).
A classical problem. What is its complexity for pattern queries? Efficient.

12 Test: Pattern query containment
[Figure: a pattern query over PM, DBA and PRG nodes; View 1 over PM, DBA and PRG with edges e1 and e2; View 2 over DBA and PRG with edges e3 and e4]
It takes 0.5 seconds to check containment of large cyclic patterns

13 Query evaluation using views
Input: pattern query Q, graph G, a set of views V and their extensions in G, and a mapping λ
Output: the query result Q(G)
Algorithm
◦ Collect edge matches for each query edge e and λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G)
Q(G) can be evaluated in O(|Q||V(G)| + |V(G)|^2) time
Recall the simulation algorithm. More efficient. Why?
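The three algorithm steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `evaluate_with_views`, the encoding of λ as a dictionary from query edges to view-edge keys, and the encoding of view extensions as sets of node pairs are all hypothetical, and the loop is a naive fixpoint that does not achieve the time bound quoted on the slide.

```python
def evaluate_with_views(query_edges, lam, extensions):
    """Evaluate a simulation pattern from view extensions only.

    query_edges: list of (u, w) pattern edges.
    lam:         dict, query edge -> list of view-edge keys (the mapping λ).
    extensions:  dict, view-edge key -> set of (v, x) node pairs from G.
    Returns a dict, query edge -> set of matching (v, x) pairs.
    The graph G itself is never accessed.
    """
    # Step 1: collect candidate edge matches from the view extensions.
    matches = {e: set() for e in query_edges}
    for e in query_edges:
        for view_edge in lam.get(e, ()):
            matches[e] |= extensions.get(view_edge, set())

    out_edges = {}
    for (u, w) in query_edges:
        out_edges.setdefault(u, []).append((u, w))

    def node_ok(u, v):
        # v can match u only if every pattern edge leaving u still has
        # a candidate pair whose source is v.
        return all(any(src == v for src, _ in matches[e])
                   for e in out_edges.get(u, ()))

    # Step 2: iteratively remove non-matches until nothing changes.
    changed = True
    while changed:
        changed = False
        for (u, w) in query_edges:
            for (v, x) in list(matches[(u, w)]):
                if not (node_ok(u, v) and node_ok(w, x)):
                    matches[(u, w)].discard((v, x))
                    changed = True

    # Step 3: the pattern has no match if any query edge has none.
    if any(not m for m in matches.values()):
        return {e: set() for e in query_edges}
    return matches
```

The input size here is governed by the view extensions rather than by G, which is the point of the slide's closing question.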

14 Query evaluation using views
[Figure: the query over customer, developer and project manager, with View 1 (customer, developer, project manager) and View 2 (customer, developer)]
View extensions:
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
Query result, computed with a "bottom-up" strategy:
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
Without accessing the underlying big graph G; the views are 4% to 12% of G
Are we done yet?

15 What views to choose?
[Figure: a query over customer, developer, project manager, software and tester nodes, and six candidate views (view 1 to view 6), each covering part of the query]
Choose all? Why do we care? Efficiency.

16 Minimum containment
Minimum containment is NP-complete
◦ APX-hard as an optimization problem
What can we do? Give two options

17 A log|Ep|-approximation
Idea: greedily select views V that "cover" more query edges
◦ Ec: the query edges already covered
◦ A metric decides whether to include a particular view V
Approximation: performance guarantees
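The greedy idea is the classic set-cover heuristic. The sketch below assumes each view's match has already been computed as the set of query edges it covers, and it ranks views by the plain count of newly covered edges; the slide's actual selection metric is not recoverable from the transcript, so that ranking is an assumption, as is the function name `greedy_min_containment`.

```python
def greedy_min_containment(query_edges, view_matches):
    """Greedy selection of views covering all query edges.

    query_edges:  iterable of pattern edges (Ep).
    view_matches: dict, view name -> set of query edges that view covers
                  (assumed precomputed).
    Returns the chosen views in selection order, or [] if the views
    cannot cover every query edge.
    """
    uncovered = set(query_edges)  # Ep minus Ec
    chosen = []
    while uncovered:
        # Assumed metric: number of not-yet-covered edges a view adds.
        best = max(view_matches,
                   key=lambda v: len(view_matches[v] & uncovered),
                   default=None)
        if best is None or not (view_matches[best] & uncovered):
            return []  # no view extends Ec: the query is not contained
        chosen.append(best)
        uncovered -= view_matches[best]
    return chosen
```

For example, with query edges a, b, c, d and views covering {a, b}, {b, c}, {c, d} and {a, b, c}, the loop first picks the view covering three edges and then the one that adds d.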

18 Minimum containment: example
[Figure: the query over customer, developer, project manager, software and tester nodes with candidate views view 1 to view 6; Ec marks the query edges covered so far]
Greedy: based on the metric

19 Minimal containment
Algorithm
◦ Computes the view match for each view
◦ Iteratively selects a view that extends Ec (a new addition)
◦ Repeats until Ec = Ep, or returns the empty set
Runs in O(|Q|^2 card(V) + |V|^2 + |Q||V|) time
Minimal containment is in PTIME
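A minimal (as opposed to minimum) set of views can be found in one pass plus a redundancy check. The sketch below assumes view matches are given as sets of covered query edges; the function name `minimal_containment` and the final elimination pass are illustrative assumptions rather than the slide's exact procedure.

```python
def minimal_containment(query_edges, view_matches):
    """Pick a subset of views that covers Ep and has no redundant member.

    view_matches: dict, view name -> set of query edges covered
                  (assumed precomputed view matches).
    Returns a list of views, or [] if Ep cannot be covered.
    """
    target = set(query_edges)  # Ep
    covered = set()            # Ec
    chosen = []
    # Select any view that extends Ec with a new edge.
    for view, edges in view_matches.items():
        if (edges & target) - covered:
            chosen.append(view)
            covered |= edges & target
        if covered == target:
            break
    if covered != target:
        return []
    # Eliminate redundant views: drop a view if the rest still cover Ep.
    for view in list(chosen):
        rest = set().union(*(view_matches[v] for v in chosen if v != view))
        if target <= rest:
            chosen.remove(view)
    return chosen
```

The result is minimal in the sense that removing any remaining view breaks the cover; it is not guaranteed to be the smallest possible set.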

20 Minimal containment: example
[Figure: the query over customer, developer, project manager, software and tester nodes with candidate views view 1 to view 6]
Eliminate redundant views

21 Putting together
Problem: complexity; algorithm
◦ containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|)
◦ minimum containment: NP-complete/APX-hard; log|Ep|-approximable, O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2))
◦ minimal containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|)
◦ evaluation: PTIME; O(|Q||V(G)| + |V(G)|^2)
Characterization: a sufficient and necessary condition for deciding whether a query can be answered using a set of views
Evaluation: how to evaluate queries using views
View selection: what views to choose for answering queries
The study is still in its infancy for graph queries. Subgraph isomorphism? View maintenance?
Improvement: 23 times faster

22 Bounded incremental graph pattern matching

23 Incremental query answering
Real-life data is dynamic: it constantly changes (∆G), e.g., 5%/week in Web graphs
Re-compute Q(G ⊕ ∆G) starting from scratch?
Changes ∆G are typically small
Compute Q(G) once, and then incrementally maintain it
Incremental query processing:
◦ Input: Q, G, Q(G) (the old output), ∆G (changes to the input)
◦ Output: ∆M (changes to the output) such that Q(G ⊕ ∆G) = Q(G) ⊕ ∆M, the new output
When changes ∆G to the graph G are small, typically so are the changes ∆M to the output Q(G ⊕ ∆G)
Minimizing unnecessary recomputation
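The contract Q(G ⊕ ∆G) = Q(G) ⊕ ∆M can be made concrete with a toy query. The example below is not one of the lecture's queries: it assumes a hypothetical query "nodes with out-degree at least k", represents ∆G as a pair (inserted edges, deleted edges), and maintains an out-degree table as auxiliary state so that the incremental step only touches the sources of changed edges.

```python
def out_degrees(edges):
    """Auxiliary state: out-degree of every node that has an outgoing edge."""
    deg = {}
    for u, _ in edges:
        deg[u] = deg.get(u, 0) + 1
    return deg


def batch_query(edges, k):
    """Q(G): nodes with out-degree >= k, computed from scratch."""
    return {u for u, d in out_degrees(edges).items() if d >= k}


def apply_delta(old, delta):
    """The ⊕ operator on sets: remove deletions, then add insertions."""
    inserted, deleted = delta
    return (set(old) - set(deleted)) | set(inserted)


def incremental_query(deg, old_result, delta_g, k):
    """Compute ∆M from ∆G; cost depends on |∆G|, not on |G|.

    Assumes inserted edges are new and deleted edges exist in G.
    Updates deg in place and returns (added, removed).
    """
    inserted, deleted = delta_g
    touched = set()
    for u, _ in inserted:
        deg[u] = deg.get(u, 0) + 1
        touched.add(u)
    for u, _ in deleted:
        deg[u] -= 1
        touched.add(u)
    added = {u for u in touched if deg[u] >= k and u not in old_result}
    removed = {u for u in touched if deg[u] < k and u in old_result}
    return added, removed
```

Applying the returned pair to the old result with `apply_delta` gives the same set as re-running `batch_query` on the updated edge set, which is exactly the equation on the slide.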

24 Complexity of incremental problems
Incremental query answering
◦ Input: Q, G, Q(G), ∆G
◦ Output: ∆M such that Q(G ⊕ ∆G) = Q(G) ⊕ ∆M
The cost of query processing: a function of |G| and |Q|
Incremental algorithms: complexity analysis in terms of the size of changes
◦ |CHANGED|: the size of changes in the input (∆G) and the output (∆M)
◦ The updating cost that is inherent to the incremental problem itself
◦ The amount of work absolutely necessary to perform for any incremental algorithm
Bounded: the cost is expressible as f(|CHANGED|, |Q|)?
Optimal: in O(|CHANGED| + |Q|)?
Incremental algorithms? Incremental graph simulation: bounded
G. Ramalingam, Thomas W. Reps: On the Computational Complexity of Dynamic Graph Problems. TCS 158(1&2),

25 Why study incremental query answering?
Incremental query answering
◦ Input: Q, G, Q(G), ∆G
◦ Output: ∆M such that Q(G ⊕ ∆G) = Q(G) ⊕ ∆M
E-commerce systems: a fixed set of (parameterized) queries, repeatedly invoked and evaluated
View maintenance: in response to changes to the underlying graph
Compressed graphs: maintenance in the presence of changes
Indexing structure: 2-hop covers
One of the important issues for querying big graphs

26 |CHANGED|: the affected area
Result graphs: Gr = (Vr, Er) for graph simulation
◦ Vr: the nodes in G that match pattern nodes in Q
◦ Er: the paths in G that match edges in Q
Affected area (AFF): the difference between Gr, the result graph of Q(G), and Gr', the result graph of Q(G ⊕ ∆G); this is the size of changes in the output
|CHANGED| = |∆G| + |AFF|
Used in the complexity and boundedness analyses of incremental matching
[Figure: pattern Q matched by simulation against a graph with nodes Ann (CTO), Pat (DB), John (DB), Bill (Bio) and Mat (Bio)]

27 Incremental graph pattern matching
[Figure: pattern Q over CTO, DB and Bio; graph G and result graph Gr with nodes including Ann (CTO), Don (CTO), John (CTO), Pat (DB), Dan (DB), John (DB), Bill (Bio), Mat (Bio), Tom (Bio) and Ross (Med); ∆G inserts edges e1 to e5 in turn, and the affected area is highlighted]
Comparing the cost of incremental matching with its batch counterpart

28 Incremental simulation matching
Input: Q, G, Q(G), ∆G
Output: ∆M such that Q(G ⊕ ∆G) = Q(G) ⊕ ∆M
Incremental simulation is
◦ in O(|AFF|) time, i.e., optimal, for single-edge deletions and general patterns
◦ in O(|AFF|) time, i.e., optimal, for single-edge insertions and DAG patterns
◦ unbounded for batch updates over general patterns and graphs, but solvable in O(|∆G|(|Q||AFF| + |AFF|^2)) time
2 times faster than its batch counterpart for changes up to 10%

29 Semi-boundedness
Semi-bounded: the cost is a PTIME function f(|CHANGED|, |Q|)
Incremental simulation is in O(|∆G|(|Q||AFF| + |AFF|^2)) time for batch updates and general patterns
◦ |Q| is small
◦ Independent of |G|
Semi-boundedness is good enough!

Incremental Simulation: optimal results
Unit deletions and general patterns: Algorithm IncMatch-, optimal with respect to the size of changes
1. Identify s-s edges: e = (v, v'), if v and v' are matches
2. Find invalid matches
3. Propagate the affected area and refine matches (use a stack, upward propagation)
Linear time wrt. the size of changes
[Figure: pattern Q over CTO, DB and Bio; graph G with Ann (CTO), Don (CTO), Pat (DB), Dan (DB), Bill (Bio) and Mat (Bio); deleting edge e6 changes the result graph Gr within the affected area ∆Gr]

Incremental Simulation: optimal results
Unit insertions and DAG patterns: Algorithm IncMatch+, optimal with respect to the size of changes
1. Identify cs and cc edges
◦ cs edge: e = (v, v'), if v' is a match and v a candidate
◦ cc edge: e = (v, v'), if v' and v are candidates
2. Find new valid matches
3. Propagate the affected area and refine matches
Linear time wrt. the size of changes
[Figure: pattern Q over CTO, DB and Bio; graph G with Ann (CTO), Don (CTO), Pat (DB), Dan (DB), Bill (Bio) and Mat (Bio); inserting edge e7 adds candidate nodes to the result graph Gr]

32 Incremental subgraph isomorphism
Incremental matching via subgraph isomorphism
◦ Input: Q, G, M_iso(Q, G), ∆G
◦ Output: ∆M such that M_iso(Q, G ⊕ ∆G) = M_iso(Q, G) ⊕ ∆M
◦ Boundedness and complexity: unbounded even for unit updates over DAG graphs for path patterns
Incremental subgraph isomorphism (decision problem)
◦ Input: Q, G, M(Q, G), ∆G
◦ Question: whether there exists a subgraph in G ⊕ ∆G that is isomorphic to Q
◦ NP-complete even when G is fixed
Neither bounded nor semi-bounded: not semi-bounded unless P = NP
What should we do?

33 Recall reachability queries
Reachability
◦ Input: A directed graph G, and a pair of nodes s and t in G
◦ Question: Does there exist a path from s to t in G? O(|V| + |E|) time
Equivalence relation: reachability relation Re
◦ a node pair (u, v) ∈ Re iff u and v have the same set of ancestors and descendants in G
◦ for any graph G, there is a unique maximum Re, i.e., the reachability equivalence relation of G
Compress G by leveraging the equivalence relation
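The reachability equivalence classes can be computed directly from the definition. The sketch below is a naive version that runs one traversal per node in each direction; the function names are hypothetical, and "ancestors" and "descendants" are taken to mean nodes connected by a path of length at least one, which is an assumption about the slide's convention.

```python
from collections import defaultdict


def reachable_from(start, adjacency):
    """Nodes reachable from start by a path of length >= 1."""
    seen = set()
    stack = list(adjacency.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, ()))
    return frozenset(seen)


def reachability_classes(nodes, edges):
    """Group nodes that share the same ancestors and descendants."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    groups = defaultdict(set)
    for n in nodes:
        key = (reachable_from(n, pred), reachable_from(n, succ))
        groups[key].add(n)
    return sorted(sorted(g) for g in groups.values())
```

Each class can then be merged into a single node of the compressed graph, since reachability queries cannot distinguish its members.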

Incremental Reachability Preserving Compression
Incremental reachability preserving compression (RCM)
◦ unbounded even for unit updates, i.e., a single edge insertion or deletion (by reduction from the single-source reachability problem)
◦ solvable in O(|AFF||Gc|) time without decompressing Gc
Algorithm
1. Update topological ranking, initialize AFF
2. (Iteratively) split/merge nodes and update Gc
[Figure: graph G over nodes FA 1, FA 2, C1 and C2, and the compressed graphs Gr, Gr' and Gr'' after updates]

35 Graph pattern matching by graph simulation
Input: A directed graph G, and a graph pattern Q
Output: the maximum simulation relation R
Bisimulation: a binary relation B over V of G, such that for each node pair (u, v) ∈ B,
◦ L(u) = L(v)
◦ for each edge (u, u') ∈ E, there exists (v, v') ∈ E, s.t. (u', v') ∈ B
◦ for each edge (v, v') ∈ E, there exists (u, u') ∈ E, s.t. (u', v') ∈ B
Equivalence relation Rb: the unique maximum bisimulation relation
Compress G by leveraging the equivalence relation
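The maximum bisimulation can be computed by partition refinement. The sketch below is a simple signature-based refinement, not an optimized algorithm; the function name `bisimulation_classes` and the input encoding (a label dictionary plus an edge list) are assumptions for illustration.

```python
def bisimulation_classes(labels, edges):
    """Equivalence classes of the maximum bisimulation Rb.

    labels: dict, node -> label L(node).
    edges:  iterable of (source, target) pairs.
    """
    succ = {v: set() for v in labels}
    for u, v in edges:
        succ[u].add(v)

    # Start with one block per label.
    block = dict(labels)
    while True:
        # A node's signature: its current block plus the set of blocks
        # its successors fall into.
        ids = {}
        refined = {}
        for v in labels:
            sig = (block[v], frozenset(block[w] for w in succ[v]))
            refined[v] = ids.setdefault(sig, len(ids))
        # Signatures only ever split blocks, so an unchanged block
        # count means the partition is stable.
        if len(ids) == len(set(block.values())):
            break
        block = refined

    groups = {}
    for v, b in block.items():
        groups.setdefault(b, set()).add(v)
    return sorted(sorted(g) for g in groups.values())
```

Merging each class into one node yields the compressed graph on which simulation patterns can be evaluated.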

Incremental simulation preserving compression
Incremental pattern preserving compression (PCM)
◦ unbounded even for unit updates
◦ solvable in O(|AFF|^2 + |Gc|) time without decompressing Gc
Algorithm
1. Update node ranking, initialize AFF
2. Iteratively split/merge nodes in Gc and update AFF
Incremental compression without recomputation
[Figure: graph G over nodes BSA 1, BSA 2, MSA 1, MSA 2, FA 1 to FA 4 and C1 to C4, its compressed graph Gq, and the affected area]

37 Incremental graph compression
Input: G, Gc = R(G), ∆G
Output: ∆Gc such that R(G ⊕ ∆G) = R(G) ⊕ ∆Gc
◦ Compressed once and incrementally maintained
◦ No need to decompress Gc
◦ Gc is computed once for all queries Q in L
Boundedness and complexity: unbounded even for unit updates; in O(|AFF|^2 + |Gc|) time

38 Putting together
Bounded and semi-bounded incremental algorithms
◦ Prove (semi-)boundedness: develop a (semi-)bounded incremental algorithm
◦ Disprove (semi-)boundedness: by contradiction or reduction
Incremental graph simulation: semi-bounded
◦ Cyclic patterns and graphs
◦ Batch updates
Optimal for
◦ single-edge deletions and general patterns
◦ single-edge insertions and DAG patterns
Semi-bounded incremental algorithms for querying big data

39 Summing up

40 Making big data small
Yes, it is doable!
◦ Parallel query processing: divide and conquer
◦ Boundedly evaluable queries: dynamic reduction
◦ Query preserving compression: convert big data to small data
◦ Query answering using views: make big data small
◦ Bounded incremental query answering: depending on the size of the changes rather than the size of the original big data
◦ ...
Combinations of these are more effective
Including but not limited to graph queries
MapReduce is not the only way, and it is not the best way!
5.28 years * 365 * 24 * 3600 seconds (EB) → 24 seconds!
Improvement:
◦ times (bounded evaluability), 60%
◦ 55 times (parallel processing via partial evaluation)
◦ 23 times (query answering using views)
◦ 2.3 times faster (compression)
◦ 2 times faster for changes up to 10% (incremental)

41 Summary and review
◦ What is query answering using views?
◦ What is query containment? What is the complexity of deciding query containment for relations? For XML? Graph pattern queries via graph simulation?
◦ What questions do we have to answer for answering graph queries using views?
◦ What is incremental query evaluation? What are the benefits?
◦ What is a unit update? Batch updates?
◦ When can we say that an incremental problem is bounded? Semi-bounded?
◦ How to show that an incremental problem is bounded? How to disprove it?

42 Project (1)
Recall graph pattern matching via subgraph isomorphism (Lecture 3), referred to as subgraph queries in the sequel.
◦ Develop a characterization (a sufficient and necessary condition) for deciding whether subgraph queries can be answered using views.
◦ Develop an algorithm for determining whether a subgraph query can be answered using views, based on your characterization.
◦ Develop an algorithm that, given a graph G, a set V of views and a subgraph query Q that can be answered using the views, computes Q(G) by using views in V.
◦ Give correctness and complexity analyses of your algorithms.
◦ Experimentally evaluate your algorithms, especially their scalability with the size of graphs.
A research and development project

43 Project (2)
Recall 2-hop covers for reachability queries (Lecture 2): for each node v in G, maintain 2hop(v) = (L_in(v), L_out(v)) such that a node s can reach t if and only if L_out(s) ∩ L_in(t) ≠ ∅
Study incremental maintenance of 2-hop covers, in response to
◦ node insertion
◦ node deletion
◦ edge insertion
◦ edge deletion
Develop an incremental algorithm in each of these settings.
Is the incremental problem bounded in each of the settings? If so, show that your incremental algorithm is bounded; otherwise disprove the boundedness of the incremental problem.
Implement your algorithms, and prove their correctness.
Experimentally evaluate your algorithms, especially their scalability.
A research and development project
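For orientation, answering a reachability query from 2-hop labels is a single set-intersection test. The sketch below assumes the labels are stored as dictionaries of sets and, as a common convention that the slide does not spell out, that each node appears in its own L_in and L_out; the function name `reaches` and the hand-built labels for the path a → b → c are hypothetical.

```python
def reaches(s, t, l_out, l_in):
    """s can reach t iff L_out(s) and L_in(t) share at least one hop node."""
    return not l_out[s].isdisjoint(l_in[t])


# Hand-built 2-hop labels for the path graph a -> b -> c, using b as
# the hop node that links a to c (assumed labelling, for illustration).
L_OUT = {"a": {"a", "b"}, "b": {"b"}, "c": {"c"}}
L_IN = {"a": {"a"}, "b": {"b"}, "c": {"b", "c"}}
```

With these labels, `reaches("a", "c", L_OUT, L_IN)` holds through hop b, while `reaches("c", "a", L_OUT, L_IN)` does not. The project's task is to keep such labels correct under node and edge updates.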

44 Project (3)
Recall strongly connected components (SCC, Lecture 2).
Study incremental maintenance of SCC, in response to
◦ node insertion
◦ node deletion
◦ edge insertion
◦ edge deletion
Develop an incremental algorithm in each of these settings.
Is the incremental problem bounded in each of the settings? If so, show that your incremental algorithm is bounded; otherwise disprove the boundedness of the incremental problem.
Implement your algorithms, and prove their correctness.
Experimentally evaluate your algorithms, especially their scalability.
A research and development project

45 Papers for you to review
◦ W. Le, S. Duan, A. Kementsietsidis, F. Li, and M. Wang. Rewriting queries on SPARQL views. In WWW,
◦ D. Saha. An incremental bisimulation algorithm. In FSTTCS, bis-07.pdf
◦ S. K. Shukla, E. K. Shukla, D. J. Rosenkrantz, H. B. H. III, and R. E. Stearns. The polynomial time decidability of simulation relations for finite state processes: A HORNSAT based approach. In DIMACS Ser. Discrete, (search Google Scholar)
◦ W. Fan, X. Wang, and Y. Wu. Answering Graph Pattern Queries using Views, ICDE (query answering using views)
◦ W. Fan, X. Wang, and Y. Wu. Incremental Graph Pattern Matching, TODS 38(3), (bounded incremental query answering)