Answering pattern queries using views
Yinghui Wu (UC Santa Barbara), Wenfei Fan (University of Edinburgh), Xin Wang (Southwest Jiaotong University)

Presentation transcript:


Real-life graph querying is expensive
◦ Real-life scope: 100M (10^8)
◦ Social scale: 100B (10^11)
◦ Web scale: 1T (10^12)
◦ Brain scale: 100T (10^14)
[Figure: example graphs: US road, Human Connectome (The Human Connectome Project, NIH), knowledge graph, BTC Semantic Web, Web graph (Google), Internet (Opte project)]
Source: An NSA Big Graph experiment, P. Burkhardt et al., U.S. National Security Agency, May 2013

Querying collaborative network
[Figure: a collaborative pattern over customer, developer and project manager, matched against a collaborative (chat) network with nodes PM 1, PM 2, customer 1 … customer n, developer 1 … developer k and tester; query 1 and query 2 each return matches among PM 1, PM 2, customer 2, customer 3, developer 2 and developer 3. Evaluating such queries is expensive!]
Source: “Detecting Coordination Problems in Collaborative Software Development Environments”, Amrit Chintan et al, Information System management, 2010

Answering query using views
[Figure: query Q over database D gives query result Q(D); query A over database views V(D) gives query result A(V)]
Studied for: relational algebra 2002 XPath 2007 XML 2006 tree pattern query 1998 regular path queries RDF/SPARQL
Graph pattern query: (bounded) simulation (our work)
When? What to choose? How to evaluate?

Outline
Graph pattern matching using views
◦ When, what and how?
When can a query be evaluated using views?
◦ Pattern containment: an iff condition
How to evaluate?
◦ Query answering using views
What to choose?
◦ Minimum containment & minimal containment
Extension: bounded simulation
Experimental study
Conclusion

Graphs, patterns and views
Pattern query: customer, developer
Query result:
edges | matches
(customer, developer) | {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer) | {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Matches form a binary relation:
◦ node match: satisfies predicates
◦ edge match: connects two node matches
View 1 (view definition 1: customer, developer, project manager), view extension 1:
edges | matches
(project manager, developer) | {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer) | {(PM 1, customer 2), (PM 2, customer 2)}
View 2 (view definition 2: customer, developer), view extension 2:
edges | matches
(customer, developer) | {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer) | {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}

Graph pattern matching using views
Given a pattern query Q and a set V of view definitions, find another query A s.t.
◦ A is equivalent to Q (A(G) = Q(G)) for all data graphs G
◦ A only refers to V and extensions V(G)
[Figure: query Q over data graph G gives matches Q(G); query A over views V gives A(G)]

When can a pattern query be answered using views?

Pattern containment
[Figure: query over customer, developer and project manager; View 1 over customer, developer and project manager; View 2 over customer and developer]
View extensions:
(customer, developer) | {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer) | {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(project manager, developer) | {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer) | {(PM 1, customer 2), (PM 2, customer 2)}
Query result:
(project manager, developer) | (PM 1, developer 2)
(project manager, customer) | (PM 1, customer 2)
(developer, customer) | (developer 2, customer 2)
(customer, developer) | (customer 2, developer 2)

Determining pattern containment

Pattern containment: example
[Figure: View 1 (customer, developer, project manager) and View 2 (customer, developer) are matched against the query (customer, developer, project manager), which is treated as a “data graph”; the mapping λ relates query edges to the view matches]
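The containment check in this example can be sketched in Python. This is a minimal sketch under stated assumptions, not the talk's algorithm: it assumes the "iff condition" is that the view matches, computed by graph simulation over the query treated as a data graph, cover every query edge. The edge sets of the query and views, and the names `simulate` and `contained`, are hypothetical.

```python
def simulate(pattern_edges, graph_edges, label):
    """Edge matches of a pattern in a graph under graph simulation.

    Pattern nodes are labels; `label` maps each graph node to its label.
    Returns {pattern_edge: set of matching graph edges}, or {} if no match.
    """
    p_nodes = {n for e in pattern_edges for n in e}
    g_nodes = {n for e in graph_edges for n in e}
    sim = {u: {v for v in g_nodes if label[v] == u} for u in p_nodes}
    changed = True
    while changed:
        changed = False
        for (u, u2) in pattern_edges:
            # keep v only if it has an edge to some current match of u2
            ok = {v for v in sim[u]
                  if any((v, v2) in graph_edges for v2 in sim[u2])}
            if ok != sim[u]:
                sim[u] = ok
                changed = True
    if any(not s for s in sim.values()):
        return {}
    return {(u, u2): {(v, v2) for (v, v2) in graph_edges
                      if v in sim[u] and v2 in sim[u2]}
            for (u, u2) in pattern_edges}


def contained(query_edges, views):
    """Return (is_contained, lam); lam maps each query edge to view edges."""
    label = {n: n for e in query_edges for n in e}  # query as "data graph"
    lam = {e: [] for e in query_edges}
    for name, view_edges in views.items():
        for view_edge, hits in simulate(view_edges, query_edges, label).items():
            for q_edge in hits:
                lam[q_edge].append((name, view_edge))
    return all(lam[e] for e in query_edges), lam


# Hypothetical edge sets for the query and the two views of the example.
PM, DEV, CUST = "project manager", "developer", "customer"
query = {(PM, DEV), (PM, CUST), (DEV, CUST), (CUST, DEV)}
views = {"view 1": {(PM, DEV), (PM, CUST)},
         "view 2": {(CUST, DEV), (DEV, CUST)}}

ok, lam = contained(query, views)
only_v2, _ = contained(query, {"view 2": views["view 2"]})
```

With both views every query edge is covered, so `ok` is true; with View 2 alone the two project-manager edges stay uncovered and the check fails.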

How to answer a pattern query using views?

Query evaluation using views
Given Q, a set of views V and extensions, and a mapping λ, find the query result Q(G)
Algorithm
◦ Collect edge matches for each query edge e and λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G)
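The three steps above can be sketched in Python. This is a minimal sketch, assuming graph simulation semantics (a pair survives only while both endpoints still have a match for every outgoing query edge). The view extensions are the ones listed in the example slides; the query edges, the mapping `lam` and the function name `answer_with_views` are assumptions made for illustration.

```python
def answer_with_views(query_edges, lam, extensions):
    """Compute edge matches of the query from view extensions only."""
    # Step 1: collect candidate matches for each query edge e from lam(e).
    matches = {e: set().union(*(extensions[ve] for ve in lam[e]))
               for e in query_edges}

    out_edges = {}
    for (u, u2) in query_edges:
        out_edges.setdefault(u, []).append((u, u2))

    def is_valid(u, v):
        # v can match u only if it has a candidate for every out-edge of u
        return all(any(src == v for (src, _) in matches[e])
                   for e in out_edges.get(u, []))

    # Step 2: iteratively remove non-matches until no change happens.
    changed = True
    while changed:
        changed = False
        for (u, u2) in query_edges:
            keep = {(v, v2) for (v, v2) in matches[(u, u2)]
                    if is_valid(u, v) and is_valid(u2, v2)}
            if keep != matches[(u, u2)]:
                matches[(u, u2)] = keep
                changed = True

    # Step 3: return Q(G); it is empty if some query edge has no match.
    if any(not m for m in matches.values()):
        return {e: set() for e in query_edges}
    return matches


PM, DEV, CUST = "project manager", "developer", "customer"
extensions = {
    ("view 1", PM, DEV): {("PM 1", "developer 2"), ("PM 2", "developer 3")},
    ("view 1", PM, CUST): {("PM 1", "customer 2"), ("PM 2", "customer 2")},
    ("view 2", CUST, DEV): {("customer 2", "developer 2"),
                            ("customer 3", "developer 3")},
    ("view 2", DEV, CUST): {("developer 2", "customer 2"),
                            ("developer 2", "customer 3"),
                            ("developer 3", "customer 2")},
}
# Assumed query edges and mapping: each query edge maps to one view edge.
query_edges = [(PM, DEV), (PM, CUST), (DEV, CUST), (CUST, DEV)]
lam = {(PM, DEV): [("view 1", PM, DEV)], (PM, CUST): [("view 1", PM, CUST)],
       (DEV, CUST): [("view 2", DEV, CUST)], (CUST, DEV): [("view 2", CUST, DEV)]}

result = answer_with_views(query_edges, lam, extensions)

# Variant: without (customer 3, developer 3), customer 3 loses its only
# outgoing match, so (developer 2, customer 3) is pruned in step 2.
smaller = dict(extensions)
smaller[("view 2", CUST, DEV)] = {("customer 2", "developer 2")}
pruned = answer_with_views(query_edges, lam, smaller)
```

The data graph G is never touched: only the cached view extensions are read, which is the point of answering the query from views.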

Query evaluation using views
[Figure: query over customer, developer and project manager; View 1 over customer, developer and project manager; View 2 over customer and developer; “bottom-up” strategy]
View extensions:
(customer, developer) | {(customer 2, developer 2), (customer 3, developer 3)}
(developer, customer) | {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(project manager, developer) | {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer) | {(PM 1, customer 2), (PM 2, customer 2)}
Query result:
(project manager, developer) | {(PM 1, developer 2), (PM 2, developer 3)}
(project manager, customer) | {(PM 1, customer 2), (PM 2, customer 2)}
(developer, customer) | {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
(customer, developer) | {(customer 2, developer 2), (customer 3, developer 3)}

What should be selected?

What to choose?
[Figure: a query over customer, developer, project manager, software and tester, together with six candidate views (view 1 to view 6), each over a subset of these nodes. Choose all?]

Minimum containment

A log|E_p|-approximation

Minimum containment
[Figure: the query over customer, developer, project manager, software and tester, with view 1 to view 6 and the set Ec of query edges covered so far]
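The selection illustrated above can be sketched as a greedy cover. This is a sketch under assumptions: the standard greedy set-cover heuristic, which is known to give a logarithmic approximation ratio, is swapped in here because the slide's own algorithm did not survive in the transcript. The edge names `e1` … `e5`, the coverage of each view and the function name are hypothetical.

```python
def greedy_minimum_views(query_edges, covers):
    """Pick views greedily until every query edge is covered.

    covers maps a view name to the set of query edges its matches cover.
    Returns the chosen view names in order, or None if the query is not
    contained in the views.
    """
    uncovered = set(query_edges)
    chosen = []
    while uncovered:
        # view that covers the most still-uncovered edges (ties: by name)
        best = max(sorted(covers), key=lambda v: len(covers[v] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            return None
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical coverage of five query edges by six views.
query_edges = {"e1", "e2", "e3", "e4", "e5"}
covers = {
    "view 1": {"e1", "e2"},
    "view 2": {"e3"},
    "view 3": {"e3", "e4"},
    "view 4": {"e1", "e2", "e3"},
    "view 5": {"e4", "e5"},
    "view 6": {"e5"},
}
selected = greedy_minimum_views(query_edges, covers)
```

In this hypothetical instance two views suffice, so choosing all six would cache four views that add nothing.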

Minimal containment

Minimal containment
[Figure: the query over customer, developer, project manager, software and tester, with view 1 to view 6]
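A minimal set of views can be sketched as follows, assuming "minimal" means that no selected view can be dropped without losing containment. The single-pass removal below is one simple polynomial-time way to reach such a set and is not claimed to be the talk's algorithm; the coverage data and names are hypothetical.

```python
def minimal_views(query_edges, covers):
    """Drop redundant views one at a time; return a minimal covering set."""
    need = set(query_edges)

    def covered(view_names):
        got = set()
        for name in view_names:
            got |= covers[name]
        return need <= got

    if not covered(covers):
        return None  # the query is not contained in the views at all
    kept = sorted(covers)
    for name in sorted(covers):
        trial = [w for w in kept if w != name]
        if covered(trial):
            kept = trial  # name was redundant given the remaining views
    return kept


# Hypothetical instance: view 1 alone covers everything, but it is
# examined first and dropped because the other three also cover the query.
query_edges = {"e1", "e2", "e3", "e4"}
covers = {
    "view 1": {"e1", "e2", "e3", "e4"},
    "view 2": {"e1", "e2"},
    "view 3": {"e3"},
    "view 4": {"e4"},
}
kept = minimal_views(query_edges, covers)
```

Here the minimal set has three views although a minimum set would contain only view 1, which illustrates why the two notions are treated separately.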

Bounded pattern matching using views
Bounded pattern queries
Answering bounded pattern queries
◦ Idea: “reduce” bounded pattern queries to weighted pattern queries
◦ View matches: weighted edge to weighted paths
◦ Complexity and algorithms carry over to bounded queries
[Figure: a collaborative pattern over customer, developer and project manager with edge bounds 2 and 2; a collaborative (chat) network with PM, customer 1, customer 2, developer 1, developer 2 and tester; View 1 over customer, developer and project manager; a second view over customer and developer]
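The edge-matching rule behind bounded pattern queries can be sketched with a breadth-first search, assuming the usual reading of bounded simulation: a pattern edge with bound k is matched by a nonempty path of at most k edges in the data graph. The small network and the function name `within_bound` are hypothetical.

```python
from collections import deque


def within_bound(adj, src, dst, k):
    """True if dst is reachable from src by a path of 1..k edges."""
    frontier = deque([(src, 0)])
    seen = {src}
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # paths through this node would exceed the bound
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


# Hypothetical chat network: PM 1 -> developer 1 -> developer 2 -> customer 2
adj = {
    "PM 1": ["developer 1"],
    "developer 1": ["developer 2"],
    "developer 2": ["customer 2"],
}
two_hops = within_bound(adj, "PM 1", "developer 2", 2)
too_far = within_bound(adj, "PM 1", "customer 2", 2)
```

With bound 2, PM 1 reaches developer 2 but not customer 2, which needs three hops.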

Putting everything together
Problem | Complexity | Algorithm
Simulation
containment | PTIME | O(card(V)|Q|^2 + |V|^2 + |Q||V|)
minimum containment | NP-c / APX-hard | log|E_p|-approximable, O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2))
minimal containment | PTIME | O(card(V)|Q|^2 + |V|^2 + |Q||V|)
evaluation | PTIME | O(|Q||V(G)| + |V(G)|^2)
Bounded simulation
containment | PTIME | O(|Q|^2 |V|)
minimum containment | NP-c / APX-hard | log|E_p|-approximable, O(|Q|^2 |V| + |Q|card(V)^(3/2))
minimal containment | PTIME | O(|Q|^2 |V|)
evaluation | PTIME | O(|Q||V(G)| + |V(G)|^2)
Comparison with other query classes
Classes: Relational; XML; graph/RDF
Language: conjunctive query; relational algebra; XPath (XQuery); RPQs; ECRPQs; (P)SPARQL; (bounded) pattern query
Containment: NP-c; undecidable; coNP-c - undecidable; undecidable; EXPTIME; PTIME

Experimental study

Efficiency: pattern queries
Youtube views: “Music”; < 7 days, Comedy; View > 10k, “Sports”; Rate > 4
◦ 2.2 times and 1.75 times faster
◦ greater improvement over denser graphs (|E| = |V| a)

Efficiency: bounded pattern queries
Amazon views: “Books”; rating > 4, “Music CD”; sales rank>, “DVD”; reviews > 1000
◦ times and 7.1 times faster
◦ greater improvement over larger graphs

Minimum vs. minimal
Minimum takes slightly more time to find substantially smaller sets of views

Conclusion
Pattern containment is tractable for (bounded) pattern queries
Query evaluation using views is much more efficient for large graphs than “batch” counterparts
The journey has just started…
◦ More features to select good views to cache?
◦ What if a query is not contained in the existing views?
◦ View-based subgraph queries?

Thank you!
Answering pattern queries using views