Bounded Conjunctive Queries Yang Cao 1,2, Wenfei Fan 1,2, Tianyu Wo 2, Wenyuan Yu 3 1 University of Edinburgh, 2 Beihang University, 3 Facebook Inc.



2 Query answering on Big Data
Query answering is expensive:
- The complexity of query answering is high. SQL (RA): PSPACE-complete; SPC: NP-complete.
- On big D, even a simple operation is cost-prohibitive.
Query answering is cost-prohibitive when D is big, even for simple queries.
State of the art: even at a fast scan rate of 6 GB/s, a linear scan of a dataset D would take
- 1.9 days when D is 1 PB (10^15 B);
- 5.28 years when D is 1 EB (10^18 B).
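The scan times quoted on this slide follow from dividing the dataset size by the scan rate. The short sketch below reproduces that arithmetic; the 365-day year is an assumption made here, not something stated on the slide.

```python
# Back-of-the-envelope check of the linear-scan times quoted on the slide.
SCAN_RATE_BYTES_PER_S = 6e9   # 6 GB/s, the rate stated on the slide
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365           # assumption: a non-leap year


def scan_seconds(size_bytes: float) -> float:
    """Time in seconds to read size_bytes once at the fixed scan rate."""
    return size_bytes / SCAN_RATE_BYTES_PER_S


days_for_1pb = scan_seconds(1e15) / SECONDS_PER_DAY
years_for_1eb = scan_seconds(1e18) / SECONDS_PER_DAY / DAYS_PER_YEAR

print(f"1 PB: {days_for_1pb:.2f} days, 1 EB: {years_for_1eb:.2f} years")
```

Because the cost is linear in |D|, growing D by a factor of 1000 grows the scan time by the same factor, which is why the talk looks for plans whose cost does not depend on |D|.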

3 What can we do?
Is it possible to compute Q(D) within our available resources, no matter how large D is? This property is called scale independence.

4 On Scale Independence
In practice: explicitly terminating within a certain budget
- Anytime algorithms for intelligent systems (Dean, 1987)
- Approximate aggregate query answering systems (Armbrust; Agarwal)
- Querying graphs within bounded resources (Fan, 2014)
In theory: complexity bounds
- Formalization and sound characterizations (Fan, PODS'14)
- Impossibility: a characterization for RA queries is impossible.
Questions:
1. How to decide which queries can be accurately answered scale independently?
2. How to answer such queries scale independently?
3. What if a query cannot be accurately answered scale independently?
SPC queries: "the most fundamental and the most widely used queries"

5 Characterizing scale independence for SPC
Does a query Q have the following property? For all datasets D, there exists a subset D_Q of D such that
1) Q(D_Q) = Q(D);
2) D_Q consists of no more than M tuples; and
3) D_Q can be effectively identified with a cost independent of |D|.
Boundedness: conditions 1) and 2). Effective boundedness: conditions 1), 2) and 3).
Use effective boundedness to formalize scale independent queries.
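To make conditions 1) and 2) concrete, here is a minimal sketch that tests whether a candidate subset D_Q is a witness for one query on one dataset. The row layout, the query `photos_in_a0` and the helper name `is_bounded_witness` are hypothetical, introduced only for illustration. The check itself evaluates Q(D) in full, so it illustrates the definition only and says nothing about condition 3).

```python
from typing import Callable, Set, Tuple

Row = Tuple[str, str]  # hypothetical (album_id, photo_id) rows


def is_bounded_witness(
    query: Callable[[Set[Row]], Set[str]],
    d: Set[Row],
    d_q: Set[Row],
    m: int,
) -> bool:
    """Check that d_q is a subset of d, has at most m tuples, and gives the same answer."""
    return d_q <= d and len(d_q) <= m and query(d_q) == query(d)


def photos_in_a0(rows: Set[Row]) -> Set[str]:
    """Hypothetical selection query: all photos that belong to album 'a0'."""
    return {photo for album, photo in rows if album == "a0"}


d = {("a0", "p1"), ("a0", "p2"), ("a1", "p3"), ("a2", "p4")}
d_q = {("a0", "p1"), ("a0", "p2")}

print(is_bounded_witness(photos_in_a0, d, d_q, m=2))
```

Boundedness additionally requires that such a witness exists for every dataset D with the same M, which no finite test like this one can establish.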

6 Example: A Real-life Query from Facebook
Q_0: find all photos from an album a_0 in which a person u_0 is tagged by one of her friends.
Facebook graph DB (D_0): 1.25 billion users; 140 billion friend links.
Q_0 is neither bounded nor effectively bounded!

7 Access Schema: utilizing data semantics
Access schema for D_0:
- in_album:
- tagging:
- friends:
Q_0 is effectively bounded under the access schema: Q_0(D_0) can be evaluated by accessing no more than 7000 tuples.
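The idea of evaluating Q_0 through indexes with cardinality bounds can be sketched as follows. The three bounds below (1000 photos per album, 5000 friends per user, one tag per photo and tagged person) are assumptions chosen to be consistent with the 7000-tuple figure; the slide text does not show the actual constraints. The class `BoundedIndex` and the function `evaluate_q0` are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# Assumed bounds (not shown on the slide), chosen so that the worst case is 7000.
MAX_PHOTOS_PER_ALBUM = 1000
MAX_FRIENDS_PER_USER = 5000
MAX_TAGS_PER_PHOTO_AND_TAGGEE = 1


class BoundedIndex:
    """Hash index that enforces a per-key cardinality bound and counts fetched tuples."""

    def __init__(self, bound):
        self.bound = bound
        self.fetched = 0
        self._data = defaultdict(list)

    def add(self, key, value):
        if len(self._data[key]) >= self.bound:
            raise ValueError(f"cardinality bound {self.bound} exceeded for {key!r}")
        self._data[key].append(value)

    def fetch(self, key):
        values = self._data.get(key, [])
        self.fetched += len(values)
        return values


def evaluate_q0(in_album, friends, tagging, album_id, user_id):
    """Photos in album_id in which user_id is tagged by one of user_id's friends."""
    photos = in_album.fetch(album_id)            # at most 1000 tuples
    friend_ids = set(friends.fetch(user_id))     # at most 5000 tuples
    result = []
    for photo in photos:                         # at most 1000 lookups, 1 tuple each
        for tagger in tagging.fetch((photo, user_id)):
            if tagger in friend_ids:
                result.append(photo)
    return result


in_album = BoundedIndex(MAX_PHOTOS_PER_ALBUM)       # album_id -> photo_id
friends = BoundedIndex(MAX_FRIENDS_PER_USER)        # user_id -> friend_id
tagging = BoundedIndex(MAX_TAGS_PER_PHOTO_AND_TAGGEE)  # (photo_id, taggee_id) -> tagger_id

for photo in ("p1", "p2", "p3"):
    in_album.add("a0", photo)
for friend in ("u1", "u2"):
    friends.add("u0", friend)
tagging.add(("p1", "u0"), "u1")   # tagged by a friend
tagging.add(("p2", "u0"), "u9")   # tagged by a non-friend
tagging.add(("p3", "u1"), "u0")   # a different person is tagged

result = evaluate_q0(in_album, friends, tagging, "a0", "u0")
total_fetched = in_album.fetch_count if hasattr(in_album, "fetch_count") else (
    in_album.fetched + friends.fetched + tagging.fetched
)
worst_case = (
    MAX_PHOTOS_PER_ALBUM
    + MAX_FRIENDS_PER_USER
    + MAX_PHOTOS_PER_ALBUM * MAX_TAGS_PER_PHOTO_AND_TAGGEE
)
print(result, total_fetched, worst_case)
```

Under these assumed bounds the number of tuples fetched depends only on the constants in the query and on the bounds, never on the total size of the three relations.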

8 A bounded evaluation approach for querying Big Data
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Adjusting: make Q effectively bounded if it is not.

9 A bounded evaluation approach for querying Big Data
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Generating: generate scale independent query plans if it is.
3. Making: make Q effectively bounded if it is not.

10 Effective Boundedness Checking
- A characterization of boundedness: a sound and complete set of inference rules for boundedness.
- A quadratic-time checking algorithm, based on
  - the above characterization, and
  - the connection between boundedness and effective boundedness.
Checking effective boundedness is fast with our characterization!
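The slide does not list the inference rules, so the following is not the authors' algorithm. It is a simplified fixpoint, in the style of attribute closure for functional dependencies, that conveys the flavour of such a check: start from the variables bound to constants and repeatedly apply access constraints whose inputs are already covered. The `Constraint` type, the variable names and the bounds are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Constraint(NamedTuple):
    """Hypothetical access constraint: given values for inputs, at most bound output tuples."""
    relation: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    bound: int


def covered_variables(constants: Iterable[str], constraints: List[Constraint]) -> Set[str]:
    """Fixpoint of the variables reachable from the constants via the constraints."""
    covered = set(constants)
    changed = True
    while changed:
        changed = False
        for c in constraints:
            if c.inputs <= covered and not c.outputs <= covered:
                covered |= c.outputs
                changed = True
    return covered


def passes_coverage_check(variables, constants, constraints) -> bool:
    """Simplified test: every query variable is covered starting from the constants."""
    return set(variables) <= covered_variables(constants, constraints)


# Q_0 over variables a (album), p (photo), u (person), f (friend/tagger).
schema = [
    Constraint("in_album", frozenset({"a"}), frozenset({"p"}), 1000),
    Constraint("friends", frozenset({"u"}), frozenset({"f"}), 5000),
    Constraint("tagging", frozenset({"p", "u"}), frozenset({"f"}), 1),
]
q0_variables = {"a", "p", "u", "f"}

with_constants = passes_coverage_check(q0_variables, {"a", "u"}, schema)
without_constants = passes_coverage_check(q0_variables, set(), schema)
print(with_constants, without_constants)
```

Each pass over the constraints either covers a new variable or stops, so the loop runs at most once per variable plus one final pass; this naive version is polynomial but makes no claim to match the complexity stated on the slide.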

11 A bounded evaluation approach
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Making: make Q effectively bounded if it is not.

12 Generating Effectively Bounded Query Plans
- A direct characterization of effective boundedness: a sound and complete set of inference rules for effective boundedness.
- An O(|Q|^2 |A|^3) algorithm for generating bounded query plans.
Generating scale independent query plans is fast!

13 A bounded evaluation approach
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Adjusting: make Q effectively bounded if it is not.

14 Making Queries Effectively Bounded
Finding dominating parameters:
- Good news: always possible (trivial parameters).
- Bad news: nontrivial dominating parameters are NP-complete and NPO-complete to find.
A quadratic-time heuristic algorithm for making queries effectively bounded.
Parameterized queries in:
- recommender systems,
- e-commerce search, and
- social search platforms.
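One way to picture the search for parameters to instantiate is a generic greedy heuristic: repeatedly pick the variable whose instantiation covers the most further variables. This is an illustrative stand-in, not the heuristic from the talk, and it gives no guarantee of a minimum set. The coverage model (input set, output set pairs over query variables) and all names are assumptions.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # hypothetical (inputs, outputs) pair


def closure(seed: Set[str], rules: List[Rule]) -> Set[str]:
    """Variables reachable from seed by repeatedly applying the rules."""
    covered = set(seed)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in rules:
            if inputs <= covered and not outputs <= covered:
                covered |= outputs
                changed = True
    return covered


def greedy_parameters(variables: Set[str], rules: List[Rule]) -> List[str]:
    """Greedily choose variables to instantiate until all variables are covered."""
    chosen: List[str] = []
    covered: Set[str] = set()
    while not variables <= covered:
        candidates = [v for v in sorted(variables) if v not in covered]
        # Pick the candidate whose instantiation covers the most variables.
        best = max(candidates, key=lambda v: len(closure(covered | {v}, rules)))
        chosen.append(best)
        covered = closure(covered | {best}, rules)
    return chosen


# Q_0 with no constants: a (album), p (photo), u (person), f (friend/tagger).
rules = [
    (frozenset({"a"}), frozenset({"p"})),
    (frozenset({"u"}), frozenset({"f"})),
    (frozenset({"p", "u"}), frozenset({"f"})),
]
parameters = greedy_parameters({"a", "p", "u", "f"}, rules)
print(parameters)
```

Instantiating every variable always works, which corresponds to the trivial parameters mentioned on the slide; the greedy pass simply tries to stop earlier.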

15 Evaluation on Real-life Datasets
Real-life datasets:
- UK traffic accident data (21.4 GB)
- The Ministry of Transport Test data (16.2 GB)
Experimental results:
1. Effective boundedness is practical: it is easy to make parameterized queries effectively bounded.
2. The bounded query evaluation approach is effective on big data: scale independent query plans are faster than MySQL, and even more so when D grows.
The bounded query evaluation approach is an effective solution for querying big data!

16 Conclusion
Summary:
- Two characterizations of (effective) boundedness
- Fundamental problems
- A bounded evaluation framework for querying big data
- Algorithms underlying the framework