Published by Jeremy Marriott. Modified over 10 years ago.
1
Bounded Conjunctive Queries
Yang Cao (1,2), Wenfei Fan (1,2), Tianyu Wo (2), Wenyuan Yu (3)
(1) University of Edinburgh, (2) Beihang University, (3) Facebook Inc.
2
Query answering on Big Data
Query answering is expensive:
– The complexity of query answering is high. SQL (RA): PSPACE-complete; SPC: NP-complete.
– On big D, even a simple operation is cost-prohibitive.
State of the art: even at a fast scan rate of 6 GB/s, a linear scan of a data set D would take
– 1.9 days when D is 1 PB (10^15 B);
– 5.28 years when D is 1 EB (10^18 B).
Query answering is cost-prohibitive when D is big, even for simple queries.
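The scan times quoted above can be reproduced with a short back-of-the-envelope calculation. This is a minimal sketch, assuming decimal units (1 PB = 10^15 B, 1 GB = 10^9 B) and a 365-day year; the names `scan_seconds`, `petabyte_days` and `exabyte_years` are illustrative and do not come from the slides.

```python
# Back-of-the-envelope check of the linear-scan figures quoted on the slide.
SCAN_RATE_BYTES_PER_SEC = 6 * 10**9   # 6 GB/s, the scan rate quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365                   # assumption: a 365-day year


def scan_seconds(size_bytes: int) -> float:
    """Time in seconds to read size_bytes once at the quoted scan rate."""
    return size_bytes / SCAN_RATE_BYTES_PER_SEC


petabyte_days = scan_seconds(10**15) / SECONDS_PER_DAY
exabyte_years = scan_seconds(10**18) / (SECONDS_PER_DAY * DAYS_PER_YEAR)

print(f"1 PB: {petabyte_days:.1f} days")
print(f"1 EB: {exabyte_years:.2f} years")
```

Because the cost of a scan grows linearly with |D|, a thousandfold increase in data size turns days into years, which is the motivation for plans whose cost does not depend on |D|.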
3
What can we do?
Is it possible to compute Q(D) within our available resources, no matter how large D is?
This property is called scale independence.
4
On Scale Independence
In practice: explicitly terminating within a certain budget
– Anytime algorithms for intelligent systems (Dean, 1987)
– Approximate aggregate query answering systems (Armbrust; Agarwal)
– Querying graphs within bounded resources (Fan, 2014)
In theory: complexity bounds
– Formalization and sound characterizations (Fan, PODS'14)
Impossibility: a characterization for RA queries is impossible.
SPC queries: "the most fundamental and the most widely used queries"
1. How to decide which queries can be accurately answered scale independently?
2. How to answer such queries scale independently?
3. What if a query cannot be accurately answered scale independently?
5
Characterizing scale independence for SPC
Does a query Q have the following property? For all datasets D, there exists a subset D_Q of D such that
1) Q(D_Q) = Q(D);
2) D_Q consists of no more than M tuples; and
3) D_Q can be effectively identified with a cost independent of |D|.
Conditions 1 and 2 define boundedness; adding condition 3 gives effective boundedness.
Use effective boundedness to formalize scale-independent queries.
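The two notions can be written out symbolically. The following is one reading of the slide's definition, not a quotation from it: the quantifier order (a single bound M that works for every D) and the cost function name are assumptions made here for illustration.

```latex
% Boundedness: one bound M works for every dataset D.
\[
Q \text{ is bounded} \iff
\exists M \;\; \forall D \;\; \exists D_Q \subseteq D :\quad
Q(D_Q) = Q(D) \;\wedge\; |D_Q| \le M .
\]
% Effective boundedness: in addition, D_Q can be found cheaply.
\[
Q \text{ is effectively bounded} \iff
Q \text{ is bounded, and } D_Q \text{ can be identified with cost }
\mathrm{cost}(Q) \text{ independent of } |D| .
\]
```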
6
Example: A Real-life Query from Facebook
Q_0: find all photos from an album a_0 in which a person u_0 is tagged by one of her friends.
Facebook graph DB (D_0): 1.25 billion users; 140 billion friend links.
Q_0 is neither bounded nor effectively bounded!
7
Access Schema: utilizing data semantics
Access schema for D_0: constraints on the relations in_album, tagging and friends.
Q_0 is effectively bounded under the access schema: Q_0(D_0) can be evaluated by accessing no more than 7000 tuples.
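How an access schema makes Q_0 effectively bounded can be sketched with toy data. Everything below is a hypothetical illustration: the class `BoundedIndex`, the function `evaluate_q0`, the choice of index keys, and the cardinality bounds (3, 4 and 1) are assumptions invented for this example and are not the constraints shown on the slide. The point is only that each fetch goes through an index whose result size is capped, so the number of tuples accessed depends on the bounds and not on |D|.

```python
from collections import defaultdict


class BoundedIndex:
    """Hypothetical access constraint: each key maps to at most `bound` values."""

    def __init__(self, name, bound):
        self.name = name
        self.bound = bound
        self._rows = defaultdict(list)
        self.accessed = 0  # number of tuples fetched through this index

    def add(self, key, value):
        if len(self._rows[key]) >= self.bound:
            raise ValueError(f"{self.name}: cardinality bound {self.bound} violated")
        self._rows[key].append(value)

    def fetch(self, key):
        rows = self._rows.get(key, [])
        self.accessed += len(rows)
        return list(rows)


def evaluate_q0(in_album, friends, tagging, album, user):
    """Photos in `album` in which `user` is tagged by one of her friends."""
    photos = in_album.fetch(album)            # at most in_album.bound tuples
    friend_set = set(friends.fetch(user))     # at most friends.bound tuples
    result = []
    for photo in photos:
        taggers = tagging.fetch((photo, user))  # at most tagging.bound per photo
        if any(tagger in friend_set for tagger in taggers):
            result.append(photo)
    return result


# Toy instance with made-up bounds (NOT the bounds from the slide).
in_album = BoundedIndex("in_album", bound=3)  # album -> photos
friends = BoundedIndex("friends", bound=4)    # user -> friends
tagging = BoundedIndex("tagging", bound=1)    # (photo, taggee) -> tagger

for photo in ("p1", "p2", "p3"):
    in_album.add("a0", photo)
in_album.add("a1", "p4")
friends.add("u0", "f1")
friends.add("u0", "f2")
friends.add("u9", "f3")
tagging.add(("p1", "u0"), "f1")  # tagged by a friend, in album a0
tagging.add(("p2", "u0"), "x9")  # tagged by a non-friend
tagging.add(("p4", "u0"), "f2")  # tagged by a friend, but in another album

answer = evaluate_q0(in_album, friends, tagging, "a0", "u0")
total_accessed = in_album.accessed + friends.accessed + tagging.accessed
worst_case = in_album.bound + friends.bound + in_album.bound * tagging.bound
print(answer, total_accessed, worst_case)
```

In this sketch the worst-case number of accessed tuples is fixed by the three bounds alone; adding more albums, users or tags to the toy instance does not change it.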
8
A bounded evaluation approach for querying Big Data
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Adjusting: make Q effectively bounded if it isn't.
9
A bounded evaluation approach for querying Big Data
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Generating: generate scale-independent query plans if it is.
3. Making: make Q effectively bounded if it isn't.
10
Effective Boundedness Checking
A characterization of boundedness:
– a sound and complete set of inference rules for boundedness
A quadratic-time checking algorithm based on:
– the above characterization
– the connection between boundedness and effective boundedness
Checking effective boundedness is fast with our characterization!
11
A bounded evaluation approach
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Making: make Q effectively bounded if it isn't.
12
Generating Effectively Bounded Query Plans
A direct characterization of effective boundedness:
– a sound and complete set of inference rules for effective boundedness
An O(|Q|^2 |A|^3) bounded query plan generation algorithm
Generating scale-independent query plans is fast!
13
A bounded evaluation approach
Given an SPC query Q:
1. Checking: check whether Q is effectively bounded.
2. Evaluation: generate bounded query plans if it is.
3. Adjusting: make Q effectively bounded if it isn't.
14
Making Queries Effectively Bounded
Finding dominating parameters:
– Good news: always possible (trivial parameters)
– Bad news: nontrivial dominating parameters are NP-complete and NPO-complete to find
A quadratic-time heuristic algorithm for making queries effectively bounded
Parameterized queries in
o recommender systems,
o e-commerce search, and
o social search platforms.
15
Evaluation on Real-life Datasets
Real-life datasets:
- UK traffic accident data (21.4 GB)
- The Ministry of Transport Test data (16.2 GB)
Experimental results:
1. Effective boundedness is practical:
-- it is easy to make parameterized queries effectively bounded
2. The bounded query evaluation approach is effective on big data:
-- scale-independent query plans
-- 10^3 times faster than MySQL (even faster when D grows)
The bounded query evaluation approach is an effective solution for querying big data!
16
Conclusion
Summary:
– Two characterizations of (effective) boundedness
– Fundamental problems
– A bounded evaluation framework for querying big data
– Algorithms underlying the framework