
Slide 1: TDD: Topics in Distributed Databases
Distributed Query Processing
- MapReduce
- Vertex-centric models for querying graphs
- Distributed query evaluation by partial evaluation

Slide 2: Coping with the sheer volume of big data
Using an SSD with a scan rate of 6 GB/s, a linear scan of a data set D would take:
- 1.9 days when D is 1 PB (10^15 bytes)
- 5.28 years when D is 1 EB (10^18 bytes)
Can we do better if we have n processors/SSDs? Divide and conquer: split the work across the n processors.
How to make use of parallelism in query answering?
(Figure: n processors P, each with memory M, connected to the database DB by an interconnection network.)

Slide 3: MapReduce

Slide 4: MapReduce (Google)
A programming model with two primitive functions:
- Map: (k1, v1) -> list(k2, v2)
- Reduce: (k2, list(v2)) -> list(k3, v3)
How does it work?
- Input: a list of key-value pairs
- Map: applied to each pair, computes intermediate key-value pairs (k2, v2)
- The intermediate key-value pairs are hash-partitioned based on k2; each partition (k2, list(v2)) is sent to a reducer
- Reduce: takes a partition as input, and computes key-value pairs (k3, v3)
The process may reiterate: multiple map/reduce steps.
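To make this concrete, here is the classic word count expressed as one map/reduce round, simulated in plain Python on a single machine; mr_round and all other names below are illustrative stand-ins, not the API of any real platform. Later sketches in this deck reuse this mr_round simulator.

```python
from collections import defaultdict

def mr_round(inputs, map_fn, reduce_fn):
    # One MapReduce round: map every (k1, v1) pair, group the
    # intermediate (k2, v2) pairs by k2 (the "shuffle"), then reduce.
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return [kv for k2, vs in groups.items() for kv in reduce_fn(k2, vs)]

def wc_map(doc_id, text):
    for word in text.split():
        yield (word, 1)

def wc_reduce(word, counts):
    yield (word, sum(counts))

docs = [("d1", "to be or not to be"), ("d2", "to do")]
print(sorted(mr_round(docs, wc_map, wc_reduce)))
# [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```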

Slide 5: Architecture (Hadoop)
No need to worry about how the data is stored and sent.
- Input: stored in DFS, partitioned into blocks (64 MB); one block for each mapper (a map task)
- Intermediate pairs: kept in the local store of the mappers, hash-partitioned on k2
- Output: the reducers aggregate the results; multiple map/reduce steps may follow

Slide 6: Connection with parallel database systems
What parallelism does MapReduce offer? Data-partitioned parallelism: mappers and reducers perform parallel computation over partitions of the data.

Slide 7: Parallel database systems
Compared with parallel database systems, MapReduce supports a restricted query language: only two primitives, map and reduce.
(Figure: processors P, each with memory M, connected to the database DB by an interconnection network.)

Slide 8: MapReduce implementation of relational operators
Projection π_A(R)
Input: for each tuple t in R, a pair (key, value), where value = t (the key is not necessarily a key of R)
Map(key, t): emit (t.A, t.A)
- applied to each input tuple in parallel, emitting a new tuple with the projected attributes; the mappers process the tuples in parallel
Reduce(hkey, hvalue[]): emit (hkey, hkey)
- the reducer is not strictly necessary, but it eliminates duplicates. Why?
(A combined code sketch of projection and selection follows slide 9.)

Slide 9: Selection
Selection σ_C(R)
Input: for each tuple t in R, a pair (key, value), where value = t
Map(key, t): if C(t) then emit (t, "1")
- applied to each input tuple in parallel, selecting the tuples that satisfy condition C
Reduce(hkey, hvalue[]): emit (hkey, hkey)
- the reducers eliminate duplicates. How?
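A minimal sketch of both operators, reusing the mr_round simulator from the word-count sketch on slide 4; the relation, the condition C, and all names are made up for illustration. Duplicate elimination works because identical tuples hash to the same reducer, which emits each key once.

```python
R = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b3")]  # R(A, B)

def proj_map(_, t):
    yield (t[0], t[0])            # pi_A: emit (t.A, t.A)

def sel_map(_, t):
    if t[1] == "b1":              # sigma_C with C: B = 'b1'
        yield (t, "1")

def dedup_reduce(hkey, _):
    yield (hkey, hkey)            # one output per distinct key

# mr_round: the one-round simulator from the word-count sketch
print(mr_round([(None, t) for t in R], proj_map, dedup_reduce))
# [('a1', 'a1'), ('a2', 'a2')]
print(mr_round([(None, t) for t in R], sel_map, dedup_reduce))
# [(('a1', 'b1'), ('a1', 'b1'))]   (the duplicate tuple is collapsed)
```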

Slide 10: Union
Union R1 ∪ R2
Input: for each tuple t in R1 and s in R2, a pair (key, value)
Map(key, t): emit (t, "1")
- a mapper just passes an input tuple on to a reducer; a mapper is assigned chunks from either R1 or R2, and processes tuples of R1 and R2 uniformly
Reduce(hkey, hvalue[]): emit (hkey, hkey)
- the reducers simply eliminate duplicates

Slide 11: Set difference
Set difference R1 − R2
Input: for each tuple t in R1 and s in R2, a pair (key, value)
Map(key, t): if t is in R1 then emit (t, "1") else emit (t, "2")
- tag each tuple with its source, making tuples from R1 and R2 distinguishable
Reduce(hkey, hvalue[]): if only "1" appears in the list hvalue then emit (hkey, hkey)
- the reducers do the checking. Why does it work?
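A sketch of the tagging trick, again reusing mr_round from slide 4; union is the same pipeline with the source check in the reducer dropped. The toy relations are illustrative.

```python
R1 = [("a",), ("b",)]
R2 = [("b",), ("c",)]

def diff_map(src, t):
    yield (t, src)                 # tag each tuple with its source

def diff_reduce(t, tags):
    if "2" not in tags:            # t occurs in R1 only: keep it
        yield (t, t)

inputs = [("1", t) for t in R1] + [("2", t) for t in R2]
print(mr_round(inputs, diff_map, diff_reduce))   # [(('a',), ('a',))]
```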

Slide 12: Reduce-side join
Reduce-side join (based on hash partitioning)
Natural join R1 ⋈_{R1.A = R2.B} R2, where R1[A, C] and R2[B, D]
Input: for each tuple t in R1 and s in R2, a pair (key, value)
Map(key, t): if t is in R1 then emit (t[A], ("1", t[C])) else emit (t[B], ("2", t[D]))
- hashing on the join attributes
Reduce(hkey, hvalue[]): for each ("1", t[C]) and each ("2", s[D]) in the list hvalue, emit ((hkey, t[C], s[D]), hkey)
- a nested loop within each partition
How to implement R1 ⋈ R2 ⋈ R3? Inefficient. Why?
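A sketch of the reduce-side join under the same mr_round simulator; the relations and attribute positions are illustrative. Hashing on the join attribute routes all R1- and R2-tuples that agree on A = B to one reducer, which runs the nested loop. The inefficiency flagged on the slide is visible here: every tuple is shipped, and a multiway join R1 ⋈ R2 ⋈ R3 needs a second full round.

```python
R1 = [(1, "x"), (2, "y")]            # R1[A, C]
R2 = [(1, "u"), (1, "v"), (3, "w")]  # R2[B, D]

def join_map(src, t):
    if src == "1":
        yield (t[0], ("1", t[1]))    # key: the join attribute A
    else:
        yield (t[0], ("2", t[1]))    # key: the join attribute B

def join_reduce(key, tagged):
    r1_vals = [c for tag, c in tagged if tag == "1"]
    r2_vals = [d for tag, d in tagged if tag == "2"]
    for c in r1_vals:                # nested loop inside one partition
        for d in r2_vals:
            yield ((key, c, d), key)

inputs = [("1", t) for t in R1] + [("2", t) for t in R2]
print([k for k, _ in mr_round(inputs, join_map, join_reduce)])
# [(1, 'x', 'u'), (1, 'x', 'v')]
```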

Slide 13: Map-side join
Map-side join (partitioned join, a parallel merge join): the input relations are partitioned and sorted on the join keys; map over R1 and read from the corresponding partition of R2.
Recall R1 ⋈_{R1.A = R2.B} R2:
- partition R1 and R2 into n partitions, by the same partitioning function on R1.A and R2.B, via either range or hash partitioning
- compute R1^i ⋈_{R1.A = R2.B} R2^i locally at processor i
- merge the local results
map(key, t): read R2^i; for each tuple s in R2^i, if t[A] = s[B] then emit ((t[A], t[C], s[D]), t[A])
Limitation: requires the right sort order and partitioning.

Slide 14: In-memory join
In-memory join (fragment-and-replicate join, also called a broadcast join): the smaller relation is broadcast to each node and stored in its local memory, while the other relation is partitioned and distributed across the mappers.
Recall R1 ⋈_{R1.A < R2.B} R2:
- partition R1 into n partitions, by any partitioning method, and distribute them across n processors
- replicate the other relation R2 across all processors
- compute R1^j ⋈_{R1.A < R2.B} R2 locally at processor j
- merge the local results
map(key, t): for each tuple s in the local copy of R2, if t[A] < s[B] then emit ((t[A], t[C], s[D]), t[A])
Limitation: memory.
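A sketch of the fragment-and-replicate pattern: the small relation R2 is assumed to fit in every mapper's memory, so each mapper finishes the θ-join (here A < B) on its own fragment of R1 with no shuffle and no reduce phase at all. Names and data are illustrative.

```python
R2 = [(1, "u"), (3, "w")]            # the small, replicated relation R2[B, D]

def bcast_join_map(_, t):
    a, c = t                         # t is an R1[A, C] tuple
    for b, d in R2:                  # scan the local copy of R2
        if a < b:                    # the theta condition R1.A < R2.B
            yield ((a, c, d), a)     # emitted straight from the map phase

R1_partition = [(1, "x"), (2, "y")]  # one mapper's fragment of R1
print([k for t in R1_partition for k, _ in bcast_join_map(None, t)])
# [(1, 'x', 'w'), (2, 'y', 'w')]
```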

Slide 15: Aggregation
Leveraging the MapReduce framework: given R(A, B, C), compute sum(B) group by A.
Map(key, t): emit (t[A], t[B])
- the grouping itself is done by the MapReduce framework
Reduce(hkey, hvalue[]): sum := 0; for each value s in the list hvalue, sum := sum + s; emit (hkey, sum)
- computes the aggregate for each group
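The slide's sum/group-by, with the shuffle of the mr_round simulator from slide 4 doing the grouping; R and its attribute order are illustrative.

```python
R = [("a", 3, "p"), ("a", 5, "q"), ("b", 2, "r")]  # R(A, B, C)

def agg_map(_, t):
    yield (t[0], t[1])           # emit (t[A], t[B]); the shuffle groups by A

def agg_reduce(a, bs):
    yield (a, sum(bs))           # sum(B) within one group

print(mr_round([(None, t) for t in R], agg_map, agg_reduce))
# [('a', 8), ('b', 2)]
```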

Slide 16: Practice: validation of functional dependencies
A functional dependency (FD) defined on schema R: X → Y
- For any instance D of R, D satisfies the FD if for any pair of tuples t and t', if t[X] = t'[X] then t[Y] = t'[Y]
- Violations of the FD in D: { t | there exists t' in D such that t[X] = t'[X] but t[Y] ≠ t'[Y] }
Develop a MapReduce algorithm to find all the violations of the FD in D:
- Map: for each tuple t, emit (t[X], t), i.e., keyed on t[X]
- Reduce: find all tuples t such that there exists t' with t[X] = t'[X] but t[Y] ≠ t'[Y]; emit (1, t)
Exercises: write a MapReduce algorithm to validate a set of FDs. Does MapReduce support recursion?
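One way to flesh out the practice slide: a sketch under the mr_round simulator from slide 4, with an illustrative instance D and with X and Y taken to be the first and second attributes.

```python
D = [(1, "a", 10), (1, "a", 20), (2, "b", 30), (2, "c", 40)]  # check X -> Y

def fd_map(_, t):
    yield (t[0], t)                       # key on t[X]

def fd_reduce(x, tuples):
    if len({t[1] for t in tuples}) > 1:   # same X, more than one Y value
        for t in tuples:                  # every tuple in the group violates
            yield (1, t)

print([t for _, t in mr_round([(None, t) for t in D], fd_map, fd_reduce)])
# [(2, 'b', 30), (2, 'c', 40)]
```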

Slide 17: Transitive closures
The transitive closure TC of a relation R[A, B]:
- R is a subset of TC
- if (a, b) and (b, c) are in TC, then (a, c) is in TC
That is, TC(x, y) :- R(x, y); TC(x, z) :- TC(x, y), TC(y, z).
Exercise: develop a MapReduce algorithm that, given R, computes TC. This is a fixpoint computation:
- How to determine when to terminate?
- How to minimize redundant computation?

Slide 18: A naïve MapReduce algorithm
Given R(A, B), compute TC. Initially, the input is the relation R; each round of the recursive computation applies the transitivity rule.
Map((a, b), value): emit (a, ("r", b)); emit (b, ("l", a))
Reduce(b, hvalue[]): for each ("l", a) and each ("r", c) in hvalue, emit (a, c); also re-emit the input pairs (a, b) and (b, c) to restore them. Why restore them?
Iteration: the output of the reducers becomes the input of the mappers in the next round of MapReduce computation. Termination?
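A sketch of the naïve algorithm under the mr_round simulator from slide 4: one MapReduce job per round, with a non-MapReduce driver testing for a fixpoint, which answers the termination question in the simplest way. The chain graph is illustrative.

```python
def tc_map(pair, _):
    a, b = pair
    yield (a, ("r", b))              # b is a right-neighbour of a
    yield (b, ("l", a))              # a is a left-neighbour of b

def tc_reduce(b, tags):
    lefts = [a for side, a in tags if side == "l"]
    rights = [c for side, c in tags if side == "r"]
    for a in lefts:
        for c in rights:
            yield ((a, c), None)     # transitivity: (a,b), (b,c) => (a,c)
        yield ((a, b), None)         # restore the input pair (a, b)
    for c in rights:
        yield ((b, c), None)         # restore the input pair (b, c)

tc = {("a", "b"), ("b", "c"), ("c", "d")}
while True:                          # driver: one MapReduce job per round
    nxt = {k for k, _ in mr_round([(p, None) for p in tc], tc_map, tc_reduce)}
    if nxt == tc:                    # fixpoint: the result stopped changing
        break
    tc = nxt
print(sorted(tc))                    # all 6 pairs of the 3-edge chain
```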

Slide 19: A MapReduce algorithm, continued
The same Map and Reduce functions as on slide 18; given R(A, B), they compute TC.
Termination: when the intermediate result no longer changes, controlled by a non-MapReduce driver.
The naïve algorithm is not very efficient. Why? How many rounds does it take? How can it be improved?

Slide 20: Smart transitive closure (recursive doubling)
R: the edge relation. P_i(x, y): the distance from x to y is in [0, 2^i - 1]. Q_i(x, y): the shortest distance from x to y is exactly 2^i.
1. P_0(x, y) := { (x, x) | x is in R };
2. Q_0 := R;
3. i := 0;
4. while Q_i ≠ ∅ do
   a) i := i + 1;
   b) P_i(x, y) := π_(x, y)(Q_(i-1)(x, z) ⋈ P_(i-1)(z, y));
   c) P_i := P_i ∪ P_(i-1);
   d) Q_i(x, y) := π_(x, y)(Q_(i-1)(x, z) ⋈ Q_(i-1)(z, y));
   e) Q_i := Q_i − P_i;   (remove the pairs already connected by a shorter path, of length in [2^(i-1), 2^i - 1])
How many rounds? Recursive doubling: log(|R|).
Implement smart TC in MapReduce: you already know how to do join, union, set difference and projection in MapReduce.
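To pin down the semantics of P_i and Q_i, a sequential sketch of smart TC; in a MapReduce implementation each iteration becomes a constant number of join/union/difference/projection jobs. compose and the other names are illustrative.

```python
from collections import defaultdict

def compose(A, B):
    # pi_{x,y}(A(x, z) |><| B(z, y))
    by_src = defaultdict(list)
    for z, y in B:
        by_src[z].append(y)
    return {(x, y) for x, z in A for y in by_src[z]}

def smart_tc(R):
    nodes = {x for e in R for x in e}
    P = {(x, x) for x in nodes}      # distance in [0, 2^0 - 1] = {0}
    Q = set(R)                       # distance exactly 2^0 = 1
    while Q:                         # log(diameter) iterations
        P_new = compose(Q, P) | P    # steps (b) and (c)
        Q = compose(Q, Q) - P_new    # steps (d) and (e)
        P = P_new
    return P - {(x, x) for x in nodes}   # drop the identity pairs

print(sorted(smart_tc({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```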

Slide 21: Advantages of MapReduce
- Simplicity: one only needs to define two functions; no need to worry about how the data is stored and distributed, or how the operations are scheduled
- Fault tolerance: why?
- Scalability: a large number of low-end machines; scale out (scale horizontally) by adding new computers built from low-cost commodity hardware, rather than scale up (scale vertically) by adding costly resources to a single node
- Flexibility: independent of data models or schemas
- Independence: it can work with various storage layers

Slide 22: Fault tolerance
MapReduce is able to handle an average of 1.2 failures per analysis job:
- the input is triplicated (three-way replication in the DFS)
- failures are detected, and the tasks of failed nodes are reassigned to healthy nodes
- the redundancy is also exploited to achieve load balancing

Slide 23: Pitfalls of MapReduce
- No schema: schema-free
- No index: index-free
- A single dataflow: a single input and a single output. Why is this bad? It makes joins inefficient
- No high-level language: no SQL
- Low efficiency: poor I/O optimization and resource utilization; invariant input data is carried across rounds, and there is no support for incremental computation, leading to redundant computation
- No global state: the model provides no mechanism to maintain global data structures that can be accessed and updated by all mappers and reducers (a consequence of its functional-programming style)
- Maximized communication: all-to-all data shipment
MapReduce was designed to minimize human effort, not to maximize performance. Why the low efficiency?

Slide 24: Inefficiency of MapReduce
- Blocking: Reduce does not start until all Map tasks are completed
- Unnecessary shipment of invariant input data in each MapReduce round (HaLoop fixes this)
Despite these issues, MapReduce is popular in industry.

Slide 25: MapReduce platforms
Apache Hadoop, used by Facebook, Yahoo, ...
- Hive (Facebook): HiveQL (SQL)
- Pig (Yahoo): Pig Latin (SQL-like)
- SCOPE (Microsoft): SQL
NoSQL
- Cassandra (Facebook): CQL (no join)
- HBase: an open-source counterpart of Google's distributed BigTable
- MongoDB: document-oriented (NoSQL)
Distributed graph query engines (we will study some of these)
- Pregel (Google): a vertex-centric model
- TAO (Facebook)
- GraphLab: machine learning and data mining
- Neo4j (Neo Technology); Trinity (Microsoft); HyperGraphDB (knowledge representation)

Slide 26: Vertex-centric models

Slide 27: The BSP model (Pregel)
Bulk Synchronous Parallel: a vertex-centric, message-passing model.
- Vertex: computation is defined to run on each vertex
- Supersteps: within each superstep, all vertices compute in parallel (analogous to MapReduce rounds). Each vertex modifies its state and that of its outgoing edges, sends messages to other vertices (to be received in the next superstep), and receives the messages sent to it (in the previous superstep)
- Termination: each vertex votes to halt; the computation ends when all vertices are inactive and no messages are in transit
- Synchronization across supersteps; execution is asynchronous within each superstep
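To illustrate the superstep discipline, a toy single-process rendering of BSP, with single-source shortest paths as the vertex program; the message-passing loop and all names are made up for illustration and are not the actual Pregel API.

```python
import math

def bsp_sssp(edges, source):
    # edges: (u, v, w) triples of a directed weighted graph
    vertices = {x for u, v, _ in edges for x in (u, v)}
    out = {v: [] for v in vertices}
    for u, v, w in edges:
        out[u].append((v, w))
    dist = {v: math.inf for v in vertices}   # per-vertex state
    inbox = {v: [] for v in vertices}
    inbox[source] = [0]
    while any(inbox.values()):               # halt: no messages in transit
        outbox = {v: [] for v in vertices}
        for v in vertices:                   # one superstep, "in parallel"
            if inbox[v] and min(inbox[v]) < dist[v]:
                dist[v] = min(inbox[v])      # update the local state
                for u, w in out[v]:          # received in the next superstep
                    outbox[u].append(dist[v] + w)
        inbox = outbox                       # the synchronization barrier
    return dist

print(sorted(bsp_sssp([("a", "b", 1), ("b", "c", 2), ("a", "c", 5)], "a").items()))
# [('a', 0), ('b', 1), ('c', 3)]
```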

Slide 28: The vertex-centric model of GraphLab
Designed for machine learning and data mining.
- Vertex: computation is defined to run on each vertex; all vertices compute in parallel
- Each vertex reads and writes data on adjacent nodes or edges
- Asynchronous: no supersteps
- Consistency (serializability):
  - full consistency: no overlap for concurrent updates
  - edge consistency: exclusive read-write on the vertex and its adjacent edges; read-only on adjacent vertices
  - vertex consistency: all updates in parallel (sync operations)

Slide 29: Vertex-centric models vs. MapReduce
- Vertex-centric: think like a vertex; MapReduce: think like a graph
- Vertex-centric maximizes parallelism (asynchronous execution) and minimizes data shipment (message passing); MapReduce is inefficient due to blocking and the distribution of intermediate results
- Vertex-centric models are limited to graphs; MapReduce is general
- Both lack global control: ordering for processing vertices in recursive computation, incremental computation, etc.
- Algorithms have to be re-cast in these models; it is hard to reuse existing (incremental) algorithms
Can we do better?

Slide 30: GRAPE: a parallel model based on partial evaluation

Slide 31: Querying distributed graphs
Given a big graph G and n processors S1, ..., Sn:
- G is partitioned into fragments (G1, ..., Gn): a big G divided into small fragments of manageable size
- G is distributed to the n processors: Gi is stored at Si
- each processor Si processes its local fragment Gi in parallel
Parallel query answering:
- Input: G = (G1, ..., Gn), distributed to (S1, ..., Sn), and a query Q
- Output: Q(G), the answer to Q in G
How does it work?

Slide 32: GRAPE (GRAPh Engine)
Divide and conquer, i.e., data-partitioned parallelism:
- partition G into fragments (G1, ..., Gn) of manageable sizes, distributed to various sites
- upon receiving a query Q, evaluate Q(Gi) in parallel: Q is evaluated on the smaller fragments Gi
- collect the partial answers at a coordinator site, and assemble them to find the answer Q(G) in the entire G
Each machine (site) Si processes the same query Q, using only the data stored in its local fragment Gi.

Slide 33: Partial evaluation
To compute f(s, d), where s is the known part of the input and d is the yet unavailable input:
- conduct the part of the computation that depends only on s
- generate a partial answer: a residual function of d
This is the connection between partial evaluation and parallel query processing: at each site, the local fragment Gi is the known input, and the other fragments Gj are the yet unavailable input.
- evaluate Q(Gi) in parallel, as residual functions at each site
- collect the partial matches at a coordinator site, and assemble them to find the answer Q(G) in the entire G

Slide 34: Coordinator
Each machine (site) Si is either a coordinator or a worker; workers conduct local computation and produce partial answers.
Coordinator: receives/posts queries, controls termination, and assembles the partial answers. Upon receiving a query Q:
- post Q to all workers
- initialize a status flag for each worker, mutable by that worker
- terminate the computation when all flags are true
- assemble the partial answers from the workers, and produce the final answer Q(G)

Slide 35: Workers
Worker: local computation, partial evaluation, recursion, partial answers.
Upon receiving a query Q:
- evaluate Q(Gi) in parallel, using local data Gi only
- send messages to request data for "border nodes" (nodes with edges to other fragments)
Incremental computation: upon receiving new messages M:
- evaluate Q(Gi + M) in parallel, incrementally
- set its flag true if there are no more changes to the partial results, and send the partial answer to the coordinator
This step repeats until the partial answer at site Si is ready.

Slide 36: Subgraph isomorphism
Input: a pattern graph Q and a graph G.
Output: all the matches of Q in G, i.e., all subgraphs of G that are isomorphic to Q: there is a bijective function f on nodes such that (u, u') ∈ Q iff (f(u), f(u')) ∈ G.
How to compute partial answers?
- partition G into fragments (G1, ..., Gn), distributed to n workers
- Coordinator: upon receiving a pattern query Q, post Q to each worker, and set flag[i] false for each worker Si

Slide 37: Local computation at each worker
Worker: upon receiving a query Q,
- send messages to request the d-neighbourhood of each border node, where d is the diameter of Q; denote the fetched subgraph by ΔG_d
- compute the partial answer Q(Gi ∪ ΔG_d), by calling any existing subgraph isomorphism algorithm
- set flag[i] true, and send Q(Gi ∪ ΔG_d) to the coordinator
No incremental computation is needed.
Coordinator: the final answer Q(G) is simply the union of all the Q(Gi ∪ ΔG_d).
Correctness analysis: the algorithm is correct by the data locality of subgraph isomorphism. Why?

Slide 38: Performance guarantee
Partial evaluation + data-partitioned parallelism. Complexity analysis: let
- t(|G|, |Q|) be the sequential complexity, and
- T(|G|, |Q|, n) be the parallel complexity;
then T(|G|, |Q|, n) = O(t(|G|, |Q|)/n), where n is the number of processors.
Parallel scalability: the more processors, the faster the algorithm (provided that n << |G|). Why?
Reuse of existing algorithms: a worker can employ any existing sequential algorithm for subgraph isomorphism.
Proof: computational costs + communication costs. Compare this with MapReduce.
Necessary parts for your project: correctness proof, complexity analysis, and performance guarantees.

Slide 39: Transitive closures in GRAPE
TC(x, y) :- R(x, y); TC(x, z) :- TC(x, y), TC(y, z).
Represent R as a graph G: elements as nodes, R as the edge relation; partition G into fragments (G1, ..., Gn), distributed to n workers.
Worker:
- compute TC(Gi) in parallel
- send messages to request data for "border nodes"
- given new messages M, incrementally compute TC(Gi + M) (the incremental step)
Termination: repeat until there are no more changes.
Compared with MapReduce, this minimizes unnecessary recomputation.

Slide 40: Summary and review
- What is the MapReduce framework? Pros? Pitfalls?
- How to implement joins in the MapReduce framework? Develop algorithms in MapReduce
- Vertex-centric models for querying big graphs
- Distributed query processing with performance guarantees by partial evaluation
- What performance guarantees do we want when evaluating graph queries on distributed graphs?
- Compare the four parallel models: MapReduce, BSP, vertex-centric, and GRAPE
- Correctness proof, complexity analysis and performance guarantees for your algorithm

Slide 41: Project (1)
A "smart" algorithm for computing the transitive closure of R is given in "Distributed Algorithms for the Transitive Closure":
http://asingleneuron.files.wordpress.com/2013/10/distributedalgorithmsfortransitiveclosure.pdf
1. TC(x, y) := ∅; Q := R;
2. while Q ≠ ∅ do
   a) ΔTC := Q ⋈ TC;
   b) TC := Q ∪ TC ∪ ΔTC;
   c) Q := (Q ⋈ Q) − TC;
(Here ⋈ composes paths: π_(x, y)(Q(x, z) ⋈ TC(z, y)).)
Implement the algorithm in the programming models of:
- MapReduce (recall that you can implement union and join in MapReduce)
- GRAPE
Give a correctness proof and complexity analysis.
Experimentally evaluate and compare the implementations.

Slide 42: Flashback: graph simulation
Input: a pattern graph Q and a graph G.
Output: Q(G), a binary relation S on the nodes of Q and G such that
- each node u in Q is mapped to a node v in G with (u, v) ∈ S, and
- for each (u, v) ∈ S, each edge (u, u') in Q is mapped to an edge (v, v') in G with (u', v') ∈ S.
Complexity: O((|V| + |V_Q|)(|E| + |E_Q|)) time.
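For intuition, a naive sequential fixpoint for graph simulation; it is correct, but it does not achieve the bound stated above, which requires the more careful bookkeeping of Henzinger, Henzinger and Kopke's algorithm. The label_ok predicate and the data layout are assumptions for illustration.

```python
from collections import defaultdict

def graph_simulation(q_nodes, q_edges, g_nodes, g_edges, label_ok):
    g_succ = defaultdict(set)
    for v, w in g_edges:
        g_succ[v].add(w)
    # Start from all label-compatible pairs, then shrink to a fixpoint.
    sim = {u: {v for v in g_nodes if label_ok(u, v)} for u in q_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # v can simulate u only if some successor of v simulates u2
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim   # the maximum simulation S, as a map: Q-node -> G-nodes

# Q: u1 -> u2; G: a -> b, b -> b; all labels compatible
result = graph_simulation(["u1", "u2"], [("u1", "u2")],
                          ["a", "b"], [("a", "b"), ("b", "b")],
                          lambda u, v: True)
print({u: sorted(vs) for u, vs in result.items()})
# {'u1': ['a', 'b'], 'u2': ['a', 'b']}
```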

Slide 43: Project (2)
Develop two algorithms that, given a graph G and a pattern Q, compute Q(G) via graph simulation, based on two of the following models:
- MapReduce
- BSP (Pregel)
- the vertex-centric model of GraphLab
- GRAPE
Give correctness proofs and complexity analysis of your algorithms.
Experimentally evaluate your algorithms.

Slide 44: Project (3)
Given a directed graph G and a pair of nodes (s, t) in G, the distance between s and t is the length of a shortest path from s to t. Develop two algorithms that, given a graph G, compute the shortest distance between all pairs of nodes in G, based on two of the following models:
- MapReduce
- BSP (Pregel)
- the vertex-centric model of GraphLab
- GRAPE
Give correctness proofs and complexity analysis of your algorithms.
Experimentally evaluate your algorithms.

Slide 45: Project (4)
Show that GRAPE can be revised to support relational queries, based on what you have learned about parallel/distributed query processing:
- show how to support the relational operators
- show how to optimize your queries
Implement a lightweight system supporting relational queries in GRAPE:
- the basic relational operators
- SPC queries
Demonstrate your system.

Slide 46: Project (5)
Write a survey on any of the following topics, by evaluating 5-6 representative papers/systems:
- parallel models for query evaluation
- distributed graph query engines
- distributed database systems
- MapReduce platforms
- NoSQL systems
- ...
Develop a set of criteria, evaluate the techniques/systems against them, and demonstrate your understanding of the topic.

Slide 47: Reading for the next week
http://homepages.inf.ed.ac.uk/wenfei/publication.html
1. W. Fan, F. Geerts, and F. Neven. Making Queries Tractable on Big Data with Preprocessing. VLDB 2013. (BD-tractability)
2. Y. Tao, W. Lin, and X. Xiao. Minimal MapReduce Algorithms. SIGMOD 2013. (MMC) http://www.cse.cuhk.edu.hk/~taoyf/paper/sigmod13-mr.pdf
3. J. Hellerstein. The Declarative Imperative: Experiences and Conjectures in Distributed Logic. SIGMOD Record 39(1), 2010. (communication complexity) http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf
4. Y. Cao, W. Fan, and W. Yu. Bounded Conjunctive Queries. VLDB 2014. (scale independence)
5. W. Fan, J. Li, X. Wang, and Y. Wu. Query Preserving Graph Compression. SIGMOD 2012. (query-preserving compression)
6. W. Fan, X. Wang, and Y. Wu. Answering Graph Pattern Queries Using Views. ICDE 2014. (query answering using views)
7. F. Afrati, V. Borkar, M. Carey, N. Polyzotis, and J. D. Ullman. Map-Reduce Extensions and Recursive Queries. EDBT 2011. https://asterix.ics.uci.edu/pub/EDBT11-afrati.pdf

