Published by Ophelia Terry. Modified over 9 years ago.
1
Ten Thousand SQLs Kalmesh Nyamagoudar 2010MCS3494
2
CONTENTS (October 13, 2011): Example; Definitions; Algorithm; CN Generation; Sequential Algorithm; CLP: Naïve; CLP: New; OLP; DLP; Performance Studies; CN Evaluation
3
BANKS Model: Steiner Trees
[Figure: Author1 and Author2 connected through Paper1; Author1 and Author2 connected through Paper2.]
4
DISCOVER Model
Schema: AUTHOR(TID, NAME), PAPER(TID, NAME), WRITES(TID, AID, PID), CITE(TID, PID1, PID2)
Joining Network of Tuples: Author1, Writes (Author1: Paper1), Paper1, Writes (Author2: Paper1), Author2; likewise Author1, Writes (Author1: Paper2), Paper2, Writes (Author2: Paper2), Author2
Joining Network of Tuple Sets: Author^Author1 ⋈ Writes^{} ⋈ Paper^{} ⋈ Writes^{} ⋈ Author^Author2
5
Background: DISCOVER
6
Background: DISCOVER. Schema Graph (TPC-H)
7
Background: DISCOVER. Example Data. Source: Discover[3]
8
Background: DISCOVER. Query: "Smith, Miller". Source: Discover[3]
9
Background: DISCOVER. Query: "Smith, Miller". Source: Discover[3]
SIZE | RESULT
2 | O1 - C1 - O2
10
Background: DISCOVER. Query: "Smith, Miller". Joining Network of Tuples. Source: Discover[3]
SIZE | RESULT
2 | O1 - C1 - O2
4 | O1 - C1 - N1 - C2 - O3
11
Background: DISCOVER. Joining Network of Tuple Sets. Source: Discover[2] (slide dated October 5, 2011)
12
Background: DISCOVER
13
Background: DISCOVER
14
Background: DISCOVER. Candidate Networks Generation
Complete: every possible MTJNT (minimal total joining network of tuples) is produced by a candidate network output by the algorithm.
Minimal: the algorithm does not produce any redundant candidate networks.
Example:
ORDERS^Smith ⋈ CUSTOMER^{} ⋈ ORDERS^Miller
ORDERS^Smith ⋈ CUSTOMER^{} ⋈ ORDERS^Miller ⋈ CUSTOMER^{}
ORDERS^Smith ⋈ CUSTOMER^{} ⋈ ORDERS^{}
ORDERS^Smith ⋈ LINEITEM^{} ⋈ ORDERS^Miller
Tmax: maximum number of tuple sets in a CN.
15
CN Generation. Source: Discover[2]
16
CN Generation. Source: Discover[2]
17
CN Generation. Source: Discover[2]
18
CN Evaluation
19
Sequential Algorithm: Example. Dataset: DBLP. Source: TTS[1]
Schema: AUTHOR(TID, NAME), PAPER(TID, NAME), WRITE(TID, AID, PID), CITE(TID, PID1, PID2)
20
Sequential Algorithm: Example. Source: TTS[1]
Schema: AUTHOR(TID, NAME), PAPER(TID, NAME), WRITE(TID, AID, PID), CITE(TID, PID1, PID2)
21
CN Evaluation: state-of-the-art sequential algorithm
22
Sequential Algorithm: Execution Graph. Source: TTS[1]
23
Sequential Algorithm: Execution Graph
24
New Solution
Use of multi-core architectures.
Why not existing parallel multi-query processing? Large number of queries; large sharing between queries; large intermediate results.
What do we need on multi-core architectures? CNs on the same core should share the most computational cost; CNs on different cores should share the least computational cost; handle high workload skew; adaptively handle errors caused by estimation.
25
CN Level Parallelism (CLP): Straightforward Approach
Largest-first rule: take the CNs in decreasing order of cost and assign each one to the partition with the least workload.
Final cost: max(cost of each core) = 1949.
Source: TTS[1]
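The largest-first rule can be sketched in a few lines. This is a minimal illustration, not the code from TTS[1]: the function name `largest_first_partition`, the CN names and the cost values are all assumptions made for the example, and a min-heap is used to find the least-loaded core.

```python
import heapq


def largest_first_partition(cn_costs, num_cores):
    """Assign each CN to the currently least-loaded core, largest CN first.

    cn_costs: dict mapping a (hypothetical) CN name to its estimated cost.
    Returns (assignment, final_cost), where final_cost is the max core load.
    """
    # Min-heap of (current workload, core index): the root is the least-loaded core.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}

    # Largest first: visit CNs in decreasing order of estimated cost.
    ordered = sorted(cn_costs.items(), key=lambda kv: kv[1], reverse=True)
    for cn, cost in ordered:
        load, core = heapq.heappop(heap)
        assignment[core].append(cn)
        heapq.heappush(heap, (load + cost, core))

    # No sharing is considered here: a core's cost is the plain sum of its CNs.
    final_cost = max(load for load, _ in heap)
    return assignment, final_cost


# Illustrative costs only (not taken from the slides).
costs = {"CN1": 7, "CN2": 5, "CN3": 4, "CN4": 3, "CN5": 1}
assignment, final_cost = largest_first_partition(costs, num_cores=2)
print(assignment, final_cost)
```

Because each CN's cost is simply added to its core, this sketch ignores any sub-expressions that CNs have in common, which is the limitation the sharing-aware variant addresses.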
26
CLP: Straightforward Approach. Source: TTS[1]
Select the core: O(n)
27
CLP: Sharing-Aware CN Partitioning
Which CN to distribute first? The one with the largest not-shared (extra) cost.
To which partition? The one with maximum sharing, if that does not destroy the workload balance.
Total cost of a partition = cost after sharing sub-expressions among all CNs in that partition.
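A rough sketch of the two questions above, under stated assumptions: each CN is modelled as a set of sub-expression ids, `op_cost` gives a cost per sub-expression, and a partition's cost counts each shared sub-expression once. The names `sharing_aware_partition`, `extra` and `slack`, and the specific balance test, are hypothetical and are not taken from TTS[1].

```python
def sharing_aware_partition(cns, op_cost, num_cores, slack=0.0):
    """cns: dict CN name -> set of sub-expression ids; op_cost: id -> cost."""
    core_ops = [set() for _ in range(num_cores)]  # sub-expressions placed per core
    core_cns = [[] for _ in range(num_cores)]     # CN names placed per core
    remaining = dict(cns)

    def load(core):
        # Partition cost after sharing: each sub-expression is counted once.
        return sum(op_cost[o] for o in core_ops[core])

    def extra(ops, core):
        # Not-shared cost: what the CN adds on top of what the core already has.
        return sum(op_cost[o] for o in ops - core_ops[core])

    cores = range(num_cores)
    while remaining:
        # Which CN first? The one whose cheapest placement is still the largest
        # (ties broken by name so the result is deterministic).
        name = max(
            remaining,
            key=lambda n: (min(extra(remaining[n], c) for c in cores), n),
        )
        ops = remaining.pop(name)
        new_loads = [load(c) + extra(ops, c) for c in cores]
        # To which partition? Prefer maximum sharing (minimum extra cost) ...
        best_share = min(cores, key=lambda c: (extra(ops, c), load(c)))
        least_loaded = min(cores, key=lambda c: new_loads[c])
        # ... unless that would unbalance the cores (assumed balance test).
        if new_loads[best_share] <= new_loads[least_loaded] + slack:
            target = best_share
        else:
            target = least_loaded
        core_ops[target] |= ops
        core_cns[target].append(name)

    return core_cns, [load(c) for c in cores]


# Illustrative data: CN1/CN2 share "a", CN3/CN4 share "b".
op_cost = {"a": 10, "b": 10, "c": 1, "d": 1, "e": 1, "f": 1}
cns = {"CN1": {"a", "c"}, "CN2": {"a", "d"}, "CN3": {"b", "e"}, "CN4": {"b", "f"}}
print(sharing_aware_partition(cns, op_cost, num_cores=2))
```

The sketch recomputes the minimum extra cost for every remaining CN in each round for clarity; the slides show a CN/MinCost table and a max-heap, which suggests the actual algorithm maintains these values incrementally.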
28
[Figure: sharing-aware CN partitioning walkthrough. Execution graph over A, W, P and C nodes; Core 1, Core 2, Core 3; CN/MinCost table; MaxHeap. Legend: Non-Exec Graph of Core 3. Numeric values not recoverable.]
29
[Figure: walkthrough continued. Execution graph, Core 1 to Core 3, updated CN/MinCost table and MaxHeap.]
30
[Figure: walkthrough continued. Execution graph, Core 1 to Core 3, updated CN/MinCost table and MaxHeap.]
31
[Figure: walkthrough continued. Execution graph, Core 1 to Core 3, updated CN/MinCost table and MaxHeap.]
32
[Figure: walkthrough continued. Execution graph, Core 1 to Core 3, updated CN/MinCost table and MaxHeap.]
33
CLP: Sharing-Aware CN Partitioning. Source: TTS[1]
34
CLP: Sharing-Aware CN Partitioning. Initialization. Source: TTS[1]
35
CLP: Error Accumulation. Source: TTS[1]
36
Operator Level Parallelism
37
Operator Level Parallelism. Source: TTS[1]
38
OLP: Overcoming Error Accumulation
39
OLP: Overcoming Accumulated Cost. Source: TTS[1]
[Figure values: 643, 685]
40
Operator Level Parallelism. Source: TTS[1]
41
Data Level Parallelism
Each operation in GE can be performed on multiple cores.
Uses operator level parallelism if there is no workload skew.
Partitions data adaptively each time, before workload skew happens.
Which node to partition? The most costly node, if it is dominant.
When to merge the sub-results? At the final phase.
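The idea of splitting a dominant node's data and merging sub-results at the end can be illustrated as follows. This is only a sketch: the dominance test (a node whose cost exceeds the ideal per-core share), the round-robin split, the node names and the toy join are assumptions for the example, not definitions from TTS[1], and the per-core work is simulated sequentially.

```python
def pick_node_to_partition(node_costs, num_cores):
    """Return the most costly node if it is dominant, else None."""
    total = sum(node_costs.values())
    node = max(node_costs, key=node_costs.get)
    # Assumed dominance test: the node alone exceeds the ideal per-core share.
    return node if node_costs[node] > total / num_cores else None


def split_tuples(tuples, parts):
    """Round-robin split of a tuple list into `parts` chunks."""
    return [tuples[i::parts] for i in range(parts)]


def hash_join(left, right, key_left, key_right):
    """Plain in-memory hash join on one column of each input."""
    index = {}
    for r in right:
        index.setdefault(r[key_right], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key_left], [])]


# Hypothetical cost estimates for three nodes of an execution graph.
node_costs = {"W_join_P": 100, "A": 10, "C": 10}
print(pick_node_to_partition(node_costs, num_cores=3))

# Toy data: WRITE tuples (AID, PID) joined with PAPER tuples (PID, NAME).
writes = [(1, 10), (2, 10), (1, 11), (3, 12), (2, 12)]
papers = [(10, "p10"), (11, "p11"), (12, "p12")]

# Each chunk would run on its own core; sub-results are merged at the end.
chunks = split_tuples(writes, 3)
sub_results = [hash_join(chunk, papers, 1, 0) for chunk in chunks]
merged = [row for part in sub_results for row in part]
print(sorted(merged) == sorted(hash_join(writes, papers, 1, 0)))
```

Splitting only one input of the join keeps each sub-result disjoint, so the final merge is a simple concatenation.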
42
Data Level Parallelism. Source: TTS[1]
[Figure: Core 1, Core 2, Core 3]
43
Data Level Parallelism. Source: TTS[1]
Annotations on the algorithm: divide the tuples of the child node; select the child node to be partitioned; make copies of the selected child node and all its father nodes; add the corresponding edges; re-estimate.
44
Performance Studies
45
Performance Studies. Source: TTS[1]
46
Source: TTS[1]
47
Source: TTS[1]
48
Source: TTS[1]
49
References
1. Lu Qin, Jeffrey Xu Yu, Lijun Chang. Ten Thousand SQLs: Parallel Keyword Queries Computing. Proceedings of the VLDB Endowment, Volume 3, Issue 1-2, September 2010, Singapore.
2. Vagelis Hristidis, Yannis Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong.
3. [PPT] DISCOVER: Keyword Search in Relational Databases.