A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses
George Candea (EPFL & Aster Data), Neoklis Polyzotis (UC Santa Cruz), Radek Vingralek (Aster Data)
Highly Concurrent Data Warehouses
Data analytics is a core service of any DW.
High query concurrency is becoming important.
At the same time, customers need predictability.
– Requirement from an actual customer: increasing concurrency from one query to 40 should not increase latency by more than 6x.
Shortcoming of Existing Systems
DWs employ the query-at-a-time model.
– Each query executes as a separate physical plan.
Result: concurrent plans contend for resources.
This creates a situation of “workload fear”.
Our Contribution: CJOIN
A novel physical operator for star queries.
– Star queries arise frequently in ad-hoc analytics.
Main ideas:
– A single physical plan for all concurrent queries.
– The plan is always “on”.
– Deep work sharing: I/O, join processing, storage.
Outline
Preliminaries
The CJOIN operator
Experimental study
Conclusions
Setting
We assume a star-schema DW.
We target the class of star queries.
Goal: execute concurrent star queries efficiently.
– Low latency.
– Graceful scale-up.
Further Assumptions
The fact table is too large to fit in main memory.
Dimension tables are “small”.
– Example from TPC-DS: 2.5GB of dimension data for a 1TB warehouse.
Indices and materialized views may exist.
The workload is volatile.
Design Overview
[Figure: The query stream is split. Star queries go to CJOIN (Preprocessor, a chain of Filters, and a Distributor, with a Filter Optimizer); other queries go to the conventional query processor.]
Running Example
Schema: fact table F (with measure m) and dimension tables X and Y.
Queries:
– Q1: select COUNT(*) from F join X join Y where φ1(X) and ψ1(Y)
– Q2: select SUM(F.m) from F join Y where ψ2(Y)
– Q2 is treated as also joining X, with the predicate TRUE(X).
The CJOIN Operator
[Figure: A continuous scan of Fact Table F feeds the pipeline Preprocessor → Filter → Filter → Distributor, which routes tuples to the aggregation operators COUNT (Q1) and SUM (Q2).]
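The CJOIN Operator
[Figure: CJOIN state for Q1 and Q2. Hash Table X and Hash Table Y store dimension tuples tagged with a bit-vector over (Q1, Q2), such as 11 or 10; in Hash Table X, tuples that are not stored (*) map to 01, since Q2 does not constrain X. A Venn diagram of Dimension Y shows the regions Q1 ∧ ¬Q2, ¬Q1 ∧ Q2, and Q1 ∧ Q2. The continuous scan of Fact Table F marks where each query started (Q1: a, Q2: b).]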
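The per-dimension hash tables in the figure can be sketched as follows. This is a minimal illustration under assumptions that are not on the slides: bit-vectors are Python ints where bit i stands for query Q(i+1), dimension rows are dicts keyed by a hypothetical `id` column, and a query that does not join the dimension is given the predicate `None`, standing for TRUE(D). Rows whose bit-vector would equal the default are simply not stored.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]
# None stands for TRUE(D): the query does not constrain this dimension.
Predicate = Optional[Callable[[Row], bool]]


def build_dimension_table(
    rows: List[Row], preds: List[Predicate]
) -> Tuple[Dict[int, int], int]:
    """Build (hash table: key -> bit-vector, default bit-vector) for one dimension.

    Bit i is set when query i accepts the row. The default bit-vector applies
    to rows absent from the table: only queries with TRUE(D) accept them.
    """
    default = 0
    for i, pred in enumerate(preds):
        if pred is None:
            default |= 1 << i

    table: Dict[int, int] = {}
    for row in rows:
        bits = default
        for i, pred in enumerate(preds):
            if pred is not None and pred(row):
                bits |= 1 << i
        # Store only rows selected by at least one constraining query.
        if bits != default:
            table[row["id"]] = bits
    return table, default


# Running example for dimension X: Q1 (bit 0) applies a hypothetical phi1;
# Q2 (bit 1) does not reference X, so it gets TRUE(X).
X = [{"id": 1, "color": 0}, {"id": 2, "color": 1}, {"id": 3, "color": 0}]
phi1 = lambda r: r["color"] == 0
hash_x, default_x = build_dimension_table(X, [phi1, None])
```

Because bit 0 is the rightmost binary digit, the slide's (Q1, Q2) notation 01 corresponds to `0b10` here, and 11 to `0b11`.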
Processing Fact Tuples
[Figure: A fact tuple from the continuous scan enters with the bit-vector of active queries, 11. Each Filter probes its hash table (Hash Table X, then Hash Table Y) and ANDs the tuple's bit-vector with the matching entry; a tuple whose bit-vector becomes 00 is discarded, and the Distributor sends the rest to COUNT (Q1) and/or SUM (Q2).]
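The probe-and-AND step can be sketched in a few lines. This is a hedged illustration, not the prototype's code: bit i stands for query Q(i+1), fact tuples are dicts with hypothetical foreign-key columns `x` and `y`, and the hash-table contents and fact rows below are made up to mirror the running example.

```python
from typing import Callable, Dict, List, Tuple

Fact = Dict[str, int]
# (foreign-key column of the fact tuple, hash table key -> bit-vector, default bit-vector)
DimFilter = Tuple[str, Dict[int, int], int]


def filter_fact(fact: Fact, active: int, filters: List[DimFilter]) -> int:
    """AND the fact tuple's bit-vector with one dimension bit-vector per Filter."""
    bits = active  # set by the Preprocessor: one bit per registered query
    for fk, table, default in filters:
        bits &= table.get(fact[fk], default)
        if bits == 0:  # no query needs this tuple any more: drop it early
            break
    return bits


def distribute(fact: Fact, bits: int, consumers: List[Callable[[Fact], None]]) -> None:
    """Forward the tuple to the aggregation operator of every query whose bit is set."""
    i = 0
    while bits:
        if bits & 1:
            consumers[i](fact)
        bits >>= 1
        i += 1


# Running example: Q1 = COUNT(*) (bit 0), Q2 = SUM(F.m) (bit 1).
# Hypothetical hash-table contents; Q2 has TRUE(X), so X's default is 0b10.
filters: List[DimFilter] = [
    ("x", {1: 0b11}, 0b10),
    ("y", {10: 0b01, 20: 0b10, 30: 0b11}, 0b00),
]
totals = {"count": 0, "sum": 0}


def q1_count(fact: Fact) -> None:
    totals["count"] += 1


def q2_sum(fact: Fact) -> None:
    totals["sum"] += fact["m"]


facts = [
    {"x": 1, "y": 10, "m": 5},
    {"x": 2, "y": 20, "m": 7},
    {"x": 1, "y": 30, "m": 3},
    {"x": 2, "y": 99, "m": 100},
]
for f in facts:
    distribute(f, filter_fact(f, 0b11, filters), [q1_count, q2_sum])
```

Each fact tuple costs one hash probe and one AND per dimension, no matter how many queries are registered (up to the machine word width), which is what keeps the per-tuple cost low.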
Registering New Queries
[Figure: A new query Q3 arrives: select AVG(F.m) from F join X where φ3(X), treated as also joining Y with TRUE(Y). The Preprocessor evaluates select * from X where φ3(X) to add Q3's bit to Hash Table X; a Venn diagram of Dimension X shows Q3's selection alongside Q1's (e.g. ¬Q1 ∧ Q3).]
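Updating a dimension hash table for a newly registered query might look like the sketch below. It assumes the same hypothetical encoding as before (bit i stands for query Q(i+1), rows keyed by an `id` column, `None` meaning TRUE(D)), and the function name and sample data are illustrative, not taken from the paper.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Row = Dict[str, int]


def add_query_to_dimension(
    table: Dict[int, int],
    default: int,
    bit: int,
    rows: Iterable[Row],
    pred: Optional[Callable[[Row], bool]],
) -> Tuple[Dict[int, int], int]:
    """Give a newly registered query its bit in one dimension's hash table.

    pred=None stands for TRUE(D): the query accepts every dimension row, so its
    bit goes into every stored entry and into the default bit-vector.
    Otherwise the rows matching pred (the result of `select * from D where pred`)
    get the bit, and rows not yet stored are inserted starting from the default.
    """
    mask = 1 << bit
    if pred is None:
        for key in table:
            table[key] |= mask  # updating values while iterating keys is safe
        return table, default | mask
    for row in rows:
        if pred(row):
            key = row["id"]
            table[key] = table.get(key, default) | mask
    return table, default


# Q3 (bit 2) = select AVG(F.m) from F join X join Y where phi3(X) and TRUE(Y).
X = [{"id": 1}, {"id": 2}, {"id": 3}]
phi3 = lambda r: r["id"] >= 2  # hypothetical predicate
hash_x, default_x = add_query_to_dimension({1: 0b011}, 0b010, 2, X, phi3)
hash_y, default_y = add_query_to_dimension({10: 0b001, 20: 0b010}, 0b000, 2, [], None)
```

Existing entries keep the bits of Q1 and Q2 unchanged, so queries already in flight are unaffected by the registration.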
Registering New Queries
[Figure: Q3 begins. The Preprocessor records Q3's start position c in the continuous scan (next to Q1: a and Q2: b), the bit-vectors gain a Q3 bit, and the Distributor gains an AVG operator for Q3.]
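The start-position bookkeeping can be illustrated with a small class. This is a simplified, hypothetical model: the fact table is treated as n tuples scanned in a circle, a query completes exactly when the scan returns to its start position, and freed bits are reused. The class and method names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


class ContinuousScan:
    """Hypothetical bookkeeping for a circular scan over n fact tuples.

    Each registered query owns one bit of the active bit-vector. A query starts
    at the current scan position and completes once the scan wraps around to
    that position, i.e. after it has seen every fact tuple exactly once.
    """

    def __init__(self, n_tuples: int) -> None:
        self.n = n_tuples
        self.pos = 0  # index of the next fact tuple to emit
        self.start: Dict[int, int] = {}  # query bit -> scan position at registration

    def register(self) -> int:
        """Assign the lowest free bit to a new query and remember where it started."""
        bit = 0
        while bit in self.start:
            bit += 1
        self.start[bit] = self.pos
        return bit

    def active(self) -> int:
        bits = 0
        for bit in self.start:
            bits |= 1 << bit
        return bits

    def step(self) -> Tuple[int, int, List[int]]:
        """Emit the tuple at the current position.

        Returns (tuple index, active bit-vector attached to it, bits of queries
        that have now seen the whole fact table and are complete).
        """
        index, bits = self.pos, self.active()
        self.pos = (self.pos + 1) % self.n
        done = [bit for bit, s in self.start.items() if s == self.pos]
        for bit in done:
            del self.start[bit]  # the bit can be reused by a later query
        return index, bits, done


scan = ContinuousScan(4)
q1 = scan.register()  # starts at position 0
log = [scan.step(), scan.step()]
q2 = scan.register()  # starts at position 2
log += [scan.step(), scan.step()]
```

Here Q1 finishes after tuple 3, while Q2 keeps running and will see tuples 0 and 1 on the next pass.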
Properties of CJOIN Processing
CJOIN enables a deep form of work sharing:
– Join computation.
– Tuple storage.
– I/O.
The computational cost per tuple is low.
– Hence, CJOIN can sustain high I/O throughput.
Query latency is predictable.
– The continuous scan can provide a progress indicator.
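A progress indicator can be derived from the scan position, as sketched below under two stated assumptions: the fact table has a fixed number of tuples n, and the scan rate stays roughly constant. The function names are hypothetical.

```python
def scan_progress(start: int, pos: int, n: int) -> float:
    """Fraction of the fact table already delivered to a query that started at `start`.

    `pos` is the index of the next tuple the circular scan will emit.
    """
    return ((pos - start) % n) / n


def estimated_seconds_left(start: int, pos: int, n: int, tuples_per_second: float) -> float:
    """Remaining time, assuming the scan keeps its current rate."""
    remaining = n - (pos - start) % n
    return remaining / tuples_per_second
```

A query is retired when the scan returns to its start position, so while it is active the progress value lies in [0, 1). Since every query needs exactly one full pass, its latency is bounded by the time of one scan rather than by how many other queries are running.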
Other Details (in the paper)
Run-time optimization of Filter ordering.
Updates.
Implementation on multi-core systems.
Extensions:
– Column stores.
– Fact table partitioning.
– Galaxy schemata.
[Figure: Preprocessor → Filter × n → Distributor.]
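The slides defer the Filter-ordering optimizer to the paper. As a stand-in, the sketch below uses a classic heuristic that may differ from the paper's method: assuming independent Filters with equal probe cost, placing the Filter with the lowest observed pass rate first minimizes the expected number of hash probes per fact tuple. All names and counters are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterStats:
    """Run-time counters for one Filter (names are hypothetical)."""

    name: str
    probed: int = 0  # fact tuples that reached this Filter
    passed: int = 0  # of those, tuples whose bit-vector stayed non-zero

    def pass_rate(self) -> float:
        return self.passed / self.probed if self.probed else 1.0


def expected_probes(order: List[FilterStats]) -> float:
    """Expected hash probes per fact tuple, assuming independent Filters."""
    cost, surviving = 0.0, 1.0
    for f in order:
        cost += surviving
        surviving *= f.pass_rate()
    return cost


def reorder(filters: List[FilterStats]) -> List[FilterStats]:
    """With equal probe cost, the most selective Filter first minimizes expected_probes."""
    return sorted(filters, key=lambda f: f.pass_rate())


fx = FilterStats("X", probed=10, passed=9)
fy = FilterStats("Y", probed=10, passed=2)
best = reorder([fx, fy])
```

In a live pipeline, later Filters only see survivors of earlier ones, so their counters measure conditional pass rates; the ordering rule is exact only under the independence assumption.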
Experimental Methodology
Systems:
– CJOIN prototype on top of Postgres.
– Postgres with shared scans enabled.
– Commercial system X.
We use the Star Schema Benchmark (SSB).
– Scale factor = 100 (100GB of data).
– The workload comprises parameterized SSB queries.
Hardware:
– Quad-core Intel Xeon.
– 8GB of shared RAM.
– RAID-5 array of four 15K RPM SAS disks.
Effect of Concurrency
Throughput increases with more concurrent queries.
Response Time Predictability
Query latency is predictable; no more workload fear.
Influence of Data Scale
CJOIN is effective even for small data sets.
Concurrency level: 128.
Related Work
Materialized views [R+95, HRU96].
Multiple-query optimization [T88].
Work sharing:
– Staged DBs [HSA05].
– Scan sharing [F94, Z+07, Q+08].
– Aggregation [CR07].
BLINK [R+08].
Streaming database systems [M+02, B+04].
Conclusions
High query concurrency is crucial for DWs.
Query-at-a-time execution performs poorly under high concurrency.
Our solution: CJOIN.
– Target: the class of star queries.
– Deep work sharing: I/O, join, tuple storage.
– Efficient realization on multi-core architectures.
Experiments show an order-of-magnitude improvement over a commercial system.
THANK YOU!