CSCI5570 Large Scale Data Processing Systems
NewSQL
James Cheng, CSE, CUHK
Slide Ack.: modified based on the slides from Hefu Chai
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems
Andrew Pavlo, Carlo Curino, Stanley Zdonik
SIGMOD 2012
Main Memory • Parallel • Shared-Nothing Transaction Processing
H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow., vol. 1, iss. 2, 2008.
[Diagram: the client application sends a transaction (stored-procedure name plus input parameters) to the database cluster]
[Diagram: the database cluster executes the transaction and returns the result to the client application]
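The request/response flow in the two diagrams above can be sketched as a toy dispatcher. All names here (GetBalance, the in-memory table) are illustrative assumptions, not part of H-Store's API:

```python
# Toy sketch of the client/cluster interaction: a transaction is submitted
# as a stored-procedure name plus input parameters, and the result is
# returned to the client. All names below are illustrative.
ACCOUNTS = {"a1": 100, "a2": 250}  # stand-in for a partitioned table

def get_balance(account_id):
    return ACCOUNTS[account_id]

PROCEDURES = {"GetBalance": get_balance}  # pre-defined txn templates

def execute_procedure(name, params):
    """Client call: procedure name + input parameters -> result."""
    return PROCEDURES[name](*params)

print(execute_procedure("GetBalance", ("a2",)))  # 250
```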
OLTP Transactions
Small: touch a small subset of data using indexes (i.e., no full table scans or large distributed joins)
Repetitive: typically executed as pre-defined transaction templates or stored procedures
Fast: short-lived (i.e., no user stalls)
We need an approach that supports…
Stored procedures
Load balancing in the presence of time-varying skew
Complex schemas
Deployments with a large number of partitions
Optimal Database Design
The scalability of NewSQL depends on an optimal database design, which defines:
how an application's data and workload are partitioned or replicated across nodes
how queries and transactions are routed to nodes
These determine two crucial factors: the number of transactions that access multiple nodes, and the skewness of the load across the cluster. A growing fraction of distributed transactions plus load skew can mean 10x worse performance (see following slides).
Automatic Database Design Tool for Parallel Systems
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. SIGMOD 2012.
What are the key issues?
Two key issues arise when generating a good database design for enterprise OLTP applications:
Distributed transactions: network overhead of employing two-phase commit or a similar distributed consensus protocol to ensure atomicity and serializability
Temporal workload skew: a node with skewed load becomes saturated, while other nodes are idle and clients are blocked waiting for results
What are the key issues? Distributed transactions
Temporal workload skew
Impact of distributed transactions on throughput
Temporal workload skew
Consider the example of Wikipedia: even though the average load of the cluster over an entire day is uniform, the load across the cluster at any point in time is unbalanced (due to differences in the languages of the wiki content and in time zones). Static skew vs. temporal skew.
Impact of temporal workload skew on throughput
What are the key issues? A complex tradeoff: distributed transactions vs. temporal workload skew
Put the database on a single node and execute all transactions there: no distributed transactions, but extreme load skew
Execute all transactions as distributed transactions that access data at every partition: every transaction is distributed, but there is no load skew
Horticulture's Goal
Analyze:
a database schema
the structure of the application's stored procedures
a sample transaction workload
Generate a partitioning that:
minimizes distribution overhead
balances access skew
Two Main Technical Contributions
Large Neighborhood Search: automatic database partitioning
Skew-Aware Cost Model: coordination cost and load distribution estimation
Three Unique Features
Maintains the tradeoff between distributed transactions and temporal skew
Extends the design space to include replicated secondary indexes
Organically handles stored procedure routing
What are the design options?
For each table:
Horizontally partition it
Replicate it on all partitions
Replicate a secondary index for a subset of its columns
In addition, effectively route incoming transaction requests
Horizontal Partitioning
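A minimal sketch of horizontal partitioning, assuming integer keys and a simple modulo hash (a toy stand-in, not Horticulture's actual hash function):

```python
from collections import defaultdict

NUM_PARTITIONS = 4

def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Map a row's partitioning-attribute value to a partition id."""
    return key % num_partitions  # toy hash for integer keys

# Split a toy CUSTOMER table on its customer_id column.
rows = [{"customer_id": cid, "name": f"c{cid}"} for cid in range(10)]
partitions = defaultdict(list)
for row in rows:
    partitions[partition_of(row["customer_id"])].append(row)

print(sorted(len(p) for p in partitions.values()))  # [2, 2, 3, 3]
```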
Table Replication: for read-only or read-mostly tables
Secondary Index: for read-only or read-mostly columns
Stored Procedure Routing
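Routing can reuse the same hash as the table, so a request whose routing parameter matches the partitioning attribute goes to a single node. A sketch under that assumption (not H-Store's actual router):

```python
NUM_PARTITIONS = 4

def route(routing_param_value):
    """Send a stored-procedure request to the partition owning the
    value of its routing parameter (same modulo hash as the table)."""
    return routing_param_value % NUM_PARTITIONS

# A request for customer 7 goes straight to partition 3, so the
# transaction can execute single-partitioned.
print(route(7))  # 3
```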
What are the key technical contributions?
Large-Neighborhood Search
Skew-Aware Cost Model
Large-Neighborhood Search
Inputs: database schema, stored procedures, sample workload
1. Analyze the sample workload to pre-compute information used to guide the search process
2. Generate an initial "best" design Dbest based on the most frequently accessed columns
3. Create a new incomplete design Drelax by relaxing (i.e., resetting) a subset of Dbest
4. Perform a local search for a new design using Drelax as the starting point; replace Dbest with any new design that has a lower cost; restart from Step 3 if k searches fail to improve Dbest or no design remains in Drelax's neighborhood
5. After running for a limited time, stop and return Dbest
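The steps above can be sketched as a generic LNS loop over an abstract design. The cost, relax, and local-search functions here are toy placeholders, not Horticulture's:

```python
import random

def lns_search(initial, cost, relax, local_search, k=10, max_iters=100, seed=0):
    """Generic large-neighborhood-search loop following the steps above."""
    rng = random.Random(seed)
    best, best_cost, stale = initial, cost(initial), 0
    for _ in range(max_iters):             # step 5: bounded running time
        relaxed = relax(best, rng)         # step 3: reset part of best design
        candidate = local_search(relaxed)  # step 4: search the neighborhood
        if cost(candidate) < best_cost:
            best, best_cost, stale = candidate, cost(candidate), 0
        else:
            stale += 1                     # restart relaxation after k misses
            if stale >= k:
                stale = 0
    return best

# Toy problem: a "design" is a list of ints; lower sum = lower cost.
def relax(design, rng):
    design = list(design)
    design[rng.randrange(len(design))] = rng.randrange(10)
    return design

best = lns_search([9, 9, 9], cost=sum, relax=relax,
                  local_search=lambda d: [max(0, x - 1) for x in d])
print(sum(best) <= 27)  # True
```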
Large-Neighborhood Search: Initial Design
Select the most frequently accessed column in each table as its horizontal partitioning attribute
Greedily replicate read-only tables until no space is left
Select the next most frequently accessed, read-only column as the secondary index attribute for each table
Select the routing parameter for each stored procedure based on how often its parameters are referenced in Q (Q: the queries that access the columns selected in Step 1)
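The first step of the initial design (most frequently accessed column per table) might look like the following sketch; the access counts and table names are made up for illustration:

```python
from collections import Counter

def initial_partitioning(column_access_counts):
    """Choose each table's most frequently accessed column as its
    horizontal partitioning attribute (step 1 of the initial design)."""
    return {table: Counter(counts).most_common(1)[0][0]
            for table, counts in column_access_counts.items()}

# Hypothetical per-column access counts from a sample workload.
accesses = {
    "CUSTOMER": {"c_id": 120, "c_name": 30},
    "ORDERS": {"o_c_id": 90, "o_id": 80},
}
print(initial_partitioning(accesses))  # {'CUSTOMER': 'c_id', 'ORDERS': 'o_c_id'}
```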
Large-Neighborhood Search: Relaxation
Relaxation: the process of selecting random tables in the database and resetting their chosen partitioning attributes in Dbest
This allows LNS to escape a local minimum and jump to a new neighborhood of potential solutions
Horticulture:
decides the number of tables to relax
randomly chooses which tables to relax (routing parameters of stored procedures referencing a relaxed table are also reset)
generates the candidate attributes for the relaxed tables and procedures
Large-Neighborhood Search: Local Search
Phase 1 (tables): explore the search tree using branch-and-bound; at each tree node, replace the table's design option in Drelax with that of the node and estimate the cost; if it is lower than that of Dbest, go down the tree
Phase 2 (stored procedures): for each procedure, choose the routing parameter with the lowest cost before moving down the tree
Skew-Aware Cost Model
LNS relies on a cost model to estimate the cost of executing the sample workload under a given design. The cost model must:
accentuate the properties that are important in a database design
be computed quickly
estimate the cost of an incomplete design
return a monotonically increasing cost as more variables are set when searching down the tree
Skew-Aware Cost Model = Distributed Transactions + Workload Skew Factor
Skew-Aware Cost Model
Measures how much of the workload executes as distributed transactions, and how uniformly the load is distributed across the cluster. The two are in tradeoff:
cost(D, W) = (α × CoordinationCost(D, W) + β × SkewFactor(D, W)) / (α + β)
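The combined cost is just a weighted average of the two components; a direct transcription of the formula:

```python
def cost(coordination_cost, skew_factor, alpha=0.5, beta=0.5):
    """cost(D, W) = (alpha * CoordinationCost + beta * SkewFactor) / (alpha + beta).
    alpha and beta weight the two competing objectives."""
    return (alpha * coordination_cost + beta * skew_factor) / (alpha + beta)

# Equal weights: the result is halfway between the two component costs.
print(cost(0.25, 0.75, alpha=1.0, beta=1.0))  # 0.5
```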
Skew-Aware Cost Model: Coordination Cost
CoordinationCost(D, W) = (partitionCount / (txnCount × numPartitions)) × (dtxnCount / txnCount)
The total number of partitions accessed divided by the total number of partitions that could have been accessed, scaled by the fraction of transactions that are distributed (dtxnCount / txnCount)
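A direct transcription of the coordination-cost formula above, given per-transaction partition counts (the sample numbers are made up):

```python
def coordination_cost(txn_partition_counts, num_partitions):
    """Partitions actually accessed over partitions that could have been
    accessed, scaled by the fraction of distributed transactions."""
    txn_count = len(txn_partition_counts)
    partition_count = sum(txn_partition_counts)
    dtxn_count = sum(1 for c in txn_partition_counts if c > 1)
    return (partition_count / (txn_count * num_partitions)) * (dtxn_count / txn_count)

# 4 transactions on a 4-partition cluster; two touch multiple partitions.
print(coordination_cost([1, 1, 2, 4], num_partitions=4))  # 0.25
```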
Skew-Aware Cost Model: Skew Factor
SkewFactor(D, W) = (Σ_{i=0..numIntervals} skew[i] × txnCounts[i]) / (Σ txnCounts)
To account for time-varying skew, divide W into finite intervals and estimate the skew factor skew[i] of each interval i. The final skew factor is the mean of the interval skew factors, weighted by the number of transactions executed in each interval.
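The weighted mean above transcribes directly (the interval values are illustrative):

```python
def skew_factor(interval_skews, interval_txn_counts):
    """Mean of the per-interval skew estimates, weighted by the number
    of transactions executed in each interval."""
    total = sum(interval_txn_counts)
    return sum(s * n for s, n in zip(interval_skews, interval_txn_counts)) / total

# Interval 1 is busy and skewed; interval 0 is quiet and balanced, so the
# busy interval dominates the weighted mean.
print(skew_factor([0.0, 0.5], [100, 300]))  # 0.375
```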
Incomplete Designs
A query that references a table with an unset attribute in a design is labeled as unknown. For each unknown query:
Coordination cost: assume that any unknown query is single-partitioned
Skew factor: assume that unknown queries execute on all partitions in the cluster
'Unknown' can change to 'known', but 'known' cannot change back to 'unknown', so the estimated cost is monotonically increasing.
Optimizations: Access Graphs and Workload Compression
Access Graph
Model and store the input sample workload as an access graph:
Vertex: a table
Edge: two tables are co-accessed in a query
Edge weight: the number of queries forming the relationship
LNS uses the access graph to quickly identify important relationships between tables without repeatedly reprocessing the input sample workload.
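Building such a graph from a sample workload is a one-pass aggregation; a sketch with made-up TPC-C-style table names:

```python
from collections import Counter
from itertools import combinations

def build_access_graph(workload):
    """Weight edge (t1, t2) by the number of queries that co-access
    both tables; the vertices are the tables themselves."""
    edges = Counter()
    for tables_in_query in workload:
        for pair in combinations(sorted(set(tables_in_query)), 2):
            edges[pair] += 1
    return edges

# Each entry lists the tables that one query touches.
workload = [
    ["CUSTOMER", "ORDERS"],
    ["CUSTOMER", "ORDERS", "ORDER_LINE"],
    ["ORDERS"],
]
graph = build_access_graph(workload)
print(graph[("CUSTOMER", "ORDERS")])  # 2
```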
Workload Compression
Given a larger input sample workload, LNS finds a better database design, but the search becomes less efficient.
Solution: workload compression
Combine sets of similar queries in individual transactions into fewer weighted records
Combine similar transactions into a smaller number of weighted records in the same manner
The cost model scales its estimates using these weights without having to process each record of the original workload separately.
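In its simplest form, compression collapses identical query templates into weighted records; a sketch (the grouping criterion here, exact template equality, is a simplifying assumption):

```python
from collections import Counter

def compress_workload(query_templates):
    """Collapse identical query templates into weighted records; the
    cost model then processes each template once and scales by weight."""
    return Counter(query_templates)

workload = [
    "SELECT * FROM CUSTOMER WHERE c_id = ?",
    "SELECT * FROM CUSTOMER WHERE c_id = ?",
    "UPDATE ORDERS SET o_status = ? WHERE o_id = ?",
]
compressed = compress_workload(workload)
print(compressed["SELECT * FROM CUSTOMER WHERE c_id = ?"])  # 2
```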
Algorithm Comparison
Throughput
Search Times
The best solution found by Horticulture over time (red line: the known optimal design, if available)