CSCI5570 Large Scale Data Processing Systems

CSCI5570 Large Scale Data Processing Systems: NewSQL. James Cheng, CSE, CUHK. Slide Ack.: modified from the slides by Hefu Chai.

Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. Andrew Pavlo, Carlo Curino, Stanley Zdonik. SIGMOD 2012.

Main Memory • Parallel • Shared-Nothing • Transaction Processing. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow., vol. 1, iss. 2, pp. 1496-1499, 2008.

Transaction execution: the client application sends a stored procedure name and its input parameters to the database cluster.

Transaction result: the database cluster returns the transaction result to the client application.

OLTP Transactions: Small, Repetitive, Fast. Small: touch a small subset of data using an index (i.e., no full table scans or large distributed joins). Repetitive: typically executed as pre-defined transaction templates or stored procedures. Fast: short-lived (i.e., no user stalls).

We need an approach that supports: stored procedures; load balancing in the presence of time-varying skew; complex schemas; deployments with a larger number of partitions.

Optimal Database Design. The scalability of NewSQL systems depends on the existence of an optimal database design, which defines how an application's data and workload are partitioned or replicated across nodes, and how queries and transactions are routed to nodes. These choices determine two crucial factors: the number of transactions that access multiple nodes, and the skewness of the load across the cluster. A growing fraction of distributed transactions and load skew can lead to 10x worse performance (see following slides).

Automatic Database Design Tool for Parallel Systems. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems (SIGMOD 2012).

What are the key issues? There are two key issues when generating a good database design for enterprise OLTP applications. Distributed transactions: network overhead from employing two-phase commit or a similar distributed consensus protocol to ensure atomicity and serializability. Temporal workload skew: a node with a skewed load becomes saturated while other nodes sit idle and clients are blocked waiting for results.
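To make the network overhead of distributed transactions concrete, the sketch below counts the messages in one round of textbook two-phase commit between a coordinator and the partitions a transaction touches. It is a generic simulation under simplifying assumptions (one message per request or reply, no failures, no early abort), not H-Store's actual commit protocol; the function name `two_phase_commit` is hypothetical.

```python
def two_phase_commit(participants, votes):
    """Simulate one 2PC round; return (decision, number of network messages).

    participants: partition ids touched by the transaction
    votes: partition id -> True if that partition votes to commit
    """
    if len(participants) <= 1:
        # A single-partition transaction commits or aborts locally: no coordination.
        decision = "COMMIT" if all(votes.get(p, False) for p in participants) else "ABORT"
        return decision, 0

    messages = 0
    all_yes = True
    # Phase 1 (prepare): coordinator sends PREPARE, each participant replies with a vote.
    for p in participants:
        messages += 2
        all_yes = all_yes and votes.get(p, False)
    decision = "COMMIT" if all_yes else "ABORT"
    # Phase 2 (decision): coordinator sends COMMIT/ABORT, each participant acknowledges.
    messages += 2 * len(participants)
    return decision, messages


print(two_phase_commit([0], {0: True}))                             # ('COMMIT', 0)
print(two_phase_commit([0, 1, 2, 3], {p: True for p in range(4)}))  # ('COMMIT', 16)
```

The message count grows linearly with the number of partitions a transaction touches, and each round trip adds latency before the transaction can finish, which is why a good design tries to keep transactions single-partition.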

What are the key issues? Distributed transactions and temporal workload skew.

Impact of distributed transactions on throughput

What are the key issues? Distributed transactions and temporal workload skew.

Temporal workload skew. Consider Wikipedia as an example: even though the average load across the cluster over an entire day is uniform, the load across the cluster at any given point in time is unbalanced, due to differences in the languages of the wiki content and in users' time zones. Static skew vs. temporal skew.
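The difference between static and temporal skew can be illustrated with made-up numbers: three partitions whose load peaks at different times have identical daily totals, yet are unbalanced in every interval. The `imbalance` measure used here (maximum load divided by mean load) is an illustrative choice, not the skew metric used by Horticulture.

```python
def imbalance(loads):
    """Max/mean ratio of per-partition load: 1.0 means perfectly balanced."""
    return max(loads) * len(loads) / sum(loads)


# Hypothetical request counts: rows are partitions, columns are time intervals.
load = [
    [60, 20, 20],  # partition 0 is busy in interval 0 (e.g. its users' daytime)
    [20, 60, 20],  # partition 1 is busy in interval 1
    [20, 20, 60],  # partition 2 is busy in interval 2
]

daily_totals = [sum(row) for row in load]        # [100, 100, 100]
per_interval = [list(col) for col in zip(*load)] # each partition's load per interval

print(imbalance(daily_totals))                   # 1.0: looks balanced over the whole day
print([imbalance(col) for col in per_interval])  # [1.8, 1.8, 1.8]: skewed at every point
```

A design evaluated only on whole-day totals would see no problem here, which is why the cost model later divides the workload into time intervals.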

Impact of temporal workload skew on throughput

What are the key issues? A complex tradeoff between distributed transactions and temporal workload skew. At one extreme, put the database on a single node and execute all transactions there: no distributed transactions, but extreme load skew. At the other extreme, execute every transaction as a distributed transaction that accesses data at every partition: all transactions are distributed, but there is no load skew.

Horticulture's Goal. Analyze a database schema, the structure of the application's stored procedures, and a sample transaction workload; then generate a partitioning that minimizes distribution overhead and balances access skew.

Two Main Technical Contributions: Large-Neighborhood Search (automatic database partitioning) and a Skew-Aware Cost Model (coordination cost and load distribution estimation). Three Unique Features: balancing the tradeoff between distributed transactions and temporal skew; extending the design space to include replicated secondary indexes; organically handling stored procedure routing.

What are the design options? For each table: horizontally partition it, replicate it on all partitions, or replicate a secondary index for a subset of its columns. In addition, route incoming transaction requests effectively.

Horizontal Partitioning

Table Replication: for read-only or read-mostly tables.

Secondary Index: for read-only or read-mostly columns.
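The three per-table design options can be sketched as a small data structure, together with the question a design answers for every lookup: which partitions must it touch? This is a minimal illustration under stated assumptions: integer keys, modulo hashing onto partitions, and a replicated secondary index modelled as a plain dict from the indexed value to the row's partitioning key. `TableDesign`, `partitions_for_lookup`, and the column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class TableDesign:
    partition_column: Optional[str] = None  # horizontal partitioning attribute
    replicated: bool = False                # full copy of the table on every partition
    index_column: Optional[str] = None      # column with a replicated secondary index


def partitions_for_lookup(design: TableDesign, column: str, value: int,
                          num_partitions: int, local: int = 0,
                          secondary_index: Optional[Dict[int, int]] = None) -> Set[int]:
    """Partitions an equality lookup `column = value` on one table must access."""
    if design.replicated:
        return {local}  # any copy will do: read from the local partition
    if design.partition_column == column:
        return {value % num_partitions}  # the partitioning key identifies the partition
    if design.index_column == column and secondary_index is not None:
        # The replicated index maps the indexed value to the row's partitioning key.
        return {secondary_index[value] % num_partitions}
    return set(range(num_partitions))  # otherwise broadcast to every partition


orders = TableDesign(partition_column="w_id", index_column="c_id")
print(partitions_for_lookup(orders, "w_id", 7, 4))                            # {3}
print(partitions_for_lookup(orders, "c_id", 42, 4, secondary_index={42: 5}))  # {1}
print(partitions_for_lookup(orders, "o_date", 1, 4))                          # {0, 1, 2, 3}
```

Replication and secondary indexes only pay off for data that is rarely written, since every update must be applied to every copy; this matches the slides' restriction to read-only or read-mostly tables and columns.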

Stored Procedure Routing
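Stored procedure routing decides which partition runs a transaction: the value of one chosen input parameter (the routing parameter) is mapped to a partition, ideally the one holding the data the transaction needs. A minimal sketch assuming integer parameters and the same modulo hashing used for table partitioning; `route` and the NewOrder-style parameter tuple are hypothetical.

```python
from typing import Optional, Sequence


def route(params: Sequence[int], routing_index: Optional[int],
          num_partitions: int, default: int = 0) -> int:
    """Return the base partition for one stored procedure invocation."""
    if routing_index is None:
        # No routing parameter chosen for this procedure: use a fixed partition.
        return default
    return params[routing_index] % num_partitions


# Hypothetical invocation NewOrder(w_id=5, d_id=3, c_id=1002) on 4 partitions.
params = (5, 3, 1002)
print(route(params, 0, 4))  # 1: routed by w_id
print(route(params, 2, 4))  # 2: routed by c_id, a different base partition
```

If the routing parameter matches the partitioning attribute of the tables the procedure accesses, the transaction runs where its data lives and stays single-partition; a poor choice turns it into a distributed transaction.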

What are the key technical contributions? Large-Neighborhood Search and the Skew-Aware Cost Model.

Large-Neighborhood Search
Input: database schema, stored procedures, sample workload.
1. Analyze the sample workload to pre-compute information used to guide the search process.
2. Generate an initial "best" design Dbest based on the most frequently accessed columns.
3. Create a new incomplete design Drelax by relaxing (i.e., resetting) a subset of Dbest.
4. Perform a local search for a new design using Drelax as the starting point. Replace Dbest with any new design that has a lower cost. Go back to Step 3 if k searches do not improve Dbest or if no designs remain in Drelax's neighborhood.
5. After running for a limited time, stop and return Dbest.
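The five steps can be sketched as a generic Large-Neighborhood Search loop. This is a simplified illustration, not Horticulture's implementation: a design is a dict from table to a chosen option, the local search of step 4 simply enumerates option combinations for the relaxed tables, a fixed number of restarts stands in for the time limit of step 5, and `lns`, `relax_size`, and the toy cost function are hypothetical.

```python
import itertools
import random


def lns(tables, candidates, cost, initial, restarts=50, k=5, relax_size=1, seed=0):
    """Return (best design, its cost) found by a simplified Large-Neighborhood Search."""
    rng = random.Random(seed)
    best, best_cost = dict(initial), cost(initial)       # step 2: initial Dbest
    for _ in range(restarts):                            # step 5: bounded running time
        relaxed = rng.sample(tables, relax_size)         # step 3: relax part of Dbest
        fixed = {t: o for t, o in best.items() if t not in relaxed}
        misses = 0
        # Step 4: local search over the neighborhood of Drelax.
        for combo in itertools.product(*(candidates[t] for t in relaxed)):
            design = {**fixed, **dict(zip(relaxed, combo))}
            c = cost(design)
            if c < best_cost:
                best, best_cost, misses = design, c, 0   # keep the cheaper design
            else:
                misses += 1
                if misses >= k:
                    break                                # k misses: back to step 3
    return best, best_cost


def toy_cost(design):
    # Made-up per-table costs for each option; lower is better.
    penalty = {"A": {"x": 3, "y": 1}, "B": {"x": 2, "y": 5}, "C": {"x": 4, "y": 0}}
    return sum(penalty[t][o] for t, o in design.items())


tables = ["A", "B", "C"]
candidates = {t: ["x", "y"] for t in tables}
print(lns(tables, candidates, toy_cost, {"A": "x", "B": "y", "C": "x"}))
```

Relaxing only a few tables at a time keeps each neighborhood small enough to search quickly, while the random choice of tables lets successive restarts escape local minima.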

Large-Neighborhood Search: Initial Design
1. Select the most frequently accessed column in each table as its horizontal partitioning attribute.
2. Greedily replicate read-only tables until no space is left.
3. For each table, select the next most frequently accessed read-only column as the secondary index attribute.
4. Select the routing parameter for each stored procedure based on how often its parameters are referenced in Q (Q: queries that access the columns selected in Step 1).
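The four initial-design steps can be sketched as a greedy function over pre-computed workload statistics. The input shapes are assumptions for illustration, as are the smallest-first order used when replicating tables and the TPC-C-like table, column, and procedure names in the example; `initial_design` is a hypothetical name.

```python
from collections import Counter, defaultdict


def initial_design(accesses, read_only_tables, table_size, budget,
                   read_only_columns, param_refs):
    """Greedy initial design Dbest (simplified sketch of the four steps).

    accesses: dict (table, column) -> access count in the sample workload
    read_only_tables: tables never written; table_size: table -> size
    budget: space available for replicated copies
    read_only_columns: set of (table, column) pairs never updated
    param_refs: (procedure, parameter, table, column) tuples, one per query reference
    """
    by_table = defaultdict(Counter)
    for (table, column), n in accesses.items():
        by_table[table][column] += n

    # Step 1: the most frequently accessed column of each table is its partitioning attribute.
    partition = {t: cols.most_common(1)[0][0] for t, cols in by_table.items()}

    # Step 2: replicate read-only tables, smallest first (assumed order), while space remains.
    replicated, used = set(), 0
    for t in sorted(read_only_tables, key=lambda name: table_size[name]):
        if used + table_size[t] <= budget:
            replicated.add(t)
            used += table_size[t]

    # Step 3: the next most frequently accessed read-only column gets a secondary index.
    index = {}
    for t, cols in by_table.items():
        for column, _ in cols.most_common():
            if column != partition[t] and (t, column) in read_only_columns:
                index[t] = column
                break

    # Step 4: routing parameter = the parameter most often referenced by queries in Q,
    # i.e. queries on a column chosen in step 1.
    votes = defaultdict(Counter)
    for proc, param, table, column in param_refs:
        if partition.get(table) == column:
            votes[proc][param] += 1
    routing = {p: params.most_common(1)[0][0] for p, params in votes.items()}

    return partition, replicated, index, routing


accesses = {("WAREHOUSE", "W_ID"): 10, ("CUSTOMER", "C_W_ID"): 8,
            ("CUSTOMER", "C_LAST"): 5, ("ITEM", "I_ID"): 7}
param_refs = [("NewOrder", "w_id", "WAREHOUSE", "W_ID"),
              ("NewOrder", "w_id", "CUSTOMER", "C_W_ID"),
              ("NewOrder", "c_id", "CUSTOMER", "C_ID")]
partition, replicated, index, routing = initial_design(
    accesses, {"ITEM"}, {"ITEM": 10}, 100, {("CUSTOMER", "C_LAST")}, param_refs)
print(partition, replicated, index, routing)
```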

Large-Neighborhood Search: Relaxation. Relaxation is the process of selecting random tables in the database and resetting their chosen partitioning attributes in Dbest. It allows LNS to escape a local minimum and jump to a new neighborhood of potential solutions. Horticulture decides the number of tables to relax, randomly chooses which tables to relax (the routing parameters of stored procedures that reference a relaxed table are also reset), and generates the candidate attributes for the relaxed tables and procedures.
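The relaxation step, including the cascading reset of routing parameters, can be sketched as follows. Unset choices are represented as `None`; the number of tables to relax is passed in rather than decided adaptively, candidate generation is omitted, and `relax` and the example names are hypothetical.

```python
import random


def relax(best_tables, best_routing, proc_tables, num_relax, rng):
    """Build Drelax from Dbest by resetting (to None) a random subset of tables.

    best_tables: table -> chosen partitioning attribute in Dbest
    best_routing: procedure -> chosen routing parameter in Dbest
    proc_tables: procedure -> set of tables its queries reference
    """
    relaxed = set(rng.sample(sorted(best_tables), num_relax))
    tables = {t: None if t in relaxed else a for t, a in best_tables.items()}
    # Procedures that reference a relaxed table lose their routing parameter too.
    routing = {p: None if proc_tables[p] & relaxed else r
               for p, r in best_routing.items()}
    return tables, routing, relaxed


best_tables = {"WAREHOUSE": "W_ID", "CUSTOMER": "C_W_ID", "ITEM": "I_ID"}
best_routing = {"NewOrder": "w_id", "GetItem": "i_id"}
proc_tables = {"NewOrder": {"WAREHOUSE", "CUSTOMER"}, "GetItem": {"ITEM"}}
tables, routing, relaxed = relax(best_tables, best_routing, proc_tables, 1, random.Random(42))
print(relaxed, tables, routing)
```

Resetting the dependent routing parameters keeps Drelax consistent: a procedure's routing choice only makes sense relative to how the tables it touches are partitioned.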

Large-Neighborhood Search: Local Search. Phase 1: explore the tree using branch-and-bound search; at each tree node, set the table's design option in Drelax to that of the node and estimate the cost, and go down the tree only if the cost is lower than that of Dbest. Phase 2: for each procedure, choose the routing parameter with the lowest cost before moving down the tree.
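Phase 1 can be sketched as a depth-first branch-and-bound over table options, pruning any node whose estimated cost already reaches the cost of Dbest. Pruning is only safe because the cost model never decreases as more tables are set. This sketch covers tables only (no routing phase), uses a toy cost function, and `branch_and_bound` is a hypothetical name.

```python
def branch_and_bound(tables, candidates, cost, bound=float("inf")):
    """Assign an option to each table depth-first, pruning with the cost bound.

    cost(partial) must never decrease as more tables are set, and `bound` is
    the cost of the current Dbest. Returns (design, cost), or (None, bound)
    if no complete design beats the bound.
    """
    best, best_cost = None, bound

    def visit(i, partial):
        nonlocal best, best_cost
        c = cost(partial)
        if c >= best_cost:
            return                       # prune: no completion can beat the incumbent
        if i == len(tables):
            best, best_cost = dict(partial), c
            return
        for option in candidates[tables[i]]:
            partial[tables[i]] = option  # go down the tree
            visit(i + 1, partial)
            del partial[tables[i]]       # backtrack

    visit(0, {})
    return best, best_cost


penalty = {"A": {"x": 3, "y": 1}, "B": {"x": 2, "y": 5}, "C": {"x": 4, "y": 0}}


def toy_cost(partial):
    # Non-negative per-table costs, so the sum never decreases as tables are set.
    return sum(penalty[t][o] for t, o in partial.items())


print(branch_and_bound(["A", "B", "C"], {t: ["x", "y"] for t in penalty}, toy_cost))
```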

What are the key technical contributions? Large-Neighborhood Search and the Skew-Aware Cost Model.

Skew-Aware Cost Model. LNS relies on a cost model to estimate the cost of executing the sample workload using a given design. The cost model must be able to: accentuate the properties that are important in a database design; be computed quickly; estimate the cost of an incomplete design; and return a monotonically increasing cost as more variables are set when searching down the tree.

Skew-Aware Cost Model: Distributed Transactions + Workload Skew Factor.

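Skew-Aware Cost Model. Tradeoff! The model measures how much of the workload W executes as distributed transactions under a design D, and how uniformly the load is distributed across the cluster:
$$\mathit{cost}(D, W) = \frac{\alpha \times \mathit{CoordinationCost}(D, W) + \beta \times \mathit{SkewFactor}(D, W)}{\alpha + \beta}$$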

Skew-Aware Cost Model: Coordination Cost
$$\mathit{CoordinationCost}(D, W) = \frac{\mathit{partitionCount}}{\mathit{txnCount} \times \mathit{numPartitions}} \times \left(1.0 + \frac{\mathit{dtxnCount}}{\mathit{txnCount}}\right)$$
That is, the total number of partitions accessed divided by the total number of partitions that could have been accessed, scaled by the ratio of distributed transactions to all transactions.
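The coordination cost formula translates directly into code. In this sketch each transaction of the sample workload is summarized, as an assumption, by the set of partitions it accesses under the design being evaluated; `coordination_cost` is a hypothetical name, and an empty workload is not handled.

```python
def coordination_cost(txn_partitions, num_partitions):
    """CoordinationCost(D, W), given the partitions each transaction touches under D."""
    txn_count = len(txn_partitions)
    partition_count = sum(len(parts) for parts in txn_partitions)       # partitions accessed
    dtxn_count = sum(1 for parts in txn_partitions if len(parts) > 1)  # distributed txns
    return (partition_count / (txn_count * num_partitions)) * (1.0 + dtxn_count / txn_count)


single_node = [{0}] * 4                # everything runs on partition 0
all_distributed = [{0, 1, 2, 3}] * 4   # every transaction touches every partition
mixed = [{0}, {1}, {0, 1, 2, 3}, {2}]
print(coordination_cost(single_node, 4))      # 0.25
print(coordination_cost(all_distributed, 4))  # 2.0
print(coordination_cost(mixed, 4))            # 0.546875
```

The two extremes of the tradeoff land at opposite ends of this term: the single-node layout has the lowest coordination cost but the worst load balance, which is what the skew factor term is there to penalize.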

Skew-Aware Cost Model: Skew Factor
$$\mathit{SkewFactor}(D, W) = \frac{\sum_{i=0}^{\mathit{numIntervals}} \mathit{skew}[i] \times \mathit{txnCounts}[i]}{\sum \mathit{txnCounts}}$$
To capture time-varying skew, divide W into finite intervals and estimate the skew factor skew[i] of each interval i. The final skew factor is the mean of the per-interval skew factors, weighted by the number of transactions executed in each interval.
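The weighted mean and the combined cost can be sketched as two small functions. The slides do not say how each per-interval skew[i] is computed, so this sketch takes those estimates as already-computed inputs (made-up values here); the default weights alpha = beta = 1.0 and the function names are assumptions.

```python
def skew_factor(skew, txn_counts):
    """Mean of per-interval skew estimates, weighted by transactions per interval."""
    return sum(s * n for s, n in zip(skew, txn_counts)) / sum(txn_counts)


def total_cost(coordination, skew, alpha=1.0, beta=1.0):
    """cost(D, W): weighted combination of both terms, normalized by alpha + beta."""
    return (alpha * coordination + beta * skew) / (alpha + beta)


sf = skew_factor([0.25, 0.75], [300, 100])  # the busy interval is the less skewed one
print(sf)                                   # 0.375
print(total_cost(0.5, sf))                  # 0.4375
```

Weighting by transaction count means a badly skewed but quiet interval matters less than a moderately skewed busy one, and alpha and beta shift the balance between the two sides of the tradeoff.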

Incomplete Designs. A query that references a table whose attribute is unset in a design is labeled as unknown. For each unknown query: Coordination cost: assume that the query is single-partitioned. Skew factor: assume that the query executes on all partitions in the cluster. Both assumptions are the cheapest possible outcome (a single partition adds no coordination, and load spread over every partition adds no imbalance), so setting more attributes can only raise the estimate. 'Unknown' can change to 'known', but 'known' cannot change to 'unknown': the estimated cost is monotonically increasing!
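The treatment of unknown queries can be sketched per query: under a partial design, an unknown query is charged one partition for the coordination cost and spread over the whole cluster for the skew factor. The `accessed` mapping (which partitions the query would touch once a table's attribute is set) and the function name `query_estimates` are hypothetical.

```python
def query_estimates(query_tables, design, accessed, num_partitions):
    """Partitions charged to one query under a possibly incomplete design.

    design: table -> partitioning attribute, or None while the attribute is unset
    accessed: table -> partitions the query touches once that table is set
    Returns (partition count for CoordinationCost, partitions loaded for SkewFactor).
    """
    if any(design.get(t) is None for t in query_tables):
        # Unknown query: single-partition for coordination, and spread evenly over
        # every partition for skew, so neither term is over-estimated.
        return 1, set(range(num_partitions))
    touched = set().union(*(accessed[t] for t in query_tables))
    return len(touched), touched


accessed = {"A": {0, 2}, "B": {2}}
print(query_estimates(["A", "B"], {"A": None, "B": "x"}, accessed, 4))  # (1, {0, 1, 2, 3})
print(query_estimates(["A", "B"], {"A": "y", "B": "x"}, accessed, 4))   # (2, {0, 2})
```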

Optimizations: access graphs and workload compression.

Access Graph. Model and store the input sample workload as an access graph. Vertex: a table. Edge: two tables are co-accessed in a query. Edge weight: the number of times the queries forming the relationship are executed. LNS uses the access graph to quickly identify important relationships between tables without repeatedly reprocessing the input sample workload.
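Building the access graph is a single pass over the workload. In this sketch each query is summarized, as an assumption, by the set of tables it references and how many times it was executed; edges are keyed by sorted table pairs, and `build_access_graph` and the table names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_access_graph(queries):
    """Build an access graph from a sample workload.

    queries: iterable of (tables referenced by the query, execution count) pairs
    Returns (vertices, edge weights keyed by sorted table pairs).
    """
    vertices, edges = set(), Counter()
    for tables, count in queries:
        vertices.update(tables)
        for a, b in combinations(sorted(tables), 2):
            edges[(a, b)] += count  # co-accessed: weight by how often the query ran
    return vertices, edges


workload = [({"CUSTOMER", "ORDERS"}, 40), ({"ORDERS", "ORDER_LINE"}, 25),
            ({"CUSTOMER", "ORDERS"}, 10), ({"ITEM"}, 5)]
vertices, edges = build_access_graph(workload)
print(sorted(vertices))
print(dict(edges))
```

Heavily weighted edges point to tables that should ideally be partitioned on compatible attributes, so that the queries joining them stay on one partition.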

Optimizations: access graphs and workload compression.

Workload Compression. Given a larger input sample workload, LNS finds a better database design, but the search is less efficient. Solution (workload compression): combine sets of similar queries in individual transactions into fewer weighted records, and combine similar transactions into a smaller number of weighted records in the same manner. The cost model scales its estimates using these weights, without having to process each record of the original workload separately.
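One way to picture workload compression: collapse transactions that share a signature into a single record with a weight, then scale per-record estimates by that weight. Here each transaction is summarized, hypothetically, as its procedure name and the number of partitions it touches, and two transactions count as similar when those summaries are identical; the slides do not specify the similarity test, and `compress` and `weighted_estimate` are hypothetical names.

```python
from collections import Counter


def compress(workload, signature=lambda txn: txn):
    """Collapse transactions with the same signature into (representative, weight) records."""
    weights, reps = Counter(), {}
    for txn in workload:
        key = signature(txn)
        weights[key] += 1
        reps.setdefault(key, txn)  # keep the first transaction as the representative
    return [(reps[key], w) for key, w in weights.items()]


def weighted_estimate(records, cost_of):
    """Sum of per-record cost estimates, each scaled by the record's weight."""
    return sum(cost_of(txn) * w for txn, w in records)


# Hypothetical transactions: (procedure, number of partitions touched).
workload = [("NewOrder", 1), ("NewOrder", 1), ("Payment", 2), ("NewOrder", 1)]
records = compress(workload)
print(records)                                       # [(('NewOrder', 1), 3), (('Payment', 2), 1)]
print(weighted_estimate(records, lambda t: t[1]))    # 5, same as summing over all 4 txns
```

The estimate over two weighted records equals the estimate over all four original transactions, while the cost model only evaluates each distinct record once.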

Algorithm Comparison

Throughput

Search Times: the best solution found by Horticulture over time (red line: the known optimal design, if available).