Sarah Sproehnle Cloudera, Inc


Outline
- Identifying slow queries
- Indexing techniques
- Proper schema design
- EXPLAIN
- Rewriting subqueries
- Common mistakes

Identifying slow queries
- Use the slow query log!
    --log-slow-queries
    --long-query-time
    --log-queries-not-using-indexes
- long_query_time allows microsecond granularity (MySQL 5.1.21 and later)
- 5.1: SET GLOBAL slow_query_log = 1;
- Do not use log_output = table
- Summarize the log with mysqldumpslow
- Monitor SHOW PROCESSLIST
- mytop, innotop, Enterprise Monitor (the query analyzer is great, but it adds latency; use it selectively)
- Ask around

Indexing
- Most indexes are B-trees
- A B-tree is great for:
    WHERE col = x
    WHERE col > x
    WHERE col IS NULL
    WHERE col LIKE 'foo%'
    ORDER BY col [DESC] LIMIT n
- Not useful for:
    WHERE function(col) = x
    WHERE col LIKE '%foo'
    ORDER BY col -- without a LIMIT
- Covering indexes
    Query: SELECT a FROM t WHERE b = 'foo' ORDER BY c;
    Index: KEY(b, c, a);
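A minimal sketch of the covering-index idea above. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, so the plan text differs from MySQL's EXPLAIN output, where a covering index shows up as "Using index". The extra column d and the index name idx_b_c_a are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Its EXPLAIN QUERY PLAN output is not
# MySQL's EXPLAIN, but it also reports when an index covers a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b TEXT, c INT, d TEXT)")
conn.execute("CREATE INDEX idx_b_c_a ON t (b, c, a)")  # hypothetical index name


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every column the query touches (b, c, a) lives in the index.
covered = plan("SELECT a FROM t WHERE b = 'foo' ORDER BY c")

# Column d is not in the index, so the table row must be fetched as well.
not_covered = plan("SELECT d FROM t WHERE b = 'foo' ORDER BY c")

print(covered)
print(not_covered)
```

The first plan should mention a covering index; the second should use the same index but without the covering note, because d has to be read from the table.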

Indexing continued
- Creating the index can be painful (consider the InnoDB plugin or a replication switchover)
- The order of columns in a composite index matters! KEY (a, b) will be used for:
    WHERE a = x
    WHERE a = x AND b = y
    WHERE a = x ORDER BY b
    SELECT b .. WHERE a = x
  but not for:
    WHERE b = x
- Hash indexes for Memory, NDB Cluster and InnoDB (adaptive)
    Fast and compact, but only useful for equality lookups
- For InnoDB, do not append the PK to a secondary index (InnoDB already stores the primary key in every secondary index entry)
- Watch out for duplicate indexes (mk-duplicate-key-checker)
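Why column order matters can be sketched without a database. The example below models KEY (a, b) as a sorted list of (a, b) tuples, which is a simplification of a real B-tree; the rows and function names are hypothetical.

```python
import bisect

# Hypothetical rows; the composite index is modeled as (a, b) tuples in sorted
# order, i.e. ordered by a first and by b within each value of a.
rows = [(3, "x"), (1, "z"), (2, "y"), (1, "x"), (2, "x"), (3, "y")]
index_a_b = sorted(rows)


def lookup_a(a):
    """WHERE a = ?: binary-search to the contiguous run of entries with this a."""
    lo = bisect.bisect_left(index_a_b, (a,))
    hi = lo
    while hi < len(index_a_b) and index_a_b[hi][0] == a:
        hi += 1
    # The run is already ordered by b, so ORDER BY b needs no extra sort.
    return index_a_b[lo:hi]


def lookup_b(b):
    """WHERE b = ?: matches are scattered, so every entry has to be examined."""
    return [entry for entry in index_a_b if entry[1] == b]


print(lookup_a(2))
print(lookup_b("x"))
```

Entries with the same a sit next to each other, so a lookup on a can jump straight to them. Entries with the same b are spread across the whole list, which is why WHERE b = x cannot use the index ordering.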

Schema design
- Normalize or denormalize?
- Choose NOT NULL if possible
- Choose good primary keys (keep them small!)
    They are often used as foreign keys; InnoDB uses the primary key as a row id
- Keep your data small. Use the right data types:
    SMALLINT vs. INT vs. BIGINT
    CHAR vs. VARCHAR
    TIMESTAMP (4 bytes) vs. DATETIME (8 bytes)
    Store IP addresses as INT UNSIGNED (inet_aton)
    Use PROCEDURE ANALYSE()
- Index a prefix of a string column: KEY(col(5))
- Use RANGE partitioning, but be careful which functions you use
- Use replication to split reads and writes
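To illustrate storing IP addresses as INT UNSIGNED, here is a small sketch of the same conversion done on the application side with Python's standard ipaddress module. The helper names are hypothetical, and the functions only approximate MySQL's INET_ATON() and INET_NTOA() for full dotted-quad input.

```python
import ipaddress


def ip_to_int(dotted):
    """Rough counterpart of INET_ATON() for full dotted-quad IPv4 strings."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(number):
    """Rough counterpart of INET_NTOA()."""
    return str(ipaddress.IPv4Address(number))


value = ip_to_int("192.168.0.1")
print(value)             # 3232235521
print(int_to_ip(value))  # 192.168.0.1
print(value < 2**32)     # True: fits in a 4-byte unsigned integer
```

The integer form takes 4 bytes, compared with up to 15 characters for the dotted string, and range comparisons on it behave numerically.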

EXPLAIN
- EXPLAIN SELECT ... (unfortunately, EXPLAIN doesn't work on UPDATE/DELETE)
- It is important to run EXPLAIN against your true data set
- Useful for verifying:
    Is an appropriate index being used?
    What order are the tables joined in? This is critical given MySQL's nested-loop join algorithm.
    Is a temporary table required? ("Using temporary")
    Is the index covering, or are seeks to the row(s) needed? ("Using index")
- Example: a query that looks innocuous
    SELECT .. WHERE idx IN (42, 101, 1024);
  EXPLAIN can reveal interesting results!

Rewriting subqueries
mysql> EXPLAIN SELECT name FROM Country WHERE code IN (SELECT countrycode FROM City WHERE population > )\G
*************************** 1. row ***************************
  select_type: PRIMARY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row *********
  select_type: DEPENDENT SUBQUERY    <-- This query should not be correlated!
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
        Extra: Using where

Rewriting subqueries
- Instead of "WHERE col IN (SELECT ..)", write a JOIN:
    SELECT DISTINCT Country.name FROM Country JOIN City ON code = countrycode WHERE City.population > ;
- In general:
    SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
  can be rewritten as follows:
    SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col2 AND condition;
- Fixed in 5.4!
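The rewrite can be checked on a toy data set. The sketch below assumes SQLite (via Python's sqlite3) in place of MySQL, invented country and city rows, and a hypothetical population threshold of 1,000,000, since the value used on the slide is not shown. It only demonstrates that both forms return the same names; it says nothing about MySQL's execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE City (id INTEGER PRIMARY KEY, countrycode TEXT, population INT);
-- Invented sample data, not the real world database.
INSERT INTO Country VALUES ('AAA', 'Alphaland'), ('BBB', 'Betaland'), ('CCC', 'Gammaland');
INSERT INTO City (countrycode, population) VALUES
    ('AAA', 5000000), ('AAA', 2000000), ('BBB', 300000), ('CCC', 1500000);
""")

threshold = 1000000  # hypothetical value chosen for this example

subquery_sql = (
    "SELECT name FROM Country "
    "WHERE code IN (SELECT countrycode FROM City WHERE population > ?)"
)
join_sql = (
    "SELECT DISTINCT Country.name FROM Country "
    "JOIN City ON code = countrycode WHERE City.population > ?"
)

via_subquery = sorted(row[0] for row in conn.execute(subquery_sql, (threshold,)))
via_join = sorted(row[0] for row in conn.execute(join_sql, (threshold,)))

# Without DISTINCT the join returns one row per matching city, not per country.
without_distinct = conn.execute(join_sql.replace("DISTINCT ", ""), (threshold,)).fetchall()

print(via_subquery, via_join, len(without_distinct))
```

DISTINCT is what keeps the join equivalent here: Alphaland has two qualifying cities and would otherwise appear twice. Note that DISTINCT on name would also merge two different countries that happened to share a name, which the IN form would not do.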

Common mistakes/General advice
- Need a random row? Do not use ORDER BY rand() LIMIT 1!
    Consider generating a random number and doing a lookup by the auto_inc column.
- Avoid hints (STRAIGHT_JOIN, FORCE INDEX)
- Use hints when necessary, especially pre 5.1
- Don't set sort_buffer_size extremely large
- Query cache: 256MB max
    Be careful when benchmarking (use SQL_NO_CACHE)
- memcached
- Move long-running queries (e.g., reporting queries) to a slave
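One way to read the random-row advice, sketched with SQLite (via Python's sqlite3) standing in for MySQL. The items table and the random_row helper are hypothetical. The query picks a random id between the smallest and largest id and takes the first row at or after it, so it still works when deletions have left gaps.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT)")
conn.executemany(
    "INSERT INTO items (label) VALUES (?)",
    [(f"item-{n}",) for n in range(1, 101)],
)
conn.execute("DELETE FROM items WHERE id % 7 = 0")  # leave gaps in the id sequence


def random_row(conn, rng=random):
    """Pick a random id in [MIN(id), MAX(id)] and return the first row at or after it."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM items").fetchone()
    target = rng.randint(lo, hi)
    # Primary-key range lookup; no sort over the whole table is needed.
    return conn.execute(
        "SELECT id, label FROM items WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()


row = random_row(conn)
print(row)
```

The trade-off is that the choice is not perfectly uniform: a row that follows a gap is picked more often than its neighbours. When that matters, a different scheme is needed.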

More general advice
- Consider other storage engines
- Materialize data into a Memory or MyISAM table. Keep it fresh with triggers or an event. Use --init-file for Memory tables
- LIMIT for paging is not always suitable
    App servers generally do not scale with large result sets, but evaluating LIMIT n,m over and over is painful
- For batch processing, consider Hadoop:
    Can handle very large datasets
    Sqoop moves data from MySQL to Hadoop (and back)
    Built-in reliability and scalability
    Don't think in MapReduce? Use Hive
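The slide does not name an alternative to repeated LIMIT n,m queries. One common option, swapped in here as an illustration and not taken from the talk, is to remember the last key seen and seek past it ("keyset" paging). The sketch assumes SQLite via Python's sqlite3 and a hypothetical posts table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(n, f"post {n}") for n in range(1, 26)],
)

PAGE_SIZE = 10


def page_by_offset(page_number):
    """LIMIT/OFFSET paging: the skipped rows are still read and thrown away."""
    return conn.execute(
        "SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()


def page_after(last_seen_id):
    """Keyset paging: seek directly past the last id the client has seen."""
    return conn.execute(
        "SELECT id FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


# Walk all pages with the keyset approach.
seek_ids = []
last_id = 0
while True:
    rows = page_after(last_id)
    if not rows:
        break
    seek_ids.extend(row[0] for row in rows)
    last_id = rows[-1][0]

# Walk the same pages with LIMIT/OFFSET for comparison.
offset_ids = [row[0] for page in range(3) for row in page_by_offset(page)]

print(seek_ids == offset_ids)
```

Both walks return the same ids, but the keyset query starts from an index position instead of counting past earlier rows. It only supports moving to the next page in key order, so it does not fit interfaces that jump to an arbitrary page number.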

Thanks for attending! Sarah Sproehnle Cloudera, Inc