Inserts At Drive Speed Ben Haley Research Director NetQoS.

Slides:



Advertisements
Similar presentations
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Advertisements

BY LECTURER/ AISHA DAWOOD DW Lab # 3 Overview of Extraction, Transformation, and Loading.
Module 8 Importing and Exporting Data. Module Overview Transferring Data To/From SQL Server Importing & Exporting Table Data Inserting Data in Bulk.
A comparison of MySQL And Oracle Jeremy Haubrich.
A Fast Growing Market. Interesting New Players Lyzasoft.
Database Systems: A Practical Approach to Design, Implementation and Management International Computer Science S. Carolyn Begg, Thomas Connolly Lecture.
Chapter Physical Database Design Methodology Software & Hardware Mapping Logical Design to DBMS Physical Implementation Security Implementation Monitoring.
Database Management: Getting Data Together Chapter 14.
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
1.1 CAS CS 460/660 Introduction to Database Systems File Organization Slides from UC Berkeley.
Database Systems More SQL Database Design -- More SQL1.
A Guide to SQL, Seventh Edition. Objectives Understand, create, and drop views Recognize the benefits of using views Grant and revoke user’s database.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Introduction To Databases IDIA 618 Fall 2014 Bridget M. Blodgett.
DAY 21: MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Akhila Kondai October 30, 2013.
Chapter 4: Organizing and Manipulating the Data in Databases
12 1 Chapter 12 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Systems analysis and design, 6th edition Dennis, wixom, and roth
Copyright © 2003 by Prentice Hall Computers: Tools for an Information Age Chapter 13 Database Management Systems: Getting Data Together.
Introduction. 
© Paradigm Publishing Inc. 9-1 Chapter 9 Database and Information Management.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
ITEC224 Database Programming
DAY 14: ACCESS CHAPTER 1 Tazin Afrin October 03,
Lecture 9 Methodology – Physical Database Design for Relational Databases.
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design, 2 nd Edition Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
August 01, 2008 Performance Modeling John Meisenbacher, MasterCard Worldwide.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
DBSQL 14-1 Copyright © Genetic Computer School 2009 Chapter 14 Microsoft SQL Server.
MySQL. Dept. of Computing Science, University of Aberdeen2 In this lecture you will learn The main subsystems in MySQL architecture The different storage.
Module 7 Reading SQL Server® 2008 R2 Execution Plans.
Databases and Statistical Databases Session 4 Mark Viney Australian Bureau of Statistics 5 June 2007.
Relational Database CISC/QCSE 810 some materials from Software Carpentry.
Announcements. Data Management Chapter 12 Traditional File Approach  Structure Field  Record  File  Fixed All records have common fields, and a field.
Views In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database.) In some.
Database Management Systems.  Database management system (DBMS)  Store large collections of data  Organize the data  Becomes a data storage system.
DATABASE MANAGEMENT SYSTEMS CMAM301. Introduction to database management systems  What is Database?  What is Database Systems?  Types of Database.
Prepared By Prepared By : VINAY ALEXANDER ( विनय अलेक्सजेंड़र ) PGT(CS),KV JHAGRAKHAND.
Database and Data File Management Oct 6/7/8, 2010 Fall 2010 | / Recitation 2.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Methodology – Physical Database Design for Relational Databases.
INTRODUCTION TO DBS Database: a collection of data describing the activities of one or more related organizations DBMS: software designed to assist in.
DATABASE MANAGEMENT SYSTEM ARCHITECTURE
Database Fundamental & Design by A.Surasit Samaisut Copyrights : All Rights Reserved.
Indexes and Views Unit 7.
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
Session 1 Module 1: Introduction to Data Integrity
7 1 Database Systems: Design, Implementation, & Management, 7 th Edition, Rob & Coronel 7.6 Advanced Select Queries SQL provides useful functions that.
Unit-8 Introduction Of MySql. Types of table in PHP MySQL supports various of table types or storage engines to allow you to optimize your database. The.
1 Database Fundamentals Introduction to SQL. 2 SQL Overview Structured Query Language The standard for relational database management systems (RDBMS)
MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Sravanthi Lakkimsety Mar 14,2016.
If you have a transaction processing system, John Meisenbacher
SQL Basics Review Reviewing what we’ve learned so far…….
ISC321 Database Systems I Chapter 2: Overview of Database Languages and Architectures Fall 2015 Dr. Abdullah Almutairi.
Views / Session 3/ 1 of 40 Session 3 Module 5: Implementing Views Module 6: Managing Views.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
Understanding Core Database Concepts Lesson 1. Objectives.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Storage and File Organization
Fundamentals of DBMS Notes-1.
Methodology – Physical Database Design for Relational Databases
Lecture 12 Lecture 12: Indexing.
Introduction to Database Systems
MANAGING DATA RESOURCES
Database management concepts
Contents Preface I Introduction Lesson Objectives I-2
Understanding Core Database Concepts
INTRODUCTION A Database system is basically a computer based record keeping system. The collection of data, usually referred to as the database, contains.
Presentation transcript:

Inserts At Drive Speed Ben Haley Research Director NetQoS

Overview  Introduction  Our problem  Why use a storage engine?  How to implement a read-only storage engine  Optimization Goal: Provide a new tool that might help solve your issues.

Who is NetQoS?  Commercial software vendor  Network Traffic Analysis o Who is on the network? o What applications are they using? o Where is the traffic going? o How is the network running? o Can the users get their work done?  Built on MySQL

Who Are Our Customers?

Problem Domain  Collected, analyzed and reported on network data  Each collector received >100k records/second  Data was stored for the top IP addresses, applications, ToS for each interface  Data was stored at 15-minute resolution  Kept data for 6 weeks – 13 months

What Did Customers Want?  Greater resolution  New ways to look at data  More detail  Use existing hardware

Key Observations  Information was in optional or temporary files  Data is unchanging  Large data volumes (100s of GB/day)  Data collectors scattered over the enterprise  Expensive to pull data to a central analysis box  Most analysis focused on short timeframes  Small subset of the data was interesting  Hierarchical data  Flexible formats

1 st Approach – Custom Service  C++ service to query data  Create a result set to pull back to reporting console  Advantages o Fast o Leveraged existing software  Issues o Not very flexible o Only access through the console UI

2 nd Approach – Traditional DB  Insert data into database  Reporting console queries database  Advantages o Easy o Somewhat flexible o Access from standard DB tools  Issues o Hard to maintain insert/delete rates o Database load operations tax CPU and I/O o Not as flexible as desired

3 rd Approach –Storage Engine  Manage data outside the database  Create storage engine to retrieve data into MySQL  Advantages o Fast o Extremely flexible o Only pay CPU and I/O overhead in queries o Access from standard DB tools  Issues o Learning curve o Multiple moving parts

What Does This Look Like? MySQL MyISAM InnoDB Archive … Custom Data Files Queries Data Collection and Management

Collector Provides  Collect data  Create data files  Age out old data  Indexing  Compression Collector manages data

MySQL Provides  Remote Connectivity  SSL Encryption  SQL Support o Queries (select) o Aggregations (group by) o Sorting (order by) o Integration with other data (join operations) o Functions o UDF Support MySQL gives us a SQL stack

Storage Engine Provides  Map data into MySQL  Provides optimization information on indexes  Efficient data extraction  Flatten data structure  Decompression Storage engine provides the glue between collector and MySQL

How To  Great document for storage engines: _Engine#Writing_a_Custom_Storage_Engine _Engine#Writing_a_Custom_Storage_Engine  I am going to concentrate on divergence

Overview of our approach  Singleton data storage  Storage engine maps to the data storage  Table schema is a view into storage  Table name for unique view  Column names map to data elements  Indices may be real or virtual

Storage Management  External process creates/removes data  Storage engine indicates the data  During query, storage engine locks data range

Simple Create Table Example CREATE TABLE `testtable` ( `Router` int(10) unsigned, `Timestamp` int(10) unsigned, `Srcaddr` int(10) unsigned, `Dstaddr` int(10) unsigned, `Inpkts` int(10) unsigned, `Inbytes` int(10) unsigned, index `routerNDX`(`Router`) ) ENGINE=NFA;

Behind the Scenes  MySQL creates a.frm file (defines table)  Storage engine validates the DDL  No data tables are created  No indices are created  Table create/delete is almost free

Validation Example – Static Format  Restricted to specific table names  Each table name maps to a subset of data  Fixed set of columns for table name  Fixed index definitions

Validation Example – Dynamic Fmt  Table name can be anything  Column names must match known definitions o Physical Columns o Virtual Columns  Indices may be real or artificial o Realized Indices o Virtual Indices

Virtual Columns  Represent alternate ways of representing data or derived values  Provides a shortcut instead of using functions  Examples: o Actual columns ipAddress – IP address ipMask – subnet CIDR mask (0-32) o Virtual columns ipMaskBits – bit pattern described by ipMask ipSubnet – ipAddress & ipMaskBits

Optimizing Columns  Our storage engine supports many columns  Storage engines have to return the entire row defined in the table  MySQL uses only columns referenced in select statement  Table acts as view, so make view narrower

Index Optimization Options  MySQL parser o Define indices in schema o Provide guidance to MySQL  Roll your own o Limits table interoperability o Best left to the experts

Real Index  Expose the internal file format  Example: o Data organized by timestamp, srcAddr o Query: select router, count(*) where srcAddr=inet_aton( ) and timestamp > ‘ ’; o Add index timestamp o Storage engine walks data by timestamp

Real Index #2  Example: o Data organized by timestamp, srcAddr o Query: select router, count(*) where srcAddr=inet_aton( ) and timestamp > ‘ ’; o Add index timestamp, srcAddr o Storage engine still walks data by timestamp Database is unable to leverage the full index!

Virtual Index  Not completely supported, but good enough  Example: o Data organized by timestamp, srcAddr o Query: select router, count(*) where srcAddr=inet_aton( ) and timestamp > ‘ ’; o Add index srcAddr, timestamp o Storage engine still walks data by timestamp, but filters on srcAddr o Would fail on range scan of srcAddr

Virtual Index #2  Example: o Data organized by timestamp, srcAddr o Query: select router, count(*) where srcAddr=inet_aton( ) and timestamp > ‘ ’; o Add index timestamp o Add index srcAddr o Storage engine still walks data by timestamp, but filters on srcAddr

Index Optimization  Leverage storage format  Add virtual index support where helpful  Don’t overanalyze o Be accurate if fast o Estimates are fine o Heuristics are often great o Be careful about mixing approaches

Index Heuristics Example  Start with large estimate for number of rows returned  Adjust estimate based on expected value of column constraints o Time – great – files are organized by time o srcAddr Good for equality Terrible for range o Bytes – poor

Typical Query Pattern  Create temp table o Specify only necessary columns o Specify optimal indices for where clause and engine  Select …  Drop table

KISS  Only support what you have to: o Do you need multiple datasets? o Do you need flexible table definitions? o Do you need insert/delete/alter support? o How will data be accessed? It’s OK to limit functionality to just solving your problem

Conclusion  Why we used a storage engine  Storage engine pattern  Optimization o Columns o Indices o Virtual indices

Application  Log analysis  Transaction records  ETL alternative  Custom database

Questions