Phoenix: We put the SQL back in NoSQL
James Taylor



Presentation transcript:

Phoenix: We put the SQL back in NoSQL
James Taylor, jtaylor@salesforce.com

Agenda
- What is Phoenix?
- Why SQL?
- What is next?
- Q&A

What is Phoenix?
- SQL layer on top of HBase
- Delivered as an embedded JDBC driver
- Targets low latency queries over HBase data
- Columns modeled as multi-part row key and key values
- Versioned schema repository
- Query engine transforms SQL into puts, deletes, and scans
- Uses native HBase APIs instead of Map/Reduce
- Brings the computation to the data:
  - Aggregate, insert, and delete data through coprocessors
  - Push predicates through custom filters
- 100% Java
- Open source here: https://github.com/forcedotcom/phoenix

Why SQL?
- Broaden HBase adoption
  - Give folks an API they already know
- Reduce the amount of code users need to write
    SELECT TRUNC(date,'DAY'), AVG(cpu_usage)
    FROM web_stat
    WHERE domain LIKE 'Salesforce%'
    GROUP BY TRUNC(date,'DAY')
- Performance optimizations transparent to the user
  - Aggregation
  - Stats gathering
  - Secondary indexing
- Leverage existing tooling
  - SQL client

But I can’t surface x,y,z in SQL…


But I can’t surface x,y,z in SQL…
- Define multi-part row keys
    CREATE TABLE web_stat (
        domain VARCHAR NOT NULL,
        feature VARCHAR NOT NULL,
        date DATE NOT NULL,
        usage BIGINT,
        active_visitor INTEGER,
        CONSTRAINT pk PRIMARY KEY (domain, feature, date)
    );
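As a rough illustration of what a multi-part row key means at the byte level, the sketch below flattens (domain, feature, date) into one sortable byte string. The encoding is an assumption made for illustration, not something stated in the slides: variable-length strings are terminated with a zero byte and the date is stored as 8-byte big-endian epoch milliseconds. The function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

SEPARATOR = b"\x00"  # assumed terminator for variable-length key parts


def encode_row_key(domain: str, feature: str, date: datetime) -> bytes:
    """Concatenate the primary key columns into a single row key."""
    millis = int(date.timestamp() * 1000)
    return (
        domain.encode("utf-8") + SEPARATOR
        + feature.encode("utf-8") + SEPARATOR
        # Fixed-width big-endian, so byte order matches time order
        # for non-negative timestamps.
        + struct.pack(">q", millis)
    )


def decode_row_key(key: bytes):
    """Split a row key back into (domain, feature, date)."""
    domain, rest = key.split(SEPARATOR, 1)
    feature, date_bytes = rest.split(SEPARATOR, 1)
    (millis,) = struct.unpack(">q", date_bytes)
    date = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return domain.decode("utf-8"), feature.decode("utf-8"), date
```

Because the leading key column comes first in the bytes, rows for one domain sit next to each other, which is what lets a predicate on domain turn into a contiguous scan.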


But I can’t surface x,y,z in SQL…
- Implement my whizz-bang custom function
  - Derive class from ScalarFunction
  - Add annotation to define name, args, and types
  - Implement evaluate method
  - Register function (blog on this coming soon: http://phoenix-hbase.blogspot.com/)


But I can’t surface x,y,z in SQL…
- Run snapshot in time queries
  - Set CURRENT_SCN property on connection to earlier timestamp
  - Queries will see only rows before timestamp
  - Schema in-place at that point in time will be used
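The visibility rule behind snapshot queries can be sketched with a toy versioned table. This is a simulation only and does not call Phoenix or HBase; the table layout and the `read_as_of` helper are hypothetical, and the strict "before" comparison is an assumption based on the slide's wording.

```python
from typing import Dict, List, Optional, Tuple

# row key -> list of (timestamp, value) versions, in any order
VersionedTable = Dict[str, List[Tuple[int, str]]]


def read_as_of(table: VersionedTable, row: str, scn: int) -> Optional[str]:
    """Return the newest value written strictly before `scn`, if any."""
    visible = [(ts, value) for ts, value in table.get(row, []) if ts < scn]
    if not visible:
        return None
    # The most recent visible version wins.
    return max(visible, key=lambda cell: cell[0])[1]
```

With versions written at timestamps 100 and 200, a read at `scn=150` sees only the first write, which is the behavior the slide describes for a connection pinned to an earlier timestamp.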


But I can’t surface x,y,z in SQL…
- Nest child entities inside of a row
  - Declare new child entity as nested table
  - Prefix column qualifier of nested entities with: table name + child primary key + child column name
  - Restrict join to be only through parent/child relation
  - Execute query by scanning nested child rows
  - TBD: https://github.com/forcedotcom/phoenix/issues/19


But I can’t surface x,y,z in SQL…
- Prevent hot spotting on writes
  - “Salt” row key on upsert by mod-ing with cluster size
  - Query for fully qualified key by inserting salt byte
  - Range scan by concatenating results of scan over all possible salt bytes
  - Or alternately, define column used for hash to derive row key prefix
  - TBD: https://github.com/forcedotcom/phoenix/issues/74
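The salting scheme above can be sketched against an in-memory dictionary standing in for the table. Everything here is illustrative: the bucket count, the CRC32 hash, and the function names are assumptions, and the final sort is added so the merged range scan comes back in key order.

```python
import zlib
from typing import Dict, List, Optional

NUM_BUCKETS = 4  # hypothetical; the slide suggests using the cluster size


def salt_byte(row_key: bytes) -> bytes:
    """Derive a one-byte prefix from a hash of the key, mod the bucket count."""
    return bytes([zlib.crc32(row_key) % NUM_BUCKETS])


def salted(row_key: bytes) -> bytes:
    return salt_byte(row_key) + row_key


def upsert(store: Dict[bytes, str], row_key: bytes, value: str) -> None:
    # Sequential keys land in different buckets, spreading the write load.
    store[salted(row_key)] = value


def point_get(store: Dict[bytes, str], row_key: bytes) -> Optional[str]:
    # A fully qualified key lets us recompute the salt and do a direct lookup.
    return store.get(salted(row_key))


def range_scan(store: Dict[bytes, str], start: bytes, stop: bytes) -> List[bytes]:
    """Scan [start, stop) once per salt bucket, then merge the results."""
    results: List[bytes] = []
    for bucket in range(NUM_BUCKETS):
        prefix = bytes([bucket])
        lo, hi = prefix + start, prefix + stop
        results.extend(k[1:] for k in sorted(store) if lo <= k < hi)
    return sorted(results)
```

The trade-off shows up in `range_scan`: point lookups stay a single get, but every range scan fans out into one scan per bucket.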


But I can’t surface x,y,z in SQL…
- Increment atomic counter
  - Surface the HBase put-and-increment functionality through the standard SQL sequence support
  - TBD: https://github.com/forcedotcom/phoenix/issues/18


But I can’t surface x,y,z in SQL…
- Sample table data
  - Support the standard SQL TABLESAMPLE clause
  - Implement filter that uses a skip next hint
  - Base next key on the table stats “guide posts”
  - TBD: https://github.com/forcedotcom/phoenix/issues/22


But I can’t surface x,y,z in SQL…
- Declare columns at query time
    SELECT col1,col2,col3
    FROM my_table(col2 VARCHAR, col3 INTEGER)
    WHERE col3 > 10
  - TBD: https://github.com/forcedotcom/phoenix/issues/9

Conclusion
- Phoenix fits the 80/20 use case rule
- Let us know what you’d like to see added
- Get involved – we need your help!
- Think about how your new feature can be surfaced in SQL

Thank you! Questions/comments?

Query Processing
Product Metrics HTable
- Row Key: ORG_ID, DATE, FEATURE, TXNS
- Key Values: IO_TIME, RESPONSE_TIME
- Scan
  - Start key: ORG_ID (:1) + DATE (:2)
  - End key: ORG_ID (:1) + DATE (:3)
- Filter
  - IO_TIME > 100
- Aggregation
  - Intercepts scan on region server
  - Builds map of distinct FEATURE values
  - Returns one row per distinct group
  - Client does final merge
    SELECT feature, SUM(txns)
    FROM product_metrics
    WHERE org_id = :1 AND date >= :2 AND date <= :3 AND io_time > 100
    GROUP BY feature
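The aggregation steps above (filter and group on each region server, then merge on the client) can be sketched as two small functions. This is a simulation under stated assumptions: the row tuple layout and function names are hypothetical, and the scan's start/stop key is taken to have already narrowed the rows to the requested org and date range.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int]  # (feature, txns, io_time); hypothetical layout


def region_aggregate(rows: Iterable[Row], min_io_time: int) -> Dict[str, int]:
    """Runs once per region: apply the filter, then sum txns per feature."""
    partial: Counter = Counter()
    for feature, txns, io_time in rows:
        if io_time > min_io_time:  # WHERE io_time > 100
            partial[feature] += txns
    return dict(partial)


def client_merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Runs on the client: add up the per-region partial sums."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)
```

Each region returns at most one entry per distinct feature, so the client merges a handful of small maps instead of receiving every matching row.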

Phoenix Query Optimizations
- Start/stop key of scan based on AND-ed columns
  - Through SUBSTR, ROUND, TRUNC, LIKE
- Parallelized on client by chunking over start/stop key of scan
- Aggregation on region servers through coprocessor
  - Inline for GROUP BY over row key ordered columns
  - In-memory map per group otherwise
- WHERE clause executed through custom filters
  - Incremental evaluation with early termination
  - Evaluated through byte pointers
- IN and OR over same column (in progress)
  - Becomes batched get or filter with next row hint
- Top N queries (future)
  - Through coprocessor keeping top N rows
- TABLESAMPLE (future)
  - Becomes filter with next row hint
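To make the first optimization concrete, here is a minimal sketch of how a LIKE pattern with a literal prefix on the leading row key column could be turned into a scan range. The helper name is hypothetical and the logic is simplified: it handles only a single trailing `%`, ignores escape sequences, and says nothing about how Phoenix itself implements this.

```python
from typing import Optional, Tuple


def like_prefix_to_scan_range(pattern: str) -> Optional[Tuple[bytes, bytes]]:
    """Map LIKE 'prefix%' to a [start, stop) key range, or None if it can't be."""
    if not pattern.endswith("%"):
        return None
    prefix = pattern[:-1]
    if not prefix or "%" in prefix or "_" in prefix:
        return None  # wildcards inside the prefix would need a filter instead
    start = prefix.encode("utf-8")
    # Stop key: the prefix with its last byte incremented. UTF-8 never
    # contains the byte 0xFF, so the increment cannot overflow.
    stop = start[:-1] + bytes([start[-1] + 1])
    return start, stop
```

For the earlier `domain LIKE 'Salesforce%'` query this yields a scan from `Salesforce` up to, but not including, `Salesforcf`, so only rows with that prefix are read.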

Phoenix Performance
