Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services.

Presentation transcript:

Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

The Data Crunch
» Data volumes rising fast
  » Human-originated data (e.g. e-commerce purchases) rising fast
  » Machine-generated data (e.g. e-commerce events and network packets) rising even faster
» Every business needs answers with lower latency
» Every significant problem is distributed:
  » Geographically distributed organizations
  » Multiple boxes for scale
  » Exploit multiple cores

Data management is hard
» If you make a mistake, the system won’t be fast enough
» Can’t afford to lose data
» New technologies are very difficult to use
  » MapReduce
  » NoSQL
  » Multi-threaded programming in Java, C++, Erlang, Scala, …
» Collaborate, interoperate, evolve

Today’s Computing Model
[Diagram labels: Database $$$$ Transaction Processing Application Infrastructure Application Log file Real-Time Application Business Events Business Transactions Batch Load Process Polling]

Stream Computing Model
[Diagram labels: Database $$ Transaction Processing Application Infrastructure Application Real-Time Application Stream Processor Business Events Business Transactions Real-Time Answers Traditional BI Application]

Case study: Mozilla

Demo: Mozilla downloads

SQL – life in the old dinosaur yet
» Widely spoken
» Rich
» Orthogonal
» Declarative
  » Tune your system without changing your logical schema
  » Apps don’t interfere with each other
» Adaptive
  » Route around failure
  » Exploit available resources
  » Make tradeoffs to meet QoS goals

Streaming SQL: example #1
Tweets about this conference:
» SELECT STREAM ROWTIME, author, text
  FROM Tweets
  WHERE text LIKE '%#MySQL%'

Demo: studio & simple query

Streaming SQL basics
» Streams:
  » CREATE STREAM Tweets ( author VARCHAR(20), text VARCHAR(140));
» Relational operators have streaming counterparts:
  » Project (SELECT)
  » Filter (WHERE)
  » Union
  » Join
  » Aggregation (GROUP BY)
  » Windowed aggregation (e.g. SUM(x) OVER window)
  » Sort (ORDER BY)

Streaming SQL: example #2
» Each minute, return the number of clicks on each web page:
» SELECT STREAM ROWTIME, uri, COUNT(*)
  FROM PageRequests
  GROUP BY FLOOR(ROWTIME TO MINUTE), uri
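To make the per-minute GROUP BY concrete, here is a minimal Python sketch of the same idea. It is not how any streaming engine implements the query; the function name `count_per_minute`, the tuple layout and the sample timestamps are all hypothetical. It assumes events arrive sorted by rowtime, which is what lets it emit a minute's counts as soon as an event from a later minute shows up.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(events):
    """Yield (minute, uri, count) from (rowtime, uri) events sorted by rowtime."""
    current_minute = None
    counts = Counter()
    for rowtime, uri in events:
        # Equivalent in spirit to FLOOR(ROWTIME TO MINUTE).
        minute = rowtime.replace(second=0, microsecond=0)
        if current_minute is not None and minute != current_minute:
            # Time has moved on, so the previous minute is complete.
            for key in sorted(counts):
                yield current_minute, key, counts[key]
            counts.clear()
        current_minute = minute
        counts[uri] += 1
    # A real stream never ends; this final flush only exists so the
    # sketch can be tried on a finite list.
    if current_minute is not None:
        for key in sorted(counts):
            yield current_minute, key, counts[key]


# Hypothetical sample data.
requests = [
    (datetime(2010, 4, 13, 10, 0, 5), "/index.html"),
    (datetime(2010, 4, 13, 10, 0, 40), "/index.html"),
    (datetime(2010, 4, 13, 10, 0, 59), "/download"),
    (datetime(2010, 4, 13, 10, 1, 2), "/index.html"),
]
rows = list(count_per_minute(requests))
```

The sketch holds only one minute of state at a time, which is the practical benefit of grouping on a time expression that never goes backwards.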

Streaming SQL: Time
» ROWTIME pseudo-column
  » Provided by source application or generated by system
» WINDOW
  » Present in regular SQL (e.g. SQL:2003) but more important in streaming SQL
  » Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed aggregation
» Monotonicity (“sortedness”)
  » Prerequisite for certain streaming operations

Streaming SQL: example #3
Find all orders from New York that shipped within an hour:
» CREATE VIEW compliant_orders AS
  SELECT STREAM *
  FROM orders OVER sla
  JOIN shipments ON orders.id = shipments.orderid
  WHERE city = 'New York'
  WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
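The windowed join can be pictured as a buffer of recent orders that each shipment is matched against. The Python sketch below illustrates that reading of the query and is not the engine's algorithm. It assumes both streams have been merged into one time-ordered sequence, and the tuple layout, the function name `compliant_orders` and the sample data are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def compliant_orders(events, sla=timedelta(hours=1)):
    """events: time-ordered (rowtime, kind, order_id, city) tuples, where
    kind is "order" or "shipment" and city is None for shipments."""
    window = deque()  # recent orders as (rowtime, order_id, city), oldest first
    for rowtime, kind, order_id, city in events:
        # Drop orders that are more than one hour older than the current row.
        while window and window[0][0] < rowtime - sla:
            window.popleft()
        if kind == "order":
            window.append((rowtime, order_id, city))
        else:
            for ordered_at, oid, order_city in window:
                if oid == order_id and order_city == "New York":
                    yield oid, ordered_at, rowtime


def at(hour, minute):
    return datetime(2010, 4, 13, hour, minute)  # hypothetical date


events = [
    (at(9, 0), "order", 1, "New York"),
    (at(9, 10), "order", 2, "Boston"),
    (at(9, 20), "order", 3, "New York"),
    (at(9, 45), "shipment", 1, None),   # 45 minutes after order 1
    (at(9, 50), "shipment", 2, None),   # in time, but not New York
    (at(10, 30), "shipment", 3, None),  # 70 minutes after order 3
]
matches = list(compliant_orders(events))
```

Because rows arrive in time order, orders older than the window can be discarded for good, so the buffer stays bounded by one hour of traffic.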

Streaming SQL: other stuff
» Schemas, views, tables
» Ability to nest queries
» User-defined functions and transforms
» Adapters make external systems look like read/write streams

Streaming SQL: example #4
Find all stock trades where the average price over the last ten trades is two standard deviations higher than the average over the last hour:
» SELECT STREAM *
  FROM (
    SELECT STREAM ticker, price, volume,
      AVG(price) OVER lastHour AS avgHr,
      STDDEV(price) OVER lastHour AS stddevHr,
      AVG(price) OVER lastTenTrades AS avg10
    FROM Trades
    WINDOW lastTenTrades AS (PARTITION BY ticker ROWS 10 PRECEDING),
      lastHour AS (PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING))
  WHERE avg10 > avgHr + 2 * stddevHr
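The two windows in this query can be sketched in Python as a row-count buffer and a time-range buffer per ticker. This is an illustration under stated assumptions, not the engine's implementation: the function name `price_spikes`, the ticker "ACME" and the sample prices are hypothetical, STDDEV is assumed to be the population standard deviation, and `ROWS 10 PRECEDING` is read as the current row plus up to ten preceding rows.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from math import sqrt


def price_spikes(trades, preceding_rows=10, span=timedelta(hours=1), k=2.0):
    """trades: time-ordered (rowtime, ticker, price) tuples."""
    # Row-count window per ticker: current trade plus `preceding_rows` before it.
    recent = defaultdict(lambda: deque(maxlen=preceding_rows + 1))
    # Time-range window per ticker, with running sum and sum of squares.
    hour = defaultdict(deque)
    sums = defaultdict(lambda: [0.0, 0.0])
    for rowtime, ticker, price in trades:
        recent[ticker].append(price)
        window, totals = hour[ticker], sums[ticker]
        window.append((rowtime, price))
        totals[0] += price
        totals[1] += price * price
        # Evict trades that are older than `span` relative to this trade.
        while window[0][0] < rowtime - span:
            _, old = window.popleft()
            totals[0] -= old
            totals[1] -= old * old
        n = len(window)
        avg_hr = totals[0] / n
        # Population variance; clamp tiny negative rounding errors to zero.
        stddev_hr = sqrt(max(totals[1] / n - avg_hr * avg_hr, 0.0))
        avg10 = sum(recent[ticker]) / len(recent[ticker])
        if avg10 > avg_hr + k * stddev_hr:
            yield rowtime, ticker, price


# Hypothetical data: 600 steady trades one second apart, then a jump.
start = datetime(2010, 4, 13, 9, 30)
trades = [(start + timedelta(seconds=i), "ACME", 100.0) for i in range(600)]
trades.append((start + timedelta(seconds=600), "ACME", 200.0))
spikes = list(price_spikes(trades))
```

Keeping running sums means each trade costs constant work plus evictions, rather than rescanning an hour of prices for every row.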

Streaming SQL for business intelligence
Conventional BI:
» Star schema:
  » Fact table
  » Dimension tables
  » Aggregate tables
» Data warehouse populated using an ETL process
» OLAP servers (e.g. Mondrian) provide a top-down view of data
Challenge:
» Keep all of these systems up to date in real time
» Alert when key metrics are outside acceptable range

ETL Process for OLAP
[Diagram labels: OLAP; Operational database; Data warehouse; Conventional ETL; Aggregate tables populated from DW; OLAP cache flushed after load]

Real-time OLAP: Challenges
OLAP imperatives
Highly aggregated data – e.g. one number computed from 10M rows
Therefore:
1. Use a cache
2. Materialize results as aggregates
Real-time imperatives
View latest version of the data
Maintaining N aggregates requires ~N blocks of I/O per incoming row
Therefore:
1. Don’t use a cache
2. Don’t maintain aggregates

Real-time OLAP: Solutions
1. Notify cache when underlying data has changed
   1. Populate cache from data warehouse
   2. Continuous ETL process
2. Build aggregates in memory
   1. Flush to disk intermittently
   2. OLAP engine looks for aggregates in memory first
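The second solution (build aggregates in memory, flush intermittently, read memory first) can be sketched as a small Python class. Everything here is an assumption made for illustration: the class name `AggregateStore`, the per-country download count and the row-count flush trigger are hypothetical, and a plain dict stands in for the aggregate table on disk.

```python
from collections import Counter


class AggregateStore:
    """Toy aggregate maintained in memory and flushed in batches."""

    def __init__(self, flush_every=1000):
        self.pending = Counter()  # in-memory deltas not yet written out
        self.table = {}           # stand-in for the on-disk aggregate table
        self.flush_every = flush_every
        self.unflushed_rows = 0
        self.flushes = 0

    def add(self, country, downloads=1):
        # Updating memory is cheap; no I/O happens per incoming row.
        self.pending[country] += downloads
        self.unflushed_rows += 1
        if self.unflushed_rows >= self.flush_every:
            self.flush()

    def flush(self):
        # One write per changed key, however many rows contributed to it.
        for key, delta in self.pending.items():
            self.table[key] = self.table.get(key, 0) + delta
        self.pending.clear()
        self.unflushed_rows = 0
        self.flushes += 1

    def total(self, country):
        # Readers combine flushed and in-memory values, so results stay current.
        return self.table.get(country, 0) + self.pending.get(country, 0)


store = AggregateStore(flush_every=3)
for country in ["US", "DE", "US", "US"]:
    store.add(country)
```

In this run the first three rows trigger one flush and the fourth stays in memory, yet `store.total("US")` still reflects all three US rows. Batching is what avoids paying roughly one block of I/O per aggregate for every incoming row.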

Continuous ETL for Real-time OLAP
[Diagram labels: OLAP; Operational database; Data warehouse; SQLstream; Aggregate tables populated incrementally; OLAP cache flushed proactively]

Demo:
» Use Mozilla data, show raw parsed input with GeoIP data added, per-second rollups per country.
» In MySQL, show an aggregate table growing.

Summary
1. Data problems are getting harder
2. People are trying – and failing – to solve these problems with SQL databases
3. Stream computing is a powerful new kind of platform
4. Streaming SQL is pragmatic and powerful

Any questions?

Thank you for attending!
Further reading:
» “Data in Flight” by Julian Hyde (Communications of the ACM, Vol. 53 No. 1, Pages 48-52)
» Blog: