Big Data for the SQL Eye: Cindy Gross, @SQLCindy, http://smallbitesofbigdata.com

Presentation transcript:

Big Data for the SQL Eye
Cindy Gross, @SQLCindy, http://smallbitesofbigdata.com

What’s the Buzz?

Look, it’s SQL!
SELECT score, fun FROM toDo WHERE type = 'they pay me for this?';
Nothing new here…
http://hortonworks.com/blog/hive-cheat-sheet-for-sql-users/
https://cwiki.apache.org/confluence/display/Hive/Tutorial

And yet it’s more!
CREATE EXTERNAL TABLE IF NOT EXISTS toDo
  (fun STRING,
   rank INT COMMENT 'rank the greatness',
   type STRING)
COMMENT 'two tables walk into a bar....'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/demo/';
This syntax reinforces that Hive is schema on read and that the data is kept separate from the schema.
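To make "schema on read" concrete, here is a minimal Python sketch rather than anything Hive actually runs. The sample text, the `SCHEMA` list, and the `read_with_schema` helper are all illustrative assumptions; only the column names and types come from the statement above. The point is that the file stays plain delimited text, and names and types are applied only when it is read.

```python
import csv
import io

# Hypothetical raw file contents, standing in for a text file under /data/demo/.
RAW_TEXT = "nap,10,home\nexpense reports,-3,work\n"

# The schema lives apart from the data, as in the CREATE EXTERNAL TABLE statement.
SCHEMA = [("fun", str), ("rank", int), ("type", str)]


def read_with_schema(raw_text, schema):
    """Apply column names and types to delimited text at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text), delimiter=","):
        row = {}
        for (name, cast), value in zip(schema, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                # Mirrors Hive's usual behavior for delimited text: a value that
                # does not fit the declared type reads back as NULL.
                row[name] = None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_TEXT, SCHEMA)
```

Because the data is never rewritten, a different schema (for example, every column as a string) could be applied to the same text without reloading anything.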

A mix of old and new
-- read some data
SELECT 'you cannot make me ', score, fun, type FROM toDo WHERE score <= 0 ORDER BY score;
SELECT 'when can we ', score, fun, type FROM toDo WHERE score > 0 DISTRIBUTE BY score SORT BY score;
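The difference between the two queries is easy to miss. In Hive, ORDER BY produces one totally ordered result, while DISTRIBUTE BY routes rows to reducers by key and SORT BY orders rows only within each reducer. The Python sketch below simulates that behavior; the sample rows, the function names, and the modulo partitioning rule are assumptions for illustration, not Hive's actual hashing.

```python
from collections import defaultdict

# Hypothetical (score, fun) rows standing in for the toDo table.
ROWS = [(5, "hike"), (1, "laundry"), (3, "read"), (2, "email"), (4, "cook")]


def order_by(rows):
    """ORDER BY: one global sort, so the whole result is totally ordered."""
    return sorted(rows, key=lambda r: r[0])


def distribute_sort_by(rows, num_reducers):
    """DISTRIBUTE BY score SORT BY score: send each row to a reducer chosen
    by its key, then sort within each reducer only."""
    buckets = defaultdict(list)
    for row in rows:
        # Simple modulo stands in for Hive's hash partitioning (an assumption).
        buckets[row[0] % num_reducers].append(row)
    return {r: sorted(b, key=lambda x: x[0]) for r, b in sorted(buckets.items())}


total = order_by(ROWS)
per_reducer = distribute_sort_by(ROWS, 2)
```

Each reducer's output is sorted, but concatenating the reducers' outputs does not give a global order. That is the trade-off for avoiding a single-reducer bottleneck.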

Visual job graph

Show it off

ODBC Oh My: Tableau
Any ODBC-compliant BI tool can use the Hive ODBC connector, which makes Hadoop/Hive “just another data source”. You also have new expectations around latency (fast query vs. fast insight), along with easier experimentation and easier “fail fast” iteration. It is “and” (polyglot) rather than “or”, and it enables the business user.
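"Just another data source" means client code does not need to know what sits behind the connection. The sketch below is one way to picture that in Python, not how any BI tool is implemented: `scores_by_type` is a hypothetical helper written against the standard DB-API cursor interface, and an in-memory SQLite database with made-up rows stands in for Hive so the example runs anywhere. With a Hive ODBC driver installed, the connection would presumably come from something like `pyodbc.connect("DSN=<your DSN>")` instead; the DSN is an assumption.

```python
import sqlite3


def scores_by_type(conn):
    """Run a plain SQL aggregate through any DB-API 2.0 connection."""
    cur = conn.cursor()
    cur.execute(
        "SELECT type, AVG(score) AS avgScore FROM toDo GROUP BY type ORDER BY type"
    )
    return cur.fetchall()


# Stand-in data source (assumption): SQLite in memory instead of Hive over ODBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toDo (fun TEXT, score INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO toDo VALUES (?, ?, ?)",
    [("nap", 10, "home"), ("hike", 8, "home"), ("expense reports", -3, "work")],
)
result = scores_by_type(conn)
```

The function never mentions the backend; swapping the connection is the only change needed, although SQL dialect differences can still surface in less trivial queries.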

ODBC Oh My

Yes, Hive is just another data source

That’s Hive, folks!
Hive on Hadoop on HDInsight on Azure
Big Data in the cloud!

And the New OSS Kid on the Block: Spark!
%sql
SELECT avg(score) AS avgScore, funtype FROM toDo WHERE score > -5000 GROUP BY funtype ORDER BY funtype

Lighting a Spark

Spark SQL: Interactive

Lots of Analytics Options!

It adds up to more options
- Your choice in analytics
- Real-time, more history, fast ingestion
- ODBC makes Hive and Spark “just another data source”
- Experimentation via “fail fast” iteration
- Enables the business user
- … And new expectations around latency

Back to You
- Have you used Big Data? Azure?
- What questions do you have?
- What do you want to know by the end of this talk?
- What makes your projects go right or wrong?
- Will you use Big Data?

Big Data Uses

A leading game development studio that creates, develops, produces, and publishes a number of popular video games needed to analyze large amounts of unstructured in-game data. They chose Azure HDInsight, Data Factory, on-premises SQL Server, Power View, and Power Query for in-game analytics: understanding what gamers do during gameplay and what campaigns they can run to influence in-game purchases. Finally, Twitter sentiment is collected to correlate with sales.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Game Development Company, Part 1: What They Did | In-game Analytics
Gaming: A predominantly mobile-based game development company. Although they are a mid-sized organization, they have partnered with media giants on various gaming projects.
Challenge: As a game development studio, they wanted to do in-game analytics to better understand their players and what they do in the games.
Solution: Azure HDInsight (MapReduce and Storm), Service Bus, and SQL Server for reporting. Collects telemetry and logging data to gain in-game analytics:
- How many players are using the game
- How many players invited their friends
- How far players got into the tutorial
- How many attempts they made on one level/stage
Media tonic

Game Development Company, Part 2: How They Did It | In-game Analytics
Collect data from games in Azure Blobs
- Game sends telemetry/logging data as JSON files
- Contains every action of the user in the game
- Data is pushed to Azure Service Bus in real time
- Tens of gigabytes of data captured daily
HDInsight picks up the real-time data and processes it
- From Service Bus, HDInsight processes using Apache Storm and MapReduce
- Constantly running experiments to determine insight
- A/B testing
- In-game metrics and analytics
- Spin up a 32-node cluster nightly for four hours
Output sent to SQL Server for BI
- Transfer data to SQL Server for BI
Diagram labels: Real-time Event, Service Bus, Azure HDInsight, Azure Blobs, BI for insights, SQL Server On-premises

A game development studio wanted to do in-game analytics to better understand their players and what they do in their games. They chose Azure HDInsight, including Storm in HDInsight, so they can do near-real-time in-game analytics on their users. Now they can understand how many players are playing, how many are referring the game, how difficult a game level is, and so on.
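The metrics named in this case study (players, invitations, tutorial progress, attempts per level) amount to a single pass over JSON telemetry events. The Python sketch below shows that aggregation in miniature. It is not the studio's pipeline, which ran on Storm and MapReduce, and the event field names (`player`, `action`, `level`, `step`) are invented for illustration.

```python
import json
from collections import Counter, defaultdict

# Hypothetical telemetry lines; the field names are illustrative assumptions.
EVENTS = [
    '{"player": "p1", "action": "level_attempt", "level": 1}',
    '{"player": "p1", "action": "level_attempt", "level": 1}',
    '{"player": "p1", "action": "invite_friend"}',
    '{"player": "p2", "action": "level_attempt", "level": 1}',
    '{"player": "p2", "action": "tutorial_step", "step": 3}',
]


def summarize(lines):
    """Aggregate raw JSON events into the in-game metrics from the case study."""
    players = set()
    inviters = set()
    attempts = Counter()
    tutorial_progress = defaultdict(int)
    for line in lines:
        event = json.loads(line)
        player = event["player"]
        players.add(player)
        action = event["action"]
        if action == "invite_friend":
            inviters.add(player)
        elif action == "level_attempt":
            attempts[(player, event["level"])] += 1
        elif action == "tutorial_step":
            # Keep the furthest tutorial step each player reached.
            tutorial_progress[player] = max(tutorial_progress[player], event["step"])
    return {
        "players": len(players),
        "players_who_invited": len(inviters),
        "attempts_per_level": dict(attempts),
        "tutorial_progress": dict(tutorial_progress),
    }


summary = summarize(EVENTS)
```

At tens of gigabytes a day the same logic would be distributed across a cluster, but the per-event work stays this simple.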

Typical Big Data Use Cases
- IT infrastructure optimization
- Legal discovery
- Social network analysis
- Traffic flow optimization
- Telemetry
- Churn analysis
- Natural resource exploration
- Weather forecasting
- Healthcare outcomes
- Fraud detection
- Life sciences research
- Advertising analysis
- Equipment monitoring
- Smart meter monitoring
Businesses using Big Data are “making it big”. They are taking advantage of all this ambient data and moving ahead, gaining a foothold in new markets and gaining market share in existing ones. Think about how Netflix makes movie recommendations or how Google can predict a flu outbreak before the CDC does. HDInsight is very focused on the volume and variety problems. Rx/StreamInsight and our BI stack are added in to help address velocity.
- Store now, question later
- Iterate through questions

It depends. It’s mostly true.

Hadoop Shines When…
- Data exploration, analytics and reporting, new data-driven actionable insights
- Rapid iterating
- Unknown unknowns
- Flexible scaling
- Data-driven actions for early competitive advantage or first to market
- Low number of direct, concurrent users
- Low-cost data archival
http://blogs.msdn.com/b/cindygross/archive/2015/02/25/master-choosing-the-right-project-for-hadoop.aspx

Hadoop Anti-Patterns…
- Replace a system whose pain points don’t align with Hadoop’s strengths
- OLTP needs adequately met by an existing system
- Known data with a static schema
- Many end users
- Interactive response time requirements (becoming less true)
- Your first Hadoop project + mission-critical system
http://blogs.msdn.com/b/cindygross/archive/2015/02/25/master-choosing-the-right-project-for-hadoop.aspx

You tell me…
- What is Big Data?
- What is Hadoop?
- What specific scenario would you use it for?

Hortonworks HDP

Cortana Analytics Suite (Build 2015)
Diagram, flowing from DATA to INTELLIGENCE to ACTION:
- Data sources: business apps, custom apps, sensors and devices
- Information Management: Azure Data Factory, Data Catalog, Event Hub
- Big Data Stores: Azure Data Lake Store, Azure SQL Data Warehouse
- Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop), Azure Data Lake Analytics, Azure Stream Analytics
- Dashboards and Visualizations: Power BI
- Personal Digital Assistant: Cortana
- Perceptual Intelligence: face, vision; speech, text
- Business Scenarios: recommendations, customer churn, forecasting, etc.
- Action: people, automated systems

Azure Data Lake - The Pieces

Azure Data Lake: store & managed clusters (Machine Learning & Data Science Conference)
Diagram, two parallel stacks:
- On-Premises: a Hadoop cluster running MapReduce, Hive, Pig, HBase, and Storm on YARN-based compute, over the Hadoop File System (HDFS/WebHDFS API)
- Azure Cloud: Azure Data Lake managed clusters running MapReduce, Hive, Pig, HBase, and Storm on YARN-based compute, over the Azure Data Lake store (WebHDFS API, Hadoop File System)

Analytics: Two Form Factors
Diagram labels:
- HDInsight, “managed Hadoop clusters”: Hive/Pig/etc. job; HDInsight cluster with nodes n1, n2, n3, n4
- ADLA, “analytics service”: U-SQL/Hive/Pig job; ADLA account; lots of containers; YARN layer
- Shared: input file, output file, storage (Blob or ADLS)

Playing in the Lake

Azure Data Lake - The Action

Azure has so much more
- Go straight to the business code
- Scale storage and compute separately
- Open Source
- Linux
- Managed and unmanaged services
- Hybrid
- On-demand and 24x7 options
- SQL Server

It’s a Polyglot
- Stream your data into a lake
- Pick the best compute for each task

And it’s Fun!

And back to you

What is Big Data?
It Is:
- Scale out
- Enables elasticity
- Encourages exploration
- Faster data ingestion
- Lower TCO
- Empowers self-service BI and analytics
- Rapid time to insight
It Is NOT:
- A well-defined thing
- About volume, size
- A replacement for everything
- The answer to every problem

What is Hadoop? Conceptual View
It Is:
- A type of Big Data
- Just another data source
- A loose collection of open source code
- Distributed by many
- Handles loosely structured data
- Write once, read many
It Is Not:
- Actually a thing!
- The only way to do Big Data