An Introduction To Big Data For The SQL Server DBA.

Presentation transcript:

An Introduction To Big Data For The SQL Server DBA

A little about me…

Goals
• Challenge our status quo
• Define what “Big Data” is
• Cover some key technologies
• Microsoft and Azure
• Potential use cases

A challenge! Think

Why Not SQL Server?
• How is it being used?
  • Type of data
    • Immutable
    • Mutable
  • Type of activity
• The need to process differently
• Getting it right – responsible data placement

Why Big Data? – Internet of Things (IoT)

What Is Big Data? - Data Challenges
• “data sets that are too large and complex to manipulate or interrogate with standard methods or tools” – Oxford Dictionary
• Factors driving the need for big data technology
  • Competitive advantage
  • Decision making
  • Value of data or devaluation of data
  • Cost of scaling previously utilized solutions

What Is Big Data? - Data Challenges
• Challenges faced
  • Faster access to data sets that are ever increasing in size and frequency of receipt
  • ETL processing can’t be sustained
  • Real-time analysis is critical (predictive analytics)
  • All-new programming languages are needed to get at this data (sometimes many)
• Challenges answered
  • ETL can be removed (schema on read – note that this introduces other challenges)
  • Large Massively Parallel Processing (MPP) systems that can scale out on commodity hardware

What is Big Data?
                        Standard Data (OLTP / OLAP)                Big Data
Data                    Structured / Processed                     Structured / Semi-Structured / Unstructured
Processing / Querying   Schema on write                            Schema on read
Agility                 Less agile – more up-front development     More agile – allows for dynamic changes
Users                   Business professionals / applications      Data scientists / BI professionals
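
The schema-on-write versus schema-on-read distinction can be illustrated with a small sketch. This is not taken from the presentation: the device events, the field names, and the `read_with_schema` helper are all hypothetical, and a real big data platform would apply the schema through its own query engine rather than through hand-written Python.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated on write.
# (Hypothetical sample data, invented for this illustration.)
raw_store = [
    '{"device": "pump-1", "temp_c": "71.5", "ts": "2016-03-01T10:00:00"}',
    '{"device": "pump-2", "temp_c": "68.0"}',
    '{"device": "pump-1", "temp_c": "73.2", "firmware": "2.1"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time: field name -> (cast function, default)."""
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row


# First question: which reading was the hottest?
temperature_schema = {"device": (str, None), "temp_c": (float, None)}
rows = list(read_with_schema(raw_store, temperature_schema))
hottest = max(rows, key=lambda r: r["temp_c"])

# A later question needs a field nobody planned for. Only the schema changes;
# the stored data is not reloaded or transformed.
firmware_schema = {"device": (str, None), "firmware": (str, "unknown")}
firmware_rows = list(read_with_schema(raw_store, firmware_schema))
```

The second query is where the "more agile" column shows up: a new field becomes queryable without an ETL change. The cost is that type and quality problems surface at read time instead of at load time.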

Big Data vs OLTP
• Data is distributed across many nodes
• Eventually consistent
• Concurrency is not managed the same way
• Can potentially be solved with caching (Splice Machine is one example)
• Better to combine a “real” OLTP solution with your big data solution
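
To make "eventually consistent" concrete, here is a toy simulation. It is a sketch under simplifying assumptions and not how any particular product works: the `Replica` and `Cluster` classes are hypothetical, writes land on one node first, and the other nodes catch up only when `replicate()` runs.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Last write wins: only accept a newer version than the one held.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    """Writes go to the first replica; the others are updated later."""

    def __init__(self, names):
        self.replicas = [Replica(name) for name in names]
        self.pending = []  # replication messages not yet delivered
        self.version = 0

    def write(self, key, value):
        self.version += 1
        primary, *others = self.replicas
        primary.write(key, value, self.version)
        for replica in others:
            self.pending.append((replica, key, value, self.version))

    def replicate(self):
        # Deliver every queued update; after this all replicas agree.
        for replica, key, value, version in self.pending:
            replica.write(key, value, version)
        self.pending.clear()


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.write("balance", 100)

stale_read = cluster.replicas[2].read("balance")  # node-c has not caught up
cluster.replicate()
fresh_read = cluster.replicas[2].read("balance")  # node-c now agrees
```

The window between the write and `replicate()` is the part an OLTP workload cannot tolerate, since a reader on another node sees old or missing data. That is the reason for pairing a "real" OLTP system with the big data platform.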

Basics of Big Data Technologies
• Data storage – Hadoop / HDFS, Azure Data Lake, Azure Blob, APS / SQL Data Warehouse
• Machine learning – “R”, Azure ML
• Search-based apps – Solr, Azure Search
• Real-time analytics tools – Storm, Kafka, Azure Event Hubs, Stream Analytics
• Visualization – Datazen, Power BI, Tableau…

Big Data Technologies - Hadoop
• Hortonworks
  • Standard distribution on-prem and cloud
  • HDInsight in Azure
• Cloudera
• MapR
• Others

Big Data Technologies – Hadoop (Hortonworks distribution)

MapReduce in Hadoop (from David DeWitt’s presentation at PASS 2012)
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output from all sub-problems
[Diagram labels: MAP: DoWork() … … …; REDUCE: Output]
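
The three steps can be sketched in a few lines of Python. This is a single-machine illustration only, not Hadoop code: the word-count task, the sample lines, and the function names (`split_into_chunks`, `do_work`, `combine`) are assumptions made for the example, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_count):
    """Step 1: divide the large problem into sub-problems."""
    return [lines[i::chunk_count] for i in range(chunk_count)]


def do_work(chunk):
    """Step 2 (map): the same function runs on every sub-problem."""
    counts = defaultdict(int)
    for line in chunk:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def combine(partials):
    """Step 3 (reduce): merge the output from all sub-problems."""
    totals = defaultdict(int)
    for partial in partials:
        for word, count in partial.items():
            totals[word] += count
    return dict(totals)


def word_count(lines, chunk_count=3):
    chunks = split_into_chunks(lines, chunk_count)
    # Each chunk is processed independently, as a worker node would do.
    with ThreadPoolExecutor(max_workers=chunk_count) as pool:
        partials = list(pool.map(do_work, chunks))
    return combine(partials)


log_lines = [
    "big data for the sql server dba",
    "sql server and big data",
    "data data data",
]
result = word_count(log_lines)
```

Because `do_work` never looks outside its own chunk, the result is the same whether the input is split into one chunk or many. That independence is what lets the work scale out across commodity hardware.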

Big Data Support In Azure

Big Data – Azure Data Lake

• Store – WebHDFS solution – Cosmos
• Massive scale out
• Built for analytics
• Currently accessible through U-SQL (think C# and T-SQL combined)
• Consider how else this might be available

Big Data – Azure Data Lake
• Analytics
• Queries as a service using U-SQL
• Massively scaled-out computation
• Abstracted storage
• Optimized for you

Big Data – Azure HDInsight
• HDInsight = Hortonworks Hadoop in the cloud
• Abstracted storage – allows compute to be more dynamic

Big Data - Azure SQL Data Warehouse
• Cloud based
• Scale out – elastic
• Massive data volumes (ingest > 10 TB / hour)
• Relational and non-relational
• Leverage T-SQL

Azure SQL Data Warehouse – Architecture

Event Hubs & Stream Analytics

Big Data Support – SQL Server 2016 PolyBase!

PolyBase Parallel Data Transfers (David DeWitt’s presentation at PASS 2012)
[Diagram labels: SQL Server … / PolyBase Cluster / DN / Hadoop Cluster]

Big Data Support – SQL Server PolyBase
• Seamless integration with key big data solutions
• T-SQL through PolyBase queries HDFS
• Import and export data between SQL Server and HDFS / Azure Blob storage
• Seamless BI integration
• No need to learn MapReduce, etc.

Big Data Support – SQL Server PolyBase
• Computational scale-out leverages the parallel query execution framework developed for PDW and Azure SQL DW
• Definition to Schema on Read in SQL Server
• Easily allow querying of new data – no ETL
• Statistics for Hadoop data

Big Data - Potential Use Cases
• Data warehouse source
• Real-time analytics
  • Banking fraud detection
  • Insurance
  • Medical device metrics
  • Grocery
• Long-term storage – cheaper
• Search

Takeaways
• Reconsider data in our environment
• Determine if SQL Server is proper for all data
• Install a sandbox / Azure resources

Questions?

Why Big Data? Because we LOVE data!