Published by Melvyn McBride. Modified over 8 years ago.
1
An Introduction To Big Data For The SQL Server DBA
2
A little about me…
dbenoit@sqlsentry.com, david.k.benoit@gmail.com
3
Goals
Challenge our status quo; define what “Big Data” is; cover some key technologies (Microsoft and Azure); potential use cases.
4
A challenge! Think
5
Why Not SQL Server?
How is it being used? Type of data: immutable vs. mutable. Type of activity. The need to process differently. Getting it right: responsible data placement.
7
Why Big Data? – Internet of Things (IoT)
8
What Is Big Data? – Data Challenges
“Data sets that are too large and complex to manipulate or interrogate with standard methods or tools” (Oxford Dictionary).
Factors driving the need for big data technology: competitive advantage; decision making; the value of data or devaluation of data; the cost of scaling previously used solutions.
9
What Is Big Data? – Data Challenges
Challenges faced: faster access to data sets that keep growing in size and arrive ever more frequently; ETL processing can’t keep up; real-time analysis (predictive analytics) is critical; entirely new programming languages (sometimes several) are needed to get at this data.
Challenges answered: ETL can be removed through schema on read (which introduces other challenges of its own); large Massively Parallel Processing (MPP) systems can scale out on commodity hardware.
10
What is Big Data?
                        Standard Data (OLTP / OLAP)              Big Data
Data                    Structured / processed                   Structured / semi-structured / unstructured
Processing / Querying   Schema on write                          Schema on read
Agility                 Less agile: more up-front development    More agile: allows for dynamic changes
Users                   Business professionals / applications    Data scientists / BI professionals
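To make the schema-on-write vs. schema-on-read row concrete, here is a minimal Python sketch. The `Reading` record, its fields, and the JSON event shape are hypothetical, chosen only for illustration, and neither side is modeled on a specific product. Schema on write validates each record before it is stored, so unexpected data is rejected at load time; schema on read keeps the raw text and applies structure only when a query runs.

```python
import json
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical record shape used only for this illustration."""
    device_id: str
    temperature: float


def write_with_schema(table: list, raw: str) -> None:
    """Schema on write: parse and validate first; a missing field raises KeyError and nothing is stored."""
    doc = json.loads(raw)
    table.append(Reading(str(doc["device_id"]), float(doc["temperature"])))


def read_with_schema(raw_store: list) -> list:
    """Schema on read: raw text is stored as-is and the schema is applied at query time.

    Records that do not fit the schema are skipped by this query, not rejected at load.
    """
    rows = []
    for raw in raw_store:
        doc = json.loads(raw)
        if "device_id" in doc and "temperature" in doc:
            rows.append(Reading(str(doc["device_id"]), float(doc["temperature"])))
    return rows


if __name__ == "__main__":
    raw_events = [
        '{"device_id": "a1", "temperature": 21.5}',
        '{"device_id": "a2", "humidity": 40}',  # a new shape of data arrives
    ]

    table = []  # schema-on-write table
    for raw in raw_events:
        try:
            write_with_schema(table, raw)
        except KeyError as missing:
            print("rejected on write; missing field", missing)

    raw_store = list(raw_events)  # schema-on-read store keeps everything
    print(len(raw_store), "raw events kept;", read_with_schema(raw_store))
```

The trade-off mirrors the agility row: the raw store accepts new shapes of data without up-front work, but every query has to cope with records that do not fit.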
11
Big Data vs OLTP
Data is distributed across many nodes and is eventually consistent, and concurrency is not managed the same way as in an OLTP system. This can potentially be addressed with caching (Splice Machine is one example), but it is usually better to combine a “real” OLTP solution with your big data solution.
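A toy Python model of eventual consistency, assuming a single primary and a single replica with asynchronous replication. The class and method names are hypothetical, and real distributed stores use far more elaborate replication protocols. A read from the replica can return stale data until replication catches up, which is exactly the behavior an OLTP application usually cannot tolerate.

```python
from collections import deque
from typing import Optional


class EventuallyConsistentStore:
    """Hypothetical two-node store: writes land on the primary and reach the replica later."""

    def __init__(self) -> None:
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # replication log entries not yet applied to the replica

    def write(self, key: str, value: int) -> None:
        self.primary[key] = value
        self._pending.append((key, value))  # shipped to the replica asynchronously

    def read_from_replica(self, key: str) -> Optional[int]:
        return self.replica.get(key)  # may be stale, or None if never replicated

    def replicate(self) -> None:
        # Apply queued writes in order; once the log is empty the replica has converged.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("balance", 100)
    print(store.read_from_replica("balance"))  # None: replica has not caught up yet
    store.replicate()
    print(store.read_from_replica("balance"))  # 100: replica has converged
```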
12
Basics of Big Data Technologies
Data storage: Hadoop / HDFS, Azure Data Lake, Azure Blob storage, APS / SQL Data Warehouse.
Machine learning: R, Azure ML.
Search-based apps: Solr, Azure Search.
Real-time analytics tools: Storm, Kafka, Azure Event Hubs, Stream Analytics.
Visualization: Datazen, Power BI, Tableau, and others.
13
Big Data Technologies – Hadoop
Hortonworks: standard distribution on-premises and in the cloud; HDInsight in Azure. Cloudera. MapR. Others.
14
Big Data Technologies – Hadoop (Hortonworks distribution)
15
MapReduce in Hadoop (from David DeWitt’s presentation at PASS 2012)
1. Take a large problem and divide it into sub-problems.
2. Perform the same function on all sub-problems (MAP).
3. Combine the output from all sub-problems (REDUCE).
[Figure: DoWork() runs on each sub-problem in the map phase; the results are combined into a single Output in the reduce phase.]
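The three MapReduce steps can be sketched with the classic word-count example. This is a single-process Python illustration under the assumption that the input has already been split into chunks (step 1); Hadoop itself runs map and reduce tasks across many nodes through its own Java API, which is not shown here. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    # Step 2: the same function runs on every sub-problem, emitting (word, 1) pairs.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Group intermediate values by key (Hadoop does this between the map and reduce phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Step 3: combine all values emitted for one key.
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # Step 1 is assumed done: the input has already been divided into chunks.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big data big", "data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

Because each `map_phase` call touches only its own chunk, the map step can run on as many nodes as there are chunks; only the shuffle needs to move data between them.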
16
Big Data Support In Azure
17
Big Data – Azure Data Lake
18
Azure Data Lake Store: WebHDFS solution (Cosmos)
Massive scale-out. Built for analytics. Currently accessible through U-SQL (think C# and T-SQL combined). Consider how else this might be made available.
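The “WebHDFS solution” point means the store can be addressed through WebHDFS-style REST paths. This minimal Python sketch only builds such URLs. It assumes the `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>` form used by the original Data Lake Store service, and the account name `contosolake` and the file paths are hypothetical. No request is sent, and authentication (an Azure AD OAuth bearer token) is left out.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, **params) -> str:
    """Build a WebHDFS-style REST URL: https://<host>/webhdfs/v1/<path>?op=<OP>&<params>.

    Only the URL is built here; a real call also needs an Authorization header.
    """
    if not path.startswith("/"):
        path = "/" + path
    query = urlencode({"op": op.upper(), **params})
    return f"https://{host}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # "contosolake" is a hypothetical Data Lake Store account name.
    host = "contosolake.azuredatalakestore.net"
    print(webhdfs_url(host, "/clickstream/2016", "liststatus"))
    print(webhdfs_url(host, "clickstream/2016/day1.csv", "open", offset=0, length=1024))
```

Because the paths follow the WebHDFS convention, tools that already speak WebHDFS can, in principle, point at the store without a custom client.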
19
Big Data – Azure Data Lake Analytics
Queries as a service using U-SQL; massively scaled-out computation; abstracted storage; optimized for you.
20
Big Data – Azure HDInsight
HDInsight = Hortonworks Hadoop in the cloud. Abstracted storage allows compute to be more dynamic.
21
Big Data – Azure SQL Data Warehouse
Cloud based; elastic scale-out; massive data volumes (ingest > 10 TB/hour); relational and non-relational data; leverages T-SQL.
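Scale-out in Azure SQL Data Warehouse rests on splitting every table into 60 distributions that compute nodes process in parallel; for a hash-distributed table, each row’s distribution is chosen by hashing a distribution column. The Python sketch below illustrates that idea only. MD5 is a stand-in because the service’s internal hash function is not modeled here, and the `customer-N` keys are hypothetical.

```python
import hashlib
from collections import Counter

# Azure SQL Data Warehouse spreads every table across 60 distributions.
DISTRIBUTIONS = 60


def distribution_for(key: str) -> int:
    # MD5 is a stand-in: the service's internal hash function is not modeled here.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def rows_per_distribution(keys) -> Counter:
    """Count how many rows of a hash-distributed table land in each distribution."""
    return Counter(distribution_for(key) for key in keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(6000)]  # hypothetical distribution-column values
    counts = rows_per_distribution(keys)
    print("distributions used:", len(counts))
    print("smallest / largest:", min(counts.values()), max(counts.values()))
    # The same key always maps to the same distribution, so rows that join on it are co-located.
    print(distribution_for("customer-42") == distribution_for("customer-42"))
```

A column with many distinct, evenly used values spreads rows evenly; a column with only a few values would pile rows onto a few distributions and leave the rest of the compute idle.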
22
Azure SQL Data Warehouse – Architecture
23
Event Hubs & Stream Analytics
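Event Hubs ingests event streams, and Stream Analytics runs continuous queries over them. One of the most common such queries is a tumbling-window aggregate (Stream Analytics exposes this as `TumblingWindow`). The Python sketch below reproduces only the windowing logic over a finite, in-order list; a real stream processor works on unbounded, possibly out-of-order input and emits each window as it closes. The event shape and device IDs are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tumbling_window_counts(events, window_seconds: int) -> dict[int, Counter]:
    """Count events per key in fixed-size, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs. Each event falls in
    exactly one window, identified here by the window's start time.
    """
    windows: dict[int, Counter] = {}
    for ts, key in events:
        start = ts - ts % window_seconds
        windows.setdefault(start, Counter())[key] += 1
    return windows


if __name__ == "__main__":
    # Hypothetical device events: (seconds since start, device id).
    events = [(0, "dev-1"), (3, "dev-2"), (9, "dev-1"), (10, "dev-1"), (14, "dev-1")]
    for start, counts in sorted(tumbling_window_counts(events, 10).items()):
        print(f"[{start}, {start + 10}):", dict(counts))
```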
24
Big Data Support – SQL Server 2016 PolyBase!
25
PolyBase Parallel Data Transfers (from David DeWitt’s presentation at PASS 2012)
[Figure: a SQL Server PolyBase cluster transferring data in parallel with the data nodes (DN) of a Hadoop cluster.]
26
Big Data Support – SQL Server 2016 – PolyBase
Seamless integration with key big data solutions: query HDFS with T-SQL through PolyBase; import and export data between HDFS / Azure Blob storage and SQL Server; seamless BI integration; no need to learn MapReduce, etc.
27
Big Data Support – SQL Server 2016 – PolyBase
Computational scale-out leverages the parallel query execution framework developed for PDW and Azure SQL DW. Brings schema-on-read definitions to SQL Server, so new data can be queried easily with no ETL. Statistics for Hadoop data.
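Statistics on Hadoop data let the optimizer estimate how many rows a predicate on an external table will return, rather than falling back on default guesses. As a simplified illustration of what such statistics provide (not SQL Server’s actual statistics format, which uses its own step-based histogram), this Python sketch builds an equi-width histogram from a column sample and uses it to estimate the row count for `value <= x`. The function names and the sample column are hypothetical.

```python
def build_histogram(values, buckets: int):
    """Equi-width histogram over a column: returns (low bound, bucket width, row count per bucket)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1.0  # avoid zero width when all values are equal
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)  # the maximum value goes in the last bucket
        counts[idx] += 1
    return lo, width, counts


def estimate_rows_at_most(histogram, x: float) -> float:
    """Estimate how many rows satisfy `value <= x`, assuming values are spread evenly inside each bucket."""
    lo, width, counts = histogram
    if x < lo:
        return 0.0
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        if x >= bucket_lo + width:
            total += count  # the whole bucket qualifies
        else:
            total += count * (x - bucket_lo) / width  # interpolate inside the bucket
            break
    return total


if __name__ == "__main__":
    # Hypothetical sample of a numeric column read from an external (Hadoop) table.
    sample = list(range(100))
    hist = build_histogram(sample, buckets=10)
    print("rows per bucket:", hist[2])
    print("estimated rows with value <= 49.5:", round(estimate_rows_at_most(hist, 49.5)))
```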
28
Big Data – Potential Use Cases
Data warehouse source. Real-time analytics: banking fraud detection, insurance, medical device metrics, grocery. Long-term storage (cheaper). Search.
29
Takeaways
Reconsider the data in our environment; determine whether SQL Server is the right home for all data; install a sandbox / Azure resources.
30
Questions?
31
Why Big Data? Because we LOVE data!