An Introduction To Big Data For The SQL Server DBA.

1 An Introduction To Big Data For The SQL Server DBA

2 A little about me… dbenoit@sqlsentry.com, david.k.benoit@gmail.com

3 Goals
• Challenge our status quo
• Define what “Big Data” is
• Cover some key technologies
• Microsoft and Azure
• Potential use cases

4 A challenge! Think

5 Why Not SQL Server?
• How is it being used?
  - Type of data: immutable or mutable
  - Type of activity
• The need to process differently
• Getting it right – responsible data placement

6

7 Why Big Data? – Internet of Things (IoT)

8 What Is Big Data? – Data Challenges
• “data sets that are too large and complex to manipulate or interrogate with standard methods or tools” – Oxford Dictionary
• Factors driving the need for big data technology
  - Competitive advantage
  - Decision making
  - Value of data, or devaluation of data
  - Cost of scaling previously utilized solutions

9 What Is Big Data? – Data Challenges
• Challenges faced
  - Faster access is needed to data sets that keep growing in size and arrive ever more frequently
  - ETL processing can't keep up
  - Real-time analysis is critical (predictive analytics)
  - Entirely new programming languages are needed to get at this data (sometimes several)
• Challenges answered
  - ETL can be removed (schema on read – note that this introduces other challenges)
  - Large massively parallel processing (MPP) systems that scale out on commodity hardware
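To make "scale out" concrete, here is a minimal sketch of the idea behind an MPP system, written in Python purely for illustration. Rows are hash-distributed across a fixed set of nodes, each node computes a partial aggregate over only its own rows, and a coordinator combines the partials. The node count, the `sales` rows and all function names are hypothetical, and threads stand in for what would be separate machines in a real MPP engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NODE_COUNT = 4  # hypothetical number of compute nodes


def node_for(key: str, node_count: int = NODE_COUNT) -> int:
    # Deterministic hash: the same key always lands on the same node
    return crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count=NODE_COUNT):
    """Hash-distribute (key, amount) rows across simulated nodes."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def partial_sum(node_rows):
    """Runs independently per node: sum amounts per key over local rows only."""
    totals = defaultdict(int)
    for key, amount in node_rows:
        totals[key] += amount
    return totals


def combine(partials):
    """Coordinator step: merge the per-node partial results."""
    result = defaultdict(int)
    for part in partials:
        for key, total in part.items():
            result[key] += total
    return dict(result)


sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
nodes = distribute(sales)

# Threads stand in for nodes working in parallel
with ThreadPoolExecutor(max_workers=NODE_COUNT) as pool:
    partials = list(pool.map(partial_sum, nodes))

print(sorted(combine(partials).items()))  # [('east', 3), ('north', 17), ('south', 6)]
```

Adding capacity in this model means raising the node count and redistributing. No single machine has to hold or scan all of the data.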

10 What is Big Data?
Aspect | Standard Data (OLTP / OLAP) | Big Data
Data | Structured / processed | Structured / semi-structured / unstructured
Processing / Querying | Schema on write | Schema on read
Agility | Less agile – more up-front development | More agile – allows for dynamic changes
Users | Business professionals / applications | Data scientists / BI professionals
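The schema-on-write versus schema-on-read row of the table can be illustrated with a small Python sketch. It rests on a few assumptions: the records are JSON lines, and the schemas (`SCHEMA`, `reader_schema`) and field names are hypothetical. With schema on write, records are validated and shaped at load time, and anything outside the schema is lost. With schema on read, the raw text is kept as-is and a schema is applied only when a query runs, so a new question can be asked of old data without reloading it.

```python
import json

# Hypothetical load-time schema: field name -> type converter
SCHEMA = {"device_id": str, "temperature": float}


def load_schema_on_write(raw_lines):
    """Schema on write: parse and validate every record before it is stored."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Raises KeyError (missing field) or ValueError (bad type) at load time
        table.append({col: cast(record[col]) for col, cast in SCHEMA.items()})
    return table


def query_schema_on_read(raw_lines, schema):
    """Schema on read: raw text is stored untouched; the schema is applied per query."""
    for line in raw_lines:
        record = json.loads(line)
        # Missing fields become None instead of failing the whole load
        yield {col: (cast(record[col]) if col in record else None)
               for col, cast in schema.items()}


raw = [
    '{"device_id": "a1", "temperature": "21.5"}',
    '{"device_id": "b2", "temperature": 19, "humidity": 40}',
]

stored = load_schema_on_write(raw)  # "humidity" is discarded at load time

# A new question later: read humidity straight from the raw data, no reload/ETL
reader_schema = {"device_id": str, "humidity": int}
print(list(query_schema_on_read(raw, reader_schema)))
# [{'device_id': 'a1', 'humidity': None}, {'device_id': 'b2', 'humidity': 40}]
```

The trade-off the slide notes is visible here. Schema on read is more agile, but data-quality problems surface at query time instead of load time.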

11 Big Data vs OLTP
• Data is distributed across many nodes
• Eventually consistent
• Concurrency is not managed the same way
• Can potentially be solved with caching (Splice Machine is one example)
• Better to combine a “real” OLTP solution with your big data solution
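"Eventually consistent" is easier to see in a toy model. The Python sketch below uses a hypothetical key-value `Replica` class with last-writer-wins conflict resolution. Each replica accepts writes locally, so a read from another replica can return stale data until the replicas exchange state, after which they converge. Last-writer-wins is only one of several resolution strategies, and this is not how any particular product implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical key-value replica using last-writer-wins (LWW) resolution."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self._apply(key, ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key, ts, value):
        current = self.data.get(key)
        # Keep the write with the later timestamp; ties keep the existing value,
        # so a real system would also need a tie-breaker (e.g. replica id)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def sync_from(self, other):
        """Anti-entropy step: pull the other replica's entries and merge them."""
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)


a, b = Replica("a"), Replica("b")
a.write("balance", 100, ts=1)
print(b.read("balance"))        # None: b has not seen the write yet (stale read)
b.write("balance", 80, ts=2)    # a later write lands on a different replica
a.sync_from(b)
b.sync_from(a)                  # replicas exchange state
print(a.read("balance"), b.read("balance"))  # 80 80: both have converged
```

The stale read in the middle is the behavior that makes a "real" OLTP system the better home for data that must be consistent at every moment.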

12 Basics of Big Data Technologies
• Data storage – Hadoop / HDFS, Azure Data Lake, Azure Blob, APS / SQL Data Warehouse
• Machine learning – “R”, Azure ML
• Search-based apps – Solr, Azure Search
• Real-time analytics tools – Storm, Kafka, MS Event Hub, Stream Analytics
• Visualization – Datazen, Power BI, Tableau…

13 Big Data Technologies – Hadoop
• Hortonworks
  - Standard distribution on-premises and in the cloud
  - HDInsight in Azure
• Cloudera
• MapR
• Others

14 Big Data Technologies – Hadoop (Hortonworks distribution)

15 MapReduce in Hadoop (from David DeWitt’s presentation at PASS 2012)
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output from all sub-problems
[Diagram: each sub-problem is processed by DoWork() in the MAP phase; the outputs are combined in the REDUCE phase]
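The three steps above map onto the classic word-count example. The sketch below runs in a single Python process, so it only illustrates the data flow. In Hadoop the map and reduce tasks run on many nodes, and the framework handles the grouping between them. That grouping is shown here as an explicit `shuffle` step, which the slide does not name. The chunk text and function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str):
    """Map: the same function runs on every sub-problem, emitting (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single result."""
    return key, sum(values)


# Step 1: divide the problem (here, one chunk of text per sub-problem)
chunks = ["big data big ideas", "SQL Server data", "data everywhere"]
# Step 2: apply the same map function to every chunk
mapped = chain.from_iterable(map(map_phase, chunks))
# Step 3: group by key, then reduce each key's values into one output
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["data"], counts["big"])  # 3 2
```

Because `map_phase` looks at only one chunk and `reduce_phase` looks at only one key, each can be run on whichever node holds that slice of the data.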

16 Big Data Support In Azure

17 Big Data – Azure Data Lake

18 Store
• WebHDFS-compatible solution (Cosmos)
• Massive scale-out
• Built for analytics
• Currently accessible through U-SQL (think C# and T-SQL combined)
• Consider how else this might be available

19 Big Data – Azure Data Lake Analytics
• Queries as a service using U-SQL
• Massively scaled-out computation
• Abstracted storage
• Optimized for you

20 Big Data – Azure HDInsight
• HDInsight = Hortonworks Hadoop in the cloud
• Abstracted storage – allows compute to be more dynamic

21 Big Data – Azure SQL Data Warehouse
• Cloud based
• Scale out – elastic
• Massive data volumes (ingest > 10 TB / hour)
• Relational and non-relational
• Leverage T-SQL

22 Azure SQL Data Warehouse – Architecture

23 Event Hubs & Stream Analytics
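Stream Analytics queries typically aggregate events over time windows. The Stream Analytics query language has a `TumblingWindow` function for fixed-size, non-overlapping windows. The Python sketch below shows only the windowing idea, not the Stream Analytics API. Each event is assigned to the window that contains its timestamp, events are counted per key and window, and a hypothetical rule flags bursts, in the spirit of the fraud-detection use case. The event shape `(seconds, card_id)`, the 60-second window and the threshold are all assumptions. A real streaming engine emits each window's result as the window closes instead of processing a finished list.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs, e.g. (t, card_id).
    """
    counts = Counter()
    for ts, key in events:
        # Every timestamp belongs to exactly one window
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return counts


def flag_bursts(counts, threshold):
    """Hypothetical rule: flag a key with more than `threshold` events in one window."""
    return sorted(k for k, n in counts.items() if n > threshold)


events = [(0, "card-1"), (12, "card-1"), (25, "card-1"), (31, "card-2"),
          (65, "card-1"), (70, "card-2")]
counts = tumbling_window_counts(events, window_seconds=60)
print(flag_bursts(counts, threshold=2))  # [(0, 'card-1')]
```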

24 Big Data Support – SQL Server 2016 – PolyBase!

25 PolyBase Parallel Data Transfers (from David DeWitt’s presentation at PASS 2012)
[Diagram: a SQL Server PolyBase cluster transferring data in parallel with the data nodes (DN) of a Hadoop cluster]

26 Big Data Support – SQL Server 2016 – PolyBase
• Seamless integration with key big data solutions
• Query HDFS with T-SQL through PolyBase
• Import and export data between HDFS / Azure Blob storage and SQL Server
• Seamless BI integration
• No need to learn MapReduce, etc.

27 Big Data Support – SQL Server 2016 – PolyBase
• Computational scale-out leverages the parallel query execution framework developed for PDW and Azure SQL DW
• Brings schema on read to SQL Server
• Easily allows querying of new data – no ETL
• Statistics for Hadoop data

28 Big Data – Potential Use Cases
• Data warehouse source
• Real-time analytics
  - Banking fraud detection
  - Insurance
  - Medical device metrics
  - Grocery
• Long-term storage – cheaper
• Search

29 Takeaways
• Reconsider the data in our environment
• Determine whether SQL Server is the right fit for all data
• Install a sandbox / use Azure resources

30 Questions?

31 Why Big Data? Because we LOVE data!
