1
Hadoop for SQL Server Pros
You might want to find a different session if …
- You’ve read a lot about Hadoop
- You’ve seen Hadoop in action
In other words, if you’re not a Hadoop beginner, you’re going to be bored! There are a LOT of great sessions; you won’t hurt my feelings if this one isn’t for you.
2
Hadoop for SQL Server Pros
3
Who am I? I lead data integration teams for HCA. @jboulineau
12/2/2018 | Continuous Integration with SSDT
4
Objective: Learn a bit about Hadoop by comparing it to something familiar … SQL Server!
Agenda:
- A look at Hadoop basics
- Demo
- Hive
5
Analogies are like …
- Her eyes were like two brown circles with big black dots in the center.
- She had a deep, throaty, genuine laugh, like that sound a dog makes just before it throws up.
- Her vocabulary was as bad as, like, whatever.
6
What is Hadoop?
7
What is Hadoop?
It is *not* a single, monolithic application. It is an open-source project made up of four modules:
- Hadoop Common: the common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): a distributed file system that provides high[ish]-throughput access to application data.
- Hadoop YARN: a framework for job scheduling and cluster resource management.
- Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
8
What is Hadoop?
9
Storing Data Source:
10
Storing Data
| | HDFS | SQL Server Storage Engine |
|---|---|---|
| Storage unit | 128 MB block | 8 KB page |
| Redundancy | 3 copies | None |
| Access type | WORM (write once, read many) | WMRM (write many, read many) |
| Implementation | Java* | C++ |
| Allocation metadata | NameNode filesystem image / edit log | Allocation pages |
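To make the storage-unit and redundancy rows concrete, here is a minimal sketch of the arithmetic for one file. It assumes the defaults from the table (128 MB blocks, 3 copies); on a real cluster these are configurable through the `dfs.blocksize` and `dfs.replication` properties. The function names are hypothetical, and the SQL Server page count is a rough lower bound that ignores page headers, row overhead, and fill factor.

```python
import math

# Defaults taken from the comparison table; clusters can override them
# (dfs.blocksize and dfs.replication in hdfs-site.xml).
HDFS_BLOCK_BYTES = 128 * 1024 * 1024   # 128 MB block
HDFS_REPLICATION = 3                   # 3 copies of every block
SQL_PAGE_BYTES = 8 * 1024              # 8 KB page


def hdfs_footprint(file_bytes: int) -> dict:
    """Blocks, block replicas and raw disk bytes HDFS uses for one file (hypothetical helper)."""
    blocks = math.ceil(file_bytes / HDFS_BLOCK_BYTES)
    return {
        "blocks": blocks,
        "replicas": blocks * HDFS_REPLICATION,
        # HDFS does not pad the last block to 128 MB, so raw usage is
        # file size times replication, not blocks * block size.
        "raw_bytes": file_bytes * HDFS_REPLICATION,
    }


def sql_pages(data_bytes: int) -> int:
    """Lower bound on 8 KB pages for the same bytes (ignores headers and fill factor)."""
    return math.ceil(data_bytes / SQL_PAGE_BYTES)


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(hdfs_footprint(one_gb))  # {'blocks': 8, 'replicas': 24, 'raw_bytes': 3221225472}
    print(sql_pages(one_gb))       # 131072
```

The contrast is the point: HDFS tracks a handful of very large units and pays for durability with 3x raw storage, while SQL Server tracks over a hundred thousand small pages and leaves redundancy to other features.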
11
Loading Data HDFS SQL Server Command line bcp API ADO.net sqoop
Bcp / SSIS Hive Query optimizer 3rd party tools
12
Retrieving Data
- MapReduce
- Spark
- Hive
- Drill
- Impala
- Etc.
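For a SQL Server pro, MapReduce is easiest to read as a hand-built `SELECT word, COUNT(*) ... GROUP BY word`. The mapper emits a grouping key per record, the framework's shuffle groups the values by key, and the reducer aggregates each group. The sketch below is a single-process toy in Python with hypothetical function names. Hadoop's real API is Java (`Mapper` and `Reducer` classes), and a real job runs many mappers and reducers in parallel across the cluster. Hive, by contrast, compiles HiveQL into jobs like this for you.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record (a line).
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group every emitted value by key, as the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: collapse all values for one key; here, the COUNT(*).
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cat", "the hat"]))  # {'cat': 1, 'hat': 1, 'the': 2}
```

Because each mapper only sees its own records and each reducer only sees one key's values, both phases can be spread across the nodes that hold the HDFS blocks.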
13
Retrieving Data
14
Managing Resources
YARN
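In YARN, each node's NodeManager advertises memory and CPU (vcores), and the ResourceManager grants applications "containers" carved out of that capacity. The sketch below is a deliberately simplified, hypothetical first-fit allocator that illustrates the bookkeeping. It is not YARN's API or its real scheduling logic; production clusters use the Capacity Scheduler or Fair Scheduler with queues, and each application's ApplicationMaster negotiates its own requests.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    # One worker node and the resources it still has free (hypothetical model).
    name: str
    free_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


@dataclass
class ContainerRequest:
    # One container asked for by an application (hypothetical model).
    app: str
    mb: int
    vcores: int


def allocate(nodes: list[NodeManager],
             requests: list[ContainerRequest]) -> list[tuple[str, str]]:
    """Toy first-fit: grant each request on the first node with enough memory and vcores."""
    grants = []
    for req in requests:
        for node in nodes:
            if node.free_mb >= req.mb and node.free_vcores >= req.vcores:
                node.free_mb -= req.mb
                node.free_vcores -= req.vcores
                node.containers.append(req.app)
                grants.append((req.app, node.name))
                break
        # A request that fits on no node is simply left ungranted in this toy.
    return grants


if __name__ == "__main__":
    nodes = [NodeManager("node1", 4096, 4), NodeManager("node2", 2048, 2)]
    reqs = [
        ContainerRequest("etl", 3072, 2),
        ContainerRequest("etl", 2048, 2),
        ContainerRequest("hive", 2048, 1),
    ]
    print(allocate(nodes, reqs))  # [('etl', 'node1'), ('etl', 'node2')]
```

The third request stays pending because neither node has 2048 MB left, which is the situation where a real YARN scheduler would queue the request until a container is released.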
15
Manage Resources