Hadoop for SQL Server Pros
You might want to find a different session if you've read a lot about Hadoop or you've seen Hadoop in action. In other words, if you're not a Hadoop beginner, you're going to be bored! There are a LOT of great sessions; you won't hurt my feelings if this one isn't for you.
Who am I? I lead data integration teams for HCA.
@jboulineau | www.newsqlblog.com | jboulineau@gmail.com
12/2/2018 | Continuous Integration with SSDT
Objective: Learn a bit about Hadoop by comparing it to something familiar … SQL Server!
Agenda: A look at Hadoop basics; Demo; Hive
Analogies are like … Her eyes were like two brown circles with big black dots in the center. She had a deep, throaty, genuine laugh, like that sound a dog makes just before it throws up. Her vocabulary was as bad as, like, whatever.
What is Hadoop? https://www.linkedin.com/pulse/hadoop-ecosystem-2015-harald-van-der-weel
What is Hadoop? It is *not* a single, monolithic application. It is an open-source project made up of four modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): A distributed file system that provides high[ish]-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
(Source: https://hadoop.apache.org/)
Storing Data
Source: http://www.journaldev.com/8800/hadoop2-architecture-and-how-major-components-works
Storing Data
                     HDFS                                  SQL Server Storage Engine
Storage unit         128 MB block                          8 KB page
Redundancy           3 copies                              None
Access type          WORM (write once, read many)          WMRM (write many, read many)
Implementation       Java *                                C++
Allocation metadata  NameNode filesystem image / edit log  Allocation pages
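To make the difference in storage units concrete, here is a small back-of-the-envelope sketch. It assumes the defaults from the table (128 MB blocks, 3 replicas, 8 KB pages); the function names are hypothetical. It counts whole pages only and ignores page headers, row overhead, and fill factor, so the SQL Server figure is a lower bound, not a sizing estimate.

```python
import math

KB = 1024
MB = 1024 * KB


def hdfs_footprint(file_bytes, block_size=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one HDFS file.

    Assumes the table's defaults. The last block is only as large as the
    data it holds, so raw storage is file size * replication, not
    blocks * block_size * replication.
    """
    blocks = math.ceil(file_bytes / block_size)
    return blocks, blocks * replication, file_bytes * replication


def sql_server_pages(data_bytes, page_size=8 * KB):
    """Lower-bound page count: ignores page header, row overhead, fill factor."""
    return math.ceil(data_bytes / page_size)


# A 1 GB file: 8 blocks (24 replicas across the cluster) vs. 131,072 pages.
print(hdfs_footprint(1024 * MB), sql_server_pages(1024 * MB))
```

The point of the comparison: HDFS tracks a handful of very large units, each copied three times, while SQL Server tracks many small pages and leaves redundancy to other features.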
Loading Data
HDFS                SQL Server
Command line        bcp
API                 ADO.NET
sqoop               bcp / SSIS
Hive                Query optimizer
3rd-party tools (both)
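sqoop parallelizes an import from a relational source by splitting the table on a key column (its `--split-by` and `-m`/`--num-mappers` options) and giving each mapper its own range. The sketch below shows the general idea for an integer key: evenly divide [MIN, MAX] into contiguous, non-overlapping ranges. It is an illustration under that assumption, not sqoop's actual splitter code; `split_ranges`, `dbo.Orders`, and `OrderID` are hypothetical names.

```python
def split_ranges(min_key, max_key, num_mappers):
    """Divide the inclusive integer range [min_key, max_key] into
    contiguous, non-overlapping (lo, hi) ranges, one per mapper.
    Hypothetical helper illustrating range-based parallel import."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_key < min_key:
        return []  # empty table: nothing to split
    total = max_key - min_key + 1
    num_mappers = min(num_mappers, total)  # never create empty splits
    base, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_key
    for i in range(num_mappers):
        # Spread the remainder over the first `extra` splits.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Each mapper would run its own range query against the source table.
for lo, hi in split_ranges(1, 1_000_000, 4):
    print(f"SELECT ... FROM dbo.Orders WHERE OrderID >= {lo} AND OrderID <= {hi}")
```

Splitting on a key works best when values are evenly distributed; a skewed key gives some mappers far more rows than others.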
Retrieving Data: MapReduce, Spark, Hive, Drill, Impala, etc.
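For a SQL Server pro, the easiest way into MapReduce is to see a familiar `GROUP BY` rewritten as map, shuffle, and reduce steps. That is roughly the shape of job Hive produces when it runs a SQL query on MapReduce. The sketch below runs both versions on the same rows and checks that they agree. Assumptions: sqlite3 stands in for SQL Server, the `sales` data is made up, and everything runs in one process. Real Hadoop runs map tasks on many nodes and partitions the shuffle by key across reducers.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical sample data: (city, amount)
rows = [("Nashville", 3), ("Dallas", 5), ("Nashville", 4), ("Denver", 1), ("Dallas", 2)]


def map_phase(row):
    """Map: emit a (key, value) pair for each input record."""
    city, amount = row
    yield city, amount


def shuffle(pairs):
    """Shuffle: bring together every value that shares a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate one key's values, like SUM(amount)."""
    return key, sum(values)


def map_reduce_sum(rows):
    pairs = chain.from_iterable(map_phase(r) for r in rows)
    groups = shuffle(pairs)
    return sorted(reduce_phase(k, vs) for k, vs in groups.items())


def sql_sum(rows):
    """The same aggregate as a set-based GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


assert map_reduce_sum(rows) == sql_sum(rows)
```

In SQL Server the query optimizer picks the aggregation strategy for you. In raw MapReduce you write the map and reduce functions yourself, and tools like Hive exist largely to generate them from SQL.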
Retrieving Data
Managing Resources: YARN
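YARN's job is to hand out cluster resources (memory and CPU, granted as containers on worker nodes) to the applications that ask for them. The toy below shows only that basic idea with a first-fit placement rule. It is deliberately simplified and is *not* how YARN's Capacity or Fair schedulers decide. Queues, priorities, preemption, and data locality are all left out. The `Node` class, `allocate` function, and node and app names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with some free memory (MB) and virtual cores."""
    name: str
    free_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


def allocate(nodes, requests):
    """Toy first-fit allocator: place each (app, mb, vcores) request on the
    first node with enough free capacity. Returns (app, node_name) pairs,
    with None when nothing fits (a real scheduler would queue the request)."""
    placements = []
    for app, mb, vcores in requests:
        for node in nodes:
            if node.free_mb >= mb and node.free_vcores >= vcores:
                node.free_mb -= mb
                node.free_vcores -= vcores
                node.containers.append(app)
                placements.append((app, node.name))
                break
        else:
            placements.append((app, None))  # no node had room
    return placements


cluster = [Node("node1", 8192, 4), Node("node2", 4096, 2)]
print(allocate(cluster, [("etl", 4096, 2), ("hive", 4096, 2)]))
```

The SQL Server analogy is loose: Resource Governor divides one instance's resources among workloads, while YARN divides many machines' resources among applications.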
Managing Resources