Published by Wilfrid Thornton. Modified over 9 years ago.
1
Introduction to Sqoop
2
Table of Contents
  Sqoop - Introduction
  Integration of RDBMS and Sqoop
  Sqoop use case
  Sample Sqoop commands
  Key features of Sqoop
3
What is Sqoop?
  Sqoop is a suite of tools that connect Hadoop and database systems.
  Major functions of Sqoop
    Import tables from databases into HDFS for deep analysis
    Replicate database schemas in Hive's metastore
    Export MapReduce results back to a database for presentation to end users
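The first two functions above (importing a table and replicating its schema in Hive) can be combined in one command. The following is a minimal sketch, assuming Sqoop 1's `--hive-import` option; the host `db.example.com` and the table `employees` are hypothetical placeholders, not values from the slides. The function only prints the command so that it can be reviewed before being run against a real cluster.

```shell
#!/bin/sh
# Hypothetical connection string and table name, for illustration only.
CONNECT="jdbc:mysql://db.example.com/corp"
TABLE="employees"

# Print (rather than run) a Sqoop import that loads the table into HDFS
# and registers a matching table definition in Hive.
hive_import_cmd() {
    # $1 = JDBC connection string, $2 = source table
    echo "sqoop import --connect $1 --table $2 --hive-import"
}

hive_import_cmd "$CONNECT" "$TABLE"
```

Copying the printed line into a shell on a machine with Sqoop and Hive installed would perform the actual import.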
4
RDBMS: important but vulnerable?
  Importance of RDBMS
    Holds a lot of valuable data in structured tables of several hundred GB
    Provides fast access for OLTP applications: updating and deleting records, adding individual records, complex transactions
  Vulnerability
    Can't store very large datasets (1 TB+)
    Poor support for complex datatypes and large objects
    Schema evolution is hard
    Analytic queries are better suited to a batch-oriented system
5
RDBMS and Hadoop
  [Diagram: data flow between RDBMS and HDFS]
    Historical data (before processing)
    Results of data analysis (after processing)
6
Sqoop use case: Demographics-aware site analytics
7
Sample Sqoop commands
  Import using Sqoop
    sqoop import --connect jdbc:mysql://db.foo.com/corp --table user-profiles
  Export using Sqoop
    sqoop export --connect jdbc:mysql://db.foo.com/corp --table ads_results --export-dir results
  Callouts
    jdbc:mysql://db.foo.com/corp : JDBC MySQL driver
    --table user-profiles : input (MySQL table)
    --export-dir results : HDFS location with analysis results
    --table ads_results : output (MySQL table)
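In practice the sample commands usually need credentials and an explicit HDFS location. The sketch below extends them with a few standard Sqoop 1 options (`--username`, `-P` to prompt for a password, `--target-dir`, `--num-mappers`). The connection string and table names come from the slide; the user name `analyst`, the path `/data/user-profiles`, and the mapper count are assumptions made for illustration. The functions print the commands instead of executing them.

```shell
#!/bin/sh
# Connection string and table names come from the slide; the user name,
# HDFS paths, and mapper count below are hypothetical.
CONNECT="jdbc:mysql://db.foo.com/corp"
DB_USER="analyst"

# Build an import command: MySQL table -> HDFS directory.
import_cmd() {
    # $1 = source table, $2 = HDFS target directory, $3 = number of parallel map tasks
    echo "sqoop import --connect $CONNECT --username $DB_USER -P --table $1 --target-dir $2 --num-mappers $3"
}

# Build an export command: HDFS directory -> MySQL table.
export_cmd() {
    # $1 = destination table, $2 = HDFS directory holding the analysis results
    echo "sqoop export --connect $CONNECT --username $DB_USER -P --table $1 --export-dir $2"
}

import_cmd user-profiles /data/user-profiles 4
export_cmd ads_results results
```

Note that `sqoop export` writes into a table that must already exist in the database, so `ads_results` would need to be created before the export is run.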
8
Key features of Sqoop
  JDBC-based implementation - Works with many popular database vendors
  Auto-generation of tedious user-side code - Lets you write MapReduce applications that work with the data faster
  Integration with Hive - Lets you stay in a SQL-based environment
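The code auto-generation feature is also exposed as a standalone tool. As a rough sketch, assuming Sqoop 1's `codegen` tool and its `--outdir` option, the command below would generate the Java record class for one table without importing any data. The connection string and table name are taken from the sample-commands slide; the output directory `generated-src` is a hypothetical choice. The function prints the command rather than running it.

```shell
#!/bin/sh
# Print a Sqoop codegen command. The output directory passed below is
# a hypothetical choice, not something named in the slides.
codegen_cmd() {
    # $1 = JDBC connection string, $2 = table, $3 = directory for generated .java source
    echo "sqoop codegen --connect $1 --table $2 --outdir $3"
}

codegen_cmd jdbc:mysql://db.foo.com/corp ads_results generated-src
```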