CPSC8985 FA 2015 Team C3
DATA MIGRATION FROM RDBMS TO HADOOP
By Naga Sruthi Tiyyagura, Monika Rallabandi, Radhakrishna Nalluri
Introduction
Data, data, and more data: several petabytes (PB) of data are transferred every day. Database systems from Oracle, IBM, Microsoft and Teradata hold a large portion of the world's information. Moving a large volume of data from Oracle to DB2 or another system is a challenging task for a business. IT teams are burdened with ever-growing requests for data. Decision makers become frustrated because it takes hours or days to get answers to their questions, if they get answers at all. Traditional architectures and infrastructures are not up to the challenge.
Abstract
Current data is held in RDBMS databases such as Oracle, SQL Server, MySQL and Teradata. We plan to migrate this RDBMS data to a big data platform that supports NoSQL databases and holds a variety of data. Migrating petabytes of data from the existing system takes huge resources and time, and both may be constraints on the current migration process. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Proposed System
Using Sqoop, we will import data from a relational database system into HDFS. Sqoop reads the table row by row into HDFS. The output of this import process is a set of files containing a copy of the imported table. Because the import runs in parallel, the output is split across multiple files. These files may be delimited text files or binary files. After manipulating the imported records with Hive, we will have a result data set that can then be exported back to the relational database.
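The export step at the end of this flow could be sketched as follows. This is a minimal sketch under stated assumptions, not the team's actual script: the result table `pagelinks_result` and its HDFS directory are hypothetical names, the connection string and user are the ones used in the import command in the FLOW slide, and `-P` prompts for the password instead of passing it on the command line. Sqoop export expects the target table to exist in the database already. With `DRY_RUN` left at its default, the script only prints the command it would run.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# export_table TABLE HDFS_DIR
# Export '~'-delimited files under HDFS_DIR into an existing database table.
export_table() {
  run sqoop export \
    --connect "jdbc:mysql://localhost/gsuproj" \
    --username "sruthi" -P \
    --table "$1" \
    --export-dir "$2" \
    --input-fields-terminated-by '~'
}

# Hypothetical result table and directory, for illustration only.
export_table pagelinks_result /apps/hive/warehouse/pagelinks_result
```

Setting `DRY_RUN=0` in the environment runs the real command on a machine where Sqoop is installed.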
Structure
[Figure: system structure diagram. Labels: Database, Data in MySQL, File, Script writers, Web servers, Real-time Hadoop cluster, Hadoop, Hive]
FLOW
Step 1: Convert the data into files by using Sqoop
    sqoop import --connect jdbc:mysql://localhost/gsuproj --username sruthi --password sruthi --table pagelinks --target-dir sqoop-data
Step 2: Store the file in the Hadoop cluster
    hadoop fs -copyFromLocal /root/pagelinks hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/pagelinks
Step 3: Read data from Hive
    -- HiveQL requires the LOCATION and INPATH paths to be quoted string literals.
    -- The field delimiter must match the one used when the files were written.
    CREATE EXTERNAL TABLE pagelinks (
      pl_from string,
      pl_namespace string,
      pl_title string,
      pl_from_namespace string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '~'
    LOCATION 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse';
    LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse' INTO TABLE pagelinks;
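The three steps above depend on the files and the Hive table agreeing on a field delimiter. Sqoop writes comma-separated text by default, so a table declared with '~' would need the import to use the same character. The sketch below shows one way to keep them aligned and then check the result with a row count. It is an illustration under assumptions, not the project's script: the helper names are hypothetical, `-P` replaces the inline password, and the import writes straight to an HDFS directory passed as an argument. With `DRY_RUN` at its default, it only prints the commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# import_table TABLE HDFS_DIR
# Import a table as '~'-delimited text so it matches the Hive table definition.
import_table() {
  run sqoop import \
    --connect "jdbc:mysql://localhost/gsuproj" \
    --username "sruthi" -P \
    --table "$1" \
    --target-dir "$2" \
    --fields-terminated-by '~'
}

# count_rows TABLE
# Ask Hive for the row count, to compare against the source table.
count_rows() {
  run hive -e "SELECT COUNT(*) FROM $1;"
}

import_table pagelinks /apps/hive/warehouse/pagelinks
count_rows pagelinks
```

Comparing the Hive count with `SELECT COUNT(*)` on the MySQL side is a quick sanity check that no rows were lost or split by a mismatched delimiter.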
Advantages
Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
Flexible: It can access different types of data (structured and unstructured).
Resilient to failure: Data sent to an individual node is also replicated to other nodes in the cluster, so another copy is available for use if a node fails.
Fast analysis: Its storage method is based on a distributed file system. It can efficiently process terabytes of data in minutes and petabytes in hours.
Cost effective
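The replication behind the "Resilient to failure" point can be inspected from the HDFS shell. The following is a small sketch, assuming a hypothetical file name `part-m-00000` under the pagelinks directory and a target replication factor of 3. `hdfs dfs -stat "%r"` prints a file's replication factor and `hdfs dfs -setrep -w` changes it and waits for the change to complete. With `DRY_RUN` at its default, the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# show_replication FILE: print the replication factor of one HDFS file.
show_replication() {
  run hdfs dfs -stat "%r" "$1"
}

# set_replication N PATH: set the replication factor and wait until it is applied.
set_replication() {
  run hdfs dfs -setrep -w "$1" "$2"
}

# Hypothetical file name and replication factor, for illustration only.
show_replication /apps/hive/warehouse/pagelinks/part-m-00000
set_replication 3 /apps/hive/warehouse/pagelinks
```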