1
Hadoop Basics - Venkat Cherukupalli
2
What is Hadoop?
- Open source
- Distributed processing of large data sets across clusters
- Commodity, shared-nothing servers
- Local computation and storage
3
Key Services
- Hadoop Distributed File System (HDFS): reliable data storage
- MapReduce: high-performance parallel data processing
4
HDFS
- Splits user data across servers in a cluster
- Replication: multiple node failures will not cause data loss
- Reliable, scalable, and low-cost storage
- Relies on replication rather than RAID, allowing massive scale
- NameNode and DataNode roles (a short client-API sketch follows)
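Applications reach the NameNode/DataNode architecture through the HDFS FileSystem API. Below is a minimal Java sketch, assuming the Hadoop client libraries are on the classpath and fs.defaultFS in core-site.xml points at the cluster's NameNode; the path /user/demo/hello.txt is only an illustrative placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);          // client for the configured NameNode

        Path file = new Path("/user/demo/hello.txt");  // hypothetical example path

        // Write a small file; HDFS splits it into blocks and replicates each block
        // across DataNodes (replication factor 3 by default).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back; the NameNode supplies block locations, DataNodes serve the bytes.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[(int) fs.getFileStatus(file).getLen()];
            in.readFully(buf);
            System.out.println(new String(buf, StandardCharsets.UTF_8));
        }

        fs.close();
    }
}
```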
5
HDFS
6
MapReduce
- Parallel, distributed processing system
- No special programming techniques required
- Existing algorithms work without change
7
MapReduce Framework
- Processes large jobs in parallel across many nodes and combines the results
- Eliminates the bottlenecks imposed by monolithic storage systems
- After each piece has been analyzed, the results are collated and digested into a single output (see the word-count sketch below)
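To make the map-then-combine flow concrete, here is the standard word-count example written against the MapReduce Java API, close to the one in the Hadoop tutorial. The class names and the input/output directories passed on the command line are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split, processed locally on each node.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: the framework groups map output by key; sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // pre-aggregate map output locally
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```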
8
Self-healing
- Shifts work to the remaining nodes
- Creates additional copies of the data from the replicas
- Applies to both storage and computation
- No sysadmin intervention required
9
What is Sqoop?
- Imports individual tables or entire databases into files in HDFS
- Generates Java classes that let you interact with the imported data
- Can import from SQL databases straight into your Hive data warehouse (an import sketch follows)
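Sqoop is normally invoked from the command line ("sqoop import ..."). The sketch below shows the same flags driven from Java via Sqoop 1.x's runTool entry point; the JDBC URL, credentials, table name, and HDFS directory are placeholders, and Sqoop plus the JDBC driver are assumed to be on the classpath.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost/sales",   // placeholder JDBC URL
            "--username", "etl_user",                   // placeholder credentials
            "--password", "secret",
            "--table", "orders",                        // source table to import
            "--target-dir", "/user/demo/orders",        // HDFS destination directory
            "--num-mappers", "4"                        // parallel import tasks
        };
        // Adding "--hive-import" would load the table straight into Hive instead.
        int exitCode = Sqoop.runTool(importArgs);       // 0 on success
        System.exit(exitCode);
    }
}
```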
10
Other Concepts
- HBase: an open source, non-relational, distributed database modeled after Google's BigTable and written in Java (put/get sketch below)
- Hive: a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis
- Pig: a platform for creating MapReduce programs used with Hadoop
- ZooKeeper: reliable distributed coordination
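Of these components, HBase is the one applications most often call directly from code. Here is a minimal put/get sketch with the HBase Java client, assuming a running cluster reachable through the ZooKeeper quorum in hbase-site.xml and a pre-created "users" table with an "info" column family (both hypothetical names).

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key "row1", column info:name = "Ada".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Read the cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```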