Published by Ada Poole. Modified over 9 years ago.
1
Overview of Hadoop for Data Mining
Federal Big Data Group
confidential
Mark Silverman
Treeminer, Inc.
155 Gibbs Street, Suite 514
Rockville, Maryland 20850
(240) 389-0750
msilverman@treeminer.com
2
TREEMINER, INC. CONFIDENTIAL
Agenda
- Introduction to Hadoop
- Developing and testing a Map/Reduce application
- Auto-Clustering in Hadoop and Interworking with Apache Storm
3
Introduction to Hadoop
Hadoop consists of:
- Clustered, distributed, highly available file system (HDFS)
- Execution framework (Map/Reduce)
4
Hadoop File System
- “Rack” aware
- Local storage
- Distributed copies (generally 3)
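To make "rack aware" and "generally 3" copies concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API. It assumes the placement rule commonly described for HDFS: the first replica goes on the writing node, the second on a node in a different rack, and the third on another node in that same remote rack. The function name `place_replicas`, the `topology` dictionary, and the node and rack names are all hypothetical. A real cluster picks nodes at random among the eligible ones, while this sketch picks the first in sorted order so the result is repeatable.

```python
def place_replicas(writer_node, topology, replication=3):
    """Pick target nodes for one block: local node, remote rack, same remote rack."""
    # topology maps rack name -> list of node names (hypothetical layout).
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if writer_node not in rack_of:
        raise ValueError(f"unknown node: {writer_node}")

    targets = [writer_node]  # replica 1: the node performing the write
    local_rack = rack_of[writer_node]

    # Replica 2: a node on another rack, so losing one rack cannot
    # destroy every copy of the block.
    remote_racks = sorted(r for r in topology if r != local_rack and topology[r])
    if replication >= 2 and remote_racks:
        remote_rack = remote_racks[0]
        targets.append(sorted(topology[remote_rack])[0])

        # Replica 3: a different node on that same remote rack.
        if replication >= 3:
            for node in sorted(topology[remote_rack]):
                if node not in targets:
                    targets.append(node)
                    break

    # Fallback for small clusters or higher replication: fill with unused nodes.
    for node in sorted(rack_of):
        if len(targets) >= replication:
            break
        if node not in targets:
            targets.append(node)
    return targets[:replication]


if __name__ == "__main__":
    cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    print(place_replicas("n1", cluster))
```

With two racks of two nodes each, a block written from `n1` ends up with one copy on the local rack and two on the other rack, so either rack can fail without losing the block.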
5
Sample Hadoop File System
6
Hadoop “Eco-System”
- Hive: allows SQL-like querying of data in HDFS
- Pig: basic scripting language for Hadoop
- Databases: HBase, Accumulo, Cassandra, Neo4j
7
Map / Reduce Parallel Execution Framework
8
Map / Reduce Parallel Execution Framework
9
WordCount Example
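As a rough illustration of the word count idea, the sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine. It is not the Hadoop Java WordCount program and uses no Hadoop API. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample input are assumptions made for this example. In a real job the framework performs the shuffle and runs many mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, total in word_count(sample).items():
        print(f"{word}\t{total}")
```

Each mapper call sees only one line and each reducer call sees only one key, which is what lets a framework spread both phases over many nodes.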
10
Getting Started
Cloudera and Hortonworks offer sandboxes that are easy to download and are fully contained implementations in a VM. Hadoop can also be downloaded directly from Apache.
- http://hortonworks.com/products/hortonworks-sandbox/
- http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-3-x.html
- http://hadoop.apache.org/releases.html
11
Developing in Map / Reduce
- Standalone mode: Hadoop runs as a single process, best for debugging
- Pseudo-distributed: separate processes on the same server
- Fully distributed: full-blown cluster
12
Eclipse Framework
Write code in Eclipse on PC or Linux
Options:
- Run Hadoop on Windows
- Run Eclipse in Linux with plugin
- Run Eclipse in Windows, with remote debug and profiling
Profiling: YourKit
13
WordCount
- Create a project in Eclipse
- Load WordCount code (widely available and in sandbox downloads)
- Compile jar file
- Execute on Hadoop in standalone mode
  $ hadoop jar path/to/file.jar input output
14
Monitoring Hadoop Jobs
15
Monitoring Hadoop Jobs
16
Resources
- http://www.cloudera.com
- http://www.hortonworks.com
- hadoop.apache.org
- http://web.stanford.edu/class/cs246/homeworks/tutorial.pdf
- Hadoop: The Definitive Guide by Tom White
17
Example: Document AutoClustering using Hadoop and Storm
https://www.youtube.com/watch?v=5X65WV0n4rU