Apache Hadoop on the Open Cloud. David Dobbins, Nirmal Ranganathan
2 Who is using Apache Hadoop? Traditionally: developers. Increasingly: business users / data scientists. Why does this matter?
3 Configuring and managing a Hadoop cluster is hard
4 Resources / Expertise
5 Multiple Performance and Design Variables
6 The Cloud solves some of these
7 Advantages of using the cloud: fast, easy, flexible
8 You still require expertise
9 Let's check out another option
10 Hadoop in the Cloud Use Cases
11 Development / POC Clusters
12 Dynamic Clusters
13 Growth Clusters
14 Your data is already in the Cloud
15 Demo: run an actual job
Swift Filesystem for Hadoop (HADOOP-8545): New filesystem URL, swift://. Read from and write to local and remote Swift clusters. Keep long-lived data in Swift; upload while the Hadoop cluster is offline.
16 The challenges of running MapReduce jobs against Swift: identity management, block size, object store vs. file paths, direct API into Swift from HDFS
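To make the swift:// URL and the "object store vs. file paths" point concrete, here is a minimal Python sketch of how such a URL can be broken into its parts. It assumes the swift://<container>.<service>/<path> layout used by the Hadoop Swift filesystem module, where the service name selects a block of cluster configuration. The function name `parse_swift_url` and the example names `data` and `rackspace` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse


def parse_swift_url(url):
    """Split a swift:// URL into container, service, and object path.

    Assumes the layout swift://<container>.<service>/<path>; this is an
    illustrative helper, not part of Hadoop itself.
    """
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError("not a swift:// URL: " + url)
    # The host part carries two things: the container name and the
    # service name, separated by the first dot.
    container, dot, service = parsed.netloc.partition(".")
    if not container or not dot or not service:
        raise ValueError("expected <container>.<service> in: " + url)
    # Swift has no real directories; the "path" is just the object name,
    # with slashes treated as part of the key.
    return {
        "container": container,
        "service": service,
        "path": parsed.path or "/",
    }


if __name__ == "__main__":
    # Hypothetical container "data" on a service labelled "rackspace".
    print(parse_swift_url("swift://data.rackspace/logs/2013/part-00000"))
```

A job would then name its input or output with a URL of this shape instead of an hdfs:// path; the credentials for the named service live in the cluster configuration rather than in the URL.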
17 MapReduce to Swift (via “HDFS”) [Diagram: two stacks. First: Application X, MapReduce, HDFS. Second: Application X, MapReduce, HDFS Proxy, Swift.]
18 Hadoop + OpenStack
19 Cloud Big Data Platform: Hortonworks Data Platform (HDP 1.1, HDP 1.3); Pig, Hive, HCatalog; coming soon: HDP 2.0
20 Cloud Big Data Platform: secure by default; comes pre-optimized; Web UI, CLI, REST API
21 Built on OpenStack
22 Why an open platform matters: Sandbox on Rackspace Cloud; Sandbox VM; RAX Resell
23 Cool stuff