Apache Hadoop on the Open Cloud
David Dobbins, Nirmal Ranganathan

2 Who is using Apache Hadoop?
- Traditionally: developers
- Increasingly: business users and data scientists
- Why does this matter?

3 Configuring and managing a Hadoop cluster is hard

4 Resources / Expertise

5 Multiple Performance and Design Variables

6 The Cloud solves some of these

7 Advantages of using the cloud
- Fast
- Easy
- Flexible

8 You still require expertise

9 Let's check out another option

10 Hadoop in the Cloud Use Cases

11 Development / Proof-of-Concept (POC) Clusters

12 Dynamic Clusters

13 Growth Clusters

14 Your data is already in the Cloud

15 Demo: run an actual job
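The slides don't say which job the demo ran. As a stand-in, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases of the classic word count, assuming simple whitespace tokenization. It only illustrates the data flow a MapReduce job goes through; it is not the Hadoop API or the demo's code, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token (assumed tokenization).
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Sum the counts collected for one word.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    # Hypothetical driver: map every line, shuffle, then reduce each key in sorted order.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["Hadoop on the open cloud", "the cloud runs Hadoop"]
    print(run_job(sample))
    # prints {'cloud': 2, 'hadoop': 2, 'on': 1, 'open': 1, 'runs': 1, 'the': 2}
```

On a real cluster the map and reduce calls run as separate tasks on different nodes, and the shuffle moves data over the network. That is where the cluster-sizing and performance variables from the earlier slides come into play.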

16 Swift Filesystem for Hadoop: HADOOP-8545
- New filesystem URL scheme: swift://
- Read from and write to local and remote Swift clusters
- Keep long-lived data in Swift; upload it while the Hadoop cluster is offline

The challenges of running MapReduce jobs against Swift:
- Identity management
- Block size
- Object store vs. file paths
- Direct API into Swift from HDFS
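Two of the challenges on this slide, object store vs. file paths and block size, can be sketched in a few lines. The Python below assumes the swift://<container>.<service>/<object> URL form, where the service name selects a configured Swift provider. Treat that layout, and every function name here, as illustrative assumptions rather than the HADOOP-8545 implementation. The sketch does three things:

- parses such a URL
- emulates a directory listing over Swift's flat object names
- cuts one object into nominal block-sized byte ranges that separate map tasks could fetch with HTTP range requests

```python
from urllib.parse import urlparse


def parse_swift_url(url: str) -> tuple[str, str, str]:
    """Split swift://<container>.<service>/<object> into its three parts."""
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url}")
    # Assumed layout: host is "<container>.<service>", split at the first dot.
    container, sep, service = parsed.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://<container>.<service>/...: {url}")
    # Swift has no real directories: the whole path is one flat object name.
    return container, service, parsed.path.lstrip("/")


def list_pseudo_dir(object_names: list[str], prefix: str) -> list[str]:
    """Emulate a one-level directory listing over a flat object namespace."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    children = set()
    for name in object_names:
        if name.startswith(prefix) and name != prefix:
            rest = name[len(prefix):]
            # Anything after the next "/" belongs to a deeper "subdirectory".
            head, slash, _ = rest.partition("/")
            children.add(head + slash)
    return sorted(children)


def nominal_splits(object_size: int, block_size: int) -> list[tuple[int, int]]:
    """(offset, length) byte ranges of at most block_size bytes covering the object."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Objects have no physical blocks; this block size is purely nominal,
    # chosen only so MapReduce has something to split input on.
    return [(off, min(block_size, object_size - off))
            for off in range(0, object_size, block_size)]
```

For example:

- `parse_swift_url("swift://data.rackspace/logs/2013/a.log")` returns `("data", "rackspace", "logs/2013/a.log")`.
- With objects `logs/2013/a.log`, `logs/2013/b.log`, and `logs/readme.txt`, `list_pseudo_dir(names, "logs")` returns `["2013/", "readme.txt"]`.
- `nominal_splits(250, 100)` returns `[(0, 100), (100, 100), (200, 50)]`.

Every "directory" operation becomes a prefix scan over object names. That is why path semantics are listed as a challenge.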

17 MapReduce to Swift (via “HDFS”)
[Diagram: Application X → MapReduce → HDFS, beside Application X → MapReduce → HDFS proxy → Swift]
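The proxy idea on this slide is essentially the adapter pattern. Application code is written once against a file-system interface, and a proxy translates those calls onto Swift containers and objects. A minimal Python sketch follows. The `FileSystem`, `InMemoryHDFS`, `SwiftProxy`, and `application_x` names are hypothetical toys, not Hadoop's `org.apache.hadoop.fs.FileSystem`. Mapping the first path component to the container is an assumption made for brevity.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Narrow file API the job depends on (hypothetical stand-in for Hadoop's FileSystem)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryHDFS(FileSystem):
    """Toy stand-in for HDFS: hierarchical paths kept in a dict."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


class SwiftProxy(FileSystem):
    """Same API, but each path is mapped onto (container, object) in a flat store."""

    def __init__(self, store: dict[tuple[str, str], bytes]) -> None:
        self._store = store  # toy object store: (container, object name) -> bytes

    @staticmethod
    def _to_object(path: str) -> tuple[str, str]:
        # Assumption: first path component is the container, the rest the object name.
        container, _, obj = path.lstrip("/").partition("/")
        if not container or not obj:
            raise ValueError(f"path needs /<container>/<object>: {path}")
        return container, obj

    def read(self, path: str) -> bytes:
        return self._store[self._to_object(path)]

    def write(self, path: str, data: bytes) -> None:
        self._store[self._to_object(path)] = data


def application_x(fs: FileSystem, src: str, dst: str) -> None:
    """Job code written once against FileSystem; it never knows which backend it got."""
    text = fs.read(src).decode()
    fs.write(dst, str(len(text.split())).encode())
```

Passing `SwiftProxy(store)` instead of `InMemoryHDFS()` leaves `application_x` untouched, which is the point of going "via HDFS". The trade-off is that operations HDFS makes cheap, such as renaming a directory, turn into per-object copies and deletes behind the proxy.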

18 Hadoop + OpenStack

19 Cloud Big Data Platform
- Hortonworks Data Platform: HDP 1.1, HDP 1.3
- Pig, Hive, HCatalog
- Coming soon: HDP 2.0

20 Cloud Big Data Platform
- Secure by default
- Comes pre-optimized
- Web UI, CLI, REST API

21 Built on OpenStack

22 Why an Open Platform Matters
- Sandbox on Rackspace Cloud
- Sandbox VM
- RAX resell

23 Cool stuff