© Hortonworks Inc. 2012
Inside hadoop-dev
Steve Loughran, ApacheCon EU, November 2012

HP Labs:
– Deployment, cloud infrastructure, Hadoop-in-Cloud
Apache: member and committer
– Ant (author, Ant in Action), Axis 2
– Hadoop
Joined Hortonworks in 2012
– UK-based R&D

Hadoop is the OS for the datacentre

History: ASF releases slowed
64 releases from
Branches from the last 2.5 years:
– 0.20.{0,1,2}: stable release without security
– 0.20.2xx.y: stable release with security
– released, unstable, deprecated
– orphan, unstable, lack of community
– 0.23.x
Cloudera CDH: fork with patches pushed back

Now: 2 ASF branches
Hadoop 1.x
– Stable, used in production systems
– Feature focus on fixes & low-risk performance improvements
Hadoop 2.x/trunk
– The successor
– Alpha release: download and test
– Where features & fixes go in first
– Your new code goes here

Loosely coupled projects form the stack

Incubating & graduated projects: HCatalog, Ambari, Kafka, Giraph, Templeton

Integration is a major undertaking
– Latest ASF artifacts
– Stable, tested ASF artifacts
– ASF + own artifacts

What does all this mean?

There is more work than we can cope with

Hadoop is CS-hard
Core HDFS, MR and YARN:
– Distributed computing
– Consensus protocols & consistency models
– Work scheduling & data placement
– Reliability theory
– CPU architecture; x86 assembler
Others:
– Machine learning
– Distributed transactions
– Graph theory
– Queueing theory
– Correctness proofs

If you have these skills, come and play!

But there are barriers

Your time & cluster
– Full-time core developers: Hortonworks + Cloudera
– Full-time projects at others: LinkedIn, IBM, MSFT, VMware
– Single developers can't compete
– Small test runs take too long
– Your cluster probably isn't as big as Yahoo!'s
– Commit-then-review neglects everyone's patches

Fear of damage
The worth of Hadoop is the data in HDFS:
– the worth of all companies whose data it is
– the cost to individuals of data loss
– the cost to governments of losing their data
∴ resistance to radical changes in HDFS
Scheduling performance is worth $100Ks to individual organisations
∴ resistance to radical work in the compute layer, except by people with a track record

Fear of support and maintenance costs
– What will show up on Yahoo!-scale clusters?
– Costs of regression testing
– Who maintains the code if the author disappears?
– Documentation?
– The 80%-done problem

How to get your code in
– Trust: get known on the -dev lists and at meet-ups
– Competence: help with patches other than your own
– Don't attempt rewrites of the core services
– Help develop plugin points
– Test across the configuration space
– Test at scale, complexity, “unusualness”

Testing: not just for the 1%
You have network and scale issues

Documentation & Books

Challenge: major works
YARN and HDFS HA
– Branch without RTC (review-then-commit), then review at merge
– Agile; merge costs scale with the duration of the branch
Independent works
– Things that didn't get in: my lifecycle work, …
– VMware virtualisations
– initial failure topology
– How best to get this stuff in?
Postgraduate research
– How do we get the next generation of postgraduate researchers developing in and with Apache Hadoop?

A mentoring program?
– Guided support for associated projects, with the goal of merging them into the Hadoop codebase
– Who has the time to mentor?

Better distributed development
– Regional developer workshops, with local university participation?
– Online meet-ups: Google+ Hangouts?
– Shared IDEA or other editor sessions
– Remote presentations and demos

Git + Gerrit

Get involved!
– svn.apache.org
– issues.apache.org
– {hadoop, hbase, mahout, pig, oozie, …}.apache.org

hortonworks.com