Hadoopla: Microsoft and the Hadoop Ecosystem

Presented at SQL Saturday Waltham, May 19th, 2012
Jim O’Neil, Developer Evangelist, Microsoft
jim.oneil@microsoft.com, @jimoneil

Big Data Starts with a V
Volume: there’s a lot of it; we’re hoarders
Variety: schema-schmema, it’s coming from the ‘internet of things’
Velocity: he who hesitates doesn’t get the worm

There’s a Tech for That
Volume: data warehouses; distributed file systems + MapReduce
Variety: NoSQL databases
Velocity: complex event processing

Two Dimensions of Scale
Up: a bigger machine
Out: more machines

Scaling Out is Hard
[Chart: programming complexity plotted against the number of machines, 1, 2, 3 … n]

Distributed File Systems
[Diagram: a single name node and several data nodes]
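In this design the name node keeps only metadata (which blocks make up a file and which data nodes hold each block), while the data nodes store the block contents. The toy model below sketches that split. The tiny block size, the round-robin placement, and the names `NameNode` and `addFile` are illustrative assumptions, not HDFS's actual placement policy; real HDFS uses much larger blocks (64 MB by default in Hadoop 1.x) and a default replication factor of 3.

```javascript
// Toy model of the name node / data node split in a distributed file system.
// Block size, round-robin placement, and class/method names are hypothetical.

function NameNode(dataNodeIds, blockSize, replication) {
    this.dataNodeIds = dataNodeIds;   // e.g. ["dn1", "dn2", "dn3", "dn4"]
    this.blockSize = blockSize;       // bytes per block (tiny here, for readability)
    this.replication = replication;   // copies per block; assumes <= number of data nodes
    this.files = {};                  // metadata only: file name -> list of blocks
    this.nextNode = 0;                // round-robin cursor for placing the first replica
}

// Split a file into fixed-size blocks and record, for each block, which data
// nodes hold a replica. The block bytes themselves would live on the data nodes.
NameNode.prototype.addFile = function (name, contents) {
    var blocks = [];
    for (var offset = 0; offset < contents.length; offset += this.blockSize) {
        var replicas = [];
        for (var r = 0; r < this.replication; r++) {
            replicas.push(this.dataNodeIds[(this.nextNode + r) % this.dataNodeIds.length]);
        }
        this.nextNode = (this.nextNode + 1) % this.dataNodeIds.length;
        blocks.push({
            id: name + "#" + blocks.length,
            length: Math.min(this.blockSize, contents.length - offset),
            replicas: replicas
        });
    }
    this.files[name] = blocks;
    return blocks;
};

var nn = new NameNode(["dn1", "dn2", "dn3", "dn4"], 4, 3);
var blocks = nn.addFile("words.txt", "I am what I am");
blocks.forEach(function (b) {
    console.log(b.id, b.length, b.replicas.join(","));
});
```

Under Node.js this prints four blocks (three of 4 bytes and one of 2 bytes), each with replicas on three different data nodes, so losing any single data node leaves every block readable.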

Map Reduce
[Diagram: a job tracker and the name node, with task trackers running on the data nodes]
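Putting task trackers on the data nodes lets the job tracker send computation to the data instead of moving data across the network. The sketch below is a simplified model of that data-local assignment, in the spirit of the Hadoop 1.x job tracker; the `JobTracker` object, task IDs, and host names are hypothetical, and the real scheduler also weighs rack locality, speculative execution, and more.

```javascript
// Simplified sketch of data-local map task assignment.
// Names are hypothetical; this is not Hadoop's scheduler code.

function JobTracker(mapTasks) {
    // Each pending task records the data nodes that hold its input block.
    this.pending = mapTasks.slice();
}

// Called when the task tracker on `host` reports a free map slot.
// Prefer a task whose input block is stored on that host (node-local);
// otherwise fall back to the oldest pending task (input read over the network).
JobTracker.prototype.assign = function (host) {
    if (this.pending.length === 0) return null;
    var idx = this.pending.findIndex(function (task) {
        return task.hosts.indexOf(host) !== -1;
    });
    var local = idx !== -1;
    if (!local) idx = 0;
    var task = this.pending.splice(idx, 1)[0];
    return { task: task.id, host: host, local: local };
};

var jt = new JobTracker([
    { id: "map-0", hosts: ["dn1", "dn2", "dn3"] },
    { id: "map-1", hosts: ["dn2", "dn3", "dn4"] },
    { id: "map-2", hosts: ["dn3", "dn4", "dn1"] }
]);
console.log(jt.assign("dn4")); // { task: 'map-1', host: 'dn4', local: true }
console.log(jt.assign("dn2")); // { task: 'map-0', host: 'dn2', local: true }
console.log(jt.assign("dn2")); // { task: 'map-2', host: 'dn2', local: false }
```

The fallback keeps idle slots busy: a non-local task is slower than a local one, but usually faster than leaving the slot empty.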

Map Reduce: Word Count Example
Input: I am what I am
map (words lowercased): i : 1, am : 1, what : 1, i : 1, am : 1
shuffle and sort: am : [1, 1], i : [1, 1], what : [1]
reduce: am : 2, i : 2, what : 1

    var map = function (key, value, context) {
        // Split the line on non-letters and emit (word, 1) for each word.
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
            if (words[i] !== "")
                context.write(words[i].toLowerCase(), 1);
        }
    };

    var reduce = function (key, values, context) {
        // Sum the counts emitted for this word.
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };
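To see the three phases end to end without a cluster, the sketch below runs the slide's map and reduce functions through a small in-memory driver. `runLocalJob`, the `context` objects, and the `hasNext()`/`next()` iterator are hypothetical stand-ins for what the Hadoop runtime supplies, and the line index stands in for the byte-offset key Hadoop would pass to `map`.

```javascript
// In-memory simulation of one MapReduce word-count job.
// runLocalJob and its helper objects are hypothetical, not a Hadoop API.

var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "")
            context.write(words[i].toLowerCase(), 1);
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Wrap an array in the hasNext()/next() iterator shape that reduce expects.
function iteratorOf(items) {
    var i = 0;
    return {
        hasNext: function () { return i < items.length; },
        next: function () { return items[i++]; }
    };
}

function runLocalJob(lines, mapFn, reduceFn) {
    // Map phase: collect every (key, value) pair emitted by mapFn.
    var intermediate = [];
    var mapContext = { write: function (k, v) { intermediate.push([k, v]); } };
    lines.forEach(function (line, index) { mapFn(index, line, mapContext); });

    // Shuffle and sort: group values by key, then order the keys.
    var groups = {};
    intermediate.forEach(function (pair) {
        (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
    });
    var keys = Object.keys(groups).sort();

    // Reduce phase: one reduce call per distinct key.
    var output = [];
    var reduceContext = { write: function (k, v) { output.push([k, v]); } };
    keys.forEach(function (k) { reduceFn(k, iteratorOf(groups[k]), reduceContext); });
    return output;
}

var result = runLocalJob(["I am what I am"], map, reduce);
console.log(result); // [ [ 'am', 2 ], [ 'i', 2 ], [ 'what', 1 ] ]
```

On a real cluster the map calls run in parallel on many task trackers and the shuffle moves data between machines, but each reduce call still sees exactly one key with all of its values, which is what this driver reproduces.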

Enter Hadoop
Apache project (http://hadoop.apache.org)
Open-source implementation of the Google File System and MapReduce designs
Subprojects: Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hadoop Common

Hadoop History
2002: Doug Cutting develops Nutch, a web crawler
2003: Google publishes the GFS paper
2004: Google publishes the MapReduce paper
2006: Cutting joins Yahoo!; Hadoop becomes an Apache Lucene subproject
2008: Hadoop becomes a top-level Apache project
2009: Cutting joins Cloudera
2011: Hortonworks formed by Yahoo! and Benchmark Capital
2011: Hadoop reaches version 1.0.0 (Dec. 27)

Adopters
Yahoo! runs Hadoop on some 40,000 nodes
Facebook has over 30 PB of data in Hadoop
Oracle’s Big Data Appliance includes a Hadoop distribution
JPMorgan Chase uses it for fraud detection
eBay is replacing its core search technology with it
Microsoft is working with Hortonworks to distribute Hadoop on Windows, both in the cloud and on-premises

Hadoop on Azure
http://hadooponazure.com
Limited customer preview
Windows Server on-premises distribution to follow

Sign up

Cluster Provisioning

Demo

The Menagerie Begins
Pig: query infrastructure for Hadoop; scripts in Pig Latin, a data-flow language with SQL-like operators, launch MapReduce jobs (http://pig.apache.org/)
Hive: data warehouse system for Hadoop; queries in HiveQL, a SQL-like language, launch MapReduce jobs (http://hive.apache.org)
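Pig and Hive both turn a high-level query into one or more MapReduce jobs. As a rough illustration of the idea, not of either tool's actual compiler output, the sketch below shows a classic reduce-side (repartition) join, one way an equi-join such as `users JOIN clicks ON users.id = clicks.userId` can be expressed as map and reduce steps: the map phase keys each record on the join column and tags it with its source table, and the reduce phase pairs up records that share a key. The table names, fields, and the `joinJob` driver are hypothetical.

```javascript
// Sketch of a reduce-side equi-join expressed as map and reduce functions.
// Tables, fields, and joinJob are hypothetical illustrations.

// Map: key every record on the join column and tag it with its source table.
function mapTagged(table, record, emit) {
    var key = table === "users" ? record.id : record.userId;
    emit(String(key), { table: table, record: record });
}

// Reduce: for one join key, combine every user record with every click record.
// Keys present in only one table produce no output (inner join semantics).
function reduceJoin(key, tagged, emit) {
    var users = tagged.filter(function (t) { return t.table === "users"; });
    var clicks = tagged.filter(function (t) { return t.table === "clicks"; });
    users.forEach(function (u) {
        clicks.forEach(function (c) {
            emit({ name: u.record.name, url: c.record.url });
        });
    });
}

// Drive both phases in memory, grouping by key as the shuffle would.
function joinJob(tables) {
    var groups = {};
    Object.keys(tables).forEach(function (table) {
        tables[table].forEach(function (record) {
            mapTagged(table, record, function (k, v) {
                (groups[k] = groups[k] || []).push(v);
            });
        });
    });
    var out = [];
    Object.keys(groups).sort().forEach(function (k) {
        reduceJoin(k, groups[k], function (row) { out.push(row); });
    });
    return out;
}

var rows = joinJob({
    users: [{ id: 1, name: "ann" }, { id: 2, name: "bob" }],
    clicks: [{ userId: 1, url: "/a" }, { userId: 1, url: "/b" }, { userId: 3, url: "/c" }]
});
console.log(rows);
```

Here only user 1 has matching clicks, so the result holds two rows, both for `ann`. The appeal of Pig and Hive is that they generate this kind of plumbing from a one-line join.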

More Demo

More Ecosystem
HBase: NoSQL database built on HDFS
Cassandra: wide-column NoSQL store
Sqoop: bridge between relational databases (RDBMS) and HDFS
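HBase, modeled on Google's Bigtable, keeps each table's rows sorted by row key; a cell is addressed by row key, column (family plus qualifier), and timestamp, and several versions of a cell can be kept. The toy model below sketches that data model in memory. `WideTable` and its `put`/`get`/`scan` methods are hypothetical and are not the HBase client API.

```javascript
// Toy in-memory model of a Bigtable/HBase-style wide-column table:
// row key -> "family:qualifier" -> versions, newest first.
// WideTable and its methods are hypothetical, not the HBase client API.

function WideTable(maxVersions) {
    this.rows = {};
    this.maxVersions = maxVersions; // versions retained per cell
}

// Write one cell version; older versions beyond maxVersions are dropped.
WideTable.prototype.put = function (rowKey, column, timestamp, value) {
    var row = this.rows[rowKey] || (this.rows[rowKey] = {});
    var versions = row[column] || (row[column] = []);
    versions.push({ ts: timestamp, value: value });
    versions.sort(function (a, b) { return b.ts - a.ts; }); // newest first
    if (versions.length > this.maxVersions) versions.length = this.maxVersions;
};

// Return the newest value of one cell, or undefined if it was never written.
WideTable.prototype.get = function (rowKey, column) {
    var row = this.rows[rowKey];
    var versions = row && row[column];
    return versions && versions.length ? versions[0].value : undefined;
};

// Return rows with startRow <= key < stopRow, in sorted row-key order.
WideTable.prototype.scan = function (startRow, stopRow) {
    var self = this;
    return Object.keys(this.rows).sort()
        .filter(function (k) { return k >= startRow && k < stopRow; })
        .map(function (k) { return { row: k, columns: self.rows[k] }; });
};

var t = new WideTable(3);
t.put("user#001", "info:name", 1, "Ann");
t.put("user#001", "info:name", 2, "Anne");
t.put("user#002", "info:name", 1, "Bob");
t.put("user#001", "clicks:/home", 5, "1");
console.log(t.get("user#001", "info:name")); // Anne
console.log(t.scan("user#000", "user#002").map(function (r) { return r.row; })); // [ 'user#001' ]
```

Different rows can carry different columns (only `user#001` has a `clicks:` column here), which is what "wide-column" refers to; sorted row keys are what make range scans cheap.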

And More
Flume: distributed log collection and aggregation into HDFS
Scribe: another log aggregator
Chukwa: log processing platform

And Some More
ZooKeeper: distributed system coordinator
Oozie: workflow engine
Avro: data serialization system
Ganglia: distributed monitoring system

We’re Not Done Yet!
Mahout: machine learning library
Pegasus: graph mining system
CloudBurst: genome sequence mapping

And It’s Just One Piece of the Big Data Pie
[Diagram: Microsoft’s big data solution]
Familiar end-user tools: Power View, Excel with PowerPivot, predictive analytics, embedded BI
BI platform: SSAS, SSRS
Microsoft SQL Server / PDW, with connectors to Hadoop on Windows Azure and Hadoop on Windows Server
Unstructured and structured data: sensors, devices, bots, crawlers, ERP, CRM, LOB

I meant what I said, and I said what I meant.
An elephant’s faithful, one hundred percent.