Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM.

Presentation transcript:

Data volume grows from megabytes (ERP) to gigabytes (CRM), terabytes (WEB), and petabytes (BIG DATA):
- ERP: purchase detail, purchase record, payment record
- CRM: offer details, support contacts, customer touches, segmentation
- WEB: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
- BIG DATA: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, audio, images, speech to text, product/service logs, social interactions & feeds, business data feeds, user click stream, sensors / RFID / devices, spatial & GPS coordinates
Increasing data variety and complexity: Transactions + Interactions + Observations = BIG DATA

Sources: existing sources (CRM, ERP, clickstream, logs)
Data systems (repositories): RDBMS, EDW, MPP
Applications: business analytics, custom applications, packaged applications
Data sources, old and new: OLTP, ERP, CRM systems; unstructured documents, emails; clickstream; server logs; sentiment, web data; sensor, machine data; geolocation
Data growth (source: IDC): 2.8 ZB in …; …% from new data types; 15x machine data by …; … ZB by 2020

Sources: OLTP, ERP, CRM systems; documents, emails; web logs, click streams; social networks; machine-generated data; sensor data; geolocation data
Data systems (repositories): RDBMS, EDW, MPP, alongside a platform providing data management, data access, governance & integration, security, and operations
Applications: business analytics, custom applications, packaged applications
Operations tools: provision, manage & monitor
Dev & data tools: build & test

Scale and scope: new analytic apps, new types of data, LOB-driven

Scale and scope: a modern data architecture / data lake
- Drivers: new analytic apps, new types of data, LOB-driven
- Alongside RDBMS, MPP, and EDW: data management, data access, governance & integration, security, operations
Data lake: an architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale.

HDP 2.1: Hortonworks Data Platform (HDP)
- Governance & integration: data workflow, lifecycle & governance (Falcon, Sqoop, Flume, WebHDFS)
- Data management: HDFS (Hadoop Distributed File System); YARN: Data Operating System
- Data access: batch (MapReduce), script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase), stream (Storm), search (Solr), others (in-memory analytics, ISV engines)
- Security: authentication, authorization, accounting, data protection. Storage: HDFS; resources: YARN; access: Hive, …; pipeline: Falcon; cluster: Knox
- Operations: provision, manage & monitor (Ambari, SCOM); Zookeeper; scheduling (Oozie)
- Deployment choice: Linux, Windows, on-premise, cloud
Completely open: the only completely open distribution for Apache Hadoop. Fundamentally versatile: comprehensive enterprise capabilities. Wholly integrated: deep ecosystem interoperability.

HDP certifies the most recent and stable community innovation.
Hortonworks Data Platform releases: HDP 1.3 (May …), HDP 2.0 (October 2013), HDP 2.1 (April 2014)
Components: Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Solr, Falcon
Capability areas: data management, data access, governance & integration, security, operations

Sources, applications, operational tools, dev & data tools, and infrastructure around a data system that includes HDInsight (Azure). New! Power BI

Traditional database vs. Hadoop platform as SCALE (storage & processing) grows (EDW, MPP analytics, and NoSQL also appear on the scale axis):
Attribute: traditional database | Hadoop platform
- Schema: required on write | required on read
- Speed: reads are fast | writes are fast
- Governance: standards and structured | loosely structured
- Processing: limited, no data processing | processing coupled with data
- Data types: structured | multi and unstructured
- Best fit use: interactive OLAP analytics, complex ACID transactions, operational data store | data discovery, processing unstructured data, massive storage/processing
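The schema row is the core difference between the two columns. A minimal sketch, assuming a hypothetical three-column schema and CSV-formatted raw records (none of these names come from the slides): the schema-on-write table validates every row at insert time, while the schema-on-read store accepts any text and only applies the schema when the data is queried.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> type converter
SCHEMA = {"id": int, "state": str, "price": float}


class SchemaOnWriteTable:
    """Traditional-database style: rows are validated and typed when written."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def insert(self, raw):
        # A missing field or a failed conversion rejects the write immediately
        row = {col: conv(raw[col]) for col, conv in self.schema.items()}
        self.rows.append(row)

    def scan(self):
        # Reads are fast: the data is already parsed and typed
        return list(self.rows)


class SchemaOnReadStore:
    """Hadoop style: raw text is stored as-is and parsed only when read."""

    def __init__(self):
        self.raw_lines = []

    def append(self, line):
        # Writes are fast: no validation at all
        self.raw_lines.append(line)

    def read(self, schema, fieldnames):
        rows, bad = [], 0
        reader = csv.DictReader(io.StringIO("\n".join(self.raw_lines)), fieldnames=fieldnames)
        for rec in reader:
            try:
                rows.append({col: conv(rec[col]) for col, conv in schema.items()})
            except (TypeError, ValueError):
                bad += 1  # malformed records only surface at read time
        return rows, bad
```

With the same malformed record, `SchemaOnWriteTable.insert` raises `ValueError` up front, whereas `SchemaOnReadStore.read` keeps the good rows and reports one bad record.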

All offerings are co-engineered by Hortonworks and Microsoft. Enjoy seamless interoperability across on-premises and cloud.

Data management: YARN (Data Operating System) over HDFS (Hadoop Distributed File System)
Data access engines on YARN: batch (MapReduce), script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), stream (Storm), search (Solr), others (in-memory analytics, ISV engines)

1st gen of Hadoop: a single-use system for batch apps
- HDFS (redundant, reliable storage)
- MapReduce (cluster resource management & data processing)
- Classic Hadoop apps on top: Hive, Pig, others…
2nd gen of Hadoop: a multi-use data platform for batch, interactive, online, streaming, …
- Redundant, reliable storage (HDFS)
- Efficient cluster resource management & shared services (YARN)
- Flexible data processing: batch (MapReduce), batch & interactive (Tez), online data processing (HBase, Accumulo), stream processing (Storm), others…

The ResourceManager (with its Scheduler) runs different workloads side by side in containers on the NodeManagers:
- Batch: map 1.1, map 1.2, reduce 1.1
- Interactive SQL: Tez vertices
- Real-time: Storm nimbus 0, nimbus 1, nimbus 2
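The point of this slide is that one scheduler hands out containers to batch, interactive, and real-time applications alike. The sketch below is a deliberately simplified model, not YARN's actual Capacity or Fair Scheduler: it assumes containers are sized only by memory and uses a hypothetical "most free memory" placement rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NodeManager:
    """One worker node with a fixed amount of container memory (hypothetical model)."""
    name: str
    capacity_mb: int
    used_mb: int = 0
    containers: List[Tuple[str, str]] = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


@dataclass
class ResourceManager:
    """Central scheduler that grants containers to any type of application."""
    nodes: List[NodeManager]

    def allocate(self, app: str, task: str, memory_mb: int) -> Optional[str]:
        # Consider only nodes that can fit the request
        candidates = [n for n in self.nodes if n.free_mb() >= memory_mb]
        if not candidates:
            return None  # the request stays pending until resources are released
        # Place the container on the node with the most free memory to spread load
        node = max(candidates, key=lambda n: n.free_mb())
        node.used_mb += memory_mb
        node.containers.append((app, task))
        return node.name
```

On two 4096 MB nodes, a MapReduce map task, a second map task, a Tez vertex, and a Storm nimbus (2048 MB each) fill the cluster, and a further reduce request returns `None` until a container finishes.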

Business analytics and custom apps use Apache Hive (SQL), which runs on Apache Tez (alongside Apache MapReduce) over Apache YARN and HDFS (Hadoop Distributed File System).
Apache Hive contribution… an open community at its finest:
- 1,672 Jira tickets closed
- 145 developers
- 44 companies
- ~390,000 lines of code added (2x)
- 13 months

Tez replaces MapReduce as the primitive for Hive, Pig, etc. A Tez task is built from a pluggable Input, Processor, and Output.
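The task model above can be sketched as three small interfaces composed into a task. The real Tez API is Java; the Python class names here (`Input`, `Processor`, `Output`, `Task`, and the word-count processors) are hypothetical and only illustrate the composition: any Input can be paired with any Processor and Output, and the output of one task can feed the next task directly in memory.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Input(ABC):
    """Pluggable source of records for a task (hypothetical interface)."""

    @abstractmethod
    def read(self) -> Iterator: ...


class Processor(ABC):
    """Pluggable logic that transforms input records into output records."""

    @abstractmethod
    def run(self, records: Iterable) -> Iterable: ...


class Output(ABC):
    """Pluggable sink for the processed records."""

    @abstractmethod
    def write(self, records: Iterable) -> None: ...


class InMemoryInput(Input):
    def __init__(self, records: List):
        self.records = records

    def read(self) -> Iterator:
        return iter(self.records)


class InMemoryOutput(Output):
    def __init__(self):
        self.records: List = []

    def write(self, records: Iterable) -> None:
        self.records.extend(records)


class TokenizeProcessor(Processor):
    """Splits lines into lowercase words, like the map side of a word count."""

    def run(self, records: Iterable) -> Iterable:
        return (word.lower() for line in records for word in line.split())


class CountProcessor(Processor):
    """Counts each word, like the reduce side of a word count."""

    def run(self, records: Iterable) -> Iterable:
        counts = {}
        for word in records:
            counts[word] = counts.get(word, 0) + 1
        return sorted(counts.items())


class Task:
    """A task is only the composition Input -> Processor -> Output."""

    def __init__(self, inp: Input, proc: Processor, out: Output):
        self.inp, self.proc, self.out = inp, proc, out

    def run(self) -> None:
        self.out.write(self.proc.run(self.inp.read()))
```

Running a `TokenizeProcessor` task over `["Hive on Tez", "Pig on Tez"]` and feeding its output records into a `CountProcessor` task yields `[("hive", 1), ("on", 2), ("pig", 1), ("tez", 2)]`.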

Hive on MapReduce vs. Hive on Tez, for the query:
SELECT a.state, COUNT(*), AVG(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state
- Hive on MapReduce: the operators (SELECT a.state; SELECT b.id; SELECT c.price; JOIN(a, b); JOIN(a, c); GROUP BY a.state, COUNT(*), AVG(c.price)) run as a chain of MapReduce jobs (M and R stages), each writing its intermediate output to HDFS.
- Hive on Tez: the same operators (SELECT a.state, c.itemId; SELECT b.id; JOIN(a, b); JOIN(a, c); GROUP BY a.state, COUNT(*), AVG(c.price)) run as a single DAG of M and R stages.
Tez avoids unneeded writes to HDFS.
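The query can be traced on toy data to see what each plan computes and where the extra HDFS writes come from. This is a simplified model under stated assumptions: the tables and rows are hypothetical, the plan is reduced to three stages (join a with b, join with c, group by state), and a JSON round-trip stands in for writing an intermediate result to HDFS between MapReduce jobs.

```python
import json
from collections import defaultdict

# Hypothetical toy tables
A = [{"id": 1, "state": "CA", "itemId": 10},
     {"id": 2, "state": "CA", "itemId": 11},
     {"id": 3, "state": "NY", "itemId": 10},
     {"id": 4, "state": "TX", "itemId": 11}]  # id 4 has no match in b
B = [{"id": 1}, {"id": 2}, {"id": 3}]
C = [{"itemId": 10, "price": 4.0}, {"itemId": 11, "price": 8.0}]


def hash_join(left, right, lkey, rkey):
    """Inner join: index the right side by key, then probe with each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    return [{**l, **r} for l in left for r in index[l[lkey]]]


def group_by_state(rows):
    """GROUP BY a.state with COUNT(*) and AVG(c.price)."""
    acc = defaultdict(lambda: [0, 0.0])
    for row in rows:
        acc[row["state"]][0] += 1
        acc[row["state"]][1] += row["price"]
    return {s: (n, total / n) for s, (n, total) in sorted(acc.items())}


STAGES = [
    lambda _: hash_join(A, B, "id", "id"),            # JOIN(a, b)
    lambda rows: hash_join(rows, C, "itemId", "itemId"),  # JOIN(a, c)
    group_by_state,                                   # GROUP BY a.state
]


def run(plan):
    """Run all stages; the 'mr' plan materializes every intermediate result."""
    hdfs_writes, data = 0, None
    for i, stage in enumerate(STAGES):
        data = stage(data)
        if plan == "mr" and i < len(STAGES) - 1:
            data = json.loads(json.dumps(data))  # stand-in for an HDFS write + read
            hdfs_writes += 1
    return data, hdfs_writes
```

Both `run("mr")` and `run("tez")` return `{"CA": (2, 6.0), "NY": (1, 4.0)}`; the difference is that the MapReduce-style plan performs two intermediate writes while the DAG-style plan performs none.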

SQL compliance: Hive provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop (progression shown across Hive 0.11, Hive 0.12 (HDP 2.0), and Hive 0.13 (HDP 2.1)).
Hive SQL datatypes: INT; TINYINT/SMALLINT/BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR
Hive SQL semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; inner, outer, cross and semi joins; sub-queries in FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL); sub-queries for IN/NOT IN, HAVING; expanded JOIN syntax; INTERSECT / EXCEPT (not yet available as of Hive 0.13)

- Disaster recovery and backup between environments
- Publishing data between environments for discovery
Topologies: site to site, site to cloud

Define sophisticated retention policies to simplify data retention for audit, compliance, or data re-processing:
- Staged data: retain 5 years
- Cleansed data: retain 3 years
- Conformed data: retain 3 years
- Presented data: retain last copy only
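The retention table above maps directly onto an eviction rule per dataset. The following is a sketch of that rule only, not Falcon's feed definition format: the dataset keys, the `instances_to_evict` function, and the representation of dataset instances as dates are all assumptions for illustration.

```python
from datetime import date

# Retention periods from the slide, in years; None means "retain last copy only"
RETENTION_YEARS = {"staged": 5, "cleansed": 3, "conformed": 3, "presented": None}


def years_before(d, years):
    """Same calendar day `years` earlier, clamping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def instances_to_evict(dataset, instance_dates, today):
    """Return the instance dates that fall outside the dataset's retention policy."""
    if not instance_dates:
        return []
    years = RETENTION_YEARS[dataset]
    if years is None:
        # Keep only the most recent copy
        latest = max(instance_dates)
        return sorted(d for d in instance_dates if d != latest)
    cutoff = years_before(today, years)
    return sorted(d for d in instance_dates if d < cutoff)
```

With instances from 2008, 2010, 2012, and May 2014 evaluated on 1 June 2014, staged data evicts only the 2008 instance, cleansed data evicts 2008 and 2010, and presented data evicts everything except the May 2014 copy.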

MapReduce indexing job over data in HDFS (Hadoop Distributed File System)
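An indexing job fits the MapReduce shape naturally: mappers emit (term, document) pairs and reducers assemble a posting list per term. The sketch below runs that map, shuffle, and reduce flow in memory on hypothetical documents; it shows the shape of building an inverted index, not the actual Solr indexing job, whose output is a search index rather than a Python dict.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (term, doc_id) once for every distinct term in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Reducer: turn the grouped document ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} mapping."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(t, ids) for t, ids in shuffle(mapped).items())
```

For `{"d1": "Hadoop stores data", "d2": "Solr indexes data"}`, the term `data` maps to `["d1", "d2"]` and `hadoop` maps to `["d1"]`.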

Knox Gateway: a stateless reverse proxy instance deployed in the DMZ, between an outer and an inner firewall.
- Clients: REST client, JDBC client, browser
- Identity providers: enterprise identity provider (LDAP/AD)
- Requests are streamed through the gateway to Hadoop services after authentication.
- URLs are rewritten to refer to the gateway.
- Behind the inner firewall: HDP Hadoop cluster 1 and cluster 2, each with masters (JT, NN, WebHCat, Oozie, YARN, HBase, Hive) and workers (DN, TT).
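URL rewriting is what keeps internal hostnames out of client hands. A minimal sketch, assuming a hypothetical gateway host, a hypothetical internal NameNode address, and a single WebHDFS route; Knox gateways commonly expose services as `https://<gateway-host>:8443/gateway/<topology>/webhdfs/v1/...`, where the topology name selects the cluster, but a real deployment's routes come from its own topology configuration, not from a table like this one.

```python
from urllib.parse import urlsplit

# Hypothetical gateway address (8443 and /gateway are common Knox defaults)
GATEWAY = "https://knox.example.com:8443/gateway"

# Hypothetical routing table: internal (host, port, path prefix) -> service path on the gateway
SERVICE_ROUTES = {
    ("namenode.internal", 50070, "/webhdfs/v1"): "webhdfs/v1",
}


def rewrite_to_gateway(internal_url: str, topology: str) -> str:
    """Rewrite an internal cluster URL so that it points at the gateway instead."""
    parts = urlsplit(internal_url)
    for (host, port, prefix), service in SERVICE_ROUTES.items():
        if parts.hostname == host and parts.port == port and parts.path.startswith(prefix):
            rest = parts.path[len(prefix):]
            query = f"?{parts.query}" if parts.query else ""
            # The topology name selects which cluster behind the gateway serves the request
            return f"{GATEWAY}/{topology}/{service}{rest}{query}"
    raise ValueError(f"no gateway route for {internal_url}")
```

For example, `http://namenode.internal:50070/webhdfs/v1/tmp/file.txt?op=OPEN` for topology `cluster1` becomes `https://knox.example.com:8443/gateway/cluster1/webhdfs/v1/tmp/file.txt?op=OPEN`.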

Ambari: deploy, manage, monitor. Ambari Web and the REST APIs sit on top of the Ambari Server, which provisions, manages, and monitors the cluster's compute & storage.
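Because Ambari Web is itself a client of the REST APIs, scripts can use the same endpoints. A sketch that only builds the request, assuming a hypothetical host and the default `admin` credentials; Ambari Server listens on port 8080 by default, serves its API under `/api/v1/`, uses HTTP basic authentication, and expects an `X-Requested-By` header on modifying requests (POST, PUT, DELETE).

```python
import base64
import urllib.request

# Hypothetical Ambari Server address (8080 is the default port)
AMBARI = "http://ambari.example.com:8080"


def ambari_request(path, user, password, method="GET", body=None):
    """Build (but do not send) an authenticated request to the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{AMBARI}/api/v1/{path.lstrip('/')}", data=body, method=method)
    req.add_header("Authorization", f"Basic {token}")
    # Required by Ambari on modifying requests; harmless on GET
    req.add_header("X-Requested-By", "ambari")
    return req
```

Passing `ambari_request("clusters", "admin", "admin")` to `urllib.request.urlopen` against a live server would return a JSON document listing the managed clusters.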

Ambari SCOM Management Pack:
- Hadoop: storage & processing at scale
- Ambari SCOM Server aggregates and exposes Hadoop metrics
- Ambari SCOM monitors health and alerts in case of problems