Cloudera & Hadoop Use Cases
Rob Lancaster | Omer Trajman
"Big Data"... Applications From Enterprises to Individuals


The ‘Big Data’ Phenomenon
Big Data Drivers:
- The proliferation of data capture and creation technologies
- Increased “interconnectedness” drives consumption (creating more data)
- Inexpensive storage makes it possible to keep more, longer
- Innovative software and analysis tools turn data into information
Big Data encompasses not only the content itself, but how it’s consumed.
(Cycle diagram: More Devices, More Consumption, More Content, New & Better Information)
- Every gigabyte of stored content can generate a petabyte or more of transient data*
- The information about you is much greater than the information you create
*Source: IDC 2011
©2011 Cloudera, Inc. All Rights Reserved.

Big Data Challenges: It’s not just about “big”
- Cost-effectively managing the volume, velocity and variety of data
- Deriving value across structured and unstructured data
- Adapting to context changes and integrating new data sources and types

Common Challenges
1. Network Analysis and Sessionization
2. Content Optimization and Engagement Modeling
3. Usage Analysis and Mediation
4. Entity Surveillance and Signal Monitoring
5. Recommendations and Modeling
6. Loyalty, Promotion Analysis and Targeting
7. Fraud Analysis, Reconciliation and Risk
8. Time Series Analysis, Mapping and Modeling

What is Apache Hadoop?
Apache Hadoop is a platform for data storage and processing that is scalable, fault tolerant and open source.
Core Hadoop components: Hadoop Distributed File System (HDFS) and MapReduce.
- Consolidates mixed data: complex and relational data into a single repository
- Stores inexpensively: keep raw data always available
- Processes at the source: eliminate ETL bottlenecks; mine data first, govern later
Confidential. Reproduction or redistribution without written permission is prohibited.
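
MapReduce, the processing half of core Hadoop, splits a job into a map step that emits key/value pairs, a shuffle that the framework performs to group those pairs by key, and a reduce step that aggregates each group. The sketch below simulates the three phases for a word count in a single Python process. It is illustrative only and uses none of Hadoop's APIs; on a real cluster the input would be split across HDFS blocks and many map and reduce tasks would run in parallel.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group values by key (Hadoop does this between map and reduce).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate every value that shares a key.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    # Chain the phases; a cluster would run many map and reduce tasks in parallel.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `run_job(["raw data", "data first"])` returns `{'raw': 1, 'data': 2, 'first': 1}`.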

Cloudera in Production
Data sources: Logs, Files, Web Data, Relational Databases
Cloudera’s Distribution Including Apache Hadoop (CDH) & SCM Express
Cloudera Enterprise: Cloudera Management Suite, Cloudera Support
Services: Consulting Services, Cloudera University
Connected tools and systems: IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Operational Rules Engines, Management Tools, Web Application
Users: Operators, Engineers, Analysts, Business Users, Customers

What Can Hadoop Do For You?
Two Core Use Cases Applied Across Industries

Industry        Advanced Analytics               Data Processing
Web             Social Network Analysis          Clickstream Sessionization
Media           Content Optimization             Engagement
Telco           Network Analytics                Mediation
Retail          Loyalty & Promotions Analysis    Data Factory
Financial       Fraud Analysis                   Trade Reconciliation
Federal         Entity Analysis                  SIGINT
Bioinformatics  Genome Mapping                   Sequencing Analysis
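
Clickstream sessionization, listed above as a data-processing use case, groups each user's page views into visits. A common rule, assumed here rather than taken from the slides, starts a new session whenever a user has been inactive for longer than a fixed timeout. In MapReduce the map step would key clicks by user ID and the reducer would walk each user's clicks in time order; the hypothetical `sessionize` function below performs that walk in memory.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter

SESSION_TIMEOUT_S = 30 * 60  # assumed inactivity gap (30 minutes), not from the slides


@dataclass(frozen=True)
class Click:
    user_id: str
    timestamp: int  # seconds since the epoch
    url: str


def sessionize(clicks, timeout=SESSION_TIMEOUT_S):
    """Return {user_id: [[Click, ...], ...]} with one inner list per session."""
    # Sorting by (user, time) plays the role of the shuffle plus a secondary sort.
    ordered = sorted(clicks, key=attrgetter("user_id", "timestamp"))
    sessions = {}
    for user, user_clicks in groupby(ordered, key=attrgetter("user_id")):
        user_sessions = []
        for click in user_clicks:
            gap_ok = user_sessions and click.timestamp - user_sessions[-1][-1].timestamp <= timeout
            if gap_ok:
                user_sessions[-1].append(click)  # continue the current visit
            else:
                user_sessions.append([click])  # inactivity gap: open a new visit
        sessions[user] = user_sessions
    return sessions
```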

Genomics
- Cost of DNA sequencing is falling very fast
- Raw data needs to be aligned and matched
- Scientists want to collect and analyze these sequences
Hadoop Can Read the Native Format
- hadoop-bam: Java library for manipulation of Binary Alignment/Map (BAM) files
- Alignment, SNP discovery, genotyping
Genomic Tools Based on Hadoop
- SEAL: distributed short read alignment
- BlastReduce: parallel read mapping
- Crossbow: whole genome re-sequencing analysis
- CloudBurst: sensitive MapReduce alignment
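
Seed-and-extend aligners, CloudBurst among them, first find short exact matches (seeds) shared by a read and the reference, then extend those seeds into full alignments. The sketch below shows only a MapReduce-style seed step and is not CloudBurst's code. It assumes sequences are plain strings, uses a hypothetical seed length `K = 4` (far shorter than real aligners use), and omits extension and scoring. Each hit records the reference offset at which the read would start if the seed lies on a true alignment.

```python
from collections import defaultdict

K = 4  # hypothetical seed length, chosen small for illustration


def kmers(seq, k=K):
    # Every length-k substring with its start position.
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i


def map_reference(ref_id, seq):
    for kmer, pos in kmers(seq):
        yield kmer, ("ref", ref_id, pos)


def map_read(read_id, seq):
    for kmer, pos in kmers(seq):
        yield kmer, ("read", read_id, pos)


def reduce_seeds(kmer, values):
    # All reference and read occurrences of one k-mer arrive at the same reducer.
    refs = [v for v in values if v[0] == "ref"]
    reads = [v for v in values if v[0] == "read"]
    for _, read_id, read_pos in reads:
        for _, ref_id, ref_pos in refs:
            # Candidate start of the read on the reference.
            yield read_id, ref_id, ref_pos - read_pos


def find_candidates(reference, reads):
    groups = defaultdict(list)  # in-memory stand-in for the shuffle
    for ref_id, seq in reference.items():
        for key, value in map_reference(ref_id, seq):
            groups[key].append(value)
    for read_id, seq in reads.items():
        for key, value in map_read(read_id, seq):
            groups[key].append(value)
    hits = set()
    for kmer, values in groups.items():
        hits.update(reduce_seeds(kmer, values))
    return hits
```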

Biodiversity Indexing
Consolidation and serving of biological data
- Provide free and open access to biodiversity data
- Collection, search, discovery and access to a variety of data
Data matching and cleansing
- Geography, water/land mapping
- Dictionaries and taxonomic services
- Data is harvested into multiple relational databases (RDBMSs)
- Sqoop to Hadoop for processing workflows and index generation
- Sqoop back to MySQL for Web app serving
- Future development is to crawl into and serve from HBase
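
The slide names index generation as one of the Hadoop processing workflows but does not describe the index itself. Assuming a simple inverted index from search terms to record IDs, a MapReduce job would map each record to (term, record ID) pairs and reduce each term's pairs into a sorted posting list, as this in-memory sketch does. The record IDs and text used in the example are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z]+")  # assumed tokenizer: runs of ASCII letters


def map_record(record_id, text):
    # Emit each distinct term once per record.
    for term in set(TOKEN.findall(text.lower())):
        yield term, record_id


def reduce_postings(term, record_ids):
    # Posting list: sorted, de-duplicated record IDs for one term.
    return term, sorted(set(record_ids))


def build_index(records):
    grouped = defaultdict(list)  # stand-in for the shuffle
    for record_id, text in records.items():
        for term, rid in map_record(record_id, text):
            grouped[term].append(rid)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())
```

For example, with hypothetical occurrence records `{"occ-1": "Puma concolor, Chile", "occ-3": "Puma yagouaroundi"}`, the entry for `"puma"` is `["occ-1", "occ-3"]`.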

Processing Seismic Data
- Optimize the IO-intensive phases of seismic processing
- Incorporate additional parallelism where it makes sense
- Simplify gather/transpose operations with MapReduce
Seismic Unix for Core Algorithms
- Well known and used at many graduate programs in geophysics
- SU file format can be easily transformed for processing on HDFS
Hadoop Streaming
- Seismic Unix, SEPlib, Javaseis: non-Java code in MapReduce
- Framework is aware of parameter files needed by SU commands
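
Hadoop Streaming runs any executable as a mapper or reducer: the program reads lines from standard input, writes tab-separated key/value lines to standard output, and the framework sorts by key between the two stages. The sketch below applies that contract to a gather/transpose, re-keying traces from shot order to receiver order. It assumes a hypothetical text format of `shot_id<TAB>receiver_id<TAB>comma-separated samples`; real SU traces are binary and would need converting first.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Re-key each trace by receiver so the shuffle builds receiver gathers."""
    for line in lines:
        shot_id, receiver_id, samples = line.rstrip("\n").split("\t")
        yield f"{receiver_id}\t{shot_id}\t{samples}"


def reducer(lines):
    """Input arrives sorted by key; emit one receiver gather per key."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for receiver_id, group in groupby(rows, key=lambda row: row[0]):
        # Streaming does not order values within a key, so sort traces here.
        traces = sorted((shot_id, samples) for _, shot_id, samples in group)
        yield receiver_id + "\t" + "|".join(f"{s}:{v}" for s, v in traces)


if __name__ == "__main__" and len(sys.argv) == 2:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The script (hypothetically named `transpose.py`) would be passed to the streaming jar's `-mapper` and `-reducer` options as `transpose.py map` and `transpose.py reduce`. Locally, `cat traces.txt | python transpose.py map | sort | python transpose.py reduce` roughly approximates the same pipeline, with `sort` standing in for the shuffle.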

Targeted Offers
The checkout lane is everywhere
- Cookies track users through ad impressions
- Purchasing behavior is time sensitive
- Logs are collected from on-site and off-site browsing
- Data is ingested incrementally
- Processing happens at a variety of time scales
- Data is logged to HBase as the primary store
- Some events naturally associate, others require deeper analysis
- Random access is useful for debugging algorithms
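
The slide notes that some events associate naturally while others need deeper analysis. One simple association, assumed here for illustration, links each purchase to the most recent ad impression seen by the same cookie within a look-back window. The sketch keeps events in memory rather than reading them from HBase, and the seven-day window is a hypothetical parameter.

```python
from bisect import bisect_right
from collections import defaultdict

ATTRIBUTION_WINDOW_S = 7 * 24 * 3600  # hypothetical look-back window (7 days)


def attribute(impressions, purchases, window=ATTRIBUTION_WINDOW_S):
    """Link each purchase to the latest earlier impression for the same cookie.

    impressions: iterable of (cookie_id, timestamp, ad_id)
    purchases:   iterable of (cookie_id, timestamp, order_id)
    Returns {order_id: ad_id}, with None when no impression falls in the window.
    """
    by_cookie = defaultdict(list)
    for cookie, ts, ad in impressions:
        by_cookie[cookie].append((ts, ad))

    # Per-cookie timelines sorted by time, split into parallel lists for bisect.
    timelines = {}
    for cookie, events in by_cookie.items():
        events.sort()
        timelines[cookie] = ([t for t, _ in events], [a for _, a in events])

    result = {}
    for cookie, ts, order in purchases:
        times, ads = timelines.get(cookie, ([], []))
        i = bisect_right(times, ts)  # number of impressions at or before ts
        if i and ts - times[i - 1] <= window:
            result[order] = ads[i - 1]
        else:
            result[order] = None
    return result
```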

Recommendations and Forecasting
Collect and serve personalization information
- Wide variety of constantly changing data sources
- Data guaranteed to be messy
Data ingestion includes collection of raw data
- Filtering and fixing of poorly formatted data
- Normalization and matching across data sources
Analysis looks for reliable attributes and groupings
- Interpretation (e.g. gender by name)
- Aggregation across likely matching identifiers
- Identify possible predicted attributes or preferences
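
The ingestion and analysis steps on this slide (filter poorly formatted data, normalize, match across sources, aggregate across likely matching identifiers) can be sketched in a few lines. The sketch assumes, hypothetically, that a trimmed, lower-cased e-mail address is the matching identifier and that records without a valid-looking address are discarded; real matching across sources is usually fuzzier, and the interpretation step is not shown.

```python
import re
from collections import defaultdict

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose, assumed validity check


def normalize_email(raw):
    """Hypothetical matching key: trimmed, lower-cased e-mail, or None if invalid."""
    email = (raw or "").strip().lower()
    return email if EMAIL.fullmatch(email) else None


def merge_profiles(records):
    """records: iterable of dicts with an 'email' key plus arbitrary attributes.

    Drops records whose e-mail cannot be normalized, then unions the
    attributes of all records that share a normalized e-mail.
    """
    profiles = defaultdict(lambda: defaultdict(set))
    for record in records:
        key = normalize_email(record.get("email"))
        if key is None:
            continue  # poorly formatted beyond repair: filter it out
        for field, value in record.items():
            if field != "email" and value:
                profiles[key][field].add(str(value).strip())  # normalize values
    return {k: {f: sorted(v) for f, v in attrs.items()} for k, attrs in profiles.items()}
```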

Who is Cloudera?
The #1 commercial and non-commercial Apache Hadoop distribution.
- Helps organizations profit from all their data
- Largest contributor to the Hadoop ecosystem
- Provides the most widely used open source distribution
- Develops the most sophisticated Hadoop operations software
- Supports mission-critical Hadoop clusters
- Trained the largest number of Hadoop developers and administrators
Complete, Integrated Hadoop Stack
- Coordination: Apache ZooKeeper
- Data Integration: Apache Flume, Apache Sqoop
- Fast Read/Write Access: Apache HBase
- Languages / Compilers: Apache Pig, Apache Hive
- Workflow / Scheduling: Apache Oozie
- Metadata: Apache Hive
- File System Mount: FUSE-DFS
- UI Framework: HUE
- SDK: HUE SDK

Cloudera helps you profit from all your data.
cloudera.com | +1 (888) | twitter.com/cloudera | facebook.com/cloudera | Get Hadoop