1 The Future of Data Management with Hadoop and the Enterprise Data Hub
  Alexander Bibighaus | Director of Engineering
  © Cloudera, Inc. All rights reserved.
2 Big Data is revolutionizing how businesses think
  Industrial Revolution
  Data Revolution
3 Improve Products & Services Efficiency
  Helped subscribers in 4+ million homes save over $320 million on energy bills
  Combined diverse data sets, including streaming utility & sensor data, in Cloudera Enterprise
  Improved usage insights help engage customers, resulting in changes in energy usage
4 Big Data is pervasive
  MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
  ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
  HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
  FINANCIAL SERVICES: Risk & portfolio analysis; New products
  CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
  TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
  RETAIL: Consumer sentiment; Optimized marketing
  EDUCATION & RESEARCH: Experiment sensor analysis
  LIFE SCIENCES: Clinical trials; Genomics
  AUTOMOTIVE: Auto sensors reporting location, problems
  COMMUNICATIONS: Location-based advertising
  HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
  UTILITIES: Smart Meter analysis for network capacity
  OIL & GAS: Drilling exploration sensor analysis
  LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
5 Why is data suddenly big?
  Web/Mobile Clickstream
  Social Media
  Sensor Networks
  Audio, Image & Video
  Video & Voice Processing
  Text Sentiment Analysis
  Social Graph Analysis
6 Big Data is Getting Bigger & More Multi-structured
  trillion gigabytes of data was created in 2011*
  More than 90% is unstructured data
  Data volume doubles every year
  [Chart: GB of Data (in billions), axis 0 to 10,000; series: STRUCTURED DATA, UNSTRUCTURED DATA]
  * Source: IDC
7 The Established Way: Bringing Data to Applications
  Can’t Get a 360 View: Many special-purpose systems; Moving data around; No complete views
  Can’t Retain Valuable Data: Leaving data behind; Risk and compliance; High cost of storage
  Can’t Meet ETL SLAs: Up-front modeling; Transforms slow; Transforms lose data
  Can’t Ask New Questions: Existing systems strained; No agility; “BI backlog”
  [Diagram: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
  Data sources: ERP, CRM, RDBMS, MACHINES | FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS | EXTERNAL DATA SOURCES
  ©2014 Cloudera, Inc. All rights reserved.
8 A modern data architecture is needed to drive success from data
9 The Hadoop Way: Bringing Applications to Data
  Consolidated Architecture: Bring applications to data; Combine different workloads on common data (e.g. SQL + Search); True analytic agility
  1. Active Archive: Full fidelity original data; Indefinite time, any source; Lowest cost storage
  2. Scalable Transformations: One source of data for all analytics; Persist state of transformed data; Significantly faster & cheaper
  3. Agile Exploration: Simple search + BI tools; “Schema on read” agility; Reduce BI user backlog requests
  [Diagram: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
  Data sources: ERP, CRM, RDBMS, MACHINES | FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS | EXTERNAL DATA SOURCES
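  The “schema on read” idea from the Agile Exploration point can be sketched in a few lines of Python. This is an illustrative sketch only, not Cloudera or Hadoop code: the event records, the field names, and the `read_with_schema` helper are all hypothetical. Raw records are stored untouched, and each consumer applies its own schema when it reads them.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (full fidelity).
# No schema is enforced at write time.
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/search", "ms": 340, "query": "hadoop"}',
    '{"user": "a", "page": "/search", "ms": 95, "query": "impala"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` (field name -> type) at read time.

    Fields missing from a record come back as None. Fields not named in the
    schema are ignored for this read but stay in the raw data.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two consumers ask different questions of the same stored data.
latency_view = list(read_with_schema(RAW_EVENTS, {"page": str, "ms": int}))
search_view = [
    row
    for row in read_with_schema(RAW_EVENTS, {"user": str, "query": str})
    if row["query"] is not None
]
```

  Because the raw lines are never rewritten, a new question only needs a new schema passed to the reader, not a reload or remodel of the data.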
10 Hadoop Ecosystem: An Open Platform
  Projects, in the order the slide introduces them (* = CDH supported):
  Core Hadoop* (HDFS, MapReduce)
  Solr*, Pig*
  HBase*, ZooKeeper*
  Hive*, Mahout*
  Sqoop*, Avro*
  Flume*, Bigtop*, Oozie*, HCatalog*, Hue*, YARN*
  Spark*, Tez, Impala*, Kafka*, Drill
  Parquet*, Sentry*
  Knox, Flink
  Kudu*, RecordService*, Ibis*, Falcon
11 By 2017, 60% of big data projects will fail to go beyond the pilot phase
  Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases.
  Source: Gartner, “Predicts 2015: Big Data Challenges Move From Technology to the Organization”, November 2014
12 Big Data and the Technology Adoption Cycle
  According to FirstMark VC, Big Data is entering the Early Majority phase
13 Where does the road go?
  Maturation Focus on AI Applications Specialized Use Case Support
14 In healthcare, IoT can cut the costs of chronic disease treatment by as much as 50 percent
  Source: McKinsey & Co - Customer Journey Analytics & Big Data, 2013
  Source: McKinsey Analysis, The Internet of Things: Mapping the value beyond the hype, June 2015
15 IMPROVE PRODUCTS & SERVICES EFFICIENCY
  End-to-end view of data is helping save lives by detecting sepsis early enough for successful treatment
  Has saved 100s of lives already & reduced hospital readmissions
  Centralized data from many systems available in a secure environment
  2PB+ in multi-tenant environment supporting 100s of clients
16 Thank you!
17 Data is Transforming Business
  DRIVE CUSTOMER INSIGHTS
  IMPROVE PRODUCTS & SERVICES EFFICIENCY
  LOWER BUSINESS RISKS