1 © Cloudera, Inc. All rights reserved. Alexander Bibighaus | Director of Engineering, Cloudera, Inc. The Future of Data Management with Hadoop and the Enterprise Data Hub
3 ©2014 Cloudera, Inc. All rights reserved. Cloudera Snapshot
  Founded: 2008, by former employees of
  Employees Today: 900+
  World Class Support: 24x7 Global Staff; Pro-active & Predictive Support Programs
  Mission Critical: Thousands of Enterprise Users; Over ~600 Paying Subscription Customers
  The Largest Ecosystem: Over Partners
  Cloudera University: Over 100,000 Trained
  Open Source Leaders: Cloudera Employees are Leading Developers & Contributors
  Total Capital Raised: $1B+ (from Intel, Google, Dell, T. Rowe Price, Accel, Greylock)
  Mission: Help Organizations Leverage the Power of All Their Data to Ask Bigger Questions.
4 A Big Data Revolution is happening as we speak. Industrial Revolution; Data Revolution
5 Data Drives Industries
  Industries: Financial Services; Public Sector; Healthcare; Telecommunications; Retail
  Use cases: Optimize network performance; Money laundering detection; Cyber security detection; Product recommendations; Personalized medicine
6 Data Drives Business
  Functions: Sales; Operations; Product; Marketing; Customer Satisfaction
  Outcomes: Increase conversions by 2%; Convert 5% more leads; Reduce fraud by 3%; Reduce churn by 1%; Increase user adoption by 10%
7 Why is Big Data Happening Now?
  Instrumentation: Everything that can be measured will be measured.
  Personalization: Employees and customers expect more personal interactions, but not at the cost of their privacy. The age of “segment of 1”.
  Advanced Analytics: The most innovative companies embrace experimentation, predictive analytics and agility.
8 Data is fueling this opportunity: Web/Mobile Clickstream; Social Media; Sensor Networks; Audio, Image & Video
9 Access to diverse analysis techniques: SQL; Video & Voice Processing; Text Sentiment Analysis; Social Graph Analysis
10 People require analytics. “80% of CEOs cite data mining and analytics as strategically important.” PWC CEO Survey
11 Big Data is Getting Bigger & More Multi-structured
  trillion gigabytes of data was created in 2011*
  More than 90% is unstructured data
  Data volume doubles every year
  [Chart: GB of Data (in billions), axis from 0 to 10,000; series: STRUCTURED DATA, UNSTRUCTURED DATA]
  * Source: IDC
12 Hadoop Changes the Game: Storage & Compute Together
  The Old Way: $30,000+ per TB. Expensive & Unattainable.
    Hard to scale; Network is a bottleneck; Only handles relational data; Difficult to add new fields & data types
    Architecture: Expensive, Special purpose, “Reliable” Servers; Expensive Licensed Software; Network; Data Storage (SAN, NAS); Compute (RDBMS, EDW)
  The Hadoop Way: $300-$1,000 per TB. Affordable & Attainable.
    Scales out forever; No bottlenecks; Easy to ingest any data; Agile data access
    Architecture: Commodity “Unreliable” Servers; Hybrid Open Source Software; Compute (CPU), Memory, Storage (Disk)
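To put the per-terabyte figures on this slide in concrete terms, here is a minimal Python sketch of the arithmetic. The 100 TB dataset size is a hypothetical value chosen only for illustration, and the function and variable names are not from the deck. Because the slide quotes "$30,000+" for the old way, that figure is treated as a lower bound, so the ratios are minimums.

```python
# Per-terabyte figures quoted on the slide (USD).
OLD_WAY_PER_TB = 30_000        # "$30,000+ per TB": a lower bound
HADOOP_PER_TB = (300, 1_000)   # "$300-$1,000 per TB": low and high end


def storage_cost(terabytes: float, usd_per_tb: float) -> float:
    """Total cost of holding `terabytes` at a flat per-TB price."""
    return terabytes * usd_per_tb


def compare(terabytes: float) -> dict:
    """Compare the slide's two price points for a dataset of the given size."""
    old = storage_cost(terabytes, OLD_WAY_PER_TB)
    hadoop_low = storage_cost(terabytes, HADOOP_PER_TB[0])
    hadoop_high = storage_cost(terabytes, HADOOP_PER_TB[1])
    return {
        "old_way_min": old,
        "hadoop_low": hadoop_low,
        "hadoop_high": hadoop_high,
        # How many times cheaper Hadoop is, at each end of its price range.
        "ratio_vs_hadoop_high": old / hadoop_high,
        "ratio_vs_hadoop_low": old / hadoop_low,
    }


# Hypothetical dataset size, not taken from the deck.
result = compare(100)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the old way costs at least 30 times as much as the high end of the Hadoop range. The sketch covers only the quoted per-TB prices and ignores operations, staffing, and migration costs.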
13 The Legacy Way: Bringing Data to Applications
  Can’t Get a 360 View: Many special-purpose systems; Moving data around; No complete views
  Can’t Retain Valuable Data: Leaving data behind; Risk and compliance; High cost of storage
  Can’t Meet ETL SLAs: Up-front modeling; Transforms slow; Transforms lose data
  Can’t Ask New Questions: Existing systems strained; No agility; “BI backlog”
  Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES
14 The Agile Way: Bringing Applications to Data
  Consolidated Architecture: Bring applications to data; Combine different workloads on common data (e.g., SQL + Search); True analytic agility
  1. Active Archive: Full fidelity original data; Indefinite time, any source; Lowest cost storage
  2. Scalable Transformations: One source of data for all analytics; Persist state of transformed data; Significantly faster & cheaper
  3. Agile Exploration: Simple search + BI tools; “Schema on read” agility; Reduce BI user backlog requests
  Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES
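The “schema on read” idea on this slide can be illustrated with a minimal Python sketch. It is not a Hadoop API: it uses only the standard library, and the sample records, field names, and the `read_with_schema` helper are hypothetical. The point is that raw data is stored at full fidelity with no up-front model, and each question applies its own schema when reading.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
# Note that the records do not all carry the same fields.
RAW_LINES = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart", "ms": 340, "referrer": "ad"}',
    '{"user": "c3", "page": "/home"}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps a field name to a (cast, default) pair; fields missing
    from a record fall back to the default instead of failing the load.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# Two different questions, two different schemas, one copy of the raw data.
latency_schema = {"page": (str, ""), "ms": (int, 0)}
referrer_schema = {"user": (str, ""), "referrer": (str, "direct")}

latency_rows = list(read_with_schema(RAW_LINES, latency_schema))
referrer_rows = list(read_with_schema(RAW_LINES, referrer_schema))

print(latency_rows)
print(referrer_rows)
```

A new question, such as one about the `referrer` field, needs only a new schema and no reload or remodeling of the stored data, which is the agility the slide refers to.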
15 Hadoop is more than just Apache Hadoop
  The stack at each successive stage, up to the present (each stage keeps the components before it):
  Core Hadoop (HDFS, MR)
  + HBase, ZooKeeper
  + Hive, Pig, Mahout
  + Sqoop, Whirr, Avro
  + Flume, Bigtop, Oozie, MRUnit, HCatalog
  + Spark, Impala, Solr, Kafka
  + Parquet, Sentry (Present; Core Hadoop + YARN)
16 Cloudera Enterprise powered by Apache Hadoop
  A new kind of data platform: One place for unlimited any-type data; Unified, multi-framework data access
  Key Advantages: High performance; Enterprise system and data management; Secure by default; Open source, Open standards
  Diagram labels: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve
  Deployment Flexibility: On-Premises; Appliances; Engineered Systems; Public Cloud; Private Cloud; Hybrid Cloud
17 Data Drives Travel/Leisure: Customer Segmentation; Marketing Campaign Testing; Regulatory Compliance
18 Data Drives Social
19 Data Drives Manufacturing: Predictive maintenance; Goods classification
20 Data Drives Healthcare: Population Health; Patient Monitoring; Chronic Disease Management
21 Big Data takes on a lot of questions
  MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
  ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
  HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
  FINANCIAL SERVICES: Risk & portfolio analysis; New products
  CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
  TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
  RETAIL: Consumer sentiment; Optimized marketing
  EDUCATION & RESEARCH: Experiment sensor analysis
  LIFE SCIENCES: Clinical trials; Genomics
  AUTOMOTIVE: Auto sensors reporting location, problems
  COMMUNICATIONS: Location-based advertising
  HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
  UTILITIES: Smart Meter analysis for network capacity
  OIL & GAS: Drilling exploration sensor analysis
  LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
22 ©2013 Cloudera, Inc. All rights reserved. Ask Bigger Questions: How do we feed the world? A Fortune 500 company specializing in agriculture and genomics can automate data-driven R&D decisions to reduce time to market from years to months.
23 Thank you! Alexander Bibighaus | Director of Engineering