© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware.

Presentation transcript:


2 Big Data: Not Just for the Web Giants – Now the Intelligent Enterprise

3 Real-time analysis allows instant understanding of market dynamics. Retailers can gain an intimate understanding of their customers' needs and use direct, targeted marketing. Market Segment Analysis → Personalized Customer Targeting

4 The Emerging Pattern of Big Data Systems: Retail Example. Diagram labels: Real-Time Streams; Exa-scale Data Store; Parallel Data Processing; Real-Time Processing; Machine Learning; Data Science; Analytics; Cloud Infrastructure.

5 A single GE jet engine produces 10 terabytes of data in one hour, or about 90 petabytes per year. This data enables early detection of faults and common-mode failures, and provides product engineering feedback. Post Mortem → Proactively Maintained Connected Product
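The yearly figure on this slide can be sanity-checked with a little arithmetic. The sketch below is not from the presentation; it assumes the engine runs continuously all year and that 1 PB = 1000 TB, both of which are illustrative assumptions.

```python
# Back-of-the-envelope check of the slide's data-volume figures.
TB_PER_HOUR = 10             # from the slide
HOURS_PER_YEAR = 24 * 365    # assumption: continuous operation, non-leap year
TB_PER_PB = 1000             # assumption: decimal units

def yearly_petabytes(tb_per_hour: float, hours: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly data rate in TB into a yearly volume in PB."""
    return tb_per_hour * hours / TB_PER_PB

pb_per_year = yearly_petabytes(TB_PER_HOUR)
print(pb_per_year)  # 87.6
```

Under those assumptions the result is 87.6 PB, which is consistent with the slide's rounded figure of 90 PB per year.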

6 Storage: Plan for Peta-scale Data Storage and Processing. Analytics Rapidly Outgrows Traditional Data Size by 100x (chart axis: PB of Data).

7 Cloud Infrastructure Supports Mixed Big Data Workloads. Workloads: Machine Learning, Hadoop, Real-Time Analytics. Cloud infrastructure layers: Management, Network/Security, Storage/Availability, Compute.

8 Cloud Infrastructure Supports Multiple Tenants. Tenants: Web User Analytics, Financial Analysis, Historical Customer Behavior. Cloud infrastructure layers: Management, Network/Security, Storage/Availability, Compute.

9 Software-defined Datacenter: Compute. The Core Values of Virtualization Apply to Big Data:
-Agility / rapid deployment
-Lower capex
-Isolation for resource control and security
-Operational efficiency
Layers: Management, Network/Security, Storage/Availability, Compute.

10 Strong Isolation between Workloads is Key. Diagram labels: Hungry Workload 1; Reckless Workload 2; Nosy Workload 3; Cloud Infrastructure.

11 Consolidation of Workloads: Higher Utilization (Hadoop 1, Hadoop 2, HBase)
Without virtualization, independent Hadoop clusters each have access to only a fraction of the total physical resources.
Consolidate and virtualize:
-Consolidated cluster has access to the entire pool of physical resources
-For common use cases, reduce latency on priority jobs on the consolidated cluster
-Multiple HDFS instances striped across all physical hosts
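A rough way to see why consolidation can reduce latency on a priority job is to compare how many hosts the job can use in each layout. The sketch below is illustrative only: the host counts and job size are hypothetical, and it assumes perfectly linear scaling and that the other workloads are idle while the priority job runs.

```python
# Idealized comparison of a priority job on a siloed vs. a consolidated cluster.
# All numbers are hypothetical; real jobs rarely scale perfectly linearly.

def runtime_hours(work_host_hours: float, hosts: int) -> float:
    """Ideal runtime if the work divides evenly across all available hosts."""
    return work_host_hours / hosts

# Three independent clusters, each owning a fixed slice of the hardware.
silo_hosts = {"Hadoop 1": 10, "Hadoop 2": 10, "HBase": 10}

# After consolidation, the same physical hosts form one shared pool.
pool_hosts = sum(silo_hosts.values())

job_work = 60.0  # host-hours of work in the priority job (hypothetical)

siloed = runtime_hours(job_work, silo_hosts["Hadoop 1"])
consolidated = runtime_hours(job_work, pool_hosts)

print(f"siloed: {siloed} h, consolidated: {consolidated} h")
```

In this idealized case the job finishes in a third of the time because it can borrow the idle capacity of the other two clusters. In practice the gain depends on how busy the other tenants are and on how the scheduler shares the pool.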

12 Big Data Mix of Workloads
-Hadoop: batch analysis
-HBase: real-time queries
-NoSQL: Cassandra, Mongo, etc.
-Big SQL: Impala, Pivotal HAWQ
-Other: Spark, Shark, Solr, Platfora, etc.
Stack layers: Compute layer; File System/Data Store; Virtualization; Host.

13 Software-defined Datacenter: Storage. Requirements of Next Generation Storage:
-10x lower cost of storage
-Handle explosive data growth
-Support a variety of application types
-Solve the privacy and security issues
Layers: Management, Network/Security, Storage/Availability, Compute.

14 Software-defined Storage Enables Fundamental Economics. Chart (Petabytes Deployed): Traditional SAN/NAS; Distributed Object Storage (HDFS, MapR, Ceph); Scale-out NAS (Isilon, NTAP).

15 Big Data Using Local Disks
-Host: servers with local disks (core server, SATA 2-4 TB disks, 10 GbE adapter)
-Top-of-rack switch: high-performance 10 GbE switch per rack
-iSCSI/NFS for shared storage for vMotion, etc.

16 Big Data Storage: Scale-out Network Storage. Diagram labels: Elastic Compute; Scale-out Network Storage; Hadoop Protocol; Snapshots; POSIX Apps; Full NFS Access; Replication; Erasure Coding.

17 Customer Success: Hadoop as a Service at FedEx
Scale-out Isilon Cluster
-Shared Data
-NAS + Hadoop
Elastic vSphere Cluster
-Mixed Workloads
-vSphere
-Existing Rack Mount Servers

18 Storage Configuration for Data/Compute Separation With Isilon. [Diagram labels: Virtualization Host; Hadoop Virtual Node 1 (OS Image VMDK, Ext4, JobTracker); Hadoop Virtual Node 2 (OS Image VMDK, Ext4 Temp, TaskTracker); Hadoop Virtual Node 3 (OS Image VMDK, Ext4, TaskTracker); Shared storage SAN/NAS; Isilon (NN, data node).]

19 Agile Big Data at FedEx
Security
-Trusted isolation
-Well-known, auditable platform
Agility
-Deploy in minutes
-Optimize for shift in workload characteristics
Elasticity
-Create true multi-tenancy
-Mixed workloads

20 Breakthrough Use Cases
-Web Log Analysis
  -Initial exploration was around detection of mobile devices accessing the website.
  -Analysis of 570 billion web server log entries took approximately 9 minutes to complete on a small cluster.
-ZIP Code Analysis
  -Analysis of data to determine which ZIP codes are the highest source or destination for shipments.
-Shipment Analysis
  -Analysis of shipment information to determine patterns that may delay a package.
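The web log use case (detecting mobile devices from server logs) follows a simple classify-then-count pattern. The sketch below is a single-process illustration of that pattern, not the job that was actually run: the log format (user agent as the last quoted field), the keyword list in MOBILE_HINTS, and the sample lines are all assumptions. On a Hadoop cluster the same classification would typically run in the map step and the counting in the reduce step.

```python
from collections import Counter

# Hypothetical keyword list; real device detection is considerably more involved.
MOBILE_HINTS = ("mobile", "android", "iphone", "ipad")

def user_agent(line: str) -> str:
    """Return the last double-quoted field of a log line, or '' if there is none.

    Assumes a combined-log-style line that ends with the quoted user agent.
    """
    parts = line.strip().split('"')
    return parts[-2] if len(parts) >= 3 else ""

def classify(agent: str) -> str:
    """Label a user-agent string as 'mobile' or 'other' by keyword match."""
    lowered = agent.lower()
    return "mobile" if any(hint in lowered for hint in MOBILE_HINTS) else "other"

def count_devices(lines) -> Counter:
    """Classify every line (the 'map' step) and tally the labels (the 'reduce' step)."""
    return Counter(classify(user_agent(line)) for line in lines)

# Made-up sample entries for illustration.
sample = [
    '1.2.3.4 - - [01/Jan/2013:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)"',
    '5.6.7.8 - - [01/Jan/2013:00:00:02 +0000] "GET /track HTTP/1.1" 200 733 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '9.9.9.9 - - [01/Jan/2013:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Linux; Android 4.1)"',
]

counts = count_devices(sample)
print(counts["mobile"], counts["other"])  # 2 1
```

Because each line is classified independently, the work splits cleanly across many nodes, which is what makes this kind of analysis a natural fit for a parallel cluster.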

21 Cloud Infrastructure is Ready for Big Data – Are You?

22 Q&A