Software Defined Infrastructure for Hadoop and Big Data

Presentation transcript:

DriveScale: Software Defined Infrastructure for Hadoop and Big Data
DriveScale Use Cases, April 24, 2017

Presentation Overview
- Target Users
- Use Cases / Deployment Scenarios
- Reference Accounts
©2017 DriveScale Inc. All Rights Reserved.

DriveScale Target Customers
- Big Data applications: Hadoop, Spark, Cassandra, NoSQL, etc.
- On-premises applications: private and hybrid public cloud
- Concerned about infrastructure costs and wasted spend
- Application profiles are dynamic: infrastructure requirements are hard to predict and always changing
- Approaching power or rack-space limits
- Want to share infrastructure across multiple applications
- Want to buy storage and compute independently (storage bound or compute bound)

DriveScale Target Customers: Common Questions and Statements
- How can I buy compute separate from storage?
- I just want to buy more compute, or I just want to buy more storage.
- I want to virtualize Hadoop like AWS does...
- How can I increase my utilization?
- I want to share storage among server nodes with a NAS (e.g. Isilon).
- I need to lower my infrastructure costs:
  - Compared to status quo...
  - Compared to NAS...
  - Compared to Cloud...

Benefits of Software Defined Infrastructure to Big Data Operators

1. Lower Capital Costs
   - Reduced server costs (no need to buy disks with the server)
   - Buy less storage (higher utilization)
   - Less rack space needed (denser CPU and storage)
   - Lower disk cost (3.5" disks cost less than 2.5")
2. Lower Operational Cost
   - Add storage without physical labor
   - Replace failed drives without labor
   - Less equipment = reduced power
3. Speed up Big Data Deployments
   - Faster time to value
   - Create new clusters and nodes in minutes instead of weeks
   - Integrated with Cloudera Director and Hortonworks Cloudbreak
   - Share resources among multiple applications and clusters

The benefits of this system are obvious. You save significantly on capital expenditure over a 5-year horizon of server and disk purchases. Today, when you refresh your servers, you have to move all the data on them to some other location, which is a time-consuming process. With the DriveScale system, this is no longer necessary. You have all the levers and knobs you need to move resources to the workloads. Underutilized resources can be taken away from clusters and moved elsewhere. This also means that you need less total hardware than if you were building discrete silos of equipment, which results in cost savings, rack space savings and power savings. Finally, the tools we provide help you provision clusters in minutes instead of days.

SDI Revolutionizes Big Data Storage

Ratings: 3 = Good, 2 = Fair, 1 = Poor

Criterion | DAS (Direct Attached Storage) | DriveScale SW Defined Infrastructure | Centralized storage (NAS or SAN)
Cost | 2 | 3 | 1 (5-10x)
Performance | 3 | 3 | 1 (1/2 - 1/4)
Utilization | 1 (30-50%) | 3 | 3
Adaptability (ability to change node storage) | 1 (none) | 3 | 2
Scalability | 3 | 3 | 1 (anti-Hadoop)

Comments:
- Cost: Buy disks instead of proprietary "appliances", which are 5x to 10x the cost. Don't waste money on storage features Hadoop doesn't need (dedup, RAID, erasure encoding, etc.).
- Performance: Give Hadoop nodes direct access to rack-local disks, not shared or centralized file systems with limited I/O bandwidth.
- Utilization: Buy only the disks you need, pooled local to the rack. This allows better storage utilization, and re-balancing disks across nodes in a cluster gives better CPU utilization, putting more servers to work.
- Adaptability: Re-define your server and storage infrastructure as application needs change.
- Scalability: Give nodes direct access to their own disks. Don't share file systems ("nothing shared").

DriveScale vs Isilon Enterprise NAS

Category | Isilon | DriveScale
Price | $0.50 - $1.00 per GB | $0.10 - $0.15 per GB (1/5)
Storage performance | Peak bandwidth: 2.4GB/sec per Isilon node | Peak bandwidth: 10GB/sec per DSA (4x)
Hadoop/HDFS version compatibility | 1-1.5 year delay from latest Hadoop (Isilon FS needs to continuously be made compatible) | Always current (sits below HDFS layer)
Storage scale | 288 compute nodes (144 Isilon nodes) | 4000+ compute nodes
Drive density | 6-9 drives per RU | 12-18 drives per RU
Network complexity | Two networks (Ethernet and InfiniBand) | One network (Ethernet)
Compute/storage ratio | 2:1 (2 compute nodes per Isilon node) | 16:1 (16 compute nodes per DSA)
Minimum config | 3 Isilon nodes | 1 JBOD/DSA

Hadoop Storage Needs vs Various Storage Types

Legend: ✔ Optimal, ✖ SubOptimal, ? Debatable

Hadoop storage needs (columns): Data Locality; Converged compute & storage; Replication; Extreme Read BW; Low Cost Commodity; Total

Storage types (rows) and examples:
- DAS & DriveScale SDI (✔): Dell, HP (non-Apollo), Cisco, SuperMicro, etc.
- NAS - Enterprise (✖): Isilon, NetApp, Gluster
- NAS - HPC: Lustre, GPFS
- SAN/Block - External (?): ScaleIO, Ceph, Datera, Cinder, AWS EBS
- SAN/Block - Hyperconverged: Nutanix, ScaleIO, Robin
- Object: AWS S3, Scality, Swift, EMC ECS

Benefits of Software Defined Infrastructure: Lower Server Costs (Example)

Component | With DAS: 2RU server with DAS (Direct Attached Storage) | With DriveScale: 1RU diskless server
System/CPU/MoBo/NICs | $3,065 | $2,063
RAM | $1,142 | $1,246
Disk | $12,832 | $172 (1TB for OS)
Total | $17,039 | $3,481

79.5% savings: save $13,558 per server node. x 1000 nodes = $13M.
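The per-node arithmetic on this slide can be checked with a short Python sketch. The component prices are the ones quoted above; the dictionary keys and function names are illustrative only and are not part of any DriveScale tooling.

```python
# Per-node component prices (USD) as quoted on the slide.
DAS_SERVER = {"system": 3065, "ram": 1142, "disk": 12832}
DISKLESS_SERVER = {"system": 2063, "ram": 1246, "disk": 172}


def node_total(parts):
    """Sum the component prices for one server node."""
    return sum(parts.values())


def savings(baseline, alternative):
    """Return (absolute saving, fractional saving) relative to the baseline."""
    saved = baseline - alternative
    return saved, saved / baseline


das_total = node_total(DAS_SERVER)
diskless_total = node_total(DISKLESS_SERVER)
saved_per_node, saved_fraction = savings(das_total, diskless_total)
fleet_saving = saved_per_node * 1000  # the slide's 1000-node example

print(f"DAS node: ${das_total:,}  diskless node: ${diskless_total:,}")
print(f"Saving per node: ${saved_per_node:,} ({saved_fraction:.1%})")
print(f"Saving across 1000 nodes: ${fleet_saving:,}")
```

The totals reproduce the slide's $17,039 and $3,481. This compares server nodes only; the data disks that move into JBODs are priced separately in the deck's disk-cost example.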

Benefits of Software Defined Infrastructure: Lower Disk Costs (Example)

 | With DAS: 2.5" drive DAS | With DriveScale: 3.5" drive in JBOD
Price per drive | $802.27 | $417.62
Price per GB | $0.45/GB | $0.22/GB

48% savings: save $385 per drive. Save $192,000 per petabyte.

Also note that the sweet spot for 3.5" drives is 8TB. Dell sells these for their JBOD for $794.40, which is only about $0.10/GB (8TB 7.2K RPM NLSAS 12Gbps 512e 3.5in Hot-plug Hard Drive [$794.40]).
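A rough Python sketch of the disk-cost arithmetic follows. The two drive prices come from the slide. The slide does not state drive capacities, so the 2 TB figure below is an assumption, chosen because the deck's rack examples use 2.0TB 3.5" drives; the sketch also assumes one 3.5" drive replaces one 2.5" drive.

```python
PRICE_2_5_IN = 802.27   # USD per 2.5" DAS drive (from the slide)
PRICE_3_5_IN = 417.62   # USD per 3.5" JBOD drive (from the slide)
ASSUMED_DRIVE_TB = 2.0  # assumption: 2 TB per drive, decimal units


def drives_per_petabyte(drive_tb):
    """Number of drives needed for 1 PB (1000 TB) of raw capacity."""
    return 1000.0 / drive_tb


saving_per_drive = PRICE_2_5_IN - PRICE_3_5_IN
saving_fraction = saving_per_drive / PRICE_2_5_IN
saving_per_pb = saving_per_drive * drives_per_petabyte(ASSUMED_DRIVE_TB)

print(f"Saving per drive: ${saving_per_drive:,.2f} ({saving_fraction:.0%})")
print(f"Saving per PB at {ASSUMED_DRIVE_TB:g} TB/drive: ${saving_per_pb:,.0f}")
```

Under that 2 TB assumption, 500 drives make up a petabyte, which lands close to the slide's $192,000 per petabyte. A different drive capacity changes the per-petabyte figure proportionally.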

Per-Rack Cost: Commodity Server Manufacturer, With and Without DriveScale (higher drive utilization)

Without DriveScale:
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers with 20 2.5" 1.2TB drives each
- Day 1 cost: $346,410

With DriveScale:
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 2 x DriveScale Adaptor
- 2 x JBOD with 60 3.5" 2.0TB drives each (240 terabytes of storage)
- Open rack space for more servers
- Day 1 cost (only 2 JBODs): $198,504

43% savings.

Per-Rack Cost: Commodity Server Manufacturer, With and Without DriveScale (equal storage)

Without DriveScale:
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers with 20 2.5" 1.2TB drives each
- 4-year cost (server refresh): $692,820

With DriveScale:
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 4 x DriveScale Adaptor
- 4 x JBOD with 60 3.5" 2.0TB drives each
- 4-year cost (server refresh): $452,208

35% savings.
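The two per-rack comparisons reduce to a capacity calculation and a percentage. A minimal Python sketch, using only the counts and prices quoted on the two rack slides (the function and variable names are illustrative):

```python
def raw_capacity_tb(units, drives_per_unit, tb_per_drive):
    """Raw capacity in TB for a set of identical servers or JBODs."""
    return units * drives_per_unit * tb_per_drive


def savings_percent(baseline, alternative):
    """Percentage saved by the alternative relative to the baseline."""
    return 100.0 * (baseline - alternative) / baseline


# Rack configurations as listed on the slides.
das_tb = raw_capacity_tb(20, 20, 1.2)    # 20 servers x 20 drives x 1.2 TB
two_jbod_tb = raw_capacity_tb(2, 60, 2.0)   # "higher drive utilization" rack
four_jbod_tb = raw_capacity_tb(4, 60, 2.0)  # "equal storage" rack

day1_pct = savings_percent(346_410, 198_504)
four_year_pct = savings_percent(692_820, 452_208)

print(f"DAS rack: {das_tb:.0f} TB, 2 JBODs: {two_jbod_tb:.0f} TB, "
      f"4 JBODs: {four_jbod_tb:.0f} TB")
print(f"Day 1 savings: {day1_pct:.1f}%  4-year savings: {four_year_pct:.1f}%")
```

The capacities show why the second slide is labeled "equal storage": four 60-drive JBODs match the raw capacity of the 20 DAS servers, while the day-1 configuration starts with half of that.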

DriveScale Deployment Scenarios

Greenfield
- Start off on the right foot with DriveScale Software Defined Infrastructure.

Preserve your existing investment
- Existing Hadoop, need more storage on nodes (i.e. storage bound): Just add a JBOD and a DSA, and add disks to existing servers.
- Existing Hadoop, rebalance storage between nodes and clusters: Just add a JBOD and a DSA, pull unused disks from DAS servers to the JBOD, and redistribute storage and nodes into new clusters.
- Existing Hadoop, need more compute nodes (i.e. compute bound): Add a compute server, JBOD, and DSA. You can then add more nodes to your cluster without buying more DAS.
- Existing Hadoop, moving from public to private or hybrid cloud: Did Cloudera Director or Hortonworks Cloudbreak deploy you to the cloud? Both are integrated with DriveScale to deploy your private cloud the same way. You can even have a single cluster span your private and public clouds (i.e. hybrid).

DriveScale Reference Customers

DriveScale Customer Reference: Clearsense
- Technology company that transforms data from electronic medical records (EMRs), supporting source systems, and legacy databases to enable new revenue streams for healthcare
- Its real-time analytics solution ingests many kinds of data from existing EMRs and data warehouses, and puts them into one platform to serve up analytics
- After 18 months in production at AWS, they were unhappy with responsiveness and costs, and decided to build a private cloud
- Hadoop responsiveness is something they tout to their customers, and this technology supports that through infrastructure flexibility
DriveScale Confidential Information © 2016

DriveScale Customer Reference: Clearsense

"DriveScale helps us build a 'future-proof' infrastructure. As our customer needs increase, we can now respond more quickly and without having to make massive capital expenditures."

"We spent a lot of time engineering our software environment around real-time needs, ensuring a productive, scalable and economically viable environment. Now, with DriveScale, we have the ability to do the same thing with hardware."

Charles Boicey, Chief Innovation Officer at ClearSense

DriveScale Customer Case Study: AppNexus
- Global technology company whose private cloud-based software platform enables and optimizes programmatic online advertising
- Serves tens of billions of ad buys per day; revenue > $2B annually
- 2200 nodes split between Hadoop and HBase
- Their customers are asking for more vcores and memory, but not more storage, so they wanted to scale compute independently of storage

Issues we solve:
- Highly compute-bound environment: purchase compute-only nodes
- Want to reduce the operational impact of refreshing their servers
- Rack and power constraints are also an issue for them

Timothy Smith, SVP of Technical Operations at AppNexus

"DriveScale understood one of our core requirements, namely the desire to manage CPU and storage resources as separate pools."

"Storage and server technology upgrades move on two different timetables. Without DriveScale, we are forced into storage refresh cycles that aren't strictly necessary, and are very cumbersome. Separating storage from compute allows us to upgrade or reallocate compute resources independent of storage."

"DriveScale significantly decreases the operations workload, as a result increasing the velocity of delivering new products and features to our customer base. DriveScale will also help us reduce wasted resources trapped in siloed clusters and thus contribute directly to our bottom line."

Cloudera Do's and Don'ts of Building Private Infrastructure
- Avoid Isilon for large (>100 servers) Hadoop infrastructure
- DriveScale behaves just like locally attached storage

DriveScale (DS) vs HPE BDRA
- TCO for DriveScale is approximately 38% lower, counting HPE hardware costs alone.
- BDRA requires additional Cloudera/Hortonworks licenses for each of its data nodes.
- The Apollo 4200 that serves as the BDRA data node has a single controller, which makes it a single point of failure (SPOF):
  - If a node fails, the data on all 28 of its drives must be re-replicated to the other storage nodes and drives. For example, with 6TB drives, up to 168TB of data hits the network at once.
  - That replication traffic floods the network and delays the jobs that are already running.
  - Spare Apollo 4200 data nodes might be required, depending on the amount of data in each data node.
  - With DS there are 4 paths to each drive in the JBOD, through dual I/O controllers.
  - DS can sustain the failure of 1 JBOD I/O controller, 1 switch, and 1 DSA.
- BDRA requires 40Gb switches; DS needs only 10Gb switches.
- OS maintenance on data nodes: if an OS update is required, the entire node with its 28 drives must be put into maintenance mode, which also triggers replication of data across the other available storage nodes.
- Firmware upgrades of storage nodes pose a similar issue.
- Every read and write in a BDRA cluster goes over the network.
- One reason traditional Hadoop deployments stripe data across a large number of nodes with a few disks each is that the "failure locality" is more manageable. If a large amount of data is stored on a single node, the penalty for failure is tremendous, which is a problem for BDRA.
DriveScale Confidential Information © 2017
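The 168TB figure in the failure example is 28 drives times 6TB. The Python sketch below reproduces it and adds a rough, idealized estimate of how long that volume would take to cross a single 40Gb/s link at full line rate. The single-link model, the full-drive assumption (`fill_fraction=1.0`), and the function names are assumptions made here for illustration; the slide gives no timing figures, and in practice re-replication is typically spread across many nodes and links.

```python
def rereplication_volume_tb(drives, tb_per_drive, fill_fraction=1.0):
    """Data (TB) that must be re-replicated when a whole node is lost."""
    return drives * tb_per_drive * fill_fraction


def transfer_hours(volume_tb, link_gbps):
    """Hours to move volume_tb over one link at full line rate (idealized)."""
    bits = volume_tb * 1e12 * 8             # decimal TB to bits
    seconds = bits / (link_gbps * 1e9)      # Gb/s to bits per second
    return seconds / 3600.0


# Slide example: 28 drives of 6 TB each, assumed full.
volume = rereplication_volume_tb(28, 6)
hours_40g = transfer_hours(volume, 40)

print(f"Volume to re-replicate: {volume:.0f} TB")
print(f"Idealized time over one 40 Gb/s link: {hours_40g:.1f} hours")
```

Lowering `fill_fraction` models partially full drives. The point of the sketch is only the scale of the traffic a single dense data node can generate when it fails.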

Summary

DriveScale Summary
- First enterprise-class architecture designed for scale-out infrastructure
- Eliminate overprovisioning
- Lower TCO
- Improve agility to workload changes
- Integrated with Cloudera Director

GENE: To wrap it all up, we believe there is a big market opportunity here as the industry moves to scale-out. We have the first enterprise architecture designed from the ground up for this new world. And finally, we have a team that knows how to build these kinds of products.

DriveScale's Core Value Propositions

Save Money
- Lower server and storage CapEx
- Reduce time to value of your Big Data initiative

Improve Utilization
- 3x better utilization of hardware resources
- Modify infrastructure on demand to respond to changing workloads
- Multiple workloads on the same infrastructure

Simplify Everything
- Commodity hardware and fewer composable elements
- Software-controlled infrastructure
- One-click deployment

DriveScale's value propositions are as follows. First and foremost, we alleviate the pain of rigid infrastructure: we provide flexible and responsive physical infrastructure so that admins can provision exactly what is needed and rebalance resources on demand. Second, we provide a solution that is simple to deploy, whether small or large. It is equivalent in function and performance to the standard rack server model and requires no changes to the application stack. We also provide a comprehensive set of REST APIs to make the system fully automatable, which is table stakes at scale. Finally, this has been designed from the start to be an enterprise-grade solution, from high availability throughout the system, to investments in security, to an acknowledgement that people want to bring their own server and JBOD hardware to the table.

Questions and Answers

Frequently Asked Questions

Q1) Can I continue to buy servers and storage from my preferred vendors?
A1) Yes. DriveScale does not sell servers, storage, or JBODs, but all the big vendors do. The only hardware you need from DriveScale is the DSA (DriveScale Adaptor), which converts the JBOD SAS interface to Ethernet. A high-performance JBOD takes 4 DSAs (one full 1RU chassis).

Q2) Does DriveScale SDI disaggregation work for more than just Hadoop?
A2) Yes. It will work for any application. However, there may be performance implications for non-Big-Data applications that use "small" blocks.

Q3) What is the minimum storage required in "diskless" compute servers?
A3) Any drive, as small as possible, just for the operating system. It could be USB or small flash. All data is stored on JBOD disks.

Q4) Does DriveScale support SSDs in JBODs?
A4) Yes. The JBOD can hold many types of heterogeneous storage, as long as they are SAS compatible: HDD, high or low RPM, high or low capacity, SSD, etc.

Q5) Is there a network impact of accessing disks over iSCSI/Ethernet vs direct attached SAS?
A5) Minimal to zero impact. HDFS storage access peaks while data is copied from one node to another. The sending node reads its disk (NIC RX port activity from the disk) and transfers the data to another node (NIC TX port activity to the other node); the receiving node does the inverse. On both nodes, reading or writing the disk adds network traffic, but that extra traffic travels in the opposite direction of the bidirectional link from the inter-node traffic, so there is no performance impact. TestDFSIO performance testing results confirm this answer.
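The reasoning in A5 can be illustrated with a toy model of a full-duplex NIC, sketched below in Python. It is a deliberate simplification and an assumption of this sketch, not DriveScale's own analysis: it ignores iSCSI protocol overhead, acknowledgements, and any other traffic, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkLoad:
    """Traffic on each direction of one node's full-duplex NIC, in Gb/s."""
    rx_gbps: float
    tx_gbps: float

    def busiest_direction(self) -> float:
        return max(self.rx_gbps, self.tx_gbps)


def node_load(role: str, copy_gbps: float, disks_over_network: bool) -> LinkLoad:
    """Model NIC load during a node-to-node HDFS block copy."""
    if role == "sender":
        # Disk reads arrive on RX only when the disks are network-attached.
        disk_rx = copy_gbps if disks_over_network else 0.0
        return LinkLoad(rx_gbps=disk_rx, tx_gbps=copy_gbps)
    if role == "receiver":
        # Disk writes leave on TX only when the disks are network-attached.
        disk_tx = copy_gbps if disks_over_network else 0.0
        return LinkLoad(rx_gbps=copy_gbps, tx_gbps=disk_tx)
    raise ValueError(f"unknown role: {role!r}")


for role in ("sender", "receiver"):
    das = node_load(role, copy_gbps=4.0, disks_over_network=False)
    sdi = node_load(role, copy_gbps=4.0, disks_over_network=True)
    print(role, "DAS:", das, "network disks:", sdi)
```

In this model the total traffic on each node doubles when disks sit behind the network, but the load on the busiest direction of the link stays the same, which is the claim the answer makes.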

Thank You