Succeeding with Big Data Analytics in the Cloud Using Remote Storage

Presentation transcript:

Succeeding with Big Data Analytics in the Cloud Using Remote Storage
James Baker, Principal Program Manager, Microsoft Azure Storage
jamesbak@microsoft.com
9/14/2018 9:19 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

What are Converged & Disaggregated Storage Systems?
- Converged: data and compute reside on the same machine or rack. Data locality is a measure of the distance between compute and storage.
- Disaggregated: the compute and storage services are separated, both physically and from a system architecture perspective.

A History of Data Locality
- The network was the initial bottleneck
- Hadoop & HDFS added extensive support for data locality
- Data-locality-sensitive systems require additional scheduling logic
- The Hadoop FileSystem design provided an abstraction layer for non-HDFS storage systems
- The quest for disk-locality is based on two assumptions: (a) disk bandwidths exceed network bandwidths, (b) disk I/O constitutes a considerable fraction of a task's lifetime
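The two disk-locality assumptions above can be made concrete with a back-of-the-envelope model. The sketch below is not from the presentation: the function name, the normalized task time, and the simplification that a remote read is limited by the slower of the disk and the network are all assumptions chosen for illustration.

```python
def remote_read_slowdown(io_fraction, disk_bw, network_bw):
    """Estimate task time with remote data, relative to a data-local task (= 1.0).

    Hypothetical model: a task spends `io_fraction` of its local runtime on
    disk I/O, and a remote read runs at min(disk_bw, network_bw).
    Bandwidths may use any unit as long as both use the same one.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if disk_bw <= 0 or network_bw <= 0:
        raise ValueError("bandwidths must be positive")
    effective_remote_bw = min(disk_bw, network_bw)
    compute_part = 1.0 - io_fraction
    io_part = io_fraction * (disk_bw / effective_remote_bw)
    return compute_part + io_part


if __name__ == "__main__":
    # Assumption (a) holds: the disk is twice as fast as the network.
    print(remote_read_slowdown(0.5, disk_bw=100.0, network_bw=50.0))
    # Assumption (a) fails: the network is faster than the disk.
    print(remote_read_slowdown(0.5, disk_bw=100.0, network_bw=1000.0))
```

Under this model, locality only pays off when both assumptions hold: if the network outpaces the disk, or if I/O is a small share of the task, the remote penalty disappears.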

Initial Drivers for Disaggregated Storage in Analytics
- Cloud design point: compute and storage scale separately
- Cloud data center network design is more optimized
- Cloud object stores are designed to be hyper-scalable
- NAS providers require a disaggregated model

Hadoop File System and Azure Storage
Diagram layers (top to bottom):
- Applications, Hadoop Shell Commands
- Hadoop FileSystem API Interface (Extensible)
- HDFS, Azure Storage Integration (WASB), …
WASB Improvements
- Vastly reduced ramp-up time for workloads from Azure HDInsight clusters, minimizing throttling/timeouts due to ramp-up
- Improved support for HDFS Flush and Sync semantics
- Improved handling of intermittent errors without failing HDI jobs
- Java Storage SDK version upgraded to provide larger blob size support
- Performance improvements for data reads/writes
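Because WASB plugs in behind the Hadoop FileSystem API, applications address Azure Storage by URI instead of an HDFS path. As a minimal sketch, the helper below builds a URI in the `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` form; the function name and the account, container, and path values are hypothetical placeholders, not names from the presentation.

```python
def wasb_uri(account, container, path="", secure=True):
    """Build a WASB URI for a blob path.

    `secure=True` selects the TLS variant of the scheme (wasbs).
    All argument values used below are illustrative placeholders.
    """
    scheme = "wasbs" if secure else "wasb"
    host = f"{account}.blob.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(wasb_uri("myaccount", "mycontainer", "/data/input"))
```

A URI of this shape can then be passed wherever a Hadoop path is accepted, for example as the argument to `hadoop fs -ls`.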

What Happened to the Network?
- Since 2000, network throughput has increased ~600x faster than HDD throughput
- Flatter network architectures result in fewer network hops
- Reading 1 MB of data from remote memory is 60x faster than reading it from a local HDD and 4x faster than from a local SSD
Chart: ratio of Ethernet bandwidth to HDD bandwidth, measured over time

What Does this Mean for my Analytics Workloads?
- Cloud storage services are capable of delivering > 1 Tbps aggregate bandwidth
- Common VMs have > 20 Gbps NICs
- Can meet the performance requirements of the largest analytics jobs
- Clusters are stateless, therefore transient and on-demand
- No scheduling contention for hot data
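To put the bandwidth figures above in perspective, the following sketch does the arithmetic. It treats the slide's lower bounds (1 Tbps aggregate, 20 Gbps per NIC) as exact values; the 100 TB dataset size and the use of decimal terabytes are assumptions chosen only for illustration.

```python
import math


def vms_to_saturate(storage_gbps, nic_gbps):
    """Number of VMs whose NICs together match the storage bandwidth."""
    return math.ceil(storage_gbps / nic_gbps)


def scan_seconds(dataset_terabytes, aggregate_gbps):
    """Seconds to read a dataset once at the given aggregate bandwidth.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second.
    """
    bits = dataset_terabytes * 1e12 * 8
    return bits / (aggregate_gbps * 1e9)


if __name__ == "__main__":
    print(vms_to_saturate(storage_gbps=1000, nic_gbps=20))
    # Hypothetical 100 TB dataset read at 1 Tbps aggregate bandwidth.
    print(scan_seconds(dataset_terabytes=100, aggregate_gbps=1000))
```

With these inputs it takes 50 VMs reading at full NIC speed to reach 1 Tbps, and a full pass over 100 TB takes 800 seconds, ignoring protocol overhead and throttling.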

Is the Performance Real?

Benchmark method    TeraGen duration (min)    TeraSort duration (min)
Remote (WASB)       29                        191
Local HDFS          25                        212
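The relative differences implied by these benchmark durations can be worked out directly. The sketch below uses only the four numbers from the slide; the dictionary layout and function name are illustrative choices.

```python
# Durations in minutes, as reported on the slide.
RESULTS = {
    "Remote (WASB)": {"TeraGen": 29, "TeraSort": 191},
    "Local HDFS": {"TeraGen": 25, "TeraSort": 212},
}


def relative_change(remote, local):
    """Fractional change of remote vs. local; negative means remote is faster."""
    return (remote - local) / local


if __name__ == "__main__":
    remote, local = RESULTS["Remote (WASB)"], RESULTS["Local HDFS"]
    for phase in ("TeraGen", "TeraSort"):
        change = relative_change(remote[phase], local[phase])
        print(f"{phase}: {change:+.1%}")
    total_change = relative_change(sum(remote.values()), sum(local.values()))
    print(f"Total: {total_change:+.1%}")
```

Remote storage is 16% slower for TeraGen (29 vs. 25 minutes) but about 10% faster for TeraSort (191 vs. 212 minutes), so the combined run is roughly 7% shorter (220 vs. 237 minutes).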

What Does this Mean for my Analytics Platform?
- Data is readily available for analytics engines outside the cluster (e.g. PaaS)
- Direct ingest and consumption of data
- Data integration becomes a no-op
Diagram stages: Ingest & ETL, Streaming, Analytics & Machine Learning, Data Aggregation, Presentation/Consumption
Services shown in the diagram: Machine Learning, Stream Analytics, Batch, Functions, Data Lake Analytics, Data Factory, App Insights, Log Analytics, Monitor, IoT Hub, Event Hubs, Data Warehouse, CDN, Search, Power BI, HDInsight, Cognitive services, Data Box, Web App, Blob Storage
Blob Storage Pillars
- Open & Interoperable
- Manageable & Cost Efficient
- Scalable & Performant
- Secure & Compliant
- Durable & Available

Tiered Storage
Blob-Level Tiering
- Individual blobs can move between tiers
- All tiers of blobs co-exist in a storage account
New Storage Tier: Archive Storage
- Cold storage for long-term data
- Retrieval latency is hours
Consistent API Among Storage Tiers
- Access through the Blob REST API
- Supports direct writes to Archive
Diagram: the Blob REST API sits in front of every tier
- Hot Tier: lower transaction cost
- Cool Tier: lower capacity cost
- Archive Tier: lowest capacity cost
- Other Tiers: future additions
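Blob-level tiering through a single REST API means a tier change is one request against one blob. The sketch below only assembles such a request, assuming the Set Blob Tier operation (`PUT` with `comp=tier` and an `x-ms-access-tier` header); it does not send anything and omits authentication entirely. The function name, the account, container, and blob names, and the API version string in the usage line are assumptions to be checked against the REST reference.

```python
from urllib.parse import quote

# Tier names as listed on the slide.
VALID_TIERS = ("Hot", "Cool", "Archive")


def build_set_tier_request(account, container, blob, tier, api_version):
    """Return (method, url, headers) for a blob-level tier change.

    Illustrative only: no Authorization header is produced, so the
    result cannot be sent to the service as-is.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob)}?comp=tier"
    )
    headers = {"x-ms-access-tier": tier, "x-ms-version": api_version}
    return "PUT", url, headers


if __name__ == "__main__":
    # Hypothetical names; the version string is an assumption.
    print(build_set_tier_request(
        "myaccount", "logs", "2017/01/app.log", "Archive", "2017-04-17"))
```

The same request shape applies to every tier, which is the point of the consistent API: only the header value changes.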

Security - Encryption (Microsoft Build 2017)
Storage Service Encryption at Rest
- Data disclosure prevention from physical disk compromise
- Available now for Blob & File Storage with MS managed keys
- Customer managed encryption keys: Preview H2 2017
- Encryption on for all accounts: H2 2017
Encryption in Transit
- Storage REST APIs support HTTPS
- SAS tokens can be restricted to HTTPS only
- (New) "Secure Transfer" option limits all access to HTTPS only
Client-Side Encryption at Source
- AES-based, CBC-mode encryption at source with MS provided or KeyVault based keys
- Available in C#, Java, Python
- Range downloads supported
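The HTTPS-only restriction on a SAS token is carried in the token's signed-protocol field, `spr`. As a minimal sketch, the check below inspects a SAS URL for `spr=https`; the function name and the example URLs (including the redacted signature) are hypothetical, and a token without `spr` is treated here as not restricted.

```python
from urllib.parse import parse_qs, urlsplit


def sas_is_https_only(sas_url):
    """Return True if the SAS query string restricts access to HTTPS."""
    query = parse_qs(urlsplit(sas_url).query)
    return query.get("spr", []) == ["https"]


if __name__ == "__main__":
    base = "https://myaccount.blob.core.windows.net/data/report.csv"
    print(sas_is_https_only(base + "?sp=r&spr=https&sig=REDACTED"))
    print(sas_is_https_only(base + "?sp=r&spr=https,http&sig=REDACTED"))
```

This only reads the token; the service itself enforces the restriction when the request arrives.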

Security - Network
Layered security for Storage
- Protection from key disclosure threats
- Limit access to specific Azure VNets or public internet IP address ranges
- Public Preview available now
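The idea behind an IP address range restriction can be illustrated with a small allow-list check. This is a conceptual sketch, not how the storage service implements its network rules: the function name is hypothetical and the ranges are reserved documentation addresses.

```python
import ipaddress


def is_allowed(client_ip, allowed_ranges):
    """Return True if client_ip falls inside any allowed CIDR range.

    A bare address in `allowed_ranges` is treated as a single-host range.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(rule) for rule in allowed_ranges)


if __name__ == "__main__":
    rules = ["203.0.113.0/24", "198.51.100.7"]
    print(is_allowed("203.0.113.42", rules))
    print(is_allowed("192.0.2.1", rules))
```

A rule set like this adds a second layer on top of keys: a request carrying a valid but leaked key is still refused if it originates outside the allowed ranges.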

Compliance
Azure has the most comprehensive compliance coverage
- Global: CSA STAR Attestation, CSA STAR Certification, CSA STAR Self-Assessment, ISO 22301, ISO 27001, ISO 27017, ISO 27018, SOC 1 Type 2, SOC 2 Type 2
- U.S. Government: CJIS, DoD DISA SRG Level 2, DoD DISA SRG Level 4, DoD DISA SRG Level 5, FedRAMP, FIPS 140-2, High JAB P-ATO, IRS 1075, ITAR, Moderate JAB P-ATO, Section 508 VPAT, SP 800-171
- Industry: CDSA, FACT UK, FERPA, FFIEC, FISC Japan, GLBA, GxP 21 CFR Part 11, HIPAA/HITECH, HITRUST, IG Toolkit UK, MARS-E, MPAA, PCI DSS Level 1, Shared Assessments
- Regional: Argentina PDPA, Australia IRAP/CCSL, Canada Privacy Laws, China DJCP, China GB 18030, China TRUCS, ENISA IAF, EU Model Clauses, EU-US Privacy Shield, Germany IT Grundschutz, India MeitY, Japan CS Mark Gold, Japan My Number Act, New Zealand GCIO, Singapore MTCS, Spain DPA, Spain ENS, UK G-Cloud

Thank you
Questions? jamesbak@microsoft.com