Published by Gerard O’Neal. Modified over 6 years ago.
1
Succeeding with Big Data Analytics in the Cloud Using Remote Storage
James Baker, Principal Program Manager, Microsoft Azure Storage
9/14/2018 9:19 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
What are Converged & Disaggregated Storage Systems?
Converged – Data and compute reside on the same machine/rack. Data locality is a measure of the distance between compute and storage.
Disaggregated – The compute and storage services are separated, both physically and from a system architecture perspective.
3
A History of Data Locality
Network was initial bottleneck
Hadoop & HDFS added extensive support for data locality
Data locality sensitive systems require additional scheduling logic
Hadoop FileSystem design provided an abstraction layer for non-HDFS storage systems
The quest for disk-locality is based on two assumptions: (a) disk bandwidths exceed network bandwidths, (b) disk I/O constitutes a considerable fraction of a task’s lifetime
4
Initial Drivers for Disaggregated Storage in Analytics
Cloud design point - compute and storage scale separately
Cloud data center network design is more optimized
Cloud object stores are designed to be hyper-scalable
NAS providers require a disaggregated model
5
Hadoop File System and Azure Storage
Diagram layers:
Applications
Hadoop Shell Commands
Hadoop FileSystem API Interface (Extensible)
HDFS
Azure Storage Integration (WASB)
…

WASB Improvements
Vastly reduce ramp up time for workloads from Azure HDInsight clusters, minimizing throttling/timeouts due to ramp up
Improved support for HDFS Flush and Sync semantics
Improved handling of intermittent errors without failing HDI jobs
Java Storage SDK version upgraded to provide larger blob size support
Performance improvements for data reads / writes
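Applications reach Azure Storage through the Hadoop FileSystem API by addressing data with a WASB URI. The following is a minimal sketch, assuming the commonly documented form wasb[s]://<container>@<account>.blob.core.windows.net/<path>; the helper names and the account and container values are hypothetical and are not part of the presentation.

```python
from urllib.parse import urlparse


def wasb_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build a WASB URI (assumed form: wasb[s]://container@account.blob.core.windows.net/path)."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def parse_wasb_uri(uri: str) -> dict:
    """Split a WASB URI back into container, account and blob path."""
    parts = urlparse(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri!r}")
    if not parts.username or not parts.hostname:
        raise ValueError(f"missing container or account in: {uri!r}")
    return {
        "secure": parts.scheme == "wasbs",
        "container": parts.username,               # text before the '@'
        "account": parts.hostname.split(".")[0],   # first label of the host name
        "path": parts.path,
    }


# Hypothetical names, for illustration only.
uri = wasb_uri("logs", "myaccount", "/2018/09/data.csv")
info = parse_wasb_uri(uri)
```

A URI built this way is the kind of path a Hadoop shell command such as `hadoop fs -ls` would take in place of an hdfs:// path.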
6
What Happened to the Network?
Since 2000, network throughput has increased ~600x faster than HDD throughput
Flatter network architectures result in fewer network hops
Can read 1 MB of data from remote memory 60x faster than from a local HDD and 4x faster than from a local SSD
[Figure: Ratio of Ethernet bandwidth relative to HDD bandwidth measured over time]
7
What Does this Mean for my Analytics Workloads?
Cloud storage services are capable of delivering > 1 Tbps aggregate bandwidth
Common VMs have > 20 Gbps NICs
Can meet the performance requirements of the largest analytics jobs
Clusters are stateless, and therefore transient and on-demand
No scheduling contention for hot data
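To put the bandwidth figures in perspective, here is a rough back-of-the-envelope sketch. It treats the slide's "> 1 Tbps" and "> 20 Gbps" lower bounds as exact values and ignores protocol overhead and throttling; the 100 TB dataset size is a hypothetical input, not a number from the presentation.

```python
def vms_to_saturate(storage_tbps: float, nic_gbps: float) -> float:
    """Number of VMs whose NICs together match the storage service's aggregate bandwidth."""
    return storage_tbps * 1000 / nic_gbps


def scan_seconds(dataset_tb: float, aggregate_gbps: float) -> float:
    """Time to read a dataset once at a given aggregate bandwidth (decimal units)."""
    bits = dataset_tb * 8e12            # 1 TB = 8e12 bits
    return bits / (aggregate_gbps * 1e9)


# Slide figures taken as exact values (assumption): 1 Tbps storage, 20 Gbps per VM.
vm_count = vms_to_saturate(storage_tbps=1, nic_gbps=20)

# Hypothetical 100 TB dataset read once at the full 1 Tbps (1000 Gbps).
full_scan = scan_seconds(dataset_tb=100, aggregate_gbps=1000)
```

Under these assumptions, about 50 VMs reading at line rate would match 1 Tbps, and a full scan of 100 TB would take roughly 800 seconds.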
8
Is the Performance Real?
Benchmark Method    TeraGen duration (min)    TeraSort duration (min)
Remote (WASB)       29                        191
Local HDFS          25                        212
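The benchmark durations can be compared directly. The sketch below uses only the four numbers from the slide and computes how much slower or faster the remote (WASB) runs were relative to local HDFS; the variable and function names are illustrative.

```python
# Durations in minutes, copied from the slide.
results = {
    "TeraGen": {"remote": 29, "local": 25},
    "TeraSort": {"remote": 191, "local": 212},
}


def relative_change(remote: float, local: float) -> float:
    """Percent change of remote versus local; positive means remote was slower."""
    return (remote - local) / local * 100


changes = {name: round(relative_change(r["remote"], r["local"]), 1)
           for name, r in results.items()}

total_remote = sum(r["remote"] for r in results.values())
total_local = sum(r["local"] for r in results.values())
total_change = round(relative_change(total_remote, total_local), 1)

for name, pct in changes.items():
    print(f"{name}: remote is {abs(pct)}% {'slower' if pct > 0 else 'faster'}")
print(f"Combined: {total_remote} min remote vs {total_local} min local ({total_change}%)")
```

On these figures the remote run is about 16% slower for TeraGen, about 10% faster for TeraSort, and about 7% faster for the two jobs combined.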
9
What Does this Mean for my Analytics Platform?
Data is readily available for analytics engines outside the cluster (e.g. PaaS)
Direct ingest and consumption of data
Data integration becomes a no-op

Diagram stages: Ingest & ETL, Streaming Analytics & Machine Learning, Data Aggregation, Presentation/Consumption
Diagram services: Machine Learning, Stream Analytics, Batch, Functions, Data Lake Analytics, Data Factory, App Insights, Log Analytics, Monitor, IoT Hub, Event Hubs, Data Warehouse, CDN, Search, Power BI, HDInsight, Cognitive Services, Data Box, Web App, Blob Storage

Blob Storage Pillars
Open & Interoperable
Manageable & Cost Efficient
Scalable & Performant
Secure & Compliant
Durable & Available
10
Tiered Storage
Blob-Level Tiering
  Individual blobs can move between tiers
  All tiers of blobs co-exist in a storage account
New Storage Tier – Archive Storage
  Cold storage for long-term data
  Retrieval latency is hours
Consistent API Among Storage Tiers
  Access through Blob REST API
  Support direct writes to Archive
Tiers (diagram, each accessed through the Blob REST API)
  Hot Tier – Lower transaction cost
  Cool Tier – Lower capacity cost
  Archive Tier – Lowest capacity cost
  Other Tiers – Future additions
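Moving an individual blob between tiers through the Blob REST API can be sketched as below. This is an illustration under stated assumptions, not the presenter's code: it assumes the Set Blob Tier operation (a PUT with `comp=tier` and an `x-ms-access-tier` header), assumes a service version of 2017-04-17, and assumes authorization by a SAS token that permits the operation. The account, container, blob, and token values are hypothetical placeholders, and the sketch only builds the request without sending it.

```python
from urllib.parse import quote

VALID_TIERS = ("Hot", "Cool", "Archive")


def build_set_tier_request(account: str, container: str, blob: str, tier: str,
                           sas_token: str, api_version: str = "2017-04-17") -> dict:
    """Describe a Set Blob Tier request (assumed shape); nothing is sent over the network."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{quote(container)}/{quote(blob)}"          # quote() keeps '/' in blob names
        f"?comp=tier&{sas_token.lstrip('?')}"
    )
    headers = {
        "x-ms-access-tier": tier,        # target tier for this one blob
        "x-ms-version": api_version,     # assumed minimum version for tiering
    }
    return {"method": "PUT", "url": url, "headers": headers}


# Hypothetical values; the SAS signature is a placeholder, not a real credential.
request = build_set_tier_request(
    account="myaccount",
    container="logs",
    blob="2018/09/data.csv",
    tier="Archive",
    sas_token="sv=2017-04-17&sp=w&sig=PLACEHOLDER",
)
```

The returned dictionary could be handed to any HTTP client. Because tiering is per blob, the same call with `tier="Hot"` or `tier="Cool"` would target another tier in the same storage account.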
11
Security - Encryption
Microsoft Build 2017
Storage Service Encryption at Rest
  Data disclosure prevention from physical disk compromise
  Available now for Blob & File Storage with MS managed keys
  Customer managed encryption keys - Preview H2 2017
  Encryption on for all accounts - H2 2017
Storage Service Encryption in transit
  Storage REST APIs support HTTPS
  SAS Tokens can be restricted for HTTPS only
  (New) “Secure Transfer” option limits all access to HTTPS only
Client side Encryption at source
  AES based, CBC mode, with MS provided or KeyVault based keys
  Available in C#, Java, Python
  Range downloads supported
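Whether a SAS token is restricted to HTTPS can be checked by inspecting its signed-protocol field. The sketch below assumes the SAS query parameter `spr`, whose documented values are `https` or `https,http`, and treats a missing field as allowing both protocols. The URLs and signature are hypothetical placeholders, and the helper name is illustrative.

```python
from urllib.parse import urlparse, parse_qs


def sas_is_https_only(url: str) -> bool:
    """Return True if the SAS query string restricts access to HTTPS (spr=https)."""
    query = parse_qs(urlparse(url).query)
    protocols = query.get("spr", [""])[0].lower()
    return protocols == "https"


# Hypothetical SAS URLs; 'sig' values are placeholders, not real signatures.
restricted = ("https://myaccount.blob.core.windows.net/logs/data.csv"
              "?sv=2017-04-17&sp=r&spr=https&sig=PLACEHOLDER")
permissive = ("https://myaccount.blob.core.windows.net/logs/data.csv"
              "?sv=2017-04-17&sp=r&spr=https%2Chttp&sig=PLACEHOLDER")
unspecified = ("https://myaccount.blob.core.windows.net/logs/data.csv"
               "?sv=2017-04-17&sp=r&sig=PLACEHOLDER")

checks = [sas_is_https_only(u) for u in (restricted, permissive, unspecified)]
```

A check like this only reads the token. The restriction itself is enforced by the storage service, and the “Secure Transfer” account option applies to all access regardless of what an individual token allows.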
12
Security – Network
Layered security for Storage
Protection from key disclosure threats
Limit access to specific Azure VNets or public internet IP address ranges
Public Preview available now
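The idea of limiting access to public IP address ranges can be illustrated with a small allow-list check. This is a conceptual sketch only; it does not reproduce how Azure Storage evaluates its network rules. The function name is hypothetical and the CIDR ranges are reserved documentation addresses.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_ranges: list) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


# Hypothetical allow-list using reserved documentation ranges.
ALLOWED = ["203.0.113.0/24", "198.51.100.7/32"]

inside = is_allowed("203.0.113.42", ALLOWED)     # inside the /24 range
single = is_allowed("198.51.100.7", ALLOWED)     # matches the single-address rule
outside = is_allowed("192.0.2.10", ALLOWED)      # in neither range
```

With such a rule in place, a request that presents a leaked key from an address outside the list is refused, which is the layering the slide describes.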
13
Compliance
Azure has the most comprehensive compliance coverage
Global: CSA STAR Attestation, CSA STAR Certification, CSA STAR Self-Assessment, ISO 22301, ISO 27001, ISO 27017, ISO 27018, SOC 1 Type 2, SOC 2 Type 2
U.S. Government: CJIS, DoD DISA SRG Level 2, DoD DISA SRG Level 4, DoD DISA SRG Level 5, FedRAMP, FIPS 140-2, High JAB P-ATO, IRS 1075, ITAR, Moderate JAB P-ATO, Section 508 VPAT, SP
Industry: CDSA, FACT UK, FERPA, FFIEC, FISC Japan, GLBA, GxP 21 CFR Part 11, HIPAA/HITECH, HITRUST, IG Toolkit UK, MARS-E, MPAA, PCI DSS Level 1, Shared Assessments
Regional: Argentina PDPA, Australia IRAP/CCSL, Canada Privacy Laws, China DJCP, China GB 18030, China TRUCS, ENISA IAF, EU Model Clauses, EU-US Privacy Shield, Germany IT Grundschutz, India MeitY, Japan CS Mark Gold, Japan My Number Act, New Zealand GCIO, Singapore MTCS, Spain DPA, Spain ENS, UK G-Cloud
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
14
Thank you
Questions? jamesbak@microsoft.com