Data Warehouse in the Cloud – Marketing or Reality?

Presentation transcript:

Data Warehouse in the Cloud – Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team

Data Warehouse we used to know: high-end workload; high-end hardware; special know-how

Reality is: thousands of departmental-level DWs; relatively low performance SLAs (*BeyeNetwork Big Data research)

New BI demands: utilize external data sources; non-structured data; origin is in the cloud (*BeyeNetwork Big Data research)

New opportunity: the platform is there (IaaS: SQL VM; PaaS: SQL Azure DB); “closer” to data; less administrative overhead; lower initial cost and TCO

SQL Server Data Warehousing in Windows Azure Virtual Machines: inspired by the Fast Track Reference Architecture guide; based on the High Memory images; up to 1 TB. MSDN: SQL Server Data Warehousing in Windows Azure Virtual Machines

High Memory VM in Azure

How to deploy: PowerShell script; Windows Azure Gallery

The Azure Data Warehouse under the hood

Data Warehouse Lifecycle: thoughts on the architecture; creating the DB; connectivity; populating the database (initial data loading OR backup/restore; incremental data loading); compression; query performance; architecture

Thoughts on the architecture. Data loading: minimize log impact; scale loading streams; do not reinvent the wheel, follow the Data Loading Performance Guide. Query performance: do not reinvent the wheel, follow the Data Loading Performance Guide.

Windows Azure VM Architecture
Disks are implemented as a shared multi-tenant service: built-in triple redundancy, optional geo-redundancy.
Performance is less predictable than on-prem: host machines, storage services, and network bandwidth are shared between subscribers; perf can depend on where and when the VM is provisioned; VMs are subject to maintenance operations.
Trade-off: granular control & configurability vs. cost, simplicity, out-of-box redundancy.
Diagram labels: Storage Stamp; Stream Layer; Partition Layer; Front-ends; LB; intra-stamp replication; geo-replication; Storage Location Service.
To achieve the same level of redundancy in on-premises deployments, you would need to set up multiple disk arrays in multiple locations and a synchronization mechanism, such as storage area network (SAN) replication.

Tweaks to improve the IO subsystem: database file initialization (GPEdit.msc); data file placement; SQL striping for user data and TempDB (aggregated throughput); set the size and data growth options wisely; plan maintenance. *You may do it differently, but then creating a 350 GB DB took ~3 hours. All options are there, but you need to double-check that they are all set correctly, and this is why.

Scaling IO options: Windows Storage Spaces (support story not clear); SQL data files (spread the filegroup over all drives); log drive (separate the log and data disks). Notes: Windows 2012 Storage; mention block sizes.

Scaling IO options: SQLIO results. Data disk (read), log (write). Cumulative throughput metrics:
  Block size  Configuration                    IOs/sec   MBs/sec
  256K        Single data disk                  288.00     71.98
  256K        Windows Storage Spaces, 3 disks   640.87    160.21
  256K        SQL striping, 3 disks             599.91    149.97
  64K         Single data disk                 1215.13     75.94
  64K         Windows Storage Spaces, 3 disks  2677.69    167.35
  64K         SQL striping, 3 disks*           2742.22    171.38
* But we can access only one file at a time!
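The MBs/sec column above follows from IOs/sec multiplied by the block size. A minimal Python sketch that re-derives it from the slide's figures; the function and variable names are illustrative and not part of the talk:

```python
# Throughput in MB/s = IOs/sec * block size in KB / 1024.
# The tuples below copy the SQLIO figures quoted on the slide.
SQLIO_RESULTS = [
    # (configuration, block size KB, IOs/sec, reported MB/s)
    ("single data disk", 256, 288.00, 71.98),
    ("storage spaces x3", 256, 640.87, 160.21),
    ("sql striping x3", 256, 599.91, 149.97),
    ("single data disk", 64, 1215.13, 75.94),
    ("storage spaces x3", 64, 2677.69, 167.35),
    ("sql striping x3", 64, 2742.22, 171.38),
]


def throughput_mb_per_sec(ios_per_sec, block_kb):
    """Convert an IO rate at a fixed block size into MB/s."""
    return ios_per_sec * block_kb / 1024


def speedup_over_single_disk(results, block_kb):
    """Reported MB/s of each 3-disk layout relative to the single disk."""
    rows = [r for r in results if r[1] == block_kb]
    single = next(r[3] for r in rows if r[0] == "single data disk")
    return {r[0]: r[3] / single for r in rows if r[0] != "single data disk"}


if __name__ == "__main__":
    for name, block_kb, iops, reported in SQLIO_RESULTS:
        derived = throughput_mb_per_sec(iops, block_kb)
        print(f"{block_kb:>3}K {name:<18} derived {derived:7.2f}  reported {reported:7.2f}")
    print(speedup_over_single_disk(SQLIO_RESULTS, 64))
```

Read this way, three striped disks deliver a bit more than twice the single-disk throughput at both block sizes, rather than a full threefold gain.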

Connectivity options: Windows Azure VM endpoints; Point-to-Site / Site-to-Site. *Other options are also available (FTP)

What and how we tested: TPC-H, star schema; the workload is known, and we wanted to see how it behaves in the cloud; size 200 GB

Getting initial data: copy the backup to the data disks; backup/restore to/from URL; ETL to the new DB

URL is fast!
  Target                         DB size  Time     Speed
  Backup to the local data disk  244 GB   3 hours  22.978 MB/sec
  Backup to the URL              244 GB   46 min   90.667 MB/sec
Note: add restore; have the throughput numbers
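The speed column can be sanity-checked from the database size and elapsed time. A minimal Python sketch, assuming 1 GB = 1024 MB and taking the rounded durations from the slide at face value (so the derived values differ slightly from the reported ones):

```python
def backup_speed_mb_per_sec(size_gb, elapsed_minutes):
    """Average backup throughput in MB/s for a given size and duration."""
    return size_gb * 1024 / (elapsed_minutes * 60)


# Durations as quoted on the slide: 3 hours to local disk, 46 min to URL.
local_disk = backup_speed_mb_per_sec(244, 3 * 60)
to_url = backup_speed_mb_per_sec(244, 46)

if __name__ == "__main__":
    print(f"local data disk: {local_disk:.1f} MB/s")
    print(f"backup to URL:   {to_url:.1f} MB/s")
    print(f"URL is about {to_url / local_disk:.1f}x faster")
```

The derived figures land in the low twenties and around ninety MB/s, which is why the reported speeds are read as 22.978 and 90.667 MB/sec rather than thousands of MB/sec.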

DB and data loading. Data loading: sizing; tools (BCP, SSIS..); time SLA. Query performance: indexing strategy; sizing; compression.

Loading data in Azure: smaller batches (10K-15K rows); retry logic; network latency is high; parallel loading! Start with: SSIS for Hybrid Data Movement; SSIS Performance and Operational guide. Contrast cloud vs on-prem.
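The batching and retry advice can be sketched in a few lines of Python. This is an illustration only: `send_batch` is a hypothetical callable standing in for whatever actually writes rows to the database (the talk uses SSIS and BCP), and the retry count and delays are assumed values.

```python
import time
from itertools import islice


def batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def load_in_batches(rows, send_batch, batch_size=10_000,
                    max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Send rows in small batches, retrying each batch on connection errors.

    send_batch is a hypothetical writer; it should raise ConnectionError
    on transient network failures. Returns the number of rows loaded.
    """
    loaded = 0
    for chunk in batches(rows, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send_batch(chunk)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        loaded += len(chunk)
    return loaded
```

Small batches keep the cost of a retry low: a dropped connection means resending at most one batch of 10K rows instead of restarting the whole load.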

Baseline: understand data source performance; flat file in Azure VM ~60 MB/sec reads; SQLIO shows the max throughput of the IO subsystem on the DB side; app performance can be different

Parallel loading: flat file, max 60 MB/sec; flat file, max 60 MB/sec; Mod(7) function; 8 destinations to keep all CPUs busy on the DW site
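One way to picture the modulo split is the Python sketch below. It is an assumption-laden illustration: the slide does not say which key is used or how the Mod(7) function maps onto the 8 destinations, so the sketch simply takes `key % n_streams` with a configurable stream count, and `load_stream` is a hypothetical loader.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_modulo(rows, key, n_streams):
    """Assign each row to a stream by key(row) % n_streams."""
    streams = [[] for _ in range(n_streams)]
    for row in rows:
        streams[key(row) % n_streams].append(row)
    return streams


def load_parallel(rows, key, load_stream, n_streams=8):
    """Load every stream concurrently, one worker per destination.

    load_stream(stream_id, stream_rows) is a hypothetical loader that
    writes one stream to its destination and returns a row count.
    """
    streams = split_by_modulo(rows, key, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        return list(pool.map(load_stream, range(n_streams), streams))


if __name__ == "__main__":
    counts = load_parallel(range(80), key=lambda r: r,
                           load_stream=lambda i, s: len(s))
    print(counts)
```

With an evenly distributed integer key, each destination receives a similar share of rows, which is what keeps all workers busy at the same time.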

Begin to load

Monitoring loading performance. You will be followed by these top waits: ASYNC_NETWORK_IO, PAGEIOLATCH_EX, WRITELOG, PAGEIOLATCH_UP, SOS_SCHEDULER_YIELD, PAGEIOLATCH_SH, PAGELATCH_UP, PREEMPTIVE_OLEDBOPS. Categories: network IO, disk IO, CPU.
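The waits can be rolled up into the three categories named on the slide. In the Python sketch below, the mapping of wait types to categories is an assumption based on what these waits commonly indicate, not something the slide states; waits left unmapped (PAGELATCH_UP, PREEMPTIVE_OLEDBOPS) fall under "other", and the sample numbers are hypothetical.

```python
# Assumed mapping of wait types to the slide's three categories.
WAIT_CATEGORY = {
    "ASYNC_NETWORK_IO": "network IO",
    "PAGEIOLATCH_EX": "disk IO",
    "PAGEIOLATCH_UP": "disk IO",
    "PAGEIOLATCH_SH": "disk IO",
    "WRITELOG": "disk IO",
    "SOS_SCHEDULER_YIELD": "CPU",
}


def summarize_waits(samples):
    """Sum wait time per category, largest first.

    samples is an iterable of (wait_type, wait_ms) pairs, for example
    rows exported from a wait-statistics query.
    """
    totals = {}
    for wait_type, wait_ms in samples:
        category = WAIT_CATEGORY.get(wait_type, "other")
        totals[category] = totals.get(category, 0) + wait_ms
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = [("ASYNC_NETWORK_IO", 900), ("WRITELOG", 400),
              ("PAGEIOLATCH_EX", 300), ("SOS_SCHEDULER_YIELD", 100)]
    print(summarize_waits(sample))
```

Grouping the waits this way shows at a glance whether a load is bound by the network, the disks, or the CPU.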

Loading: table options
  Heap: 780 772 573 rows, elapsed time 01:06:15.313
  Heap compressed: 780 772 573 rows, elapsed time 05:12:06.094
  Blob: finished 1:39:05 PM, elapsed time 01:06:15.313; finished 3:01:24 PM, elapsed time 01:08:00.156; finished 8:58:09 AM, elapsed time 01:01:56.641 (heap blob)

Loading: table options
  Heap: 780 772 573 rows, elapsed time 01:06:15.313
  Clustered index (sort!): elapsed time 01:20:12.547

Query performance: heap; primary key/clustered index; compression

Query performance: results

Please welcome on stage SQL 2014

What’s new? Data files to BLOBs; updateable clustered columnstore index

Loading data (load test): heap, 1 hour 1 min; clustered columnstore index, 2 hours 16 min

SQL 2014

Query 19: estimates vs. actual

And the winner is… SQL Server 2014!!

Summary: easy and fast deployment through the Gallery or PS scripts; Azure Data Warehouse is consistent with most of the best practices (query, loading); low initial investment and TCO

THANK YOU! For attending this session and PASS SQLRally Nordic 2013, Stockholm