Published by Kerry Rodgers. Modified over 6 years ago.
2
Data Warehouse in the Cloud – Marketing or Reality?
Alexei Khalyako, Sr. Program Manager, Windows Azure Customer Advisory Team
3
The Data Warehouse we used to know
High-end workload, high-end hardware, special know-how
4
The reality: thousands of departmental-level DWs with relatively low performance SLAs
*BeyeNetwork Big Data research
5
New BI demands: use external data sources, non-structured data, data whose origin is in the Cloud
*BeyeNetwork Big Data research
6
New opportunity: the platform is there, "closer" to the data
IaaS: SQL Server VM; PaaS: SQL Azure DB
"Closer" to the data, less administrative overhead, lower initial cost and TCO
7
SQL Server Data Warehousing in Windows Azure Virtual Machines
Inspired by the Fast Track Reference Architecture guide. Based on the High Memory images. Up to 1 TB.
MSDN: SQL Server Data Warehousing in Windows Azure Virtual Machines
8
High Memory VM in Azure
9
How to deploy: PowerShell script or the Windows Azure Gallery
10
The Azure Data Warehouse under the hood
11
Data Warehouse Lifecycle
Thoughts on the architecture → creating the DB → connectivity → populating the database (initial data loading OR backup/restore, then incremental data loading) → compression → query performance → back to architecture
12
Thoughts on the architecture
Data loading: minimize log impact; scale loading streams; don't reinvent the wheel, follow The Data Loading Performance Guide.
Query performance! Don't reinvent the wheel, follow The Data Loading Performance Guide.
13
Windows Azure VM Architecture
Disks are implemented as a shared, multi-tenant service with built-in triple redundancy and optional geo-redundancy.
Performance is less predictable than on-premises: host machines, storage services and network bandwidth are shared between subscribers; performance can depend on where and when the VM is provisioned; disks are subject to maintenance operations.
Trade-off: granular control and configurability vs. cost, simplicity and out-of-the-box redundancy.
[Diagram: Storage Stamp (Front-ends, Partition Layer, Stream Layer), LB, intra-stamp replication, geo-replication, Storage Location Service]
To achieve the same level of redundancy in on-premises deployments, you would need to set up multiple disk arrays in multiple locations and a synchronization mechanism, such as storage area network (SAN) replication.
14
Tweaks to improve the IO subsystem
Database file initialization: grant the SQL Server service account "Perform volume maintenance tasks" in GPEdit.msc to enable instant file initialization*
Data file placement: SQL striping for user data and TempDB to get aggregated throughput
Set the size and data growth options wisely
Plan maintenance
*You may do it differently, but then creating a 350 GB DB took ~3 hours.
All options are there, but you need to double-check that they are all set correctly; this is why.
15
Scaling IO Options
Windows Storage Spaces: Windows Server 2012 storage; support story not clear
SQL data files: spread the filegroup over all drives; mention block sizes
Log drive: separate the log and data disks
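The next sketch makes "spread the filegroup over all drives" and "separate the log and data disks" concrete. It is a minimal Python sketch that builds a T-SQL CREATE DATABASE statement with one data file per data disk and the log on its own disk. The function name, drive letters, paths and sizes are hypothetical placeholders, not values from the tested configuration. Fixed SIZE and FILEGROWTH values follow the advice to set size and growth options up front.

```python
from typing import Sequence


def create_database_sql(db_name: str, data_drives: Sequence[str], log_drive: str,
                        data_size_gb: int, data_growth_gb: int,
                        log_size_gb: int, log_growth_gb: int) -> str:
    """Return a CREATE DATABASE statement with one data file per data drive.

    All data files go into the PRIMARY filegroup so allocations are spread
    across every data disk; the transaction log gets its own drive.
    """
    data_files = []
    for i, drive in enumerate(data_drives, start=1):
        # By convention the first (primary) file is .mdf, the others .ndf.
        ext = "mdf" if i == 1 else "ndf"
        data_files.append(
            f"    (NAME = {db_name}_data{i}, "
            f"FILENAME = '{drive}:\\Data\\{db_name}_data{i}.{ext}', "
            f"SIZE = {data_size_gb}GB, FILEGROWTH = {data_growth_gb}GB)"
        )
    log_file = (
        f"    (NAME = {db_name}_log, "
        f"FILENAME = '{log_drive}:\\Log\\{db_name}_log.ldf', "
        f"SIZE = {log_size_gb}GB, FILEGROWTH = {log_growth_gb}GB)"
    )
    return (f"CREATE DATABASE {db_name}\nON PRIMARY\n"
            + ",\n".join(data_files)
            + "\nLOG ON\n" + log_file + ";")


# Hypothetical layout: three data disks (F, G, H) and a dedicated log disk (L).
print(create_database_sql("DW", ["F", "G", "H"], "L", 100, 10, 50, 8))
```

Generating the statement rather than hand-writing it keeps the file count in step with the number of attached data disks when the VM layout changes.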
16
Scaling IO Options: data disk (read) and log (write)
SQLIO tests, cumulative throughput metrics (IOs/sec, MBs/sec), each at 256K and 64K IO sizes:
Single data disk
Windows Storage Spaces, 3 disks
SQL striping, 3 disks*
* But we can access only one file at a time!
17
Connectivity Options: Windows Azure VM endpoints; Point-to-Site / Site-to-Site VPN*
*Other options are also available (e.g., FTP)
18
What and how we tested: TPC-H, star schema
The workload is known; we wanted to see how it behaves in the Cloud. Size: 200 GB
19
Getting initial data: copy a backup to the data disks; backup/restore to/from URL; ETL into the new DB
20
URL is fast!
Backup to the local data disk: DB size 244 GB, time 3 hours, speed 22.978 MB/sec
Backup to URL: DB size 244 GB, time 46 min, speed 90.667 MB/sec
Note: add restore throughput numbers.
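As a sanity check on these figures, throughput is size divided by elapsed time. The sketch below assumes 1 GB = 1024 MB and uses the rounded times from the slide (3 hours, 46 min). It therefore lands near the reported speeds, not exactly on them.

```python
SECONDS_PER_HOUR = 3600
MB_PER_GB = 1024  # assumption: binary units


def throughput_mb_per_sec(size_gb: float, elapsed_seconds: float) -> float:
    """Average throughput in MB/sec for moving size_gb in elapsed_seconds."""
    return size_gb * MB_PER_GB / elapsed_seconds


local_disk = throughput_mb_per_sec(244, 3 * SECONDS_PER_HOUR)
to_url = throughput_mb_per_sec(244, 46 * 60)
print(f"local disk: {local_disk:.1f} MB/sec")
print(f"to URL:     {to_url:.1f} MB/sec")
print(f"speedup:    {to_url / local_disk:.1f}x")
```

With these inputs it prints roughly 23.1 MB/sec and 90.5 MB/sec, a speedup of about 3.9x for backup to URL.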
21
DB and Data Loading
Data loading: tools (BCP, SSIS, ...), time SLA
Query performance: indexing strategy
Sizing: compression
22
Loading Data in Azure: smaller batches (10K-15K rows), retry logic
Network latency is high: load in parallel!
Start with: SSIS for Hybrid Data Movement; SSIS Performance and Operational guide. Contrast cloud vs. on-premises.
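The "smaller batches plus retry logic" advice can be shown with a minimal Python sketch. `send_batch` is a hypothetical callable standing in for whatever actually writes a batch (BCP, SSIS, or a database driver). The sketch assumes it commits each batch atomically, so retrying a failed batch does not create duplicates. It treats `ConnectionError` as the transient failure worth retrying, which is also an assumption. Backoff is exponential with random jitter.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_with_retry(rows: Iterable[T],
                    send_batch: Callable[[List[T]], None],
                    batch_size: int = 10_000,
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Send rows in batches, retrying a batch on transient errors.

    Returns the number of rows loaded; re-raises after max_attempts failures.
    """
    loaded = 0
    for batch in batches(rows, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                send_batch(batch)  # hypothetical writer; must commit atomically
                loaded += len(batch)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise
                # Exponential backoff (1s, 2s, 4s, ...) plus random jitter.
                sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    return loaded
```

Keeping batches at 10K-15K rows limits how much work is repeated when a high-latency connection drops mid-load.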
23
Baseline: understand data source performance
Flat file in an Azure VM: ~60 MB/sec reads. SQLIO shows the maximum throughput of the IO subsystem on the DB side; application performance can be different.
24
Parallel Loading: flat file sources, max 60 MB/sec each
A Mod(7) function splits the rows across 8 destinations to keep all CPUs busy on the DW side
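The modulo split can be sketched in Python as follows. `key` is a hypothetical function returning an integer column (for example an order key) for each row. The sketch uses the destination count as the modulus so that each of the n destinations receives rows. How the slide's Mod(7) expression maps onto its 8 destinations is not shown, so treat that choice as an assumption.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def partition_by_modulo(rows: Iterable[T], key: Callable[[T], int],
                        n_destinations: int) -> List[List[T]]:
    """Route each row to destination key(row) % n_destinations."""
    if n_destinations < 1:
        raise ValueError("need at least one destination")
    partitions: List[List[T]] = [[] for _ in range(n_destinations)]
    for row in rows:
        partitions[key(row) % n_destinations].append(row)
    return partitions


parts = partition_by_modulo(range(1, 17), key=lambda r: r, n_destinations=8)
print([len(p) for p in parts])  # [2, 2, 2, 2, 2, 2, 2, 2]
```

An evenly distributed integer key gives each destination a similar share of rows, so each loading stream keeps a CPU busy.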
25
Begin to load
26
Monitoring Loading Performance
Top waits you will see: ASYNC_NETWORK_IO, PAGEIOLATCH_EX, WRITELOG, PAGEIOLATCH_UP, SOS_SCHEDULER_YIELD, PAGEIOLATCH_SH, PAGELATCH_UP, PREEMPTIVE_OLEDBOPS
Bottleneck areas: network IO, disk IO, CPU
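One way to read these waits is to roll them up into the three bottleneck areas. The Python sketch below takes (wait_type, wait_time_ms) pairs, such as rows selected from sys.dm_os_wait_stats. Those counters are cumulative, so in practice you would diff two snapshots taken before and after the load. The category mapping is an assumption. PAGELATCH_UP (an in-memory page latch) and PREEMPTIVE_OLEDBOPS are deliberately left as "Other".

```python
from typing import Dict, Iterable, Tuple

# Assumed mapping of the top waits to the bottleneck areas on the slide.
WAIT_CATEGORIES: Dict[str, str] = {
    "ASYNC_NETWORK_IO": "Network IO",
    "PAGEIOLATCH_EX": "Disk IO",
    "PAGEIOLATCH_UP": "Disk IO",
    "PAGEIOLATCH_SH": "Disk IO",
    "WRITELOG": "Disk IO",
    "SOS_SCHEDULER_YIELD": "CPU",
}


def summarize_waits(wait_stats: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum wait_time_ms per category, largest category first."""
    totals: Dict[str, int] = {}
    for wait_type, wait_time_ms in wait_stats:
        category = WAIT_CATEGORIES.get(wait_type, "Other")
        totals[category] = totals.get(category, 0) + wait_time_ms
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

The first key of the result names the area to investigate first.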
27
Loading: table options
Heap: elapsed time 01:06:15.313
Heap, compressed: elapsed time 05:12:06.094
Blob: finished 1:39:05 PM, elapsed time 01:06:15.313; finished 3:01:24 PM, elapsed time 01:08:00.156; finished 8:58:09 AM, elapsed time 01:01: (heap, blob)
28
Loading: table options
Heap: elapsed time 01:06:15.313
Clustered index (sort!): elapsed time 01:20:12.547
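To compare the load options, the elapsed times can be converted to seconds and expressed as ratios to the heap load. This Python sketch uses only the times shown on the two "Loading: table options" slides. The compressed-heap figure is a single run.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an 'hh:mm:ss.fff' elapsed time into seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


heap = to_seconds("01:06:15.313")
heap_compressed = to_seconds("05:12:06.094")
clustered_index = to_seconds("01:20:12.547")

print(f"clustered index vs heap: {clustered_index / heap:.2f}x")
print(f"compressed heap vs heap: {heap_compressed / heap:.2f}x")
```

With these inputs, loading into the clustered index takes about 1.21x as long as the heap, which the slide attributes to the sort. The compressed heap takes about 4.71x as long.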
29
Query performance: heap vs. primary key/clustered index vs. compression
30
Query performance: results
31
Please welcome on stage: SQL Server 2014
32
What’s new? Data files in BLOBs; updateable clustered columnstore index
33
Loading data (heap vs. clustered columnstore index): 1 hour 1 min
Load test: 2 hours 16 min
34
SQL 2014
35
Query 19: estimates vs. actual
36
And the winner is… SQL Server 2014!!
37
Summary: easy and fast deployment through the Gallery or PowerShell scripts
A data warehouse in Azure is consistent with most of the best practices for querying and loading
Low initial investment and TCO
38
THANK YOU for attending this session and PASS SQLRally Nordic 2013, Stockholm!