Data Warehouse in the Cloud – Marketing or Reality?

2 Data Warehouse in the Cloud – Marketing or Reality?
Alexei Khalyako, Sr. Program Manager, Windows Azure Customer Advisory Team

3 Data Warehouse we used to know
High-end workload
High-end hardware
Special know-how

4 Reality is
Thousands of departmental-level DWs
Relatively low perf SLA
*BeyeNetwork Big Data research

5 New BI demands
Utilize external data sources
Unstructured data
Origin is in the Cloud
*BeyeNetwork Big Data research

6 New opportunity
Platform is there: IaaS (SQL VM), PaaS (SQL Azure DB)
“Closer” to data
Less administrative overhead
Lower initial cost and TCO

7 SQL Server Data Warehousing in Windows Azure Virtual Machines
Inspired by the Fast Track Reference Architecture guide
Based on the High Memory images
Up to 1 TB
MSDN: SQL Server Data Warehousing in Windows Azure Virtual Machines

8 High Memory VM in Azure

9 How to deploy
PowerShell script
Windows Azure Gallery

10 The Azure Data Warehouse under the hood

11 Data Warehouse Lifecycle
Thoughts on the architecture
Creating DB
Connectivity
Populating Database
Initial data loading OR Backup/Restore
Incremental data loading
Compression
Query performance
Architecture

12 Thoughts on the architecture
Data Loading
Minimize log impact
Scale loading streams
Do not reinvent the wheel: follow the Data Loading Performance Guide
Query Performance
! Do not reinvent the wheel: follow the Data Loading Performance Guide

13 Windows Azure VM Architecture
Disks implemented as a shared multi-tenant service
Built-in triple redundancy, optional geo-redundancy
Performance less predictable than on-prem
Host machines, storage services, network bandwidth shared between subscribers
Perf can depend on where and when VM is provisioned
Subject to maintenance operations
Granular control & configurability vs. cost, simplicity, out-of-box redundancy
[Diagram labels: Storage Stamp, Stream Layer, Partition Layer, Front-ends, LB, Intra-stamp replication, Geo-replication, Storage Location Service]
To achieve the same level of redundancy in on-premises deployments, you would need to set up multiple disk arrays in multiple locations and a synchronization mechanism, such as storage area network (SAN) replication.

14 Tweaks to improve IO Subsystem
Database file initialization (GPEdit.msc)
Data file placement
SQL striping for user data and TempDB: aggregated throughput
Set the size and data growth options wisely
*You may do it differently. Then: creating a 350 GB DB took ~3 hours
Plan maintenance
All options are there, but you need to double-check that they are all set correctly, and this is why

15 Scaling IO Options
Windows Storage Spaces
SQL Data Files
Log drive
Not clear support story
Spread File Group over all drives
Windows 2012 Storage
Mention block sizes
Separate the Log and Data disk
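
The layout named on this slide (one file group spread over all data drives, with the log on its own disk) could be expressed roughly as follows. This is only a sketch that builds the T-SQL text; the database name, drive letters, folder names, and sizes are hypothetical and do not come from the talk. Equal, pre-sized files are used because SQL Server fills the files of a file group proportionally to their free space.

```python
def create_db_sql(db_name, data_drives, log_drive, file_size_mb, growth_mb):
    """Return a CREATE DATABASE statement with one equally sized data file per drive.

    Every argument is a hypothetical input chosen by the caller.
    """
    data_files = []
    for i, drive in enumerate(data_drives, start=1):
        ext = "mdf" if i == 1 else "ndf"  # first file is the primary data file
        path = f"{drive}:\\SQLData\\{db_name}_{i}.{ext}"
        data_files.append(
            f"    (NAME = {db_name}_data{i}, FILENAME = '{path}', "
            f"SIZE = {file_size_mb}MB, FILEGROWTH = {growth_mb}MB)"
        )

    # The log file goes to a separate drive, as the slide recommends.
    log_path = f"{log_drive}:\\SQLLog\\{db_name}_log.ldf"
    log_file = (
        f"    (NAME = {db_name}_log, FILENAME = '{log_path}', "
        f"SIZE = {file_size_mb}MB, FILEGROWTH = {growth_mb}MB)"
    )

    return (
        f"CREATE DATABASE [{db_name}]\nON PRIMARY\n"
        + ",\n".join(data_files)
        + "\nLOG ON\n"
        + log_file
        + ";"
    )
```

Calling `create_db_sql("DW", ["F", "G", "H"], "L", 102400, 10240)` would produce a statement with three data files and one log file; the generated text should be reviewed before running it against a server.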

16 Scaling IO Options
Data disk (read), LOG (write)
[Chart: SQLIO cumulative throughput (IOs/sec, MBs/sec) at 256K and 64K block sizes for Single Data Disk, Windows Storage Spaces x3 Disks, and SQL Striping x3 Disks*]
* But we can access one file at a time!

17 Connectivity Options
Windows Azure VM End Points
Point-to-Site / Site-to-Site
*Other options are also available (FTP)

18 What and how we tested
TPC-H – star schema
Workload is known; we wanted to see it in the Cloud
Size 200 GB

19 Getting initial data
Copy backup to the Data Disks
Backup/Restore to/from URL
ETL to the new DB

20 URL is fast!
Backup target            DB Size   Time      Speed
Local Data Disk          244 GB    3 hours   22.978 MB/sec
URL                      244 GB    46 min    90.667 MB/sec
Add Restore - have the throughput numbers
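
As a sanity check on the speeds above, throughput can be recomputed from database size and elapsed time. The sketch below assumes 1 GB = 1024 MB and uses the rounded durations from the slide, so it lands near, not exactly on, the reported figures of roughly 23 MB/sec and 91 MB/sec.

```python
def throughput_mb_per_sec(size_gb, minutes):
    """Average throughput in MB/sec, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (minutes * 60)


local_disk = throughput_mb_per_sec(244, 180)  # backup to local data disk, 3 hours
to_url = throughput_mb_per_sec(244, 46)       # backup to URL, 46 minutes
speedup = to_url / local_disk                 # how much faster the URL backup was

if __name__ == "__main__":
    print(f"local disk: {local_disk:.1f} MB/sec")
    print(f"URL:        {to_url:.1f} MB/sec")
    print(f"speedup:    {speedup:.1f}x")
```

With these inputs the URL backup comes out close to four times faster than the backup to the local data disk.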

21 DB and Data Loading
Data loading: tools (BCP, SSIS, ...), time SLA
Query Performance: indexing strategy
Sizing: compression

22 Loading Data in Azure
Smaller batches (10K-15K rows)
Retry logic
Network latency is high
Parallel loading!!
Start with: SSIS for Hybrid Data Movement; SSIS Performance and Operational guide
Contrast cloud vs on-prem
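
The two loading recommendations on this slide, small batches and retry logic, can be illustrated with a minimal sketch. It is not the SSIS implementation from the talk: `send_batch` is a hypothetical callable standing in for whatever writes rows to the database, `ConnectionError` stands in for a transient network failure, and the backoff values are arbitrary.

```python
import time
from itertools import islice


def batches(rows, batch_size=10_000):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def load_with_retry(send_batch, batch, max_attempts=5, base_delay=0.5,
                    sleep=time.sleep):
    """Call send_batch(batch), retrying transient failures with exponential backoff.

    send_batch is a hypothetical loader supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_batch(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

Keeping batches in the 10K-15K row range means a failed attempt only repeats a small amount of work, which matters more when network latency is high.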

23 Baseline
Understand Data Sources performance
Flat File in Azure VM: ~60 MB/sec reads
SQLIO shows the max throughput of the IO subsystem on the DB side
App performance can be different

24 Parallel Loading
Flat file, max 60 MB/sec (x2)
Mod(7) function
8 destinations to keep all CPUs busy on the DW side
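
Splitting rows across parallel destinations with a modulo function can be sketched as below. The transcript does not show the exact expression used in the talk, so the number of streams is left as a parameter and the integer keys are hypothetical.

```python
def partition_by_mod(keys, n_streams):
    """Assign each integer key to stream (key % n_streams)."""
    streams = [[] for _ in range(n_streams)]
    for key in keys:
        streams[key % n_streams].append(key)
    return streams


if __name__ == "__main__":
    # Sixteen hypothetical keys spread over eight destinations.
    for i, stream in enumerate(partition_by_mod(range(16), 8)):
        print(i, stream)
```

Evenly distributed keys give each destination about the same number of rows, which is what keeps every loading stream busy.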

25 Begin to load

26 Monitoring Loading Performance
Top waits you will see: ASYNC_NETWORK_IO, PAGEIOLATCH_EX, WRITELOG, PAGEIOLATCH_UP, SOS_SCHEDULER_YIELD, PAGEIOLATCH_SH, PAGELATCH_UP, PREEMPTIVE_OLEDBOPS
Categories: Network IO, Disk IO, CPU
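
One way to read the wait list is to roll it up into the three categories on the slide. The mapping below is an assumption based on the usual meaning of each wait type, not something stated in the transcript; PAGELATCH_UP and PREEMPTIVE_OLEDBOPS do not fit the three categories cleanly and are grouped as "Other" here. The sample numbers in the usage note are made up.

```python
from collections import defaultdict

# Assumed mapping from wait type to bottleneck category.
WAIT_CATEGORY = {
    "ASYNC_NETWORK_IO": "Network IO",
    "PAGEIOLATCH_EX": "Disk IO",
    "PAGEIOLATCH_UP": "Disk IO",
    "PAGEIOLATCH_SH": "Disk IO",
    "WRITELOG": "Disk IO",
    "SOS_SCHEDULER_YIELD": "CPU",
}


def summarize_waits(samples):
    """Sum wait time in ms per category from (wait_type, wait_ms) pairs."""
    totals = defaultdict(int)
    for wait_type, wait_ms in samples:
        totals[WAIT_CATEGORY.get(wait_type, "Other")] += wait_ms
    return dict(totals)
```

For example, `summarize_waits([("WRITELOG", 400), ("PAGEIOLATCH_SH", 100), ("ASYNC_NETWORK_IO", 250)])` groups the first two entries under Disk IO and the third under Network IO.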

27 Loading: table options
Heap: Elapsed time: 01:06:15.313
Heap compressed: Elapsed time: 05:12:06.094
Blob
Finished, 1:39:05 PM, Elapsed time: 01:06:15.313
Finished, 3:01:24 PM, Elapsed time: 01:08:00.156
Finished, 8:58:09 AM, Elapsed time: 01:01: – heap blob

28 Loading: table options
Heap: Elapsed time: 01:06:15.313
Clustered Index: Sort! Elapsed time: 01:20:12.547
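
The cost of loading into a clustered index rather than a heap can be quantified from the two elapsed times on this slide. The sketch below only does the arithmetic on those two values.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS.mmm' elapsed time string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


heap = to_seconds("01:06:15.313")
clustered = to_seconds("01:20:12.547")
overhead = (clustered - heap) / heap  # relative extra load time

if __name__ == "__main__":
    print(f"clustered index load took {overhead:.0%} longer than the heap load")
```

The clustered index load comes out about a fifth slower than the heap load, which matches the slide's point that the sort has a visible cost.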

29 Query Performance
Heap
Primary Key/Clustered Index
Compression

30 Query performance: results

31 Please welcome on stage SQL 2014

32 What’s new?
Data files to BLOBs
Updateable Clustered Column Store index

33 Loading data
Load test: Heap 1 hour 1 min; Clustered Column Store Index 2 hours 16 min

34 SQL 2014

35 Query 19 Estimates vs Actual

36 And the winner is… SQL Server 2014!!

37 Summary
Easy and fast deployment through the Gallery or PS scripts
Azure Data Warehouse is consistent with most of the best practices: Query, Loading
Low initial investment and TCO

38 THANK YOU! For attending this session and PASS SQLRally Nordic 2013, Stockholm

