20 Questions with Azure SQL Data Warehouse

Kevin Feasel

1. Who Am I? What Am I Doing Here?
- Curated SQL
- Tribal SQL

2. What is Azure SQL Data Warehouse?
Azure SQL Data Warehouse is a Platform as a Service relational data warehouse offering in Azure.
- Platform as a Service: you don’t manage the infrastructure. Backups are automatic using data snapshots.
- Relational data warehouse: built on a relational database and intended for large analytics queries, like aggregations over a time frame.

3. How is this different from Azure SQL Database?
Azure SQL Database is an Online Transaction Processing (OLTP) system.

                   Azure SQL DB                                  Azure SQL DW
Data structure     Transactional (OLTP)                          Analytical (OLAP)
Data insertion     Many small, independent data modifications    Larger, periodic insertions of data
Max data size      1 TB                                          No limit
Concurrency        6400 concurrent queries                       32 concurrent queries
In-Memory OLTP?    Yes                                           No
PolyBase?

4. Why would I want to use Azure SQL Data Warehouse?
- You have a large warehouse which takes a long time to process or retrieve data.
- You want to off-load analytics queries to another system.
- You need to do periodic processing of extra-large data, such as monthly or quarterly reports.
- Your analytics queries scan and typically aggregate large numbers of rows.
- You are already using Redshift and have realized how painful it can be when you guess wrong on size.

5. What kinds of workloads work best?
- Relatively slowly changing data
  - Load once a day? Once a week?
- Relatively few queries running at a time
  - Limit of 32 concurrent queries
- Kimball-style data warehouse
  - Fact tables storing measures (numeric values)
  - Dimension tables storing attributes which explain the facts
- Big data sets: terabytes of data
- Big queries: getting large swaths of aggregated data

6. What does the Azure SQL DW architecture look like?
Azure SQL Data Warehouse is broken down into control and compute nodes. These are modified SQL Server instances linked together with the Data Movement Service. These nodes sit on top of sixty blob storage containers. Each compute node “owns” 60/N containers. Rescaling means adding or removing compute nodes, changing the ratio.
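The 60/N relationship can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, the contiguous block assignment is an assumption made for readability, and the requirement that the node count divide 60 evenly is likewise assumed rather than stated on the slide.

```python
from typing import Dict, List

TOTAL_DISTRIBUTIONS = 60  # fixed number of storage containers, per the slide


def distributions_per_node(compute_nodes: int) -> int:
    """Return how many of the 60 containers each compute node owns (60/N)."""
    # Assumption: node counts must divide 60 evenly so every node owns the same share.
    if compute_nodes < 1 or TOTAL_DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("node count must be a positive divisor of 60")
    return TOTAL_DISTRIBUTIONS // compute_nodes


def assign_distributions(compute_nodes: int) -> Dict[int, List[int]]:
    """Map each node id to the container ids it owns, in contiguous blocks."""
    per_node = distributions_per_node(compute_nodes)
    return {
        node: list(range(node * per_node, (node + 1) * per_node))
        for node in range(compute_nodes)
    }


if __name__ == "__main__":
    for nodes in (1, 6, 60):
        print(nodes, "node(s):", distributions_per_node(nodes), "containers each")
```

Rescaling from 6 nodes to 12 in this model moves no data; it only halves the number of containers each node is responsible for, which is the point of separating compute from storage.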

7. Why separate compute from storage?
- You can scale compute independently from storage by changing the number of DWUs
  - Mid-sized warehouse during business hours
  - Reduced size during off hours
  - Bump up the power for the overnight data load
  - Turn it up to 11 to generate those quarterly reports
- You can turn the compute portion off when you aren’t using the warehouse
  - Still pay for storage, but don’t pay for compute

8. Wait, what’s a DWU?
Data Warehouse Unit: the throughput unit for Azure SQL DW

9. How do I know how many DWUs I need?
- DWUs correspond to the number of compute instances
  - 1 instance = 100 DWU
  - Max out at 6000 DWU (60 compute nodes)
- DWU calculator: compares against your current on-prem workload
- Or just try it out! DWUs scale linearly, so if queries are slow, bump up DWUs

10. How much does this cost?
- Compute: $1.21 per hour per instance
  - 100 DWUs for a month (750 hours) = $907.50 / month
  - 600 DWUs = $5445 / month
- Storage: $0.17 per terabyte per hour (premium storage cost)
  - 1 TB for a month (750 hours) = $127.50 / month
  - 30 TB = $3825 / month
- Compute is billed at the maximum DWU level used within each hour
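The pricing arithmetic on this slide is simple enough to check with a short script. The sketch below uses only the rates quoted above ($1.21 per instance-hour, $0.17 per terabyte-hour, a 750-hour month); the function and constant names are made up for illustration, and real Azure pricing may differ from these figures.

```python
COMPUTE_RATE_PER_100_DWU_HOUR = 1.21  # USD per instance (100 DWU) per hour, from the slide
STORAGE_RATE_PER_TB_HOUR = 0.17       # USD per terabyte per hour, from the slide
HOURS_PER_MONTH = 750                 # the slide's billing month


def monthly_compute_cost(dwu: int, hours: float = HOURS_PER_MONTH) -> float:
    """Compute cost in USD for running at a constant DWU level."""
    instances = dwu / 100  # 1 instance = 100 DWU
    return round(instances * COMPUTE_RATE_PER_100_DWU_HOUR * hours, 2)


def monthly_storage_cost(terabytes: float, hours: float = HOURS_PER_MONTH) -> float:
    """Storage cost in USD for holding a given number of terabytes."""
    return round(terabytes * STORAGE_RATE_PER_TB_HOUR * hours, 2)


if __name__ == "__main__":
    for dwu in (100, 600):
        print(f"{dwu} DWU: ${monthly_compute_cost(dwu):,.2f} / month")
    for tb in (1, 30):
        print(f"{tb} TB: ${monthly_storage_cost(tb):,.2f} / month")
```

At 600 DWUs this reproduces the $5445 per month figure quoted on the slide.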

11. What can I do to reduce that cost?
- When the DW is paused, you don’t pay for compute
  - You still pay for storage!
- Scale up and down as needed
  - Do you need all 600 after hours? How about on the weekend?
- Delete and re-create the warehouse
  - Useful if you only need the warehouse occasionally
  - Keep your long-term data in Azure Blob Storage (cheap mode) to minimize costs
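To get a feel for how much pausing and scaling can save, here is a rough comparison of one week of compute at a constant 600 DWU against a scaled schedule. The schedule itself (hours and DWU levels) is entirely hypothetical, the rate is the $1.21 per instance-hour figure from the pricing slide, and storage is left out because it is billed either way.

```python
COMPUTE_RATE_PER_100_DWU_HOUR = 1.21  # USD, taken from the pricing slide


def compute_cost(schedule):
    """Sum compute cost for (hours, dwu) pairs; a dwu of 0 means paused."""
    return sum(
        hours * (dwu / 100) * COMPUTE_RATE_PER_100_DWU_HOUR
        for hours, dwu in schedule
    )


# Hypothetical week (168 hours); these hours and sizes are illustrative only.
ALWAYS_ON = [(168, 600)]
SCHEDULED = [
    (50, 600),  # weekday business hours at full size
    (70, 200),  # weekday nights scaled down
    (48, 0),    # weekend paused: no compute charge
]

if __name__ == "__main__":
    print(f"always on: ${compute_cost(ALWAYS_ON):,.2f} per week")
    print(f"scheduled: ${compute_cost(SCHEDULED):,.2f} per week")
```

Swapping in your own hours and DWU levels gives a quick first estimate before committing to a scaling routine.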

12. So how do I actually use this thing?
DEMO TIME

13. What design decisions do I make?
Index type:
- Clustered columnstore index: fact tables (lots of aggregation)
- Heap: staging tables
- Clustered index: single-record lookups
- Can add non-clustered indexes as well

13. What design decisions do I make?
Distribution type:
- HASH: separates data based on a defined column. Great for joining big fact tables together. Hash column should have > 60 distinct values and ideally should have a fairly uniform distribution.
- ROUND_ROBIN: sprays records across the 60 storage buckets.
Partitioning:
- Choose a partition column you normally use, like date
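The advice about distinct values is easier to see with a small simulation. The sketch below spreads rows across 60 buckets two ways and reports skew. It is a toy model: SHA-256 stands in for whatever hash function the service actually uses (the slides do not say), and all function names are hypothetical.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # number of storage buckets, per the slide


def hash_distribute(keys):
    """Count rows per bucket when each row is placed by hashing its key."""
    counts = Counter()
    for key in keys:
        # Stand-in hash; the real service's hash function is not described here.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % DISTRIBUTIONS] += 1
    return counts


def round_robin_distribute(row_count):
    """Count rows per bucket when rows are dealt out one bucket at a time."""
    counts = Counter()
    for i in range(row_count):
        counts[i % DISTRIBUTIONS] += 1
    return counts


def skew(counts):
    """Fullest bucket's row count relative to a perfectly even split (1.0 = even)."""
    total = sum(counts.values())
    return max(counts.values()) / (total / DISTRIBUTIONS)


if __name__ == "__main__":
    low_cardinality = [i % 5 for i in range(6000)]  # only 5 distinct key values
    high_cardinality = list(range(6000))            # 6000 distinct key values
    print("round robin skew:     ", skew(round_robin_distribute(6000)))
    print("hash, 5 distinct keys:", skew(hash_distribute(low_cardinality)))
    print("hash, 6000 keys:      ", skew(hash_distribute(high_cardinality)))
```

With only five distinct key values, at most five of the 60 buckets receive any rows, so most of the compute capacity sits idle during a scan. That is why a hash column with well over 60 distinct, evenly spread values is recommended.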

14. What tooling is available?
- SQL Server Data Tools
- SQL Server Management Studio
- The Azure Portal UI
- PowerShell

15. What maintenance do I need to do?
- Backups are automatic and get stored for 7 days
- Server patches, etc. are all taken care of behind the scenes
- Statistics must be maintained manually
  - Consider updating statistics after each data load

16. How can I integrate with this?
Azure SQL Data Warehouse supports several connection types, including ADO.NET, ODBC, and JDBC. This means you can connect using pretty much any tool you normally would:
- External applications (e.g., web app or Web API)
- Power BI / Tableau
- SQL Server Reporting Services

17. What language limitations exist?
- No recursive Common Table Expressions
- No identity or sequence columns
- No MERGE statement
- No cursors (WHILE loops are okay)
- Must use SET to modify variables; cannot use SELECT or UPDATE
- No schemabinding views, inserting into views, or creating indexed views
- Cannot use GROUP BY with ROLLUP / CUBE / GROUPING SETS

18. What system limitations exist?
- No Entity Framework support
- No built-in R Services
- Cannot read JSON/XML data from file storage
- Columnstore indexed tables do not support MAX columns

19. What if I want a 366-page PDF describing Azure SQL Data Warehouse?
documents/live/sql-data-warehouse.pdf

20. Got anything shorter than 366 pages?
- SQL Data Warehouse limitations
- James Serra on Azure SQL Data Warehouse
- Working with Azure SQL Data Warehouse (my post)
