20 Questions with Azure SQL Data Warehouse


1 20 Questions with Azure SQL Data Warehouse
Kevin Feasel

2 1. Who Am I? What Am I Doing Here?
Curated SQL
Tribal SQL

3 2. What is Azure SQL Data Warehouse?
Azure SQL Data Warehouse is a Platform as a Service relational data warehouse offering in Azure.
- Platform as a Service: you don’t manage the infrastructure. Backups are automatic using data snapshots.
- Relational data warehouse: built on a relational database and intended for large analytics queries like aggregations over a time frame.

4 3. How is this different from Azure SQL Database?
Azure SQL Database is an Online Transactional Processing (OLTP) system.

| | Azure SQL DB | Azure SQL DW |
| Data structure | Transactional (OLTP) | Analytical (OLAP) |
| Data insertion | Many small, independent data modifications | Larger, periodic insertions of data |
| Max data size | 1 TB | No limit |
| Concurrency | 6400 concurrent queries | 32 concurrent queries |
| In-Memory OLTP? | Yes | No |
| Polybase? | | |

5 4. Why would I want to use Azure SQL Data Warehouse?
- You have a large warehouse which takes a long time to process or retrieve data.
- You want to off-load analytics queries to another system.
- You need to do periodic processing of extra-large data, such as monthly or quarterly reports.
- Your analytics queries scan and typically aggregate large numbers of rows.
- You are already using Redshift and have realized how painful it can be when you guess wrong on size.

6 5. What kinds of workloads work best?
- Relatively slowly changing data
  - Load once a day? Once a week?
- Relatively few queries running at a time
  - Limit of 32 concurrent queries
- Kimball-style data warehouse
  - Fact tables storing measures (numeric values)
  - Dimension tables storing attributes which explain the facts
- Big data sets: terabytes of data
- Big queries: getting large swaths of aggregated data

7 6. What does the Azure SQL DW architecture look like?
Azure SQL Data Warehouse is broken down into control and compute nodes. These are modified SQL Server instances linked together with the Data Movement Service. These nodes sit on top of sixty blob storage containers. Each compute node “owns” 60/N containers. Rescaling means adding or removing compute nodes, changing the ratio.
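The 60/N ownership rule can be illustrated with a small sketch. This is only a model of the ratio described on the slide, not how the service assigns containers internally; the function name `assign_distributions` and the contiguous-range assignment are assumptions made for illustration.

```python
DISTRIBUTIONS = 60  # fixed number of storage containers, per the slide


def assign_distributions(compute_nodes: int) -> dict:
    """Map each compute node to the containers it owns (even, contiguous split).

    Hypothetical helper: the real service decides ownership itself; this
    only shows how the 60/N ratio changes as nodes are added or removed.
    """
    if compute_nodes < 1 or DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("this sketch requires a node count that divides 60")
    per_node = DISTRIBUTIONS // compute_nodes
    return {
        node: list(range(node * per_node, (node + 1) * per_node))
        for node in range(compute_nodes)
    }


if __name__ == "__main__":
    for nodes in (1, 6, 60):
        owned = assign_distributions(nodes)
        print(f"{nodes} compute node(s): {len(owned[0])} containers each")
```

Rescaling from 6 to 12 nodes in this model halves the number of containers each node has to scan, which is the intuition behind "changing the ratio."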

8 7. Why separate compute from storage?
- Can scale compute independently from storage: scaling to DWUs
  - Mid-sized warehouse during business hours
  - Reduced size during off hours
  - Bump up the power for the overnight data load
  - Turn it up to 11 to generate those quarterly reports
- Can turn the compute portion off when you aren’t using the warehouse
  - Still pay for storage, but don’t pay for compute

9 8. Wait, what’s a DWU?
Data Warehouse Unit: the throughput unit for Azure SQL DW

10 9. How do I know how many DWUs I need?
- DWUs are the number of compute instances.
  - 1 instance = 100 DWU
  - Max out at 6000 DWU (60 compute nodes)
- DWU calculator: compares your current on-prem workload
- Or just try it out!
  - DWUs scale linearly, so if queries are slow, bump up DWUs
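As a rough sizing aid, the slide's two rules (1 instance = 100 DWU, and linear scaling) can be written down directly. This is a back-of-the-envelope sketch: the function names are hypothetical, and perfectly linear scaling is the slide's simplification, so treat the runtime estimate as a starting guess to verify by trying it out.

```python
DWU_PER_NODE = 100  # slide: 1 instance = 100 DWU
MAX_DWU = 6000      # slide: maximum of 60 compute nodes


def compute_nodes_for(dwu: int) -> int:
    """Number of compute instances behind a DWU setting, per the slide's rule."""
    if dwu <= 0 or dwu > MAX_DWU or dwu % DWU_PER_NODE != 0:
        raise ValueError("DWU must be a positive multiple of 100, at most 6000")
    return dwu // DWU_PER_NODE


def estimated_runtime(current_seconds: float, current_dwu: int, target_dwu: int) -> float:
    """Estimate a query's runtime after rescaling, assuming linear scaling."""
    compute_nodes_for(current_dwu)  # validate both settings
    compute_nodes_for(target_dwu)
    return current_seconds * current_dwu / target_dwu


if __name__ == "__main__":
    print(compute_nodes_for(600), "compute nodes at 600 DWU")
    print(estimated_runtime(120.0, 200, 400), "seconds estimated at 400 DWU")
```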

11 10. How much does this cost?
- Compute: $1.21 per hour per instance
  - 100 DWUs for a month (750 hours) = $907.50 / month
  - 600 DWUs = $5445 / month
- Storage: $0.17 per terabyte per hour (premium storage cost)
  - 1 TB for a month (750 hours) = $127.50 / month
  - 30 TB = $3825 / month
- DWUs calculated as max(DWU) for an hour
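The monthly figures follow from multiplying the hourly rates by 750 hours. A minimal calculator, using only the rates quoted on the slide (current Azure pricing may well differ) and hypothetical function names:

```python
COMPUTE_RATE_PER_INSTANCE_HOUR = 1.21  # USD per 100 DWU per hour (slide's figure)
STORAGE_RATE_PER_TB_HOUR = 0.17        # USD per terabyte per hour (slide's figure)
HOURS_PER_MONTH = 750                  # the slide's billing month


def monthly_compute_cost(dwu: int, hours: float = HOURS_PER_MONTH) -> float:
    """Compute cost in USD for running at a given DWU level for `hours`."""
    return round(dwu / 100 * COMPUTE_RATE_PER_INSTANCE_HOUR * hours, 2)


def monthly_storage_cost(terabytes: float, hours: float = HOURS_PER_MONTH) -> float:
    """Storage cost in USD for keeping `terabytes` of data for `hours`."""
    return round(terabytes * STORAGE_RATE_PER_TB_HOUR * hours, 2)


if __name__ == "__main__":
    for dwu in (100, 600):
        print(f"{dwu} DWU: ${monthly_compute_cost(dwu):,.2f} / month")
    for tb in (1, 30):
        print(f"{tb} TB: ${monthly_storage_cost(tb):,.2f} / month")
```

The 600 DWU result matches the $5445 figure on the slide, which is a useful check that the per-instance rate is being applied per 100 DWU.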

12 11. What can I do to reduce that cost?
- When the DW is paused, you don’t pay for compute
  - You still pay for storage!
- Scale up and down as needed
  - Do you need all 600 after hours? How about on the weekend?
- Delete and re-create the warehouse
  - Useful if you only need the warehouse occasionally
  - Keep your long-term data in Azure Blob Storage (cheap mode) to minimize costs
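To get a feel for how much pausing and scaling can save, here is a sketch comparing one week at a constant 600 DWU against a scaled schedule. The schedule itself (10 business hours on weekdays, 100 DWU off hours, paused weekends) is an invented example, and the $1.21 rate is the one quoted earlier in the deck; storage is billed either way and is left out.

```python
COMPUTE_RATE_PER_INSTANCE_HOUR = 1.21  # USD per 100 DWU per hour (deck's figure)


def compute_cost(schedule) -> float:
    """Sum compute cost over (dwu, hours) pairs; 0 DWU means paused (no compute charge)."""
    total = sum(dwu / 100 * COMPUTE_RATE_PER_INSTANCE_HOUR * hours
                for dwu, hours in schedule)
    return round(total, 2)


# One week (168 hours) at a constant 600 DWU.
always_on = [(600, 168)]

# Hypothetical schedule covering the same 168 hours.
scaled = [
    (600, 5 * 10),  # weekdays: 10 business hours at 600 DWU
    (100, 5 * 14),  # weekdays: remaining 14 hours at 100 DWU
    (0, 2 * 24),    # weekend: paused, storage charges still apply
]

if __name__ == "__main__":
    print(f"always on: ${compute_cost(always_on):,.2f} / week")
    print(f"scaled:    ${compute_cost(scaled):,.2f} / week")
```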

13 12. So how do I actually use this thing?
DEMO TIME

14 13. What design decisions do I make?
Index type:
- Clustered columnstore index: fact tables (lots of aggregation)
- Heap: staging tables
- Clustered index: single-record lookups
- Can add non-clustered indexes as well

15 13. What design decisions do I make?
Distribution type:
- HASH: separates data based on a defined column. Great for joining big fact tables together. Hash column should have > 60 distinct values and ideally should have a fairly uniform distribution.
- ROUND_ROBIN: sprays records across the 60 storage buckets.
Partitioning:
- Choose a partition column you normally use, like date
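The advice about having more than 60 distinct values is easier to see with a simulation. The sketch below is not the hash function Azure SQL DW actually uses; SHA-256 stands in as an assumption, and the function names are hypothetical. It only shows why a low-cardinality hash column leaves most of the 60 buckets empty, while round robin always spreads rows evenly.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # the 60 storage buckets from the deck


def hash_distribute(keys) -> Counter:
    """Count rows per bucket when each row is placed by hashing its key.

    SHA-256 is a stand-in; the service's real hash function is not modeled.
    """
    counts = Counter()
    for key in keys:
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % DISTRIBUTIONS] += 1
    return counts


def round_robin_distribute(row_count: int) -> Counter:
    """Count rows per bucket when rows are dealt out in turn."""
    counts = Counter()
    for i in range(row_count):
        counts[i % DISTRIBUTIONS] += 1
    return counts


if __name__ == "__main__":
    low_cardinality = hash_distribute(i % 5 for i in range(60_000))
    high_cardinality = hash_distribute(range(60_000))
    even = round_robin_distribute(60_000)
    print("5 distinct keys      -> buckets used:", len(low_cardinality))
    print("60,000 distinct keys -> buckets used:", len(high_cardinality))
    print("round robin          -> rows per bucket:", set(even.values()))
```

The trade-off is that round robin gives an even spread but no co-location, so joins between big tables need data movement, whereas matching hash columns keep joined rows in the same bucket.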

16 14. What tooling is available?
- SQL Server Data Tools
- SQL Server Management Studio
- The Azure Portal UI
- PowerShell

17 15. What maintenance do I need to do?
- Backups are automatic and get stored for 7 days
- Server patches, etc. all taken care of behind the scenes
- Statistics must be maintained manually
  - Maybe rebuild statistics after each data load
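Since statistics are not maintained for you, one option is to generate the refresh statements as the last step of each load. A minimal sketch, assuming the load job knows which tables it touched; the table names are hypothetical, and the helper only builds `UPDATE STATISTICS` statement text, leaving execution to whatever client you already use.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def update_statistics_statements(tables):
    """Build one UPDATE STATISTICS statement per (schema, table) pair."""
    return [
        f"UPDATE STATISTICS {quote_name(schema)}.{quote_name(table)};"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical tables touched by a nightly load.
    loaded = [("dbo", "FactSales"), ("dbo", "DimCustomer")]
    for statement in update_statistics_statements(loaded):
        print(statement)
```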

18 16. How can I integrate with this?
Azure SQL Data Warehouse supports several connection types, including ADO.NET, ODBC, and JDBC. This means you can connect using pretty much any tool you normally would:
- External applications (e.g., web app or Web API)
- Power BI / Tableau
- SQL Server Reporting Services

19 17. What language limitations exist?
- No recursive Common Table Expressions
- No identity or sequence columns
- No MERGE statement
- No cursors (WHILE loops are okay)
- Must use SET to modify variables; cannot use SELECT or UPDATE
- No schemabinding views, inserting into views, or creating indexed views
- Cannot use GROUP BY with ROLLUP / CUBE / GROUPING SETS

20 18. What system limitations exist?
- No Entity Framework support
- No built-in R Services
- Cannot read JSON/XML data from file storage
- Columnstore indexed tables do not support MAX columns

21 19. What if I want a 366-page PDF describing Azure SQL Data Warehouse?
documents/live/sql-data-warehouse.pdf

22 20. Got anything shorter than 366 pages?
- SQL Data Warehouse limitations
- James Serra on Azure SQL Data Warehouse
- Working with Azure SQL Data Warehouse (my post)

