1
Where Should My Data Live (and Why)?
Matt Gordon, Data Platform Solution Architect, DMI
2
Matt Gordon
Data Platform Solution Architect, DMI
15+ Years SQL Server Experience: I began as a database developer, which evolved into a DBA role, then datacenter management, and now consulting.
I’ve Managed Data and People: I’ve managed servers, datacenters, and development teams in my career, so I’ve approached these issues from many angles.
I Drive Race Cars: The picture to the left is of me and my car in Gasoline Alley at Indianapolis Motor Speedway this past June. I will talk your ear off about this if given the opportunity!
/in/sqlatspeed
@sqlatspeed
3
Agenda
Where Does Our Data Live Now?
Why Does Our Data Live Where It Does?
Cloud, On-Premises, or Both?
Case Studies
Wrap-up
4
Definitions
5
Discussion Points
Cloud is just somebody else’s computer in somebody else’s datacenter
Rapid development from cloud providers constantly expands options
Are you locked into deployment locations for certain platforms?
  Database engine always on-premises
  Hadoop always in cloud
Blending of technologies and platforms may/may not be the right answer
6
Where Does Our Data Live Now?
7
Where Does Our Data Live Now?
How many of us do not have a single data environment in the cloud?
How many of us have only test/dev/QA data environments in the cloud?
How many of us have a “trial” production data environment in the cloud?
How many of us have all production data environments in the cloud?
How many of us have all (or nearly all) data environments in the cloud?
8
Why Does Our Data Live Where It Does?
9
Why Does Our Data Live Where It Does?
On-Premises Pros
Cost
  Leveraging “investments”
  Can cost less if uptime is not critical
Comfort level
  “I can go see it”
Physical control and security
Data accessible even when all external telecom is down
Licensing
10
Why Does Our Data Live Where It Does?
On-Premises Cons
Generally requires large up-front investment
Requires corresponding infrastructure
  Rack space, cooling, cabling, telecom, fire suppression, etc.
May require backup datacenter
  Depends on uptime requirements
On-site personnel often needed to maintain operations
  More expensive from a resource perspective
11
Why Does Our Data Live Where It Does?
Cloud Pros
Cost
  Buy only what you need
Scalability (vertical and horizontal)
Global redundancy
Storage durability
Data availability from all locations
PaaS often satisfactory to government security audits/approvals
High availability and disaster recovery often built-in*
12
Why Does Our Data Live Where It Does?
Cloud Cons
Can require robust Internet connectivity
  VPN cost can be significant
Minimal to no control over underlying infrastructure
  “Noisy neighbors”
Design apps to deal with connection hiccups more efficiently
Perception of lighter security
  “Things happen by magic”
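One common way to "deal with connection hiccups" is to retry transient failures with exponential backoff. The sketch below is only an illustration of that idea, not something shown in the talk: `TransientConnectionError`, `run_with_retry`, and the default attempt count and delays are all hypothetical names and values. A real application would catch whatever transient exception its database driver actually raises.

```python
import random
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's transient connection error."""


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); retry transient failures with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The base delay doubles on each attempt; random jitter keeps
            # many clients from retrying at exactly the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. In normal use a caller would pass a function that opens a connection and runs one query.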
13
Cloud, On-Premises, or Both?
14
Cloud Deployment Options (Azure)
SQL Server (IaaS)
  Mimics on-premises behavior but resources are on Azure
  Full control of configuration
  Full control of maintenance
Azure SQL DW
  MPP cloud-based, scale-out, relational database
  Separates storage and compute
  Can pause compute capacity when not needed
Azure SQL DB
  PaaS flavor of SQL Server database
  Very limited control of maintenance
  Limited control of configuration
Hadoop
  Microsoft’s flavor is known as HDInsight
  Used for semi-structured data
  Can connect from database engine using PolyBase
15
Cloud Deployment Options (Amazon)
SQL Server on EC2
  Mimics on-premises behavior but resources are on Amazon EC2
  Full control of configuration
  Full control of maintenance
Amazon Redshift
  Amazon equivalent of Azure SQL DW
  Fully managed
  Easily scalable
Amazon RDS
  Amazon PaaS offering
  Supports six database engines
  Minimal configuration control
Hadoop
  Amazon’s HDInsight equivalent is EMR
  Supports traditional Hadoop tooling
  Can connect from database engine using PolyBase
16
Cloud Deployment Options (Google)
Google Compute Engine
  SQL Server on Google Cloud Platform is an IaaS offering
  Full control of configuration
  Full control of maintenance
  Multiple versions and editions supported
DW?
  No real Google equivalent in this space yet
Google Cloud SQL
  PaaS flavor of database engines
  Supports MySQL and PostgreSQL (beta)
  Fully managed
Hadoop
  Google’s fully managed flavor is known as Google Cloud Dataproc
  Used for semi-structured data
  Can connect from database engine using PolyBase
17
On-Premises Deployment Options
SQL Server
  Traditional deployment of the database engine
  Full control of configuration
  Full control of maintenance
Microsoft APS
  MPP appliance
  Evolution of Parallel Data Warehouse
  Architecture of Azure SQL DW based on this design
PaaS Database
  No on-premises equivalent of Azure SQL Database
Hadoop
  Microsoft’s flavor is known as HDInsight
  Many other non-Microsoft deployment options
  Can connect from database engine using PolyBase
18
Hybrid Deployment Options/Scenarios
On-Premises App Servers & Azure SQL DB
  Easy to create and destroy databases as needed for development and deployment
  Removes management responsibility from devs
  Good choice if DBA team short on resources
Availability Groups with Azure Replica(s)
  Uses Azure as backup datacenter(s)
  Requires robust network infrastructure
  Good for minimum datacenter proximity requirements
19
Hybrid Deployment Options/Scenarios
Replication to Azure IaaS VM
  Tried and true technology in use
  Identical to doing this on-premises other than network portion
  Good way to ease into comfort with the cloud
Replication to Azure SQL Database
  Azure SQL Database can be a replication subscriber
  Eases DBA team into cloud and PaaS interactions
  Straightforward setup
20
Demo: Setting up replication to Azure SQL Database
21
Hybrid Deployment Options/Scenarios
Log Shipping to Azure IaaS VM
  Popular with customers who want a copy of data stored completely off-site
  Straightforward setup
  Expands environment without requiring a cluster or other complicated infrastructure
PolyBase to Azure Blob Storage
  Great for querying large quantities of semi-structured data
  Good way to introduce team to PolyBase
  Subject of our first case study
22
Case Studies
23
Transportation Planning Agency
Statistical models generating TBs of output every year
Storage costs spiraling upward and difficult to manage
Output stored in relational database tables requiring constant maintenance
Output generated as text files which were fed into the relational tables
Loaded output files into Azure Blob Storage (cold)
Query performance increased
Storage costs decreased by 96% ($2k per year vs. $75k per year)
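As a rough check on the storage figures above, the saving can be recomputed from the rounded dollar amounts on the slide. This is only a sketch: the two inputs are the slide's rounded values and the variable names are illustrative. With these rounded inputs the result comes out near 97%, in line with the roughly 96% reported, which presumably came from unrounded amounts.

```python
# Rounded annual storage costs as quoted on the slide (USD per year).
cost_before = 75_000  # relational tables, "$75k per year"
cost_after = 2_000    # Azure Blob Storage (cold), "$2k per year"

# Percentage saving relative to the original cost.
saving_pct = (cost_before - cost_after) / cost_before * 100

print(f"Annual saving: ${cost_before - cost_after:,} ({saving_pct:.1f}%)")
```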
24
Demo: Querying statistical model outputs stored in Azure Blob Storage
25
Geospatial Research Center
Hosted Hadoop cluster
Hosted HDFS storage storing Excel, CSV, XML, JSON, etc.
SQL Server installed on Azure VMs
Database engine, DQS, MDS, and SSAS in use
PolyBase used to query semi-structured data from main SQL Server databases
Data consumers presented with common interface to access heterogeneous data
26
Wrap-up
27
Discussion Points
Cloud is just somebody else’s computer in somebody else’s datacenter
Rapid development from cloud providers constantly expands options
Are you locked into deployment locations for certain platforms?
  Database engine always on-premises
  Hadoop always in cloud
Blending of technologies and platforms may/may not be the right answer
28
Recommendations
Set expectations for what cloud technologies are and what they can do
  Management
  Team
HA/DR isn’t done by magic – it’s just different
Stay abreast of new technologies
  Research
  Training
  Azure Stack
Embrace it all!
29
Learn more from Matt Gordon
@sqlatspeed