Data Pipeline Best Practices for an Increasingly Cloudy World

Adam Machanic
SQL Saturday Boston 2019

Alternate title: ETL Architecture for Overpriced, Underpowered Servers

ADAM MACHANIC: A BRIEF TIMELINE
Birth → Early Stuff → Discovered SQL Server → The SQL Years (SQL MVP) → Shifted Focus to OSS
Contact: adam@dataeducation.com

WHY MIGRATE TO “THE CLOUD”?
- Reduce Spending
- Eliminate Data Center Costs
- Reduce Management Overhead
- CapEx Becomes OpEx
- Improve Scalability!
- Decrease Deployment Time
- Infrastructure As Code
- Because That’s What We’re Told To Do: the CTO mandated that we migrate.

THE CLOUD, PROPERLY DEFINED
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
(National Institute of Standards and Technology, SP 800-145)

THE 6 R’S OF CLOUD MIGRATION
- RETIRE
- RE-PURCHASE
- RETAIN
- RE-HOST (a.k.a. “lift-and-shift”; a.k.a. “someone else’s server”)
- RE-PLATFORM
- REFACTOR

LIFT-AND-SHIFT: SERVER COST REDUCTION?

On-premises: Dell PowerEdge R740xd2, 16 cores (2.3 GHz), 64 GB RAM
- One-time retail cost: $7,803 (CapEx)
- Data center estimate: $50-300/month
- 3-year maximum TCO: $18,603 ($7,803 + 36 × $300)
- 3-year realistic TCO*: $9,842 ($7,803 × 0.8 + 36 × $100)
*Assumes a 20% server discount and $100/month for the data center.

Cloud: Amazon AWS EC2 m4.4xlarge, 16 vCPUs (2.4 GHz), 64 GB RAM
- $0.80/hour = $19.20/day = $7,008/year = $21,024 over 3 years
- 100% OpEx
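The slide's totals can be rechecked with a few lines of arithmetic. This sketch assumes, as the slide's per-day and per-year figures imply, that the 3-year horizon means 36 months of data center fees and an EC2 instance running 24x7 for 365 days a year; the helper names are hypothetical.

```python
# Sketch: recompute the slide's 3-year server cost comparison.
# Prices come from the slide; the helper names are hypothetical.
HOURS_PER_YEAR = 24 * 365
MONTHS = 36  # 3-year horizon


def on_prem_tco(server_price: float, monthly_dc_cost: float,
                server_discount: float = 0.0, months: int = MONTHS) -> float:
    """One-time server purchase (CapEx) plus monthly data center fees."""
    return server_price * (1 - server_discount) + monthly_dc_cost * months


def always_on_cloud_cost(hourly_rate: float, years: int = 3) -> float:
    """Instance running 24x7 and billed per hour (OpEx)."""
    return hourly_rate * HOURS_PER_YEAR * years


max_tco = on_prem_tco(7_803, 300)
realistic_tco = on_prem_tco(7_803, 100, server_discount=0.20)
ec2_tco = always_on_cloud_cost(0.80)

print(f"On-prem, maximum:   ${max_tco:,.0f}")
print(f"On-prem, realistic: ${realistic_tco:,.0f}")
print(f"EC2 m4.4xlarge:     ${ec2_tco:,.0f}")
```

Run as-is, it prints $18,603, $9,842, and $21,024, matching the slide. The always-on assumption is what makes the re-hosted server expensive: the instance is billed whether or not it is busy.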

CLOUD REFACTORING CONSIDERATIONS
- Everything is a remote network service. FAILURES HAPPEN. A LOT.
- Availability is less important than scalability. FAILURES HAPPEN. A LOT.
- Availability and elasticity are more important than raw performance. THE “OLD WAY” MIGHT SEEM SLOW.
Therefore: RETRY LOGIC; PROCESS RE-ENTRY AND IDEMPOTENCY.
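Retry logic and idempotency go together: a retried call may already have succeeded once, so re-running its effect must be harmless. A minimal sketch, assuming a hypothetical `TransientError` that marks retryable failures (timeouts, throttling) and using capped exponential backoff with full jitter; the in-memory `processed` set stands in for whatever durable record a real pipeline would keep.

```python
import random
import time
from typing import Callable, Dict, Set, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def with_retries(op: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.1, max_delay: float = 5.0) -> T:
    """Call op(), retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return op()
        except TransientError:
            if attempt >= max_attempts:
                raise  # give up and let the caller (or the queue) retry later
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
            attempt += 1


# Idempotency: remember which work items were applied so that a retry or a
# re-delivered item has no additional effect. In-memory and not thread-safe;
# a real pipeline would keep this record in durable storage.
processed: Set[str] = set()
totals: Dict[str, int] = {}


def apply_once(item_id: str, key: str, amount: int) -> None:
    if item_id in processed:
        return  # already applied: re-entry is a no-op
    totals[key] = totals.get(key, 0) + amount
    processed.add(item_id)
```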

LINCHPIN CLOUD-NATIVE SERVICES
- Queuing and Messaging
- LOB (large object) Storage
- Transient Computing Infrastructure

QUEUES FOR THE WIN: GATING AND EFFICIENCY

QUEUES FOR THE WIN: ROUTING, DECOUPLING, SCALE CONTROL
[Diagram: files and services connected through a queue]
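Gating, decoupling, and scale control can be illustrated in-process, with Python's standard `queue.Queue` standing in for a cloud queue service. In this sketch (function and parameter names are hypothetical) the producer and the workers share nothing but the queue, `n_workers` is the scale control, and `max_pending` bounds the queue so the producer is gated when consumers fall behind. It is a local analogy, not a cloud API.

```python
import queue
import threading
from typing import Callable, List, Optional


def run_pipeline(items: List[str], process: Callable[[str], str],
                 n_workers: int = 4, max_pending: int = 100) -> List[str]:
    """Producer and workers communicate only through a bounded queue.

    Error handling is omitted: if `process` raises, that worker stops.
    """
    work: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=max_pending)
    results: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:          # stop signal
                return
            out = process(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)                # blocks while the queue is full (gating)
    for _ in threads:
        work.put(None)                # one stop signal per worker
    for t in threads:
        t.join()
    return results
```

Scaling the consumer side means changing `n_workers` only; the producer code does not change, which is the decoupling the slide refers to.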

QUEUES FOR THE WIN: HARDENING AND RETRY
- Hardening of Work Item Information (i.e., small packets of metadata, not actual work)
- Visibility and Timeouts
- One-at-a-Time Delivery
- Timeout? Return the Item to the Queue
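The visibility-and-timeout behavior can be modeled with a toy in-memory queue. The class and method names are hypothetical, chosen to resemble the send/receive/delete cycle of services such as Amazon SQS; the clock is injectable so the timeout can be exercised without sleeping. Message bodies are meant to be small pointers (for example a file name), not the data itself.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class _Message:
    body: str              # small pointer/metadata, not the data itself
    visible_at: float = 0.0
    receive_count: int = 0


class VisibilityQueue:
    """Toy in-memory queue with visibility timeouts (illustration only).

    receive() hands a message to one worker and hides it for `timeout`
    seconds. If the worker calls delete() in time, the message is gone;
    otherwise it becomes visible again and another worker can retry it.
    Real services hand out a per-receive receipt handle; this sketch
    simply uses the message id.
    """

    def __init__(self, timeout: float, clock: Callable[[], float]):
        self.timeout = timeout
        self.clock = clock
        self._messages: Dict[int, _Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = _Message(body)

    def receive(self) -> Optional[Tuple[int, str]]:
        now = self.clock()
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                msg.visible_at = now + self.timeout  # in flight: hidden from others
                msg.receive_count += 1
                return msg_id, msg.body
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge successful processing."""
        self._messages.pop(msg_id, None)
```

A worker that crashes simply never calls `delete()`, and the item reappears after the timeout. This is why the processing step itself should be idempotent: the same item can be delivered more than once.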

SCALABLE STORAGE, THE LOB WAY
- Optimized for Scale and Availability
- NOT NECESSARILY OPTIMIZED FOR RAW PERFORMANCE
- Consider: Network Latency vs. Throughput
- Eventually Consistent, Maybe

LATENCY MATTERS!
TEST: WRITE 1,000,000,000 BYTES, LOCAL VS. S3

Writes x bytes each      Local    S3
10,000 x 100,000         5 s      890 s
1,000 x 1,000,000        5 s      112 s
100 x 10,000,000         5 s      32 s
10 x 100,000,000         5 s      13 s
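Dividing the slide's S3 numbers out shows why fewer, larger writes help: the 100,000-byte writes averaged about 89 ms each (roughly 1.1 MB/s overall), while the 100,000,000-byte writes reached roughly 77 MB/s. This sketch only restates the slide's measurements; it does not try to separate per-request latency from transfer time.

```python
from typing import List, Tuple

# S3 timings from the slide's first test: (writes, bytes per write, seconds).
TOTAL_BYTES = 1_000_000_000
S3_RUNS: List[Tuple[int, int, float]] = [
    (10_000, 100_000, 890),
    (1_000, 1_000_000, 112),
    (100, 10_000_000, 32),
    (10, 100_000_000, 13),
]


def summarize(runs: List[Tuple[int, int, float]]) -> List[Tuple[int, float, float]]:
    """Average time per write (ms) and effective throughput (MB/s) for each run."""
    rows = []
    for writes, size, seconds in runs:
        assert writes * size == TOTAL_BYTES  # every run moves the same data
        rows.append((writes, seconds / writes * 1_000, TOTAL_BYTES / seconds / 1e6))
    return rows


for writes, ms_per_write, mb_per_s in summarize(S3_RUNS):
    print(f"{writes:>6} writes: {ms_per_write:7.1f} ms/write, {mb_per_s:5.1f} MB/s")
```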

… BUT LOB STORAGE CAN SCALE!
TEST: WRITE 1,000,000,000 BYTES USING 64 THREADS

Writes x bytes each      Local    S3
10,000 x 100,000         11 s     23 s
1,000 x 1,000,000        6 s      7 s
100 x 10,000,000         6 s      5 s
10 x 100,000,000         8 s      2 s
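The 64-thread results come from keeping many requests in flight so that their latencies overlap. A minimal sketch using Python's `concurrent.futures`, assuming a caller-supplied `put(key, body)` function (hypothetical) that writes one object; the `part-NNNNNN` key scheme is also illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split(data: bytes, size: int) -> List[bytes]:
    """Cut data into pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_write(data: bytes, chunk_size: int,
                   put: Callable[[str, bytes], None], threads: int = 64) -> int:
    """Write data as separate objects, up to `threads` writes in flight at once.

    `put(key, body)` is a hypothetical hook for the actual store; for S3
    with boto3 it could wrap client.put_object(Bucket=..., Key=key, Body=body).
    Returns the number of objects written.
    """
    parts = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(put, f"part-{i:06d}", body)
                   for i, body in enumerate(parts)]
        for future in futures:
            future.result()  # surface any write error
    return len(parts)
```

Threads suit this workload because each write spends most of its time waiting on the network rather than using the CPU.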

LOB REFACTOR: STORAGE COST REDUCTION? (50 TB OF DATA)

On-premises: Pure Storage 57.2 TB FlashArray
- One-time retail cost: $349,700, plus data center costs

Amazon S3                                        Monthly   Yearly    3 years
Storage, 50 TB at $0.023/GB-month                $1,150    $13,800   $41,400
Transfer, 10 TB/month (US East) at $0.01/GB      $100      $1,200    $3,600
Writes, 1,000,000/month at $0.005 per 1,000      $5.00     $60.00    $180
Reads, 10,000,000/month at $0.0004 per 1,000     $4.00     $48.00    $144
3-year total                                                         $45,324
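The S3 column can be reproduced from the unit prices on the slide. This sketch assumes decimal units (1 TB = 1,000 GB), which is what the slide's $1,150/month for 50 TB implies; the prices are the slide's 2019 figures, not current ones.

```python
# S3 cost sketch using the slide's unit prices.
GB_PER_TB = 1_000  # decimal terabytes, as the slide's figures imply
MONTHS = 36

storage = 50 * GB_PER_TB * 0.023        # $/month: 50 TB at $0.023 per GB-month
transfer = 10 * GB_PER_TB * 0.01        # $/month: 10 TB at $0.01 per GB
writes = 1_000_000 / 1_000 * 0.005      # $/month: $0.005 per 1,000 writes
reads = 10_000_000 / 1_000 * 0.0004     # $/month: $0.0004 per 1,000 reads

monthly = storage + transfer + writes + reads
three_year = monthly * MONTHS
print(f"Monthly: ${monthly:,.2f}; 3 years: ${three_year:,.0f}")
```

It prints a monthly cost of $1,259.00 and a 3-year total of $45,324. Both figures are well below the FlashArray's $349,700 purchase price, even before that array's data center costs are added.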

TRANSIENT COMPUTING
- Resources Appear on Demand
- Resources Disappear When Demand Ends
- Server-Oriented or “Serverless”

SERVERLESS RESOURCES
- No Server to Manage
- 4x-10x More Expensive Per Cycle
- Can be Slower Than Server-Oriented Resources
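The "4x-10x more expensive per cycle" figure implies a simple break-even rule. The rule depends on one assumption that goes beyond the slide: a dedicated server costs the same whether it is busy or idle, while serverless bills only for busy time. Under that assumption, serverless is cheaper whenever the server would be busy less than 1/multiplier of the time, which works out to below 25% at 4x and below 10% at 10x. The function names are hypothetical.

```python
def break_even_utilization(cost_multiplier: float) -> float:
    """Utilization below which serverless is cheaper than a dedicated server."""
    if cost_multiplier <= 0:
        raise ValueError("cost_multiplier must be positive")
    return 1.0 / cost_multiplier


def serverless_is_cheaper(utilization: float, cost_multiplier: float) -> bool:
    """Compare costs for the same work over one billing period.

    Simplifying assumptions (not from the slides): the server costs 1.0 per
    period regardless of load; serverless costs cost_multiplier per unit of
    busy time. Ignores cold starts, memory sizing, and per-request fees.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * cost_multiplier < 1.0
```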

LEGACY MONOLITH ELT ARCHITECTURE
[Diagram: a file store, a staging database, and a destination database around an integration server running a file watcher, basic transformation, and more transformation]
The update runs as one transaction:
BEGIN TRANSACTION;
UPDATE DIM TABLE 1;
UPDATE DIM TABLE 2;
UPDATE DIM TABLE 3;
UPDATE FACT TABLE 1;
UPDATE FACT TABLE 2;
…
COMMIT;
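The defining trait of this legacy pattern is the single transaction wrapped around every dimension and fact update. A small SQLite sketch shows the consequence: one failure late in the load rolls back all of the earlier work, so the whole batch has to be rerun as a unit. The table names and the `fail_on_fact` switch are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical destination schema with one dimension row and one fact row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales   (id INTEGER PRIMARY KEY, amount INTEGER);
        INSERT INTO dim_customer VALUES (1, 'Original');
        INSERT INTO dim_product  VALUES (1, 'Original');
        INSERT INTO fact_sales   VALUES (1, 100);
    """)
    conn.commit()
    return conn


def load_monolith(conn: sqlite3.Connection, fail_on_fact: bool = False) -> None:
    """All dimension and fact updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE dim_customer SET name = 'Updated' WHERE id = 1")
        conn.execute("UPDATE dim_product SET name = 'Updated' WHERE id = 1")
        if fail_on_fact:
            raise RuntimeError("fact load failed")  # simulated late failure
        conn.execute("UPDATE fact_sales SET amount = amount + 1 WHERE id = 1")
```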

LET’S REFACTOR!
- ACTUALLY SAVE MONEY!
- ACTUALLY BENEFIT FROM CLOUD SERVICES!
- MAKE IT SCALE!

HOW TO SCALE?
- Cloud Services Appear to Have Endless Resources
- REALITY: They Have Way More Servers Than You (But Not Infinite)
- Which means: Throttling, Queuing, Eventual Consistency
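Throttling cuts both ways: providers throttle their callers, and a well-behaved pipeline throttles itself to stay under those limits. A token bucket is one common way to do this. The sketch below is a generic implementation, not any provider's SDK; `rate` (sustained operations per second) and `capacity` (maximum burst) are illustrative parameters, and the clock is injectable for testing.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side throttle: on average at most `rate` operations per second,
    with bursts of up to `capacity` operations."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; return False (caller should wait) if not."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```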

TO SCALE, WE MUST BUILD ON SCALE
- Local Hard Drive; Network Attached Storage Device; Cloud Provider LOB Storage
- An Average Database; Cloud Provider Queue; Cloud Provider Serverless Offering

A CLOUD-NATIVE PIPELINE TEMPLATE
CHEAP, SCALABLE, AND (RELATIVELY) FAIL-SAFE
[Diagram labels: initial file container; serverless event trigger (x2); initial work item queue; transient worker(s); basic transformation file container; secondary transformation work item queue; transient worker; intermediate results file container; destination, once per target set]
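One plausible reading of the template can be simulated in memory. In this reading, a file landing in a container fires a trigger that enqueues a small work item, a transient worker processes it and writes to the next container, and that write triggers the next stage. Everything here is a hypothetical stand-in: `Container` plays LOB storage with write events, `deque` plays the cloud queues, and the transformations are placeholders. The slide's intermediate results container and the once-per-target-set step are collapsed into a single secondary stage.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Container:
    """In-memory stand-in for a LOB storage container with write events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[str, str] = {}
        self.on_write: List[Callable[[str], None]] = []

    def put(self, key: str, data: str) -> None:
        self.blobs[key] = data
        for trigger in self.on_write:  # plays the "serverless event trigger"
            trigger(key)


initial = Container("initial")
basic = Container("basic-transformation")
destination: Dict[str, str] = {}

initial_queue: Deque[str] = deque()
secondary_queue: Deque[str] = deque()

# Triggers enqueue only the blob key (a small work item), never the data.
initial.on_write.append(initial_queue.append)
basic.on_write.append(secondary_queue.append)


def basic_worker() -> None:
    """Transient worker: basic cleanup, written back to storage."""
    while initial_queue:
        key = initial_queue.popleft()
        rows = [r.strip().lower() for r in initial.blobs[key].splitlines()]
        basic.put(key, "\n".join(r for r in rows if r))


def secondary_worker() -> None:
    """Transient worker: secondary transformation (dedupe and sort), then load."""
    while secondary_queue:
        key = secondary_queue.popleft()
        rows = basic.blobs[key].splitlines()
        destination[key] = "\n".join(sorted(set(rows)))
```

Each stage can scale independently by running more workers against its own queue, and every intermediate result lives in storage, so a failed stage can be rerun from its input container. A real queue would remove an item only after the worker reports success; `popleft` drops it immediately, which is a simplification.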

SUMMARY
- Re-hosting in the Cloud is Probably a Waste of Time and Money (Don’t Tell Your CTO)
- Refactoring in the Cloud Brings a Variety of Benefits
- Building on Highly Scalable Components Yields a Highly Scalable End Result