Building Analytics at Scale with U-SQL and C#
Josh Fennessy, Principal, BlueGranite
[Diagram: Azure data and analytics platform (Data → Intelligence → Action): data sources (apps, sensors and devices) feed information management services (Data Factory, Data Catalog, Event Hubs); land in big data stores (Data Lake Store, SQL Data Warehouse); are processed by machine learning and analytics services (Machine Learning, Data Lake Analytics, HDInsight with Hadoop and Spark, Stream Analytics); surface through intelligence services (Cognitive Services, Bot Framework, Cortana); and drive action through Power BI dashboards and visualizations, apps (web, mobile, bots), and automated systems.]
Azure Data Lake
Store: hyper-scale distributed storage; integrated with Azure Active Directory; no file size or account size limits; compatible with WebHDFS; pay for what you use.
Analytics: clusterless distributed computing platform; based on C# and U-SQL; build complex data processing jobs in Visual Studio; pay per job instead of per hour.
Requirements
Required: Azure subscription; Azure Data Lake Store account.
Recommended: Visual Studio 2015/2017; Azure Data Lake Tools for Visual Studio.
Creating an Account
Setting up Visual Studio
Now what?
Process data for later analysis (the T in ETL/ELT).
Perform actual analysis of data and save the results.
Give structure to unstructured or semi-structured data for later use.
Basic job components
ROWSET: a description of data stored in one or more files, read using an EXTRACTOR.
U-SQL: looks like T-SQL, smells like T-SQL, but used for parallel batch processing of data.
OUTPUT: the results of the transformation written back to storage, using an OUTPUTTER to format the file.
(A minimal example follows.)
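To make these components concrete, here is a minimal sketch of a complete U-SQL job; the input path, schema, and output path are hypothetical.

    // Rowset: describe the data in the file and read it with a built-in extractor
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Duration int
        FROM "/input/SearchLog.tsv"
        USING Extractors.Tsv();

    // U-SQL transformation: T-SQL-like syntax, executed as a parallel batch job
    @totals =
        SELECT Region,
               SUM(Duration) AS TotalDuration
        FROM @searchlog
        GROUP BY Region;

    // OUTPUT: write the result back to storage, formatted by an outputter
    OUTPUT @totals
    TO "/output/TotalDurationByRegion.csv"
    USING Outputters.Csv();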
Demo: Basic Job Structure
Pay per job
ADLA is charged per job; each successful execution incurs charges.
You pay per Analytics Unit (AU) per compute hour, prorated to the minute.
Let's take a look at the previous job that was run.
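As a rough illustration (the dollar figure is an assumption; check current Azure pricing): a job that ran with 10 AUs allocated for 15 minutes consumes 10 × 0.25 = 2.5 AU-hours, so at a hypothetical rate of $2 per AU-hour it would cost about $5, whether or not all 10 AUs were actually kept busy.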
Other job components
U-SQL tables: store data permanently when it is accessed often; they support partitioning, bucketing, and many other big data features used by other distributed processing environments (see the sketch below).
User-defined code: build custom processing tools to extend the out-of-the-box capabilities; all user-defined code is written in C# and deployed to the job via assemblies.
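A minimal sketch of a managed U-SQL table, reusing the hypothetical @searchlog rowset from the earlier example; the table name, columns, and distribution key are illustrative, and a PARTITIONED BY clause can be added for partitioned tables.

    // Create a managed table with a clustered index and hash distribution
    CREATE TABLE IF NOT EXISTS dbo.RegionTotals
    (
        Region string,
        TotalDuration long,
        INDEX cix_RegionTotals CLUSTERED (Region ASC) DISTRIBUTED BY HASH (Region)
    );

    // Load the table from a rowset so later jobs can read it without re-extracting files
    INSERT INTO dbo.RegionTotals
    SELECT Region,
           (long) SUM(Duration) AS TotalDuration
    FROM @searchlog
    GROUP BY Region;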
Demo: Working with Tables
Demo: Multiple Rowsets
External scripts
Execute R or Python scripts from a U-SQL job.
Embed the script inline or store it in a separate file.
Pass data to the script, then return data back to U-SQL and output it directly or use it in further transformations (see the sketch below).
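A minimal sketch of calling an inline R script, assuming the U-SQL R extensions (the ExtR assembly) are installed in the ADLA catalog; the rowset, columns, and script body are hypothetical.

    REFERENCE ASSEMBLY [ExtR];

    // Inline R script: ADLA exposes each partition to R as inputFromUSQL
    // and reads the result back from outputToUSQL
    DECLARE @rScript string = @"
    outputToUSQL <- data.frame(
        Region = inputFromUSQL$Region[1],
        MeanDuration = mean(inputFromUSQL$Duration))
    ";

    @scored =
        REDUCE @searchlog ON Region
        PRODUCE Region string, MeanDuration double
        USING new Extension.R.Reducer(command : @rScript, rReturnType : "dataframe");

    OUTPUT @scored
    TO "/output/MeanDurationByRegion.csv"
    USING Outputters.Csv();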
Demo: Call External R Script
Unstructured data
ADLA can also work with unstructured data: data that has no pre-defined schema, but still has structure.
Cognitive Services can be useful for making sense of unstructured data (see the sketch below).
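A minimal sketch of image tagging with the built-in cognitive extensions, assuming the U-SQL cognitive assemblies are registered in the catalog; the input path is hypothetical, and the exact columns produced can vary by extension version.

    REFERENCE ASSEMBLY ImageCommon;
    REFERENCE ASSEMBLY ImageTagging;

    // Read raw image bytes from storage
    @images =
        EXTRACT FileName string, ImgData byte[]
        FROM "/images/{FileName}.jpg"
        USING new Cognition.Vision.ImageExtractor();

    // Ask the vision tagger to describe each image
    @tags =
        PROCESS @images
        PRODUCE FileName, NumObjects int, Tags string
        READONLY FileName
        USING new Cognition.Vision.ImageTagger();

    OUTPUT @tags
    TO "/output/ImageTags.csv"
    USING Outputters.Csv();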
Demo: Process Images with Cognitive Services
Operationalize
Building a U-SQL job is only part of the process.
Azure Data Factory is the easiest way to schedule and implement U-SQL in production.
Operationalize
PowerShell can also be used to execute a U-SQL job.
When things go wrong
Job failures are bound to happen. Don't worry! There is a process:
1. Browse to Job Management in the Azure Data Lake Analytics portal.
2. Find your failed job and select it.
3. Review errors, inputs, and outputs to locate the root cause and remediate.
Optimizing performance
Performance optimization is a balance between job cost and total execution time.
Allocating more AUs may improve performance, but it may also greatly increase cost.
Controlling your U-SQL code is the most important step in optimizing performance. Ask yourself: is this the most efficient way to do this operation?
AU efficiency is the most important metric for understanding performance versus cost.
Optimizing performance: user-defined objects
Use UDOs sparingly; the optimizer cannot help at all with performance issues related to UDOs.
Consider replacing the logic in your UDO with SELECT … CROSS APPLY, which covers up to 90% of the cases where a UDO seems necessary (see the sketch below).
UDOs used as EXTRACTORs or OUTPUTTERs are usually fine, but avoid them for data transformation.
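For example, instead of writing a custom processor to split a delimited tag string into rows, CROSS APPLY with EXPLODE does the job; a minimal sketch, assuming a hypothetical rowset @tags with a FileName column and a semicolon-delimited Tags string column (like the one produced by the image tagger above):

    // Turn "dog;grass;outdoor" style strings into one row per tag
    // without a user-defined processor
    @tagRows =
        SELECT FileName,
               SingleTag
        FROM @tags
             CROSS APPLY
             EXPLODE (new SQL.ARRAY<string>(Tags.Split(';'))) AS T(SingleTag);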
Optimizing performance: final thoughts
Deeply understand the query lifecycle.
Monitor for data skew in your ADLA tables.
Use partitioning wisely.
Avoid UDOs.
Optimize for the right balance of cost and performance.
Good performance at small scale != good performance at large scale, so do full-scale testing and analysis too!
RECAP
Azure Data Lake Analytics
Designed for ETL/ELT on extremely large data.
Familiar to SQL and/or C# users.
Priced per job, with no charge when idle.
Linked to a Data Lake Store account.
Rowsets describe data stored in files.
Azure Data Lake Analytics
Load distributed tables for data that is referenced frequently or is extremely large.
Integrate with external languages: R or Python.
Cognitive Services are built in, at no extra charge!
Operationalize with Azure Data Factory or PowerShell.
Azure Data Lake Analytics
Performance is subjective, based on the balance between cost and execution time.
Learn the query lifecycle to truly understand performance characteristics.