BR013.


Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
Principal Program Manager, Big Data
@MikeDoesBigData
usql@microsoft.com
http://aka.ms/azuredatalake
11/13/2018 7:31 PM
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Agenda
  Today (BR013): Killer extensibility in Azure Data Lake with U-SQL
    Custom rowset aggregation
    How to do JSON processing
    Image processing
    How to call R from U-SQL
  Yesterday (BR014): Introduction to Azure Data Lake and U-SQL
    What is Azure Data Lake?
    Why U-SQL?
    Core concepts
    Schema on read on file and file sets
    C# extensibility
    SQL with U-SQL
    Script level execution and optimization
    Tool usage

U-SQL extensibility
  Extend U-SQL with C#/.NET
    Built-in operators, functions, aggregates
    C# expressions (in SELECT expressions)
    User-defined functions (UDFs)
    User-defined aggregates (UDAGGs)
    User-defined operators (UDOs)

What are UDOs?
  User-Defined Extractors
  User-Defined Outputters
  User-Defined Processors
    Take one row and produce one row
    Pass-through versus transforming
  User-Defined Appliers
    Take one row and produce 0 to n rows
    Used with OUTER/CROSS APPLY
  User-Defined Combiners
    Combine rowsets (like a user-defined join)
  User-Defined Reducers
    Take n rows and produce m rows (normally m < n)
  Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution):
    EXTRACT
    OUTPUT
    PROCESS
    COMBINE
    REDUCE
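The row-count contracts listed above can be illustrated without the U-SQL runtime. The following is a minimal standalone sketch in plain C# with LINQ, offered only as an analogy: the class `UdoShapes` and its methods are hypothetical names, and real UDOs implement U-SQL interfaces rather than working on `IEnumerable<string>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: these are not U-SQL interfaces.
public static class UdoShapes
{
    // Processor shape: exactly one output row per input row.
    public static IEnumerable<string> Process(IEnumerable<string> rows) =>
        rows.Select(r => r.Trim().ToUpperInvariant());

    // Applier shape: zero to n output rows per input row (compare CROSS APPLY).
    public static IEnumerable<(string Row, string Token)> Apply(IEnumerable<string> rows) =>
        rows.SelectMany(r => r
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => (r, t)));

    // Reducer shape: the n rows of each group produce m rows (here m = 1 per group).
    public static IEnumerable<(string Key, int Count)> Reduce(IEnumerable<string> rows) =>
        rows.GroupBy(r => r).Select(g => (g.Key, g.Count()));

    public static void Main()
    {
        var rows = new[] { "a b", "", "a b" };
        Console.WriteLine(Process(rows).Count()); // same number of rows as the input
        Console.WriteLine(Apply(rows).Count());   // one row per token; the empty row yields none
        Console.WriteLine(Reduce(rows).Count());  // one row per distinct input value
    }
}
```

The point of the sketch is the cardinality of each shape, not the transformation itself: a processor never changes the row count, an applier may expand or drop rows, and a reducer sees a whole group at once.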

How to specify UDOs?
  Code-behind vs. assemblies

Managing Assemblies
  Create assemblies
    CREATE ASSEMBLY db.assembly FROM @path;
    CREATE ASSEMBLY db.assembly FROM byte[];
    Can also include additional resource files
  Reference assemblies
    REFERENCE ASSEMBLY db.assembly;
    Referencing .NET Framework assemblies
      Always accessible system namespaces:
        U-SQL specific (e.g., for SQL.MAP)
        All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
      Add all other .NET Framework assemblies with:
        REFERENCE SYSTEM ASSEMBLY [System.XML];
  Enumerate assemblies
    PowerShell command
    U-SQL Studio Server Explorer
  Drop assemblies
    DROP ASSEMBLY db.assembly;
  Visual Studio makes registration easy!

USING clause
  Allows shortening and disambiguating C# namespace and class names
  Syntax:
    'USING' csharp_namespace | Alias '=' csharp_namespace_or_class.
  Examples:
    DECLARE @input string = "somejsonfile.json";

    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    USING Microsoft.Analytics.Samples.Formats.Json;

    @data0 =
        EXTRACT IPAddresses string
        FROM @input
        USING new JsonExtractor("Devices[*]");

    USING json = [Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];

    @data1 =
        EXTRACT IPAddresses string
        FROM @input
        USING new json("Devices[*]");

Overlapping Range Aggregation
  Explain problem
  https://blogs.msdn.microsoft.com/azuredatalake/2016/06/27/how-do-i-combine-overlapping-ranges-using-u-sql-introducing-u-sql-reducer-udos

Overlapping Range Aggregation
  Explain code
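The transcript does not include the code shown on this slide. As a stand-in, here is a minimal standalone sketch of the usual single-pass technique for combining overlapping ranges over input sorted by start time. It is plain C# with no U-SQL dependency; the class name `RangeMerge`, the tuple field names, and the sample times are assumptions made for illustration, not the presenter's reducer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical standalone sketch; a real reducer UDO would read and write U-SQL rows.
public static class RangeMerge
{
    // Merges overlapping [Begin, End] ranges in one pass over input sorted by Begin.
    // The sort here plays the role that a PRESORT on the begin column would play for a reducer.
    public static List<(DateTime Begin, DateTime End)> Merge(
        IEnumerable<(DateTime Begin, DateTime End)> ranges)
    {
        var result = new List<(DateTime Begin, DateTime End)>();
        bool started = false;
        DateTime begin = default, end = default;

        foreach (var r in ranges.OrderBy(x => x.Begin))
        {
            if (!started)
            {
                // First range opens the current merged range.
                (begin, end) = (r.Begin, r.End);
                started = true;
            }
            else if (r.Begin <= end)
            {
                // Overlaps (or touches) the current range: extend it if needed.
                if (r.End > end) end = r.End;
            }
            else
            {
                // Gap found: emit the current range and start a new one.
                result.Add((begin, end));
                (begin, end) = (r.Begin, r.End);
            }
        }

        if (started) result.Add((begin, end));
        return result;
    }

    public static void Main()
    {
        var day = new DateTime(2016, 6, 27);
        var merged = Merge(new[]
        {
            (day.AddHours(9), day.AddHours(10)),
            (day.AddHours(9.5), day.AddHours(11)),
            (day.AddHours(13), day.AddHours(14)),
        });
        foreach (var (b, e) in merged)
            Console.WriteLine($"{b:HH:mm} - {e:HH:mm}");
    }
}
```

With the sample input, the first two ranges overlap and collapse into a single range from 9:00 to 11:00, while the 13:00 to 14:00 range stays separate. The logic depends on seeing rows in start order and may emit more than one row per group, which is the kind of requirement the deck lists as a good fit for a reducer.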

JSON Processing
  Explain problem

JSON Processing
  Architecture of JSON assemblies
  Flat vs. nested vs. array JSON
  Single-doc vs. multi-doc JSON
  JSON DOM vs. JSONReader processing
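The DOM-versus-reader distinction can be sketched in standalone C#. The deck's samples reference Newtonsoft.Json; to stay self-contained, this sketch swaps in the .NET built-in `System.Text.Json` instead, so it shows the general trade-off rather than the sample assemblies' code. The class `JsonStyles` and the document shape (a `Devices` array of objects with an `IPAddress` property) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical sketch using System.Text.Json, not the Newtonsoft.Json-based sample assemblies.
public static class JsonStyles
{
    // DOM style: parse the whole document into memory, then navigate to Devices[*].IPAddress.
    public static List<string> ViaDom(string json)
    {
        var result = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement device in doc.RootElement.GetProperty("Devices").EnumerateArray())
                result.Add(device.GetProperty("IPAddress").GetString());
        }
        return result;
    }

    // Reader style: one forward-only pass over the tokens, keeping only the values needed.
    // Note: this simple version matches an "IPAddress" property at any depth.
    public static List<string> ViaReader(string json)
    {
        var result = new List<string>();
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName &&
                reader.ValueTextEquals("IPAddress"))
            {
                reader.Read(); // advance from the property name to its value
                result.Add(reader.GetString());
            }
        }
        return result;
    }

    public static void Main()
    {
        const string json =
            "{\"Devices\":[{\"IPAddress\":\"10.0.0.1\"},{\"IPAddress\":\"10.0.0.2\"}]}";
        Console.WriteLine(string.Join(",", ViaDom(json)));
        Console.WriteLine(string.Join(",", ViaReader(json)));
    }
}
```

The DOM version is easier to write and supports path-style navigation, but it holds the whole document in memory. The reader version keeps memory use low for large documents at the cost of tracking position in the token stream by hand.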

Image Processing
  Explain problem

Image Processing
  Architecture of image processing assemblies
  Memory limits
  Memory pressures: UDFs vs. processor vs. extractor

R Processing
  Explain problem

U-SQL Processing with R
  Architecture of R processing assemblies (and similar extensions: Python/JVM)
  Interop challenges
  No external access from UDOs
  Future work: more generic samples, more automatic experiences

Summary of U-SQL UDOs

UDO model
  Marking UDOs
  Parameterizing UDOs
  UDO signature
  UDO-specific processing pattern
  Rowsets and their schemas in UDOs
  Setting results
    By position
    By name

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using Microsoft.Analytics.Interfaces;

    // Marking the UDO: the attribute identifies the class as a user-defined extractor
    [SqlUserDefinedExtractor]
    public class DriverExtractor : IExtractor
    {
        private byte[] _row_delim;
        private string _col_delim;
        private Encoding _encoding;

        // Define a non-default constructor since I want to pass in my own parameters
        public DriverExtractor(string row_delim = "\r\n", string col_delim = ",", Encoding encoding = null)
        {
            _encoding = encoding == null ? Encoding.UTF8 : encoding;
            _row_delim = _encoding.GetBytes(row_delim);
            _col_delim = col_delim;
        } // DriverExtractor

        // Converting text to target schema
        private void OutputValueAtCol_I(string c, int i, IUpdatableRow outputrow)
        {
            var schema = outputrow.Schema;
            if (schema[i].Type == typeof(int))
            {
                var tmp = Convert.ToInt32(c);
                outputrow.Set(i, tmp); // setting the result by position
            }
            ...
        } // OutputValueAtCol_I

        // UDO signature: unstructured input in, one read-only row out per input row
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
        {
            foreach (var row in input.Split(_row_delim))
            {
                using (var s = new StreamReader(row, _encoding))
                {
                    int i = 0;
                    foreach (var c in s.ReadToEnd().Split(new[] { _col_delim }, StringSplitOptions.None))
                    {
                        OutputValueAtCol_I(c, i++, outputrow);
                    } // foreach
                } // using
                yield return outputrow.AsReadOnly();
            } // foreach
        } // Extract
    } // class DriverExtractor

UDO Tips and Warnings
  Tips when using UDOs:
    READONLY clause to allow pushing predicates through UDOs
    REQUIRED clause to allow column pruning through UDOs
    PRESORT on REDUCE if you need global order
    Hint cardinality if the optimizer chooses the wrong plan
  Warnings and better alternatives:
    Use SELECT with UDFs instead of PROCESS
    Use user-defined aggregators instead of REDUCE
    Learn to use windowing functions (OVER expression)
  Good use cases for PROCESS/REDUCE/COMBINE:
    The logic needs to dynamically access the input and/or output schema, e.g., create a JSON doc for the data in the row where the columns are not known a priori.
    Your UDF-based solution creates too much memory pressure and you can write your code more memory-efficiently in a UDO.
    You need an ordered aggregator or need to produce more than one row per group.

Additional Resources
  Blogs and community page:
    http://usql.io (U-SQL GitHub)
    http://blogs.msdn.microsoft.com/azuredatalake/
    http://blogs.msdn.microsoft.com/mrys/
    https://channel9.msdn.com/Search?term=U-SQL#ch9Search
  Documentation and articles:
    http://aka.ms/usql_reference
    https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
    https://msdn.microsoft.com/en-us/magazine/mt614251
  ADL forums and feedback:
    http://aka.ms/adlfeedback
    https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
    http://stackoverflow.com/questions/tagged/u-sql
