1
BR013
2
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys, Principal Program Manager, Big
11/13/2018 7:31 PM
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
3
Agenda
Today (BR013): Killer extensibility in Azure Data Lake with U-SQL
- Custom rowset aggregation
- How to do JSON processing
- Image processing
- How to call R from U-SQL
Yesterday (BR014): Introduction to Azure Data Lake and U-SQL
- What is Azure Data Lake?
- Why U-SQL?
- Core concepts
- Schema on read on file and file sets
- C# extensibility
- SQL with U-SQL
- Script level execution and optimization
- Tool usage
4
U-SQL extensibility
Extend U-SQL with C#/.NET
- Built-in operators, functions, aggregates
- C# expressions (in SELECT expressions)
- User-defined functions (UDFs)
- User-defined aggregates (UDAGGs)
- User-defined operators (UDOs)
5
What are UDOs?
- User-Defined Extractors
- User-Defined Outputters
- User-Defined Processors
  - Take one row and produce one row
  - Pass-through versus transforming
- User-Defined Appliers
  - Take one row and produce 0 to n rows
  - Used with OUTER/CROSS APPLY
- User-Defined Combiners
  - Combine rowsets (like a user-defined join)
- User-Defined Reducers
  - Take n rows and produce m rows (normally m < n)
Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution): EXTRACT, OUTPUT, PROCESS, COMBINE, REDUCE
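To make the "one row in, one row out" contract of a processor concrete, here is a minimal sketch. It assumes the U-SQL SDK assembly that provides Microsoft.Analytics.Interfaces is referenced, so it compiles only in a U-SQL project. The class name UpperCaseProcessor and the column name "name" are hypothetical and do not come from the slides.

```csharp
using Microsoft.Analytics.Interfaces;

// Hypothetical transforming processor: upper-cases a string column called "name".
[SqlUserDefinedProcessor]
public class UpperCaseProcessor : IProcessor
{
    // Called once per input row; returns exactly one output row.
    public override IRow Process(IRow input, IUpdatableRow output)
    {
        // Read by column name from the input row (assumed column: "name").
        string name = input.Get<string>("name");

        // Write the transformed value into the output row, also by name.
        output.Set<string>("name", name == null ? null : name.ToUpperInvariant());

        // Hand back a read-only snapshot of the row that was just filled in.
        return output.AsReadOnly();
    }
}
```

An instance of such a class would be passed to a PROCESS expression through its USING clause, in the same way the extractor examples in this deck pass an instance to EXTRACT.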
6
How to specify UDOs? Code behind vs Assemblies
7
Managing Assemblies
Create assemblies
- CREATE ASSEMBLY db.assembly FROM @path;
- CREATE ASSEMBLY db.assembly FROM byte[];
- Can also include additional resource files
Reference assemblies
- REFERENCE ASSEMBLY db.assembly;
- Referencing .NET Framework assemblies
  - Always accessible system namespaces:
    - U-SQL specific (e.g., for SQL.MAP)
    - All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
  - Add all other .NET Framework assemblies with: REFERENCE SYSTEM ASSEMBLY [System.XML];
Enumerate assemblies
- PowerShell command
- U-SQL Studio Server Explorer
Drop assemblies
- DROP ASSEMBLY db.assembly;
Visual Studio makes registration easy!
8
USING clause
'USING' csharp_namespace | Alias '=' csharp_namespace_or_class.
Allows shortening and disambiguating C# namespace and class names
Examples:
    DECLARE @input string = "somejsonfile.json";
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    USING Microsoft.Analytics.Samples.Formats.Json;
    @data0 =
        EXTRACT IPAddresses string
        FROM @input
        USING new JsonExtractor("Devices[*]");

    USING json = [Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];
    @data1 =
        EXTRACT IPAddresses string
        FROM @input
        USING new json("Devices[*]");
9
Overlapping Range Aggregation
Explain problem
10
Overlapping Range Aggregation
Explain Code
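The slides do not spell out the algorithm, so the following is only a sketch of one common way to aggregate overlapping ranges: sort by start and fold each range into the previous one when they overlap. The names RangeMerger and Merge are hypothetical. In U-SQL, logic like this would plausibly sit inside a custom aggregation such as a reducer that receives its rows sorted by range start, but that placement is an assumption and not taken from the deck.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeMerger
{
    // Merges overlapping (or touching) ranges and returns them ordered by Begin.
    public static List<(DateTime Begin, DateTime End)> Merge(
        IEnumerable<(DateTime Begin, DateTime End)> ranges)
    {
        var merged = new List<(DateTime Begin, DateTime End)>();
        foreach (var r in ranges.OrderBy(x => x.Begin))
        {
            int lastIndex = merged.Count - 1;
            if (lastIndex >= 0 && r.Begin <= merged[lastIndex].End)
            {
                // Overlaps the previous merged range: extend its end if needed.
                var last = merged[lastIndex];
                if (r.End > last.End)
                {
                    merged[lastIndex] = (last.Begin, r.End);
                }
            }
            else
            {
                // No overlap: start a new merged range.
                merged.Add(r);
            }
        }
        return merged;
    }
}

public static class Program
{
    public static void Main()
    {
        var day = new DateTime(2018, 1, 1);
        var input = new[]
        {
            (day.AddHours(9), day.AddHours(10)),
            (day.AddHours(9.5), day.AddHours(11)),
            (day.AddHours(13), day.AddHours(14)),
        };

        foreach (var r in RangeMerger.Merge(input))
        {
            Console.WriteLine($"{r.Begin:o} .. {r.End:o}");
        }
    }
}
```

With this input the first two ranges collapse into a single range from 09:00 to 11:00, and the 13:00 to 14:00 range stays separate, so two ranges are returned. Summing End minus Begin over the merged ranges then gives a total duration that does not double count the overlap.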
11
JSON Processing Explain problem
12
JSON Processing
- Architecture of JSON assemblies
- Flat vs nested vs array JSON
- Single doc vs multidoc JSON
- JSON DOM vs JSONReader processing
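The DOM versus reader trade-off can be illustrated with Newtonsoft.Json, the library the deck references. This is a sketch that assumes the Newtonsoft.Json package is available. The class JsonStyles, the sample document, and the property name "IPAddress" are hypothetical. The DOM version loads the whole document and queries it with a path; the reader version walks tokens one at a time and never materializes the full tree.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonStyles
{
    // DOM style: parse the entire document into memory, then select with a JSONPath.
    public static List<string> AddressesViaDom(string json)
    {
        JObject doc = JObject.Parse(json);
        return doc.SelectTokens("Devices[*].IPAddress")
                  .Select(t => (string)t)
                  .ToList();
    }

    // Reader style: stream through tokens and pick up values as they go by.
    public static List<string> AddressesViaReader(string json)
    {
        var result = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName
                    && (string)reader.Value == "IPAddress")
                {
                    // Advance to the property's value and read it as a string.
                    result.Add(reader.ReadAsString());
                }
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample document with a top-level "Devices" array.
        string json = @"{ ""Devices"": [ { ""IPAddress"": ""10.0.0.1"" }, { ""IPAddress"": ""10.0.0.2"" } ] }";

        Console.WriteLine(string.Join(",", JsonStyles.AddressesViaDom(json)));
        Console.WriteLine(string.Join(",", JsonStyles.AddressesViaReader(json)));
    }
}
```

Both methods return the same two addresses for this sample. The DOM version is shorter and easier to query, while the reader version keeps memory use roughly constant regardless of document size, which matters when a single document is large.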
13
Image Processing Explain problem
14
Image Processing
- Architecture of image processing assemblies
- Memory limits
- Memory pressures: UDFs vs Processor vs Extractor
15
R Processing Explain problem
16
U-SQL Processing with R
- Architecture of R processing assemblies (and similar extensions: Python/JVM)
- Interop challenges
- No external access from UDOs
- Future work: more generic samples, more automatic experiences
17
Summary of U-SQL UDOs
18
UDO model
- Marking UDOs
- Parameterizing UDOs
- UDO signature
- UDO-specific processing pattern
- Rowsets and their schemas in UDOs
- Setting results
  - By position
  - By name

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using Microsoft.Analytics.Interfaces;

    // Marking the UDO: the attribute tells U-SQL that this class is an extractor
    [SqlUserDefinedExtractor]
    public class DriverExtractor : IExtractor
    {
        private byte[] _row_delim;
        private string _col_delim;
        private Encoding _encoding;

        // Define a non-default constructor since I want to pass in my own parameters
        public DriverExtractor(string row_delim = "\r\n", string col_delim = ",", Encoding encoding = null)
        {
            _encoding = encoding == null ? Encoding.UTF8 : encoding;
            _row_delim = _encoding.GetBytes(row_delim);
            _col_delim = col_delim;
        } // DriverExtractor

        // Converting text to target schema
        private void OutputValueAtCol_I(string c, int i, IUpdatableRow outputrow)
        {
            var schema = outputrow.Schema;
            if (schema[i].Type == typeof(int))
            {
                var tmp = Convert.ToInt32(c);
                outputrow.Set(i, tmp); // setting the result by position
            }
            ...
        } // OutputValueAtCol_I

        // UDO signature: one unstructured input, one updatable output row
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
        {
            foreach (var row in input.Split(_row_delim))
            {
                using (var s = new StreamReader(row, _encoding))
                {
                    int i = 0;
                    foreach (var c in s.ReadToEnd().Split(new[] { _col_delim }, StringSplitOptions.None))
                    {
                        OutputValueAtCol_I(c, i++, outputrow);
                    } // foreach
                } // using
                yield return outputrow.AsReadOnly();
            } // foreach
        } // Extract
    } // class DriverExtractor
19
UDO Tips and Warnings
Tips when using UDOs:
- READONLY clause to allow pushing predicates through UDOs
- REQUIRED clause to allow column pruning through UDOs
- PRESORT on REDUCE if you need global order
- Hint cardinality if the optimizer chooses the wrong plan
Warnings and better alternatives:
- Use SELECT with UDFs instead of PROCESS
- Use user-defined aggregators instead of REDUCE
- Learn to use windowing functions (OVER expression)
Good use cases for PROCESS/REDUCE/COMBINE:
- The logic needs to dynamically access the input and/or output schema, e.g., to create a JSON document for the data in the row when the columns are not known a priori.
- Your UDF-based solution creates too much memory pressure, and you can write the code more memory-efficiently in a UDO.
- You need an ordered aggregator, or need to produce more than one row per group.
20
Additional Resources
- Blogs and community page:
- (U-SQL GitHub)
- Documentation and articles:
- ADL forums and feedback