BR013.


1 BR013

2 Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys, Principal Program Manager Big
11/13/2018 7:31 PM
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

3 Agenda
Today (BR013): Killer extensibility in Azure Data Lake with U-SQL
  Custom rowset aggregation
  How to do JSON processing
  Image processing
  How to call R from U-SQL
Yesterday (BR014): Introduction to Azure Data Lake and U-SQL
  What is Azure Data Lake?
  Why U-SQL?
  Core concepts
  Schema on read on file and file sets
  C# extensibility
  SQL with U-SQL
  Script level execution and optimization
  Tool usage

4 U-SQL extensibility
Extend U-SQL with C#/.NET:
  Built-in operators, functions, aggregates
  C# expressions (in SELECT expressions)
  User-defined functions (UDFs)
  User-defined aggregates (UDAGGs)
  User-defined operators (UDOs)

5 What are UDOs?
User-Defined Extractors
User-Defined Outputters
User-Defined Processors
  Take one row and produce one row
  Pass-through versus transforming
User-Defined Appliers
  Take one row and produce 0 to n rows
  Used with OUTER/CROSS APPLY
User-Defined Combiners
  Combine rowsets (like a user-defined join)
User-Defined Reducers
  Take n rows and produce m rows (normally m<n)
Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution):
  EXTRACT
  OUTPUT
  PROCESS
  COMBINE
  REDUCE

6 How to specify UDOs? Code behind vs Assemblies

7 Managing Assemblies
Create assemblies
  CREATE ASSEMBLY db.assembly FROM @path;
  CREATE ASSEMBLY db.assembly FROM byte[];
  Can also include additional resource files
Reference assemblies
  REFERENCE ASSEMBLY db.assembly;
  Referencing .NET Framework assemblies
    Always accessible system namespaces:
      U-SQL specific (e.g., for SQL.MAP)
      All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
    Add all other .NET Framework assemblies with:
      REFERENCE SYSTEM ASSEMBLY [System.XML];
Enumerate assemblies
  PowerShell command
  U-SQL Studio Server Explorer
Drop assemblies
  DROP ASSEMBLY db.assembly;
Visual Studio makes registration easy!

8 USING clause
Syntax: 'USING' csharp_namespace | Alias '=' csharp_namespace_or_class.
Allows shortening and disambiguating C# namespace and class names.
Examples:
  DECLARE @input string = "somejsonfile.json";
  REFERENCE ASSEMBLY [Newtonsoft.Json];
  REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

  USING Microsoft.Analytics.Samples.Formats.Json;
  @data0 = EXTRACT IPAddresses string FROM @input USING new JsonExtractor("Devices[*]");

  USING json = [Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];
  @data1 = EXTRACT IPAddresses string FROM @input USING new json("Devices[*]");

9 Overlapping Range Aggregation
Explain problem

10 Overlapping Range Aggregation
Explain Code
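
The slides leave the problem statement to the talk itself. One common reading of overlapping range aggregation, assumed here, is collapsing time ranges that overlap into single covering ranges, which a plain GROUP BY aggregate cannot express because each output range depends on the order of its inputs. The following is a minimal plain C# sketch of that merge step only; `RangeMerger` and `Merge` are hypothetical names, and the U-SQL reducer wrapper is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeMerger
{
    // Merges overlapping [Begin, End] ranges into the smallest set of
    // non-overlapping ranges. Ranges that merely touch (End == next Begin)
    // are treated as overlapping; that is an assumption of this sketch.
    public static List<(DateTime Begin, DateTime End)> Merge(
        IEnumerable<(DateTime Begin, DateTime End)> ranges)
    {
        var result = new List<(DateTime Begin, DateTime End)>();
        foreach (var r in ranges.OrderBy(x => x.Begin))
        {
            if (result.Count > 0 && r.Begin <= result[result.Count - 1].End)
            {
                // Overlaps the last merged range: extend its end if needed.
                var last = result[result.Count - 1];
                if (r.End > last.End)
                {
                    result[result.Count - 1] = (last.Begin, r.End);
                }
            }
            else
            {
                // Starts after the last merged range ends: open a new range.
                result.Add(r);
            }
        }
        return result;
    }
}
```

In a U-SQL script this logic would presumably run inside a reducer, once per group (for example per user), with the rows presorted by range start so that the in-memory sort becomes unnecessary.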

11 JSON Processing Explain problem

12 JSON Processing
Architecture of JSON assemblies
Flat vs nested vs array JSON
Single doc vs multidoc JSON
JSON DOM vs JSONReader processing
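
To make the DOM versus reader distinction concrete, here is a small plain C# sketch that pulls the same field out of a document both ways. It uses System.Text.Json rather than the Newtonsoft.Json library the sample assemblies reference, purely to stay self-contained; the class name `JsonStyles`, its methods, and the `IPAddress` field in the usage note are illustrative assumptions, not part of the U-SQL samples.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class JsonStyles
{
    // DOM style: parse the whole document into memory, then navigate it.
    public static List<string> ReadWithDom(string json, string arrayName, string field)
    {
        var values = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.GetProperty(arrayName).EnumerateArray())
            {
                values.Add(item.GetProperty(field).GetString());
            }
        }
        return values;
    }

    // Reader style: scan tokens forward-only and keep only what is needed.
    // This matches the field name at any nesting depth and assumes its
    // value is a JSON string.
    public static List<string> ReadWithReader(string json, string field)
    {
        var values = new List<string>();
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals(field))
            {
                reader.Read(); // advance from the property name to its value
                values.Add(reader.GetString());
            }
        }
        return values;
    }
}
```

For a document such as `{"Devices":[{"IPAddress":"10.0.0.1"}]}`, both `ReadWithDom(json, "Devices", "IPAddress")` and `ReadWithReader(json, "IPAddress")` return the single address. The DOM version is easier to navigate, while the reader version avoids materializing the full tree, which matters under the memory limits that apply to UDOs.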

13 Image Processing Explain problem

14 Image Processing
Architecture of image processing assemblies
Memory limits
Memory pressures: UDFs vs Processor vs Extractor

15 R Processing Explain problem

16 U-SQL Processing with R
Architecture of R processing assemblies (and similar extensions: Python/JVM)
Interop challenges
No external access from UDOs
Future work: more generic samples, more automatic experiences

17 Summary of U-SQL UDOs

18 UDO model
Marking UDOs
Parameterizing UDOs
UDO signature
UDO-specific processing pattern
Rowsets and their schemas in UDOs
Setting results
  By position
  By name

    [SqlUserDefinedExtractor]
    public class DriverExtractor : IExtractor
    {
        private byte[] _row_delim;
        private string _col_delim;
        private Encoding _encoding;

        // Define a non-default constructor since I want to pass in my own parameters
        public DriverExtractor(string row_delim = "\r\n", string col_delim = ",", Encoding encoding = null)
        {
            _encoding = encoding == null ? Encoding.UTF8 : encoding;
            _row_delim = _encoding.GetBytes(row_delim);
            _col_delim = col_delim;
        } // DriverExtractor

        // Converting text to target schema
        private void OutputValueAtCol_I(string c, int i, IUpdatableRow outputrow)
        {
            var schema = outputrow.Schema;
            if (schema[i].Type == typeof(int))
            {
                var tmp = Convert.ToInt32(c);
                outputrow.Set(i, tmp);
            }
            ...
        } // OutputValueAtCol_I

        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
        {
            foreach (var row in input.Split(_row_delim))
            {
                using (var s = new StreamReader(row, _encoding))
                {
                    int i = 0;
                    foreach (var c in s.ReadToEnd().Split(new[] { _col_delim }, StringSplitOptions.None))
                    {
                        OutputValueAtCol_I(c, i++, outputrow);
                    } // foreach column
                } // using
                yield return outputrow.AsReadOnly();
            } // foreach row
        } // Extract
    } // class DriverExtractor
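
The extractor above depends on the U-SQL runtime interfaces, so it cannot be run on its own. As a rough illustration of the same processing pattern (split into rows, split into columns, convert each column to the type the schema declares at that position), here is a standalone C# sketch. `DelimitedParser` is a hypothetical name, `object[]` stands in for the U-SQL row type, and only `int` and `string` columns are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DelimitedParser
{
    // Splits text into rows and columns and converts each column to the
    // type declared by the schema at that position. Rows are produced
    // lazily, one at a time, as in the extractor's yield-return loop.
    public static IEnumerable<object[]> Parse(
        string text, Type[] schema, string rowDelim = "\r\n", string colDelim = ",")
    {
        foreach (var line in text.Split(new[] { rowDelim }, StringSplitOptions.RemoveEmptyEntries))
        {
            var cols = line.Split(new[] { colDelim }, StringSplitOptions.None);
            if (cols.Length != schema.Length)
            {
                throw new FormatException(
                    $"Expected {schema.Length} columns but found {cols.Length}.");
            }

            var row = new object[schema.Length];
            for (int i = 0; i < cols.Length; i++)
            {
                // Set the result by position, converting where the schema asks for int.
                row[i] = schema[i] == typeof(int)
                    ? (object)int.Parse(cols[i], CultureInfo.InvariantCulture)
                    : cols[i];
            }
            yield return row;
        }
    }
}
```

Calling `DelimitedParser.Parse("1,Ann\r\n2,Bob", new[] { typeof(int), typeof(string) })` yields two rows whose first column is an `int` and whose second is a `string`. Unlike the extractor, this sketch allocates a new array per row instead of reusing one updatable row object.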

19 UDO Tips and Warnings
Tips when using UDOs:
  READONLY clause to allow pushing predicates through UDOs
  REQUIRED clause to allow column pruning through UDOs
  PRESORT on REDUCE if you need global order
  Hint cardinality if the optimizer chooses the wrong plan
Warnings and better alternatives:
  Use SELECT with UDFs instead of PROCESS
  Use user-defined aggregators instead of REDUCE
  Learn to use windowing functions (OVER expression)
Good use cases for PROCESS/REDUCE/COMBINE:
  The logic needs to dynamically access the input and/or output schema, e.g., to create a JSON doc for the data in the row where the columns are not known a priori.
  Your UDF-based solution creates too much memory pressure and you can write your code more memory-efficiently in a UDO.
  You need an ordered aggregator or need to produce more than 1 row per group.

20 Additional Resources
Blogs and community page: (U-SQL Github)
Documentation and articles:
ADL forums and feedback


