Bring Big Data to the masses with U-SQL

Presentation transcript:

Bring Big Data to the masses with U-SQL
BRK3185, Microsoft 2016, 6/20/2018 6:58 AM
Michael Rys, Principal Program Manager, Big Data Team
@MikeDoesBigData, usql@microsoft.com
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

The Data Lake approach
Ingest all data: regardless of requirements
Store all data: in native format without schema definition
Do analysis: using analytic engines like Hadoop and ADLA
[Diagram labels. Sources: devices, social, LOB applications, video, sensors, web, relational, clickstream. Workloads: batch queries, interactive queries, real-time analytics, machine learning, data warehouse.]

Introducing Azure Data Lake: Big Data Made Easy
Analytics on any data, any size
All users productive on day one
Ready for your enterprise

Azure Data Lake (Store, HDInsight, Analytics)
[Architecture diagram; labels: ADL Analytics, ADL HDInsight, YARN, Hive, U-SQL, Storage, WebHDFS, Store.]

Teaser: Azure Data Lake and U-SQL at Scale
SMSG Readiness, 6/20/2018
© 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Session Objectives And Takeaways
Tech Ready 15, 6/20/2018
Session Objective(s):
  Introduce U-SQL: The Why and What
  Show the philosophy of U-SQL
  Demonstrate the power, scale and simplicity of U-SQL
Key Takeaways:
  You understand why U-SQL is the best language for Big Data Processing
  You understand how U-SQL scripts process data in a highly scalable way
  You can use U-SQL to process unstructured and structured data
  You can use U-SQL’s C# integration to extend your big data processing with custom code
  You can explain some main differences between U-SQL and T-SQL
  You know what data sources can be joined in U-SQL
  You can use Visual Studio’s ADL tooling to explore and analyze highly scaled out U-SQL jobs

Characteristics of Big Data Analytics
Some sample use cases:
  Digital Crime Unit: Analyze complex attack patterns to understand BotNets and to predict and mitigate future attacks by analyzing log records with complex custom algorithms
  Image Processing: Large-scale image feature extraction and classification using custom code
  Shopping Recommendation: Complex pattern analysis and prediction over shopping records using proprietary algorithms
Characteristics of Big Data Analytics:
  Requires processing of any type of data
  Allows use of custom algorithms
  Scales to any size and is efficient

Status Quo: SQL for Big Data
Declarativity does scaling and parallelization for you
Extensibility is bolted on and not “native”:
  Hard to work with anything other than structured data
  Difficult to extend with custom code

Status Quo: Programming Languages for Big Data
Extensibility through custom code is “native”
Declarativity is bolted on and not “native”:
  User often has to care about scale and performance
  SQL is 2nd class within string
  Often no code reuse/sharing across queries

Why U-SQL?
Declarativity and Extensibility are equally native to the language! Get benefits of both!
Makes it easy for you by unifying:
  Unstructured and structured data processing
  Declarative SQL and custom imperative Code (C#)
  Local and remote Queries
Increase productivity and agility from Day 1 and at Day 100 for YOU!

The origins of U-SQL
SCOPE (Microsoft’s internal Big Data language):
  SQL and C# integration model
  Optimization and Scaling model
  Runs 100’000s of jobs daily
Hive:
  Complex data types (Maps, Arrays)
  Data format alignment for text files
T-SQL/ANSI SQL:
  Many of the SQL capabilities (windowing functions, meta data model etc.)

Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits:
  Avoid moving large amounts of data across the network between stores
  Single view of data irrespective of physical location
  Minimize data proliferation issues caused by maintaining multiple copies
  Single query language for all data
  Each data store maintains its own sovereignty
  Design choices based on the need
Push SQL expressions to remote SQL sources:
  Projections
  Filters
  Joins
[Diagram labels: a U-SQL query in Azure Data Lake Analytics; Query/Write arrows to Azure Data Lake Storage and Azure Storage Blobs; Query arrows to Azure SQL in VMs, Azure SQL DB, and Azure SQL Data Warehouse.]

Show me U-SQL!
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis

“Unstructured” Files
EXTRACT Expression:
    @s = EXTRACT a string, b int
         FROM "filepath/file.csv"
         USING Extractors.Csv(encoding: Encoding.Unicode);
  Built-in Extractors: Csv, Tsv, Text with lots of options
  Custom Extractors: e.g., JSON, XML, etc.
OUTPUT Expression:
    OUTPUT @s
    TO "filepath/file.csv"
    USING Outputters.Csv();
  Built-in Outputters: Csv, Tsv, Text
  Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)
Filepath URIs:
  Relative URI to default ADL Storage account: "filepath/file.csv"
  Absolute URIs:
    ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
    WASB: "wasb://container@account/filepath/file.csv"
Summary: schema on read; write to file; built-in and custom Extractors and Outputters; ADL Storage and Azure Blob Storage
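
The schema-on-read idea behind EXTRACT can be illustrated outside U-SQL. The following is a minimal Python sketch, not U-SQL and not a description of how the built-in extractors work: the file carries no schema, and the reader supplies column names and types at query time. The names `SCHEMA` and `extract_csv` are hypothetical and chosen to mirror the slide's `a string, b int` example.

```python
import csv
import io

# Hypothetical schema mirroring the slide's "a string, b int".
SCHEMA = [("a", str), ("b", int)]

def extract_csv(text, schema):
    """Apply a caller-supplied schema to raw CSV text while reading it."""
    for record in csv.reader(io.StringIO(text)):
        if len(record) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(record)}")
        # Cast each raw string field to the type the schema asks for.
        yield {name: cast(value) for (name, cast), value in zip(schema, record)}

rows = list(extract_csv("x,1\ny,2\n", SCHEMA))
```

The same text could be read with a different schema by another query, which is the point of leaving the stored file untyped.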

U-SQL extensibility
Extend U-SQL with C#/.NET:
  Built-in operators, functions, aggregates
  C# expressions (in SELECT expressions)
  User-defined functions (UDFs)
  User-defined aggregates (UDAGGs)
  User-defined operators (UDOs)

Extending U-SQL!
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis

Managing Assemblies
Create assemblies:
    CREATE ASSEMBLY db.assembly FROM @path;
    CREATE ASSEMBLY db.assembly FROM byte[];
  Can also include additional resource files
Reference assemblies:
    REFERENCE ASSEMBLY db.assembly;
  Referencing .NET Framework assemblies:
    Always accessible system namespaces:
      U-SQL specific (e.g., for SQL.MAP)
      All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
    Add all other .NET Framework assemblies with:
        REFERENCE SYSTEM ASSEMBLY [System.XML];
Enumerate assemblies:
  PowerShell command
  U-SQL Studio Server Explorer
Drop assemblies:
    DROP ASSEMBLY db.assembly;
Visual Studio makes registration easy!

USING clause
Allows shortening and disambiguating C# namespace and class names
Syntax:
    'USING' csharp_namespace | Alias '=' csharp_namespace_or_class.
Examples:
    DECLARE @input string = "somejsonfile.json";
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
    USING Microsoft.Analytics.Samples.Formats.Json;
    @data0 = EXTRACT IPAddresses string FROM @input USING new JsonExtractor("Devices[*]");
    USING json = [Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];
    @data1 = … USING new json("Devices[*]");

Show me File Sets!
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis

File Sets
Simple patterns:
  Simple pattern language on filename and path
    DECLARE @pattern string = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
  Binds two columns, date and suffix
  Wildcards the filename
  Limits on number of files (Current limit 800 and 3000 being increased in next refresh)
Virtual columns:
    EXTRACT name string
          , suffix string // virtual column
          , date DateTime // virtual column
    FROM @pattern
    USING Extractors.Csv();
  Refer to virtual columns in query predicates to get partition elimination (otherwise you will get a warning)
Only on EXTRACT for now (on OUTPUT by end of year)
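
To make the virtual-column idea concrete, here is a small Python sketch, offered as an illustration under assumptions rather than as U-SQL's actual matching algorithm. It hand-translates the slide's pattern into a regular expression, binds `date` and `suffix` from each path, and then filters on those bound values before any file would be opened, which is the effect partition elimination aims for. The function name, the sample paths, and the exact wildcard behaviour are hypothetical.

```python
import re
from datetime import date

# Hand-written regex for "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}".
# How U-SQL itself treats the {*} wildcard is an assumption here.
FILE_SET = re.compile(
    r"^/input/(?P<yyyy>\d{4})/(?P<MM>\d{2})/(?P<dd>\d{2})/[^/]*\.(?P<suffix>[^/.]+)$"
)

def bind_virtual_columns(path):
    """Return the virtual column values for a path, or None if it does not match."""
    m = FILE_SET.match(path)
    if m is None:
        return None
    return {
        "date": date(int(m["yyyy"]), int(m["MM"]), int(m["dd"])),
        "suffix": m["suffix"],
    }

paths = [
    "/input/2016/08/31/tweets.csv",
    "/input/2016/09/26/tweets.csv",
    "/input/2016/09/27/log.tsv",
    "/other/2016/09/26/tweets.csv",
]

# A predicate on the virtual columns decides which files need to be read at all.
selected = [
    p for p in paths
    if (cols := bind_virtual_columns(p)) is not None
    and cols["date"] >= date(2016, 9, 1)
    and cols["suffix"] == "csv"
]
```

Only one of the four sample paths survives the predicate, so only that file would have to be scanned.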

Create shareable data and code
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis

Meta Data Object Model
[Diagram of the catalog object hierarchy. Labels: ADLA Account/Catalog; Database; Schema; Credentials; Data Source; C# Assemblies; C# Extractors, C# Reducers, C# Processors, C# Fns, C# UDTs, C# UDAgg, C# Applier, C# Combiners, C# Outputters; external tables, tables, views, TVFs, Procedures, Table Types, Statistics, Clustered Index, partitions. Multiplicity labels [1,n] and [0,n] appear on the edges. Legend: abstract objects, user objects, MD name, C# name; relations: contains, refers to, implemented and named by.]

U-SQL Catalog
Naming:
  Default Database and Schema context: master.dbo
  Quote identifiers with []: [my table]
  Stores data in ADL Storage /catalog folder
Discovery:
  Visual Studio Server Explorer
  Azure Data Lake Analytics Portal
  SDKs and Azure PowerShell commands
Sharing:
  Within an Azure Data Lake Analytics account
Securing:
  Secured with AAD principals at catalog and Database level

VIEWs and TVFs
Views (for simple cases):
    CREATE VIEW V AS EXTRACT…
    CREATE VIEW V AS SELECT …
  Cannot contain user-defined objects (e.g. UDF or UDOs)!
  Will be inlined
Table-Valued Functions (TVFs) (for parameterization and most cases):
    CREATE FUNCTION F (@arg string = "default")
    RETURNS @res [TABLE ( … )]
    AS BEGIN … @res = … END;
  Provides parameterization
  One or more results
  Can contain multiple statements
  Can contain user-code (needs assembly reference)
  Will always be inlined
  Infers schema or checks against specified return schema

Procedures
Allows encapsulation of U-SQL scripts
    CREATE PROCEDURE P (@arg string = "default") AS
    BEGIN
      …;
      OUTPUT @res TO …;
      INSERT INTO T …;
    END;
  Provides parameterization
  No result but writes into file or table
  Can contain multiple statements
  Can contain user-code (needs assembly reference)
  Will always be inlined
  Can contain DDL (but no CREATE, DROP FUNCTION/PROCEDURE)

Tables
CREATE TABLE:
    CREATE TABLE T (col1 int
                  , col2 string
                  , col3 SQL.MAP<string,string>
                  , INDEX idx CLUSTERED (col2 ASC)
                    PARTITION BY (col1)
                    DISTRIBUTED BY HASH (driver_id)
    );
  Structured Data, built-in Data types only (no UDTs)
  Clustered Index (needs to be specified): row-oriented
  Fine-grained distribution (needs to be specified): HASH, DIRECT HASH, RANGE, ROUND ROBIN
  Addressable Partitions (optional)
CREATE TABLE AS SELECT:
    CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
    CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…;
    CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
  Infer the schema from the query
  Still requires index and distribution (does not support partitioning)

When to use Tables
Benefits of Table clustering and distribution:
  Faster lookup of data provided by distribution and clustering when right distribution/cluster is chosen
  Data distribution provides better localized scale out
  Used for filters, joins and grouping
Benefits of Table partitioning:
  Provides data life cycle management (“expire” old partitions)
  Partial re-computation of data at partition level
  Query predicates can provide partition elimination
Do not use when…
  No filters, joins and grouping
  No reuse of the data for future queries
If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test.

Evolving Tables
ALTER TABLE ADD/DROP COLUMN:
    ALTER TABLE T ADD COLUMN eventName string;
    ALTER TABLE T DROP COLUMN col3;
    ALTER TABLE T ADD COLUMN result string, clientId string, payload int?;
    ALTER TABLE T DROP COLUMN clientId, result;
  Meta-data only operation
  Existing rows will get:
    Non-nullable types: C# data type default value (e.g., int will be 0)
    Nullable types: null
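
The rule for what existing rows see after ADD COLUMN can be modelled in a few lines. This Python sketch only illustrates the stated rule (C# default for non-nullable types, null for nullable ones); it is not how the catalog applies the change, and the helper `add_column`, the small type table, and the column `eventCount` are hypothetical.

```python
# C# default values for a few non-nullable built-in types (illustrative subset).
CSHARP_DEFAULTS = {"int": 0, "long": 0, "double": 0.0, "bool": False}

def add_column(rows, name, type_name):
    """Return rows extended with a new column, filled as the slide describes."""
    # "int?" style types and string (a C# reference type) are nullable.
    is_nullable = type_name.endswith("?") or type_name == "string"
    default = None if is_nullable else CSHARP_DEFAULTS[type_name]
    return [{**row, name: default} for row in rows]

table = [{"col1": 1}, {"col1": 2}]
table = add_column(table, "eventCount", "int")    # non-nullable: gets 0
table = add_column(table, "payload", "int?")      # nullable: gets null (None)
table = add_column(table, "eventName", "string")  # nullable: gets null (None)
```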

Let’s do some SQL with U-SQL!
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis

U-SQL Joins
Join operators:
  INNER JOIN
  LEFT or RIGHT or FULL OUTER JOIN
  CROSS JOIN
  SEMIJOIN: equivalent to IN subquery
  ANTISEMIJOIN: equivalent to NOT IN subquery
Notes:
  ON clause comparisons need to be of the simple form rowset.column == rowset.column, or AND conjunctions of the simple equality comparison
  If a comparand is not a column, wrap it into a column in a previous SELECT
  If the comparison operation is not ==, put it into the WHERE clause
  Turn the join into a CROSS JOIN if there is no equality comparison
  Reason: Syntax calls out which joins are efficient
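
For readers who have not met SEMIJOIN and ANTISEMIJOIN, the following Python sketch shows the row-level semantics on plain lists of dictionaries. It is an illustration, not U-SQL, and it ignores null handling; the function names and sample data are hypothetical. A semijoin keeps each left row at most once when a match exists on the right, and an antisemijoin keeps the left rows with no match.

```python
def semijoin(left, right, key):
    """Left rows that have at least one match on `key` in right (like IN)."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]

def antisemijoin(left, right, key):
    """Left rows that have no match on `key` in right (like NOT IN)."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]

customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}, {"oid": 12, "cid": 3}]

with_orders = semijoin(customers, orders, "cid")
without_orders = antisemijoin(customers, orders, "cid")
```

Unlike an inner join, customer 1 appears only once in `with_orders` even though two orders match, and no columns from the right side are returned.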

U-SQL Analytics
Windowing Expression:
    Window_Function_Call 'OVER' '(' [ Over_Partition_By_Clause ] [ Order_By_Clause ] [ Row_Clause ] ')'.
    Window_Function_Call := Aggregate_Function_Call | Analytic_Function_Call | Ranking_Function_Call.
Windowing Aggregate Functions:
  ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP
Analytics Functions:
  CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, LEAD, LAG
Ranking Functions:
  DENSE_RANK, NTILE, RANK, ROW_NUMBER
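
The difference between the ranking functions is easiest to see on tied values. The Python sketch below is an illustration of the usual SQL semantics of ROW_NUMBER, RANK and DENSE_RANK over a partition ordered descending; it is not U-SQL code, and the function name, column names and data are hypothetical.

```python
from itertools import groupby

def rank_rows(rows, partition_key, order_key):
    """Add row_number, rank and dense_rank per partition, ordered descending."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[partition_key], -r[order_key]))
    for _, part in groupby(ordered, key=lambda r: r[partition_key]):
        rank = dense = 0
        prev = object()  # sentinel that never equals a real value
        for row_number, row in enumerate(part, start=1):
            if row[order_key] != prev:
                rank = row_number  # RANK jumps to the current position
                dense += 1         # DENSE_RANK advances by one
                prev = row[order_key]
            out.append({**row, "row_number": row_number, "rank": rank, "dense_rank": dense})
    return out

salaries = [
    {"dept": "A", "salary": 300},
    {"dept": "A", "salary": 100},
    {"dept": "A", "salary": 300},
    {"dept": "B", "salary": 50},
]
ranked = rank_rows(salaries, "dept", "salary")
```

With two rows tied at 300, both get rank 1 and dense rank 1, while the row at 100 gets rank 3 but dense rank 2.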

“Top 5”s Surprises for SQL Users
1. AS is not as: C# keywords and SQL keywords overlap. Costly to make case-insensitive -> Better build capabilities than tinker with syntax.
2. = != ==: Remember: C# expression language.
3. null IS NOT NULL: C# nulls are two-valued.
4. PROCEDURES but no WHILE.
5. No UPDATE nor MERGE.
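
The point that C# nulls are two-valued is the one most likely to surprise SQL users, so a small model may help. This Python sketch contrasts SQL-style three-valued equality, where comparing with NULL yields UNKNOWN, with C#-style equality, where null == null is simply true. It is an illustration of the two logics, not U-SQL code, and both function names are hypothetical.

```python
def sql_equals(a, b):
    """SQL-style '=': any comparison with NULL yields UNKNOWN (modelled as None)."""
    if a is None or b is None:
        return None
    return a == b

def csharp_equals(a, b):
    """C#-style '==' on nullable values: two-valued, so null == null is True."""
    return a == b

values = [1, None, 2]

# A WHERE clause keeps a row only when the predicate is True, never when UNKNOWN.
kept_by_sql = [v for v in values if sql_equals(v, None) is True]
kept_by_csharp = [v for v in values if csharp_equals(v, None)]
```

Under the SQL model no row passes a comparison with NULL, while under the C# model the null row does.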

Show me U-SQL UDOs!
Input:
  Start Time - End Time - User Name
  5:00 AM - 6:00 AM - ABC
  5:00 AM - 6:00 AM - XYZ
  8:00 AM - 9:00 AM - ABC
  8:00 AM - 10:00 AM - ABC
  10:00 AM - 2:00 PM - ABC
  7:00 AM - 11:00 AM - ABC
  9:00 AM - 11:00 AM - ABC
  11:00 AM - 11:30 AM - ABC
  11:40 PM - 11:59 PM - FOO
  11:50 PM - 0:40 AM - FOO
Output:
  Start Time - End Time - User Name
  5:00 AM - 6:00 AM - ABC
  5:00 AM - 6:00 AM - XYZ
  7:00 AM - 2:00 PM - ABC
  11:40 PM - 0:40 AM - FOO
https://blogs.msdn.microsoft.com/azuredatalake/2016/06/27/how-do-i-combine-overlapping-ranges-using-u-sql-introducing-u-sql-reducer-udos/
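
The range-combining logic in this example can be sketched in Python. This is not the C# reducer from the linked blog post; it is a minimal illustration of the same idea, assuming rows are grouped by user and visited in start-time order, with times held as minutes since midnight and an end past midnight represented as a value above 1440. The function name and that encoding are assumptions.

```python
from itertools import groupby
from operator import itemgetter

def merge_ranges(rows):
    """Merge overlapping or touching (user, start, end) ranges per user."""
    merged = []
    # Sorting by (user, start, end) groups the rows by user and presorts by start.
    for user, group in groupby(sorted(rows), key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                cur_end = max(cur_end, end)  # extend the current range
            else:
                merged.append((user, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((user, cur_start, cur_end))
    return merged

def t(hour, minute=0):
    """Minutes since midnight; hours of 24 or more mean the next day."""
    return hour * 60 + minute

sessions = [
    ("ABC", t(5), t(6)), ("XYZ", t(5), t(6)),
    ("ABC", t(8), t(9)), ("ABC", t(8), t(10)), ("ABC", t(10), t(14)),
    ("ABC", t(7), t(11)), ("ABC", t(9), t(11)), ("ABC", t(11), t(11, 30)),
    ("FOO", t(23, 40), t(23, 59)), ("FOO", t(23, 50), t(24, 40)),
]
combined = merge_ranges(sessions)
```

The result has the same four ranges as the output table on the slide, listed in user order rather than slide order.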

What are UDOs?
Custom Operator Extensions, scaled out by U-SQL:
  User-Defined Extractors
  User-Defined Outputters
  User-Defined Processors:
    Take one row and produce one row
    Pass-through versus transforming
  User-Defined Appliers:
    Take one row and produce 0 to n rows
    Used with OUTER/CROSS APPLY
  User-Defined Combiners:
    Combines rowsets (like a user-defined join)
  User-Defined Reducers:
    Take n rows and produce m rows (normally m<n)
Scaled out with explicit U-SQL Syntax that takes a UDO instance (created as part of the execution):
  EXTRACT, OUTPUT, PROCESS, COMBINE, REDUCE

UDO Tips and Warnings
Tips when using UDOs:
  READONLY clause to allow pushing predicates through UDOs
  REQUIRED clause to allow column pruning through UDOs
  PRESORT on REDUCE if you need global order
  Hint Cardinality if it does choose the wrong plan
Warnings and better alternatives:
  Use SELECT with UDFs instead of PROCESS
  Use User-defined Aggregators instead of REDUCE
  Learn to use Windowing Functions (OVER expression)
Good use-cases for PROCESS/REDUCE/COMBINE:
  The logic needs to dynamically access the input and/or output schema, e.g., create a JSON doc for the data in the row where the columns are not known a priori.
  Your UDF-based solution creates too much memory pressure and you can write your code more memory-efficiently in a UDO.
  You need an ordered Aggregator or produce more than 1 row per group.

U-SQL Language Philosophy
Declarative Query and Transformation Language:
  Uses SQL’s SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions
  Optimizable, Scalable
Expression-flow programming style:
  Easy to use functional lambda composition
  Composable, globally optimizable
Operates on Unstructured & Structured Data:
  Schema on read over files
  Relational metadata objects (e.g. database, table)
Extensible from ground up:
  Type system is based on C#
  Expression language IS C#
  User-defined functions (U-SQL and C#)
  User-defined Aggregators (C#)
  User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out Framework for Usercode:
  EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
Federated query across distributed data sources
Example script:
    REFERENCE ASSEMBLY MyDB.MyAssembly;
    CREATE TABLE T( cid int, first_order DateTime
                  , last_order DateTime, order_count int
                  , order_amount float );
    @o = EXTRACT oid int, cid int, odate DateTime, amount float
         FROM "/input/orders.txt"
         USING Extractors.Csv();
    @c = EXTRACT cid int, name string, city string
         FROM "/input/customers.txt"
         USING Extractors.Csv();
    @j = SELECT c.cid, MIN(o.odate) AS firstorder
              , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
              , AGG<MyAgg.MySum>(o.amount) AS totalamount
         FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
         WHERE c.city.StartsWith("New") && MyNamespace.MyFunction(o.odate) > 10
         GROUP BY c.cid;
    OUTPUT @j TO "/output/result.txt"
    USING new MyData.Write();
    INSERT INTO T SELECT * FROM @j;

Explore Azure Data Lake and U-SQL at Scale

Expression-flow Programming Style
Automatic "in-lining" of U-SQL expressions: whole script leads to a single execution model.
Execution plan that is optimized out-of-the-box and without user intervention.
Per-job and user-driven level of parallelization.
Detailed visibility into execution steps, for debugging.
Heatmap-like functionality to identify performance bottlenecks.

This is why U-SQL!
Unifies natively SQL’s declarativity and C#’s extensibility
Unifies querying structured and unstructured data
Unifies local and remote queries
Increase productivity and agility from Day 1 forward for YOU!
Sign up for an Azure Data Lake account and join the Public Preview at http://www.azure.com/datalake, and give us your feedback via http://aka.ms/adlfeedback or at http://aka.ms/u-sql-survey!

In Review: Session Objectives & Takeaways
Session Objective(s):
  Introduce U-SQL: The Why and What
  Show the philosophy of U-SQL
  Demonstrate the power, scale and simplicity of U-SQL
Key Takeaways:
  You understand why U-SQL is the best language for Big Data Processing
  You understand how U-SQL scripts process data in a highly scalable way
  You can use U-SQL to process unstructured and structured data
  You can use U-SQL’s C# integration to extend your big data processing with custom code
  You can explain some main differences between U-SQL and T-SQL
  You know what data sources can be joined in U-SQL
  You can use Visual Studio’s ADL tooling to explore and analyze highly scaled out U-SQL jobs

Additional Resources
Blogs and community page:
  http://usql.io (U-SQL Github)
  http://blogs.msdn.microsoft.com/mrys/
  http://blogs.msdn.microsoft.com/azuredatalake/
  https://channel9.msdn.com/Search?term=U-SQL#ch9Search
Documentation and articles:
  http://aka.ms/usql_reference
  https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
  https://msdn.microsoft.com/en-us/magazine/mt614251
ADL forums and feedback:
  http://aka.ms/adlfeedback
  https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
  http://stackoverflow.com/questions/tagged/u-sql

Please evaluate this session
Your feedback is important to us!
From your PC or Tablet visit MyIgnite at http://myignite.microsoft.com
From your phone download and use the Ignite Mobile App by scanning the QR code above or visiting https://aka.ms/ignite.mobileapp

Free IT Pro resources
To advance your career in cloud technology (Microsoft Ignite 2016)
Plan your career path: Microsoft IT Pro Career Center, www.microsoft.com/itprocareercenter
  Cloud role mapping
  Expert advice on skills needed
  Self-paced curriculum by cloud role
Get started with Azure: Microsoft IT Pro Cloud Essentials, www.microsoft.com/itprocloudessentials
  $300 Azure credits and extended trials
  Pluralsight 3 month subscription (10 courses)
  Phone support incident
Demos and how-to videos: Microsoft Mechanics, www.microsoft.com/mechanics
  Weekly short videos and insights from Microsoft’s leaders and engineers
Connect with peers and experts: Microsoft Tech Community, https://techcommunity.microsoft.com
  Connect with community of peers and Microsoft experts
