U-SQL Object Model
Meta Data Object Model Legend Procedures Table Types ADLA Catalog C# Extractors C# Reducers Database C# Processors C# Fns C# UDTs C# UDAgg C# Applier [1,n] C# Combiners C# Outputters Creden-tials Data Source Schema C# Assemblies [0,n] Ext. tables tables views TVFs Procedures Table Types Legend Statistics Clustered Index Abstract objects User objects MD Name C# Name partitions Contains Refers to Implemented and named by
U-SQL Catalog Naming Discovery Sharing Naming Securing Discovery Default database and schema context: master.dbo Quote identifiers with []: [my table] Stores data in ADL Storage /catalog folder Discovery Visual Studio Server Explorer Azure Data Lake Analytics Portal SDKs and Azure PowerShell commands Sharing Within an Azure Data Lake Analytics account Securing Secured with AAD principals at catalog level (inherited from ADL Storage) Naming Discovery Sharing Securing
DEMO Create shareable data and code
Views and TVFs Views for simple cases CREATE VIEW V AS EXTRACT… CREATE VIEW V AS SELECT … Cannot contain user-defined objects (such as UDFs or UDOs) Will be inlined Table-Valued Functions (TVFs) CREATE FUNCTION F (@arg string = "default") RETURNS @res [TABLE ( … )] AS BEGIN … @res = … END; Provides parameterization One or more results Can contain multiple statements Can contain user-code (needs assembly reference) Will always be inlined Infers schema or checks against specified return schema Views for simple cases TVFs for parameterization and most cases
Procedures Allows encapsulation of non-DDL scripts CREATE PROCEDURE P (@arg string = "default“) AS BEGIN …; OUTPUT @res TO …; INSERT INTO T …; END; Provides parameterization No result but writes into file or table Can contain multiple statements Can contain user code (needs assembly reference) Will always be inlined Cannot contain DDL (no CREATE, DROP) Allows encapsulation of non-DDL scripts
Table types Enables you to name a table schema Provides reuse for function/procedure definitions CREATE TYPE T AS TABLE(c1 string, c2 int ); CREATE FUNCTION F (@table_arg T) RETURNS @res T AS BEGIN … @res = … END;
Tables CREATE TABLE CREATE TABLE AS SELECT CREATE TABLE T (col1 int , col2 string , col3 SQL.MAP<string,string> , INDEX idx CLUSTERED (col1 ASC) PARTITIONED BY HASH (driver_id) ); Structured Data Built-in Data types only (no UDTs) Clustered index (must be specified): row-oriented Fine-grained partitioning (must be specified): HASH, DIRECT HASH, RANGE, ROUND ROBIN CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …; CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…; CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT); Infer the schema from the query Still requires index and partitioning CREATE TABLE CREATE TABLE AS SELECT
http://aka.ms/AzureDataLake