Making your Data Lake smarter with Cognitive Services Helge Rege Gårdsvoll Data Manager, Hafslund Strøm @dataHelge
Our awesome sponsors! Please visit the sponsor area in the break and interact with them. They are the reason we can hold this conference free of charge!
Azure Data Lake has three components: Data Lake Store, Data Lake Analytics and HDInsight
Data Lake Store
Data Lake Store is a high-capacity store for all types of data. We ingest data into the Data Lake Store without changing the format, and processed data is written back into the Data Lake Store for storage and analytics. Parts of the Data Lake Store serve as a sandbox.
Hafslund’s Data Lake Store is divided by subsidiary, and then organized with:
an Input folder for input formats
a Staging folder for processed data
a Reference folder for reference data
a Sandbox folder for sandboxing and experimentation
Access is limited by Access Control Lists (ACLs) in Active Directory, and only analysts and super users access data in the Data Lake Store directly. Auditing is performed with built-in functions, and data is encrypted in transit and in storage.
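For illustration, a hypothetical layout for one subsidiary could look like the sketch below (the subsidiary and dataset names are made up, not Hafslund's actual structure):

/subsidiary-a/Input/crm/2017/10/customers_20171001.csv
/subsidiary-a/Staging/crm/customers/
/subsidiary-a/Reference/postal_codes/
/subsidiary-a/Sandbox/analyst-experiments/churn/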
Data Lake Analytics
Data Lake Analytics is a highly scalable analytics service for transforming data. It is the primary data transformation method for Hafslund Strøm, and business logic should be implemented in Data Lake Analytics.
Data is transformed with U-SQL scripts, which unify SQL and C#. Data is typically read from the input or staging folders/tables of the Data Lake Store and written into staging files and tables.
The job service provides flexibility for cost/value considerations and scalable performance. Jobs are run as batches with a given number of Analytics Units; cost and time are considered when setting the number of units. Scheduling of Data Lake Analytics is handled by Data Factory.
The typical U-SQL example

@rows =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "mylake/orders.csv"
    USING Extractors.Csv();

@bigOrders =
    SELECT *
    FROM @rows
    WHERE Amount > 1000;

OUTPUT @bigOrders
    TO "mylake/orders_copy.txt"
    USING Outputters.Csv();
Cognitive Services: APIs to see, hear, understand and interpret your data
Several of these APIs have Data Lake support, including the image and text APIs
Getting started
Go to Sample Scripts for your Data Lake Analytics account and select "Install U-SQL Extensions"
This will add new assemblies to your account: Cognitive, R and Python
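Once the extensions are installed, a U-SQL script opts in to them with REFERENCE ASSEMBLY statements. A minimal sketch follows; the assembly names are taken from the U-SQL extension documentation and should be treated as assumptions to verify against your own account:

// Cognitive assemblies for images and text, registered by "Install U-SQL Extensions"
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;
REFERENCE ASSEMBLY ImageOcr;
REFERENCE ASSEMBLY [TextCommon];
REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

// R and Python extension assemblies
REFERENCE ASSEMBLY [ExtR];
REFERENCE ASSEMBLY [ExtPython];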
Demo: Images
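A sketch of what the image demo covers, closely following the image-tagging example in the U-SQL cognitive tutorial linked at the end of this deck; the input and output paths here are illustrative assumptions, not the actual demo script:

REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY FaceSdk;
REFERENCE ASSEMBLY ImageEmotion;
REFERENCE ASSEMBLY ImageTagging;
REFERENCE ASSEMBLY ImageOcr;

// Read image files as byte arrays; {FileName} becomes a virtual column taken from the path
@images =
    EXTRACT FileName string, ImgData byte[]
    FROM "/images/{FileName}.jpg"
    USING new Cognition.Vision.ImageExtractor();

// Count and tag the objects detected in each image
@objects =
    PROCESS @images
    PRODUCE FileName,
            NumObjects int,
            Tags SQL.MAP<string, float?>
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();

// Flatten the tag map into a semicolon-separated string so a built-in outputter can write it
@result =
    SELECT FileName,
           NumObjects,
           String.Join(";", Tags.Select(t => t.Key + ":" + t.Value)) AS TagList
    FROM @objects;

OUTPUT @result
    TO "/output/image_tags.tsv"
    USING Outputters.Tsv();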
Demo: Text
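And a sketch of the text demo, following the sentiment and key-phrase example in the same tutorial; the input file, schema and output path are illustrative assumptions:

REFERENCE ASSEMBLY [TextCommon];
REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

// Read free-text records, e.g. customer comments (hypothetical input file)
@comments =
    EXTRACT Id int, Text string
    FROM "/input/comments.csv"
    USING Extractors.Csv();

// Score sentiment for the Text column; adds Sentiment and Conf columns
@sentiment =
    PROCESS @comments
    PRODUCE Id,
            Text,
            Sentiment string,
            Conf double
    USING new Cognition.Text.SentimentAnalyzer(true);

// Extract key phrases from the same Text column
@keyPhrases =
    PROCESS @sentiment
    PRODUCE Id,
            Text,
            Sentiment,
            Conf,
            KeyPhrase string
    USING new Cognition.Text.KeyPhraseExtractor();

OUTPUT @keyPhrases
    TO "/output/comment_sentiment.tsv"
    USING Outputters.Tsv();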
Want to learn more? Usql.io U-SQL tutorial: https://saveenr.gitbooks.io/usql-tutorial/content/ U-SQL Cognitive tutorial: https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-cognitive
Thank you! Helge Rege Gårdsvoll, helge.gardsvoll@hafslundstrom.no @dataHelge