1
Azure Data Lake Store & Analytics
Neck deep in the drink…
2
Agenda
Introductions
What is this Thing?
Azure Data Lake Store
Quick End to End Demo
Environment Setup
Azure Data Lake Analytics & U-SQL
Discussion
3
Who Am I?
Audrey Hammonds
Practice Lead, BI & Analytics at Innovative Architects
Recently moved from Atlanta to West Palm Beach
Organizer, Palm Beach Data Meetup
Board of Directors, Palm Beach Tech Association
LinkedIn:
Blog: Datachix.com (with Julie Smith)
4
Twitter: @InnovArchitects
Facebook: facebook.com/InnovativeArchitects
Blog: blog.innovativearchitects.com/
Founded in Atlanta
13 years in project-based consulting
~120 employees
#6 best small business to work for in ATL in 2017
People who don’t suck
Focus on: Data Integration, Mobile App Dev, Cloud Infrastructure, Being Awesome
5
The Future is Here!
6
What is this Data Lake thing?
7
The Main Parts
Not a single technology, more of an approach
8
We Hold These Truths…
A database has a schema
We transform the data into that schema
Data conforms to the schema we define
The schema defines the business
Boxes within boxes is our design instinct – what if we turned that on its head?
9
The Mind-Blowing Proposition…
Schema, Constraints, Data, Transform
Data Lake, not Database. Fluid, unstructured, ebbs & flows with the weather, etc., etc.
10
Data Formats
Structured
Unstructured
Semi-Structured
11
Contrasting Philosophies
ETL – Extract, Transform, Load
ISA – Ingest, Store, Analyze
Load fast & flexible – worry about structure and integration later
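The contrast between the two philosophies can be sketched in a few lines. This is a minimal illustration, not anything from the Azure tooling: the event records, the field names, and the `WAREHOUSE_COLUMNS` schema are all made up for the example. The ETL path shapes data to a schema before storing it, so a field nobody planned for is lost; the ISA path keeps the raw records and applies structure only when a question comes up.

```python
import json

# Hypothetical raw events; the field names are illustrative only.
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340, "referrer": "ad"}',
    '{"user": "a", "page": "/cart", "ms": 95, "referrer": "email"}',
]

# Assumed warehouse schema, fixed before any data is loaded.
WAREHOUSE_COLUMNS = ("user", "page")


def etl_load(raw_lines):
    """Extract, Transform, Load: shape each record to the schema, then store it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        table.append({col: record.get(col) for col in WAREHOUSE_COLUMNS})
    return table


def isa_ingest(raw_lines):
    """Ingest, Store: keep every record exactly as it arrived."""
    return list(raw_lines)


def isa_analyze(store, field):
    """Analyze: parse and count values for a field only when asked."""
    counts = {}
    for line in store:
        value = json.loads(line).get(field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


warehouse = etl_load(RAW_EVENTS)
lake = isa_ingest(RAW_EVENTS)

# A question nobody anticipated when the warehouse schema was designed.
referrers_in_warehouse = [row.get("referrer") for row in warehouse]
referrers_in_lake = isa_analyze(lake, "referrer")
print(referrers_in_warehouse)
print(referrers_in_lake)
```

The trade-off is that the ISA path pays the parsing cost on every read, which is why the compute scaling discussed later matters.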
12
Azure Data Lake Store
13
The What
14
The Whys
Why Data Lake?
Traditional data warehousing is the antithesis of agile
ETL is costly, time-consuming, and fraught with peril
Schema is hard to change – future-proof it
Scale performance as needed
15
The Whys
Why on Azure?
Scale compute and store independently
Leverage MPP capabilities (even against a single object)
Secure with Azure Active Directory
16
The Whys
Why HDFS?
Native support for file/folder operations
Compatible with Spark, Storm, Flume, Sqoop, Kafka, R, etc.
Open standards mean better long-term integration
17
The Hows
What stores can I load from Data Lake Store?
Azure: Blob Storage, Data Lake Store, Cosmos DB, SQL Database, SQL Data Warehouse, Search Index, Table storage
Databases: SQL Server, Oracle
File: File System
18
The Hows
What stores can I load to Data Lake Store?
Azure: Blob Storage, Cosmos DB, Data Lake Store, SQL Database, SQL Data Warehouse, Table storage
Databases: SQL Server, Oracle, Amazon Redshift, DB2, MySQL, PostgreSQL, SAP Business Warehouse, SAP HANA, Sybase, Teradata
File: File System, Amazon S3, FTP, HDFS, SFTP
NoSQL: Cassandra, MongoDB
Others: Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web Table (from HTML), GE Historian
19
The Hows
How do I load data?
U-SQL
Azure Data Factory
Storm
Azure Stream Analytics
Event Hub
SSIS
Sqoop
SSIS Resource:
20
The Hows
How do I manage DR?
Automatically replicated (3 copies in a single region)*
*This is where Hadoop/HDFS architecture comes in handy
21
The Hows
How do I secure this thing?
22
The Hows
Seriously? No Schema?
Schema is defined on read
Breathe… It’s all going to be okay.
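"Schema is defined on read" can be illustrated with a small sketch in plain Python (not U-SQL). The tab-separated sample lines, the column names, and the `read_with_schema` helper are hypothetical. The stored text carries no schema at all; each reader supplies its own column names and types at the moment it reads.

```python
import csv
import io

# Hypothetical tab-separated log lines, stored as-is with no declared schema.
RAW_TSV = (
    "1\t2017-06-01\t/home\t120\n"
    "2\t2017-06-01\t/cart\t340\n"
    "3\t2017-06-02\t/home\t95\n"
)


def read_with_schema(raw_text, schema):
    """Apply a list of (name, converter) pairs to each row at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text), delimiter="\t"):
        # zip stops at the shorter input, so a narrow schema ignores extra columns.
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows


# Two readers, two schemas, one unchanged file.
traffic_schema = [("id", int), ("day", str), ("page", str), ("ms", int)]
audit_schema = [("id", str), ("day", str)]

traffic = read_with_schema(RAW_TSV, traffic_schema)
audit = read_with_schema(RAW_TSV, audit_schema)

total_ms = sum(row["ms"] for row in traffic)
print(total_ms)
print(audit[0])
```

Changing the schema here means editing the reader, not reloading the data, which is the point the slide is making.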
23
Demo for Context!
Run through Explore Sample Jobs here…
24
Environment Setup
25
Getting Started
Step 1: Get an Azure Account
Step 2: Provision a Data Lake Store
Step 3: Set up Data Lake Analytics
26
Getting Samples
Go to Data Lake Analytics
Click on “Explore interactive Tutorials”
Click “Copy Sample Data”
Expand “Website Log Analysis”
27
First Jobs
Go to Data Lake Analytics
Click on “Explore sample jobs”
Click “Query a TSV file”
Follow instructions to run it and 3 other jobs
28
Visual Studio Integration
Pre-Requisites
Visual Studio 2012 or higher
Azure SDK for .NET or higher
Install:
29
Sample Project
Show sample data and sample Ambulance jobs
Remember when we copied sample data in the Azure Portal? It is used by the sample U-SQL solution in Visual Studio
30
Emulating the Cloud
The Azure SDK allows you to emulate the cloud for development purposes
31
Azure Data Lake Analytics
32
The Basic Process
Develop U-SQL Script
Save U-SQL Script: Azure Data Lake Store, Azure Blob Storage, Local Workstation
Submit the Job: Azure Portal, Azure PowerShell, .NET SDK, CLI
Read/Write: Azure Data Lake Store, Azure Blob Storage
33
The What
Distributed service built on Apache YARN*
Provides dynamic scaling of compute processes
Key component of Cortana Analytics Suite
Works with Azure SQL Data Warehouse, Power BI, Azure Data Factory
*YARN – Yet Another Resource Negotiator
34
The Whys
Familiar tools (SQL/C# and Visual Studio) to lower barriers to entry for Big Data
Configurable scaling allows developer to control cost vs. processing time
AAD integration simplifies security and allows integration with existing artifacts
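The cost vs. processing time trade-off can be reasoned about with a back-of-the-envelope model. Everything in this sketch is an assumption for illustration: the `estimate` helper, the idea that a job splits into a fixed serial part and an evenly divisible parallel part, billing proportional to compute units times elapsed time, and the rate of 2.0 per unit-hour. Check the real pricing and your job's actual parallelism before relying on numbers like these.

```python
def estimate(serial_minutes, parallel_minutes, units, rate_per_unit_hour):
    """Idealized model: serial work is fixed, parallel work divides evenly across units."""
    elapsed_minutes = serial_minutes + parallel_minutes / units
    # Assumed billing: every allocated unit is charged for the whole elapsed time.
    cost = units * (elapsed_minutes / 60) * rate_per_unit_hour
    return elapsed_minutes, cost


# Hypothetical job: 2 minutes of serial work, 60 minutes of parallelizable work.
for units in (1, 5, 20):
    minutes, cost = estimate(serial_minutes=2, parallel_minutes=60,
                             units=units, rate_per_unit_hour=2.0)
    print(f"{units:>2} units: {minutes:5.1f} min, cost {cost:.2f}")
```

Under these assumptions, adding units shortens the job but raises the bill, because the serial portion is paid for on every allocated unit.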
35
The Hows
U-SQL for Jobs – 1 script = 1 job
Run jobs from: Azure Portal, Visual Studio, Azure PowerShell, Command-Line Interface (CLI)
36
A Job in Action Show job details/summary, vertices, replay, heat maps, job graph, etc.
37
Heat Maps
38
Other Cool Things…
Job Resource View
Vertex Execution View
39
U-SQL
40
If T-SQL and C# had a baby…
Microsoft’s internal Big Data Language
Unified Structured & Unstructured data processing
Type system is based on C# (this will be an adjustment for us)
Case sensitive
41
Resources
Data Lake Store:
Data Lake Analytics:
U-SQL Reference:
42
Contact Info & Discussion
Thank you!
Slides, sources, and scripts at the SQL Saturday site
me at:
"Speed is often confused with insight. When I start running earlier than the others, I appear faster" -- Johan Cruyff