Azure Data Lake Store & Analytics

Presentation on theme: "Azure Data Lake Store & Analytics"— Presentation transcript:

1 Azure Data Lake Store & Analytics
Neck deep in the drink…

2 Agenda
Introductions
What is this Thing?
Azure Data Lake Store
Quick End to End Demo
Environment Setup
Azure Data Lake Analytics & U-SQL
Discussion

3 Who Am I?
Audrey Hammonds
Practice Lead, BI & Analytics at Innovative Architects
Recently moved from Atlanta to West Palm Beach
Organizer, Palm Beach Data Meetup
Board of Directors, Palm Beach Tech Association
LinkedIn:
Blog: Datachix.com (with Julie Smith)

4 Twitter: @InnovArchitects
Facebook: facebook.com/InnovativeArchitects
Blog: blog.innovativearchitects.com/
Founded in Atlanta
13 years in project-based consulting
~120 employees
#6 best small business to work for in ATL in 2017
People who don’t suck
Focus on:
Data Integration
Mobile App Dev
Cloud Infrastructure
Being Awesome

5 The Future is Here!

6 What is this Data Lake thing?

7 The Main Parts
Not a single technology, more of an approach

8 We Hold These Truths…
A database has a schema
We transform the data into that schema
Data conforms to the schema we define
The schema defines the business
Boxes within boxes is our design instinct – what if we turned that on its head?

9 The Mind-Blowing Proposition…
Schema Constraints Data Transform
Data Lake, not Database. Fluid, unstructured, ebbs & flows with the weather, etc., etc.

10 Data Formats
Structured
Unstructured
Semi-Structured
Data Lake, not Database. Fluid, unstructured, ebbs & flows with the weather, etc., etc.

11 Contrasting Philosophies
ETL – Extract, Transform, Load
ISA – Ingest, Store, Analyze
Load fast & flexible – worry about structure and integration later

ETL          ISA
Extract      Ingest
Transform    Store
Load         Analyze
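
The two orderings on this slide can be contrasted in a minimal Python sketch. The sample events, the function names, and the two-field target schema are all hypothetical, chosen only for illustration; nothing here calls an Azure API.

```python
import json

# Hypothetical raw events; the second has a bad amount, the third an extra field.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50"}',
    '{"user": "b", "amount": "oops"}',
    '{"user": "c", "amount": "7.25", "coupon": "SPRING"}',
]


def etl_load(lines):
    """ETL: force every record into a fixed schema before it is loaded."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            # Only the columns the schema knows about survive the transform.
            table.append({"user": record["user"], "amount": float(record["amount"])})
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected


def isa_ingest(lines):
    """ISA: store every line exactly as it arrived."""
    return list(lines)


def isa_analyze(store):
    """ISA: structure is decided here, at analysis time."""
    total = 0.0
    for line in store:
        record = json.loads(line)
        try:
            total += float(record["amount"])
        except (KeyError, ValueError):
            continue  # this question skips the row; the raw line is still stored
    return total


table, rejected = etl_load(RAW_EVENTS)
store = isa_ingest(RAW_EVENTS)
print(len(table), len(rejected), len(store), isa_analyze(store))
```

In the ETL path the rejected row and the extra `coupon` field are gone once loading finishes. In the ISA path both are still in the store if a later question needs them.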

12 Azure Data Lake Store

13 The What

14 The Whys
Why Data Lake?
Traditional data warehousing is the antithesis of agile
ETL is costly, time-consuming, and fraught with peril
Schema is hard to change – future-proof it
Scale performance as needed

15 The Whys
Why on Azure?
Scale compute and store independently
Leverage MPP capabilities (even against a single object)
Secure with Azure Active Directory

16 The Whys
Why HDFS?
Native Support for file/folder operations
Compatible with Spark, Storm, Flume, Sqoop, Kafka, R, etc.
Open standards mean better long-term integration

17 The Hows
What stores can I load from Data Lake Store?
Azure: Blob Storage, Data Lake Store, Cosmos DB, SQL Database, SQL Data Warehouse, Search Index, Table storage
Databases: SQL Server, Oracle
File: File System

18 The Hows
What stores can I load to Data Lake Store?
Azure: Blob Storage, Cosmos DB, Data Lake Store, SQL Database, SQL Data Warehouse, Table storage
Databases: SQL Server, Oracle, Amazon Redshift, DB2, MySQL, PostgreSQL, SAP Business Warehouse, SAP HANA, Sybase, Teradata
File: File System, Amazon S3, FTP, HDFS, SFTP
NoSQL: Cassandra, MongoDB
Others: Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web Table (from HTML), GE Historian

19 The Hows
How do I load data?
U-SQL
Azure Data Factory
Storm
Streaming Analytics Service
Event Hub
SSIS
Sqoop
SSIS Resource:

20 The Hows
How do I manage DR?
Automatically replicated (3 copies in a single region)*
*This is where Hadoop/HDFS architecture comes in handy

21 The Hows
How do I secure this thing?

22 The Hows
Seriously? No Schema?
Schema is defined on read
Breathe… It’s all going to be okay.
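
"Schema is defined on read" can be shown with a small Python sketch. It assumes a tab-separated file stored untouched; the sample rows, column names, and `read_with_schema` helper are hypothetical and stand in for whatever a real reader (a U-SQL extractor, for example) would do.

```python
import csv
import io

# Hypothetical raw file contents, stored as-is with no schema attached.
RAW_TSV = (
    "399266\t2012-02-15\ten-us\tmicrosoft\n"
    "382045\t2012-02-15\ten-gb\tazure data lake\n"
)


def read_with_schema(raw_text, schema):
    """Apply a schema at read time.

    schema is a list of (column_name, converter) pairs matched to fields
    by position; fields beyond the end of the schema are ignored.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    rows = []
    for fields in reader:
        rows.append({name: convert(value) for (name, convert), value in zip(schema, fields)})
    return rows


# Two readers, two schemas, one unchanged file.
full_schema = [("user_id", int), ("day", str), ("region", str), ("query", str)]
narrow_schema = [("user_id", str), ("day", str)]

print(read_with_schema(RAW_TSV, full_schema)[0])
print(read_with_schema(RAW_TSV, narrow_schema)[0])
```

The stored bytes never change. Each reader decides the column names, the types, and how many columns it cares about.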

23 Demo for Context!
Run through Explore Sample Jobs here…

24 Environment Setup

25 Getting Started
Step 1: Get an Azure Account
Step 2: Provision a Data Lake Store
Step 3: Set up Data Lake Analytics

26 Getting Samples
Go to Data Lake Analytics
Click on “Explore interactive Tutorials”
Click “Copy Sample Data”
Expand “Website Log Analysis”

27 First Jobs
Go to Data Lake Analytics
Click on “Explore sample jobs”
Click “Query a TSV file”
Follow instructions to run it and 3 other jobs

28 Visual Studio Integration
Pre-Requisites
Visual Studio 2012 or higher
Azure SDK for .NET or higher
Install:
Show sample data and sample Ambulance jobs

29 Sample Project
Show sample data and sample Ambulance jobs
Remember when we copied sample data in the Azure Portal? It is used by the sample U-SQL solution in Visual Studio

30 Emulating the Cloud
Azure SDK allows you to emulate the cloud for development purposes
Show sample data and sample Ambulance jobs

31 Azure Data Lake Analytics

32 The Basic Process
Develop U-SQL Script
Save U-SQL Script: Azure Data Lake Store, Azure Blob Storage, Local Workstation
Submit the Job: Azure Portal, Azure PowerShell, .NET SDK, CLI
Read/Write: Azure Data Lake Store, Azure Blob Storage

33 The What
Distributed service built on Apache YARN*
Provides dynamic scaling of compute processes
Key component of Cortana Analytics Suite
Works with Azure SQL Data Warehouse, Power BI, Azure Data Factory
*YARN – Yet Another Resource Negotiator

34 The Whys
Familiar tools (SQL/C# and Visual Studio) to lower barriers to entry for Big Data
Configurable scaling allows developer to control cost vs. processing time
AAD integration simplifies security and allows integration with existing artifacts
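
The cost vs. processing time trade-off can be made concrete with a rough Python sketch. The model is an assumption, not something stated in the slides: the parallel part of a job divides evenly across the allocated compute units, a serial part does not speed up, and billing is units times duration times a flat hourly price. The job sizes and the price are hypothetical.

```python
def estimate_job(parallel_work_hours, serial_hours, units, price_per_unit_hour):
    """Return (duration_hours, cost) under a deliberately simple model.

    Assumptions (not from the slides): the parallel part of the job divides
    evenly across the allocated units, the serial part does not speed up,
    and billing is units * duration * a flat hourly price.
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    duration = serial_hours + parallel_work_hours / units
    cost = duration * units * price_per_unit_hour
    return duration, cost


# Hypothetical job: 8 hours of parallelizable work, 0.5 hours that cannot be split.
for units in (1, 4, 8):
    duration, cost = estimate_job(8, 0.5, units, price_per_unit_hour=2.0)
    print(f"{units} units: {duration:.2f} h, cost {cost:.2f}")
```

Under these assumptions, adding units shortens the job but raises the bill, because the serial portion is paid for on every allocated unit. Real jobs will not scale this cleanly.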

35 The Hows
U-SQL for Jobs – 1 script = 1 job
Run jobs from:
Azure Portal
Visual Studio
Azure PowerShell
Command-Line Interface (CLI)
Show job details/summary, vertices, replay, heat maps, job graph, etc.

36 A Job in Action
Show job details/summary, vertices, replay, heat maps, job graph, etc.

37 Heat Maps
Show job details/summary, vertices, replay, heat maps, job graph, etc.

38 Other Cool Things…
Job Resource View
Vertex Execution View
Show job details/summary, vertices, replay, heat maps, job graph, etc.

39 U-SQL

40 If T-SQL and C# had a baby…
Microsoft’s internal Big Data Language
Unified Structured & Unstructured data processing
Type system is based on C# (this will be an adjustment for us)
Case sensitive

41 Resources
Data Lake Store:
Data Lake Analytics:
U-SQL Reference:

42 Contact Info & Discussion
Thank you!
Slides, sources, and scripts at the SQL Saturday site
me at:
Speed is often confused with insight. When I start running earlier than the others, I appear faster -- Johan Cruyff

