Microsoft Machine Learning & Data Science Summit

Presentation transcript:

Microsoft Machine Learning & Data Science Summit September 26 – 27 | Atlanta, GA

Patterns & Practices for Data Integration at Scale
Anand Subbaraj (anandsub@microsoft.com), Principal Program Manager
Gaurav Malhotra (gamal@microsoft.com), Senior Program Manager

Today’s topics
- Traditional Analytics Platforms and Evolving Approaches to Analytics
- Data Integration Architectural Patterns (Stream & Batch)
- Data Integration Use Cases

Evolving Approaches to Analytics
[Diagram: traditional ETL flow. Sources: OLTP, ERP, LOB, … -> Extract (original data) -> ETL Tool (SSIS, etc.): Transform -> Load (transformed data) -> EDW (SQL Svr, Teradata, etc.). Also shown: Data Marts, Data Lake(s), BI Tools, Dashboards, Apps.]
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Evolving Approaches to Analytics
[Diagram: the traditional ETL flow (OLTP, ERP, LOB, … -> Extract -> ETL Tool (SSIS, etc.): Transform -> Load -> EDW (SQL Svr, Teradata, etc.); Data Marts, Data Lake(s), BI Tools, Dashboards, Apps), now with additional sources of original data: Social, Devices, Sensors, Web, brought in through Ingest (EL).]

Evolving Approaches to Analytics
[Diagram: the traditional ETL flow (OLTP, ERP, LOB, … -> Extract -> ETL Tool (SSIS, etc.): Transform -> Load -> EDW (SQL Svr, Teradata, etc.); Data Marts, Data Lake(s), BI Tools, Dashboards, Apps), plus original data and streaming data from Social, Devices, Sensors, and Web. These arrive through Ingest (EL) into Scale-out Storage & Compute (DataLake Store, HDFS, Blob Storage, etc.), followed by Transform & Load.]

Cortana Intelligence Suite: Transform data into intelligent action
DATA -> INTELLIGENCE -> ACTION
DATA: Business apps, Custom apps, Sensors and devices
Information Management: Azure Data Factory, Data Catalog, Event Hub
Big Data Stores: Azure Data Lake, SQL Data Warehouse
Machine Learning and Analytics: Azure Machine Learning, Azure Data Lake Analytics, HDInsight (Hadoop), Stream Analytics
Dashboards and Visualizations: Power BI
Personal Digital Assistant: Cortana
Perceptual Intelligence: Face, vision; Speech, text
Business Scenarios: Recommendations, customer churn, forecasting, etc.
ACTION: People, Automated Systems

Data Integration Architectural Patterns

Batch
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
- On-Prem Storage. Ex: SQL Server, Oracle, DB2, MySQL
- SaaS Data Sources. Ex: SQL Server, Oracle, DB2, MySQL
- Batch Data Movement. Ex: SSIS, Azure Data Factory, Sqoop
- Orchestration, Movement, Scheduling, Monitoring. Ex: Azure Data Factory, Oozie
- Transform, Combine, Clean, etc. Ex: Custom Code, Azure Data Lake Analytics, Hadoop
- Data Aggregation, Data Science, etc. Ex: Azure Data Lake Analytics, Hadoop, Azure Machine Learning, Mahout
- Cloud Storage. Ex: Azure Data Lake, Azure SQL DB, Azure SQL DW, HDFS, Cassandra, HBase
- Presentation, Dashboarding

Managing Customer Churn is Important
Retail:
- Returning customer
- Designing loyalty programs
- Long checkout time
- Dissatisfied with customer service, bad experience in retail store
Telecommunication:
- Mobile Number Portability: customers switching to another mobile service provider, with number retention
- Handset/device choices
- Friends switching to another provider
Banking and Finance:
- Service quality
- Attractiveness of banking rates

Customer Churn for Telco - Solution Pattern
Legend:
- Data Set (collection of files, DB table, etc.)
- Pipeline: a logical group of activities
- Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Stages: Data Sources, Ingest, Transform & Analyze, Publish
[Diagram: On Premises Data Mart (Call Log Files, Customer Table) -> Azure Data Lake (Call Log Files, Customer Table) -> Transform, Combine, etc. -> Customer Call Details -> Analyze -> Customers Likely to Churn -> Move -> Azure SQL DW (Customer Churn Table) -> Visualize]
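
The data set, activity, and pipeline vocabulary on this slide can be sketched in a few lines of Python. This is a toy model only, not the Azure Data Factory API: the class names, the `run` callable, the sample rows, and the "two or more dropped calls" rule standing in for the ML model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """A processing step: reads named data sets, writes one named data set."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., object]


@dataclass
class Pipeline:
    """A logical group of activities, executed in the order listed."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, datasets: Dict[str, object]) -> Dict[str, object]:
        for activity in self.activities:
            args = [datasets[key] for key in activity.inputs]
            datasets[activity.output] = activity.run(*args)
        return datasets


def combine(call_logs, customers):
    """Join call logs to customers, counting dropped calls per customer."""
    dropped = {}
    for row in call_logs:
        cid = row["customer_id"]
        dropped[cid] = dropped.get(cid, 0) + int(row["dropped"])
    return [dict(c, dropped_calls=dropped.get(c["customer_id"], 0)) for c in customers]


def score(details):
    """Hypothetical stand-in for the ML model: flag 2+ dropped calls."""
    return [d["customer_id"] for d in details if d["dropped_calls"] >= 2]


churn_pipeline = Pipeline("customer_churn", [
    Activity("transform_combine", ["call_logs", "customers"], "customer_call_details", combine),
    Activity("analyze", ["customer_call_details"], "customers_likely_to_churn", score),
])

datasets = churn_pipeline.execute({
    "call_logs": [
        {"customer_id": 1, "dropped": True},
        {"customer_id": 1, "dropped": True},
        {"customer_id": 2, "dropped": False},
    ],
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
})
print(datasets["customers_likely_to_churn"])
```

Each activity names its input and output data sets, so the pipeline is just an ordered list of steps over a shared dictionary of data sets.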

Demo: Customer Churn
Anand Subbaraj, Gaurav Malhotra

The need for trusted information
- Marketing campaign analysis (Interactive Entertainment)
- User and product profiling (Interactive Entertainment/Retail)
- Customer sentiment analysis (Interactive Entertainment/Retail)
- Personalized product recommendation (Retail)
- Customer shopping behavior analysis (Retail)
- Pricing optimization (Retail)
- Corrective and predictive maintenance and repairs (Manufacturing, IoT)
- Operational telemetry and health reporting (Online Services)
- Actuarial modelling and reporting automation (Financial Services)
- Financial risk modelling and analysis (Financial Services)

Real Time Streaming Architecture
- Event producers: Applications; Legacy IoT (custom protocols); Devices: IP-capable devices (Windows/Linux), Low-power devices (RTOS)
- Collection: Cloud gateways (web APIs), Field gateways
- Ingestor (broker): Event hubs
- Transformation: Stream processing (Stream Analytics), Storage adapters
- Presentation and action: Search and query; Data analytics (Excel); Web/thick client dashboards (PowerBI); Real Time Visualization; Devices to take action
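
The path from event producers through the ingestor to stream processing and an action can be sketched as follows. This is a minimal in-memory illustration, not Event Hubs or Stream Analytics code: the `deque` broker, the `temperature` field, and the threshold of 90 are assumptions chosen for the example.

```python
from collections import deque


def drain(broker):
    """Hand queued events to the stream processor in arrival order."""
    while broker:
        yield broker.popleft()


def stream_process(events, threshold):
    """Emit an action for every reading above the threshold."""
    for event in events:
        if event["temperature"] > threshold:
            yield {"device": event["device"], "action": "alert"}


broker = deque()  # stands in for the ingestor (broker)
for reading in [
    {"device": "d1", "temperature": 71},
    {"device": "d2", "temperature": 95},
    {"device": "d1", "temperature": 72},
]:
    broker.append(reading)  # event producers publish through a gateway

alerts = list(stream_process(drain(broker), threshold=90))
print(alerts)
```

Producers only know about the broker, and the processor only reads from it, which is what lets each stage scale independently.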

Real Time Analytics Use Cases
- Real-time fraud detection
- Connected car scenario
- Click-stream analysis
- Real-time financial portfolio alerts
- Smart grid
- CRM alerting sales with customer scenario
- Data and identity protection services
- Real-time financial sales tracking

Lambda Architecture
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
Data sources: Social, Devices, Sensors, Web (diagnostic streaming)
Paths: Data in motion, Data at rest
Components: Event hubs, Stream Analytics, Machine Learning, HDInsight, Azure DataLake, Azure SQL Data Warehouse, PowerBI, Business apps, Cortana, Vehicle Catalog
Data Factory: Move Data, Orchestrate, Schedule, and Monitor
Data Catalog: Register, Annotate, Understand, Discover Data Sets
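
One common reading of the Lambda pattern is that a batch view is recomputed from data at rest, a speed view covers data still in motion, and queries combine the two. A minimal sketch under that assumption follows; the function names, the `vehicle` field, and the event counts are hypothetical and do not come from the slide.

```python
from collections import Counter


def batch_view(data_at_rest):
    """Recompute per-vehicle event counts from the full stored history."""
    return Counter(event["vehicle"] for event in data_at_rest)


def speed_view(data_in_motion):
    """Incrementally count events that arrived after the last batch run."""
    view = Counter()
    for event in data_in_motion:
        view[event["vehicle"]] += 1
    return view


def serve(batch, speed):
    """Answer a query by merging the batch view with the speed view."""
    return batch + speed


stored = [{"vehicle": "v1"}, {"vehicle": "v1"}, {"vehicle": "v2"}]
recent = [{"vehicle": "v1"}, {"vehicle": "v3"}]
combined = serve(batch_view(stored), speed_view(recent))
print(dict(combined))
```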

Connected car market
The connected car market is growing:
- 45% compound annual growth rate over 5 years
- 10x faster than the overall car market
- 75% of cars shipped globally by 2020 will have the necessary hardware to connect to the Internet
Connected car technology is split between two approaches:
- Put the Internet connection in the car (embedded connections): does not require a phone data plan to operate; provides access to more features and data
- Rely on a secondary device
Embedded connections win, because auto companies will be able to:
- Collect data on the performance of cars
- Send updates and patches to cars remotely
- Avoid recalls related to the car's software

Connected car market
75% of the cars shipped globally by 2020 will be built with the necessary hardware to connect to the Internet.
- Vehicle diagnostic
- Usage-based insurance
- Roadside assistance
- Fleet management
- Engine emission control
- Eco-driving
- Engine performance remapping

Demo: Connected Car
Anand Subbaraj, Gaurav Malhotra

Lambda Architecture
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
Data sources: Social, Devices, Sensors, Web (diagnostic streaming)
Paths: Data in motion, Data at rest
Components: Event hubs, Stream Analytics, Machine Learning, HDInsight, Azure DataLake/Azure Blob Storage, Azure SQL Data Warehouse, PowerBI, Business apps, Cortana, Device Catalog
Data Factory: Move Data, Orchestrate, Schedule, and Monitor
Data Catalog: Register, Annotate, Understand, Discover Data Sets

Solutions: https://gallery.cortanaintelligence.com/solutionTemplates

Batch Pattern: Data Integration Scenarios

Time-Series Tumbling Window
Scenarios:
- Log Processing. Major Telco: update fact table from the last hour's call logs
- Telemetry/Operational Health Systems. Microsoft Intune: service health monitoring/reporting
- Customer Behavior/Profiling. Major EU Retailer: process daily weblogs & update product recommendation table
Data Characteristics:
- Sliceable by time (in the data or data location), restatement by slice (including downstream)
- Modelled as an infinitely appending data set
- Process a slice or batch of slices, often not required to process the entire time-series
- Data flows from one activity to the next (aka “data flow”), data readiness drives orchestration
[Diagram: an Activity reads Daily Data Set 1 and writes Daily Data Set 2, one run per daily slice. Run 1 (Mon): Success; Run 2 (Tues): Success; Run 3 (Wed): Failed.]
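
Slicing by time, per-slice status, and restatement by slice can be sketched in plain Python. This is an illustration under assumptions, not a scheduler implementation: the helper names, the dates, and the rule that the third daily slice fails are all made up for the example.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Split [start, end) into contiguous, non-overlapping slices."""
    slices = []
    cursor = start
    while cursor + size <= end:
        slices.append((cursor, cursor + size))
        cursor += size
    return slices


def run_slices(slices, activity):
    """Run the activity once per slice and record Success or Failed."""
    status = {}
    for window_start, window_end in slices:
        try:
            activity(window_start, window_end)
            status[(window_start, window_end)] = "Success"
        except Exception:
            status[(window_start, window_end)] = "Failed"
    return status


def process_weblogs(window_start, window_end):
    # Hypothetical activity: pretend the third day's input is not ready.
    if window_start.day == 28:
        raise RuntimeError("input slice not ready")


slices = tumbling_windows(datetime(2016, 9, 26), datetime(2016, 9, 29), timedelta(days=1))
status = run_slices(slices, process_weblogs)

# Data readiness drives orchestration: downstream only sees successful slices.
ready = [window for window, state in status.items() if state == "Success"]
# Restatement by slice: only the failed slice is a candidate for a rerun.
to_rerun = [window for window, state in status.items() if state == "Failed"]
print(len(ready), len(to_rerun))
```

Because each slice carries its own status, a failure on one day does not force reprocessing of the whole time series.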

Custom Flow, aka “Control Flow”
Scenarios:
- Dynamic & conditional workflows (e.g. for each row in table X do Y). E.g. Big Mattress Company dynamic ingest scenario.
- Mix resource mgmt. w/ data flow. E.g. custom archive/cleanup workflow after copy
- Data Replication (w/ ref integrity constraints): copy a set of Dim & Fact tables from SQL to Blob
- The “escape hatch” of data integration platforms like Informatica, Talend, SSIS, etc.
Data Characteristics:
- “Un-opinionated about the data model”; model data flows parameterized by arbitrary params
- Precedence constraints driven by activity execution, not data
- Often used to model “delta flows” (change set moves between activities)
[Diagram: My Pipeline 1 and My Pipeline 2 (For Each…). Trigger Conditions: Event, Wall Clock, Data Time. Activities 1 to 4 are linked by “Success, params” edges; an “Error, params” edge leads to an “On Error” activity.]
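
A "for each" loop whose precedence is driven by activity outcome (success or error) rather than by data readiness might look like the sketch below. The function names, table names, and error handler are hypothetical; no particular orchestration product is implied.

```python
def for_each(items, activity, on_error):
    """Run the activity per item; route failures to the on-error activity."""
    outcomes = []
    for params in items:
        try:
            outcomes.append(("success", activity(params)))
        except Exception as exc:
            outcomes.append(("error", on_error(params, exc)))
    return outcomes


def copy_table(params):
    # Hypothetical activity parameterized by arbitrary params.
    if params["table"] == "missing":
        raise LookupError("no such table")
    return f"copied {params['table']}"


def log_failure(params, exc):
    # Hypothetical "On Error" activity.
    return f"{params['table']}: {exc}"


outcomes = for_each(
    [{"table": "dim_customer"}, {"table": "missing"}],
    copy_table,
    log_failure,
)
print(outcomes)
```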

Event Driven Flow
Scenarios:
- Execute flow when event (file arrival) occurs. Major Manufacturing company: run parts workflow when OEM Excel file arrives
Data Characteristics:
- Flow is parameterized by the data item as well as the location (folder, filename, etc.)
- Often requires low dispatch latency among activities
- Mix of data manipulation steps (run Hive) and resource mgmt. steps (file clean up, archive, etc.)
[Diagram: Trigger (file arrived in blob storage) passes params to Activity 1; “On success, params” continues through to Activity N; “On error, params” leads to a separate Activity.]
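
A file-arrival trigger that parameterizes the flow by folder and filename, then chains activities on success and diverts on error, can be sketched as follows. The handler and activity names and the blob paths are assumptions for illustration; a real trigger would be raised by the storage service rather than called directly.

```python
import posixpath


def on_file_arrived(blob_path, activities, on_error):
    """Run the activities in order, passing params from one to the next."""
    folder, filename = posixpath.split(blob_path)
    params = {"folder": folder, "filename": filename}
    try:
        for activity in activities:
            params = activity(params)
    except Exception as exc:
        return on_error(params, exc)
    return params


def run_parts_workflow(params):
    # Hypothetical data manipulation step.
    if not params["filename"].endswith(".xlsx"):
        raise ValueError("unexpected file type")
    return dict(params, processed=True)


def archive(params):
    # Hypothetical resource management step.
    return dict(params, archived_to=posixpath.join("archive", params["filename"]))


def report_error(params, exc):
    return dict(params, error=str(exc))


ok = on_file_arrived("incoming/oem/parts.xlsx", [run_parts_workflow, archive], report_error)
bad = on_file_arrived("incoming/oem/readme.txt", [run_parts_workflow, archive], report_error)
print(ok)
print(bad)
```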

Delta Flow (“Change Data Capture”)
Scenarios:
- Ingestion optimization: ingest & process only changed items from a source. Major Manufacturing Company: Parts List ingestion
Data Characteristics:
- Delta may or may not be representable as a query over the source data set
- State (watermark) may need to be kept by the workflow between two delta copy executions
- Efficient delta calculation from relational systems is an art form today (e.g. see Attunity Replicate)
[Diagram: Trigger Conditions (Event, Wall Clock) pass params to a Delta Copy Activity. Outgoing edges: “If changes exist, changeset, params” and “If no changes”, leading to an Activity.]
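
When the delta can be expressed as a query over the source, a watermark kept between runs is enough to copy only changed rows. The sketch below assumes each row carries a monotonically increasing `modified` value, which is an assumption of this example and not something the slide specifies.

```python
def delta_copy(source_rows, watermark):
    """Return rows changed since the watermark, plus the new watermark."""
    changes = [row for row in source_rows if row["modified"] > watermark]
    new_watermark = max((row["modified"] for row in changes), default=watermark)
    return changes, new_watermark


parts = [{"id": "A", "modified": 1}, {"id": "B", "modified": 2}]

# First run: everything is newer than the initial watermark.
first, watermark = delta_copy(parts, watermark=0)

# A new row appears in the source; only that row is picked up.
parts.append({"id": "C", "modified": 3})
second, watermark = delta_copy(parts, watermark)

# Nothing changed since the last run, so the change set is empty.
third, watermark = delta_copy(parts, watermark)
print(len(first), len(second), len(third), watermark)
```

A query like this cannot see deleted rows, which is one reason a delta is not always representable as a query over the source.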

Data Ingest: Referential Integrity
Scenarios:
- Move all or part of a dimensional model (w/ ref integrity constraints on). Major Car Manufacturer: (copy dim/facts) to blob for joining with engine log data
Data Characteristics:
- Movement needs to be automatically ordered to honor constraints
- Many objects to copy at once (tedious to author a pipeline for each)
- May require schema/object migration (create table on dest side w/ existing views)
[Diagram: Trigger Conditions (Event, Wall clock) pass params to Copy …]
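
Ordering the copy so that referenced tables land before the tables that reference them is a topological sort over the foreign-key graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the table names and foreign keys are hypothetical.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it references (its predecessors).
foreign_keys = {
    "fact_sales": {"dim_customer", "dim_product"},
    "dim_product": {"dim_category"},
    "dim_customer": set(),
    "dim_category": set(),
}

# static_order() yields every table after all of the tables it depends on.
copy_order = list(TopologicalSorter(foreign_keys).static_order())
print(copy_order)
```

Generating the order from metadata also avoids hand-authoring one pipeline per table. A cycle in the constraints raises `graphlib.CycleError`, which would need separate handling.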

Data Migration
Move from storage system A to system B, decommission system A
Scenarios:
- Move from Oracle to SQL Server
- Move from Redshift to SQL DW
Data Characteristics:
Often a 6-step process:
1. Compatibility check: can all the items in the source be represented in the target?
2. Move objects: tables, views, triggers, configurations, etc.
3. One-time bulk move of data from source to destination
4. Run a delta flow from source to destination so the destination keeps up to date with the source
5. Move apps, flows, etc. to target the destination system & stop the delta flows in lock step
6. Decommission the source storage system
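
The first step, the compatibility check, can be sketched as a lookup of every source column type in a source-to-target type map. The map below is illustrative only and is neither complete nor a vetted Oracle to SQL Server mapping; the schema and function name are likewise hypothetical.

```python
# Illustrative type map only: not a complete or authoritative mapping.
TYPE_MAP = {"NUMBER": "NUMERIC", "VARCHAR2": "VARCHAR", "DATE": "DATETIME2"}


def compatibility_check(source_schema, type_map):
    """List (table, column, type) entries with no representation in the target."""
    unsupported = []
    for table, columns in source_schema.items():
        for column, source_type in columns.items():
            if source_type not in type_map:
                unsupported.append((table, column, source_type))
    return unsupported


source_schema = {
    "parts": {"id": "NUMBER", "name": "VARCHAR2", "drawing": "BFILE"},
    "orders": {"id": "NUMBER", "placed_on": "DATE"},
}

blockers = compatibility_check(source_schema, TYPE_MAP)
print(blockers)
```

An empty result would mean the later steps can proceed; any entry returned is an item that needs a decision before objects are moved.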

How do I get started?
- microsoft.com/cortanaanalytics
- Find or become a trained partner: cortanaanalyticspartners.microsoft.com
- Try CA services today: Microsoft.com/cortanaanalytics (links on “What’s Included” page)
- Stay tuned to our blog: http://blogs.technet.com/b/machinelearning/

Questions
