1
Microsoft Machine Learning & Data Science Summit
September 26 – 27 | Atlanta, GA
2
Patterns & Practices for Data Integration at Scale
Anand Subbaraj, Principal Program Manager
Gaurav Malhotra, Senior Program Manager
3
Today’s topics
  Traditional Analytics Platforms and Evolving Approaches to Analytics
  Data Integration Architectural Patterns (Stream & Batch)
  Data Integration Use Cases
4
Evolving Approaches to Analytics
[Diagram: traditional ETL. Labels: Original Data from OLTP, ERP, LOB, …; ETL Tool (SSIS, etc.) performing Extract, Transform, Load; Transformed Data; EDW (SQL Svr, Teradata, etc.); Data Marts; Data Lake(s); BI Tools; Dashboards; Apps]
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
5
Evolving Approaches to Analytics
[Diagram: the ETL picture extended with new sources. Labels: Original Data from OLTP, ERP, LOB, …; ETL Tool (SSIS, etc.) performing Extract, Transform, Load; Transformed Data; EDW (SQL Svr, Teradata, etc.); Data Marts; Data Lake(s); BI Tools; Dashboards; Apps. Added: Original Data from Social, Devices, Sensors, Web; Ingest (EL)]
6
Evolving Approaches to Analytics
[Diagram: the ETL picture extended with new sources and scale-out storage. Labels: Original Data from OLTP, ERP, LOB, …; ETL Tool (SSIS, etc.) performing Extract, Transform, Load; Transformed Data; EDW (SQL Svr, Teradata, etc.); Data Marts; Data Lake(s); BI Tools; Dashboards; Apps. Added: Original Data from Social, Devices, Sensors, Web; Streaming data; Ingest (EL); Scale-out Storage & Compute (DataLake Store, HDFS, Blob Storage, etc.); Transform & Load]
7
Cortana Intelligence Suite Transform data into intelligent action
DATA -> INTELLIGENCE -> ACTION
Data
  Business apps
  Custom apps
  Sensors and devices
Intelligence
  Information Management: Azure Data Factory, Data Catalog, Event Hub
  Big Data Stores: Azure Data Lake, SQL Data Warehouse
  Machine Learning and Analytics: Azure Machine Learning, Azure Data Lake Analytics, HDInsight (Hadoop), Stream Analytics
  Dashboards and Visualizations: Power BI
  Personal Digital Assistant: Cortana
  Perceptual Intelligence: Face, vision; Speech, text
  Business Scenarios: Recommendations, customer churn, forecasting, etc.
Action
  People
  Automated Systems
8
Data Integration Architectural Patterns
9
Batch
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
  On-Prem Storage (Ex: SQL Server, Oracle, DB2, MySQL)
  SaaS Data Sources (Ex: SQL Server, Oracle, DB2, MySQL)
  Batch Data Movement (Ex: SSIS, Azure Data Factory, Sqoop)
  Cloud Storage (Ex: Azure Data Lake, Azure SQL DB, Azure SQL DW, HDFS, Cassandra, HBase)
  Transform, Combine, Clean, etc. (Ex: Custom Code, Azure Data Lake Analytics, Hadoop)
  Data Aggregation, Data Science, etc. (Ex: Azure Data Lake Analytics, Hadoop, Azure Machine Learning, Mahout)
  Presentation, Dashboarding
  Orchestration, Movement, Scheduling, Monitoring (Ex: Azure Data Factory, Oozie)
10
Managing Customer Churn is Important
Retail
  Returning customer
  Designing loyalty programs
  Long checkout time
  Dissatisfied with customer service, bad experience in retail store
Telecommunication
  Mobile Number Portability: customers switching to another mobile service provider, with number retention
  Handset/device choices
  Friends switching to another provider
Banking and Finance
  Service quality
  Attractiveness of banking rates
11
Customer Churn for Telco - Solution Pattern
Legend
  Data Set: collection of files, DB table, etc.
  Pipeline: a logical group of activities
  Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Stages: Data Sources, Ingest, Transform & Analyze, Publish
[Diagram: Call Log Files and Customer Table in the On Premises Data Mart are moved into Azure Data Lake (Call Log Files, Customer Table). A Transform, Combine, etc. activity produces Customer Call Details; an Analyze activity produces Customers Likely to Churn; a Move activity loads the Customer Churn Table in Azure SQL DW for Visualize. Other label: Cortana]
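The data set / activity / pipeline vocabulary in the solution pattern above can be sketched in a few lines of Python. This is a minimal illustration and not the Azure Data Factory API: the `Activity` and `Pipeline` classes, their `run` methods, and the in-memory `store` dictionary are hypothetical, and the rule in `score_churn` (more than two dropped calls) is only a placeholder for a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A data set is modeled as a named list of records in an in-memory store.
Store = Dict[str, List[dict]]


@dataclass
class Activity:
    """A processing step: reads input data sets, writes one output data set."""
    name: str
    inputs: List[str]
    output: str
    fn: Callable[..., List[dict]]

    def run(self, store: Store) -> None:
        store[self.output] = self.fn(*(store[i] for i in self.inputs))


@dataclass
class Pipeline:
    """A logical group of activities, run here in declaration order."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def run(self, store: Store) -> None:
        for activity in self.activities:
            activity.run(store)


def join_calls(calls: List[dict], customers: List[dict]) -> List[dict]:
    """Transform and combine: count dropped calls per customer."""
    dropped: Dict[int, int] = {}
    for call in calls:
        if call["dropped"]:
            dropped[call["customer_id"]] = dropped.get(call["customer_id"], 0) + 1
    return [{**c, "dropped_calls": dropped.get(c["id"], 0)} for c in customers]


def score_churn(details: List[dict]) -> List[dict]:
    """Analyze: placeholder rule standing in for a trained churn model."""
    return [d for d in details if d["dropped_calls"] > 2]


store: Store = {
    "call_log_files": [{"customer_id": 1, "dropped": True}] * 3
    + [{"customer_id": 2, "dropped": False}],
    "customer_table": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}],
}

churn_pipeline = Pipeline(
    "telco_churn",
    [
        Activity("combine", ["call_log_files", "customer_table"],
                 "customer_call_details", join_calls),
        Activity("analyze", ["customer_call_details"],
                 "customers_likely_to_churn", score_churn),
    ],
)
churn_pipeline.run(store)
print(store["customers_likely_to_churn"])
```

Keeping data sets as named entries in a store, rather than passing values directly between functions, mirrors the slide's separation of data sets from the activities that produce them.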
12
Demo: Customer Churn
Anand Subbaraj, Gaurav Malhotra
13
The need for trusted information
Use case (industry)
  Marketing campaign analysis (Interactive Entertainment)
  User and product profiling (Interactive Entertainment/Retail)
  Customer sentiment analysis (Interactive Entertainment/Retail)
  Personalized product recommendation (Retail)
  Customer shopping behavior analysis (Retail)
  Pricing optimization (Retail)
  Corrective and predictive maintenance and repairs (Manufacturing, IOT)
  Operational telemetry and health reporting (Online Services)
  Actuarial modelling and reporting automation (Financial Services)
  Financial risk modelling and analysis (Financial Services)
14
Real Time Streaming Architecture
Stages: Event producers, Collection, Ingestor (broker), Transformation, Presentation and action
Event producers
  Applications
  Legacy IOT (custom protocols)
  Devices: IP-capable devices (Windows/Linux), Low-power devices (RTOS)
Collection
  Cloud gateways (web APIs)
  Field gateways
Ingestor (broker)
  Event hubs
Transformation
  Stream processing (Stream Analytics)
  Storage adapters
Presentation and action
  Search and query
  Data analytics (Excel)
  Web/thick client dashboards (PowerBI)
  Real Time Visualization
  Devices to take action
15
Real Time Analytics Use Cases
  Real-time fraud detection
  Connected car scenario
  Click-stream analysis
  Real-time financial portfolio alerts
  Smart grid
  CRM alerting sales with customer scenario
  Data and identity protection services
  Real-time financial sales tracking
16
Lambda Architecture
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
[Diagram labels]
  Data sources: Social, Devices, Sensors, Web, Diagnostic streaming
  Ingest: Event hubs
  Data in motion: Stream Analytics, Machine Learning
  Data at rest: Azure DataLake, HDInsight, Machine Learning, Azure SQL Data Warehouse, Vehicle Catalog
  Consume: PowerBI, Cortana, Business apps
  Data Factory: Move Data, Orchestrate, Schedule, and Monitor
  Data Catalog: Register, Annotate, Understand, Discover Data Sets
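The slide distinguishes "data in motion" from "data at rest". In the lambda architecture as it is generally described, a batch view computed over data at rest is merged at query time with a speed view computed over recent events. The sketch below illustrates that merge only; the event shape, the `cutoff` convention, and all function names are assumptions made for the example, not part of the deck or of any Azure service.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp, device_id); an illustrative shape


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Data at rest: recomputed periodically over all events before the cutoff."""
    return Counter(device for ts, device in events if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Data in motion: counts for events the batch view has not seen yet."""
    return Counter(device for ts, device in events if ts >= cutoff)


def query(device: str, batch: Counter, speed: Counter) -> int:
    """Serving: merge both views so the answer is complete and current."""
    return batch[device] + speed[device]


events: List[Event] = [(1, "car-1"), (2, "car-2"), (5, "car-1"), (9, "car-1")]
cutoff = 5  # the last batch run covered timestamps below 5
batch, speed = batch_view(events, cutoff), speed_view(events, cutoff)
print(query("car-1", batch, speed))
```

Because the two views partition the events at `cutoff`, no event is counted twice when the views are summed.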
17
Connected car market
The connected car market is growing
  45% compound annual growth rate over 5 years
  10x faster than overall car market
  75% of cars shipped globally by 2020 will have necessary hardware to connect to the Internet
Connected car technology is split between two approaches
  Put the Internet connection in the car (embedded connections)
    Does not require a phone data plan to operate
    Provides access to more features and data
  Rely on a secondary device
Embedded connections win, because auto companies will be able to
  Collect data on the performance of cars
  Send updates and patches to cars remotely
  Avoid recalls related to the car's software
18
Connected car market
  Vehicle diagnostic
  Usage-based insurance
  Roadside assistance
  Fleet management
  Engine emission control
  Eco-driving
  Engine performance remapping
75% of the cars shipped globally by 2020 will be built with the necessary hardware to connect to the Internet
19
Demo: Connected Car
Anand Subbaraj, Gaurav Malhotra
20
Lambda Architecture
Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME
[Diagram labels]
  Data sources: Social, Devices, Sensors, Web, Diagnostic streaming
  Ingest: Event hubs
  Data in motion: Stream Analytics, Machine Learning
  Data at rest: Azure DataLake/Azure Blob Storage, HDInsight, Machine Learning, Azure SQL Data Warehouse, Device Catalog
  Consume: PowerBI, Cortana, Business apps
  Data Factory: Move Data, Orchestrate, Schedule, and Monitor
  Data Catalog: Register, Annotate, Understand, Discover Data Sets
21
Solutions:
https://gallery.cortanaintelligence.com/solutionTemplates
22
Batch Pattern: Data Integration Scenarios
23
Time-Series Tumbling Window
Scenarios:
  Log Processing
    Major Telco: update fact table from last hour's call logs
  Telemetry/Operational Health Systems
    Microsoft Intune: service health monitoring/reporting
  Customer Behavior/Profiling
    Major EU Retailer: process daily weblogs & update product recommendation table
Data Characteristics:
  Sliceable by time (in the data or data location), restatement by slice (including downstream)
  Modelled as an infinitely appending data set
  Process a slice or batch of slices; often not required to process the entire time-series
  Data flows from one activity to the next (aka “data flow”); data readiness drives orchestration
[Diagram: an Activity reads Daily Data Set 1 and writes Daily Data Set 2, one run per daily slice. Run 1 (Mon): Success; Run 2 (Tues): Success; Run 3 (Wed): Failed; …]
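The slice-by-slice behaviour described above (process one window at a time, restate only the slices that need it) can be sketched as follows. This is an illustration under stated assumptions rather than any product's scheduler: `tumbling_windows`, `run_slices`, the `status` dictionary, and the `not_ready` set that simulates a late input are all hypothetical names.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Set, Tuple

Slice = Tuple[date, date]  # half-open window [start, end)


def tumbling_windows(start: date, end: date, size: timedelta) -> Iterator[Slice]:
    """Yield fixed-size, non-overlapping windows covering [start, end)."""
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + size, end)
        cursor += size


def run_slices(slices: List[Slice], activity: Callable[[Slice], None],
               status: Dict[Slice, str]) -> None:
    """Run the activity once for every slice that has not already succeeded."""
    for window in slices:
        if status.get(window) == "Success":
            continue  # restatement touches only the slices that need it
        try:
            activity(window)
            status[window] = "Success"
        except Exception:
            status[window] = "Failed"


not_ready: Set[date] = {date(2016, 9, 28)}  # pretend Wednesday's input is late
processed: List[Slice] = []


def process_daily_logs(window: Slice) -> None:
    """Stand-in activity: fails while the slice's input has not landed."""
    if window[0] in not_ready:
        raise RuntimeError("input slice not ready")
    processed.append(window)


# 2016-09-26 is a Monday, so this yields the Mon, Tue, and Wed slices.
days = list(tumbling_windows(date(2016, 9, 26), date(2016, 9, 29), timedelta(days=1)))
status: Dict[Slice, str] = {}
run_slices(days, process_daily_logs, status)  # Mon and Tue succeed, Wed fails
first_pass = dict(status)
not_ready.clear()                             # Wednesday's data arrives
run_slices(days, process_daily_logs, status)  # only the failed Wed slice re-runs
```

Keying the status by slice is what lets the second call skip Monday and Tuesday and re-run only the slice that failed.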
24
Custom Flow aka. “Control Flow”
Scenarios:
  Dynamic & conditional workflows (e.g. for each row in table X do Y)
    E.g. Big Mattress Company dynamic ingest scenario
  Mix resource mgmt. w/ data flow
    E.g. custom archive/cleanup workflow after copy
  Data Replication (w/ ref integrity constraints): copy a set of Dim & Fact tables from SQL to Blob
  The “escape hatch” of data integration platforms like Informatica, Talend, SSIS, etc.
Data Characteristics:
  “Un-opinionated about the data model”; model data flows parameterized by arbitrary params
  Precedence constraints driven by activity execution, not data
  Often used to model “delta flows” (change set moves between activities)
[Diagram: Trigger Conditions (Event, Wall Clock, Data Time) start My Pipeline 1, in which Activity 1 through Activity 4 are chained by "Success, params" edges and an "Error, params" edge leads to an “On Error” Activity; a For Each… step invokes My Pipeline 2 (Activity 1, …)]
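Precedence driven by activity execution, with parameters passed along success and error edges, might look like the sketch below. The `Step` class, `run_flow`, and the three stand-in activities are hypothetical and exist only to illustrate the "for each" and "on error" ideas from the slide; they do not correspond to any specific platform's API.

```python
from typing import Callable, Dict, List, Optional


class Step:
    """An activity plus the steps to run on success or on error."""

    def __init__(self, name: str, fn: Callable[[Dict], Dict],
                 on_success: Optional["Step"] = None,
                 on_error: Optional["Step"] = None) -> None:
        self.name, self.fn = name, fn
        self.on_success, self.on_error = on_success, on_error


def run_flow(first: Step, params: Dict, trace: List[str]) -> Dict:
    """Follow success/error edges; each step's output params feed the next."""
    step: Optional[Step] = first
    while step is not None:
        try:
            params = step.fn(params)
            trace.append(f"{step.name}:ok")
            step = step.on_success
        except Exception as exc:
            trace.append(f"{step.name}:error")
            params = {**params, "error": str(exc)}
            step = step.on_error
    return params


def copy_table(p: Dict) -> Dict:
    if p["table"] == "bad_table":
        raise ValueError("copy failed")
    return {**p, "copied": True}


def archive(p: Dict) -> Dict:
    return {**p, "archived": True}


def cleanup(p: Dict) -> Dict:
    return {**p, "cleaned": True}


flow = Step("copy", copy_table,
            on_success=Step("archive", archive),
            on_error=Step("cleanup", cleanup))

trace: List[str] = []
# "For each row in table X do Y": run the same flow once per table name.
results = [run_flow(flow, {"table": t}, trace) for t in ["dim_customer", "bad_table"]]
print(trace)
```

Note that nothing here inspects the data itself: which step runs next depends only on whether the previous step raised, which is the distinction the slide draws against data-driven orchestration.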
25
Event Driven Flow
Scenarios:
  Execute flow when event (file arrival) occurs
    Major Manufacturing company: run parts workflow when OEM Excel file arrives
Data Characteristics:
  Flow is parameterized by the data item as well as the location (folder, filename, etc.)
  Often requires low dispatch latency among activities
  Mix of data manipulation steps (run Hive) and resource mgmt. steps (file clean up, archive, etc.)
[Diagram: Trigger (file arrived in blob storage) passes params to Activity 1; "On success, params" edges lead on through Activity N; an "On error, params" edge leads to an error-handling Activity]
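A flow parameterized by the arriving item and its location could be sketched like this. `FileArrivalTrigger` and its `notify` method are invented for illustration; a real system would receive the notification from its storage service, whereas here the caller passes the blob path in directly.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Tuple

Handler = Callable[[Dict[str, str]], None]


class FileArrivalTrigger:
    """Dispatch a flow when a file with a given suffix arrives (hypothetical)."""

    def __init__(self, suffix: str, flow: List[Handler], on_error: Handler) -> None:
        self.suffix, self.flow, self.on_error = suffix, flow, on_error

    def notify(self, blob_path: str) -> bool:
        path = PurePosixPath(blob_path)
        if path.suffix != self.suffix:
            return False  # not an event this trigger cares about
        # The flow is parameterized by the item and its location.
        params = {"folder": str(path.parent), "filename": path.name}
        try:
            for activity in self.flow:
                activity(params)
        except Exception as exc:
            self.on_error({**params, "error": str(exc)})
        return True


log: List[Tuple[str, str]] = []
trigger = FileArrivalTrigger(
    ".xlsx",
    flow=[
        lambda p: log.append(("process", p["filename"])),  # data manipulation step
        lambda p: log.append(("archive", p["folder"])),    # resource mgmt. step
    ],
    on_error=lambda p: log.append(("error", p["error"])),
)

handled = trigger.notify("inbound/oem/parts_2016.xlsx")
ignored = trigger.notify("inbound/oem/readme.txt")
print(log)
```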
26
Delta Flow “Change Data Capture”
Scenarios:
  Ingestion optimization: ingest & process only changed items from a source
    Major Manufacturing Company: Parts List ingestion
Data Characteristics:
  Delta may or may not be able to be represented as a query over the source data set
  State (watermark) may need to be kept by the workflow between two delta copy executions
  Efficient delta calculation from relational systems is an art form today (e.g. see Attunity Replicate)
[Diagram: Trigger Conditions (Event, Wall Clock) pass params to a Delta Copy Activity; "If changes exist" the changeset and params go to the next Activity; "If no changes" a separate branch is taken]
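The watermark idea can be shown with a small sketch. It assumes the favourable case the slide mentions, where the delta can be expressed as a query over the source: every row carries a monotonically increasing `modified` value. That column, the `delta_copy` function, and the sample rows are assumptions for the example, and the approach as written does not detect deleted rows.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def delta_copy(source: List[Row],
               watermark: Optional[int]) -> Tuple[List[Row], Optional[int]]:
    """Return rows modified after the watermark, plus the advanced watermark."""
    changes = [r for r in source if watermark is None or r["modified"] > watermark]
    # Keep the old watermark when nothing changed.
    new_watermark = max((r["modified"] for r in changes), default=watermark)
    return changes, new_watermark


parts = [{"part": "bolt", "modified": 1}, {"part": "nut", "modified": 2}]

first, wm = delta_copy(parts, None)   # initial load: every row is a change
second, wm = delta_copy(parts, wm)    # nothing changed: empty changeset
parts[0] = {"part": "bolt", "modified": 3}
third, wm = delta_copy(parts, wm)     # only the updated row is returned

for changeset in (first, second, third):
    if changeset:
        print("process", [r["part"] for r in changeset])
    else:
        print("no changes, skip downstream activity")
```

The workflow, not the source system, holds `wm` between executions, which is the state the slide says may need to be kept.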
27
Data Ingest: Referential Integrity
Scenarios:
  Move all or part of a dimensional model (w/ ref integrity constraints on)
    Major Car Manufacturer: copy dim/facts to blob for joining with engine log data
Data Characteristics:
  Movement needs to be automatically ordered to honor constraints
  Many objects to copy at once (tedious to author a pipeline for each)
  May require schema/object migration (create table on dest side w/ existing views)
[Diagram: Trigger Conditions (Event, Wall clock) pass params to a series of Copy activities …]
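One way to order movement automatically so that constraints are honored is a topological sort over the foreign-key graph: a table is copied only after every table it references. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the table names and the `foreign_keys` mapping are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the set of tables it references.
foreign_keys = {
    "fact_sales": {"dim_customer", "dim_product"},
    "dim_customer": {"dim_geography"},
    "dim_product": set(),
    "dim_geography": set(),
}

# static_order() yields referenced tables before the tables that depend on
# them, and raises graphlib.CycleError if the constraints form a cycle.
copy_order = list(TopologicalSorter(foreign_keys).static_order())

for table in copy_order:
    print("copy", table)
```

The relative order of tables that do not depend on each other is not significant; only the parent-before-child ordering matters for the constraints.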
28
Data Migration
Move from storage system A to system B, decommission system A
Scenarios:
  Move from Oracle to SQL Server
  Move from Redshift to SQL DW
Data Characteristics:
  Often a 6-step process:
    1. Compatibility check: can all the items in the source be represented in the target?
    2. Move objects: tables, views, triggers, configurations, etc.
    3. One-time bulk move of data from source to destination
    4. Run a delta flow from source to destination so the destination stays up to date with the source
    5. Move apps, flows, etc. to target the destination system & stop the delta flows in lock step
    6. Decommission the source storage system
29
How do I get started?
  Try CA services today: microsoft.com/cortanaanalytics (links on the “What’s Included” page)
  Find or become a trained partner: cortanaanalyticspartners.microsoft.com
  Stay tuned to our blog
30
Questions