Microsoft Ignite NZ 25-28 October 2016 SKYCITY, Auckland 9/17/2018 10:57 PM © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Solving the “big legacy data” problem
Microsoft Ignite 2016
M343
Shweta Gupta
Objective: understanding the big legacy data problem, its impact, and some ways to resolve it.
Identifying the big legacy data problem
What is big data?
Evolution of big data: the traditional data warehouse
Data warehouses as they exist today in most enterprises and product companies:
Data sources (OLTP, ERP, CRM, LOB) → ETL → Data warehouse → BI and analytics (dashboards, reporting, …)
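A minimal Python sketch of this ETL flow, for illustration only: rows are extracted from two hypothetical source systems, transformed into one fixed warehouse shape, and loaded into an in-memory SQLite table that stands in for the data warehouse. The source data, field names, and the `fact_sales` table are invented.

```python
import sqlite3

# Hypothetical source systems: each exposes rows in its own shape.
oltp_orders = [{"order_id": 1, "cust": "C1", "amount_cents": 1250}]
crm_customers = [{"customer_code": "C1", "full_name": "Contoso Ltd"}]

def extract():
    """Extract: read raw rows from each source system."""
    return oltp_orders, crm_customers

def transform(orders, customers):
    """Transform: join and reshape rows into the fixed warehouse schema."""
    names = {c["customer_code"]: c["full_name"] for c in customers}
    return [
        (o["order_id"], names.get(o["cust"], "unknown"), o["amount_cents"] / 100)
        for o in orders
    ]

def load(conn, rows):
    """Load: write the conformed rows into the warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
print(conn.execute("SELECT * FROM fact_sales").fetchall())  # [(1, 'Contoso Ltd', 12.5)]
```

The warehouse schema is fixed before any data arrives (schema-on-write), so every new source needs its own transform logic.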
What has changed?
1. Increasing data volumes
2. Real-time data
3. New data sources and types: non-relational data, devices, web, sensors, social
4. Cloud-born data
All of these arrive at the existing pipeline: data sources (OLTP, ERP, CRM, LOB) → ETL → data warehouse → BI and analytics (dashboards, reporting).
Big data
Original data (OLTP, ERP, LOB, …) → ETL tool (SSIS, etc.): extract, transform, load → transformed data in the EDW (SQL Server, Teradata, etc.) → data marts → BI tools, dashboards, and apps. Data lake(s) sit alongside.
Big data: new sources
The same pipeline, plus original data from social, devices, sensors, and web sources, which is ingested (EL: extract and load) rather than passed through the ETL tool.
Big data problem
Original data from social, devices, sensors, and web sources, including streaming data, is ingested (EL) into scale-out storage and compute (HDFS, Blob Storage, etc.) and then transformed and loaded. This runs alongside the traditional ETL path from OLTP, ERP, and LOB systems into the EDW (SQL Server, Teradata, etc.), data marts, data lake(s), BI tools, dashboards, and apps.
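By contrast with ETL, the ingest (EL) path lands original data untouched and applies structure only when it is read (schema-on-read). In this sketch a Python list stands in for HDFS or Blob Storage; the payloads and the `read_temperatures` query are hypothetical.

```python
import json

# Stand-in for scale-out storage: raw payloads are stored exactly as received.
lake = []

def ingest(raw_payload: str) -> None:
    """Extract and load only: no parsing or validation at write time."""
    lake.append(raw_payload)

def read_temperatures():
    """Transform on read: apply a schema only for this particular query."""
    for raw in lake:
        record = json.loads(raw)
        # Different device generations name the same field differently.
        value = record.get("temp_c", record.get("temperature"))
        if value is not None:
            yield record["device"], float(value)

ingest('{"device": "s1", "temp_c": 21.5}')
ingest('{"device": "s2", "temperature": "19"}')
ingest('{"device": "s3", "status": "ok"}')
print(list(read_temperatures()))  # [('s1', 21.5), ('s2', 19.0)]
```

Nothing is lost at ingest time: the record without a temperature is still in the lake for some other query to use.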
Big legacy data problem
The same big data pipeline, but with several sets of OLTP, ERP, and LOB systems, each feeding the EDW (SQL Server, Teradata, etc.) through its own ETL tool. Alongside them, social, device, sensor, and web data (including streaming data) is ingested (EL) into scale-out storage and compute (HDFS, Blob Storage, etc.) and then transformed and loaded. Downstream: data marts, data lake(s), BI tools, dashboards, and apps.
Big legacy data problem
Your data ends up being repetitive (with different interpretations), inconsistent, incomplete, and in multiple formats.
Data drift
Data drift is the change in data, schemas, and software over time that introduces inconsistency, inaccuracy, and incompleteness into the data. It takes three forms:
Structural: schema changes, such as the addition of new fields or the deletion of older ones.
Semantic: the same data has a different meaning in different versions or systems, for example account numbers.
Infrastructure: different underlying software systems represent the same data in multiple formats and structures, with different update frequencies.
Source: http://www.cmswire.com/big-data/big-datas-hidden-scourge-data-drift/
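Structural drift, at least, can be detected mechanically. A minimal sketch, assuming each record arrives as a flat dictionary: compare the field set of a new batch against a baseline and report added and removed fields. The field names are hypothetical. Semantic drift cannot be caught this way; in the example the account number changes format and passes unnoticed.

```python
def structural_drift(baseline: list[dict], batch: list[dict]) -> dict:
    """Report fields added to or removed from a batch relative to a baseline."""
    old_fields = set().union(*(r.keys() for r in baseline))
    new_fields = set().union(*(r.keys() for r in batch))
    return {
        "added": sorted(new_fields - old_fields),
        "removed": sorted(old_fields - new_fields),
    }

v1 = [{"account_no": "123", "name": "A", "fax": "555"}]
# "account_no" changes format here (semantic drift), which this check does not see.
v2 = [{"account_no": "00123", "name": "A", "email": "a@example.com"}]
print(structural_drift(v1, v2))  # {'added': ['email'], 'removed': ['fax']}
```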
Legacy/stringent ETL systems
Each system relies heavily on its input and output data schemas, so each time a new data repository or system was added, it dealt with data in its own way, resulting in:
Varied inputs: each system could have its own input, such as streams, files, or an RDBMS (Oracle, SQL, MySQL).
Varied outputs: output into different systems and data repositories.
Varied structures: CSV, XML, binary, JSON, unstructured sets.
Varied formats: different field/key/tag names for the same data.
Source: http://www.cmswire.com/big-data/big-datas-hidden-scourge-data-drift/
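One way to cope with varied structures and varied formats is to read each structure into a common dictionary and then map the different field names onto one canonical name. A minimal sketch using only the Python standard library; the alias table, the XML layout, and the sample payloads are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical alias table: different systems use different names for the same data.
ALIASES = {"acct": "account_id", "AccountNo": "account_id", "account": "account_id"}

def from_csv(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))

def from_json(text: str) -> list[dict]:
    return json.loads(text)

def from_xml(text: str) -> list[dict]:
    # Assumes one child element per record, with fields stored as attributes.
    return [dict(el.attrib) for el in ET.fromstring(text)]

def canonical(record: dict) -> dict:
    """Rename known aliases to the canonical field name."""
    return {ALIASES.get(k, k): v for k, v in record.items()}

records = (
    from_csv("acct,balance\n42,10\n")
    + from_json('[{"AccountNo": "43", "balance": "20"}]')
    + from_xml('<rows><row account="44" balance="30"/></rows>')
)
print([canonical(r) for r in records])
```

Each new structure still needs a reader, but the readers are generic; only the alias table grows when a system uses yet another name for the same field.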
Addressing the problem
Embracing cloud
It brings scale, security, and cost optimizations.
There is no one right path!
No single path fits every case. Every scenario is different and needs to be addressed differently. What you can do is identify the issues and fix them with the right tools.
Azure Data Factory
ADF: Azure Data Factory
Data sources and sinks (not exhaustive)
Copy Activity
From/to cloud
From/to on-premises
Transformation
Not limited to these… we will see how.
ADF: Hybrid support
StreamSets
StreamSets
Source: http://streamsets.com/wp-content/uploads/2016/10/StreamSets-Data-Collector-Product-Brief.pdf
StreamSets: data sources
StreamSets: data processing
Case study
Case study: medical equipment manufacturer
Each equipment type had its own division, with its own formats.
Each piece of equipment outputs monitoring data: some common parameters and some equipment-specific ones, in XML, CSV, and tab-separated files.
The tag names, headers, etc. all differed from one another.
They have 350 such different file types.
Monitoring & servicing system
1. Each facility has a gateway server that each medical instrument sends data to.
2. The gateway server collates the data files and zips them into one file every 10 minutes.
3. Files are uploaded using secure FTP.
4. A Java scheduler monitors for files and inserts each file's metadata into a queue for processing.
5. A Java process picks up the message and passes the file to the specific parser (350 parsers in total).
6. The file parsers store the data in an Oracle SQL database.
7. Triggers check for anomalies/errors in the data.
8. A schedule processor checks for any alerts.
9. Their service request system API is invoked to create service tickets.
10. Service personnel attend to the instrument.
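The heart of this legacy design is a dispatch from file type to a dedicated parser. The hypothetical Python sketch below mirrors steps 4 to 6 (the original system is Java); the file types, the queue message shape, and both parser layouts are invented. It shows why every new instrument type means writing and deploying another parser before its data can be stored.

```python
from queue import Queue

def parse_type_a(payload: str) -> dict:
    # Hard-codes one layout: "key=value" pairs separated by ';'.
    return dict(pair.split("=", 1) for pair in payload.split(";"))

def parse_type_b(payload: str) -> dict:
    # Hard-codes another layout: a tab-separated header line, then one value line.
    header, values = payload.splitlines()
    return dict(zip(header.split("\t"), values.split("\t")))

# One entry per file type; the case study's system had 350 of these.
PARSERS = {"type_a": parse_type_a, "type_b": parse_type_b}

def process(queue: Queue) -> list[dict]:
    """Pick up queued file messages and hand each file to its specific parser."""
    parsed = []
    while not queue.empty():
        message = queue.get()
        parser = PARSERS.get(message["file_type"])
        if parser is None:
            # A new instrument type fails here until someone writes its parser.
            raise ValueError(f"no parser for {message['file_type']!r}")
        parsed.append(parser(message["payload"]))
    return parsed

q = Queue()
q.put({"file_type": "type_a", "payload": "temp=37;status=ok"})
q.put({"file_type": "type_b", "payload": "temp\tstatus\n36\tok"})
print(process(q))  # [{'temp': '37', 'status': 'ok'}, {'temp': '36', 'status': 'ok'}]
```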
Challenges
Cannot scale: the system currently monitors 10,000 devices, and they want to offer this service to more than 100,000 devices.
Adding an instrument type takes 3-6 months: each device type has its own file format and structure, so each addition needs new parsers and triggers to be built.
Expensive: the infrastructure cost of running the systems is enormous.
Case Study Solution
Pain points
The first problem… multiple file parsers
Transform, not parse
Original XML (excerpt):
<Body>
  <Data Type="Data">
    <SpecimenID NoRead="FALSE" IsShortTubeSpecimen="FALSE">
      <AccessionNumber IDInfo="1512700030" />
      <CarrierID IDInfo="00021" NoReadFlagInfo="FALSE" />
    </SpecimenID>
    <RequestType value="Patient" />
    <SpecimenType value="WholeBlood" />
    <ActivityType IDInfo="Activity.CND" />
    <FormattedResultPreference>
      <UnitSet Name="US-1" />
    </FormattedResultPreference>
    <FormattedResultList>
      <FormattedResult Type="Parameter.AlgorithmBuildLabel" Value="ALGORITHMS_BARRACUDA-INT-V1R0-1598" />
      <FormattedResult Type="Parameter.AlgorithmName" Value="CbcNrbcDiff" />
      …
Transformed to JSON (excerpt):
[
  {
    "TriggerTimeStamp": "5/9/2015 6:17:21 PM",
    "Body": {
      "Data": {
        "Type": "Data",
        "SpecimenID": {
          "NoRead": "FALSE",
          "IsShortTubeSpecimen": "FALSE",
          "AccessionNumber": { "IDInfo": "1512700030", "value1": "" },
          "CarrierID": { "IDInfo": "00021", "NoReadFlagInfo": "FALSE", "value1": "" }
        },
        "RequestType": { "value": "Patient", "value1": "" },
        "SpecimenType": { "value": "WholeBlood", "value1": "" },
        "ActivityType": { "IDInfo": "Activity.CND", "value1": "" },
        "FormattedResultPreference": "",
        "FormattedResultList": {
          "FormattedResult": [
            { "Type": "Parameter.AlgorithmBuildLabel", "Value": "ALGORITHMS_BARRACUDA-INT-V1R0-1598", "value1": "" },
            { "Type": "Parameter.AlgorithmName", "Value": "CbcNrbcDiff", "value1": "" },
            …
Transforming to JSON preserves names, hierarchies, and values, and adds metadata, such as the instrument type and ID, from the SOAP message header.
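A minimal sketch of "transform, not parse", written as a generic converter and assuming the conventions visible in the JSON excerpt: attributes become keys, the text of leaf elements is kept under `value1`, repeated child tags become a list, and header metadata is added at the top level. It is not the case study's actual Azure Batch code and does not reproduce every detail of the excerpt; the `InstrumentType` metadata value is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(el: ET.Element) -> dict:
    """Convert any element generically; no per-file-type parser is needed."""
    node = dict(el.attrib)  # attribute names and values are preserved as-is
    children: dict[str, list] = {}
    for child in el:
        children.setdefault(child.tag, []).append(element_to_dict(child))
    if not children:
        # Leaf element: keep its text under "value1", as in the excerpt.
        node["value1"] = (el.text or "").strip()
    for tag, items in children.items():
        # A tag seen once stays an object; a repeated tag becomes a list.
        node[tag] = items[0] if len(items) == 1 else items
    return node

def transform(xml_text: str, metadata: dict) -> str:
    """Wrap the converted document together with header metadata."""
    root = ET.fromstring(xml_text)
    record = dict(metadata)
    record[root.tag] = element_to_dict(root)
    return json.dumps(record, indent=2)

sample = """<Body><Data Type="Data">
  <RequestType value="Patient" />
  <FormattedResultList>
    <FormattedResult Type="Parameter.AlgorithmBuildLabel" Value="ALGORITHMS_BARRACUDA-INT-V1R0-1598" />
    <FormattedResult Type="Parameter.AlgorithmName" Value="CbcNrbcDiff" />
  </FormattedResultList>
</Data></Body>"""
header = {"TriggerTimeStamp": "5/9/2015 6:17:21 PM", "InstrumentType": "example-analyzer"}
print(transform(sample, header))
```

Because the converter never looks at specific tag names, a new file type needs no new code; queries downstream decide which fields matter.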
Solution
Upload files to the gateway server.
A WebJob unzips the files, extracts metadata, and stores them in Azure Blob storage.
An Azure Batch activity transforms the data and stores it as JSON.
Stream Analytics runs queries on the uploaded data and pushes records that need attention to Blob storage.
A data copy activity copies data to the on-premises system.
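The "records that need attention" step is essentially a filter over the transformed JSON. Stream Analytics expresses it as a SQL-like query; the sketch below shows the same idea in plain Python, with a list standing in for the Blob storage container. The `ErrorCode` and `Temperature` fields and the threshold are hypothetical rules, not the case study's.

```python
import json

attention_blob = []  # stand-in for the Blob storage container of flagged records

def needs_attention(record: dict) -> bool:
    # Hypothetical rules: any non-zero error code, or a reading above a limit.
    return record.get("ErrorCode", 0) != 0 or record.get("Temperature", 0) > 40

def run_query(stream):
    """Scan records as they arrive and push the flagged ones to storage."""
    for line in stream:
        record = json.loads(line)
        if needs_attention(record):
            attention_blob.append(record)

run_query([
    '{"InstrumentID": "i-1", "ErrorCode": 0, "Temperature": 37}',
    '{"InstrumentID": "i-2", "ErrorCode": 17, "Temperature": 36}',
    '{"InstrumentID": "i-3", "ErrorCode": 0, "Temperature": 45}',
])
print([r["InstrumentID"] for r in attention_blob])  # ['i-2', 'i-3']
```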
Solution
Invoke the service request API using Service Bus Relay.
ADF data copy or Event Hubs streaming
Case Study Solution – Demo
To conclude: systems that could solve the legacy data problem need
Minimal or no dependence on the schema of the data
The ability to work with multiple data sources/technologies
The ability to work with multiple data formats
Support for multiple processing options
Support for multiple locations, on-premises and cloud
Monitoring and alerts on bad data
Scale to process large data volumes
Data warehousing capability/support
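One way to approach "minimal or no dependence on the schema of the data" is to flatten any nested JSON document into path/value pairs, so that generic queries, monitoring rules, and storage work the same way for every file type. A minimal sketch; the path syntax (dots for keys, brackets for list indexes) is an arbitrary choice made for illustration.

```python
import json

def flatten(value, path=""):
    """Yield (path, leaf value) pairs for any JSON-like structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, f"{path}[{i}]")
    else:
        yield path, value

doc = json.loads(
    '{"Body": {"Data": {"Type": "Data", "FormattedResult": [{"Value": "CbcNrbcDiff"}]}}}'
)
print(dict(flatten(doc)))
# {'Body.Data.Type': 'Data', 'Body.Data.FormattedResult[0].Value': 'CbcNrbcDiff'}
```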