Big Data and HADOOP.

Paresh Motiwala, PMP®
DBA Manager at Nuance Communications
781 254 4096 | pareshmotiwala@gmail.com
http://www.linkedin.com/in/pareshmotiwala | Twitter: @pareshmotiwala | www.circlesofgrowth.com
Chapter Leader: PASS DBA VC, NESQL, Boston_BI, PASS PD
Co-organizer: Providence SQL Saturday, Global Azure Bootcamp

Who should attend: DBAs, CIOs, marketing peeps, developers, big data enthusiasts.
Who should not attend?

Let’s grab a byte: Brontobyte
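
The byte ladder behind the joke can be spelled out in a few lines. A minimal sketch, assuming decimal multiples of 1,000 and taking "brontobyte" (an informal name, not an official SI unit) as the commonly quoted 10^27 bytes:

```python
# Decimal byte units, each 1,000 times the previous one.
# "Brontobyte" is an informal name, not an official SI unit; this sketch
# assumes the commonly quoted value of 10**27 bytes.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte", "brontobyte",
]


def unit_sizes():
    """Map each unit name to its size in bytes: 10**(3 * position)."""
    return {name: 10 ** (3 * k) for k, name in enumerate(UNITS)}


def humanize(n_bytes):
    """Express a byte count in the largest unit it reaches."""
    sizes = unit_sizes()
    for name in reversed(UNITS):
        if n_bytes >= sizes[name]:
            return f"{n_bytes / sizes[name]:.1f} {name}s"
    return f"{n_bytes} bytes"  # only reached for counts below one byte, e.g. 0


if __name__ == "__main__":
    for name, size in unit_sizes().items():
        print(f"{name:>10}: 10^{len(str(size)) - 1} bytes")
    print(humanize(3 * 10**21))  # 3.0 zettabytes
```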

Agenda: misc info on big data; sources (bread crumbs); definition; privacy concerns; data lake; storing: Hadoop; processing: MapReduce; presentation; data science and data scientists; a few Hadoop stacks; summary.

So why should I care about this?
“Data is the new electricity” (Satya Nadella, spring 2016): https://www.microsoft.com/en-us/sql-server/data-driven
Companies generate data, distribute it, meter it, and use it.
Where is data stored?
Current: SQL Server, Oracle, Teradata, DB2, Netezza, and open-source databases: Cassandra, MySQL, MongoDB.
Unstructured: Hadoop, Spark, data lakes.
What type of data is stored?
Traditional: rows and columns.
Big data explosion: images, streaming data, internet-connected devices (IoT), machine data.
Source: Microsoft

Big Data: driving transformative changes (Source: Microsoft)
Aspect | Traditional | Big Data
Data characteristics | Relational data with a highly modeled schema | All data, with schema agility
Costs | Specialized hardware | Commodity hardware
Culture | Operational reporting; focus on rear-view analysis | Experimentation leading to intelligent action, with machine learning, graph, A/B testing

Big Data: decision making (Source: The Big Data by Schmarzo)
Aspect | Today’s | Big Data
Effect | Rearview mirror | Forward looking
Data used | < 10% | Any and all data
Quality | Batch, incomplete and disjointed | Real-time, correlated, governed
Purpose | Business monitoring | Business optimization

Sources (bread crumbs): cell phones, social media, credit cards, GPS devices, IoT, wearables.

Value

Desired properties: robustness and fault tolerance; low latency; scalability; generalization; extensibility; ad hoc queries; minimal maintenance; debuggability.

Flow: collection, pre-processing, hygiene, intervention, visualization, analysis.
Over 90% of today’s data was created in the past 2 years.
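
The first stages of this flow can be chained as plain functions, each taking the previous stage's output. A minimal sketch, assuming hypothetical comma-separated sensor lines and treating a "duplicate" as an identical sensor/value pair (a simplifying assumption); intervention and visualization are left out:

```python
from statistics import mean

# Hypothetical raw sensor lines: stray spaces, one duplicate, one bad value.
RAW = [" sensor-1, 20.5", "sensor-2,21.0", "sensor-1, 20.5", "sensor-3,n/a", "sensor-2, 22.0"]


def collect():
    """Collection: gather raw records (here simply copied from RAW)."""
    return list(RAW)


def preprocess(lines):
    """Pre-processing: split each line into a trimmed (sensor, value) pair."""
    return [tuple(part.strip() for part in line.split(",", 1)) for line in lines if "," in line]


def hygiene(records):
    """Hygiene: drop non-numeric values and exact duplicates."""
    seen, clean = set(), []
    for sensor, value in records:
        try:
            reading = (sensor, float(value))
        except ValueError:
            continue  # e.g. "n/a"
        if reading not in seen:
            seen.add(reading)
            clean.append(reading)
    return clean


def analyze(readings):
    """Analysis: average reading per sensor."""
    by_sensor = {}
    for sensor, value in readings:
        by_sensor.setdefault(sensor, []).append(value)
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


if __name__ == "__main__":
    print(analyze(hygiene(preprocess(collect()))))  # {'sensor-1': 20.5, 'sensor-2': 21.5}
```

Keeping each stage a separate function makes it easy to test or replace one step (for example, a stricter hygiene rule) without touching the others.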

5 Rs of data quality: relevancy, recency, range, robustness, reliability.
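
Two of the five Rs, recency and range, translate directly into record-level checks. A minimal sketch, assuming a hypothetical 24-hour freshness window and a hypothetical valid range for temperature readings:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; real values depend on the data set.
MAX_AGE = timedelta(hours=24)   # recency: older readings count as stale
VALID_RANGE = (-40.0, 125.0)    # range: plausible temperatures in degrees C


def check_record(record, now):
    """Return the names of the quality rules a record violates (empty = passes)."""
    problems = []
    if now - record["timestamp"] > MAX_AGE:
        problems.append("recency")
    low, high = VALID_RANGE
    if not low <= record["value"] <= high:
        problems.append("range")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        {"timestamp": now - timedelta(hours=1), "value": 21.5},
        {"timestamp": now - timedelta(days=3), "value": 22.0},
        {"timestamp": now - timedelta(minutes=5), "value": 999.0},
    ]
    for record in records:
        print(record["value"], check_record(record, now) or "ok")
```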

Privacy of data
If I collect the data, is it mine? Ownership vs. rights.
Share answers, not data.
Let them know why you are collecting and what you are collecting.
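
One way to "share answers, not data" is to hand out aggregates instead of rows and hold back groups that are too small. A minimal sketch, assuming a hypothetical minimum group size of 5 (small-cell suppression alone is not a complete anonymization scheme):

```python
from collections import Counter

# Hypothetical minimum group size: smaller groups are suppressed so an
# answer is harder to trace back to a handful of individuals.
MIN_GROUP_SIZE = 5


def share_answer(rows, field):
    """Return per-group counts for `field` instead of the raw rows,
    dropping any group with fewer than MIN_GROUP_SIZE members."""
    counts = Counter(row[field] for row in rows)
    return {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}


if __name__ == "__main__":
    customers = (
        [{"name": f"c{i}", "city": "Boston"} for i in range(7)]
        + [{"name": f"d{i}", "city": "Providence"} for i in range(2)]
    )
    print(share_answer(customers, "city"))  # {'Boston': 7}
```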

FIPPs (Fair Information Practice Principles): individual control, transparency, respect for context, security, access and accuracy, focused collection.
FERPA: Family Educational Rights and Privacy Act

What is a data lake? (Courtesy: James Serra)
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
A place to store unlimited amounts of data in any format inexpensively, especially for archive purposes.
Allows collection of data that you may or may not use later: “just in case.”
A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time,” or “schema on read.”
Complements the EDW and can be seen as a data source for it, capturing all data but passing only relevant data to the EDW.
Frees up expensive EDW resources (storage and processing), especially for data refinement.
Allows data exploration without waiting for the EDW team to model and load the data (quick user access).
Some processing is better done with Hadoop tools than with ETL tools like SSIS.
Easily scalable.
Also called bit bucket, staging area, landing zone, or enterprise data hub (Cloudera).
Further reading:
http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/
http://www.jamesserra.com/archive/2014/12/the-modern-data-warehouse/
http://adtmag.com/articles/2014/07/28/gartner-warns-on-data-lakes.aspx
http://intellyx.com/2015/01/30/make-sure-your-data-lake-is-both-just-in-case-and-just-in-time/
http://www.blue-granite.com/blog/bid/402596/Top-Five-Differences-between-Data-Lakes-and-Data-Warehouses
http://www.martinsights.com/?p=1088
http://data-informed.com/hadoop-vs-data-warehouse-comparing-apples-oranges/
http://www.martinsights.com/?p=1082
http://www.martinsights.com/?p=1094
http://www.martinsights.com/?p=1102
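
"Schema on read" means raw records are stored untouched and a schema is applied only when someone queries them. A minimal sketch, assuming hypothetical JSON-lines events and a hypothetical two-field schema; missing or unconvertible fields become None rather than blocking the load:

```python
import json

# Raw events land in the lake as-is (JSON lines); nothing is enforced on write.
RAW_EVENTS = [
    '{"user": "alice", "amount": "19.99", "ts": "2018-11-11T10:00:00"}',
    '{"user": "bob", "amount": 5, "extra": "ignored"}',
    '{"user": "carol"}',
    'not json at all',
]

# The schema is supplied only at read time; names and types are hypothetical.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Yield dicts shaped by `schema`; missing or unconvertible fields become None,
    and lines that are not JSON objects are skipped."""
    for line in lines:
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(raw, dict):
            continue
        row = {}
        for field, cast in schema.items():
            try:
                row[field] = cast(raw[field])
            except (KeyError, TypeError, ValueError):
                row[field] = None
        yield row


if __name__ == "__main__":
    for row in read_with_schema(RAW_EVENTS, SCHEMA):
        print(row)
```

A different consumer could read the same raw lines with another schema (for example, adding `ts` as a string) without reloading anything, which is the "just in time" side of the definition above.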

The “data lake” uses a bottom-up approach: ingest all data regardless of requirements, store all data in its native format without a schema definition, then do analysis using analytic engines like Hadoop.
Sources: devices, sensors, video, web, social, clickstream, relational, LOB applications.
Uses: batch queries, interactive queries, real-time analytics, machine learning, data warehouse.
A data lake quickly turns into a data swamp if you don’t invest in data quality. (Courtesy: James Serra)
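
The ingest-and-store steps can be sketched as a landing zone that files every payload byte-for-byte under its source and arrival date, so nothing is parsed on the way in. The directory layout below is a hypothetical convention, not something Hadoop requires; recording where data came from and when it arrived is one small habit that helps keep a lake from becoming a swamp:

```python
import tempfile
from datetime import date
from pathlib import Path


def landing_path(root, source, day, filename):
    """Raw-zone path: <root>/raw/<source>/<yyyy>/<mm>/<dd>/<filename>.
    This layout is a hypothetical convention, not something Hadoop requires."""
    return Path(root) / "raw" / source / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / filename


def ingest(root, source, day, filename, payload):
    """Store the payload unchanged: native format, no schema applied."""
    path = landing_path(root, source, day, filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stored = ingest(tmp, "clickstream", date(2018, 11, 11), "events.json", b'{"page": "/home"}\n')
        print(stored.relative_to(tmp).as_posix())  # raw/clickstream/2018/11/11/events.json
```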

Hadoop was created by Doug Cutting and Mike Cafarella in 2005.

Benefits of Hadoop

Data Lake / Big Data

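MapReduce
Map: sends the query (the map function) to the nodes that hold the data; each node processes its own blocks in parallel.
Reduce: collects the intermediate results and aggregates them into the final answer.
Job Tracker and Task Tracker: in classic (Hadoop 1) MapReduce, the JobTracker schedules jobs across the cluster and a TaskTracker on each node runs the individual tasks.
YARN: in Hadoop 2, YARN took over cluster resource management and job scheduling from the JobTracker.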
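
To make the two phases concrete, here is the classic word-count example simulated in a single Python process. It is a sketch of the programming model only: no Hadoop API is used, and in a real cluster the map tasks run in parallel on the nodes holding the input blocks while the framework performs the shuffle between map and reduce:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (the framework's job in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different node; here they run sequentially.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data and hadoop", "Hadoop stores big data", "MapReduce processes data"]
    print(word_count(docs))
```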

Base architecture: big data advanced analytics pipeline
Stages: data sources → ingest → prepare (normalize, clean, etc.) → analyze (statistical analysis, ML, etc.) → publish (for programmatic consumption, BI/visualization) → consume (alerts, operational stats, insights). Data sources are on-premises; the remaining stages run on Azure services.
Near-real-time data analytics pipeline using Azure Stream Analytics (data in motion): a telemetry data stream flows into Event Hub, then into Stream Analytics (real-time analytics) with Machine Learning (anomaly detection), producing live data stats, anomalies, and aggregates for a Power BI dashboard.
Interactive analytics and predictive pipeline using Azure Data Factory (data at rest): real-time readings and operational data (sensor readings and logs in local DBs, a legacy setup replaced by Azure SQL), plus historic laser data and fault and maintenance data (one-time drops), move into Azure Storage Blob through a scheduled hourly transfer using Azure Data Factory. HDInsight custom ETL aggregates and partitions the data, Machine Learning writes predictions to Azure SQL, and a customer MIS dashboard shows predictions and alerts.
Big data analytics pipeline using Azure Data Lake: sensor readings and operational logs land in Azure Data Lake Storage, are processed by Azure Data Lake Analytics (big data processing), and feed Azure SQL for a device health dashboard of operational stats.
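
The near-real-time pipeline pairs stream processing with an anomaly-detection step, but the slide does not say which algorithm is used. As a stand-in, a minimal sketch that flags a reading when it lies more than three standard deviations from the mean of a short sliding window of earlier readings (window size and threshold are hypothetical):

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag (index, value) pairs that deviate from the mean of the previous
    `window` readings by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)  # sliding window of the latest readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((i, value))
        recent.append(value)
    return anomalies


if __name__ == "__main__":
    telemetry = [20.1, 20.3, 19.9, 20.2, 20.0, 20.1, 35.0, 20.2]
    print(detect_anomalies(telemetry))  # [(6, 35.0)]
```

Because a flagged value still enters the window here, one spike widens the band for the next few readings; excluding flagged values from the window is a simple alternative.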

Vision for big data and data warehousing
Sources: devices, sensors, video, web, social, clickstream, relational, LOB applications.
Data warehouse: Microsoft SQL Server and APS (Analytics Platform System) on-premises; SQL Server on VMs and SQL DW in Microsoft Azure.
“Big data”: Hadoop data lake, with HDP (Hortonworks Data Platform) on-premises and Hadoop on cloud VMs in Microsoft Azure.
Azure Data Factory plus federated query connect the two sides.
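
Federated query means a single query spans the warehouse and the lake; in Microsoft's stack that role is typically filled by PolyBase. The sketch below only mimics the idea with Python's built-in sqlite3: warehouse rows live in a relational table, raw lake events are projected into a temporary table at query time, and one SQL join answers the question. Table names and data are hypothetical:

```python
import json
import sqlite3

# Hypothetical lake data: raw clickstream events kept as JSON lines.
LAKE_LINES = [
    '{"customer_id": 1, "page": "/pricing"}',
    '{"customer_id": 1, "page": "/docs"}',
    '{"customer_id": 2, "page": "/home"}',
]

QUERY = """
SELECT c.name, COUNT(*) AS views
FROM customers AS c
JOIN clicks AS k ON k.customer_id = c.id
GROUP BY c.name
ORDER BY c.name
"""


def page_views_by_customer(lake_lines):
    conn = sqlite3.connect(":memory:")
    try:
        # Warehouse side: a modeled relational table (in-memory SQLite as a stand-in).
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])
        # Lake side: project the raw lines into a temporary table only at query time.
        conn.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, page TEXT)")
        events = (json.loads(line) for line in lake_lines)
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", ((e["customer_id"], e["page"]) for e in events))
        # One SQL statement spans both sides.
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(page_views_by_customer(LAKE_LINES))  # [('Contoso', 2), ('Fabrikam', 1)]
```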

Presentation: R, Python, Power BI, Power BI Desktop.

Data science and data scientists

Big Data 101

Summary: misc info on big data; sources; definition; privacy concerns; data lake; storing: Hadoop; processing: MapReduce; presentation; data science and data scientists; a few Hadoop stacks.

Conclusion
SQL Server is the best relational database, but the world is much bigger than any one relational database.
What is your company’s data strategy? What is your company’s cloud strategy?
Learn adjacent technologies that will make you valuable: Power BI? Hadoop? NoSQL?

Someday big data will just become data.
Thank you!

Paresh Motiwala, PMP ® pareshmotiwala@gmail.com http://www.linkedin.com/in/pareshmotiwala @pareshmotiwala www.circlesofgrowth.com 781 254 4096

Bibliography
http://www.datasciencecentral.com/
https://www.youtube.com/playlist?list=PLt-0mOCwxJ6B_OxTlpevxJNAa7GfCLd3l
https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
MIT Big Data Analytics Course
Data Lake presentation by James Serra
Future of Data… (or something like that) by George Walters
Big Data Analytics with Microsoft HADOOP in 24 Hours