Microsoft Big Data Essentials Module 1 - Introduction to Big Data

Presentation transcript:

Microsoft Big Data Essentials Module 1 - Introduction to Big Data
Server & Tools Business 4/19/2017
Saptak Sen, Microsoft; Bill Ramos, Advaiya
My name is Saptak Sen, and welcome to this introduction session for the Microsoft Big Data Boot Camp. This session sets the stage for the three days of training. Each session follows a similar format: I’ll introduce the topic and then provide a set of demonstrations on how the technology works. Let’s get started.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Agenda
Why Big Data?
Big Data Lambda Architecture
Getting started with Windows Azure HDInsight Service
In this introduction session, I’m going to first give you a broad overview of the Microsoft Cloud OS data platform story and walk through the three pillars for the upcoming SQL Server 2014 release, along with the new features that relate to the Big Data story. Next, I’ll introduce the Lambda Architecture. This is a community-driven architecture that provides a framework for how various Big Data components work together for specific scenarios. I’ll also show how Microsoft Big Data platform components like HDInsight fit into the Lambda Architecture. I’ll then go over Windows Azure’s high-level architecture and components and give an overview of the Table and Blob storage components that relate to Big Data solutions. At the end, I’ll demo how to create a Windows Azure storage account and an HDInsight cluster.

The Business Imperative
1. Human Fault Tolerance
2. Minimize CapEx
3. Hyper Scale on Demand
4. Low Learning Curve

CAP Theorem
C: Consistency
A: Availability
P: Partition Tolerance

Big Data Lambda Architecture
Let’s now look at the Big Data Lambda Architecture.

Big Data Lambda Architecture
Batch layer: stores master dataset; computes arbitrary views
Speed layer: fast, incremental algorithms; batch layer eventually overrides speed layer
Serving layer: random access to batch views; updated by batch layer
Talk Track: To make sense of how various Big Data technologies fit together, the open source community has developed what is known as the Big Data Lambda Architecture. The lambda architecture provides an architectural model that scales and combines the advantages of long-term batch processing with the freshness of a real-time system, with data updated within seconds. It solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the speed layer, and the serving layer. Let’s take a look at each of the three layers.
The batch layer stores the master dataset for your solution, typically in append mode, so that it can handle new data coming in. The batch layer is usually a read-only database; no random writes are required. It is horizontally scalable, with unrestrained computation and high latency.
The speed layer performs stream processing and continuous computation. It provides fast, incremental algorithms. The batch layer eventually overrides the speed layer. All the complexity is isolated in the speed layer; if anything goes wrong there, it is auto-corrected when the batch layer overrides the speed layer’s results. The views are stored in a read/write database (MS SQL Server, Column Store, Cassandra, …), which is much more complex than a read-only view.
The serving layer provides the merged outcome of the data streams coming from the batch layer and the speed layer. It queries the batch and real-time views and merges the results. PolyBase is a great fit.
Key Points: The Lambda Architecture has three layers. The batch layer stores the master dataset. The speed layer performs stream processing for a real-time view. The serving layer provides the merged outcome of the data streams coming from the batch layer and the speed layer.
References: Big Data Lambda Architecture: http://www.databasetube.com/database/big-data-lambda-architecture/ Speaker notes from: http://sqlug.be/files/sqlug/bd/MicrosoftBigDataSQLUG.pdf

The Batch Layer
Stores master dataset (in append mode)
Unrestrained computation
Horizontally scalable
High latency
Diagram labels: Incoming data streams, Master dataset, Batch views
Talk Track: The portion of the lambda architecture that precomputes the batch views is called the batch layer. The batch layer stores the master copy of the dataset and precomputes batch views on that master dataset. The master dataset can be thought of as a very large list of records. The batch layer needs to be able to do two things to do its job: store an immutable, constantly growing master dataset, and compute arbitrary functions on that dataset. The key word here is arbitrary. If you’re going to precompute views on a dataset, you need to be able to do so for any view and any dataset.
The nice thing about the batch layer is that it’s simple to use. Batch computations are written like single-threaded programs yet automatically parallelize across a cluster of machines. This implicit parallelization lets batch layer computations scale to datasets of any size, so it’s easy to write robust, highly scalable computations on the batch layer. The batch view lets you get the values you need from it very quickly because it’s indexed. Think of technologies like Hadoop and Pig/Hive for use on the batch layer. Data warehouse database technologies can also be associated with the batch layer.
Key Points: The batch layer stores the master dataset and precomputes batch views on that master dataset. It stores an immutable, constantly growing master dataset and computes arbitrary functions on that dataset. It is a read-only database; no random writes are required.
References: Big Data Lambda Architecture: http://www.databasetube.com/database/big-data-lambda-architecture/
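To make the two batch layer duties concrete (an append-only master dataset and a view recomputed from all of it), here is a minimal single-machine sketch. The `PageView` record, its fields, and the page-count view are hypothetical illustrations, not part of the presentation; a real batch layer would run the same kind of computation as a Hadoop, Hive, or Pig job across a cluster.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class PageView(NamedTuple):
    """Hypothetical record type for the master dataset."""
    url: str
    timestamp: int  # seconds since some epoch (illustrative only)


def append_records(master: Tuple[PageView, ...],
                   new_records: Iterable[PageView]) -> Tuple[PageView, ...]:
    """Append-only write: existing records are never modified or deleted."""
    return master + tuple(new_records)


def compute_batch_view(master: Iterable[PageView]) -> dict:
    """Recompute the whole view from scratch over the full master dataset."""
    return dict(Counter(record.url for record in master))


master: Tuple[PageView, ...] = ()
master = append_records(master, [PageView("/home", 100), PageView("/about", 160)])
master = append_records(master, [PageView("/home", 220)])

batch_view = compute_batch_view(master)
print(batch_view)  # {'/home': 2, '/about': 1}
```

Recomputing from the full dataset is slow (the "high latency" on the slide), but it means any view can be rebuilt at any time from the immutable records.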

The Speed Layer
Stream processing of data
Stores a limited window of data
Dynamic computation
Diagram labels: Incoming data streams, Process stream, Increment views, Real-time increments, Real-time views
Talk Track: You can think of the speed layer as similar to the batch layer in that it produces views based on data it receives. There are some key differences, though. One big difference is that, in order to achieve the lowest latencies possible, the speed layer doesn’t look at all the new data at once. Instead, it updates the real-time views incrementally as it receives new data, rather than recomputing them like the batch layer does. The speed layer typically requires databases that support random reads and random writes. Because these databases support random writes, they are more complex than the databases you use in the serving layer, both in terms of implementation and operation. Most of the application complexity tends to be isolated in the speed layer. Technologies typically considered for the speed layer include in-memory transaction databases and complex event processing engines.
Key Points: Stream processing. Continuous computation. Transactional. Stores a limited window of data, compensating for the last few hours of data. All the complexity is isolated in the speed layer; if anything goes wrong, it’s auto-corrected. Some algorithms are hard to implement in real time.
References: Big Data Lambda Architecture: http://www.databasetube.com/database/big-data-lambda-architecture/
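The incremental update and the limited window of data described above can be sketched as follows. This is an illustrative in-memory model only: the `SpeedLayer` class, its method names, and the 60-second window are assumptions made for the example, and it assumes events arrive in timestamp order. A production speed layer would use a stream processing engine and a read/write database instead.

```python
from collections import Counter, deque


class SpeedLayer:
    """Keeps per-URL counts for events inside a sliding time window."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self.events = deque()           # (timestamp, url) pairs still in the window
        self.realtime_view = Counter()  # incrementally maintained counts

    def on_event(self, timestamp: int, url: str) -> None:
        # Incremental update: touch only the affected key, never recompute.
        self.events.append((timestamp, url))
        self.realtime_view[url] += 1
        self._expire(timestamp)

    def _expire(self, now: int) -> None:
        # Drop events that fell out of the window; the batch layer is
        # assumed to have absorbed them by then.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_url = self.events.popleft()
            self.realtime_view[old_url] -= 1
            if self.realtime_view[old_url] == 0:
                del self.realtime_view[old_url]


speed = SpeedLayer(window_seconds=60)
speed.on_event(0, "/home")
speed.on_event(30, "/home")
speed.on_event(70, "/about")  # the event at t=0 is now outside the window
print(dict(speed.realtime_view))  # {'/home': 1, '/about': 1}
```

The random writes (`+= 1`, `-= 1`, `del`) are exactly what makes this layer's storage more complex than the read-only views used elsewhere.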

The Serving Layer
Queries the batch and real-time views
Merges the results
Diagram labels: Batch views, Real-time views, Querying and merging, Output
Talk Track: Finally, the serving layer indexes the batch view and loads it up so it can be efficiently queried to get particular values out of the view. The serving layer is typically a specialized distributed database that loads in batch views, makes them queryable, and continuously swaps in new versions of a batch view as they’re computed by the batch layer. A serving layer database only requires batch updates and random reads. Most notably, it does not need to support random writes. The serving layer’s job is to query the batch and real-time views and merge the results. Typically the technologies associated with the serving layer include online analytical processing databases like Analysis Services and PowerPivot. It can also be considered the “last mile” technology for producing usable results for your solutions.
Key Points: The serving layer queries the batch and real-time views and merges the results.
References: Big Data Lambda Architecture: http://www.databasetube.com/database/big-data-lambda-architecture/
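The query-and-merge step can be illustrated with a small sketch. The `ServingLayer` class and its methods are hypothetical names chosen for this example, and the merge rule (adding counts) assumes the views hold additive values such as page counts; other kinds of views need their own merge logic.

```python
class ServingLayer:
    """Holds the latest batch view and merges it with a real-time view."""

    def __init__(self) -> None:
        self._batch_view = {}

    def load_batch_view(self, new_view: dict) -> None:
        # Batch update: the whole view is swapped in at once.
        # Individual keys are never written in place (no random writes).
        self._batch_view = dict(new_view)

    def query(self, key: str, realtime_view: dict) -> int:
        # Random read on both views, then merge by adding the counts.
        return self._batch_view.get(key, 0) + realtime_view.get(key, 0)


serving = ServingLayer()
serving.load_batch_view({"/home": 2, "/about": 1})  # computed by the batch layer
realtime_view = {"/home": 1}                        # maintained by the speed layer

print(serving.query("/home", realtime_view))     # 3
print(serving.query("/about", realtime_view))    # 1
print(serving.query("/missing", realtime_view))  # 0
```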

Microsoft Lambda Architecture Support
Batch Layer: Windows Azure HDInsight, Azure Blob storage, MapReduce, Hive, Pig, Oozie, SSIS
Speed Layer: Federations in Windows Azure SQL Database, Azure tables, Memcached/MongoDB, SQL Server database engine, SQL Server VM: Columnstore indexes, Analysis Services, StreamInsight
Serving Layer: Azure Storage Explorer, Microsoft Excel, Power Query, PowerPivot, Power View, Power Map, Reporting Services, LINQ to Hive, Analysis Services
Talk Track: Microsoft’s Data Platform stack supports each of the layers in the Big Data Lambda Architecture. For the batch layer, Microsoft provides multiple options for the storage and processing of batch-oriented data. These include Windows Azure HDInsight and Azure Blob storage to hold the input data. The SQL Server data warehousing capabilities can also be associated with the batch layer. For processing the data and managing views, Microsoft supports processing of Hadoop data through MapReduce jobs along with Hive, Pig, and Oozie. For data warehousing, you can use traditional SQL views and stored procedures. For the speed layer, Microsoft supports real-time processing of data through technologies like Federations in Windows Azure SQL Database, Azure Tables, Memcached/MongoDB, the SQL Server database engine, and SQL Server VM, along with Columnstore indexes, Analysis Services, and StreamInsight. Finally, for the serving layer, which provides the merged outcome of the data streams coming from the batch layer and the speed layer, you can use tools like PowerPivot, Power View, Power Query, Power Map, Reporting Services, LINQ to Hive, and Analysis Services.
Key Points: Microsoft provides a complete BI solution that can be aligned with all three layers of the Lambda Architecture.
References: Big Data Lambda Architecture: http://www.databasetube.com/database/big-data-lambda-architecture/

Yahoo!
Layers: Batch Layer, Speed Layer, Serving Layer
Diagram labels: Apache Hadoop, Hadoop Data, SQL Server Connector (Hadoop Hive ODBC), Staging Database, Third Party Database, SQL Server Analysis Services (SSAS Cube), Microsoft Excel and PowerPivot for Excel, Other BI Tools and Custom Applications
Talk Track: Using SQL Server 2008 R2, Yahoo! enhanced its Targeting, Analytics and Optimization (TAO) infrastructure, a powerful, scalable advertising analytics tool. TAO now takes data from a Hadoop cluster into a third-party database, from which it is loaded into a SQL Server 2008 R2 Analysis Services cube. The cube then connects to client applications such as Tableau Desktop business analytics software and in-house custom applications. Employees use this software to create interactive data dashboards and perform ad hoc analysis. Microsoft has developed the SQL Server Connector for Apache Hadoop, which is designed to facilitate efficient data transfer between Hadoop and SQL Server 2008 R2.
Key Points: With Big Data technology, Yahoo! experienced the following benefits: Improved ad campaign effectiveness and increased advertiser spending. A cube producing 24 terabytes of data quarterly, making it the world’s largest SQL Server Analysis Services cube. The ability to handle more than 3.5 billion daily ad impressions, with hourly refresh rates.
References: Microsoft case study: Yahoo! Improves Campaign Effectiveness, Boosts Ad Revenue with Big Data Solution: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001707

Ferranti Computer Systems
Layers: Batch Layer, Speed Layer, Serving Layer
Diagram labels: Data Feed from Smart Meters, Reactive Extensions (Rx), Windows Azure HDInsight, SQL Server (In-Memory OLTP), Microsoft Dynamics AX, SQL Server Analysis Services, SQL Server Reporting Services
Talk Track: Ferranti and Microsoft designed a solution that uses Windows Azure HDInsight Service and nonrelational technologies to perform fast searches on business data and provide the information to the business processes in MECOMS™ (a business support system for the energy and utility industry) and Microsoft Dynamics AX. Searches of the memory-optimized tables are distributed between groups of computers, called clusters, which are managed by HDInsight. In-Memory OLTP makes access to SQL Server databases dramatically faster by optimizing queries and procedures and by moving heavily used tables into application memory; these are referred to as memory-optimized tables. Reactive Extensions (Rx) was implemented to verify and process the incoming raw data, and then to send the aggregated data to SQL Server for quick storage in memory-optimized tables. SQL Server analyzes the aggregated data and sends the results of the analysis to Microsoft Dynamics AX for demand-side business processes such as scheduling service calls, terminating service, and invoicing. HDInsight also offers full compatibility with Microsoft business intelligence technology such as SQL Server 2012 Analysis Services and SQL Server 2012 Reporting Services.
Key Points: With Big Data technology, Ferranti experienced the following benefits: Increased sustained database write speed to 200 million rows in 15 minutes. Discovered ways to access and analyze more of the data generated by the smart meters, providing new business opportunities.
References: Microsoft Case Studies: Ferranti Computer Systems - Utilities ISV Scales to Meet Customer Needs for Storage and Analysis of Big Data: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Ferranti-Computer-Systems/Utilities-ISV-Scales-to-Meet-Customer-Needs-for-Storage-and-Analysis-of-Big-Data/710000003000

Windows Azure Storage
Let’s now look at Windows Azure storage.

Demo 1: Setting up the Windows Azure storage account
Batch Layer: Azure Blob storage
Serving Layer: Azure Storage Explorer
Talk Track: That’s enough talk for now. Let’s get to this session’s demo. For each of the boot camp demos, I’ll put the technologies that I show off in context with the Big Data Lambda Architecture. At the end of each presentation, you will get a chance to try out the demos yourself as hands-on lab exercises. Here, we will set up a Windows Azure storage account that will be used for the batch layer. The blob store information will be served up using the Azure Storage Explorer available on Codeplex. I’ll then show how to access the storage account using the Azure Storage Explorer.
In this demo, you will set up a Windows Azure Storage account for your storage-related activities. You will also discover some of the new features that Windows Azure Storage has to offer, and you will learn how to use Azure Storage Explorer to explore Windows Azure Storage. Here, end users interact with Windows Azure Blob storage through the Azure Storage Explorer tool as a front-end interface.

Blob Storage Concepts
Store large amounts of unstructured text or binary data with the fastest read performance
Highly scalable, durable, and available file system
Blobs can be exposed publicly over HTTP
Securely lock down permissions to blobs
Hierarchy: Account (Contoso) > Container (Images, Video) > Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) > Block/Page
URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
Talk Track: Let’s now take a look at the hierarchy of Blob storage. The Blob service provides storage for entities such as binary files and text files. The REST API for the Blob service exposes two resources: containers and blobs. A container is a set of blobs; every blob must belong to a container. The Blob service defines two types of blobs: block blobs, which are optimized for streaming, and page blobs, which are optimized for random read/write operations and which provide the ability to write to a range of bytes in a blob.
Blobs can be read by calling the Get Blob operation. A client may read the entire blob or an arbitrary range of bytes. Block blobs less than or equal to 64 MB in size can be uploaded by calling the Put Blob operation. Block blobs larger than 64 MB must be uploaded as a set of blocks, each of which must be less than or equal to 4 MB in size. Page blobs are created and initialized with a maximum size with a call to Put Blob. To write content to a page blob, you call the Put Page operation. The maximum size currently supported for a page blob is 1 TB.
Codeplex tools like the Azure Storage Explorer make managing blobs easy. There is also a rich API for managing storage from PowerShell through the REST-based API.
Key Points: The Blob service defines two types of blobs: block blobs and page blobs. Blobs are accessible via REST APIs, the Windows Azure Storage Client library, or Windows Azure drives. The service stores large amounts of unstructured text or binary data with the fastest read performance. It is a highly scalable, durable, and available file system.
References: Data Management and Business Analytics: http://www.windowsazure.com/en-us/develop/net/fundamentals/cloud-storage/#blob
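The URL format and the upload size rules above lend themselves to a short worked example. The sketch below only does the arithmetic and string formatting; it makes no network calls and does not use any Azure SDK. The function names are hypothetical, and the 64 MB and 4 MB limits are the ones quoted in this talk track, which later versions of the service have raised.

```python
MB = 1024 * 1024
SINGLE_PUT_LIMIT = 64 * MB  # largest block blob for one Put Blob call (per the talk track)
MAX_BLOCK_SIZE = 4 * MB     # largest individual block (per the talk track)


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build a blob address following the account/container/blob hierarchy."""
    return f"http://{account}.blob.core.windows.net/{container}/{blob_name}"


def plan_upload(size_bytes: int) -> list:
    """Return the sizes of the pieces a block blob upload would be sent in."""
    if size_bytes <= SINGLE_PUT_LIMIT:
        return [size_bytes]  # small enough for a single Put Blob
    full_blocks, remainder = divmod(size_bytes, MAX_BLOCK_SIZE)
    return [MAX_BLOCK_SIZE] * full_blocks + ([remainder] if remainder else [])


print(blob_url("contoso", "images", "PIC01.JPG"))
# http://contoso.blob.core.windows.net/images/PIC01.JPG

print(len(plan_upload(10 * MB)))   # 1  (single Put Blob)
print(len(plan_upload(100 * MB)))  # 25 (blocks of 4 MB each)
print(len(plan_upload(65 * MB)))   # 17 (16 full blocks plus one 1 MB block)
```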

Getting started with HDInsight Service
Let’s now look at how to get started with the Windows Azure HDInsight Service.

Demo 2: Setting up the Windows Azure HDInsight cluster
Layers: Batch Layer, Speed Layer, Serving Layer
Components: Windows Azure HDInsight, Azure Blob storage, HDInsight Console (https://<ClusterName>.azurehdinsight.net/)
Talk Track: In this demo, I’ll show you how easy it is to set up an HDInsight cluster that uses Blob storage as a Hadoop file system. Here, the HDInsight cluster will be part of the batch layer, and I’ll show you the essentials for accessing the cluster using the HDInsight console. A Microsoft HDInsight cluster is associated with a Windows Azure Storage account or an affinity group. End users can use the HDInsight Console to interact with the HDInsight cluster and with the Windows Azure Storage account associated with the cluster.

Demo 3: Loading data into Windows Azure storage for use with HDInsight
Layers: Batch Layer, Speed Layer, Serving Layer
Components: CSV files from local disk, Windows Azure Blob storage, Windows Azure HDInsight, HDInsight Console (https://<ClusterName>.azurehdinsight.net/)
Talk Track: In the last demo for this presentation, I’ll show how you can prepare and upload data into the Hadoop cluster, specifically into the Windows Azure Blob storage that is associated with our HDInsight cluster. As described in the earlier demo, the HDInsight cluster is associated with a Windows Azure Storage account or an affinity group. End users can use the HDInsight Console to interact with the HDInsight cluster and with the Windows Azure Storage account associated with the cluster.

Easy Access to Data, Big & Small
Let’s now see how Microsoft Big Data solutions allow you to work with any data.

Easy Access to Data, Big & Small
Search, Access & Shape: Simplify access to public & corporate data. Easily preview, shape, & format your data.
Combine with Unstructured: Combine and refine data across multiple sources. Gain insight across relational, unstructured, & semi-structured data.
Easily Manage & Query: Common management of structured & unstructured data. Query across relational DB & Hadoop with a single T-SQL query.
Key Features: Power Query; Windows Azure Marketplace; Windows Azure HDInsight Service; Parallel Data Warehouse with PolyBase.
Talk Track: Let’s now talk about how technologies like Power Query in Excel, Windows Azure Marketplace, the HDInsight Service, and PolyBase can provide you with easy access to all data, both big and small. With Power Query, you have an intuitive and consistent experience for discovering, combining, and refining any data, including relational, structured and semi-structured, OData, Web, Hadoop, Azure Marketplace, and more. Power Query also lets you search for public data from sources such as Wikipedia. The Windows Azure HDInsight Service makes Apache Hadoop available as a service in the cloud, providing a software framework designed to manage, analyze, and report on Big Data. As a cloud-based service, it makes these resources available in a simpler, more scalable, and more cost-efficient environment. As a part of Microsoft’s overall Big Data strategy, SQL Server 2012 Parallel Data Warehouse includes PolyBase, a new technology that simplifies combining non-relational data and traditional relational data for analysis. PolyBase provides the benefits of “Big Data” without the complexities. Normally, an organization would need to burden IT with pre-populating the data warehouse with Hadoop data, or undergo extensive training on MapReduce, in order to query non-relational data. PolyBase makes this easy, enabling you to rapidly query massive data sets by combining MPP data warehousing performance with Hadoop.
Key Points: Power Query: Discover, search, transform, and combine data (relational, structured, and semi-structured) from across multiple sources. Windows Azure HDInsight Service: Framework to manage, analyze, and report on Big Data, using Apache Hadoop services in the cloud. SQL Server 2012 Parallel Data Warehouse (PolyBase): Faster ways to combine non-relational data and traditional relational data for analysis.
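The idea of answering one question with a single SQL query over both relational data and file-based Hadoop data can be illustrated with a small Python sketch. This is not PolyBase and not T-SQL: it uses SQLite from the Python standard library as a stand-in, and it copies the delimited rows into a table to stay runnable, whereas PolyBase is meant to avoid that pre-loading step. All table names and sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a relational table in the data warehouse (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
)

# Stand-in for a delimited file stored in Hadoop (hypothetical click log).
hadoop_file = io.StringIO("customer_id,clicks\n1,10\n2,4\n1,5\n")
conn.execute("CREATE TABLE clicks (customer_id INTEGER, clicks INTEGER)")
rows = [(int(r["customer_id"]), int(r["clicks"])) for r in csv.DictReader(hadoop_file)]
conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)

# One SQL statement spans both sources.
query = """
    SELECT c.name, SUM(k.clicks) AS total_clicks
    FROM customers AS c
    JOIN clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('Contoso', 15), ('Fabrikam', 4)]
```

The point of the sketch is the shape of the final query: the analyst writes ordinary SQL and does not need MapReduce code to bring the file-based data into the result.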

Learn more
Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx
Talk Track: That’s it for this session. To learn more about what I just showed, check out these two resource links: Getting Started with HDInsight, and Azure HDInsight and Azure Storage. Thank you!
END OF PRESENTATION

Questions?