1
Introduction to Analytics
Turning Data into Intelligent Action
Data science, Azure Machine Learning, and analytics are all becoming core to the success of every business. As data professionals, we need to know how they will affect what we do and how we can use them to help our business and clients. In this session we will look at these hot topics and learn how they help us turn data into intelligence that leads to actions that drive profits. We will look at how Microsoft’s Cortana Intelligence Suite is the central hub for learning and implementing each of these core pieces. An easy-to-follow example showing the flow from Data to Data Intelligence and through to Action will be presented using Cortana Intelligence Suite.
2
Chapter Leader / Regional Mentor for Canada
Melody Zacharias, ClearSight Solutions
@SQLMelody
ca.linkedin.com/in/melodyzacharias
I specialize in Business Intelligence and cloud performance (public or private) for financial institutions. I have blogged about many of the topics we will discuss today.
3
SQL Server -> Data Platform
Name change from SQL Server to Data Platform.
How data has changed: single-stream input, OLTP or traditional BI with monthly aggregates (daily if you were cutting edge), on-premises SQL Server.
How far we have come: multi-stream inputs, IoT, on-demand BI with real-time transaction processing. Cloud processing is becoming the norm.
4
New Direction Data Intelligence Action
As data professionals, we sometimes get caught in the weeds. This is the 10,000-foot view of the new world order, and this is what it means to the business:
Data from any source that can be used to answer the questions the business needs. Information about anything.
Hidden solutions revealed.
Automation of systems and processes.
Information is not the same as intelligence; this is where Cortana comes in.
Direct from Microsoft’s site.
5
Action
So now, into the weeds: what does this mean for us as data professionals? Let’s look at these in more detail and see how all the puzzle pieces fit together.
Cortana Intelligence Azure HDInsight Data Factory Data Catalog Machine Learning
6
Intelligence Options
SQL Server, Data Warehouse, Power BI, HDInsight, Data Lake, Data Stream, Data Factory, Data Catalog, Machine Learning
SQL Server (on or off premises), in all its forms: the one we know and love. There are many sessions on it today, so I will not cover it. There could be just as many sessions on each of the other options, and in the coming years I think that is exactly what is going to happen.
Data Lake: the lake at the cottage; vast, eco-diverse, and without structure. There are rocky shores and swampy areas, a bog where we get stuck, and maybe, if we are lucky, even a beach.
Data Stream: the steady flow into the lake.
HDInsight: Microsoft’s cloud service that provides a managed platform for Apache Hadoop, Spark, R, HBase, Giraph, and Storm.
Data Factory: SSIS in the cloud; automates and controls data movement in the cloud.
Data Catalog: the SME. Data knowledge, data cleansing and verification, and documentation of the data.
Machine Learning: predictive analytics.
7
Data Lake
Storage, Secure, High Availability, Performance tuned, Compatibility
Storage: big data storage with no limits on account size, file size, or the amount of data, suitable for storing a variety of data for analytics. Files can range from kilobytes to petabytes in size, making it a great choice for storing any type of data. It is essentially unformatted: unconstrained, with no schema or transformations required. Having no file size limit is what makes it different from blob storage (1 TB page blobs, 500 TB per container).
Secure: Azure Data Lake Store uses Azure Active Directory (AAD) for authentication and access control lists (ACLs) to manage access to your data. AAD features include multi-factor authentication, conditional access, role-based access control, application usage monitoring, security monitoring and alerting, etc. Azure Data Lake Store supports the OAuth 2.0 protocol for authentication within the REST interface.
High availability: data is stored durably by making redundant copies to guard against unexpected failures. There is no limit on how long data can be stored.
Compatibility: most Apache open source tools, as well as Azure services, Hadoop clusters, Data Lake Analytics, Stream Analytics, Data Catalog, Power BI, Data Factory, Azure SQL Database, and PowerShell.
Performance: Data Lake Store spreads parts of a file over a number of individual storage servers. This improves read throughput when the file is read in parallel for data analytics.
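The performance point can be illustrated with a small local sketch. This is not Data Lake Store code: it splits an ordinary local file into fixed-size byte ranges and reads them on parallel threads, which is the access pattern that spreading a file across storage servers is meant to speed up. The function names and the chunk size are assumptions made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` from a file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_in_parallel(path, chunk_size=1024, workers=4):
    """Read a file as independent byte ranges and reassemble them in order."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in offset order, so the chunks join back correctly.
        chunks = pool.map(lambda off: read_range(path, off, chunk_size), offsets)
        return b"".join(chunks)


def demo():
    """Write a small temporary file and check that a parallel read returns it intact."""
    data = bytes(range(256)) * 20  # 5,120 bytes of sample data
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return read_in_parallel(path) == data
    finally:
        os.remove(path)
```

On a single local disk the threads gain little; the benefit appears when each range is served by a different storage server.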
8
Data Stream
Stream Analytics: Scalable, Ease of use, Reliable, Low cost, Reference data, Integrations, Connectivity
Real-time event processing.
Scalable to 1 GB/sec: monitor and adjust the scale and speed of your job in the Azure portal, from a few kilobytes to a gigabyte or more of events processed per second.
Ease of use: the Stream Analytics query language is a variation of T-SQL with IntelliSense. You can quickly and easily implement time-series queries, including temporal joins, windowed aggregates, temporal filters, and other common operations such as joins, aggregates, projections, and filters. In addition, in-browser query testing against a sample data file enables quick, iterative development.
Reliable: because the service maintains state internally, it provides repeatable results. Events can be archived and reprocessed in the future, always producing the same results. This lets customers go back in time and investigate computations when doing root-cause analysis, what-if analysis, etc.
Low cost: pay as you go, based on Streaming Unit usage and the amount of data processed by the system. Usage is derived from the volume of events processed and the amount of compute power provisioned within the cluster to handle the respective Stream Analytics jobs.
Reference data: this can be historical data or simply non-streaming data that changes less frequently over time. The system lets reference data be treated like any other incoming event stream, so it can be joined with event streams ingested in real time to perform transformations.
Integrations: integration with Azure Machine Learning (AML) allows the use of user-defined functions and additional processing.
Connectivity: Stream Analytics connects directly to Azure Event Hubs and Azure IoT Hubs for stream ingestion, and to the Azure Blob service to ingest historical data.
Results can be written from Stream Analytics to Azure Storage Blobs or Tables, Azure SQL DB, Azure Data Lake Stores, DocumentDB, Event Hubs, Azure Service Bus Topics or Queues, and Power BI, where they can then be visualized, further processed by workflows, used in batch analytics via Azure HDInsight, or processed again as a series of events. When using Event Hubs, it is possible to compose multiple Stream Analytics jobs together with other data sources and processing engines without losing the streaming nature of the computations.
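To make "windowed aggregates" concrete, here is a minimal Python sketch of a tumbling window (fixed, non-overlapping time buckets) that counts events per key. It only illustrates the idea and is not the Stream Analytics query language; the function name, the event shape, and the sample sensor names are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample events: (seconds since start, device name).
events = [
    (1, "sensor-a"),
    (4, "sensor-a"),
    (9, "sensor-b"),
    (11, "sensor-a"),
    (19, "sensor-b"),
]
result = tumbling_window_counts(events, window_seconds=10)
```

Because the result depends only on the archived events and the window size, replaying the same events gives the same counts, which is the repeatability property described in the notes.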
9
HDInsight/Hadoop Big Data
Azure cloud implementation of the Apache Hadoop technology stack
Integrations
Default programming languages and scripts
Automatic provisioning of clusters
Storage is efficient and economical
Hybrid options
From Twitter feeds to industrial sensors, big data is being collected in ever-escalating volumes, at increasingly higher velocities, and in an expanding variety of formats. For big data to provide actionable intelligence or insight, you must not only collect relevant data and ask the right questions; the data must also be accessible, cleaned, analyzed, and then presented in a useful way.
Integrations: HDInsight integrates with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services, as well as web apps and SQL databases.
Languages: the default languages are Java and Python; additional languages can be installed with script actions.
Storage: efficient and economical data storage with Azure Blob storage.
Virtual Network support: HDInsight clusters can be used with Azure Virtual Network to support isolation of cloud resources or hybrid scenarios that link cloud resources with those in your datacenter.
10
Components of HDInsight Clusters
Ambari: Cluster provisioning, management, monitoring, and utilities.
Avro: Data serialization for the Microsoft .NET environment.
Hive & HCatalog: SQL-like querying, and a table and storage management layer.
Mahout: Machine learning.
MapReduce: Legacy framework for Hadoop distributed processing and resource management; YARN is the next-generation resource framework.
Oozie: Workflow management.
Phoenix: Relational database layer over HBase.
Pig: Simpler scripting for MapReduce transformations.
Sqoop: Data import and export.
Tez: Allows data-intensive processes to run efficiently at scale.
YARN: Part of the Hadoop core library and the next generation of the MapReduce software framework.
ZooKeeper: Coordination of processes in distributed systems.
Hive is data warehouse software built on Hadoop that lets you query and manage large datasets in distributed storage by using a SQL-like language called HiveQL. Hive, like Pig, is an abstraction on top of MapReduce. When run, Hive translates queries into a series of MapReduce jobs. Hive is conceptually closer to a relational database management system than Pig, and is therefore appropriate for use with more structured data.
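As a rough illustration of what "translates queries into MapReduce jobs" means, the sketch below hand-writes the map, shuffle, and reduce phases for a query shaped like `SELECT category, COUNT(*) FROM sales GROUP BY category`. It runs in plain Python on a single machine and does not show Hive's actual execution plan; the table name, column name, and rows are made up for the example.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit a (key, 1) pair for every input row."""
    return [(row["category"], 1) for row in rows]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the count per group."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical rows standing in for a "sales" table.
rows = [{"category": "books"}, {"category": "games"}, {"category": "books"}]
counts = reduce_phase(shuffle_phase(map_phase(rows)))
```

On a cluster, the map and reduce functions run on many nodes in parallel and the shuffle moves data between them.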
11
Data Factory
SSIS, Personalized Product Recommendations, Effectiveness of Marketing Campaigns
SSIS in the cloud.
Automates and controls the movement and transformation of data.
Hybrid model, from raw to ready-to-use data.
Online retailers use it to generate personalized product recommendations based on customer browsing behavior. Game studios use it to understand the effectiveness of their marketing campaigns.
Data Factory works across on-premises and cloud data sources and SaaS to ingest, prepare, transform, analyze, and publish your data. Use Data Factory to compose services into managed data flow pipelines that transform your data.
Monitor all of your data flow pipelines from a single unified view to easily pinpoint issues and set up monitoring alerts.
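The idea of composing steps into a pipeline that takes data from raw to ready to use can be sketched in a few lines of Python. This is a conceptual toy, not the Data Factory API: the step functions, the sample values, and the `run_pipeline` helper are all hypothetical.

```python
def ingest(_):
    """Stand-in for reading raw values from a source system."""
    return [" 12.50", "7.25 ", "bad", "30"]


def prepare(raw):
    """Trim whitespace and drop values that are not numbers."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            continue  # Skip rows that cannot be parsed.
    return cleaned


def transform(values):
    """Summarize the cleaned values."""
    return {"count": len(values), "total": sum(values)}


def publish(summary):
    """Stand-in for writing the result to a destination; here it just returns it."""
    return summary


def run_pipeline(steps, data=None):
    """Run each step in order, passing its output to the next, and log step names."""
    log = []
    for step in steps:
        data = step(data)
        log.append(step.__name__)
    return data, log


output, log = run_pipeline([ingest, prepare, transform, publish])
```

The log of completed steps is a very small stand-in for the single monitoring view mentioned in the notes.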
12
Data Catalog Discover, understand, and consume data sources
Insider knowledge, Consumption knowledge, Documentation, SME, Use and understanding
Insider knowledge: you did not know the data source existed until you came across it by accident.
Consumption: what is the connection string or path?
Documentation: where is it, if it exists, and how many versions are there? Is it in a file directory or on SharePoint?
SME: if a user has questions about an information asset, they must locate the expert or team responsible for the data and engage those experts offline; there is no explicit connection between the data and those with expert perspectives on its use.
Creating and maintaining documentation for a data source is complex and time-consuming. Making that documentation readily available to everyone who uses the data source is often even more challenging.
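One way to picture what a catalog entry has to hold in order to address these four problems is a small record with a search function over it. This is a hypothetical sketch in Python, not the Azure Data Catalog data model; every field name and sample value here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str            # What the data source is called (discovery).
    connection: str      # Connection string or path (consumption).
    description: str     # Where the documentation lives, in summary form.
    experts: list = field(default_factory=list)  # Who to ask (SME).
    tags: list = field(default_factory=list)


def search(catalog, term):
    """Return entries whose name, description, or tags mention the term."""
    term = term.lower()
    return [
        entry
        for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or any(term in tag.lower() for tag in entry.tags)
    ]


# Hypothetical entries for illustration only.
catalog = [
    CatalogEntry(
        name="Customer Sales",
        connection="Server=example;Database=Sales",
        description="Daily sales by customer",
        experts=["finance-data-team"],
        tags=["sales", "finance"],
    ),
    CatalogEntry(
        name="Web Clickstream",
        connection="/lake/raw/clickstream/",
        description="Raw browsing events",
        experts=["web-analytics-team"],
        tags=["web"],
    ),
]
matches = search(catalog, "sales")
```

Keeping the experts on the entry itself is what creates the explicit link between the data and the people who understand it.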
13
Data Catalog
14
Intelligentia a apparatus
Wizardry, Magic, Predictive analytics
Latin for "Intelligence from the Machine".
A technique of data science that helps computers learn from existing data in order to forecast future behaviors, outcomes, and trends.
Examples: fraud detection, next best product.
Predictive analytics uses mathematical formulas called algorithms that analyze historical or current data for patterns or trends in order to forecast future events.
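As a minimal example of an algorithm that learns a trend from historical data and uses it to forecast, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. It is one simple technique chosen for illustration, not what Azure Machine Learning does internally, and the monthly sales figures are invented.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
```

Real data is noisier than this perfectly linear series, so a real model would also report how uncertain the forecast is.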
15
Demo
16
QUESTIONS
17
Resources
Data Stream: http://bit.ly/2cuXqj6
HDInsight:
Data Factory:
Data Catalog:
Your Face:
Project Murphy:
Cortana Suite:
Azure free account:
Documentation:
My Blog: SQLMelody.blogspot.ca
My Data Lake:
This is the end of this session; however, additional information on the items mentioned and on this topic in general is listed here.
18
Thank You