1
Deep Into the Cosmos(DB)
Anthony Mattas, Principal Cloud Data Architect, Microsoft
2
About Me
5 Years @ Microsoft
2 Years @ BlueGranite
6 Years @ Stryker Corporation
2x MCSE, 2x MCSA, & MCT
B.Sc. Computational Theory & Algorithm Analysis, Western Michigan University
3
Learning Objectives
Learn to use Databricks to work with data in CosmosDB.
Understand the integration of real-time analytics, using Databricks, into an operational pipeline.
Be enabled to repeat this lab content with customers and peers.
The goal of this session is to introduce a scenario in which customer profile data residing in CosmosDB is scored for propensity to buy using Databricks. You will build this scenario out in the lab session, producing an end-to-end solution that we believe will resonate with customers deploying applications on Azure.
4
Scenario
5
Scenario
Adventure Works, an online retailer of bicycles, components, accessories, and clothing, is exploring ways to drive additional sales to visitors to its website.
We are Adventure Works, an online retailer specializing in bicycles, bike components, accessories, and branded clothing, and we want to increase sales through our website. We have a lot of data on customer demographics and purchase history, and we want to see whether we can use that data and machine learning to predict a site visitor's propensity to purchase a bicycle, then show them targeted promotions or discounts that might lead to more conversions.
6
Scenario
One idea is to use machine learning to display promotions, banners, or discounts customized to the customer's purchase history and demographics when they visit. From a machine learning standpoint, this is not an uncommon use case. However, building the model is only the first step: we want to integrate it into our website operations so we can take advantage of those predictions in production. That is what you will build out in the lab.
7
Scenario
Your job is to integrate a machine learning model into the website using Spark and data in CosmosDB.
8
Current Architecture
Cosmos DB (Profile Data) | SQL DB (Transactions) | CDN | Customer eCommerce Site
Today our website is hosted in Azure App Service. We use Azure CDN to serve the static assets presented on the site. Transactional information like orders is recorded in a SQL database, and customer profile data is maintained in a Cosmos DB collection: multiple stores for different types of assets, based on the requirements of each use case.
9
Current Architecture – Profile Data
"demographics": { "totalchildren": "3", "commutedistance": "0-1 Miles", "numbercarsowned": "3", "education": "Bachelors", "occupation": "Management", "numberchildrenathome": "0", "gender": "F", "maritalstatus": "S", "yearlyincome": "100000", "houseownerflag": "0", "region": "North America", "age": "50" } "bicycle": { "propensity": " " } Profile Data This is an example of the profile data, it gives us some insight into who our customers are in addition to what we already know about their purchasing habits.
10
Expanded Architecture
Cosmos DB (Profile Data) | Machine Learning Model | Customer eCommerce Site | SQL DB (Transactions) | history.txt | Azure Databricks | CDN

We explored the data that was available to us and started with an extract of customer purchase history from our data warehouse, which we made available to our data scientists in our Databricks environment. Since this is our primary transactional system, we didn't want our data scientists to continuously extract data and impact website performance.

In addition to the purchase history, our customer profiles include demographic information that we've collected, either directly or through third parties. By building a model that combines this data with the purchase history, can we predict someone's propensity to buy a bicycle when they visit the website? If the model predicts you are a likely bicycle buyer, maybe we'll target specific advertising or promotions to try to drive a sale.

Calling a machine learning model to score your data on every page hit can become costly and impact the user experience (performance), so our application team worked with our data scientists and data engineers to push the scores back to the profile database as additional attributes, where they can be quickly retrieved.

But what happens if we have a new customer we haven't seen before and don't have a propensity score for? It turns out we have the option of sending new users' information through the model to be scored as soon as their profile is created.
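The write-back step described above amounts to merging the model's score into the existing profile document before upserting it back to Cosmos DB. A minimal sketch of that merge in plain Python (the function name and score formatting are assumptions for illustration; in the lab the actual upsert happens from Databricks through the Cosmos DB Spark connector):

```python
def with_propensity(profile: dict, score: float) -> dict:
    """Return a copy of the profile document with the model's
    propensity-to-buy score stored under the "bicycle" attribute,
    matching the shape of the sample profile document.

    Illustrative sketch: formatting the score as a two-decimal
    string mirrors the string-typed fields in the sample document.
    """
    updated = dict(profile)
    updated["bicycle"] = {"propensity": f"{score:.2f}"}
    return updated

# The website then reads the score with the rest of the profile,
# instead of calling the model on every page hit.
doc = {"demographics": {"age": "50"}, "bicycle": {"propensity": " "}}
scored = with_propensity(doc, 0.82)
```

Copying the document before mutating it keeps the original read from Cosmos DB intact, which matters if the same batch is retried.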
12
Lab
13
Prerequisites
Required:
Azure Subscription
Helpful:
Familiarity with Databricks & CosmosDB
Python & SQL knowledge
Understanding of the Lambda Architecture pattern
To follow along with the lab, you will need access to an Azure subscription and be able to deploy CosmosDB, Databricks, and a storage account within it. We will be doing our work in Python, so familiarity with that language, as well as with SQL/HiveQL, will be important.
14
Modules
Environment Setup
Lab 1 - Setup Databricks Storage
Lab 2 - Initialize Profiles Collection
Lab 3 - Build Propensity Model
Lab 4 - Implement Bulk Batch Scoring
Lab 5 - Implement Incremental Batch Scoring
This lab consists of five modules. Today I'm going to walk through Labs 1-4, since we won't have enough time to get through the fifth. All of this content will be published, and you'll be able to (and I highly encourage you to) walk through it at your own leisure.
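The difference between Lab 4 (bulk) and Lab 5 (incremental) batch scoring comes down to which profiles are sent through the model: everything, or only documents that have not been scored yet. A minimal sketch of that selection logic in plain Python (the helper name and the blank-placeholder convention are assumptions from the sample profile document; the lab implements this at scale with Spark against the Cosmos DB collection):

```python
def select_for_scoring(profiles, incremental=True):
    """Pick which profile documents to run through the model.

    Bulk mode (incremental=False) scores every profile; incremental
    mode only scores profiles whose propensity is still the blank
    placeholder, e.g. newly created customers.
    """
    if not incremental:
        return list(profiles)
    return [
        p for p in profiles
        if not p.get("bicycle", {}).get("propensity", "").strip()
    ]

docs = [
    {"id": "a", "bicycle": {"propensity": " "}},     # new, unscored
    {"id": "b", "bicycle": {"propensity": "0.74"}},  # already scored
]
```

In the incremental case, only customer "a" would be sent to the model, which keeps the recurring scoring job small as the profile collection grows.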
15
Resources
Lab Content: https://amatt.as/CosmosBricksLab
Cosmos DB:
Azure Databricks:
16
Thank You