Azure Cosmos DB What you need to know & concepts


1 Azure Cosmos DB What you need to know & concepts
Satya Jayanty

2 BIG Thanks to SQL Sat Denmark sponsors
GOLD SILVER BRONZE

3 26+ years of IT experience Principal Consultant - D BI A Consulting
Speaking Engagements (115)

4 Let’s begin our ride… fasten your seat belt

5

6 Data: small data to big data…

7 Building globally distributed applications
Data demand: mission-critical applications for a global user base need global distribution, elastic compute and storage, fast and responsive millisecond latency, and data that is durable, consistent and highly available.

8 Data demand: Challenges
Developing highly scalable, globally distributed apps comes with planet-scale challenges: writing globally distributed apps that stay accurate; managing (and version-controlling) complex schemas; scaling throughput and storage; balancing eventual consistency; delivering a highly responsive experience to the customer; and maintaining a worldwide presence with high availability.

9 Goodbye DocumentDB… Welcome Azure Cosmos DB…
Cosmos DB is Azure’s NoSQL database-as-a-service: born in the cloud, globally distributed, highly scalable and highly available. It is the first and only globally distributed, multi-model database system.

10 Azure Cosmos DB: Turnkey global distribution
A globally distributed, massively scalable, multi-model database service. APIs: SQL, MongoDB, Table API. Data models: document, column-family, key-value, graph. Turnkey global distribution; elastic scale-out of storage and throughput; guaranteed low latency at the 99th percentile; comprehensive SLAs; five well-defined consistency models.
Turnkey global distribution: you can distribute your data to any number of Azure regions with the click of a button. This puts your data where your users are, giving your customers the lowest possible latency. Using Azure Cosmos DB's multi-homing APIs, the app always knows where the nearest region is and sends requests to the nearest data center, with no configuration changes. You set your write region and as many read regions as you want, and the rest is handled for you. As you add and remove regions in your Azure Cosmos DB database, your application does not need to be redeployed and stays highly available thanks to the multi-homing APIs.
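The multi-homing behaviour described above can be pictured with a toy model: the client keeps the latency it has measured to each read region and routes a read to the available region with the lowest latency. This is a minimal sketch under stated assumptions; the `pick_read_region` helper, the region names and the latency figures are hypothetical, and the real SDKs handle routing through their own connection settings rather than through a function like this.

```python
from typing import Dict, Set


def pick_read_region(latency_ms: Dict[str, float], unavailable: Set[str]) -> str:
    """Return the available region with the lowest measured latency.

    Hypothetical helper: a toy model of multi-homing, not an Azure SDK API.
    """
    candidates = {region: ms for region, ms in latency_ms.items() if region not in unavailable}
    if not candidates:
        raise RuntimeError("no read region is available")
    # Choose the region whose measured latency is smallest
    return min(candidates, key=lambda region: candidates[region])


if __name__ == "__main__":
    # Hypothetical latencies measured from a client in Copenhagen
    measured = {"North Europe": 24.0, "West Europe": 18.0, "East US": 95.0}
    print(pick_read_region(measured, unavailable=set()))            # West Europe
    print(pick_read_region(measured, unavailable={"West Europe"}))  # North Europe
```

Adding or removing a region only changes the dictionary, which mirrors the point that the application does not need to be redeployed when regions change.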

11 Azure Cosmos DB: Global distribution
Transparent and automatic multi-region replication. Associate any number of regions with your database account, at any time. Policy-based geo-fencing. Multi-homing APIs: all endpoints are logical by default, so apps don’t need to be redeployed during a regional failover; apps can also access physical endpoints if needed. Support for both manual and automatic failover. Designed for high availability: simulate regional disasters via the API, set priorities for regions dynamically, and test the end-to-end availability of the entire app (beyond just the database).
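Priority-based failover can be modelled as walking an ordered list of regions and promoting the first one that is still healthy. The `next_write_region` helper and the region names are hypothetical; this is a sketch of the idea of region priorities, not the service's failover algorithm.

```python
from typing import List, Set


def next_write_region(priorities: List[str], unavailable: Set[str]) -> str:
    """Return the highest-priority region that is still available.

    Hypothetical helper: priorities[0] is the current write region and the
    remaining entries are read regions in their configured failover order.
    """
    for region in priorities:
        if region not in unavailable:
            return region
    raise RuntimeError("all regions are unavailable")


if __name__ == "__main__":
    order = ["West Europe", "North Europe", "East US"]
    print(next_write_region(order, unavailable=set()))                            # West Europe
    print(next_write_region(order, unavailable={"West Europe"}))                  # North Europe
    print(next_write_region(order, unavailable={"West Europe", "North Europe"}))  # East US
```

Setting priorities dynamically corresponds to reordering `order`; simulating a regional disaster corresponds to adding a region to `unavailable`.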

12 Azure Cosmos DB: Elastic scale-out
Partition management is taken care of for you automatically. Scale storage and throughput independently across regions: storage from gigabytes to petabytes, and throughput from hundreds to hundreds of millions of requests per second. Dial throughput down and provision only what is needed.

13 Azure Cosmos DB: What are Request Units (RUs)?
Request Units are provisioned per second (RU/s). RUs are a rate-based currency: a normalized number representing the CPU, memory and IO operations a request consumes, that is, compute reserved for processing operations. You reserve throughput in increments of 100 RU/s. The minimum is 400 RU/s for a fixed-size (10 GB) container and 1,000 RU/s for an unlimited container.

14 Azure Cosmos DB: Calculating Request Units (RUs) for CRUD operations and queries
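Because RUs are a per-second rate reserved in increments of 100, a rough capacity estimate is: multiply each operation's rate by its RU charge, sum, round up to the next 100, and apply the minimum. This sketch assumes you already know the per-operation charges (for example, from the request charge the service reports for each call); the `estimate_rus` helper and the workload figures in the example are hypothetical, and the default minimum is the 400 RU/s fixed 10 GB case quoted above.

```python
import math
from typing import Dict, Tuple


def estimate_rus(workload: Dict[str, Tuple[float, float]], minimum: int = 400, step: int = 100) -> int:
    """Estimate provisioned RU/s for a workload (hypothetical helper).

    workload maps an operation name to (operations per second, RU charge per operation).
    The total is rounded up to a multiple of `step` and never falls below `minimum`.
    """
    raw = sum(ops_per_sec * ru_per_op for ops_per_sec, ru_per_op in workload.values())
    rounded = math.ceil(raw / step) * step
    return max(minimum, rounded)


if __name__ == "__main__":
    # Hypothetical workload; measure real per-operation charges for your own data
    workload = {
        "point_read": (500, 1.0),  # 500 reads/s at 1 RU each
        "insert": (100, 5.0),      # 100 writes/s at 5 RU each
        "query": (20, 10.5),       # 20 queries/s at 10.5 RU each
    }
    print(estimate_rus(workload))  # 500 + 500 + 210 = 1210, rounded up to 1300
```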

15 Azure Cosmos DB: Guaranteed single-digit millisecond latency
Reads and writes are served from local regions, with guaranteed millisecond latency worldwide. A write-optimized, latch-free database engine on automatically indexed SSD storage performs synchronous, automatic indexing at sustained ingestion rates.

16 Azure Cosmos DB: Choice of five consistency levels
EVENTUAL: Getting whatever you can, whenever you can, as fast as you can. When you choose the eventual model, you’re saying it doesn’t matter in what order data is read as long as something is available. Data fetched under the eventual model has the lowest latency for both reads and writes, but also the weakest consistency. The eventual model favors app performance over data consistency or write order, so it is great for apps that live and die by their availability. Product reviews have to be available when customers want them, but it’s not crucial that the reviews always include the latest ratings or preserve the order of the ratings. Social media wall posts (the initial post itself, not the comments on it) just need to show up eventually: users care more about seeing activity when they’re on the site than about seeing the order of that activity, and it’s okay if posts later reorder or repopulate in their feed as long as there’s something new to see now. Transaction receipts don’t necessarily need to be available immediately after purchase, as long as they show up within a reasonable window of time.
SESSION: Putting the individual app user’s experience front and center. The session model prioritizes the user’s interaction by guaranteeing highly available and consistent data throughout that particular session. Session consistency provides predictable read-your-own-writes consistency for a given session, with maximum read throughput and low-latency writes and reads. Consistency within a given session is strong, while consistency outside that session is eventual. The session model is great for apps that need a logical, real-time experience for the user. Profile updates a user writes to her account must be immediately available for her to read, whereas it’s less important for her to read profile updates other users are writing at the same time. Social music apps such as Spotify need to be consistent with a user’s playlist preferences as she builds them, but the preferences don’t have to show up right away for everyone else who is “following.”
STRONG: Getting perfect data every time, no matter how long it takes. The strong model favors data consistency above all else and preserves the order in which data is written; it guarantees your app users will see all previous writes. A write is acknowledged only after it has been durably committed by a majority quorum of replicas, so reads always return the most recent committed version, at the cost of higher write latency. The strong model is great if your app users need to read the absolute truth every time. Bank accounts need to reflect the order of transactions and an accurate balance, so team members in different offices don’t pay the same bill twice. Payment processing for online orders needs to happen in the correct order, especially to avoid charging for the same order more than once. Reservation systems must show accurate availability when customers finalize their booking.
BOUNDED STALENESS: Fetching data that’s not “too old” to boost performance. The bounded staleness model gives relatively accurate data in a more reasonable time frame than the strong model. When you choose it, you’re saying it’s okay for apps to read older data from local replicas, provided it lags the latest write by no more than K versions (or a time interval T). It is great for apps that can afford a little lag in exchange for data consistency. Flight status apps estimate arrival times from GPS data collected from planes in flight; the GPS data doesn’t have to be the most up to date to give a reasonable estimate, and it’s more important that users get information when they need it. Package tracking apps for a shipping company need to show chronologically ordered checkpoints of where and when a package was received.
CONSISTENT PREFIX: Preserving the order of writes without too much concern for how old the data is. The consistent prefix model favors performance and availability without sacrificing the sequence of events, by fetching older data fast. When you choose it, you’re saying it’s okay to give your app users old data as long as what they read observes the actual sequence of writes. This differs from the eventual model in that it reflects the order in which writes occurred. Baseball score updates running at the bottom of ESPN must appear in the order they occurred during the game, at the expense of being up-to-the-minute accurate. Social media comments must be ordered to preserve the back-and-forth of a dialogue and make sense to readers, but reads do not need to be fully up to date. As a result, the cost of read operations (in terms of system resources) is lower than with session, bounded staleness and strong.
Five well-defined, practical and intuitive consistency models provide a spectrum from strong, SQL-like consistency all the way to relaxed, NoSQL-like eventual consistency, and everything in between.
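The differences between the five models can be made concrete with a toy checker over a single ordered log of writes. These functions are a conceptual model only (hypothetical names, one writer, uniquely labelled writes), not how the service implements consistency: eventual allows any subset in any order; consistent prefix requires a prefix of the log; bounded staleness also caps the lag at K versions; session requires a prefix that includes the reader's own writes; strong requires everything committed so far.

```python
from typing import Sequence


def is_prefix(log: Sequence[str], observed: Sequence[str]) -> bool:
    """True if `observed` equals the first len(observed) entries of `log`."""
    return list(observed) == list(log[: len(observed)])


def eventual_ok(log: Sequence[str], observed: Sequence[str]) -> bool:
    # Any subset of the writes, in any order; replicas only converge over time
    return set(observed) <= set(log)


def consistent_prefix_ok(log: Sequence[str], observed: Sequence[str]) -> bool:
    # Writes may be missing, but are never reordered or skipped
    return is_prefix(log, observed)


def bounded_staleness_ok(log: Sequence[str], observed: Sequence[str], k: int) -> bool:
    # A prefix that lags the latest write by at most k versions
    return is_prefix(log, observed) and len(log) - len(observed) <= k


def session_ok(log: Sequence[str], observed: Sequence[str], own_writes: Sequence[str]) -> bool:
    # Read-your-own-writes: every write made in this session is visible
    return is_prefix(log, observed) and set(own_writes) <= set(observed)


def strong_ok(log: Sequence[str], observed: Sequence[str]) -> bool:
    # The most recent committed write is always visible
    return list(observed) == list(log)


if __name__ == "__main__":
    log = ["w1", "w2", "w3", "w4"]
    print(eventual_ok(log, ["w3", "w1"]))                    # True
    print(consistent_prefix_ok(log, ["w3", "w1"]))           # False: out of order
    print(bounded_staleness_ok(log, ["w1"], k=2))            # False: 3 versions behind
    print(session_ok(log, ["w1", "w2"], own_writes=["w2"]))  # True
    print(strong_ok(log, ["w1", "w2", "w3"]))                # False
```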

17 Azure Cosmos DB: Enterprise-level SLAs
The only service with financially backed SLAs for millisecond latency at the 99th percentile, 99.99% high availability (HA), and guaranteed throughput and consistency.
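To make the 99.99% availability figure concrete, the arithmetic below converts an availability percentage into the downtime it allows over a period. It is plain arithmetic under the assumption of a 30-day month and a 365-day year; the SLA itself defines its own measurement rules.

```python
def allowed_downtime_minutes(availability_percent: float, period_minutes: float) -> float:
    """Maximum downtime permitted by an availability target over a period."""
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    year = 365 * 24 * 60  # 525,600 minutes in a 365-day year
    print(round(allowed_downtime_minutes(99.99, month), 2))  # 4.32
    print(round(allowed_downtime_minutes(99.99, year), 2))   # 52.56
```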

18 Azure Cosmos DB: Multi-model + multi-API
The database engine operates on an atom-record-sequence (ARS) based type system, and all data models are efficiently translated to ARS. APIs and wire protocols are supported via extensible modules. An instance of a given data model can be materialized as a tree. Models: graph, documents, key-value, column-family, … more to come.
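One way to picture the atom-record-sequence idea is to walk a JSON document and label every node as an atom (a scalar), a record (a set of named fields) or a sequence (an ordered list), which yields the tree the slide mentions. This is only a conceptual illustration; the `to_ars` function and its tuple-based output are hypothetical and are not the engine's internal format.

```python
import json
from typing import Any


def to_ars(value: Any) -> Any:
    """Convert parsed JSON into a tree of ("atom" | "record" | "sequence", payload) nodes."""
    if isinstance(value, dict):
        return ("record", {key: to_ars(child) for key, child in value.items()})
    if isinstance(value, list):
        return ("sequence", [to_ars(child) for child in value])
    # Strings, numbers, booleans and null are all atoms
    return ("atom", value)


if __name__ == "__main__":
    doc = json.loads('{"id": "1", "tags": ["a", "b"], "address": {"city": "Copenhagen"}}')
    print(to_ars(doc))
```

A graph vertex, a table row or a document would each map onto the same three node kinds, which is the sense in which every data model is translated to ARS.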

19 Common uses for a graph database
Social networks, recommender systems, logistics (e.g. flights), IoT (Internet of Things), fraud network detection, and many other scenarios…

20 2/19/2019 6:25 AM Resource Models © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

21 Resource Model
Leveraging Azure Cosmos DB to automatically scale your data across the globe. This module references partitioning in the context of all Azure Cosmos DB modules and APIs. Hierarchy: Account > Database > Container > Item.

22 Account URI and Credentials
Account URI: ********.azure.com; key: IGeAvVUp … (Account > Database > Container > Item)

23 Creating Account (Account > Database > Container > Item)

24 Database Representations
Account > Databases > Containers > Items. A collection is a container of JSON documents and the associated JavaScript application logic. Collections can span one or more partitions/servers and can scale to handle practically unlimited volumes of storage or throughput.

25 Container Representations
Account > Database > Container > Item, where a container is surfaced as a collection, a graph or a table.

26 Creating Collections – SQL API
Account > Database > Container > Item

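27 Container-Level Resources
A container (Account > Database > Container) holds items and container-level resources: stored procedures (sprocs), triggers, user-defined functions (UDFs) and conflicts. A conflict is the conflicting resource resulting from a concurrent asynchronous operation in the Azure Cosmos DB service.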
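The Account > Database > Container > Item hierarchy, together with the container-level resources, is visible in the resource links of the Cosmos DB REST API, which take the form dbs/{db}/colls/{coll}/docs/{id}, with sprocs, triggers, udfs and conflicts as sibling segments under a collection. The `resource_link` helper below is hypothetical: it only formats those path strings and does not contact the service. The database, collection and resource names in the example are made up.

```python
# REST path segment for each container-level resource type
CHILD_SEGMENTS = {
    "item": "docs",
    "sproc": "sprocs",
    "trigger": "triggers",
    "udf": "udfs",
    "conflict": "conflicts",
}


def resource_link(database: str, container: str, kind: str = "", resource_id: str = "") -> str:
    """Build a link such as dbs/{database}/colls/{container}/docs/{resource_id}.

    Hypothetical helper: it only formats strings and does not call the service.
    """
    link = f"dbs/{database}/colls/{container}"
    if kind:
        link += f"/{CHILD_SEGMENTS[kind]}/{resource_id}"
    return link


if __name__ == "__main__":
    print(resource_link("store", "orders"))                         # dbs/store/colls/orders
    print(resource_link("store", "orders", "item", "order-42"))     # dbs/store/colls/orders/docs/order-42
    print(resource_link("store", "orders", "sproc", "bulkImport"))  # dbs/store/colls/orders/sprocs/bulkImport
```

The "colls" segment also shows why the slides equate a container with a collection: the REST API still names the container level after collections.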

28 System Topology (Behind the Scenes)
Planet Earth > Azure regions > Datacenters > Stamps > Fault domains > Cluster > Machine > Replica > Database engine. A stamp is a cluster of servers hosted in an Azure datacenter. A fault domain (FD) is essentially a rack of servers that share subsystems such as network, power and cooling. Components shown: container, Resource Manager, language runtime(s), hosts, query processor, RSM, index manager, Bw-tree++/LLAMA++, log manager, IO manager, resource governor, transport, database engine, admission control and various agents.

29 Resource Hierarchy
Containers: logical resources “surfaced” to APIs as tables, collections or graphs, made up of one or more physical partitions or servers. Tenants store user-defined (arbitrary) JSON content.
Resource partitions: consistent, highly available and resource-governed coordination primitives. They consist of replica sets (leader, followers and a forwarder that forwards to remote resource partition(s)), with each replica hosting an instance of the database engine.
An attachment is a special document containing references and associated metadata for external blob/media. The developer can choose to have the blob managed by Cosmos DB or store it with an external blob service provider such as OneDrive, Dropbox, etc.
Role-Based Access Control (RBAC).

30 What sets Azure Cosmos DB apart
Relational database: logical structure Instance > Database (sp, trigger, udf) > Tables > Records; API: T-SQL (relational, with XML, JSON and graph built on top); transactions can cross tables and databases, written in T-SQL as a module or ad hoc.
Azure Cosmos DB: logical structure Account > Database > Collection (sp, trigger, udf) > Documents; APIs: document database, MongoDB compatibility, key-value (Azure Table), graph (Gremlin), wide column (future…); transactions are scoped to a container (fixed) or a single partition (unlimited), written in JavaScript as a module.

31 Cosmos DB Container (e.g. Collection)
Partitions: each document has a partition key and a row key (id), which together uniquely identify it. The partition key acts as a logical partition for your data and gives Azure Cosmos DB a natural boundary for distributing data across physical partitions. The data for a single logical partition must reside inside a single physical partition, and physical partitions are managed by Azure Cosmos DB. Example: partition key = User ID. The container (e.g. a collection) is the logical partitioning abstraction; behind the scenes it is a set of physical partitions, and hash(User ID) gives a pseudo-random distribution of data over the range of possible hashed values.
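The hash-based mapping from partition key to physical partition can be sketched by hashing each key onto a fixed range and letting each physical partition own a contiguous slice of it. This is an illustration only: MD5 truncated to 32 bits, the four equal ranges, and the `key_hash`/`partition_for` helpers are assumptions chosen for readability, not the hash function or range layout the service actually uses.

```python
import hashlib
from bisect import bisect_right
from typing import List

HASH_SPACE = 2 ** 32  # size of the toy hash range [0, 2^32)


def key_hash(partition_key: str) -> int:
    """Map a partition key to a pseudo-random point in [0, HASH_SPACE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def partition_for(partition_key: str, range_starts: List[int]) -> int:
    """Index of the physical partition whose range contains the key's hash.

    range_starts must be sorted and begin with 0; partition i owns
    [range_starts[i], range_starts[i + 1]).
    """
    return bisect_right(range_starts, key_hash(partition_key)) - 1


if __name__ == "__main__":
    starts = [i * HASH_SPACE // 4 for i in range(4)]  # four equal ranges
    for user in ["Dharma", "Shireesh", "Karthik", "Rimma", "Alice", "Carol"]:
        print(user, "-> partition", partition_for(user, starts))
```

Because the hash depends only on the key, every item with the same User ID lands in the same physical partition, which is why one logical partition never spans two physical ones.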

32 Partitions
hash(User ID) gives a pseudo-random distribution of data over the range of possible hashed values. Partition ranges can be dynamically sub-divided to seamlessly grow the database as the application grows, while maintaining high availability; in the diagram, partition x (users such as Dharma, Shireesh, Karthik, Rimma, Alice and Carol) is split into partitions x1 and x2. Partition management is fully handled by Azure Cosmos DB, so you don't have to write code or manage your partitions. If you are implementing a multitenant application using Azure Cosmos DB, there are two popular designs to consider: one partition key per tenant, and one container per tenant.
Partition key design: ballpark your scale needs (size/throughput); understand the workload (reads/sec vs writes/sec); use the Pareto principle (80/20 rule) to optimize the bulk of the workload; for reads, understand the top 3-5 queries (look for common filters); for writes, understand transactional needs.
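Dynamic sub-division of a partition range can be modelled as replacing one hash range with its two halves: every key in the old range moves to exactly one of the two new partitions, and keys in other ranges are untouched. The `split_range` and `owner` helpers, the tiny 0-200 hash space and the midpoint split are assumptions for illustration; the service decides on its own when and where to split.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open hash range [start, end)


def split_range(ranges: List[Range], index: int) -> List[Range]:
    """Replace ranges[index] with its two halves (e.g. partition x -> x1, x2)."""
    start, end = ranges[index]
    if end - start < 2:
        raise ValueError("range too small to split")
    mid = start + (end - start) // 2
    return ranges[:index] + [(start, mid), (mid, end)] + ranges[index + 1:]


def owner(ranges: List[Range], point: int) -> int:
    """Index of the range that contains a hash value."""
    for i, (start, end) in enumerate(ranges):
        if start <= point < end:
            return i
    raise ValueError("hash value outside all ranges")


if __name__ == "__main__":
    ranges = [(0, 100), (100, 200)]  # partitions x and y in a tiny hash space
    after = split_range(ranges, 0)   # x is split into x1 and x2
    print(after)                     # [(0, 50), (50, 100), (100, 200)]
    print(owner(after, 30), owner(after, 70), owner(after, 150))  # 0 1 2
```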

33 How is latency addressed?
Guaranteed low latency at P99 (99th percentile). Requests are served from the local region, and reads and writes are replicated according to the selected consistency model. Single-digit millisecond latency worldwide: under 10 ms for reads and under 15 ms for writes. Synchronous automatic indexing at sustained ingestion rates: data is written, committed, replicated and indexed in under 15 ms. A write-optimized, latch-free database engine designed for SSD.
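Because the latency guarantee is stated at the 99th percentile rather than as an average, measured latencies are best checked the same way. The sketch below uses the nearest-rank percentile definition on client-side request latencies; the `percentile` helper and the sample values are hypothetical, and the 10 ms threshold is the read figure quoted above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical read latencies in milliseconds: 99 fast reads and one slow outlier
    reads_ms = [4.0] * 99 + [40.0]
    p99 = percentile(reads_ms, 99)
    print(p99, "ms at P99; within the 10 ms read target:", p99 < 10)
```

Note that the single 40 ms outlier raises the mean to 4.36 ms yet leaves P99 at 4.0 ms, which is why percentile-based SLAs are checked with percentiles.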

34 Monitor with Metrics, Monitor SLAs
How/what to monitor: understand how many requests are succeeding or causing errors; determine the throughput distribution across partitions; determine the storage distribution across partitions; compare data size against index size; debug why queries are running slow (with the SQL API SDKs, Azure Cosmos DB provides query execution statistics). Performance metrics are on the Azure portal Metrics page, where you can also set up alerts. To access additional metrics, use the Azure Monitor SDK; the available metric definitions can be retrieved.

35 What sets Azure Cosmos DB apart
Dynamically set priorities for regions, and test the end-to-end availability of the entire app (beyond just the database) by simulating regional disasters via the API. Free emulator for testing: Azure Cosmos DB Emulator. Reserved capacity: save money by pre-paying for Azure Cosmos DB resources for one or three years (availability-of-azure-cosmos-db-reserved-capacity/). Policy-based failover, both manual and automatic. A foundational Azure service, available in all Azure regions by default, so you can replicate your data to as many data centres in as many regions as you need, specify priorities for regions, and decide where to fail over, manually or automatically.

36 Schema-agnostic, automatic indexing
At global scale, schema and index management is painful. All properties are indexed by default, automatically and synchronously; be mindful of the space and RU consumption this implies. Index types: hash, range and geospatial. Works across every data model, on a highly write-optimized database engine.
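"All properties indexed by default" means every path in every item becomes indexable without a schema. One way to see which paths that covers is to flatten a document into its property paths. The /property/[]/? notation below mimics the path syntax of Cosmos DB indexing policies, but the `index_paths` helper is hypothetical and ignores index kinds (hash, range, spatial).

```python
from typing import Any, List


def index_paths(value: Any, prefix: str = "") -> List[str]:
    """List the scalar paths of a JSON value, e.g. /address/city/? or /tags/[]/?."""
    if isinstance(value, dict):
        children = [(f"{prefix}/{key}", child) for key, child in value.items()]
    elif isinstance(value, list):
        children = [(f"{prefix}/[]", child) for child in value]
    else:
        return [f"{prefix}/?"]
    paths: List[str] = []
    for child_prefix, child in children:
        for path in index_paths(child, child_prefix):
            if path not in paths:  # elements of one array share a single path
                paths.append(path)
    return paths


if __name__ == "__main__":
    item = {"id": "1", "tags": ["a", "b"], "address": {"city": "Copenhagen", "zip": "2100"}}
    for p in index_paths(item):
        print(p)
```

A new property added to a later document simply produces a new path, which is why no schema or index maintenance is needed as documents evolve.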

37 Security & Compliance
Always encrypted at rest and in motion (HTTPS). Fine-grained, “row-level” authorization. Database access control with master keys and resource tokens. Network security with IP firewall rules. Comprehensive Azure compliance certification: ISO 27001, ISO 27018, EUMC, HIPAA, PCI, SOC 1 and SOC 2.

38 Lowest Total Cost of Ownership (TCO)
Core cloud properties and economies of scale. Pricing is at the container level, and cost is split into RUs and storage; billing is adjusted dynamically based on provisioned RUs and charged per hour. Significantly cheaper than DynamoDB, Cassandra, Cloud Spanner and MongoDB. A multi-tenant service with resource governance, designed from the ground up as a multi-tenant service with end-to-end resource governance to provide performance isolation. Fully managed as a service, so no dev/ops expenses are needed.
Backup & restore: coupled with the Azure account. Backups are automated every 4 hours, and the latest 2 backups are stored (in Azure Storage). No self-service restore is available (at this point).
[Chart: relative cost of Cosmos DB, DynamoDB and on-premises MongoDB/Cassandra, with $, 3x and 10x cost levels]
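Since the bill is split into provisioned throughput and storage and charged per hour, a monthly estimate is simple arithmetic: throughput is billed per hour for each 100 RU/s provisioned, and storage per GB per month. The `monthly_cost` helper is hypothetical, the 730-hour month is the usual billing convention, and the rates in the example are placeholders, not actual Azure prices.

```python
def monthly_cost(ru_per_sec: int, storage_gb: float, rate_per_100_ru_hour: float,
                 rate_per_gb_month: float, hours: int = 730) -> float:
    """Estimate a monthly bill: hourly throughput charge per 100 RU/s plus storage per GB-month."""
    throughput = (ru_per_sec / 100) * rate_per_100_ru_hour * hours
    storage = storage_gb * rate_per_gb_month
    return throughput + storage


if __name__ == "__main__":
    # Placeholder rates for illustration only, not actual Azure prices
    cost = monthly_cost(ru_per_sec=1000, storage_gb=50,
                        rate_per_100_ru_hour=0.01, rate_per_gb_month=0.25)
    print(f"{cost:.2f}")  # 10 * 0.01 * 730 + 50 * 0.25 = 73.00 + 12.50 = 85.50
```

Dialling throughput down for part of the month reduces only the first term, which is the practical meaning of "pay by the hour for only what you need".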

39 Common Scenarios

40 E-Commerce – Product Catalog & Order Processing
Business needs: elastic scale to handle seasonal traffic (e.g. Black Friday); low-latency access across multiple geographies to support a global user base and latency-sensitive workloads (e.g. real-time personalization); schema-agnostic storage and automatic indexing to handle diverse product catalogs, orders and events; high availability across multiple data centers.
Gaming business needs: elastic scale to handle bursty traffic on day; low-latency queries to support responsive gameplay for a global user base; schema-agnostic storage and indexing that let teams iterate quickly to fit a demanding ship schedule; change feeds to support leaderboards and social gameplay.

41 Multiplayer Gaming - Xbox
Architecture: Azure Cosmos DB (game database), Azure HDInsight (game analytics), Azure CDN, Azure API Apps (game backend), Azure Storage (game files), Azure Notification Hubs (push notifications), Azure Functions, Azure Traffic Manager.

42 Cosmos DB: In Summary (Benefits, Pricing…)
Multi-platform API: MongoDB API, SQL, Azure Tables, Gremlin. Five consistency models, whereas most other database systems offer only two: eventual consistency and strong consistency. Competition: Google Cloud Platform (Cloud Spanner) and Amazon DynamoDB. Pricing: pay by the hour and change throughput at any time, paying only for what you need. Request Units (RUs) capacity planner: estimate Request Units and data storage.

43 …and References
Try Cosmos DB for free; Training; DB documentation; Truly Cosmic; Data Migration Tool; CosmosDB: Query Cheat Sheet; GitHub Samples; Foundations of Azure Cosmos DB with Dr. Leslie Lamport

44 Azure Cosmos DB What you need to know & concepts
Thank you Satya Jayanty

45 Raffle and goodbye beer
Remember to visit the sponsors, and stay for the raffle and goodbye beers. Join our sponsors for a lunch break session in cust 0.01 and cust 1.06. We hope you’ll all have a great Saturday. Regis, Kenneth


