1
Hello My Friends, Welcome to CosmosDB
Peter Shore SQL Saturday Cincinnati
2
About Me
- Database Administrator, Thirty-One Gifts
- Intentionally Accidental DBA
- Over 20 years IT experience
  - Server Engineer
  - Desktop Engineer
  - Network Infrastructure
  - Desk side support
- President, CBusPASS
- Co-Organizer, SQL Saturday Columbus
- How to find me
4
Warning
- MongoDB, GraphDB, Gremlin, Cassandra, Key-Value, JSON, XML, NoSQL
5
Relational “vs.” NoSQL: Relational Pros
- Data integrity
  - ACID
    - Atomic: all or none of a transaction is applied
    - Consistency: a transaction leaves the data in a valid state, never half finished
    - Isolation: transactions are separated from each other until finished
    - Durability: pending changes are tracked for recovery from abnormal termination
- Standardized access via ANSI SQL
- Structured
- Performance
  - Scale up
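The atomicity point above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the `accounts` table, the names, and the balances are made up for illustration, and SQLite stands in for any relational engine. A transfer credits one account and debits another; when the debit would break the CHECK constraint, the whole transaction is rolled back, including the credit that had already succeeded.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Raises sqlite3.IntegrityError if the balance would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # rolled back, credit included
final = balances(conn)
```

After the failed transfer the balances are the same as before it, which is the "all or none" behavior the slide describes.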
6
Relational “vs.” NoSQL: Relational Cons
- Complexity
  - Normalization
  - Data types
  - Maintenance
- Performance
  - Abundance of information sources
  - Scale up
7
Relational “vs.” NoSQL
- Pros of NoSQL
  - No formal structure
  - Flexibility
  - Performance
    - Scale out
    - Large volumes of data
  - Minimal maintenance
- Cons of NoSQL
  - Lack of ACID
    - Eventually consistent
  - Lack of formal structure
  - Lack of standardization
8
What is CosmosDB? Microsoft says*
- Globally distributed
- Multi-model database
- Multi-master
- Elastically and independently scale
  - Throughput
  - Storage
- Across any number of Azure's geographic regions
- Offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs). *
9
SLAs
10
What is CosmosDB? Globally Distributed
- Across any number of Azure regions
  - Regions are sets of data centers
- Tier 1 Azure application
- Add and remove regions from CosmosDB
  - No application redeployment
  - Highly available
- Data where the users are
- CosmosDB’s multi-homing API
  - Requests go to the nearest data center
  - No configuration changes required
12
What is CosmosDB? Multi-model
- Graph
  - Use graph structures to store data
  - Graph structures are made up of nodes, edges, and properties
    - Nodes represent entities
    - Edges represent relationships
    - Properties are germane information to nodes
  - Key concept: the graph (or edge or relationship)
    - Directly relates items in the data store
    - Links allow data in the store to be connected directly and often accessed in one operation
  - Uses Gremlin API in CosmosDB
- Example uses
  - Fraud Detection & Analytics Solution
  - Knowledge Graph
  - Network and Database Infrastructure Monitoring
  - Recommendation Engine & ...
  - Master Data Management
  - Social Media and Social Network Graphs
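The node, edge, and property vocabulary on this slide can be sketched in a few lines of Python. This is an illustrative in-memory structure, not the CosmosDB Gremlin API; the class name `PropertyGraph`, its methods, and the sample data are all hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes and edges both carry a dict of properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # source id -> [(label, target id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.edges[source].append((label, target, properties))

    def out(self, node_id, label):
        """Follow outgoing edges with the given label, one hop."""
        return [t for (lbl, t, _) in self.edges.get(node_id, []) if lbl == label]


# Hypothetical recommendation-style data.
g = PropertyGraph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("laptop", kind="product")
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "bought", "laptop")

# "What did the people Alice knows buy?" is two hops along edges.
# In Gremlin this is roughly: g.V('alice').out('knows').out('bought')
recommended = [
    item for friend in g.out("alice", "knows") for item in g.out(friend, "bought")
]
```

The query walks relationships directly instead of joining tables, which is the "connected directly and often accessed in one operation" idea from the slide.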
14
What is CosmosDB? Multi-model continued
- Key-value database (aka key-value store)
  - Designed for storing, retrieving and managing associative arrays
  - Associative arrays are known as dictionaries or hash tables
  - Collections of objects/records
  - Records are stored and retrieved using keys
  - Records may contain different fields
  - Each record treated as an opaque collection
- Example uses
  - Phone directory
  - Stock trading (symbol & value)
  - Artist info
  - IP forwarding table
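A key-value store as described on this slide is essentially a dictionary whose values the store never looks inside. The sketch below assumes a hypothetical `KeyValueStore` class and invented symbols and prices; it is meant only to show that records are opaque to the store and may carry different fields.

```python
import json


class KeyValueStore:
    """Toy key-value store: values are opaque bytes the store never inspects."""

    def __init__(self):
        self._records = {}  # a dict is Python's associative array

    def put(self, key, value: bytes):
        self._records[key] = value

    def get(self, key, default=None):
        return self._records.get(key, default)

    def delete(self, key):
        self._records.pop(key, None)


# Hypothetical stock symbols and values.
store = KeyValueStore()
store.put("AAA", json.dumps({"price": 10.5}).encode())
store.put("BBB", json.dumps({"price": 7, "currency": "USD"}).encode())

# Only the caller knows how to decode a record; the store just returns bytes.
quote = json.loads(store.get("BBB"))
missing = store.get("ZZZ")
```

Because the store cannot see inside a value, every lookup has to go through the key; there is no way to ask it for "all records priced above 8".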
15
What is CosmosDB? Multi-model continued
- MongoDB
- Documents
  - Not inherently opaque
  - Relies on inner structure of documents to extract metadata
  - Can be XML, JSON, YAML, or BSON
  - Also binary formats such as PDF or Office documents
  - Core operations are Create, Retrieve, Update, Delete (CRUD)
  - Documents are addressed by a unique key which is indexed
- Organized
  - Collections
  - Tags/non-visible metadata
  - Directories
- Variable length documents
- JSON/BSON
- Performance
  - Scale out
- Dynamic
- Example uses
  - Operational intelligence, log storage
  - Product catalogs
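In contrast to a key-value store, a document store can query on the inner structure of its documents. The following is a minimal in-memory sketch of the CRUD operations named on this slide; `DocumentCollection`, its method names, and the product data are hypothetical and do not represent the MongoDB or CosmosDB client APIs.

```python
import copy


class DocumentCollection:
    """Toy document collection keyed by each document's unique "id"."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        if doc["id"] in self._docs:
            raise KeyError(f"duplicate id: {doc['id']}")
        self._docs[doc["id"]] = copy.deepcopy(doc)

    def retrieve(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, **changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def find(self, field, value):
        """Query on inner structure; documents lacking the field do not match."""
        return [d for d in self._docs.values() if d.get(field) == value]


# Hypothetical product catalog; documents need not share the same fields.
catalog = DocumentCollection()
catalog.create({"id": "p1", "name": "Tote bag", "category": "bags", "colors": ["red", "blue"]})
catalog.create({"id": "p2", "name": "Lunch box", "category": "kitchen"})
catalog.create({"id": "p3", "name": "Gift card", "amount": 25})

catalog.update("p2", category="bags")
catalog.delete("p3")
bags = sorted(d["id"] for d in catalog.find("category", "bags"))
```

No schema was declared up front, yet `find` can still filter on `category` because the store understands the document structure.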
16
What is CosmosDB? Multi-model continued
- Table
  - API for applications written to use Azure Table storage
- Column-family (Cassandra data model)
  - Maximize writes and data duplication
  - Minimize partitions used
- SQL API
  - Document database
  - Accessible with familiar SQL syntax
  - Schema not required
17
What is CosmosDB? Atom-Record-Sequence*
- Core system type of CosmosDB
- Atoms are a small set of primitive types
  - String
  - Bool
  - Number
- Records are structs
- Sequences are arrays of atoms, records, or sequences
- Azure CosmosDB engine efficiently translates and projects other data models onto ARS
- Allows for native support of APIs

*
1. Enable customers to elastically scale throughput and storage based on demand, globally. The system should deliver the configured throughput within 5 seconds at the 99th percentile, from the time of the request to scale.
2. Enable customers to build highly responsive, mission-critical applications. The system must deliver predictable and guaranteed end-to-end low read and write latencies at the 99th percentile.
3. Ensure that the system is “always on”. The system must provide 99.99% availability regardless of the number of regions associated with their database. To enable customers to test the end-to-end availability properties of the applications, (in steady state) the service must also allow customers to simulate regional failures or mark regions associated with their database offline. This helps validate the end-to-end availability properties of applications.
4. Enable developers to write correct globally distributed applications. The system must offer an intuitive and predictable programming model around data consistency. While strong consistency comes with a price, writing large globally distributed applications against an “eventually consistent” database results in an application code which is hard to reason about, is brittle, and rife with correctness bugs.
5. Offer stringent financially-backed comprehensive SLAs for 1, 2, 3 and 4 above.
6. Relieve the developers from the burden of database schema/index management and versioning. Keeping database schema and indexes in-sync with an application’s schema is especially painful for globally distributed applications.
7. Natively support multiple data models and popular APIs for accessing data. The translation between the externally exposed APIs and internal data representation needed to be efficient.
8. Operate at a very low cost to pass on the savings to customers.
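The Atom-Record-Sequence types listed on this slide map naturally onto JSON-like values. The sketch below classifies each part of a Python value as an atom, record, or sequence. It is only an illustration of the type model; the function names are hypothetical and nothing here reflects how the CosmosDB engine is implemented internally.

```python
def ars_kind(value):
    """Classify a value using the three ARS categories from the slide."""
    if isinstance(value, (str, bool, int, float)):
        return "atom"      # String, Bool, Number
    if isinstance(value, dict):
        return "record"    # struct of named fields
    if isinstance(value, (list, tuple)):
        return "sequence"  # array of atoms, records, or sequences
    raise TypeError(f"not representable in this sketch: {type(value).__name__}")


def describe(value, path="$"):
    """Yield (path, kind) for the value and everything nested inside it."""
    kind = ars_kind(value)
    yield path, kind
    if kind == "record":
        for name, child in value.items():
            yield from describe(child, f"{path}.{name}")
    elif kind == "sequence":
        for i, child in enumerate(value):
            yield from describe(child, f"{path}[{i}]")


# Hypothetical document.
doc = {"name": "sensor-1", "active": True, "readings": [1.5, 2]}
shape = dict(describe(doc))
```

A document, a table row, or a graph vertex with properties can all be written as nested records and sequences of atoms, which is why a single core type can sit underneath several APIs.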
18
Request Units
- Request Units/second
- Measurement used for charges
- Can vary per API
- Abstracts system resources (CPU, IOPS, and memory)
19
Good, Bad, and Ugly of Backups
- Good
  - Microsoft managed
  - Taken every 4 hours
- Bad
  - Only last two snapshots retained
  - Must contact support for restore
  - If container or database is deleted, last backup retained for 30 days
- Ugly
  - Copy data with ADF
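The consequence of "every 4 hours, last two retained" is easier to see with a little date arithmetic. This sketch uses only the two figures from the slide; the start timestamp and the function name are hypothetical, and it assumes snapshots land exactly on the 4-hour boundary.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=4)  # from the slide
SNAPSHOTS_RETAINED = 2                  # from the slide


def retained_snapshots(first_snapshot, now):
    """Return the timestamps of the snapshots still retained at `now`."""
    if now < first_snapshot:
        return []
    taken_count = (now - first_snapshot) // SNAPSHOT_INTERVAL + 1
    taken = [first_snapshot + i * SNAPSHOT_INTERVAL for i in range(taken_count)]
    return taken[-SNAPSHOTS_RETAINED:]


# Hypothetical timeline: snapshots at 00:00, 04:00, 08:00; it is now 09:00.
first = datetime(2000, 1, 1, 0, 0)
now = first + timedelta(hours=9)
kept = retained_snapshots(first, now)
oldest_restore_point_age = now - kept[0]
```

Under these assumptions the oldest restore point is always less than 8 hours old, so corruption that goes unnoticed for longer than that cannot be undone from the managed backups alone.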
20
Partitioning, Not What You Think
- Splitting items into distinct subsets known as logical partitions
- Scale containers within a database
- Partitions formed based on value of the partition key
- Once defined, partitions are automatically managed
- New logical partitions are created as new items are added
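Logical partitioning as described here amounts to grouping items by the value of one property. The sketch below is a simplified illustration: the function name, the `deviceId` key, and the readings are hypothetical, and the real service also maps logical partitions onto physical ones, which is not modeled.

```python
from collections import defaultdict


def logical_partitions(items, partition_key):
    """Group items into logical partitions by their partition key value."""
    partitions = defaultdict(list)
    for item in items:
        # A key value seen for the first time creates a new logical partition.
        partitions[item[partition_key]].append(item)
    return dict(partitions)


# Hypothetical IoT readings.
readings = [
    {"id": "1", "deviceId": "dev-a", "temp": 20},
    {"id": "2", "deviceId": "dev-b", "temp": 22},
    {"id": "3", "deviceId": "dev-a", "temp": 21},
]
by_device = logical_partitions(readings, "deviceId")
partition_sizes = {key: len(items) for key, items in by_device.items()}
```

Nothing had to be declared in advance for `dev-b`; its partition appears when its first item arrives.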
21
Choosing a Key
- Avoid hot spots
  - Requests to a partition key cannot exceed the throughput allocated to the partition
  - Requests are limited if throughput is exceeded
- Key needs to spread workload evenly across partitions and evenly over time
- Needs a wide range of values
- Partition cannot exceed 10GB of storage
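The hot-spot advice can be made concrete by counting how many requests each candidate key would send to a single partition. This is a deliberately simplified sketch: every request is treated as costing one unit, the per-partition limit of 100 is invented, and the field names are hypothetical. Real throttling is measured in Request Units per second.

```python
from collections import Counter


def throttled_requests(requests, key, limit_per_partition):
    """Return {partition key value: requests over the limit} for one interval."""
    counts = Counter(request[key] for request in requests)
    return {
        value: count - limit_per_partition
        for value, count in counts.items()
        if count > limit_per_partition
    }


# Hypothetical workload: 1000 requests from 50 devices, almost all in one country.
requests = [
    {"deviceId": f"dev-{i % 50}", "country": "US" if i % 10 else "CA"}
    for i in range(1000)
]

hot = throttled_requests(requests, "country", limit_per_partition=100)
even = throttled_requests(requests, "deviceId", limit_per_partition=100)
```

With `country` as the key, 900 requests pile onto one partition and most are limited; with `deviceId`, each partition sees 20 requests and none are.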
22
Indexing
- By default all items are indexed
- Default policy can be overridden
  - Include/exclude terms or properties
  - Can add additional index types, such as spatial
- Index Types
  - Data types (String, Number, Point, Polygon, or LineString)
  - Index kind: Range (equality, range, or ORDER BY queries) or Spatial
  - Precision (Range index)
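An overridden indexing policy using the vocabulary on this slide (kind, data type, precision) might look roughly like the dictionary below. Treat this as an assumption-laden sketch: the field names follow the policy JSON as it is believed to have looked around the time of this talk, the excluded path `/rawPayload/*` is hypothetical, and the current service documentation should be checked before relying on any of it.

```python
import json

# Sketch of an indexing policy; field names are an assumption, not verified
# against the current service.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,  # index every item by default
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Range", "dataType": "String", "precision": -1},
                {"kind": "Spatial", "dataType": "Point"},
            ],
        }
    ],
    # Hypothetical property excluded from indexing.
    "excludedPaths": [{"path": "/rawPayload/*"}],
}

policy_json = json.dumps(indexing_policy, indent=2)
index_kinds = sorted({ix["kind"] for ix in indexing_policy["includedPaths"][0]["indexes"]})
```

Excluding large properties that are never queried is one reason to override the default.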
23
Index Kinds
- Range index
  - Supports equality queries
  - JOIN queries
  - Range queries
  - Data type of String or Number
- Spatial index
  - Efficient within and distance queries
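Why one range index can serve equality, range, and ordered queries is easiest to see with a sorted list and binary search. The class below is a teaching sketch with hypothetical names and data; it does not describe how CosmosDB builds its indexes.

```python
import bisect


class RangeIndex:
    """Toy range index: (value, position) pairs kept sorted by value."""

    def __init__(self, items, field):
        self._items = items
        self._entries = sorted((item[field], pos) for pos, item in enumerate(items))
        self._keys = [value for value, _ in self._entries]

    def between(self, low, high):
        """Items with low <= value <= high, found by two binary searches."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [self._items[pos] for _, pos in self._entries[start:stop]]

    def equal(self, value):
        """Equality is a range query whose bounds coincide."""
        return self.between(value, value)

    def ordered(self):
        """ORDER BY comes for free because the entries are already sorted."""
        return [self._items[pos] for _, pos in self._entries]


# Hypothetical items.
items = [
    {"id": "a", "temp": 21},
    {"id": "b", "temp": 18},
    {"id": "c", "temp": 25},
    {"id": "d", "temp": 21},
]
index = RangeIndex(items, "temp")
in_range = [item["id"] for item in index.between(20, 25)]
exact = [item["id"] for item in index.equal(21)]
by_temp = [item["id"] for item in index.ordered()]
```

Spatial queries such as "within" or "distance" cannot be answered by a one-dimensional ordering like this, which is why they get their own index kind.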
24
Use Cases
- Common use cases
  - IoT
  - Retail (Windows Store)
  - Gaming services (Xbox Live)
  - Gaming statistics
25
IoT Use Case
28
Access Data
- MongoDB & Table
  - Primary connection string
  - Secondary connection string
  - Read-only connection string
- Gremlin
  - Gremlin endpoint
- Cassandra
  - Read-write
  - Read-only
29
Further Data Access
- Power BI
- Analysis Services/Azure Analysis Services (in-memory only)
- SQL Server
  - Linked Servers
  - PolyBase (SQL Server 2019)
30
Questions?