Published by Frederick Lucas. Modified over 9 years ago.
1
Sessions about to start – Get your rig on!
2
Chris J.T. Auld Intergen
3
Agenda
1) DocumentDB Refresher
2) CUs, RUs and Indexing
3) Polyglot Persistence and Data Modelling
4) Data Tier Programmability
5) Trading Off Consistency
5
A fully managed, highly scalable NoSQL document database service.
- Schema-free storage, indexing and query of JSON documents
- Transaction-aware server-side programmability with JavaScript
- Write-optimized, SSD-backed and tunable via indexing and consistency
- Built to be delivered as a service. Pay as you go. Achieve faster time to value.
6
DocumentDB in One Slide
- Simple HTTP RESTful model. Access can be via any client that supports HTTP.
- Libraries for Node, .NET, Python, JS
- All resources are uniquely addressable by a URI.
- Partitioned for scale-out and replicated for HA.
- Tunable indexing & consistency
- Granular access control through item-level permissions
- Attachments stored in Azure Blobs and support document lifecycle.
- T-SQL-like programmability.
- Customers buy storage and throughput capacity at account level

Resource URIs:
    /dbs/{id}/colls/{id}/docs/{id}/attachments/{id}
    /sprocs/{id}
    /triggers/{id}
    /functions/{id}
    /users/{id}

Verb     Body            URI                       Effect
POST     Item resource   Tenant Feed URI           Create a new resource / execute a sproc, trigger or query
PUT      Item resource   Item URI                  Replace an existing resource
DELETE                   Item URI                  Delete an existing resource
GET                      Tenant Feed or Item URI   Read/Query an existing resource

Example:
    POST http://myaccount.documents.azure.net/dbs
    { "name": "My Company Db" }
    ...
    [201 Created]
    {
      "id": "My Company Db",
      "_rid": "UoEi5w==",
      "_self": "dbs/UoEi5w==/",
      "_colls": "colls/",
      "_users": "users/"
    }
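As a rough illustration of the addressing scheme on this slide, the sketch below assembles item and feed URIs in the /dbs/{id}/colls/{id}/docs/{id} shape. The helper name resourcePath and the sample ids ("store", "products") are hypothetical and not part of any DocumentDB library.

```javascript
// Hypothetical helper, not an SDK call: builds a resource path from
// [type, id] pairs, optionally ending in a feed name such as "docs".
function resourcePath(pairs, feed) {
  const parts = [];
  for (const [type, id] of pairs) {
    parts.push(type, encodeURIComponent(id));
  }
  if (feed) {
    parts.push(feed);
  }
  return "/" + parts.join("/");
}

// Item URI: the target for GET, PUT and DELETE on one document.
const docUri = resourcePath([
  ["dbs", "store"],
  ["colls", "products"],
  ["docs", "BK-M18S"],
]);

// Feed URI: the target for POST (create) or GET (list) on a collection's documents.
const docsFeed = resourcePath([["dbs", "store"], ["colls", "products"]], "docs");

console.log(docUri);
console.log(docsFeed);
```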
7
Capacity Units
- Customers provision one or more Database Accounts
- A database account can be configured with one to five Capacity Units (CUs). Call for more.
- A CU is a reserved unit of storage (in GB) and throughput (in Request Units, RUs)
- Reserved storage is allocated automatically but subject to a minimum allocation per collection of 3.3GB (1/3 of a CU) and a maximum amount stored per collection of 10GB (1 whole CU)
- Reserved throughput is automatically made available, in equal amounts, to all collections within the account, subject to a min/max of 667 RUs (1/3 of a CU) and 2000 RUs (1 whole CU)
- Throughput consumption levels above provisioned units are throttled

[Figure: provisioned capacity units, showing throughput (RUs) and storage (GB)]

* All limits noted above are the Preview Limitations. Subject to change
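The throughput rule above can be worked through numerically. This is only a sketch of one reading of the slide: it assumes an even split across collections, capped at one CU per collection, and treats the 1/3 CU minimum as a limit of three collections per CU. The function name and constants are illustrative, and the figures are the preview limits quoted above.

```javascript
const RU_PER_CU = 2000;              // preview figure from the slide
const MAX_RU_PER_COLLECTION = 2000;  // 1 whole CU
const MAX_COLLECTIONS_PER_CU = 3;    // implied by the 1/3 CU minimum

// Hypothetical helper: estimated RUs/sec available to each collection.
function throughputPerCollection(capacityUnits, collections) {
  if (capacityUnits < 1 || collections < 1) {
    throw new RangeError("need at least one CU and one collection");
  }
  if (collections > capacityUnits * MAX_COLLECTIONS_PER_CU) {
    throw new RangeError("too many collections for the provisioned CUs");
  }
  const evenShare = (capacityUnits * RU_PER_CU) / collections;
  return Math.min(Math.round(evenShare), MAX_RU_PER_COLLECTION);
}

console.log(throughputPerCollection(1, 1)); // whole CU to one collection
console.log(throughputPerCollection(1, 3)); // one third of a CU each
console.log(throughputPerCollection(2, 3)); // two CUs shared by three collections
```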
8
Request Units
- A CU includes the ability to execute up to 2000 Request Units per second
- i.e. with 1 CU, peak throughput needs to be below 2000 RUs/sec
- When reserved throughput is exceeded, any subsequent request will be pre-emptively ended
  - Server will respond with HTTP status code 429
  - Server response includes the x-ms-retry-after-ms header to indicate the amount of time the client must wait before retrying
- The .NET client SDK implicitly catches this response, respects the retry-after header and retries the request (3x)
- You can set up alert rules in the Azure portal to be notified when requests are throttled
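For clients without built-in retry, the throttling behaviour above could be handled along these lines. This is a minimal sketch, not the .NET SDK's implementation: sendRequest is a hypothetical caller-supplied function that resolves to { status, headers }, header names are assumed to be lower-cased, and the default of three retries simply mirrors the figure on the slide.

```javascript
// Waits for the given number of milliseconds.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retries a request while the service answers 429, waiting for the
// interval given in x-ms-retry-after-ms before each new attempt.
// `sendRequest` is a hypothetical function returning a promise of
// { status, headers }; it is not a DocumentDB SDK call.
async function withThrottleRetry(sendRequest, maxRetries = 3, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // success, a non-throttling error, or retries exhausted
    }
    const waitMs = Number(response.headers["x-ms-retry-after-ms"]) || 0;
    await sleep(waitMs);
  }
}
```

Passing sleep as a parameter keeps the waiting strategy replaceable, which also makes the function easy to exercise without real delays.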
9
Request Units

DATABASE OPERATIONS                                                          NUMBER OF RUs   NUMBER OP/s/CU
Reading single document by _self                                             1               2000
Inserting/Replacing/Deleting a single document                               4               500
Query a collection with a simple predicate and returning a single document   2               1000
Stored Procedure with 50 document inserts                                    100             20

Rough estimates: document size is 1KB, consisting of 10 unique property values, with the default consistency level set to "Session" and all of the documents automatically indexed by DocumentDB.
As long as the database stays the same, the RUs consumed should stay the same.
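The right-hand column of the table is just 2000 RUs/sec divided by the cost per operation. The sketch below reproduces that arithmetic and extends it to a mixed workload; the helper names and the sample workload are made up for illustration, and the RU costs are the rough estimates from the table.

```javascript
const RU_PER_SECOND_PER_CU = 2000; // preview figure from the deck

// Operations per second that fit into the given number of CUs.
function opsPerSecond(ruPerOperation, capacityUnits = 1) {
  return Math.floor((RU_PER_SECOND_PER_CU * capacityUnits) / ruPerOperation);
}

// CUs needed for a mixed workload of { perSecond, ruCost } entries.
function requiredCapacityUnits(workload) {
  const totalRu = workload.reduce((sum, op) => sum + op.perSecond * op.ruCost, 0);
  return Math.ceil(totalRu / RU_PER_SECOND_PER_CU);
}

// Hypothetical workload: 1200 reads, 300 writes and 200 queries per second.
const workload = [
  { perSecond: 1200, ruCost: 1 }, // read by _self
  { perSecond: 300, ruCost: 4 },  // insert/replace/delete
  { perSecond: 200, ruCost: 2 },  // simple query
];

console.log(opsPerSecond(4));                 // writes per second on one CU
console.log(requiredCapacityUnits(workload)); // CUs for 2800 RUs/sec
```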
11
Demo
- Database in DocumentDB Studio
- Queries with different RU results
12
Cool Tool: DocumentDB Studio
Useful tool, with source, for sending queries to DocumentDB.
http://tiny.cc/docdbstudio
14
LET’S CALL A SPADE A SPADE
15
Indexing in DocumentDB
- By default everything is indexed
- Indexes are schema free
- Indexing is not a B-Tree and works really well under write pressure and at scale.
- Out of the Box. It Just Works.
- But… it cannot read your mind all of the time…
16
Tuning Indexes
We can change the way that DocumentDB indexes. We’re trading off:
- Write Performance
  - How long does it take?
  - How many RUs does it use?
- Read Performance
  - How long does it take?
  - How many RUs does it use?
  - Which queries will need a scan?
- Storage
  - How much space does the document + index require?
- Complexity and Flexibility
  - Moving away from the pure schema-free model
17
Index Policy and Mode
- Index Policy
  - Defines index rules for that collection
- Index mode
  - Consistent
  - Lazy
- Automatic
  - True: documents automatically added (based on policy)
  - False: documents must be manually added via IndexingDirective on document PUT. Anything not indexed can only be retrieved via its _self link (GET)

    // Create a collection that indexes lazily and only on request.
    var collection = new DocumentCollection { Id = "myCollection" };
    collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
    collection.IndexingPolicy.Automatic = false;
    collection = await client.CreateDocumentCollectionAsync(databaseLink, collection);
18
Index Paths & Index Types
- Include/Exclude Paths
  - Include a specific path
  - Exclude sub paths
  - Exclude a specific path
- Specify Index Type
  - Hash (default)
  - Range (default for _ts), not on strings
- Specify Precision
  - Byte precision (1-7)
  - Affects storage overhead

    // Hash-index everything under the root by default.
    collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
    {
        IndexType = IndexType.Hash,
        Path = "/",
    });
    // Range-index the timestamp property at maximum numeric precision.
    collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
    {
        IndexType = IndexType.Range,
        Path = @"/""modifiedTimeStamp""/?",
        NumericPrecision = 7
    });
    // Keep the large HTML property out of the index.
    collection.IndexingPolicy.ExcludedPaths.Add("/\"longHTML\"/*");
20
Demo
Three collections:
- One has default indexing
- One has mode set to Lazy
- One has string precision on "/" set to 1
How does that affect write RUs?
22
IT’S LESS ABOUT BUILDING AND MORE ABOUT BOLTING
23
Worth Reading: NoSQL Distilled
By Pramod Sadalage and Martin Fowler (of ‘Refactoring’ fame and fortune)
Provides a good background on characteristics of NoSQL style data stores and strategies for combining multiple stores.
http://tiny.cc/fowler-pp
24
DocumentDB
- transactional processing
- rich query
- managed as a service
- elastic scale
- internet accessible http/rest
- schema-free data model
- arbitrary data formats
25
Attachments
Store large blobs/media outside core storage
- DocumentDB managed
  - Submit raw content in POST
  - DocumentDB stores into Azure Blob storage (2GB today)
  - DocumentDB manages lifecycle
- Self managed
  - Store content in service of your choice
  - Create Attachment providing URL to content
27
Demo
- Show managed attachment
- Lifecycle follows document
28
Storage Strategies
Things to think about:
- How much storage do I use; where? $$$?
- How is my data being indexed?
  - Entropy & Precision
  - Will it ever be queried? Should I exclude it?
- How many network calls to save & retrieve?
- Complexity of implementation & management
- Consistency. The Polyglot isn’t consistent
29
Embed (De-Normalize) or Reference?

Embedded:
    {
      "Products": [
        {
          "id": "BK-M18S",
          "ProductCode": "BK-M18S",
          "ProductType": "Mountain-500",
          "Manufacturer": {
            "Name": "Adventure Works",
            "Website": "www.adventureworks.com"
          }
        }
      ]
    }

Referenced:
    {
      "Products": [
        {
          "id": "BK-M18S",
          "ProductCode": "BK-M18S",
          "ProductType": "Mountain-500",
          "Manufacturer": "ADVWKS"
        }
      ],
      "Manufacturers": [
        {
          "id": "ADVWKS",
          "Name": "Adventure Works",
          "Website": "www.adventureworks.com"
        }
      ]
    }
30
Embed (De-Normalize) or Reference?

Embed
- Well suited to containment
- Typically bounded 1:Few
- Slowly changing data
- M:N requires management of duplicates
- One call to read all data
- Write call must write whole document

Reference
- Think of this as 3NF
- Provides M:N without duplicates
- Allows unbounded 1:N
- Multiple calls to read all data (hold that thought…)
- Write call may write single referenced document
31
How Do We Relate?
- ID or _self
  - A matter of taste. _self will be more efficient (half as many RUs or better)
- Direction
  - Manufacturer > Product (1:N)
    - We have to update manufacturer every time we add a new product
    - Products are unbounded
  - Product > Manufacturer (N:1)
    - We have to update product if manufacturer changes
    - Manufacturers per product are bounded (1)
  - Sometimes both makes sense.
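To make the read cost of the referenced model concrete, the sketch below resolves a Product > Manufacturer (N:1) reference against an in-memory stand-in for a collection. The Map, the readDocument helper and the call counter are illustrative only; a real client would issue one request per lookup.

```javascript
// In-memory stand-in for a collection, keyed by document id.
const store = new Map([
  ["BK-M18S", { id: "BK-M18S", ProductType: "Mountain-500", Manufacturer: "ADVWKS" }],
  ["ADVWKS", { id: "ADVWKS", Name: "Adventure Works", Website: "www.adventureworks.com" }],
]);

let calls = 0; // counts simulated round trips

// Hypothetical read helper: one call per document fetched.
function readDocument(id) {
  calls++;
  return store.get(id);
}

// Referenced model: the product holds only the manufacturer id,
// so reading "all data" takes a second lookup.
function readProductWithManufacturer(productId) {
  const product = readDocument(productId);
  const manufacturer = readDocument(product.Manufacturer);
  return { ...product, Manufacturer: manufacturer };
}

const resolved = readProductWithManufacturer("BK-M18S");
console.log(resolved.Manufacturer.Name, "after", calls, "calls");
```

With the embedded model the same data would arrive in the first read, at the price of rewriting every product document when the manufacturer's details change.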
32
The Canonical Polyglot Online Store
33
A Product Catalog
- Product Name (String 100)
- SKU (String 100, YYYYCCCNNNNN, e.g. ‘2013MTB13435’)
- Description (HTML up to 8kb)
- Manufacturer (String 100)
- Price (Amount + Currency)
- Images (0-N images, up to 100kb)
- ProductSizes (0-N, including a sort order)
- Reviews (0-N reviews, Reviewer + up to 10kb text)
- Attributes (0-N strongly typed complex details)

Notes:
- Probably want to search. Hash index is fine. May duplicate into Azure Search.
- Probably a core lookup field. Needs a hash index. How do we manage precision? We could store reversed? We could store a duplicate reversed and include/exclude. We might want to pull Year out into another field and range index.
- A sub document within DocumentDB will allow multiple base currencies. Probably doesn’t change much so de-normalize the currency identifier. We probably want price in Search… but… if we are providing localized prices then we have consistency issues; huge churn when we change exchange rates.
- Attachments
- Do we embed these? Do we reference? On product? On reviewer/user? Both? Do we reference and embed? Say embed last 10? Which direction does the reference go?
- Almost certainly push to search. How deep does the rabbit hole go?
- Probably want to index in Azure Search. Do we ‘save space’ and push to an attachment? Do we often retrieve Product without description? We probably do want to exclude it from the index.
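The SKU ideas in the notes (store a reversed duplicate, pull the year into its own field) might look like the sketch below. The rationale assumed here is that SKUs share their leading year characters, so a low-precision hash index discriminates better on the reversed string. The field names skuReversed and year are hypothetical, and the pattern assumes the CCC part is three upper-case letters, as in the example.

```javascript
// Derives extra fields from a SKU of the form YYYYCCCNNNNN.
// `skuReversed` and `year` are illustrative field names.
function skuFields(sku) {
  const match = /^(\d{4})([A-Z]{3})(\d{5})$/.exec(sku);
  if (!match) {
    throw new Error("unexpected SKU format: " + sku);
  }
  return {
    sku: sku,
    // Reversed copy puts the most variable characters first.
    skuReversed: sku.split("").reverse().join(""),
    // Numeric year, suitable for a range index.
    year: Number(match[1]),
  };
}

const fields = skuFields("2013MTB13435");
console.log(fields);
```

Only one of sku and skuReversed would then need to stay in the index, which is the include/exclude choice the notes mention.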
35
The Promise of Schema Free
- Fully indexed complex type structures
- Ability to define schema independent of data store
- Reflect for editing and complex search filters
- Create templates to produce HTML from JSON for editing and rendering
http://www.mchem.co.nz/msds/Tutti%20Frutti%20Disinfectant.pdf
http://www.toxinz.com/Demo
37
Programmability in DocumentDB
- Familiar constructs
  - Stored procs, UDFs, triggers
- Transactional
  - Each call to the service is in an ACID txn
  - An uncaught exception triggers rollback
- Sandboxed
  - No imports
  - No network calls
  - No Eval()
  - Resource governed & time bound

    var helloWorldStoredProc = {
        id: "helloRealWorld",
        body: function () {
            var context = getContext();
            var response = context.getResponse();
            // Each setBody call replaces the previous body, so only the last message is returned.
            response.setBody("Hello, Welcome To The Real World");
            response.setBody("Here Be Dragons...");
            response.setBody("Oh... and network latency");
        }
    };
39
Demo
- User Defined Function
- Trigger
- Stored Procedure
40
Where To Use Programmability
- Reduce Network Calls
  - Send multiple documents & shred in a SPROC
- Multi-Document Transactions
  - Each call in ACID txn
  - No multi-statement txns
  - One REST call = One txn
- Transform & Join
  - Pull content from multiple docs. Perform calculations
  - JOIN operator intradoc only
- Drive lazy processes
  - Write journal entries and process later
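The "send multiple documents & shred in a SPROC" idea can be sketched as a stored procedure body like the one below. It is an illustration rather than the presenter's demo code: the name bulkInsert is made up, and it relies on the server-side getContext(), getCollection(), createDocument() and getResponse() calls being available inside the sandbox.

```javascript
// Sketch of a stored procedure body: inserts an array of documents in a
// single call, so every insert shares one transaction.
function bulkInsert(docs) {
  var context = getContext();
  var collection = context.getCollection();
  var response = context.getResponse();
  var created = 0;

  if (!docs || docs.length === 0) {
    response.setBody(0);
    return;
  }
  insertNext();

  function insertNext() {
    if (created >= docs.length) {
      response.setBody(created); // all documents written
      return;
    }
    var accepted = collection.createDocument(
      collection.getSelfLink(),
      docs[created],
      function (err) {
        if (err) throw err; // uncaught error rolls back every insert
        created++;
        insertNext();
      }
    );
    // Not accepted means the time/resource budget is nearly spent:
    // report progress so the client can resubmit the remainder.
    if (!accepted) {
      response.setBody(created);
    }
  }
}
```

The client makes one REST call carrying the whole array, and the returned count tells it whether a follow-up call is needed for any documents that did not fit in the time budget.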
42
Worth Reading: Replicated Data Consistency Explained Through Baseball
By Doug Terry, MS Research
http://tiny.cc/cons-baseball
43
Tuning Consistency
- Database Accounts are configured with a default consistency level. Consistency level can be weakened per read/query request
- Four consistency levels
  - STRONG: all writes are visible to all readers. Writes are committed by a majority quorum of replicas and reads are acknowledged by the majority read quorum
  - BOUNDED STALENESS: guaranteed ordering of writes, reads adhere to minimum freshness. Writes are propagated asynchronously, reads are acknowledged by a majority quorum lagging writes by at most N seconds or operations (configurable)
  - SESSION (Default): read your own writes. Writes are propagated asynchronously while reads for a session are issued against the single replica that can serve the requested version.
  - EVENTUAL: reads eventually converge with writes. Writes are propagated asynchronously while reads can be acknowledged by any replica. Readers may view older data than previously observed.

            Writes               Reads
Strong      sync quorum writes   quorum reads
Bounded     async replication    quorum reads
Session*    async replication    session bound replica
Eventual    async replication    any replica
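The "weakened per read/query request" rule can be expressed as an ordering check. This is a sketch under assumptions: the level names and the helper effectiveConsistency are illustrative, and it assumes that a request asking for a stronger level than the account default is rejected rather than silently downgraded.

```javascript
// Levels ordered from strongest to weakest, as listed on the slide.
const LEVELS = ["Strong", "BoundedStaleness", "Session", "Eventual"];

// Hypothetical helper: picks the consistency level for one read.
// A request may weaken the account default but not strengthen it.
function effectiveConsistency(accountDefault, requested) {
  const defaultRank = LEVELS.indexOf(accountDefault);
  if (defaultRank < 0) {
    throw new Error("unknown level: " + accountDefault);
  }
  if (requested === undefined) {
    return accountDefault;
  }
  const requestedRank = LEVELS.indexOf(requested);
  if (requestedRank < 0) {
    throw new Error("unknown level: " + requested);
  }
  if (requestedRank < defaultRank) {
    throw new Error("cannot strengthen " + accountDefault + " to " + requested);
  }
  return requested;
}

console.log(effectiveConsistency("Session"));             // account default applies
console.log(effectiveConsistency("Session", "Eventual")); // weakened for this read
```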
45
- DocumentDB is a preview service… expect and enjoy change over time
- Think outside the relational model…
- … if what you really want is an RDBMS then use one of those…
48
Thanks! Don’t forget to complete your evaluations aka.ms/mytechedmel