Published by Rosalind Harrison, modified over 8 years ago
in a Document-Oriented NoSQL Database
{ "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }
NoSQL is a buzzword. NoSQL is varied:
- Key-value
- Wide-column
- Document-oriented
- Graph
Document stores contain data objects that are inherently hierarchical, tree-like structures (most notably JSON).
- Built for scale and performance
- Great for: hierarchical trees, logging, telemetry
Perfect for these documents:
{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [ { "name": "SmugMug", "permalink": "smugmug" } ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}
Not these documents
A schema-agnostic JSON store for hierarchical and denormalized data at scale.
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Lenovo Thinkpad X1 Carbon | ??? | ??? | ???
Item | Author | Pages | Language | Processor | Memory | Storage
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English | ??? | ??? | ???
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English | ??? | ??? | ???
Lenovo Thinkpad X1 Carbon | ??? | ??? | ??? | Core i7 3.3ghz | 8 GB | 256 GB SSD
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English

Item | CPU | Memory | Storage
Lenovo Thinkpad X1 Carbon | Core i7 3.3ghz | 8 GB | 256 GB SSD
ProductId | Item
1 | Harry Potter and the Sorcerer’s Stone
2 | Game of Thrones: A Song of Ice and Fire
3 | Lenovo Thinkpad X1 Carbon

ProductId | Attribute | Value
1 | Author | J.K. Rowling
1 | Pages | 309
…
2 | Author | George R.R. Martin
2 | Pages | 864
…
3 | Processor | Core i7 3.3ghz
3 | Memory | 8 GB
…
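The attribute table above is the classic entity-attribute-value (EAV) workaround, and every read has to pivot its rows back into an object. A minimal Python sketch of that pivot, using only the rows shown on the slide (the elided "…" rows are left out) and plain dicts standing in for documents; `pivot_to_documents` and the lowercase `id`/`item` keys are hypothetical names for illustration, not part of any database API.

```python
from collections import defaultdict

# Rows shown on the slide; the elided "…" rows are not reproduced.
products = [
    (1, "Harry Potter and the Sorcerer's Stone"),
    (2, "Game of Thrones: A Song of Ice and Fire"),
    (3, "Lenovo Thinkpad X1 Carbon"),
]
attributes = [  # (ProductId, Attribute, Value)
    (1, "Author", "J.K. Rowling"),
    (1, "Pages", "309"),
    (2, "Author", "George R.R. Martin"),
    (2, "Pages", "864"),
    (3, "Processor", "Core i7 3.3ghz"),
    (3, "Memory", "8 GB"),
]


def pivot_to_documents(products, attributes):
    """Turn EAV rows back into one dict per product (hypothetical helper)."""
    attrs_by_id = defaultdict(dict)
    for product_id, name, value in attributes:
        attrs_by_id[product_id][name] = value
    # Each product keeps only the attributes it actually has: no ??? cells.
    return [
        {"id": str(product_id), "item": item, **attrs_by_id[product_id]}
        for product_id, item in products
    ]


for doc in pivot_to_documents(products, attributes):
    print(doc)
```

The pivoted dicts are roughly what a document store would hold directly. Note that every value comes back as a string, because a single `Value` column cannot carry per-attribute types.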
Come as you are
Data normalization
ORM
Modeling data, the relational way
Modeling data, the document way
To embed, or to reference, that is the question
Embed | Reference
To embed, or to reference, that is the question
Embed:
- Data from entities are queried together
{
  id: "book1",
  covers: [
    {type: "front", artworkUrl: "http://..."},
    {type: "back", artworkUrl: "http://..."}
  ],
  index: "",
  chapters: [
    {id: 1, synopsis: "", pageCount: 24, wordCount: 1456},
    {id: 2, synopsis: "", pageCount: 18, wordCount: 960}
  ]
}
- The child is a dependent, e.g. an order line depends on its order
{
  id: "order1",
  customer: "customer1",
  orderDate: "2014-09-15T23:14:25.7251173Z",
  lines: [
    {product: "13inch screen", price: 200.00, qty: 50},
    {product: "Keyboard", price: 23.67, qty: 4},
    {product: "CPU", price: 87.89, qty: 1}
  ]
}
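Because the order lines are embedded, a single read of the order document returns everything needed to, for example, compute the order total. A minimal sketch with a plain Python dict standing in for the fetched document; `order_total` is a hypothetical helper, and using `Decimal` for prices is an assumption of this sketch (not from the slide) to avoid floating-point rounding.

```python
from decimal import Decimal

# The order document from the slide; Decimal prices are an assumption of this sketch.
order = {
    "id": "order1",
    "customer": "customer1",
    "orderDate": "2014-09-15T23:14:25.7251173Z",
    "lines": [
        {"product": "13inch screen", "price": Decimal("200.00"), "qty": 50},
        {"product": "Keyboard", "price": Decimal("23.67"), "qty": 4},
        {"product": "CPU", "price": Decimal("87.89"), "qty": 1},
    ],
}


def order_total(order):
    """Sum price * qty over the embedded lines; no second read is needed."""
    return sum((line["price"] * line["qty"] for line in order["lines"]), Decimal("0"))


print(order_total(order))  # 10182.57
```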
- 1:1 relationship
{
  id: "person1",
  name: "Mickey",
  creditCard: {
    number: "**** **** **** 4794",
    expiry: "06/2019",
    cvv: "868",
    type: "Mastercard"
  }
}
- Similar volatility
{
  id: "person1",
  name: "Mickey",
  contactInfo: [
    {email: "mickey@disney.com"},
    {mobile: "+1 555-5555"},
    {twitter: "@MickeyMouse"}
  ]
}
- The set of values or sub-documents is bounded (1:few)
{
  id: "task1",
  desc: "deliver an awesome presentation @ #sqlbits",
  categories: ["conference", "talk", "workshop", "databases"]
}
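Embedding works only while the array stays small. One way an application might enforce the "1:few" bound is a guard on every append. This is a sketch only: the `add_category` helper and the limit of 10 are hypothetical choices made by the application, not something the database enforces.

```python
# Hypothetical upper bound chosen by the application; the slide only says "1:few".
MAX_CATEGORIES = 10

task = {
    "id": "task1",
    "desc": "deliver an awesome presentation @ #sqlbits",
    "categories": ["conference", "talk", "workshop", "databases"],
}


def add_category(doc, category, limit=MAX_CATEGORIES):
    """Embed a category only while the embedded set stays small and bounded."""
    categories = doc.setdefault("categories", [])
    if category in categories:
        return False  # already present; keep the set free of duplicates
    if len(categories) >= limit:
        raise ValueError("category list is bounded; consider referencing instead")
    categories.append(category)
    return True


add_category(task, "cloud")
print(task["categories"])
```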
Typically, denormalized data models provide better read performance.
Reference:
- One-to-many relationships (unbounded)
{ id: "post1", author: "Mickey Mouse", tags: ["fun", "cloud", "develop"] }
{id: "c1", postId: "post1", comment: "Coolest blog post"}
{id: "c2", postId: "post1", comment: "Loved this post, awesome"}
{id: "c3", postId: "post1", comment: "This is rad!"}
…
{id: "c10000", postId: "post1", comment: "You are the coolest cartoon character"}
…
{id: "c2000000", postId: "post1", comment: "Are we still commenting on this blog?"}
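With each comment stored as its own document carrying a `postId` reference, the post document never grows, and comments can be read a page at a time. A plain-Python stand-in for filtering on `postId` with paging; `comments_for` and its `page_size`/`page` parameters are hypothetical and only imitate what a filtered query read in pages would return.

```python
from itertools import islice

post = {"id": "post1", "author": "Mickey Mouse", "tags": ["fun", "cloud", "develop"]}

# Comments live in their own documents and point back to the post via postId.
comments = [
    {"id": "c1", "postId": "post1", "comment": "Coolest blog post"},
    {"id": "c2", "postId": "post1", "comment": "Loved this post, awesome"},
    {"id": "c3", "postId": "post1", "comment": "This is rad!"},
]


def comments_for(post_id, docs, page_size, page=0):
    """Stand-in for: SELECT * FROM c WHERE c.postId = @postId, one page at a time."""
    matching = (d for d in docs if d.get("postId") == post_id)
    start = page * page_size
    return list(islice(matching, start, start + page_size))


print([c["id"] for c in comments_for(post["id"], comments, page_size=2)])
```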
- Many-to-many relationships
{ id: "book1", name: "100 Secrets of Disneyland" }
{ id: "book2", name: "The best places to eat @ Disney" }
{ "author-id": "author1", "book-id": "book1" }
{ "author-id": "author2", "book-id": "book1" }
{ id: "author1", name: "Mickey Mouse" }
{ id: "author2", name: "Donald Duck" }
Look familiar? It should. It's the "relational" way.
{ id: "book1", name: "100 Secrets of Disneyland", authors: ["author1", "author2"] }
{ id: "book2", name: "The best places to eat @ Disney", authors: ["author1"] }
{ id: "author1", name: "Mickey Mouse", books: ["book1", "book2"] }
{ id: "author2", name: "Donald Duck", books: ["book1"] }
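Storing reference arrays on both sides removes the separate join documents, but it has two consequences: resolving a book's authors takes one lookup per referenced id, and adding a link means updating two documents. A sketch with in-memory dicts keyed by `id`; `author_names` and `link` are hypothetical helpers.

```python
books = {
    "book1": {"id": "book1", "name": "100 Secrets of Disneyland", "authors": ["author1", "author2"]},
    "book2": {"id": "book2", "name": "The best places to eat @ Disney", "authors": ["author1"]},
}
authors = {
    "author1": {"id": "author1", "name": "Mickey Mouse", "books": ["book1", "book2"]},
    "author2": {"id": "author2", "name": "Donald Duck", "books": ["book1"]},
}


def author_names(book_id):
    """Follow the book's reference array: one author lookup per id."""
    return [authors[a]["name"] for a in books[book_id]["authors"]]


def link(book_id, author_id):
    """Both sides hold references, so a new link updates two documents."""
    if author_id not in books[book_id]["authors"]:
        books[book_id]["authors"].append(author_id)
    if book_id not in authors[author_id]["books"]:
        authors[author_id]["books"].append(book_id)


print(author_names("book1"))
```

If only one of the two updates succeeded, the references would disagree; that is the price of keeping the relationship on both sides.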
- Related data changes frequently
- The referenced entity is a key entity used by many others
{ id: "1", author: "Mickey Mouse", stocks: ["dis", "msft"] }
{ id: "dis", opening: "52.09", numberOfTrades: 10000, trades: [{qty: 57, price: 53.97}, {qty: 5, price: 54.01}] }
Normalized data models can require more round trips to the server. Typically, normalizing provides better write performance.
Publisher document:
{ id: "mspress", name: "Microsoft Press", books: [ 1, 2,... ] }
Book documents:
{id: 1, name: "DocumentDB 101" }
{id: 2, name: "DocumentDB for RDBMS Users" }
Publisher document:
{ id: "mspress", name: "Microsoft Press" }
Book documents:
{id: 1, name: "DocumentDB 101", "pub-id": "mspress"}
{id: 2, name: "DocumentDB for RDBMS Users", "pub-id": "mspress"}
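Keeping the reference on the book (the "many" side) means adding a book writes a single new document and the publisher document stays the same size, unlike the ever-growing `books` array in the previous layout. A plain-Python sketch; `add_book`, `books_by_publisher`, and the third book title are hypothetical.

```python
publisher = {"id": "mspress", "name": "Microsoft Press"}
books = [
    {"id": 1, "name": "DocumentDB 101", "pub-id": "mspress"},
    {"id": 2, "name": "DocumentDB for RDBMS Users", "pub-id": "mspress"},
]


def add_book(books, book_id, name, publisher_id):
    """Adding a book writes one new document; the publisher is untouched."""
    books.append({"id": book_id, "name": name, "pub-id": publisher_id})


def books_by_publisher(books, publisher_id):
    """Stand-in for a query that filters books on their publisher reference."""
    return [b["name"] for b in books if b["pub-id"] == publisher_id]


add_book(books, 3, "Hypothetical Third Title", "mspress")
print(books_by_publisher(books, "mspress"))
```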
{
  "id": "product1",
  "type": "product",
  "name": "Microsoft Band 2 – Medium",
  "price": "174.99",
  "summary": "Continuous heart rate monitor tracks heart rate...",
  "images": [ {"image1": "http://..."}, {"image2": "http://..."} ],
  "reviews": { "averageStars": 4, "reviewCount": 313 }
}
{
  "id": "product1",
  "type": "reviewSummary",
  "reviewBreakdown": [ {"5": 24}, {"4": 10}, {"3": 3}, {"2": 0}, {"1": 4} ],
  "topReview": {
    "rating": 4,
    "title": "More comfortable than Band 1: But New Size Scale!",
    "snippet": "I've been wearing the first Band since it…",
    "fullReviewLink": "http://..."
  }
}
[
  {id: "Jill"},
  {id: "Ben", manager: "Jill"},
  {id: "Susan", manager: "Jill"},
  {id: "Andrew", manager: "Ben"},
  {id: "Sven", manager: "Susan"},
  {id: "Thomas", manager: "Sven"}
]
Getting the manager of any employee is trivial:
SELECT org.manager FROM org WHERE org.id = "Susan"
(Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.)
Getting all employees whose manager is Jill is also easy:
SELECT * FROM org WHERE org.manager = "Jill"
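With a parent reference on each employee, finding a manager is one lookup and finding direct reports is one filter, but walking the whole chain of command takes one lookup per level. A sketch over an in-memory list; the helper names are hypothetical, and each docstring notes which of the queries above it imitates.

```python
# The parent-reference layout: each employee stores its manager's id.
org = [
    {"id": "Jill"},
    {"id": "Ben", "manager": "Jill"},
    {"id": "Susan", "manager": "Jill"},
    {"id": "Andrew", "manager": "Ben"},
    {"id": "Sven", "manager": "Susan"},
    {"id": "Thomas", "manager": "Sven"},
]
by_id = {emp["id"]: emp for emp in org}


def manager_of(emp_id):
    """One point lookup, like: SELECT org.manager FROM org WHERE org.id = @id"""
    return by_id[emp_id].get("manager")


def reports_of(manager_id):
    """One filter, like: SELECT * FROM org WHERE org.manager = @id"""
    return [emp["id"] for emp in org if emp.get("manager") == manager_id]


def chain_of_command(emp_id):
    """Walk up the tree: one lookup per level until the root is reached."""
    chain = []
    current = manager_of(emp_id)
    while current is not None:
        chain.append(current)
        current = manager_of(current)
    return chain


print(chain_of_command("Thomas"))  # ['Sven', 'Susan', 'Jill']
```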
[
  {id: "Jill", directs: ["Ben", "Susan"]},
  {id: "Ben", directs: ["Andrew"]},
  {id: "Susan", directs: ["Sven"]},
  {id: "Andrew"},
  {id: "Sven", directs: ["Thomas"]},
  {id: "Thomas"}
]
Getting all direct reports for Jill is easy:
SELECT * FROM org WHERE org.id = "Jill"
(Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.)
Finding the manager of an employee is possible:
SELECT * FROM emp WHERE ARRAY_CONTAINS(emp.directs, "Ben")
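The child-array layout flips the trade-off: direct reports come with the manager's own document, while finding a manager means searching every `directs` array. A sketch over the same org chart held in memory; the helpers are hypothetical, and `all_reports` shows that collecting a whole subtree still needs one lookup per employee.

```python
from collections import deque

# The child-array layout: each employee stores the ids of its direct reports.
org = [
    {"id": "Jill", "directs": ["Ben", "Susan"]},
    {"id": "Ben", "directs": ["Andrew"]},
    {"id": "Susan", "directs": ["Sven"]},
    {"id": "Andrew"},
    {"id": "Sven", "directs": ["Thomas"]},
    {"id": "Thomas"},
]
by_id = {emp["id"]: emp for emp in org}


def directs_of(emp_id):
    """Direct reports come with the employee's own document."""
    return by_id[emp_id].get("directs", [])


def manager_of(emp_id):
    """Scan every directs array, like: ... WHERE ARRAY_CONTAINS(emp.directs, @id)"""
    for emp in org:
        if emp_id in emp.get("directs", []):
            return emp["id"]
    return None


def all_reports(emp_id):
    """Breadth-first walk down the tree: one lookup per employee visited."""
    result, queue = [], deque(directs_of(emp_id))
    while queue:
        current = queue.popleft()
        result.append(current)
        queue.extend(directs_of(current))
    return result


print(all_reports("Jill"))  # ['Ben', 'Susan', 'Andrew', 'Sven', 'Thomas']
```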
{ id: "CDC101", title: "Fundamentals of database design", credits: 10 }
{ id: "CDC101", title: "The Fundamentals of Database Design", titleWords: ["database", "design", "database design"], credits: 10 }
- Consider using a regular expression to lowercase words and remove punctuation.
- Strip out stop words such as "to", "the", and "of".
- Denormalize keywords into key phrases.
SELECT books.title FROM books WHERE ARRAY_CONTAINS(books.titleWords, "database")
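The preprocessing steps listed above can be sketched in Python with the standard `re` module. The stop-word list and the phrase rule (two adjacent words, neither a stop word) are assumptions of this sketch; unlike the slide's hand-picked `titleWords`, this version also keeps "fundamentals".

```python
import re

# Hypothetical stop-word list; the slide names "to", "the" and "of" as examples.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def title_words(title):
    """Lowercase, strip punctuation, drop stop words, and add two-word key phrases."""
    words = re.findall(r"[a-z0-9]+", title.lower())  # punctuation never matches
    keywords = [w for w in words if w not in STOP_WORDS]
    phrases = [
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOP_WORDS and b not in STOP_WORDS
    ]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(keywords + phrases))


course = {"id": "CDC101", "title": "The Fundamentals of Database Design", "credits": 10}
course["titleWords"] = title_words(course["title"])
print(course["titleWords"])  # ['fundamentals', 'database', 'design', 'database design']
```

The resulting array is what the `ARRAY_CONTAINS(books.titleWords, "database")` query matches against; search terms need the same lowercasing before they are passed in.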
{ id: "", timestamp: "...", reading: 123 }
{ id: "...", timestampMinute: "...", readings: [ {minute:0, reading:123}, {minute:1, reading:456},... {minute:59,reading:999} ] }
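Instead of one document per reading, readings can be grouped so that a single document holds a bucket of per-minute values, as in the second layout. A sketch that buckets `(datetime, value)` pairs by hour; the hourly bucket size, using the bucket start as the `id`, the `bucketStart` field name, and the sample timestamps are all assumptions.

```python
from collections import defaultdict
from datetime import datetime


def bucket_readings(readings):
    """Group (timestamp, value) pairs into one document per hour.

    Hypothetical layout: the bucket start doubles as the id, and each entry is
    {"minute": m, "reading": v}, mirroring the readings array on the slide.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append({"minute": ts.minute, "reading": value})
    return [
        {
            "id": hour.isoformat(),
            "bucketStart": hour.isoformat(),
            "readings": sorted(entries, key=lambda e: e["minute"]),
        }
        for hour, entries in sorted(buckets.items())
    ]


readings = [
    (datetime(2016, 1, 1, 10, 0), 123),
    (datetime(2016, 1, 1, 10, 1), 456),
    (datetime(2016, 1, 1, 11, 59), 999),
]
for doc in bucket_readings(readings):
    print(doc)
```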
{ id: "...", timestamp: "...", logData: {attr1: value1, attr2: value2,...} }
{ type: "book", bookId: "book1", authors: [1, 2]... }
{ type: "author", authorId: 1, authorName: "Andrew"... }
SELECT * FROM b WHERE b.type = "book"
SELECT * FROM b WHERE ARRAY_CONTAINS(b.authors, 1) OR b.authorId = 1
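When books and authors share one collection, a single filter can match documents of either type, which is why filtering on `type` matters. A plain-Python stand-in for both queries, assuming the book keeps its author ids in an `authors` array of numbers; the helper names are hypothetical.

```python
# Mixed document types in one collection; "authors" as an array of ids is an assumption.
docs = [
    {"type": "book", "bookId": "book1", "authors": [1, 2]},
    {"type": "author", "authorId": 1, "authorName": "Andrew"},
]


def by_type(docs, doc_type):
    """Stand-in for: SELECT * FROM b WHERE b.type = @type"""
    return [d for d in docs if d.get("type") == doc_type]


def related_to_author(docs, author_id):
    """Stand-in for: ... WHERE ARRAY_CONTAINS(b.authors, @id) OR b.authorId = @id"""
    return [
        d for d in docs
        if author_id in d.get("authors", []) or d.get("authorId") == author_id
    ]


print([d["type"] for d in related_to_author(docs, 1)])  # ['book', 'author']
```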
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
  ],
  "contactDetails": [
    {"email": "thomas@andersen.com"},
    {"phone": "+1 555 555-5555", "extension": 5555}
  ]
}
{ "id": "xyz", "username": "user xyz" }
{ "id": "address_xyz", "userid": "xyz", "address": { … } }
{ "id": "contact_xyz", "userid": "xyz", "email": "user@user.com", "phone": "555 5555" }
Normalizing typically provides better write performance.
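The normalized layout shifts cost to the read side: assembling one user takes three point reads, while updating an e-mail address touches only the small contact document. A sketch with an in-memory dict keyed by `id`; the `"address_" + userid` and `"contact_" + userid` id convention is inferred from the slide's example ids, the empty address stands in for the elided `{ … }`, and `load_user`/`update_email` are hypothetical.

```python
# Hypothetical in-memory collection keyed by id; the address body is elided on the slide.
store = {
    "xyz": {"id": "xyz", "username": "user xyz"},
    "address_xyz": {"id": "address_xyz", "userid": "xyz", "address": {}},
    "contact_xyz": {"id": "contact_xyz", "userid": "xyz",
                    "email": "user@user.com", "phone": "555 5555"},
}


def load_user(user_id):
    """Assemble one user view from three point reads; returns (user, read_count)."""
    reads = 0

    def read(doc_id):
        nonlocal reads
        reads += 1
        return store[doc_id]

    user = dict(read(user_id))  # copy so the stored document is not modified
    user["address"] = read("address_" + user_id)["address"]
    contact = read("contact_" + user_id)
    user["email"], user["phone"] = contact["email"], contact["phone"]
    return user, reads


def update_email(user_id, email):
    """A write touches only the small contact document."""
    store["contact_" + user_id]["email"] = email


print(load_user("xyz"))
```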
No magic bullet: think about how your data is going to be written and read, and model it accordingly.
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    {"thumbnail": "http://....png"},
    {"profile": "http://....png"}
  ]
}
{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
    {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
  ]
}
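The cost of copying the author's name into each book shows up on writes: renaming an author means updating the author document and every book that embeds a copy. A sketch with in-memory dicts; the second book, the shortened `books` list, the omitted thumbnails, and the `update_author_name` helper are hypothetical, and the returned count is simply the number of documents the function modified.

```python
# Hypothetical data: the author and book 1 follow the slide; book 2 is invented.
authors = {
    1: {"id": 1, "firstName": "Thomas", "lastName": "Andersen", "books": [1, 2]},
}
books = {
    1: {"id": 1, "name": "DocumentDB 101", "authors": [
        {"id": 1, "name": "Thomas Andersen"},
        {"id": 2, "name": "William Wakefield"},
    ]},
    2: {"id": 2, "name": "Hypothetical Second Title", "authors": [
        {"id": 1, "name": "Thomas Andersen"},
    ]},
}


def update_author_name(author_id, first, last):
    """Rename an author and every embedded copy; returns documents written."""
    author = authors[author_id]
    author["firstName"], author["lastName"] = first, last
    written = 1
    full_name = f"{first} {last}"
    for book_id in author["books"]:
        for entry in books[book_id]["authors"]:
            if entry["id"] == author_id:
                entry["name"] = full_name
        written += 1  # each book document is rewritten once
    return written


print(update_author_name(1, "Tom", "Andersen"))  # 3
```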
Understand the access patterns on your database:
- Read/write ratio
- Top queries, sprocs, and CRUD operations
- The life cycle of the data and the growth rate of documents
Use built-in properties:
- Use Id (id) to enforce a uniqueness constraint and for efficient querying
- Use TTL (ttl) to prune out old data
- Use Timestamp (_ts) to check for incremental changes
- Use ETag (_etag) for optimistic concurrency and cache-refresh semantics
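A minimal in-memory sketch of the optimistic-concurrency pattern that `_etag` supports: every write issues a new ETag, and a conditional write carrying a stale ETag is rejected. `InMemoryStore` and `PreconditionFailed` are hypothetical stand-ins, not the DocumentDB SDK; against the real service, the equivalent check is an If-Match condition on the write.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's ETag no longer matches the stored document."""


class InMemoryStore:
    """Hypothetical store mimicking _etag semantics: every write issues a new ETag."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc, if_match=None):
        current = self._docs.get(doc["id"])
        if if_match is not None and (current is None or current["_etag"] != if_match):
            raise PreconditionFailed(doc["id"])
        stored = dict(doc, _etag=str(uuid.uuid4()))
        self._docs[doc["id"]] = stored
        return dict(stored)

    def read(self, doc_id):
        return dict(self._docs[doc_id])  # a copy, as a client would receive


store = InMemoryStore()
store.upsert({"id": "1", "name": "Thomas"})
first, second = store.read("1"), store.read("1")  # two clients read the same version
first["name"] = "Tom"
store.upsert(first, if_match=first["_etag"])  # succeeds and issues a new ETag
second["name"] = "Tommy"
try:
    store.upsert(second, if_match=second["_etag"])  # stale ETag: rejected
except PreconditionFailed:
    print("conflict: re-read and retry")
```

On a conflict, the losing client re-reads the document, reapplies its change, and retries with the fresh ETag.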