In a Document-Oriented NoSQL Database { "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }

Presentation transcript:

1 in a Document-Oriented NoSQL Database { "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }

2

3

4

5

6

7 NoSQL is a buzzword. NoSQL is varied: key-value, wide-column, document-oriented, graph.

8 Document stores contain data objects that are inherently hierarchical, tree-like structures (most notably JSON). Built for scale and performance. Great for: hierarchical trees, logging, telemetry.

9 Perfect for these documents: { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] }

10 Not these documents

11 Perfect for these documents (the same JSON as on slide 9): a schema-agnostic JSON store for hierarchical and de-normalized data at scale.

12

13
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English

14
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Lenovo Thinkpad X1 Carbon | ? | ? | ?

15

16

17
Item | Author | Pages | Language | Processor | Memory | Storage
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English | ? | ? | ?
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English | ? | ? | ?
Lenovo Thinkpad X1 Carbon | ? | ? | ? | Core i7 3.3 GHz | 8 GB | 256 GB SSD

18
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English

Item | CPU | Memory | Storage
Lenovo Thinkpad X1 Carbon | Core i7 3.3 GHz | 8 GB | 256 GB SSD

19
ProductId | Item
1 | Harry Potter and the Sorcerer’s Stone
2 | Game of Thrones: A Song of Ice and Fire
3 | Lenovo Thinkpad X1 Carbon

ProductId | Attribute | Value
1 | Author | J.K. Rowling
1 | Pages | 309
…
2 | Author | George R.R. Martin
2 | Pages | 864
…
3 | Processor | Core i7 3.3 GHz
3 | Memory | 8 GB
…
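One way to see why the attribute-value layout on slide 19 is awkward to read back: every product has to be reassembled from several rows. The following is a minimal Python sketch, not code from the talk. It holds the slide's rows as in-memory tuples and folds them into one document per product; the function name `pivot_to_documents` and the `id`/`item` field names are illustrative assumptions.

```python
from collections import defaultdict

# Rows copied from the two tables on slide 19 (the elided "…" rows are left out).
products = [
    (1, "Harry Potter and the Sorcerer's Stone"),
    (2, "Game of Thrones: A Song of Ice and Fire"),
    (3, "Lenovo Thinkpad X1 Carbon"),
]
attributes = [
    (1, "Author", "J.K. Rowling"),
    (1, "Pages", "309"),
    (2, "Author", "George R.R. Martin"),
    (2, "Pages", "864"),
    (3, "Processor", "Core i7 3.3ghz"),
    (3, "Memory", "8 GB"),
]


def pivot_to_documents(products, attributes):
    """Fold attribute-value rows into one dict (document) per product."""
    attrs_by_product = defaultdict(dict)
    for product_id, name, value in attributes:
        attrs_by_product[product_id][name] = value
    return [
        {"id": product_id, "item": item, **attrs_by_product[product_id]}
        for product_id, item in products
    ]


documents = pivot_to_documents(products, attributes)
```

Each resulting document carries only the attributes that apply to it, so the laptop has no `Author` field and the books have no `Processor` field.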

20

21

22

23 Come as you are: data normalization, ORM.

24 Modeling data, the relational way

25 Modeling data, the document way

26 To embed, or to reference, that is the question. (Embed vs. reference)

27 To embed, or to reference, that is the question. Embed when: data from the entities is queried together.

29 To embed, or to reference, that is the question. Embed when: data from the entities is queried together. { id: "book1", covers: [ {type: "front", artworkUrl: "http://..."}, {type: "back", artworkUrl: "http://..."} ], index: "", chapters: [ {id: 1, synopsis: "", pageCount: 24, wordCount: 1456}, {id: 2, synopsis: "", pageCount: 18, wordCount: 960} ] }

30 To embed, or to reference, that is the question. Embed when: data from the entities is queried together; the child is a dependent (e.g. an Order Line depends on an Order). { id: "order1", customer: "customer1", orderDate: "2014-09-15T23:14:25.7251173Z", lines: [ {product: "13inch screen", price: 200.00, qty: 50}, {product: "Keyboard", price: 23.67, qty: 4}, {product: "CPU", price: 87.89, qty: 1} ] }

31 To embed, or to reference, that is the question. Embed when: data from the entities is queried together; the child is a dependent (e.g. an Order Line depends on an Order); the relationship is 1:1. { id: "person1", name: "Mickey", creditCard: { number: "**** **** **** 4794", expiry: "06/2019", cvv: "868", type: "Mastercard" } }

32 To embed, or to reference, that is the question. Embed when: data from the entities is queried together; the child is a dependent (e.g. an Order Line depends on an Order); the relationship is 1:1; the entities have similar volatility. { id: "person1", name: "Mickey", contactInfo: [ {email: "mickey@disney.com"}, {mobile: "+1 555-5555"}, {twitter: "@MickeyMouse"} ] }

33 To embed, or to reference, that is the question. Embed when: data from the entities is queried together; the child is a dependent (e.g. an Order Line depends on an Order); the relationship is 1:1; the entities have similar volatility; the set of values or sub-documents is bounded (1:few). { id: "task1", desc: "deliver an awesome presentation @ #sqlbits", categories: [ "conference", "talk", "workshop", "databases" ] }

34 To embed, or to reference, that is the question. Embed when: data from the entities is queried together; the child is a dependent (e.g. an Order Line depends on an Order); the relationship is 1:1; the entities have similar volatility; the set of values or sub-documents is bounded (1:few). Typically, denormalized data models provide better read performance.
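To illustrate the read-side benefit of embedding, here is a small Python sketch built on the order document from slide 30. It treats the document as plain JSON loaded into memory (no database client is involved), and the helper name `order_total` is an assumption made for illustration. Because the order lines are embedded, a single read of the order is enough to compute its total.

```python
import json

# The order document from slide 30, written as strict JSON.
order_json = """
{
  "id": "order1",
  "customer": "customer1",
  "orderDate": "2014-09-15T23:14:25.7251173Z",
  "lines": [
    {"product": "13inch screen", "price": 200.00, "qty": 50},
    {"product": "Keyboard", "price": 23.67, "qty": 4},
    {"product": "CPU", "price": 87.89, "qty": 1}
  ]
}
"""

order = json.loads(order_json)


def order_total(order):
    """Sum price * quantity over the embedded order lines."""
    return sum(line["price"] * line["qty"] for line in order["lines"])


total = order_total(order)
```

With a normalized layout, the same calculation would first need a second lookup to fetch the lines that belong to the order.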

35 To embed, or to reference, that is the question. Reference when: one-to-many relationships are unbounded. { id: "post1", author: "Mickey Mouse", tags: [ "fun", "cloud", "develop" ] } {id: "c1", postId: "post1", comment: "Coolest blog post"} {id: "c2", postId: "post1", comment: "Loved this post, awesome"} {id: "c3", postId: "post1", comment: "This is rad!"} … {id: "c10000", postId: "post1", comment: "You are the coolest cartoon character"} … {id: "c2000000", postId: "post1", comment: "Are we still commenting on this blog?"}
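The unbounded case on slide 35 can be sketched in Python as follows. Comments live in their own documents and point back to the post through `postId`, so the post document stays small and comments can be fetched a page at a time. The in-memory list stands in for a collection, and `comments_page` with its `offset` and `limit` parameters is a hypothetical helper, not an API from any database SDK.

```python
post = {"id": "post1", "author": "Mickey Mouse", "tags": ["fun", "cloud", "develop"]}

# Each comment is a separate document that references its post.
comments = [
    {"id": "c1", "postId": "post1", "comment": "Coolest blog post"},
    {"id": "c2", "postId": "post1", "comment": "Loved this post, awesome"},
    {"id": "c3", "postId": "post1", "comment": "This is rad!"},
    # Hypothetical comment on another post, added so the filter has something to exclude.
    {"id": "c4", "postId": "post2", "comment": "A comment on a different post"},
]


def comments_page(comments, post_id, offset=0, limit=2):
    """Return one page of the comments that reference the given post."""
    matching = [c for c in comments if c["postId"] == post_id]
    return matching[offset:offset + limit]


first_page = comments_page(comments, post["id"])
second_page = comments_page(comments, post["id"], offset=2)
```

The post itself never grows, no matter how many comments accumulate.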

36 To embed, or to reference, that is the question. Reference when: one-to-many relationships are unbounded; relationships are many-to-many. { id: "book1", name: "100 Secrets of Disneyland" } { id: "book2", name: "The best places to eat @ Disney" } { "author-id": "author1", "book-id": "book1" } { "author-id": "author2", "book-id": "book1" } { id: "author1", name: "Mickey Mouse" } { id: "author2", name: "Donald Duck" } Look familiar? It should. It's the "relational" way.

37 To embed, or to reference, that is the question. Reference when: one-to-many relationships are unbounded; relationships are many-to-many. { id: "book1", name: "100 Secrets of Disneyland", authors: ["author1", "author2"] } { id: "book2", name: "The best places to eat @ Disney", authors: ["author1"] } { id: "author1", name: "Mickey Mouse", books: ["book1", "book2"] } { id: "author2", name: "Donald Duck", books: ["book1"] }
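A short Python sketch of the many-to-many layout from slide 37, where each side stores an array of ids for the other side and no join documents are needed. The dictionaries keyed by id stand in for point lookups against a collection; the helper names `authors_of` and `books_by` are illustrative assumptions.

```python
# Documents from slide 37, keyed by id to mimic lookups by document id.
books = {
    "book1": {"id": "book1", "name": "100 Secrets of Disneyland", "authors": ["author1", "author2"]},
    "book2": {"id": "book2", "name": "The best places to eat @ Disney", "authors": ["author1"]},
}
authors = {
    "author1": {"id": "author1", "name": "Mickey Mouse", "books": ["book1", "book2"]},
    "author2": {"id": "author2", "name": "Donald Duck", "books": ["book1"]},
}


def authors_of(book_id):
    """Follow the id references stored on the book to the author names."""
    return [authors[author_id]["name"] for author_id in books[book_id]["authors"]]


def books_by(author_id):
    """Follow the id references stored on the author to the book names."""
    return [books[book_id]["name"] for book_id in authors[author_id]["books"]]
```

The trade-off is that both arrays have to be kept in sync whenever a book gains or loses an author.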

38 To embed, or to reference, that is the question. Reference when: one-to-many relationships are unbounded; relationships are many-to-many; related data changes frequently; the referenced entity is a key entity used by many others. { id: "1", author: "Mickey Mouse", stocks: ["dis", "msft"] } { id: "dis", opening: "52.09", numberOfTrades: 10000, trades: [{qty: 57, price: 53.97}, {qty: 5, price: 54.01}] }

39 To embed, or to reference, that is the question. Reference when: one-to-many relationships are unbounded; relationships are many-to-many; related data changes frequently; the referenced entity is a key entity used by many others. Normalized data models can require more round trips to the server. Typically, normalizing provides better write performance.

40 Publisher document: { id: "mspress", name: "Microsoft Press", books: [ 1, 2,... ] } Book documents: {id: 1, name: "DocumentDB 101" } {id: 2, name: "DocumentDB for RDBMS Users" }

41 Publisher document: { id: "mspress", name: "Microsoft Press" } Book documents: {id: 1, name: "DocumentDB 101", "pub-id": "mspress"} {id: 2, name: "DocumentDB for RDBMS Users", "pub-id": "mspress"}

42

43

44 { "id": "product1", "type": "product", "name": "Microsoft Band 2 – Medium", "price": "174.99", "summary": "Continuous heart rate monitor tracks heart rate...", "images": [ {"image1": "http://..."}, {"image2": "http://..."} ], "reviews": { "averageStars": 4, "reviewCount": 313 } }

45 { "id": "product1", "type": "reviewSummary", "reviewBreakdown": [ {"5": 24}, {"4": 10}, {"3": 3}, {"2": 0}, {"1": 4} ], "topReview": { "rating": 4, "title": "More comfortable than Band 1: But New Size Scale!", "snippet": "I've been wearing the first Band since it…", "fullReviewLink": "http://..." } }
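Slides 44 and 45 store pre-aggregated review figures (`averageStars`, `reviewCount`, `reviewBreakdown`) next to the product instead of computing them on every read. As a hedged illustration of how such a summary could be produced when reviews change, the Python sketch below derives the same fields from a list of star ratings. The function name `summarize_reviews` and the sample ratings are assumptions; the talk does not show how the summary is maintained.

```python
from collections import Counter


def summarize_reviews(ratings):
    """Build the denormalized summary fields from individual 1-5 star ratings."""
    counts = Counter(ratings)
    return {
        "reviewCount": len(ratings),
        # Rounded to a whole star, matching the integer shown on slide 44.
        "averageStars": round(sum(ratings) / len(ratings)) if ratings else 0,
        # One single-key object per star level, highest first, as on slide 45.
        "reviewBreakdown": [{str(stars): counts.get(stars, 0)} for stars in range(5, 0, -1)],
    }


# Hypothetical ratings, used only to exercise the function.
ratings = [5, 5, 4, 4, 4, 3, 1]
summary = summarize_reviews(ratings)
```

Recomputing the summary on write keeps the frequent product reads cheap, at the cost of extra work whenever a review is added.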

46

47 Getting the manager of any employee is trivial: SELECT manager FROM org WHERE id = "Susan". Documents: [ {id: "Jill" }, {id: "Ben", manager: "Jill" }, {id: "Susan", manager: "Jill" }, {id: "Andrew", manager: "Ben" }, {id: "Sven", manager: "Susan" }, {id: "Thomas", manager: "Sven" } ] Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.

48 Getting all employees whose manager is Jill is also easy: SELECT * FROM org WHERE manager = "Jill". Documents: [ {id: "Jill" }, {id: "Ben", manager: "Jill" }, {id: "Susan", manager: "Jill" }, {id: "Andrew", manager: "Ben" }, {id: "Sven", manager: "Susan" }, {id: "Thomas", manager: "Sven" } ] Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.

49 Getting all direct reports for Jill is easy: SELECT * FROM org WHERE id = "Jill". Documents: [ {id: "Jill", directs: ["Ben", "Susan"] }, {id: "Ben", directs: ["Andrew"] }, {id: "Susan", directs: ["Sven"] }, {id: "Andrew" }, {id: "Sven", directs: ["Thomas"] }, {id: "Thomas" } ] Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.

50 Finding the manager of an employee is possible: SELECT * FROM emp WHERE ARRAY_CONTAINS(emp.directs, "Ben"). Documents: [ {id: "Jill", directs: ["Ben", "Susan"] }, {id: "Ben", directs: ["Andrew"] }, {id: "Susan", directs: ["Sven"] }, {id: "Andrew" }, {id: "Sven", directs: ["Thomas"] }, {id: "Thomas" } ] Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.
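The two tree layouts from slides 47 to 50 can be compared with a small Python sketch. It keeps both document sets in memory and mirrors the slide queries as plain functions; the function names, and the extra `management_chain` walk, are illustrative additions rather than part of the talk.

```python
# Parent references: each employee document names its manager (slides 47-48).
parent_refs = [
    {"id": "Jill"},
    {"id": "Ben", "manager": "Jill"},
    {"id": "Susan", "manager": "Jill"},
    {"id": "Andrew", "manager": "Ben"},
    {"id": "Sven", "manager": "Susan"},
    {"id": "Thomas", "manager": "Sven"},
]

# Child references: each manager document lists its direct reports (slides 49-50).
child_refs = [
    {"id": "Jill", "directs": ["Ben", "Susan"]},
    {"id": "Ben", "directs": ["Andrew"]},
    {"id": "Susan", "directs": ["Sven"]},
    {"id": "Andrew"},
    {"id": "Sven", "directs": ["Thomas"]},
    {"id": "Thomas"},
]


def manager_of(employee_id):
    """Parent refs: one lookup by id, like SELECT manager ... WHERE id = ..."""
    by_id = {doc["id"]: doc for doc in parent_refs}
    return by_id[employee_id].get("manager")


def direct_reports(manager_id):
    """Parent refs: filter on the manager field, like WHERE manager = ..."""
    return [doc["id"] for doc in parent_refs if doc.get("manager") == manager_id]


def manager_via_child_refs(employee_id):
    """Child refs: scan for the array that contains the employee (ARRAY_CONTAINS)."""
    for doc in child_refs:
        if employee_id in doc.get("directs", []):
            return doc["id"]
    return None


def management_chain(employee_id):
    """Walk parent references upward until the root of the tree is reached."""
    chain = []
    manager = manager_of(employee_id)
    while manager is not None:
        chain.append(manager)
        manager = manager_of(manager)
    return chain
```

With parent references, both questions are simple filters on a scalar field; with child references, finding a manager means searching inside arrays.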

51

52 { id: "CDC101", title: "Fundamentals of database design", credits: 10 }

53 { id: "CDC101", title: "The Fundamentals of Database Design", titleWords: ["database", "design", "database design"], credits: 10 } Consider using a RegEx to transform words to lowercase and remove punctuation. Strip out stop words like "to", "the", "of", etc. Denormalize keywords into key phrases. SELECT books.title FROM books WHERE ARRAY_CONTAINS(books.titleWords, "database")
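The preprocessing steps listed on slide 53 (lowercase, strip punctuation, drop stop words, add key phrases) might look like the following Python sketch. The stop-word list is a small illustrative assumption, and the choice to build phrases from adjacent pairs of the remaining words is one possible reading of "key phrases", not the talk's stated method.

```python
import re

# Assumption: a tiny stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"to", "the", "of", "a", "an", "and", "in", "for"}


def title_words(title):
    """Return lowercase keywords plus two-word phrases for a title."""
    # Lowercase, then remove everything except letters, digits, and whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    words = [word for word in cleaned.split() if word not in STOP_WORDS]
    # Pair each remaining word with its successor to form candidate key phrases.
    phrases = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + phrases


keywords = title_words("The Fundamentals of Database Design")
# keywords == ["fundamentals", "database", "design",
#              "fundamentals database", "database design"]
```

This produces a superset of the `titleWords` array on the slide, which keeps only "database", "design", and "database design"; which words and phrases to keep is an application decision. Because phrases are built after stop words are removed, "fundamentals database" appears even though the two words are not adjacent in the original title.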

54

55

56 { id: "", timestamp: "...", reading: 123 }

57 { id: "...", timestampMinute: "...", readings: [ {minute:0, reading:123}, {minute:1, reading:456},... {minute:59,reading:999} ] }
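Slides 56 and 57 contrast one document per reading with one document per time bucket that holds up to 60 per-minute readings. A minimal Python sketch of that bucketing step follows. The field name `timestampHour`, the use of the hour as the document `id`, and the sample readings are assumptions made for illustration; the slide elides the actual values.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(readings):
    """Group per-reading documents into one document per hour."""
    buckets = defaultdict(list)
    for doc in readings:
        ts = datetime.fromisoformat(doc["timestamp"])
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append({"minute": ts.minute, "reading": doc["reading"]})
    return [
        {
            "id": hour.isoformat(),
            "timestampHour": hour.isoformat(),
            "readings": sorted(items, key=lambda item: item["minute"]),
        }
        for hour, items in sorted(buckets.items())
    ]


# Hypothetical readings in the one-document-per-reading shape of slide 56.
raw = [
    {"id": "r1", "timestamp": "2016-05-04T10:00:00", "reading": 123},
    {"id": "r2", "timestamp": "2016-05-04T10:01:00", "reading": 456},
    {"id": "r3", "timestamp": "2016-05-04T11:59:00", "reading": 999},
]
hourly = bucket_by_hour(raw)
```

Bucketing cuts the number of documents by up to a factor of 60, at the cost of updating an existing document each time a reading arrives.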

58

59 { id: "...", timestamp: "...", logData: {attr1: value1, attr2: value2,...} }

60

61

62 { type: "book", bookId: "book1", authors: [{authorId: 1}, {authorId: 2}]... } { type: "author", authorId: 1, authorName: "Andrew"... } SELECT b.* FROM b WHERE b.type = "book"

63 { type: "book", bookId: "book1", authors: [{authorId: 1}, {authorId: 2}]... } { type: "author", authorId: 1, authorName: "Andrew"... } SELECT b.* FROM b WHERE ARRAY_CONTAINS(b.authors, {"authorId": 1}) OR b.authorId = 1
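Slides 62 and 63 keep different entity types in one collection and tell them apart with a `type` property. The Python sketch below mirrors both queries over an in-memory list. It assumes the `authors` array holds objects of the form `{authorId: n}`; the second author document and the helper names are hypothetical.

```python
docs = [
    {"type": "book", "bookId": "book1", "authors": [{"authorId": 1}, {"authorId": 2}]},
    {"type": "author", "authorId": 1, "authorName": "Andrew"},
    # Hypothetical second author so the filters have something to exclude.
    {"type": "author", "authorId": 2, "authorName": "Second Author"},
]


def by_type(docs, doc_type):
    """Mirror of: SELECT b.* FROM b WHERE b.type = <doc_type>."""
    return [doc for doc in docs if doc.get("type") == doc_type]


def related_to_author(docs, author_id):
    """Return the author document and every book that lists that author."""
    return [
        doc
        for doc in docs
        if doc.get("authorId") == author_id
        or any(ref.get("authorId") == author_id for ref in doc.get("authors", []))
    ]


books_only = by_type(docs, "book")
author_one = related_to_author(docs, 1)
```

A single query over the mixed collection returns both the author and the author's books, which is the point of co-locating the types.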

64

65

66

67 { "id": "1", "firstName": "Thomas", "lastName": "Andersen", "addresses": [ { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 } ], "contactDetails": [ {"email": "thomas@andersen.com"}, {"phone": "+1 555 555-5555", "extension": 5555} ] }

68 { "id": "xyz", "username": "user xyz" } { "id": "address_xyz", "userid": "xyz", "address": { … } } { "id": "contact_xyz", "userid": "xyz", "email": "user@user.com", "phone": "555 5555" } Normalizing typically provides better write performance.

69 No magic bullet. Think about how your data is going to be written and read, and model accordingly. { "id": "1", "firstName": "Thomas", "lastName": "Andersen", "countOfBooks": 3, "books": [1, 2, 3], "images": [ {"thumbnail": "http://....png"}, {"profile": "http://....png"} ] } { "id": 1, "name": "DocumentDB 101", "authors": [ {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"}, {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"} ] }

70 Understand the access patterns on your database:
- Read/write ratio
- Top queries, sprocs, and CRUD operations
- The life cycle of the data and the growth rate of documents
Use built-in properties:
- Use Id (id) to enforce a uniqueness constraint and for efficient querying
- Use TTL (ttl) to prune old data
- Use Timestamp (_ts) to check for incremental changes
- Use ETag (_etag) for optimistic concurrency and cache-refresh semantics
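To show what optimistic concurrency with an ETag means in practice, here is a self-contained Python sketch. It is an in-memory simulation under stated assumptions, not the DocumentDB SDK: `InMemoryStore`, `ConflictError`, and the `if_match` parameter are hypothetical names, and only the `_etag` property name comes from the slide. A write succeeds only if the caller's ETag still matches the stored one.

```python
import uuid


class ConflictError(Exception):
    """Raised when a write carries a stale ETag."""


class InMemoryStore:
    """Hypothetical stand-in for a document collection with ETag checks."""

    def __init__(self):
        self._docs = {}

    def read(self, doc_id):
        # Return a copy so callers cannot mutate the stored document directly.
        return dict(self._docs[doc_id])

    def upsert(self, doc, if_match=None):
        current = self._docs.get(doc["id"])
        if current is not None and if_match is not None and current["_etag"] != if_match:
            raise ConflictError(f"document {doc['id']} was modified by another writer")
        # Every successful write gets a fresh ETag.
        stored = dict(doc, _etag=uuid.uuid4().hex)
        self._docs[doc["id"]] = stored
        return dict(stored)
```

In use, two clients that read the same version would both hold the same `_etag`; the first conditional write succeeds and changes the ETag, so the second write raises `ConflictError` and that client has to re-read before retrying.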

71 { "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }

