1
MarkLogic Server: Under The Hood
Mary Holstege, Principal Engineer
Unlock Content™
Copyright © 2008 Mark Logic Corporation. All rights reserved.
2
MarkLogic Server
XML Server
Special-purpose DBMS for XML
  Semi-structured
  Hierarchical
Designed for 100s of TB of XML
3
How Did We Get Here?
Founder: Christopher Lindblad
  MIT
  Architect of Ultraseek Server
    Intranet search engine product
Met people who wanted to use a search engine like a database
  Rich query language
  Guaranteed correctness
  Transactions
4
Consider an Application
Documents + metadata
Documents: rich, variable structure
Want: complex full-text search
Want: combined text, metadata, structure-aware search
Want: granular ad hoc access
Want: real-time query
How do you build it?
5
Two-headed Monster

I’m an RDBMS
  Answers are right or wrong
  I like to combine small pieces
  I allow granular access
  Linguistic complexity hurts my brain
  I guarantee ACID properties
  Updates are visible right away

I’m a search engine
  Some answers are better than others
  Most pieces of information are large
  I can give you the whole document
  Structure hurts my brain
  I’m optimized for sparse data
  Updates are visible… oh, whenever
6
A Different Approach
Soul of a Search Engine: Data Model And Queries
Database: On-disk Organization And Transactions
7
Data Model
Diagram labels: Document, Title, Author, Abstract, Section, Footer, Section, Section (cont’d), Metadata
8
Data Model
A database for XML...
... uses the XML Data Model
XML is a tree
Diagram labels: Document, Title, Author, Section, First, Last, Metadata
9
Data Model
A database for XML...
... uses the XML Data Model
XML Data Model is a tree data model
A DBMS for a tree data model
Diagram labels: Document, Title, Author, Section, First, Last, Metadata
10
Building a DBMS for ~Documents~
What do they look like?
What’s the data model?
11
Example Document
MarkLogic Server: The Best Place for XML
John Kreisa
Where should one put their XML? Mark Logic has the best answer to this question: MarkLogic Server.... This high performance engine can... Using an inverted index technique...
Copyright© 2008 Mark Logic Corporation. All rights Reserved.
12
What Queries Is It Good At?
1) Full-Text Search
   Find all documents that contain the phrase “high performance”.
2) XML Structure
   Find all articles that have an abstract.
3) XML Semantics
   Find all documents that mention the company “Mark Logic”.
4) All of the above... at the same time
   Find all articles that contain the phrase “high performance” and mention the company “Mark Logic” in the abstract.
13
1) Full-text Search
Find all documents that contain the phrase “high performance”
Example document:
MarkLogic Server: The Best Place for XML
John Kreisa
Where should one put their XML? Mark Logic has the best answer to this question: MarkLogic Server.... This high performance engine can... Using an inverted index technique...
Copyright© 2008 Mark Logic Corporation. All rights Reserved.
14
1) Full-text Search
Find all documents that contain the phrase “high performance”

Document   very   high   performance   index
122        0      1      0             0
123        1      0      1             1
124        0      0      0             0
125        0      1      0             0
126        0      1      1             0
127        1      0      0             0
129        1      1      0             0
130        0      1      1             1
15
1) Full-text Search
Find all documents that contain the phrase “high performance”

UNIVERSAL INDEX
Term                   Document references
“very”                 123, 127, 129, 152, 344, 791...
“high”                 122, 125, 126, 129, 130, 167...
“performance”          123, 126, 130, 142, 143, 167...
“index”                123, 130, 131, 135, 162, 177...
“high performance”     126, 130, 167, 212, 219, 377...
“very high”            ...
“performance index”    ...

Result (document references): 126, 130, 167, 212, 219, 377...
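The idea on this slide, one term list per word and per two-word phrase, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MarkLogic's implementation: the helper names `index_terms` and `build_index` are hypothetical, and the sample documents are invented so that the resulting term lists echo the first few ids on the slide.

```python
from collections import defaultdict

def index_terms(text):
    """Yield single-word terms and two-word phrase terms for a text."""
    words = text.lower().split()
    yield from words
    for first, second in zip(words, words[1:]):
        yield f"{first} {second}"

def build_index(docs):
    """Map each term to a sorted list of document ids (a term list)."""
    term_lists = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            term_lists[term].add(doc_id)
    return {term: sorted(ids) for term, ids in term_lists.items()}

# Hypothetical documents, invented for this sketch.
docs = {
    122: "high availability",
    123: "very fast performance index",
    126: "this high performance engine",
    127: "very large",
    130: "high performance index",
}

index = build_index(docs)

# A phrase query becomes a single term-list lookup.
phrase_hits = index.get("high performance", [])
```

Because the phrase is indexed as its own term, answering the query is a lookup rather than a scan of every document.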
16
2) XML Structure
Find all articles that have an abstract
Example document:
MarkLogic Server: The Best Place for XML
John Kreisa
Where should one put their XML? Mark Logic has the best answer to this question: MarkLogic Server.... This high performance engine can... Using an inverted index technique...
Copyright© 2008 Mark Logic Corporation. All rights Reserved.
17
2) XML Structure
Find all articles that have an abstract

UNIVERSAL INDEX
Term                   Document references
“very”                 123, 127, 129, 152, 344, 791...
“high”                 122, 125, 126, 129, 130, 167...
“performance”          123, 126, 130, 142, 143, 167...
“index”                123, 130, 131, 135, 162, 177...
“high performance”     126, 130, 167, 212, 219, 377...
article/abstract       ...

Result (document references): 126, 130, 167, 212, 219, 377...
18
3) XML Semantics
Find all documents that mention the company “Mark Logic”
Example document:
MarkLogic Server: The Best Place for XML
John Kreisa
Where should one put their XML? Mark Logic has the best answer to this question: MarkLogic Server.... This high performance engine can... Using an inverted index technique...
Copyright© 2008 Mark Logic Corporation. All rights Reserved.
19
3) XML Semantics
Find all documents that mention the company “Mark Logic”

UNIVERSAL INDEX
Term                              Document references
“very”                            123, 127, 129, 152, 344, 791...
“high”                            122, 125, 126, 129, 130, 167...
“performance”                     123, 126, 130, 142, 143, 167...
“index”                           123, 130, 131, 135, 162, 177...
“high performance”                126, 130, 167, 212, 219, 377...
article/abstract                  ...
<company>Mark Logic</company>     ...

Result (document references): 126, 130, 167, 212, 219, 377...
20
4) All Of The Above
Find all articles that contain the phrase “high performance” and mention the company “Mark Logic” in the abstract
Example document:
MarkLogic Server: The Best Place for XML
John Kreisa
Where should one put their XML? Mark Logic has the best answer to this question: MarkLogic Server.... This high performance engine can... Using an inverted index technique...
Copyright© 2008 Mark Logic Corporation. All rights Reserved.
21
4) All Of The Above
Find all articles that contain the phrase “high performance” and mention the company “Mark Logic” in the abstract

UNIVERSAL INDEX
Term                              Document references
“very”                            123, 127, 129, 152, 344, 791...
“high”                            122, 125, 126, 129, 130, 167...
“performance”                     123, 126, 130, 142, 143, 167...
“index”                           123, 130, 131, 135, 162, 177...
“high performance”                126, 130, 167, 212, 219, 377...
article/abstract                  ...
<company>Mark Logic</company>     ...

Result (document references): 126, 130, 167, 212, 219, 377...
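One way to picture how a combined query could be resolved is as an intersection of term lists: one list for the phrase, one for the structure, one for the element value. The sketch below assumes that model. The term keys, the document ids, and the `resolve` helper are all invented for illustration and are not taken from the product.

```python
# Hypothetical term lists; keys and document ids are invented for illustration.
term_lists = {
    ("phrase", "high performance"): {126, 130, 167, 212},
    ("structure", "article/abstract"): {123, 126, 130, 142, 167},
    ("value", "company", "Mark Logic"): {126, 129, 130, 344},
}

def resolve(terms, term_lists):
    """Intersect the term lists of all terms; an unknown term matches nothing."""
    sets = [term_lists.get(term, set()) for term in terms]
    if not sets:
        return []
    # Start from the shortest list so intermediate results stay small.
    sets.sort(key=len)
    result = set(sets[0])
    for other in sets[1:]:
        result &= other
    return sorted(result)

query = [
    ("phrase", "high performance"),
    ("structure", "article/abstract"),
    ("value", "company", "Mark Logic"),
]
candidates = resolve(query, term_lists)
```

Intersecting document-level lists only yields candidates. Confirming that the company mention sits inside the abstract would need a finer-grained term or a filtering pass over the candidates, which this sketch leaves out.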
22
Scalar Indexes
Identify a set of documents based on criteria and then characterize the set with scalar indexes (float, dateTime, string etc.)

UNIVERSAL INDEX
Term                              Document references
“very”                            123, 127, 129, 152, 344, 791...
“high”                            122, 125, 126, 129, 130, 167...
“performance”                     123, 126, 130, 142, 143, 167...
“index”                           123, 130, 131, 135, 162, 177...
“high performance”                126, 130, 167, 212, 219, 377...
article/abstract                  ...
<company>Mark Logic</company>     ...

Result (document references): 126, 130, 167, …
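A scalar index can be pictured as a list of (value, document id) pairs kept sorted by value. The sketch below assumes that layout and shows the two uses the slide hints at: selecting documents by a value range, and characterizing an already matched set by counting its values. The index contents (publication years) and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical scalar index on a publication year, sorted by value.
year_index = [(2006, 126), (2007, 130), (2007, 167), (2008, 212), (2008, 219)]

def docs_in_range(index, low, high):
    """Return ids of documents whose value lies in [low, high]."""
    values = [value for value, _ in index]
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return {doc_id for _, doc_id in index[start:end]}

def value_counts(index, doc_ids):
    """Count the indexed values that occur within a given document set."""
    return Counter(value for value, doc_id in index if doc_id in doc_ids)

# A document set identified by some other criteria, e.g. a text search.
matched = {126, 130, 167}
facets = value_counts(year_index, matched)
recent = docs_in_range(year_index, 2007, 2008)
```

Keeping the pairs sorted by value is what makes the range lookup a pair of binary searches instead of a full scan.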
23
Geospatial, too
Just a special kind of scalar index, except values are points and scan operators know about Earth geometry

UNIVERSAL INDEX
Term                              Document references
“very”                            123, 127, 129, 152, 344, 791...
“high”                            122, 125, 126, 129, 130, 167...
“performance”                     123, 126, 130, 142, 143, 167...
“index”                           123, 130, 131, 135, 162, 177...
“high performance”                126, 130, 167, 212, 219, 377...
article/abstract                  ...
<company>Mark Logic</company>     ...

Result (document references): 126, 130, 167, …
24
How To Use XML
Think of XML as the API to the Universal Index
  To create an index, add some XML
  To delete an index, delete the XML
  To update an index, update the XML
Each document is a configuration for the indexes
Passive use of XML: What indexes do I configure to get the features I need?
Active use of XML: How can I create my XML to best make use of the indexes?
25
Universal Index Is Our Hammer
We turn queries into nails
26
Examples Of Nails
Directories
  Exclusive, hierarchical, analogous to file system, map to URI
Collections
  Set-based, N:N relationship
Security
  Invisible to your app
27
Many Shapes And Sizes
News Article
Book
Research Report
Slide Presentation
Product Sheet
Operations Manual
28
Load As Is
XML is self-describing
MarkLogic Server:... John Kreisa.... Mark Logic...... index... Copyright©...
29
Load As Is
XML is self-describing
MarkLogic Server:... John Kreisa.... Mark Logic...... index... Copyright©...
MarkLogic Server:... John Kreisa MarkLogic... index...
30
Load As Is
XML is self-describing
"MarkLogic Server:..."
"John"
"Kreisa"
"MarkLogic"
"... "
"... "
"... index... "
31
Load As Is
XML is self-describing
No Schema Needed!
"MarkLogic Server:..."
"John"
"Kreisa"
"MarkLogic"
"... "
"... "
"... index... "
32
Degrees Of Flexibility
Chart axes: Structure (Ad hoc, Predefined) and Queries (Ad hoc, Predefined)
Systems plotted: IMS, IDMS, Relational Databases, Search Engines, MarkLogic Server (XML Server)
33
The Query Language
Diagram labels: XML, Universal Index, XQuery, Full-Text Search, XML Structure, XML Semantics, Application Logic, Manipulate XML, Render Results, Load As Is
34
The Programming Language
Diagram labels: XML, Universal Index, XQuery, Full-Text Search, XML Structure, XML Semantics, Application Logic, Manipulate XML, Render Results, Load As Is
35
Simple Application
36
Mark Logic Application
The first things you notice about search in MarkLogic:
1. Search is granular
2. Search is the beginning of a process, not the end
Precision content
Contextual
Dynamically rendered
37
A Different Approach
Soul of a Search Engine: Data Model And Queries
Database: On-disk Organization And Transactions
38
What’s In A Database?
No tables
No rows
forests...
... of trees
Diagram labels: Database, Forest 1, Forest 2, Forest 3
39
The Cluster
Diagram labels:
  Hosts e1, e2, ... ek
  Hosts d1, d2, d3, ... dl
  Forests 1, 2, 3, 4, ... m
40
What About Updates?
Typical XML document: 10KB – 1MB
  Referenced by 1,000s to 10,000s of term lists
Search engines are bad at updates
  Many indexes to update
  Option: Index and Information out of sync
  Option: Slow
We want
  High throughput
  Transactions (ACID)
So how do we avoid updates?
41
Solution: Temporal Database
No update!
No delete!
Only insert and read-at-a-time
Every document has two timestamps
  “created”, “expired”
42
Temporal Database
Timeline (timestamps 520 to 528):
  Create a.xml
  Create b.xml
  Update a.xml
  Delete b.xml
  ...
  Query
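The insert-only model with “created” and “expired” timestamps can be sketched as a small in-memory store. This is a minimal illustration under stated assumptions, not the server's design: the `TemporalStore` class, its methods, and the integer clock are hypothetical. An update is modeled as expiring the current version and inserting a new one, and a delete only sets the “expired” timestamp.

```python
import math

class TemporalStore:
    """Toy insert-only store: versions are never modified, only expired."""

    def __init__(self):
        self.clock = 0
        self.versions = []  # dicts with uri, content, created, expired

    def _tick(self):
        self.clock += 1
        return self.clock

    def _expire(self, uri, now):
        for version in self.versions:
            if version["uri"] == uri and version["expired"] == math.inf:
                version["expired"] = now

    def insert(self, uri, content):
        """Create or update: expire any live version, then add a new one."""
        now = self._tick()
        self._expire(uri, now)
        self.versions.append(
            {"uri": uri, "content": content, "created": now, "expired": math.inf}
        )
        return now

    def delete(self, uri):
        """Delete: nothing is removed, the live version is marked expired."""
        now = self._tick()
        self._expire(uri, now)
        return now

    def read(self, uri, at=None):
        """Return the version visible at timestamp `at` (default: now)."""
        at = self.clock if at is None else at
        for version in self.versions:
            if version["uri"] == uri and version["created"] <= at < version["expired"]:
                return version["content"]
        return None

store = TemporalStore()
t1 = store.insert("a.xml", "<a>v1</a>")   # Create a.xml
t2 = store.insert("b.xml", "<b/>")        # Create b.xml
t3 = store.insert("a.xml", "<a>v2</a>")   # Update a.xml
t4 = store.delete("b.xml")                # Delete b.xml
```

A query that reads at a fixed timestamp sees the same versions no matter what is inserted afterwards, which is why such reads need no locks in this model.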
43
The Cluster
Diagram labels:
  Hosts e1, e2, ... ek
  Hosts d1, d2, d3, ... dl
  Forests 1, 2, 3, 4, ... m
44
A Single Forest
Diagram labels: Host, Forest k, Buffer, Buffer, Stand 1, Stand 2, … Stand n
45
1. Create A New Tree
Diagram labels: Host, Forest k, Buffer, Buffer, Stand 1, Stand 2, … Stand n
46
2. Expire Trees
Diagram labels: Host, Forest k, Buffer, Buffer, Stand 1, Stand 2, … Stand n
47
3. Save A Buffer To Disk
Diagram labels: Host, Forest k, Buffer, Buffer, Stand 1, Stand 2, … Stand n
48
4. Optimization: Merge Stands
Diagram labels: Host, Forest k, Buffer
49
The Four Forest Operations
1. Create a new document
   Into a buffer
2. Mark a document as expired
   Memory-mapped document timestamps per stand
3. Write buffer out to disk
   Our buffers are 100s of megabytes
   For performance, double buffer
4. Merge
   Background process
   Optimization: reduces number of stands in forest
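The four operations can be modeled with a toy forest that holds one in-memory buffer and a list of saved stands. The sketch below is an assumption-laden illustration: the `Stand` and `Forest` classes are hypothetical, "disk" is just a Python list, there is no double buffering, and dropping expired documents during a merge is a simplification chosen for this sketch rather than something the slides state.

```python
class Stand:
    """A batch of documents plus the set of ids marked as expired."""

    def __init__(self):
        self.docs = {}        # doc_id -> content
        self.expired = set()  # doc_ids marked as expired

    def live(self):
        return {d: c for d, c in self.docs.items() if d not in self.expired}

class Forest:
    def __init__(self):
        self.buffer = Stand()  # in-memory stand receiving new documents
        self.stands = []       # stands already saved (stand-in for disk)
        self.next_id = 1

    def create(self, content):
        """1. Create a new document in the buffer."""
        doc_id = self.next_id
        self.next_id += 1
        self.buffer.docs[doc_id] = content
        return doc_id

    def expire(self, doc_id):
        """2. Mark a document as expired in whichever stand holds it."""
        for stand in [self.buffer, *self.stands]:
            if doc_id in stand.docs:
                stand.expired.add(doc_id)
                return True
        return False

    def flush(self):
        """3. Save the buffer as a new stand and start a fresh buffer."""
        if self.buffer.docs:
            self.stands.append(self.buffer)
            self.buffer = Stand()

    def merge(self):
        """4. Merge all saved stands into one, keeping only live documents."""
        merged = Stand()
        for stand in self.stands:
            merged.docs.update(stand.live())
        self.stands = [merged] if merged.docs else []

    def live_docs(self):
        result = {}
        for stand in [*self.stands, self.buffer]:
            result.update(stand.live())
        return result

forest = Forest()
a = forest.create("<a/>")
forest.flush()
b = forest.create("<b/>")
forest.flush()
forest.expire(a)
c = forest.create("<c/>")
stands_before = len(forest.stands)
forest.merge()
stands_after = len(forest.stands)
live = forest.live_docs()
```

Note that none of the operations rewrites a saved stand in place: new documents go to the buffer, expiry touches only a marker, and a merge produces a new stand from old ones.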
50
Consistency And Throughput
2-phase commit
  Transactions span forests
Recovery
  Forest Journals
Lock-free queries
  Use the search engine at a point-in-time
  Increased throughput
  Time travel?
51
A Different Approach
Soul of a Search Engine: Data Model And Queries
Database: On-disk Organization And Transactions
52
Summary
XML as data model
  Ad hoc schema
A search engine core
  Universal Index
Temporal transaction model
  High throughput while keeping...
Performance and scalability of a search engine
53
Thank You
Mary Holstege, Principal Engineer
mary@marklogic.com
t: 650.655.2336
f: 650.655.2310