An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013
Slide 2 MarkLogic: Search, Database, Application Services
Copyright © 2013 MarkLogic ® Corporation. All rights reserved.
Slide 3 Data ≠ Information
Slide 4 Data + Context = Information
Slide 5 Dynamic Semantic Publishing: BBC Sport
The Challenge
- Size and complexity: number of athletes, teams, assets (match reports, statistics, etc.), and relations (facts)
Goals
- Rich user experience: see information in context, personalize content, easy navigation, intelligently serve ads (outside the UK)
- Manageable: static pages? Too many, changing too fast; limited number of journalists; automate as much as possible
Slide 6 Dynamic Semantic Publishing: A Solution
XML Database
- Store and manage documents: stories, blogs, feeds, profiles
- Store and manage values: statistics
- Full-text search
- Performance and scalability
- Robustness
Triple Store
- Metadata about documents: tagged by journalists, added (semi-)automatically, or inferred
- Facts reported by journalists
- Linked Open Data for real-world facts
Slide 7 Dynamic Semantic Publishing: Understanding Data
[Diagram: entities connected by relationships such as "played in", "plays in", and "plays for"]
Slide 8 Dynamic Semantic Publishing: Scaling Up
Slide 9 What is RDF?
[Graph diagram: resources :person4, :person5, :person20, and :place5 and the literal "John", connected by the predicates :has-child, :has-parent, :spouse, :birth-place, and :first-name]
Slide 10 What is RDF?
- Schema-less
- Triple granularity
- Open-world assumption
- Joins: the cost of granularity
Slide 11 What is Semantics?
- Data stored in triples
- Expressed as Subject : Predicate : Object
- Example: "John Smith" : livesIn : "London"; "London" : isIn : "England"
Slide 12 What is Semantics?
- Data stored in triples
- Expressed as Subject : Predicate : Object
- Example: "John Smith" : livesIn : "London"; "London" : isIn : "England"
- Rules tell us something about the triples
- Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
- Inference: "John Smith" : livesIn : "England"
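The rule on this slide can be applied mechanically. The Python sketch below (an illustration only, not MarkLogic's inference engine) forward-chains the single livesIn/isIn rule over an in-memory set of triples until no new triples appear. Plain strings stand in for IRIs and literals.

```python
# Toy data from the slide; plain strings stand in for IRIs.
facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def infer_lives_in(triples):
    """Apply: if (A livesIn X) and (X isIn Y) then (A livesIn Y),
    repeating until no new triples appear (forward chaining to a fixpoint)."""
    result = set(triples)
    while True:
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in result if p1 == "livesIn"
            for (x2, p2, y) in result if p2 == "isIn" and x2 == x
        }
        if new <= result:  # nothing new: fixpoint reached
            return result
        result |= new


inferred = infer_lives_in(facts) - facts
print(inferred)  # {('John Smith', 'livesIn', 'England')}
```

Looping to a fixpoint matters once isIn chains grow longer (London isIn England isIn UK): each pass lifts livesIn one step further up the chain.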
Slide 13 What is Semantics?
- Data stored in triples, expressed as Subject : Predicate : Object
- Example: "John Smith" : livesIn : "London"; "London" : isIn : "England"
- Rules tell us something about the triples
[Diagram: "John Smith" livesIn "London"; "London" isIn "England"; and the inferred edge "John Smith" livesIn "England"]
Slide 14 Why use RDF?
- Data that is born as RDF or extracted to RDF
- Denormalize into XML by default
- Lift data into RDF if you need to:
  - combine it with disparate data sources
  - navigate it like a graph
  - use it for relationships or taxonomy
  - expose it as RDF to end users
Slide 15 Semantics Architecture
[Diagram: the triple index (TRIPLE) with components labelled XQY, XSLT, SQL, SPARQL, and GRAPH]
Slide 16 Triple Index
- 3 triple orders
- Cached for performance
- Works seamlessly with other indexes
- Security
- 150 bytes per triple on disk
- Billions of triples per host
- Scaling out horizontally
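Keeping each triple in several orders means that whichever positions a pattern binds, some order begins with a bound value. The Python sketch below is a toy model of that idea and not MarkLogic's on-disk structure. The slide does not name the three orders, so the SPO/POS/OSP choice here is an assumption (one common arrangement). The sample triples are illustrative.

```python
from collections import defaultdict


def _two_level():
    # first key -> second key -> set of third values
    return defaultdict(lambda: defaultdict(set))


class TripleIndex:
    """Toy triple index storing every triple in three orders (SPO, POS, OSP)
    so any pattern with a bound position starts from a keyed lookup."""

    def __init__(self):
        self.spo = _two_level()  # subject -> predicate -> objects
        self.pos = _two_level()  # predicate -> object -> subjects
        self.osp = _two_level()  # object -> subject -> predicates

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            if p is not None or o is None:  # SPO order: (s) or (s, p) bound
                for pred, objs in self.spo.get(s, {}).items():
                    if p is None or pred == p:
                        for obj in objs:
                            if o is None or obj == o:
                                yield (s, pred, obj)
            else:  # OSP order: (o, s) bound
                for pred in self.osp.get(o, {}).get(s, ()):
                    yield (s, pred, o)
        elif p is not None:  # POS order: (p) or (p, o) bound
            for obj, subjs in self.pos.get(p, {}).items():
                if o is None or obj == o:
                    for subj in subjs:
                        yield (subj, p, obj)
        elif o is not None:  # OSP order: (o) bound
            for subj, preds in self.osp.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:  # nothing bound: full scan of one order
            for subj, by_pred in self.spo.items():
                for pred, objs in by_pred.items():
                    for obj in objs:
                        yield (subj, pred, obj)


idx = TripleIndex()
for t in [(":person4", ":first-name", '"John"'),
          (":person4", ":birth-place", ":place5"),
          (":person22", ":birth-place", ":place13")]:
    idx.add(*t)

print(sorted(idx.match(p=":birth-place")))
# [(':person22', ':birth-place', ':place13'), (':person4', ':birth-place', ':place5')]
```

For scale, the slide's figure of 150 bytes per triple puts one billion triples at about 1.5 × 10^11 bytes, roughly 150 GB, on disk.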
Slide 17 RDF Loading
Slide 18 Triples Embedded in Documents
… <sem:object datatype=" Lawford …
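A document can carry triples inline, next to its ordinary content. This Python sketch assumes the sem:triple / sem:subject / sem:predicate / sem:object element layout in the namespace http://marklogic.com/semantics (the slide preserves only a fragment of sem:object). It pulls the embedded triples out of a hypothetical XML document using the standard library's ElementTree. The <article> wrapper and the example.org IRIs are invented. Only the value "Lawford" comes from the slide.

```python
import xml.etree.ElementTree as ET

SEM = "http://marklogic.com/semantics"  # assumed namespace of the sem: prefix

# Hypothetical document mixing ordinary content with an embedded triple
doc = f"""<article xmlns:sem="{SEM}">
  <title>Example</title>
  <sem:triple>
    <sem:subject>http://example.org/person/1</sem:subject>
    <sem:predicate>http://example.org/name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</article>"""


def embedded_triples(xml_text):
    """Yield (subject, predicate, object value, object datatype) for each
    sem:triple element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    for t in root.iter(f"{{{SEM}}}triple"):
        s = t.findtext(f"{{{SEM}}}subject")
        p = t.findtext(f"{{{SEM}}}predicate")
        obj = t.find(f"{{{SEM}}}object")
        yield s, p, obj.text, obj.get("datatype")


for triple in embedded_triples(doc):
    print(triple)
```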
Slide 19 Content, Data, and Semantics
[Figure: an example document, "Suspicious vehicle near airport", combining classification terms (observation/surveillance, suspicious activity, suspicious vehicle), triples about an IRI ID (predicates isa and license-plate, with the value "ABC 123"), and full text: "A blue van with license plate ABC 123 was observed parked behind the airport sign…"]
Slide 20 Content, Data, and Semantics
[Figure: the same document, annotated to show its parts: semantic (RDF) triples, unstructured full-text, geospatial, and data]
Slide 21 RDF Values
    "string value"^^xs:string
    "987"^^xs:double
    " "^^xs:date
    _:blank1
    "simple"
Slide 22 Datatype Mapping
    Datatype                   SPARQL           XQuery
    Typed literal              " "^^xs:date     xs:date(" ")
    IRI                        …                sem:iri(" example.com")
    Blank node                 _:blank1         sem:blank("…")
    Simple literal             "simple"         xs:string("simple")
    Language-tagged literal    …                rdf:langString("bonjour", "fr")
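The table pairs each kind of RDF term with a typed value on the XQuery side. As an analogy in Python (these classes are illustrative and are not MarkLogic's API), the sketch models IRIs, blank nodes, and literals. It follows the RDF 1.1 rules that a simple literal has datatype xs:string and a language-tagged literal has datatype rdf:langString. It then maps literals to native values the way XQuery exposes them as xs:double, xs:date, and so on. The date value is illustrative because the slide's original was lost.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # e.g. "blank1" for _:blank1


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None and no lang: a simple literal
    lang: Optional[str] = None      # set only for language-tagged literals

    @property
    def effective_datatype(self) -> str:
        # RDF 1.1: simple literals are xs:string; tagged ones are rdf:langString
        if self.lang is not None:
            return RDF_LANG_STRING
        return self.datatype or XSD + "string"

    def to_python(self):
        """Map the literal to a native value, loosely mirroring the
        XQuery atomic type it would surface as."""
        if self.lang is not None:
            return (self.lexical, self.lang)
        converters = {
            XSD + "string": str,
            XSD + "double": float,
            XSD + "integer": int,
            XSD + "date": date.fromisoformat,
        }
        # Unknown datatypes fall back to the lexical form as a string
        return converters.get(self.effective_datatype, str)(self.lexical)


print(Literal("987", XSD + "double").to_python())       # 987.0
print(Literal("2013-10-23", XSD + "date").to_python())  # 2013-10-23
print(Literal("bonjour", lang="fr").to_python())        # ('bonjour', 'fr')
```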
Slide 23 SPARQL
- Executed using the triple index
- Supports much of SPARQL 1.1
- Cost-based optimization: join ordering and join algorithms
    select * where {
      ?person :birth-place ?place;
              :first-name "John"
    }
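Each triple pattern in a query is matched separately, and the results are joined on shared variables. That is the cost of triple granularity. To show why join ordering matters, this Python sketch (not MarkLogic's optimizer) evaluates the slide's two-pattern query as a series of nested-loop joins. As a crude stand-in for cost-based optimization, it counts how many triples each pattern matches on its own and runs the most selective pattern first. Terms starting with ? are variables. The data is hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Return binding extended so pattern matches triple, or None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # variable
            if extended.get(term, value) != value:
                return None                   # already bound to something else
            extended[term] = value
        elif term != value:                   # constant that does not match
            return None
    return extended


def solve(patterns, triples):
    """Evaluate a basic graph pattern as a sequence of nested-loop joins."""
    def cardinality(pattern):
        return sum(match_pattern(pattern, t, {}) is not None for t in triples)

    # Crude cost model: most selective pattern first, so the join starts small
    ordered = sorted(patterns, key=cardinality)
    solutions = [{}]
    for pattern in ordered:
        solutions = [
            extended
            for binding in solutions
            for t in triples
            if (extended := match_pattern(pattern, t, binding)) is not None
        ]
    return solutions


triples = [
    (":person4", ":first-name", '"John"'),
    (":person4", ":birth-place", ":place5"),
    (":person22", ":first-name", '"John"'),
    (":person22", ":birth-place", ":place13"),
    (":person5", ":first-name", '"Ethel"'),
    (":person5", ":birth-place", ":place0"),
]
query = [("?person", ":birth-place", "?place"), ("?person", ":first-name", '"John"')]

for row in solve(query, triples):
    print(row["?person"], row["?place"])
# :person4 :place5
# :person22 :place13
```

Here the :first-name "John" pattern matches 2 triples and :birth-place matches 3, so the join begins from the smaller input. On real data the gap between orders can be many orders of magnitude.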
Slide 24 Executing SPARQL
    sem:sparql("
      prefix : <…>
      select * {
        ?person :first-name ?first;
                :last-name ?last;
                :alma-mater [:ivy-league :true]
      }",
      map:entry("first", "John"),
      (),
      cts:collection-query("mycollection")
    )
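The map passed as the second argument pre-binds query variables: map:entry("first", "John") fixes ?first before evaluation. The Python sketch below models that idea and does not reproduce the MarkLogic API. Evaluation starts from the supplied binding instead of an empty one. It also treats the bracketed [:ivy-league :true] as what it is, an anonymous blank node, using the hypothetical hidden variable ?_b, which select * does not return. Unlike the slide's map key "first", the sketch keys bindings with the leading ?. All data is hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Return binding extended so pattern matches triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def sparql_select_all(patterns, triples, bindings=None):
    # Start from the caller's bindings, like passing a map to sem:sparql
    solutions = [dict(bindings or {})]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for t in triples
            if (extended := match_pattern(pattern, t, binding)) is not None
        ]
    # select * returns query variables only; drop blank-node stand-ins (?_x)
    return [{k: v for k, v in s.items() if not k.startswith("?_")} for s in solutions]


triples = [
    (":p1", ":first-name", '"John"'), (":p1", ":last-name", '"Smith"'),
    (":p1", ":alma-mater", ":uni1"),
    (":p2", ":first-name", '"Jane"'), (":p2", ":last-name", '"Doe"'),
    (":p2", ":alma-mater", ":uni1"),
    (":uni1", ":ivy-league", ":true"),
]
query = [
    ("?person", ":first-name", "?first"),
    ("?person", ":last-name", "?last"),
    ("?person", ":alma-mater", "?_b"),  # [ ... ] introduces an anonymous blank node
    ("?_b", ":ivy-league", ":true"),
]

print(sparql_select_all(query, triples, {"?first": '"John"'}))
# [{'?first': '"John"', '?person': ':p1', '?last': '"Smith"'}]
```

Without the pre-binding, both :p1 and :p2 match. Binding ?first up front filters at the first pattern rather than after the whole query has run.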
Slide 25 Returning Binding Solutions
    select * where { ?person :birth-place :place5 }
    select * where { ?person :birth-place ?place; :first-name "John" }
Slide 26 Solution Results
    person      place
    :person22   :place13
    :person4    :place5
Slide 27 SPARQL Query Results XML Format
    sem:query-results-serialize(
      sem:sparql("select * { … }"),
      "xml"
    )
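The "xml" format is the W3C SPARQL Query Results XML Format. It has a <sparql> root in the namespace http://www.w3.org/2005/sparql-results#, a <head> listing the variables, and one <result> per solution, where each <binding> wraps a <uri>, <literal>, or <bnode>. The Python sketch below builds that shape with ElementTree from hypothetical solutions. It covers only plain values (no datatype or xml:lang attributes on literals) and says nothing about how MarkLogic serializes internally.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/sparql-results#"
ET.register_namespace("", NS)  # serialize with a default namespace, no ns0: prefix


def q(name):
    """Qualify a local element name with the SPARQL results namespace."""
    return f"{{{NS}}}{name}"


def serialize_results(variables, solutions):
    """variables: list of names; solutions: list of dicts mapping a variable
    name to a (kind, text) pair, where kind is "uri", "literal", or "bnode"."""
    root = ET.Element(q("sparql"))
    head = ET.SubElement(root, q("head"))
    for var in variables:
        ET.SubElement(head, q("variable"), name=var)
    results = ET.SubElement(root, q("results"))
    for solution in solutions:
        result = ET.SubElement(results, q("result"))
        for var in variables:
            if var not in solution:  # unbound variables are simply omitted
                continue
            kind, text = solution[var]
            binding = ET.SubElement(result, q("binding"), name=var)
            ET.SubElement(binding, q(kind)).text = text
    return ET.tostring(root, encoding="unicode")


rows = [
    {"person": ("uri", "http://example.org/person4"),
     "place": ("uri", "http://example.org/place5")},
    {"person": ("uri", "http://example.org/person22")},  # ?place unbound here
]
print(serialize_results(["person", "place"], rows))
```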
Slide 28 Returning Triples
    describe :person4
    construct { ?bp :uses-name ?fn }
    where { ?person :birth-place ?bp; :first-name ?fn }
Slide 29 Triple Results
    :place0 :uses-name "Ethel", "Jeffrey", "Kara" .
    :place1 :uses-name "Edward", "James" .
    :place10 :uses-name "Robert", "Sheila", "Stephen" .
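A CONSTRUCT query takes its template, fills it in once for every solution of the WHERE clause, and returns the resulting set of triples. This Python sketch (an illustration, not MarkLogic's implementation) does that and then prints the graph with Turtle's comma shorthand for repeated objects, which is the layout shown on this slide. The solution rows are hypothetical. The names are borrowed from the slide.

```python
from collections import defaultdict


def construct(template, solutions):
    """Instantiate each template triple once per solution. Results go into a
    set, so duplicate triples collapse as they do in a CONSTRUCT graph."""
    graph = set()
    for solution in solutions:
        for pattern in template:
            triple = tuple(solution.get(term, term) for term in pattern)
            # A template triple with an unbound variable is skipped, not emitted
            if not any(term.startswith("?") for term in triple):
                graph.add(triple)
    return graph


def to_turtle(triples):
    """Group by subject, then predicate, using Turtle's ';' and ',' shorthand."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in sorted(triples):
        grouped[s][p].append(o)
    lines = []
    for s, by_pred in grouped.items():
        body = " ; ".join(f"{p} {', '.join(objs)}" for p, objs in by_pred.items())
        lines.append(f"{s} {body} .")
    return "\n".join(lines)


solutions = [  # hypothetical rows from the WHERE clause
    {"?person": ":person1", "?bp": ":place0", "?fn": '"Ethel"'},
    {"?person": ":person2", "?bp": ":place0", "?fn": '"Jeffrey"'},
    {"?person": ":person3", "?bp": ":place1", "?fn": '"Edward"'},
    {"?person": ":person6", "?bp": ":place0", "?fn": '"Ethel"'},  # duplicate triple
]
template = [("?bp", ":uses-name", "?fn")]
print(to_turtle(construct(template, solutions)))
# :place0 :uses-name "Ethel", "Jeffrey" .
# :place1 :uses-name "Edward" .
```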
Slide 30 Querying Named Graphs
    select * from <…> where { ?s ?p ?o }
    select * where { graph <…> { ?s ?p ?o } }
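The graph IRIs on this slide did not survive extraction, so the Python sketch below uses hypothetical example.org IRIs. It models RDF quads (a triple plus the graph it belongs to) to contrast the two query forms. FROM builds the default graph out of a named graph, so the pattern sees plain triples. GRAPH matches inside named graphs, and when the graph is a variable it also reports which graph each match came from.

```python
# Hypothetical quads: (subject, predicate, object, graph IRI)
quads = [
    (":s1", ":p", ":o1", "http://example.org/graphA"),
    (":s2", ":p", ":o2", "http://example.org/graphB"),
    (":s3", ":p", ":o3", "http://example.org/graphA"),
]


def from_graph(quads, graph):
    """FROM <graph>: the named graph's triples become the default graph,
    so ?s ?p ?o sees plain triples with no graph attached."""
    return [(s, p, o) for s, p, o, g in quads if g == graph]


def graph_pattern(quads, graph=None):
    """GRAPH <graph> { ?s ?p ?o }, or GRAPH ?g { ?s ?p ?o } when graph is None,
    in which case each solution also binds ?g to the triple's graph."""
    solutions = []
    for s, p, o, g in quads:
        if graph is None:
            solutions.append({"?s": s, "?p": p, "?o": o, "?g": g})
        elif g == graph:
            solutions.append({"?s": s, "?p": p, "?o": o})
    return solutions


print(from_graph(quads, "http://example.org/graphA"))
print(graph_pattern(quads))
```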
Slide 31 Restricting the Datasets
    let $options := "properties"
    let $query := cts:and-query((
      cts:directory-query("/triples/"),
      cts:element-range-query(xs:QName("date"), ">", $date)
    ))
    return sem:sparql("…", (), (), $options, $query)
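Passing a document query to sem:sparql means the SPARQL query only sees triples stored in documents that match it. The Python sketch below models that filtering step over hypothetical documents. It does not call MarkLogic. A plain URI-prefix test stands in for cts:directory-query, though a real directory query may match differently depending on its depth setting. A date comparison stands in for the element range query.

```python
from datetime import date

# Hypothetical documents, each holding some embedded triples
docs = [
    {"uri": "/triples/a.xml", "date": date(2013, 9, 1), "triples": [(":a", ":p", ":x")]},
    {"uri": "/triples/b.xml", "date": date(2013, 10, 20), "triples": [(":b", ":p", ":y")]},
    {"uri": "/other/c.xml", "date": date(2013, 10, 21), "triples": [(":c", ":p", ":z")]},
]


def visible_triples(docs, directory, after):
    """Yield only triples from documents matching BOTH conditions, the way an
    and-query of a directory query and a date range query narrows the data."""
    for doc in docs:
        in_directory = doc["uri"].startswith(directory)  # stands in for cts:directory-query
        is_recent = doc["date"] > after                  # stands in for the range query
        if in_directory and is_recent:
            yield from doc["triples"]


print(list(visible_triples(docs, "/triples/", date(2013, 10, 1))))
# [(':b', ':p', ':y')]
```

Because the restriction is applied before matching triple patterns, any SPARQL query run against it can only ever see the triples from /triples/b.xml.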
Slide 32 Creating Triples
Returning sem:triple values: sem:triple(), sem:rdf-parse(), sem:rdf-get(), sem:rdf-builder()
Inserting into a database: sem:rdf-load(), sem:rdf-insert()
Slide 33 Graph Store API
    declare function graph-insert(
      $graphname as sem:iri,
      $triples as sem:triple*,
      [$permissions as element(sec:permission)*,
       $collections as xs:string*,
       $quality as xs:int?,
       $forest-ids as xs:unsignedLong*]
    ) as xs:string*;

    declare function graph-delete(
      $graphname as sem:iri
    ) as empty-sequence();
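The two signatures suggest a simple model: inserting writes triples into storage tagged with a graph name, and deleting removes everything tagged with that name. The Python sketch below is a toy version of that model. It assumes, without confirmation from the slide, that the xs:string* returned by graph-insert is a list of URIs of the documents written, and it invents a hypothetical /triples/N.xml URI scheme for them. Permissions, collections, quality, and forest IDs are left out.

```python
class GraphStore:
    """Toy model of a graph store in which inserted triples live in documents
    tagged with a graph name, so a whole graph can be deleted at once."""

    def __init__(self):
        self._docs = {}   # document URI -> (graph name, list of triples)
        self._counter = 0

    def graph_insert(self, graph, triples):
        # Hypothetical URI scheme; returns URIs, echoing the xs:string* result
        uri = f"/triples/{self._counter}.xml"
        self._counter += 1
        self._docs[uri] = (graph, list(triples))
        return [uri]

    def graph_delete(self, graph):
        # Remove every document that belongs to the named graph
        for uri in [u for u, (g, _) in self._docs.items() if g == graph]:
            del self._docs[uri]

    def graph_triples(self, graph):
        return [t for g, ts in self._docs.values() if g == graph for t in ts]


store = GraphStore()
store.graph_insert("http://example.org/g1", [(":a", ":p", ":b")])
store.graph_insert("http://example.org/g2", [(":c", ":p", ":d")])
store.graph_delete("http://example.org/g1")
print(store.graph_triples("http://example.org/g1"))  # []
print(store.graph_triples("http://example.org/g2"))  # [(':c', ':p', ':d')]
```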
Slide 34 Conclusion
- Semantics can enhance your data-oriented and search applications.
- XQuery and SPARQL work well together.
- A combined RDF and XML database makes it simpler to use the two technologies together.
- Try MarkLogic 7:
Slide 35 Any Questions?