SPARQLeR: Extended Sparql for Semantic Association Discovery Krzysztof Kochut and Maciej Janik Work supported by the National Science Foundation Grant.

Presentation transcript:

SPARQLeR: Extended Sparql for Semantic Association Discovery Krzysztof Kochut and Maciej Janik Work supported by the National Science Foundation Grant No. IIS , entitled “SemDIS: Discovering Complex Relationships in the Semantic Web”. ESWC 2007, Innsbruck, Austria June 4, 2007

Computer Science Department University of Georgia
Paths in RDF
  – Directed path
  – Undirected path
  – Undirected path, but with specific properties and directionality

Why are paths interesting?
A path describes how entities are related.
  – Relationships on the path define the meaning of the connection.
  – Entities on the path specify the content.
Do you have migraine? Try taking magnesium!
  – Path discovered by Dr. D.R. Swanson from partial information available in PubMed publications:
      stress can lead to loss of magnesium in the human body
      migraine patients seem to be experiencing stress
      … that's why … migraine could lead to a loss of magnesium, so … take magnesium to fight migraine!
Swanson, D.R. Migraine and Magnesium: Eleven Neglected Connections. Perspectives in Biology and Medicine, 31 (4)

Formally, what is a simple path?
Simple directed path between resources $r_0$ and $r_n$ in a description base $R$:
  – a sequence $r_0\, p_1\, r_1\, p_2\, r_2, \ldots, p_{n-1}\, r_{n-1}\, p_n\, r_n$ $(n > 0)$
  – $r_0\, p_1\, r_1$, $r_1\, p_2\, r_2$, …, $r_{n-2}\, p_{n-1}\, r_{n-1}$, $r_{n-1}\, p_n\, r_n$ $(n > 0)$ are triples in $R$
  – all of the resources $r_i$ $(0 \le i \le n)$ in the path are distinct
Simple undirected path between resources $r_0$ and $r_n$ in $R$:
  – a sequence $r_0\, p_1\, r_1\, p_2\, r_2, \ldots, p_{n-1}\, r_{n-1}\, p_n\, r_n$ $(n > 0)$
  – for each $r_{i-1}\, p_i\, r_i$ $(0 < i \le n)$ in the path, either $r_{i-1}\, p_i\, r_i$ or $r_i\, p_i\, r_{i-1}$ is a triple in $R$
  – all of the resources $r_i$ $(0 \le i \le n)$ in the path are distinct
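The two definitions above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `is_simple_path`, the list encoding `[r0, p1, r1, ..., pn, rn]` and the toy description base `R` are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def is_simple_path(path: Sequence[str], triples: Iterable[Triple],
                   directed: bool = True) -> bool:
    """Check a candidate path encoded as [r0, p1, r1, ..., pn, rn]."""
    base: Set[Triple] = set(triples)
    # n > 0 means at least one property, and the encoding has odd length.
    if len(path) < 3 or len(path) % 2 == 0:
        return False
    # All resources r_0 .. r_n must be distinct.
    resources = path[0::2]
    if len(set(resources)) != len(resources):
        return False
    for i in range(1, len(path), 2):
        s, p, o = path[i - 1], path[i], path[i + 1]
        forward = (s, p, o) in base
        backward = (o, p, s) in base
        # Directed: the triple must exist as written.
        # Undirected: either orientation of the triple is enough.
        if not (forward or (not directed and backward)):
            return False
    return True


# Hypothetical description base with two triples.
R = {("a", "p", "b"), ("c", "q", "b")}
print(is_simple_path(["a", "p", "b", "q", "c"], R, directed=True))
print(is_simple_path(["a", "p", "b", "q", "c"], R, directed=False))
```

In this toy base the sequence a p b q c is an undirected simple path but not a directed one, because only the triple (c, q, b) exists.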

Paths and SPARQL
A SPARQL query can express only static graph patterns.
  – Some flexibility is introduced by the OPTIONAL part, but it does not solve path problems.
There is no support for flexible-length path expressions.
  – The glycan biosynthesis pathway in biology has a specific pattern (properties), but its length may be unknown.
  – A path to be discovered may be of unknown length and pattern, as in Dr. Swanson's example.

What do we need to discover paths?
Knowledge discovery needs more flexible patterns.
  – Patterns may be partially known or even unknown (unrestricted path).
  – Properties on the path, their order and directionality create a specific meaning.
  – Entities on the path provide content.
  – Relationships to entities outside of the path give additional context.

Proposed extensions
A path may have a flexible length
  – For computational reasons, the length is limited.
Constraints on properties
  – Specific properties must appear in the path.
  – Their order and directionality are meaningful.
  – They can form a repeating pattern.
Constraints on resources
  – Specific resources must be on the path.
  – They can be anywhere on the path or at specific positions.

SPARQLeR
Extension of SPARQL for semantic association discovery.
Seamlessly integrated into the SPARQL syntax.
Graph patterns incorporating simple paths with constraints.
Constraints are based on regular expressions over properties.

What is a path in SPARQLeR?
A path is a meta-property that connects two resources.
  – Defined as a sequence of interleaving properties and resources.
  – Starts and ends with properties (endpoint resources are not included).
  – A path of length 1 is a sequence with just one property.
Path: the class of RDFMS paths.

Path patterns in SPARQLeR
Meta-property: a concept similar to a property
  – Resource –[property]→ Resource
  – Resource –[path]→ Resource
Path as a Sequence
  – Test if a resource is in the path: rdfs:member
  – Test if a resource is at a specific position in the path: rdf:_2, rdf:_4, ...
SPARQLeR-specific path properties
  – Test all resources or all properties in the path: rdfms:entityResource and rdfms:propertyResource
Example: all resources on a path must be of type foo:Person

Path pattern anatomy
[Figure: an example path of length 4 with 7 elements over properties p1, p2 and p3, annotated with the path patterns that match against the path variable: rdfs:member, rdf:_3, rdf:_6, rdfms:entityResource and rdfms:propertyResource.]

Path types in SPARQLeR
The directionality of relationships in a path defines its specific semantics.
SPARQLeR allows definition of the following path types:
  – As defined in graph theory
      Directed
      Undirected
  – SPARQLeR-specific extension
      Defined directionality path (includes directed path)

Directionality of properties in a path
Defined directionality paths:
  – Neither directed nor undirected
  – Each property in a path has a specified directionality
Example: simple graph with a p relationship
  (a) X p* Y, directed path
  (b) X p* Y, undirected path
  (c) X ( p p^-1 )* Y, directional path
[Figure: three graphs (a), (b) and (c) connecting X and Y through edges labeled p.]

Inverse property operator
In standard SPARQL there is no need for an inverse property operator
  – The pattern syntax is based on individual statements, so it is easy to reverse direction.
Defining path constraints requires the inverse operator
  – A path expression defines constraints on properties, not on individual statements.
  – Without the inverse property operator, some path constraints would be impossible to express (as shown in the previous example).

RegExp in path constraints
Path constraints on properties are based on regular expressions
  – Uses syntax similar to lex
  – Easy for grep users
Examples:
    a c* d
    a+ (b|c)
    a [abc] c? d
    ( b a^-1 )+ c

Path constraints in SPARQLeR
Defined as regular path expressions
  – Can specify patterns of properties in the path
  – The directionality requirement needs the inverse operator ('-', minus), as in -p
Supported regular expressions
    p                      (single property)
    -p                     (the inverse of p)
    [p_1 p_2 ... p_n]      (class of properties)
    -[p_1 p_2 ... p_n]     (class of inverse properties)
    [^p_1 p_2 ... p_n]     (complement of properties)
    -[^p_1 p_2 ... p_n]    (inverse of complement of properties)
    .                      (wildcard)
    x | y                  (alternative)
    xy                     (concatenation)
    x*                     (Kleene star)
    x+                     (one or more repetitions)
    (x)                    (match a path matched by x)

Path constraints (cont'd)
Class of properties and inverse operator
  – The complement operator can be applied only to defined properties, not their inverses
  – Inverse operator
      Not allowed inside a class of properties
      The set of inverses is created from the defined properties
  – Example, with properties q r s t:
      [^rt] → q s
      -[^qr] → t^-1 s^-1 (inverses)
      ([^st] | -[^t]) → q r q^-1 r^-1 s^-1
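The class and complement rules in the example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper `expand_class` and its parameters are hypothetical names, and an inverse property such as q^-1 is written here as the string "-q".

```python
def expand_class(defined, listed, complement=False, inverse=False):
    """Expand a property class against the set of defined properties.

    complement=True models [^...]; inverse=True models the leading '-'.
    The complement is always taken over the defined properties, and the
    inverse is applied afterwards to every selected property.
    """
    listed = set(listed)
    chosen = {p for p in defined if (p not in listed) == complement}
    return {"-" + p for p in chosen} if inverse else chosen


defined = ["q", "r", "s", "t"]

# [^rt]
print(sorted(expand_class(defined, ["r", "t"], complement=True)))
# -[^qr]
print(sorted(expand_class(defined, ["q", "r"], complement=True, inverse=True)))
# ([^st] | -[^t]) is the union of the two expansions
both = (expand_class(defined, ["s", "t"], complement=True)
        | expand_class(defined, ["t"], complement=True, inverse=True))
print(sorted(both))
```

Returning sets reflects that a class denotes a choice among properties, so the order in which the slide lists them carries no meaning.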

Integrating paths into SPARQL
A path variable binds a path
  – Its name begins with '%' instead of '?'
Simple patterns: path between two resources
    SELECT ?prop WHERE { ?prop }
    SELECT %path WHERE { %path }
Single source path
    SELECT %path, ?res WHERE { %path ?res }

Integrating paths into SPARQL
Resources on the path
    SELECT %path WHERE { %path . %path rdfs:member }
    SELECT %path WHERE { %path . %path rdf:_1 }
Listing path elements: list operator
    SELECT list(%path) WHERE { %path }

Expressing path constraints
Bounded path length
  – Only constants are allowed
    FILTER(length(%path) < 5)
    FILTER(length(%path) > 3 && length(%path) < 7)

Expressing path constraints
Constraints are added as a regular expression filter (existing syntax in SPARQL)
    regex( pathvariable, pathexpr, pathflags )
    FILTER(regex(%path, ".*foo:prop.*", "uis"))
  – Flags:
      i (instances)
      s (schema)
      l (literals)
      h (match using hierarchy)
      d (set directionality)
      u (undirected)
  – Default flags: d i

Some examples
    SELECT list(%path), ?res WHERE { %path ?res . %path rdfs:member ?x . ?x foo:locatedIn wiki:Europe FILTER(regex(%path, "foo:prop+")) }
    SELECT list(%path) WHERE { %path . %path rdfms:entityResource ?x . ?x rdf:type foo:Person FILTER(regex(%path, "(foo:prop|foo:rel)+", "u")) }
    SELECT list(%path) WHERE { %path FILTER(length(%path) = 4 && regex(%path, "(foo:prop -foo:rel)+")) }

SPARQLeR Prototype Implementation
The prototype implementation is based on BRAHMS, an RDF/S main-memory storage.
Path search is based on a bi-directional BFS for simple paths.
Checking of path constraints in regex is implemented as a simulation of DFAs.
Janik, M. and Kochut, K., BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery. ISWC 2005

Implementation details
Each path expression ( FILTER regex ) is translated into a DFA.
  – For a path between two resources, partial constraints are checked while building the search trie from both endpoints, using forward and reverse DFAs.
  – When a path is connected, the forward DFA is used to check the full (path) constraint.
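The idea of checking a constraint while the search is still growing a path can be illustrated with a simplified Python sketch. It is not the BRAHMS implementation: it searches from a single source only (no bi-directional search, no reverse DFA), the DFA for `(foo:prop -foo:rel)+` is written by hand rather than compiled from the expression, and the function name `find_paths` and the sample triples are assumptions.

```python
from collections import deque

# Hand-built DFA for the path expression (foo:prop -foo:rel)+ .
# States: 0 = start, 1 = after foo:prop, 2 = after -foo:rel (accepting).
DFA = {
    (0, "foo:prop"): 1,
    (1, "-foo:rel"): 2,
    (2, "foo:prop"): 1,
}
START, ACCEPTING = 0, {2}


def find_paths(triples, source, max_len):
    """Breadth-first search for simple paths starting at source whose
    property sequence is accepted by DFA. A missing DFA transition prunes
    the branch, so the constraint is checked while the path is built."""
    # Index the steps available at each resource; following a triple
    # from object to subject uses the inverse property, written "-p".
    steps = {}
    for s, p, o in triples:
        steps.setdefault(s, []).append((p, o))
        steps.setdefault(o, []).append(("-" + p, s))
    results = []
    queue = deque([(source, START, (source,))])
    while queue:
        node, state, path = queue.popleft()
        if state in ACCEPTING:
            results.append(path)
        if len(path) // 2 >= max_len:  # number of properties so far
            continue
        for label, nxt in steps.get(node, []):
            nxt_state = DFA.get((state, label))
            # Skip steps the DFA rejects and resources already on the path.
            if nxt_state is None or nxt in path[0::2]:
                continue
            queue.append((nxt, nxt_state, path + (label, nxt)))
    return results


# Hypothetical data: a chain of alternating foo:prop / inverse foo:rel steps.
triples = [
    ("a", "foo:prop", "b"), ("c", "foo:rel", "b"),
    ("c", "foo:prop", "d"), ("e", "foo:rel", "d"),
    ("a", "foo:other", "z"),
]
for found in find_paths(triples, "a", max_len=4):
    print(" ".join(found))
```

The length bound plays the role of `FILTER(length(%path) <= 4)`, and carrying the DFA state in each queue entry is what lets the search abandon a branch as soon as its property sequence can no longer match.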

Experiments: biology pathway
Biosynthesis paths in biology (glycomics)
How is a specific glyco peptide created from a basic structure?
  – Find the pathway between dolichol phosphate and glyco peptide G00009
The path has 15 reactions (30 hops, as each reaction is represented by its substrates and products)
Only an undirected path connects the endpoint resources, but a specific directionality pattern is present
RDF representation: sample reactions in the path

Experiments: biology pathway
Functionality test, proof of concept
N-glycan biosynthesis pathway
    SELECT list(%path)
    WHERE { glyco:dolichol_phosphate %path glyco:glyco_peptide_G
            %path rdfs:member enzyo:R05969
            FILTER ( length(%path) <= 30 && regex(%path, "((-glyco:has_acceptor_substrate| -glyco:has_reactant) glyco:has_product)*" ) ) }
Ontology: GlycO
Length: 30 hops
Consists of: 15 reactions
Search time: milliseconds (less than 1 tick)
... courtesy of Dr. Alison Vandersall-Nairn, University of Georgia

Experiments
Scalability
  – Modified DBLP datasets in RDF (added random citations)
  – Tests on an increasing dataset (adding older years of publications)
  – Search for cited publications (transitive)
    PREFIX opus:
    SELECT ?end_publication
    WHERE { %path ?end_publication
            FILTER ( length(%path) <= 26 && regex(%path, "(opus:cites_publication)*" ) ) }
B. Aleman-Meza et al. Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. (WWW2006)

Experiments – dataset characteristics

Experiments – results: single source paths
Search paths up to length 26

Experiments – results: two endpoint paths

More complex uses of path expressions
Discover connecting paths with a shared node
  – Path between A and B, length up to 4
  – Path between C and D, length up to 4
  – Both paths have a shared resource
    A %path_1 B
    length(%path_1) <= 4
    C %path_2 D
    length(%path_2) <= 4
    %path_1 rdfs:member ?x
    %path_2 rdfs:member ?x
[Figure: a path from A to B and a path from C to D crossing at the shared resource ?x.]
Potential subgraph discovery
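The shared-node condition amounts to intersecting the member resources of the two bound paths. The short Python sketch below illustrates this under assumptions of its own: paths are encoded with their endpoints as `[r0, p1, r1, ..., pn, rn]`, only the interior resources count as members (since a SPARQLeR path excludes its endpoint resources), and the names `shared_resources`, `path_1` and `path_2` are hypothetical.

```python
def shared_resources(path_1, path_2):
    """Return the resources that lie on both paths, endpoints excluded."""
    def interior(path):
        # Resources sit at even positions; drop the first and the last one.
        return set(path[2:-1:2])
    return interior(path_1) & interior(path_2)


# Hypothetical bindings for %path_1 (A to B) and %path_2 (C to D).
path_1 = ["A", "p", "x", "q", "B"]
path_2 = ["C", "r", "y", "s", "x", "t", "D"]
print(shared_resources(path_1, path_2))
```

Each value in the returned set corresponds to one possible binding of ?x in the pattern above.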

SPARQLeR summary
Path expressions
  – Use of regular expressions over properties
Flexible path specification
  – Undirected
  – Defined directionality paths (including directed)
  – Length restricted
Complex path patterns
  – Test of resources and properties on the path
  – Intersecting paths

Conclusion and future work
The SPARQLeR extension fits seamlessly into the current SPARQL syntax.
Performance of path queries is acceptable (if the defined expression is highly selective).
Optimization of path queries, complex expressions and multiple paths in a query.
Inclusion of context.

SPARQLeR
Krys Kochut, Maciej Janik
Thank you

Predicate vs. statement expressions
Predicate alphabet … simplicity …
    p    -p    _ (wildcard)
Statement alphabet
    s p o    _ p o    s _ o    s p _
    _ _ o    _ p _    s _ _    _ _ _
Additional rules: which statement pattern can be connected with which one …