1
A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data
Matthew Perry
Ph.D. Dissertation Defense
Kno.e.sis Center, Wright State University
Committee: Dr. Amit Sheth (advisor), Dr. T.K. Prasad, Dr. Soon M. Chung, Dr. Christopher Barton (EES WSU), Dr. Kate Beard (SISE U. of Maine)
Introduce the committee and thank everyone. Give the dissertation title. Quick overview: there is interesting work in the SW area on new querying paradigms based on the graph structure of SW data models, with a focus on relationships. We look at how to incorporate spatial and temporal data into this querying paradigm. The dissertation shows how to exploit this graph-centric data model for a more flexible modeling approach for STT data and for corresponding new query operators. First, clarify some of our terminology: what we mean by (1) spatial, temporal and thematic dimensions and (2) semantic analytics.
2
Three Dimensions of Information
Thematic Dimension: What
Spatial Dimension: Where
Temporal Dimension: When
Example: Fred Smith moved into the house at 244 Elm Street on November 16, 2007
First, we clarify what we mean by the spatial, temporal and thematic dimensions of information. Look at the statement above: it describes an event, or a set of relationships.
Thematic Dimension – What (thematic entity Fred Smith, thematic relation moving, thematic entity house). Who/what is thematically related (the company he works for, who owns the house, etc.)
Spatial Dimension – Where (location of the house/event: point, line, polygon). What is spatially related (houses nearby, nearest highway, etc.)
Temporal Dimension – When did this happen (time point or interval). What is temporally related: what happened at the same time, just before, etc.
3
Background
4
Ontology
What is an ontology? An agreed-upon formalization of concepts and relationships in the real world
Types of ontologies? General-purpose vs. domain ontologies
Parts of an ontology:
Classes – types or logical groups of objects
Relationships – how objects relate to each other
Attributes – features and characteristics of objects
Instances – members of Classes that have Attributes and participate in Relationships
Schema, e.g. Student attends University
Data, e.g. ‘Matt’ attends ‘Wright State University’
First, clarify what we mean by ontology: an agreed-upon formalization of concepts and relationships in the real world. The idea is that through this agreement applications can have a shared interpretation of data (machine understandable), which is the basis of the Semantic Web. There are many types of ontologies: general purpose (TAP, SUMO, etc.) and domain specific, e.g. biology (GO, NCI Cancer ontology) and geography (Ordnance Survey Hydrology). We can categorize the parts of an ontology: Classes (Person, Company), Relationships (works-for connecting Person to Company), and Attributes (Name of Person; literal values such as strings and numbers). These three things can be thought of as a Schema in DB terms. There are also instances: facts, i.e. data.
5
Representing ontologies and instance data
World Wide Web Consortium (W3C) standards
Resource Description Framework (RDF)
Language for representing information about resources
Resources are identified by Uniform Resource Identifiers (URIs), which are globally unique
Common framework for expressing information allows exchange and reuse without loss of meaning
Graph-based data model
Relationships are first class objects
Along with this vision of the Semantic Web and ontologies, we need a standard way to represent ontologies so that they are machine processable. W3C has defined RDF as a standard language for representing semantic data. Major concepts: URIs as unique identifiers for resources; a graph-based model; relationships as first class objects. Contrast again with the relational model and its implicit relationships.
6
Subject Predicate Object
Directed Labeled Graph; each statement (triple) has the form Subject Predicate Object
Defining Classes: <knoesis:Person> <rdf:type> <rdfs:Class> .
Defining Class/Property Hierarchies: <knoesis:Politician> <rdfs:subClassOf> <knoesis:Person> .
Defining Properties: <knoesis:gives> <rdf:type> <rdf:Property> .
Defining Properties (domain and range): <knoesis:gives> <rdfs:domain> <knoesis:Politician> . <knoesis:gives> <rdfs:range> <knoesis:Speech> .
Statement (triple): <knoesis:Politician_123> <knoesis:gives> <knoesis:Speech_456> .
Statement (triple): <knoesis:Politician_123> <knoesis:name> “Hillary Clinton” .
Graph labels: rdfs:Class, rdf:Property, rdfs:Literal, knoesis:Person, knoesis:Politician, knoesis:Speech, knoesis:gives, knoesis:name, rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range, knoesis:Politician_123, knoesis:Speech_456, “Hillary Clinton”
Illustration of the major concepts of RDF. The slide shows sample RDF data: schema on top, instance data on the bottom.
7
RDF(S) Inferencing
Rule: (x, rdf:type, y) and (y, rdfs:subClassOf, z) ⟹ (x, rdf:type, z)
Asserted: (knoesis:Politician_123, rdf:type, knoesis:Politician) (knoesis:Politician, rdfs:subClassOf, knoesis:Person)
Infer: (knoesis:Politician_123, rdf:type, knoesis:Person)
Graph labels: rdfs:Class, rdf:Property, rdfs:Literal, knoesis:Person, knoesis:Politician, knoesis:Speech, knoesis:gives, knoesis:name, rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range, knoesis:Politician_123, knoesis:Speech_456, “Hillary Clinton”
Illustration of the major concepts of RDF. There are also inferencing rules: a standard interpretation of the rdf/rdfs relationships. If you see statements matching a given pattern, then you can infer the existence of other statements, so add them. Example: rdf:type / rdfs:subClassOf.
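The rdf:type / rdfs:subClassOf rule above can be sketched as a small fixed-point loop. This is only an illustration over an in-memory set of triples; the function name `infer_types` is hypothetical and is not part of the system described in the presentation.

```python
def infer_types(triples):
    """Apply (x type y), (y subClassOf z) => (x type z) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current rdf:type and rdfs:subClassOf statements.
        types = [(s, o) for (s, p, o) in inferred if p == "rdf:type"]
        subclass = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        for x, y in types:
            for y2, z in subclass:
                new_triple = (x, "rdf:type", z)
                if y == y2 and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


asserted = {
    ("knoesis:Politician_123", "rdf:type", "knoesis:Politician"),
    ("knoesis:Politician", "rdfs:subClassOf", "knoesis:Person"),
}
closure = infer_types(asserted)
```

The loop repeats so that chains of subclasses (A under B under C) are followed to the end.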
8
Outline
The remainder of the presentation is organized as follows:
Background
Motivation
Related Work
STT Modeling Approach
STT Query Operators
Implementation Scheme
Experimental Evaluation
Query Language Support
9
SemDis Project
Semantic Analytics: searching, browsing and analyzing semantically meaningful connections among named entities, where an ontology provides the context or domain semantics
How is entity1 (Reviewer) related to entity2 (Submission)?
What do we need?
Data model that represents relationships explicitly as first class objects
Ability to model semantics of the relationships
Tools for efficient storage and querying of these relationships
Figure: an Aggregated RDF Instance Base built from XML, TEXT, RDBMS and HTML sources together with Ontology Schemas, in which E1:Reviewer, E2:Paper, E3:Paper, E4:Person, E5:Person, E6:Person and E7:Submission are connected by author_of and friend_of edges.
Now, to give some context to this work, mention the project it is a part of: Semantic Discovery, i.e. discovering complex relationships on the SW. This is a new paradigm for searching: moving from simple, traditional retrieval of items or entities to finding out how entities are related. The important aspect here is the focus on relationships, which we see as the major benefit of SW data models: named relationships are first class objects, in contrast with the relational model, where relationships are implicit in key constraints. This slide gives the big picture of the SemDis architecture: a large RDF (SW data) instance base, extracted/scraped from multiple heterogeneous sources and corresponding to schema information. The idea is that the user uses our tools to analyze connections/relationships between entities in this instance base. For example, conflict of interest between a reviewer and an author, using aggregated bibliography data and social network data. We refer to this process as Semantic Analytics (give definition).
10
An Example: Battlefield Intelligence
Graph pattern: ?Person has_symptom ?Symptom; Chemical_X induces ?Symptom; ?Person participated_in ?Military_Event; ?Military_Event located_at ?Location_1; ?Enemy member_of Enemy_Group_Y; ?Enemy spotted_at ?Location_2
How are these events related in time? How close are these locations in space?
Consider this motivating example. Get the picture: battlefield intelligence, monitoring soldiers/activities for possible indications of chemical/biological weapons exposure. We will illustrate how the ST aspects help in understanding the relationships. We may pose the following query (read it, and explain the importance of the spatial aspects). We will show such a query expressed with our system; please ignore the details of syntax for now. The interesting part about this query is as follows: 1 – thematic relation, 2 – thematic relation, 3 – implicit spatial relation.
SELECT ?p FROM TABLE(spatial_eval('(?p has_symptom ?s)(Chemical_X induces ?s)(?p participated_in ?m)(?m located_at ?l1)', '?l1', '(?e member_of Enemy_Group_Y)(?e spotted_at ?l2)', '?l2', 'geo_distance(distance=2 unit=mile)'));
11
Application Areas: Semantics + Space and Time
Semantic Sensor Web: Web-accessible sensor networks and archived sensor data that can be discovered and accessed using standard application protocols and application program interfaces. [1]
Event Web: Event Web organizes data in terms of events and experiences and allows access from users' perspectives. For each event, Event Web collects and organizes audio, visual, textual, and other data to provide people an environment for experiencing the event from their perspective. … Unlike events, hypertext has no notion of time, space or semantic structures other than often ad-hoc hyperlinks. [2]
We would also like to mention some potential application areas for this work. In both of these applications we need to manage large amounts of SW data with heavy spatial and temporal components.
1. Botts, M., Percivall, G., Reed, C., and Davidson, J. (2007). OGC Sensor Web Enablement: Overview and High Level Architecture (OGC ). Technical Report, Open Geospatial Consortium.
2. Jain, R. (2008). EventWeb: Developing a Human-centered Computing System. IEEE Computer, 41(2):42–50.
12
Objective
Goal: Enable Semantic Analytics over thematic, spatial and temporal dimensions
Shortcomings of State of the Art
Current GIS technology does not support complex thematic analytics operations
Current Semantic Analytics technology does not support spatial and temporal relationship analysis
Shortcomings of the current state of the art make this challenging: 1 – traditional models do not represent thematic entities/rels as first class objects, so it is hard to analyze rels; 2 – as discussed earlier, work on analyzing relationships has focused on thematic rels, so ST is not supported (note the implicit nature of these rels); 3 – additional benefit: there is no support in general, even for simpler queries (benefits to data integration, etc.)
13
Contributions
An ontology-based spatiotemporal modeling approach using temporal RDF
A formalization of a set of spatial, temporal and thematic query operators for the proposed modeling approach
A SQL-based implementation of the proposed query operators (storage, indexing, inferencing, query processing)
An extension of the SPARQL RDF query language: SPARQL-ST
A detailed performance study using large synthetic and real-world RDF datasets
14
Broad Differences from Related Work
ST Modeling
Represent thematic entities as first class objects rather than directly attached attributes of spatial objects
Provide many-to-many mapping between thematic and spatial objects
ST Querying
Utilize thematic relationships to connect entities to spatial regions in a variety of ways (contexts)
Analyze ST properties of a given entity w.r.t. different contexts
Dynamic binding of objects to ST properties
ST Data on Semantic Web
Focus on relationship-centric nature of RDF data for analytical queries
Implicit relationships (e.g., distance)
Look at query language aspects and performance issues
Only system supporting both spatial and temporal
Before we get into details, I will briefly give the broad differences from related work. This is detailed in the written dissertation. ST Modeling: note that the many-to-many mapping is the source of increased flexibility. ST data on the SW: note that the query language work is done in the context of SPARQL.
15
Related Work (Ontologies and GIS)
A vehicle to facilitate interoperability: Fonseca et al. (2002), Agarwal (2005)
Geospatial Ontologies: fundamental ontology of space vs. geography domain ontology
My Work
Upper-level ontology serves as fundamental ontology of space
Allows for integration of domain ontologies
Goes beyond modeling and allows analysis
16
Related Work (Spatiotemporal Models)
Three Domain Model – Yuan (1994, 1996)
Models semantics, space, time separately
Relies on direct connections from thematic entities to spatial entities
Object-oriented and Event-based Models – Worboys and Hornsby (2004)
Models the concept of a setting and a situate function
General Modeling Approaches
Spatiotemporal ER model – Tryfona and Jensen (1999, 2000)
Spatiotemporal UML – Price et al. (2002)
17
Related Work (Storing and Querying RDF)
RDBMS-based
Schema-aware – Sesame w/ PostgreSQL, Vertical Partitioning (Abadi et al. 2007)
Schema-oblivious – Jena, Oracle RDF
Hybrid – RDF Suite
Native – use lower-level structures
Redland – Hash Tables
Yars – B+ Trees
My approach
Use schema-oblivious storage with additional structures for ST data
Must handle temporal aspects with RDFS inferencing
18
Related Work (ST Data and SW)
Only system to support Temporal RDF and spatial objects
Temporal RDF
Formally defined – Gutierrez et al. (2005, 2007)
tGRIN – Pugliese et al. (2008)
Simpler queries and inferencing
Qualitative spatial and temporal reasoning
OWL-Time – Hobbs and Pan (2004)
Specialized Spatial Reasoners – Smart et al. (2007)
19
STT Modeling Approach
Give a basic overview of the approach:
- Upper-level ontology to model spatial objects and their connections to non-spatial objects
- Temporal RDF graphs, which give a temporal dimension to the relationships
20
Upper-level Ontology modeling Theme and Space
Classes: Continuant, Occurrent, Named_Place, Dynamic_Entity, Spatial_Occurrent, Non-Spatial_Occurrent, Spatial_Region (GML-based ontology)
Edges: rdfs:subClassOf and the properties occurred_at and located_at
Continuant: Concrete and abstract entities that persist over time
Occurrent: Events, which happen and then don’t exist
Named_Place: Those entities with static spatial behavior (e.g., building)
Dynamic_Entity: Those entities with dynamic spatial behavior (e.g., person)
Spatial_Occurrent: Events with concrete spatial locations (e.g., a speech)
Non-Spatial_Occurrent: Events without a concrete location (e.g., a law becomes active)
occurred_at: Links Spatial_Occurrents to their geographic locations
located_at: Links Named_Places to their geographic locations
Spatial_Region: Records exact spatial location (geometry objects, coordinate system info)
The first component of our modeling approach is an upper-level ontology outlining the basic classes and relationships for the thematic and spatial domains. The idea is to give a minimal model that we can use to build our query operators. This basic model was formed by surveying related work on ST ontologies. It is enough for the query operators but general enough to incorporate nearly any domain model.
21
Example Domain Ontology
To clarify these concepts consider the example Domain ontology that has been integrated with the upper-level ontology described previously. We can see the subclasses of Dynamic Entity / Named Place, Spatial Occurrent Also note the start and end times on edges This military scenario will serve as a running example for the presentation
22
Temporal RDF: Incorporating Temporal Information
Associate a temporal label with a statement that represents the valid time of the statement: (Student1, rdf:type, Graduate) : [2004, 2008]
Example temporal RDF graph: Graduate rdfs:subClassOf Student; Undergraduate rdfs:subClassOf Student; (Student1, rdf:type, Undergraduate) : [2002, 2004]; (Student1, rdf:type, Graduate) : [2004, 2008]; (Student1, rdf:type, Student) : [?, ?]
Temporal Inferencing (Interval Union): (Student1, rdf:type, Student) : [2002, 2008]
Student1 was an undergraduate student and then enrolled in a graduate program. Note the need for temporal aspects in RDF inferencing; we need to account for this in our system.
1. Claudio Gutiérrez, Carlos A. Hurtado, Alejandro A. Vaisman. “Temporal RDF”. ESWC 2005:
23
Temporal RDF Serialization
Temporal triples can be modeled/serialized with standard RDF syntax; this utilizes RDF reification and the existing OWL-Time ontology.
Figure labels: Statement, subject, predicate, object (RDF Reification); temporal (NEW); CalendarClockInterval, begins, ends, Instant, inCalendarClockDataType, XMLSchema Datetime (OWL-Time); example nodes A, B, C.
24
STT Query Operators
25
Querying in the STT dimensions
Define a notion of context based on a graph pattern
Query about entities w.r.t. a given context
Associate spatial region with an entity w.r.t. a context
Associate temporal interval with an entity w.r.t. a context
How are entities related in space and time w.r.t. a given context
The first bullet is the goal. We will clarify the notion of context on the next slide.
26
Contexts Linking Non-Spatial Entities to Spatial Entities
Figure: Dynamic Entities (E1:Soldier, E2:Soldier, E3:Soldier) are linked to the Georeferenced Coordinate Space (Spatial Regions) through two contexts. Residency: a soldier lives_at an Address (E4:Address, E6:Address; Named Places), which is located_at a region. Battle Participation: a soldier is assigned_to E8:Military_Unit, which participates_in a Battle (E5:Battle, E7:Battle; Spatial Occurrents) that occurred_at a region.
This slide illustrates a key contribution of this approach. Consider using thematic contexts to connect non-spatial entities (e.g., Dynamic Entities) to Spatial Entities. Describe the picture. Look at Soldier E1: one context could be residency, another could be battle participation. Traditionally we have a direct 1-to-1 mapping between entities and spatial properties, but here we use indirect connections to achieve a many-to-many mapping, so we can analyze the ST properties of a given entity w.r.t. multiple contexts. Examples: who/what is near in this context; what is the relationship between the spatial properties of the same entity in different contexts, e.g. employment vs. residency.
27
Context Definition
Graph Pattern: recursive definition
Basis: a tuple from (UL ∪ VN) × (U ∪ VN) × (UL ∪ VN) is a graph pattern (triple pattern)
Recursion: if P1 and P2 are graph patterns, then (P1 AND P2) is a graph pattern
Semantics [1] of a graph pattern are defined in terms of a function [[.]], which takes a graph pattern and returns a set of mappings, where a mapping μ : VN → RT is a function from VN to RT
Formal definition of context. A graph pattern is essentially a collection of triples where the subjects, predicates and/or objects have been replaced with variables. The result of a graph pattern query is a mapping between variables and URIs such that if you substitute the mapped URIs for the variables you get triples actually present in the RDF graph.
1. Perez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ISWC 2006
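To make the mapping semantics concrete, here is a minimal sketch of evaluating a conjunction of triple patterns over a list of triples. It is a naive nested-loop matcher written for illustration, not the dissertation's implementation; `match_triple` and `evaluate` are hypothetical names, and variables are assumed to be strings starting with `?`.

```python
def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must map to the same value everywhere it occurs.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(graph_pattern, triples):
    """Return every mapping that satisfies all triple patterns (P1 AND P2 AND ...)."""
    mappings = [{}]
    for pattern in graph_pattern:
        mappings = [
            extended
            for mapping in mappings
            for triple in triples
            if (extended := match_triple(pattern, triple, mapping)) is not None
        ]
    return mappings


triples = [
    ("E1", "assigned_to", "E8"),
    ("E8", "participates_in", "E7"),
    ("E7", "occurred_at", "S1"),
    ("E2", "lives_at", "E4"),
]
pattern = [
    ("?x", "assigned_to", "?y"),
    ("?y", "participates_in", "?z"),
    ("?z", "occurred_at", "?s"),
]
result = evaluate(pattern, triples)
```

Substituting the returned values for the variables yields triples that are present in the input, which is the property described in the notes above.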
28
Context Definition
A spatial context is a 2-tuple (GP, v) where:
1) GP is a graph pattern
2) v ∈ var(GP) is a variable in GP identifying a Spatial_Region instance
Example: (‘(?x assigned_to ?y) (?y participates_in ?z) (?z occurred_at ?s)’, ‘?s’)
29
Spatial Operators
spatial_extent((GP, v))_G → {(μ, s)}
Given: a spatial context (GP, v), a temporal RDF graph G
Find: {(μ, s) | μ ∈ [[GP]]_TRIPLES(G) and s = geom(μ(v))}
Example: What are the spatial properties of the 101st Airborne Division w.r.t. battle participation?
ANS ← spatial_extent(‘(<101st Airborne Div> participates_in ?x) (?x occurred_at ?s)’, ‘?s’)_G
30
Spatial Operators
spatial_restrict((GP, v), sf($s))_G → {(μ, s)}
Given: a spatial context (GP, v), a spatial formula sf defined over S and a variable $s, a temporal RDF graph G
Find: {(μ, s) | μ ∈ [[GP]]_TRIPLES(G) and s = geom(μ(v)) and sf evaluates to true for $s = s}
Example: Which military units have spatial extents that are within 20 miles of (48.45 N, E)?
ANS ← spatial_restrict(‘(?x participates_in ?y) (?y occurred_at ?s)’, ‘?s’, distance($s, point(48.45 N, E)) < 20 miles)_G
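The filtering step of spatial_restrict can be sketched as follows. This is a simplified illustration: geometries are reduced to single (latitude, longitude) points and distance is the great-circle (haversine) distance, whereas the presentation works with general spatial regions. The input pairs stand in for the output of spatial_extent; `Unit_A`, `Unit_B`, the coordinates and all function names are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def distance_miles(p, q):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def spatial_restrict(extents, sf):
    """Keep the (mapping, geometry) pairs whose geometry satisfies the formula sf."""
    return [(mu, s) for mu, s in extents if sf(s)]


# Hypothetical (mapping, geometry) pairs, as spatial_extent would return them.
extents = [
    ({"?x": "Unit_A"}, (0.0, 0.1)),
    ({"?x": "Unit_B"}, (0.0, 1.0)),
]
target = (0.0, 0.0)
nearby = spatial_restrict(extents, lambda s: distance_miles(s, target) < 20)
```

Passing the spatial formula as a function mirrors the definition, where sf is evaluated with $s bound to each candidate geometry.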
31
Spatial Operators
spatial_eval((GP1, v1), (GP2, v2), sf($s1, $s2))_G → {(μ1, s1, μ2, s2)}
Given: a spatial context (GP1, v1), a spatial context (GP2, v2), a spatial formula sf defined over S and variables $s1, $s2, a temporal RDF graph G
Find: {(μ1, s1, μ2, s2) | μ1 ∈ [[GP1]]_TRIPLES(G) and μ2 ∈ [[GP2]]_TRIPLES(G) and s1 = geom(μ1(v1)) and s2 = geom(μ2(v2)) and sf evaluates to true for $s1 = s1 and $s2 = s2}
Example: Which military unit’s operational area overlaps the operational area of the 3rd Armored Division?
ANS ← spatial_eval(‘(?x1 participates_in ?y1) (?y1 occurred_at ?s1)’, ‘?s1’, ‘(<3rd Armored Div> participates_in ?y2) (?y2 occurred_at ?s2)’, ‘?s2’, overlap-bdy-intersect($s1, $s2))_G
32
Temporal Operators
Initial Definitions:
For each statement e = (s, p, o) ∈ TRIPLES(G), let temporal(e) = {t | (s, p, o) : [t] ∈ G}
For a set of time points T′ ⊆ T, let contig_intervals(T′) = {[t_i, t_j] | for all t ∈ T : (if t_i ≤ t and t ≤ t_j then t ∈ T′) and t_{i-1} ∉ T′ and t_{j+1} ∉ T′}
Example
Suppose: T = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, T′ = {2, 3, 4, 7, 8}
Then: contig_intervals(T′) = {[2, 4], [7, 8]}
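A minimal sketch of contig_intervals, assuming time points are integers so that "contiguous" means consecutive values. The function name follows the definition above; the rest is illustrative.

```python
def contig_intervals(time_points):
    """Group integer time points into maximal runs of consecutive values."""
    intervals = []
    for t in sorted(time_points):
        if intervals and t == intervals[-1][1] + 1:
            intervals[-1][1] = t      # t extends the current run
        else:
            intervals.append([t, t])  # t starts a new run
    return [tuple(interval) for interval in intervals]


example = contig_intervals({2, 3, 4, 7, 8})
```

Sorting first means each point only has to be compared with the end of the most recent run.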
33
Temporal Operators
Given a set of temporal triples E = {e1, e2, …, en}, we define the interval extension of E, int_extension(E), as the set:
contig_intervals(temporal(e1)) × contig_intervals(temporal(e2)) × … × contig_intervals(temporal(en))
Example
Suppose: E = {e1, e2, e3}, contig_intervals(temporal(e1)) = {[2, 4], [7, 8]}, contig_intervals(temporal(e2)) = {[1, 5], [7, 9]}, contig_intervals(temporal(e3)) = {[4, 5]}
Then: int_extension(E) = {{[2, 4], [1, 5], [4, 5]}, {[2, 4], [7, 9], [4, 5]}, {[7, 8], [1, 5], [4, 5]}, {[7, 8], [7, 9], [4, 5]}}
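Since the interval extension is a Cartesian product, it can be sketched directly with `itertools.product`. The input here is assumed to be one list of contiguous intervals per triple, already computed; the function name mirrors the definition and is otherwise illustrative.

```python
from itertools import product


def int_extension(intervals_per_triple):
    """Pick one interval per triple, in every possible combination."""
    return [list(choice) for choice in product(*intervals_per_triple)]


# One list of contiguous intervals for each of e1, e2, e3 from the example.
per_triple = [
    [(2, 4), (7, 8)],
    [(1, 5), (7, 9)],
    [(4, 5)],
]
extension = int_extension(per_triple)
```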
34
Temporal Operators
Given a set of time intervals I = {[s_1, t_1], [s_2, t_2], …, [s_n, t_n]}:
s_min = min_{1≤i≤n} s_i, s_max = max_{1≤i≤n} s_i
t_min = min_{1≤i≤n} t_i, t_max = max_{1≤i≤n} t_i
Intersect(I) = [s_max, t_min], or NULL if t_min < s_max
Range(I) = [s_min, t_max]
Example: two assigned_to edges link Soldier#123 and Soldier#789 to Platoon#456, labeled [3, 12] and [6, 20]. Intersect = [6, 12], Range = [3, 20]
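The two interval constructors translate almost line for line into code. In this sketch intervals are (start, end) tuples and Python's `None` plays the role of NULL; the name `interval_range` is used only to avoid shadowing the built-in `range`.

```python
def intersect(intervals):
    """Return [s_max, t_min], or None when the intervals share no time point."""
    s_max = max(start for start, _ in intervals)
    t_min = min(end for _, end in intervals)
    return (s_max, t_min) if s_max <= t_min else None


def interval_range(intervals):
    """Return [s_min, t_max], the smallest interval covering all inputs."""
    s_min = min(start for start, _ in intervals)
    t_max = max(end for _, end in intervals)
    return (s_min, t_max)


edges = [(3, 12), (6, 20)]
joint = intersect(edges)
span = interval_range(edges)
```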
35
Temporal Operators
temporal_extent(GP, IT)_G → {(μ, i)}
Given: a graph pattern GP, an interval type IT ∈ {intersect, range}, a temporal RDF graph G
Find: {(μ, i) | μ ∈ [[GP]]_TRIPLES(G) and i ∈ intersect/range(int_extension(μ(GP)))}
Example: Find all pairs of soldiers who were members of the 101st Airborne Division at the same time and return the times of joint membership.
ANS ← temporal_extent(‘(?x assigned_to <101st Airborne Div>) (?y assigned_to <101st Airborne Div>)’, ‘intersect’)_G
36
Temporal Operators
temporal_restrict(GP, IT, tf($t))_G → {(μ, i)}
Given: a graph pattern GP, an interval type IT ∈ {intersect, range}, a temporal formula tf defined over I and a variable $t, a temporal RDF graph G
Find: {(μ, i) | μ ∈ [[GP]]_TRIPLES(G) and i ∈ intersect/range(int_extension(μ(GP))) and tf evaluates to true for $t = i}
Example: Which members of the 3rd Armored Division participated in battles during September 1944?
ANS ← temporal_restrict(‘(?x assigned_to <3rd Armored Div>) (<3rd Armored Div> participates_in ?y)’, ‘intersect’, during($t, [09:01:1944, 09:30:1944]) = true)_G
37
Temporal Operators
temporal_eval(GP1, IT1, GP2, IT2, tf($t1, $t2))_G → {(μ1, i1, μ2, i2)}
Given: a graph pattern GP1, a graph pattern GP2, an interval type IT1 ∈ {intersect, range}, an interval type IT2 ∈ {intersect, range}, a temporal formula tf defined over I and variables $t1, $t2, a temporal RDF graph G
Find: {(μ1, i1, μ2, i2) | μ1 ∈ [[GP1]]_TRIPLES(G) and μ2 ∈ [[GP2]]_TRIPLES(G) and i1 ∈ intersect/range(int_extension(μ1(GP1))) and i2 ∈ intersect/range(int_extension(μ2(GP2))) and tf evaluates to true for $t1 = i1 and $t2 = i2}
Example: Which speeches by President Roosevelt were given during a military event?
ANS ← temporal_eval(‘(<President Roosevelt> gives ?x)’, ‘intersect’, ‘(?y participates_in ?z)’, ‘intersect’, during($t1, $t2) = true)_G
38
Implementation Scheme
39
Overview
Extended ORDBMS (Oracle 10g)
Defined storage and indexing scheme
User-defined functions for temporal RDFS inferencing
User-defined functions for querying
Challenges
Thematic relationships can be directly stated, but spatial and temporal relationships require additional computation
Spatial and temporal properties of subgraphs aren’t known until query execution time, which makes them challenging to index
RDF(S) Inferencing
If statements have an associated valid time, this must be taken into account when performing inferencing:
(x, rdfs:subClassOf, y) : [1, 4] AND (y, rdfs:subClassOf, z) : [3, 5] ⟹ (x, rdfs:subClassOf, z) : [3, 4]
This slide gives an overview of our implementation approach, and of the challenges in comparison to our other work on semantic analytics and to traditional ST querying. We also have to handle aspects of RDF(S) inferencing with temporal triples (see the example).
40
Existing Oracle Technology
Semantic Technology Component
Storage Structures for RDF(S) Data (non-spatial, non-temporal)
RDFS Inference Procedures (non-temporal)
SQL-based Querying (non-spatial, non-temporal)
Spatial Component
Spatial Types – SDO_GEOMETRY (implementation of Spatial_Region)
Spatial Indexing
Spatial Operators
The implementation builds upon existing Oracle technologies.
41
SQL-based Querying Approach
SQL Table Functions
SELECT X, Y FROM TABLE (Table_Func(…));
Illustration: the table function returns rows with columns X, Y, Z (a b c; d e f; …)
The initial implementation uses SQL table functions. Benefits compared to a new query language: no data transformation, and easy integration with legacy relational data. A table function is used in the same way as a DB table name (describe the illustration).
42
Spatial Functions
spatial_extent (graphPattern VARCHAR, spatialVar VARCHAR, ontology RDFModels, <geom SDO_GEOMETRY>, <spatialRelation VARCHAR>) return AnyDataSet;
spatial_eval (graphPattern VARCHAR, spatialVar VARCHAR, graphPattern2 VARCHAR, spatialVar2 VARCHAR, spatialRelation VARCHAR, ontology RDFModels) return AnyDataSet;
We defined 4 table functions: 2 spatial and 2 temporal. The first, spatial_extent, is designed to retrieve the spatial region for a given thematic context and allows optional filtering of results based on a spatial predicate. List the inputs and optional parameters; it returns variable bindings. Go through the example.
43
Temporal Functions
temporal_extent (graphPattern VARCHAR, intervalType VARCHAR, ontology RDFModels, <start DATE>, <end DATE>, <temporalRel VARCHAR>) return AnyDataSet;
temporal_eval (graphPattern VARCHAR, intervalType VARCHAR, graphPattern2 VARCHAR, intervalType2 VARCHAR, temporalRel VARCHAR, ontology RDFModels) return AnyDataSet;
There are 2 temporal operators. The first is designed to construct the temporal interval for a context instance and optionally filter the results based on a temporal predicate.
44
Storage Scheme
Load RDF Data with Oracle
Spatial Indexing Procedure
Temporal Indexing Procedure
Thematic Indexes (on TemporalTriples): (subj_id, prop_id, obj_id), (prop_id, subj_id, obj_id), (obj_id, prop_id, subj_id)
This slide shows the storage scheme for our spatial and temporal RDF data. The existing schema for Oracle RDF is shown on the left. Triples are stored after normalization in 2 tables: RDFValues stores URIs and a generated id, and RDFTriples stores subj, prop, obj id triples along with other housekeeping info. Any triples inferred through RDFS semantics are materialized in an InferredTriples table that has the same columns as RDFTriples. Our additional indexing structures are shown on the right. Mention that ST RDF can be serialized as plain RDF (reification, etc.). To load the data, users first load the RDF with Oracle Semantic Data Store. We provide a procedure to build a spatial index: the ontology is queried to retrieve all instances of Spatial_Region; for each we get the id, create a geometry object and store it in the SpatialData table. We also provide a procedure for temporal indexing: it reads the reifications and creates the TemporalTriples table, where we have subj, prop, obj, start time and end time, stored as date-time values. This is a bit more complicated because we have to handle RDFS inferencing and temporal labels for inferred triples.
45
Temporal Inferencing
RDFS Inferencing Rules
1. (x, rdf:type, y) AND (y, rdfs:subClassOf, z) ⟹ (x, rdf:type, z)
2. (x, p, y) AND (p, rdfs:domain, a) ⟹ (x, rdf:type, a)
3. (x, p, y) AND (p, rdfs:range, b) ⟹ (y, rdf:type, b)
4. (x, p, y) AND (p, rdfs:subPropertyOf, z) ⟹ (x, z, y)
Example (Interval Union): (x, participates_in, e):[2, 5] (y, participates_in, e):[3, 7] (z, participates_in, e):[6, 9]
By rule 3: (e, rdf:type, event):[2, 9]
We will illustrate the inferencing with the following example. We are not concerned with the evolution of the ontology schema over time. What we are concerned with are inferences that affect instance-level statements. These come from the rules above. For each of them, it turns out that the interval for the inferred statement is the union of the intervals of the asserted statements. Consider the example of an event e (go through the example).
46
Temporal Inferencing Algorithm
1. create table asserted_temporal_triples (subj, prop, obj, start_date, end_date)
2. perform schema-level inferencing
3. perform instance-level inferencing
4. sort redundant_triples by (subj_id, prop_id, obj_id, start)
5. make a single pass and merge overlapping intervals for the same statement
6. insert updated triples and intervals into the final temporal_triples table
Example:
asserted_temporal_triples: (x, participates_in, e):[2, 5] (y, participates_in, e):[3, 7] (z, participates_in, e):[6, 9]
redundant_triples: (e, rdf:type, event):[2, 5] (e, rdf:type, event):[3, 7] (e, rdf:type, event):[6, 9]
temporal_triples: (e, rdf:type, event):[2, 9]
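The sort-and-merge part of the algorithm (sorting redundant_triples, then a single merging pass) can be sketched in memory as below. This is an illustration only: the actual system works on database tables, and the tuple layout (subj, prop, obj, start, end) and the name `merge_intervals` are assumptions made for the example. Intervals are merged when they overlap, as the slide states.

```python
from itertools import groupby


def merge_intervals(redundant_triples):
    """Merge overlapping intervals of identical statements in one sorted pass."""
    merged = []
    ordered = sorted(redundant_triples)  # by subj, prop, obj, then start
    for statement, rows in groupby(ordered, key=lambda row: row[:3]):
        current_start, current_end = None, None
        for *_, start, end in rows:
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps the running interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((*statement, current_start, current_end))
                current_start, current_end = start, end
        merged.append((*statement, current_start, current_end))
    return merged


redundant = [
    ("e", "rdf:type", "event", 6, 9),
    ("e", "rdf:type", "event", 2, 5),
    ("e", "rdf:type", "event", 3, 7),
]
temporal_triples = merge_intervals(redundant)
```

Because the rows are sorted by start time within each statement, one pass is enough: an interval can only overlap the running interval or start a new one.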
47
Query Function Implementation
Oracle Extensibility Framework: ODCITable Interface
Start(): Prepare SQL query over TemporalTriples and SpatialData tables (the Base Query)
Fetch(): Retrieve row from base query and do additional processing (e.g., construct intersect/range intervals)
Close(): Final cleanup (e.g., close DB cursors)
The query operators were implemented using Oracle’s extensibility framework. Note that this is not unique to Oracle; other DBs have similar extensibility frameworks. To create table functions, you implement the ODCITable interface, which has start(), fetch() and close() methods. Start(): generally you parse the query and prepare a SQL query against some underlying structures. Fetch(): usually this method retrieves a result from the prepared query and returns it, or does other processing and returns an altered result. The kernel calls fetch() repeatedly for a desired number of rows until all rows are returned. Close(): does housekeeping after the function finishes (releases resources, etc.).
48
Temporal Filtering Example
Edge intervals: [2, 7], [1, 5], [4, 8]
Intersection: [4, 5]
Range: [1, 8]
Filter: during (3, 6)
Partial Filter on Each Edge:
Intersection: (start <= 6) and (end >= 3)
Range: (start > 3) and (end < 6)
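The idea of a partial filter is to discard edges early, before the intersect or range interval is built. The sketch below reads the first per-edge condition as the intersect case and the second as the range case, and treats `during` as strict containment; both readings are assumptions, and the function names are hypothetical.

```python
def edge_may_qualify(edge, window, interval_type):
    """Cheap per-edge test that must hold for the final interval to pass `during`."""
    start, end = edge
    low, high = window
    if interval_type == "intersect":
        # The intersection lies inside every edge, so each edge must touch the window.
        return start <= high and end >= low
    # The range covers every edge, so each edge must itself lie inside the window.
    return start > low and end < high


def during(interval, window):
    """True when `interval` lies strictly inside `window`."""
    return interval is not None and interval[0] > window[0] and interval[1] < window[1]


edges = [(2, 7), (1, 5), (4, 8)]
window = (3, 6)
intersect_ok = all(edge_may_qualify(e, window, "intersect") for e in edges)
range_ok = all(edge_may_qualify(e, window, "range") for e in edges)
intersection = (max(s for s, _ in edges), min(t for _, t in edges))
edge_range = (min(s for s, _ in edges), max(t for _, t in edges))
```

For these edges the intersect filter keeps every edge and the final intersection [4, 5] does fall during (3, 6), while the range filter already rejects the first edge, consistent with the range [1, 8] failing the test.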
49
Experimental Evaluation
50
Evaluation
Environment
Oracle 10g R2 on 64-bit Solaris 9
Four 1.8 GHz UltraSPARC IV processors
8 GB RAM
512 MB buffer cache, 512 MB pga_aggregate_target
Objective
Test scalability w.r.t. (1) dataset size, (2) query complexity
Dataset
Synthetic RDF Graph [1] – Historical Battlefield Analysis (SynHist). Spatial: US Census block group polygons. Temporal: random intervals
Real-world Data – Political Domain (GovTrack). Spatial: US Census congressional district polygons. Temporal: given in data
1. Matthew Perry, "TOntoGen: A Synthetic Data Set Generator for Semantic Web Applications", AIS SIGSEMIS Bulletin, Volume 2, Issue 2 (April - June) 2005, pp
51
Dataset Characteristics
52
Scalability w.r.t. Dataset Size
53
Scalability w.r.t. Dataset Size
54
Scalability w.r.t. Graph Pattern Size
55
Scalability w.r.t. Graph Pattern Size
56
Scalability of Spatiotemporal Queries
57
Query Language Support
58
SPARQL-ST Overview
SPARQL
W3C recommended query language for RDF data (as of Jan. 15, 2008)
Graph pattern-based queries (subgraph match)
SPARQL-ST
Spatial variables
Temporal variables
Spatial filter expressions
Temporal filter expressions
59
Intro to SPARQL
Basic Query:
SELECT ?b, ?p
WHERE { ?b rdf:type usbill:HouseBill .
        ?b usbill:sponsor ?p }
(Result table with columns b and p)
Filtered Query:
SELECT ?b
WHERE { ?b rdf:type usbill:HouseBill .
        ?b rdfs:label ?l .
        FILTER (regex(?l, "handgun")) }
60
SPARQL-ST: Spatiotemporal Graph Pattern
Sets of Terms:
UL = URIs ∪ Literals
U = URIs
RT = RDF Terms
Sets of Variables:
VN = Variables
VS = Spatial Variables
VT = Temporal Variables
Triple Pattern: 3-tuple from (UL ∪ VN) × (U ∪ VN) × (UL ∪ VN)
Spatial Triple Pattern: 3-tuple from (UL ∪ VN ∪ VS) × (U ∪ VN) × (UL ∪ VN ∪ VS)
Spatiotemporal Triple Pattern: 4-tuple from (UL ∪ VN ∪ VS) × (U ∪ VN) × (UL ∪ VN ∪ VS) × (VT)
Spatiotemporal graph patterns are constructed from triple patterns and/or spatial triple patterns and/or spatiotemporal triple patterns
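The three pattern definitions can be read as a small classification rule. The Python sketch below is an illustration only: it assumes the variable prefixes used in the example queries of this presentation (? for ordinary, % for spatial, # for temporal variables), treats every other string as a constant, and does not distinguish URIs from literals. The function names are hypothetical.

```python
def term_kind(term):
    """Classify a term by its prefix (an assumption based on the example queries)."""
    if term.startswith("?"):
        return "variable"
    if term.startswith("%"):
        return "spatial_variable"
    if term.startswith("#"):
        return "temporal_variable"
    return "constant"


def classify_pattern(pattern):
    """Return which of the three pattern kinds a 3- or 4-tuple belongs to."""
    if len(pattern) == 3:
        subj, pred, obj = pattern
        time = None
    elif len(pattern) == 4:
        subj, pred, obj, time = pattern
    else:
        raise ValueError("a pattern must have 3 or 4 positions")

    # Predicate position: only a constant or an ordinary variable is allowed.
    if term_kind(pred) not in ("constant", "variable"):
        raise ValueError("predicate must be a URI or an ordinary variable")

    # Temporal variables may appear only in the fourth position.
    kinds = (term_kind(subj), term_kind(obj))
    if "temporal_variable" in kinds:
        raise ValueError("temporal variables are allowed only in position 4")

    if time is not None:
        if term_kind(time) != "temporal_variable":
            raise ValueError("position 4 must be a temporal variable")
        return "spatiotemporal triple pattern"
    if "spatial_variable" in kinds:
        return "spatial triple pattern"
    return "triple pattern"
```

For example, ("?b", "rdf:type", "usbill:HouseBill") is a triple pattern, ("?d", "located_at", "%s") is a spatial triple pattern, and ("?d", "located_at", "%s", "#t3") is a spatiotemporal triple pattern.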
61
SPARQL-ST Mappings
SELECT ?c, %s, #t1
WHERE { <Politician_123> on_committee ?c #t1 .
        <Politician_123> represents ?d #t2 .
        ?d located_at %s #t3 }
?c: maps to a single URI
#t1: maps to a time interval
%s: maps to a set of triples
[Figure: example graph, with each edge labeled by its valid-time interval]
Politician_123 on_committee Committee_456 : [1990, 2000]
Politician_123 represents District_789 : [1984, 1992]
District_789 located_at Polygon_1 : [1990, 2000]
Polygon_1 uses_crs NAD83 : [-∞, +∞]
Polygon_1 exterior Linear_Ring_1 : [-∞, +∞]
Linear_Ring_1 lrPosList , , …, : [-∞, +∞]
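How variables in such a query are bound to graph content can be sketched with a tiny pattern matcher. This is a simplified illustration, not the implementation behind the slides: the graph is a Python list of (subject, predicate, object, interval) tuples taken from the figure labels, the function names are hypothetical, and the spatial variable %s is bound only to the geometry node rather than to the full set of geometry triples.

```python
GRAPH = [
    ("Politician_123", "on_committee", "Committee_456", (1990, 2000)),
    ("Politician_123", "represents", "District_789", (1984, 1992)),
    ("District_789", "located_at", "Polygon_1", (1990, 2000)),
]

QUERY = [
    ("Politician_123", "on_committee", "?c", "#t1"),
    ("Politician_123", "represents", "?d", "#t2"),
    ("?d", "located_at", "%s", "#t3"),
]


def is_var(term):
    """Variables are recognized by their prefix: ?, % or #."""
    return term[0] in "?%#"


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all patterns (backtracking)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for quad in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, quad):
            if is_var(term):
                # Bind the variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)
```

Running list(match(QUERY, GRAPH)) yields a single solution in which ?c is bound to Committee_456, #t1 to the interval (1990, 2000) and %s to Polygon_1.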
62
SPARQL-ST by Example
Find all politicians who were senators of Ohio at the same time and return the times of joint senatorship.
SELECT ?s1, ?s2, intersect(#t1, #t2, #t3, #t4)
WHERE { ?s1 usgov:hasRole ?a #t1 .
        ?a usgov:forOffice usgov:senate/oh #t2 .
        ?s2 usgov:hasRole ?b #t3 .
        ?b usgov:forOffice usgov:senate/oh #t4 .
        FILTER (?s1 != ?s2) }
63
SPARQL-ST by Example
Find all House members who sponsored a bill after April 2, 2008
SELECT ?p, ?b
WHERE { ?p usgov:hasRole ?r #t1 .
        ?r usgov:forOffice ?o #t2 .
        ?o usgov:isPartOf usgov:congress/house #t3 .
        ?p usgov:sponsor ?b #t4 .
        TEMPORAL FILTER ( after(intersect(#t1, #t2, #t3, #t4), interval(04:02:2008, 04:02:2008, MM:DD:YYYY))) }
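The temporal filter in this query can be mimicked with Python dates. The sketch below is an assumption-laden illustration: parse_interval is a hypothetical helper for literals of the form interval(start, end, MM:DD:YYYY), after is taken to mean "starts strictly after the other interval ends", and anyinteract is taken to mean "shares at least one time point". The slides do not define these operators precisely.

```python
from datetime import date, datetime


def parse_interval(start_text, end_text, pattern):
    """Parse the two endpoints of an interval literal using the given date pattern."""
    # Translate the slide-style pattern into a strptime format string.
    fmt = pattern.replace("MM", "%m").replace("DD", "%d").replace("YYYY", "%Y")
    start = datetime.strptime(start_text, fmt).date()
    end = datetime.strptime(end_text, fmt).date()
    return (start, end)


def intersect(*intervals):
    """Intersection of closed intervals, or None when there is no common point."""
    start = max(i[0] for i in intervals)
    end = min(i[1] for i in intervals)
    return (start, end) if start <= end else None


def after(a, b):
    """Assumed semantics: interval a starts strictly after interval b ends."""
    return a is not None and a[0] > b[1]


def anyinteract(a, b):
    """Assumed semantics: intervals a and b share at least one time point."""
    return a is not None and a[0] <= b[1] and b[0] <= a[1]
```

For instance, if the four edge intervals of one match intersect to May 1, 2008 (the day a bill was sponsored), then after(...) against parse_interval("04:02:2008", "04:02:2008", "MM:DD:YYYY") holds and the match is kept.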
64
SPARQL-ST by Example
Find all politicians that represent areas within 100 miles of the district represented by Nancy Pelosi.
SELECT ?n
WHERE { ?p foaf:name ?n .
        ?p usgov:hasRole ?r .
        ?r usgov:forOffice ?o .
        ?o usgov:represents ?q .
        ?q stt:located_at %g .
        ?a foaf:name "Nancy Pelosi" .
        ?a usgov:hasRole ?b .
        ?b usgov:forOffice ?c .
        ?c usgov:represents ?d .
        ?d stt:located_at %h .
        SPATIAL FILTER (distance(%g, %h) <= 100 miles) }
65
SPARQL-ST by Example
Find all politicians representing congressional districts within a given geographical area at any time in March 2006
SELECT ?p
WHERE { ?p usgov:hasRole ?r #t1 .
        ?r usgov:forOffice ?o #t2 .
        ?o usgov:represents ?c #t3 .
        ?c stt:located_at %g #t4 .
        SPATIAL FILTER (inside(%g, GEOM(POLYGON (( , , , , ))) ))
        TEMPORAL FILTER ( anyinteract(intersect (#t1, #t2, #t3, #t4), interval(03:01:2008, 03:31:2008, MM:DD:YYYY))) }
66
Conclusions
Showed how the relationship-centric nature of the RDF data model can extend the state-of-the-art in modeling and querying STT data
Modeling
Many-to-many mapping between thematic and spatial objects (formalized as a context)
Querying
Support spatial and temporal relationships in graph pattern queries
More complex thematic aspects than traditional STT querying
Proposed SPARQL-ST to integrate with current standards
Implementation
Good scalability on large synthetic and real-world datasets
Only system for spatial and temporal RDF
Future Work
Semantic Associations
Sensor Web and Event Web applications
67
Publications
Dissertation Related
Matthew Perry, Amit Sheth, Farshad Hakimpour, Prateek Jain. "Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS '07), Mexico City, MX, November 29 – 30, 2007
Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA, November , 2006
Matthew Perry and Amit Sheth. "A Framework for Spatial, Temporal and Thematic Analytics over Semantic Web Data", submitted to VLDB Journal
Matthew Perry and Amit Sheth. "SPARQL-ST: Extending SPARQL for Spatial and Temporal Queries", in preparation.
68
Publications
Other STT Related
Amit Sheth and Matthew Perry. "Traveling the Semantic Web through Space, Time and Theme", IEEE Internet Computing, Volume 12, Issue 2, March/April 2008
Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, Amit Sheth. "Data Processing in Space, Time, and Semantics Dimensions", Terra Cognita Directions to the Geospatial Semantic Web, in conjunction with the Fifth International Semantic Web Conference (ISWC '06), Athens, GA, November 6, 2006
Matthew Perry, Amit Sheth, Ismailcem Budak Arpinar. "Geospatial and Temporal Semantic Analytics", to appear in Encyclopedia of Geoinformatics, Hassan A. Karimi (Ed), Idea-Group Inc., 2008
Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, Amit Sheth. "Spatiotemporal-Thematic Data Processing in Semantic Web", to appear in The Geospatial Web, Springer-Verlag, May, 2007
Proposals
Amit Sheth (PI), T.K. Prasad. "Spatial, Temporal and Thematic Analysis of Semantic Web Data", NSF-Small
69
Publications
SemDis Related
Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Ismailcem Budak Arpinar, Amit Sheth. "Peer-to-Peer Discovery of Semantic Associations", Second International Workshop on Peer-to-Peer Knowledge Management (P2PKM '05), San Diego, CA, July 17, 2005
Cartic Ramakrishnan, William Milnor, Matthew Perry, Amit Sheth. "Discovering Informative Connection Subgraphs in Multi-relational Graphs", SIGKDD Explorations Special Issue on Link Mining, Volume 7, Issue 2, December 2005
Matthew Perry. "TOntoGen: A Synthetic Data Set Generator for Semantic Web Applications", AIS SIGSEMIS Bulletin Volume 2 Issue 2 (April - June) 2005, pp. 46 – 48
Matthew Perry and Eric Stiles. "SEMPL: A Semantic Portal", Thirteenth International World Wide Web Conference (WWW '04), New York, NY, May 17-22, 2004
Semantics and Databases
Matthew Perry, Souripriya Das, Melliyal Annamalai, Eugene Inseok Chong, Zhe Wu, Jagannathan Srinivasan. "Semantic Similarity based Top-k Queries over Resources categorized using a Taxonomy", submitted to VLDB 2009
70
Questions?