Semantic Access to Existing Archives Using RDF and SPARQL Alasdair J G Gray.

Presentation transcript:

Semantic Access to Existing Archives Using RDF and SPARQL Alasdair J G Gray

The Idea
Use case
  Access and combine data from multiple sources
  Pose one query
Issues
  Archives exist
    – Mostly relational
    – Bespoke schemas
  Relating data is hard
    – Matching columns
    – Matching types
    – Maintaining semantics
Virtual Observatory

Abstract Computer Science Approach
Data Integration
  Autonomous Sources
    – Maintain own schema
  One Global Schema
    – Pre-agreed by all
    – Limits available data
  Mappings are relational views
    – Global as View (GAV)
    – Local as View (LAV)
Problem
  No global schema in astronomy
[Diagram: Query 1 ... Query n; Global Schema; Mappings; Wrapper 1 / DB 1, Wrapper i / DB i, Wrapper k / DB k]
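The Global as View (GAV) idea on this slide can be illustrated with a minimal sketch, assuming two invented source tables (`src1_stars`, `src2_objects`) and an invented global relation (`global_star`); none of these names come from the talk. In GAV, each global relation is defined as a view over the sources, so one query against the global schema reaches every source.

```python
import sqlite3

# Two autonomous sources, each keeping its own (hypothetical) schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src1_stars (ra REAL, decl REAL, mag_b REAL);
    CREATE TABLE src2_objects (right_asc REAL, declination REAL, b_magnitude REAL);
    INSERT INTO src1_stars VALUES (10.5, -30.1, 18.2);
    INSERT INTO src2_objects VALUES (200.0, 12.4, 19.7);

    -- GAV mapping: the global relation is a view defined over the sources.
    CREATE VIEW global_star AS
        SELECT ra, decl, mag_b AS b_mag FROM src1_stars
        UNION ALL
        SELECT right_asc, declination, b_magnitude FROM src2_objects;
""")

# One query posed against the global schema is answered from both sources.
rows = conn.execute(
    "SELECT ra, decl, b_mag FROM global_star WHERE b_mag < 20 ORDER BY ra"
).fetchall()
print(rows)
```

The sketch also shows why a pre-agreed global schema limits the available data: any source column that the view does not map is invisible to queries.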

A Semantic Web Approach
Use RDF
  Expose source as RDF model
  Allow multiple “access” models
  Semantic mappings between models
  Query using SPARQL
Two approaches
  Replicate data in RDF Store
    – Consistency issues
  On-the-fly translation
    – SPARQL to SQL
    – Relational data to RDF
[Diagram: Query 1 ... Query n; Community Schema 1 ... Community Schema j; Mappings; RDF Wrapper 1 / DB 1, RDF Wrapper i / DB i, RDF Wrapper k / DB k]
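The "Relational data to RDF" step can be sketched as follows. This is an illustrative mapping only, not the scheme used by D2RQ, SquirrelRDF, or any tool in the talk: the namespace `http://example.org/archive/`, the key column `objID`, and the URI layout are all assumptions. Each row becomes a subject, each non-null column becomes a predicate, and the cell value becomes a literal.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/archive/"  # hypothetical namespace, not from the talk


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[Triple]:
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = f"<{BASE}{table}/{row[key]}>"
    triples: List[Triple] = []
    for column, value in row.items():
        # The key is already encoded in the subject; NULLs yield no triple.
        if column == key or value is None:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        triples.append((subject, predicate, f'"{value}"'))
    return triples


# objID is an assumed key column; sCorMagB is NULL here to show ragged output.
row = {"objID": 42, "ra": 10.5, "dec": -30.1, "sCorMagB": None}
triples = row_to_triples("ReliableStars", "objID", row)
for s, p, o in triples:
    print(s, p, o, ".")
```

A replicated RDF store would run this once over the whole archive, whereas an on-the-fly wrapper would apply it only to the rows returned by the translated SQL query.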

System Hypothesis
Is it viable to perform on-the-fly conversions from existing science archives to RDF to facilitate data access from a data model that a scientist is familiar with?
[Diagram: SPARQL query; Common Model (RDF); Mappings; RDF / Relational Conversion over Relational DB; RDF / XML Conversion over XML DB]

Test Data
SuperCOSMOS Science Archive (SSA)
  Data extracted from scans of Schmidt plates
  Stored in a relational database
  About 4TB of data, detailing 6.4 billion objects
  Fairly typical of astronomical data archives
  Schema designed using 20 real queries
Personal version contains
  Data for a specific region of the sky
  About 0.1% of the data, ~500MB

Real Science Queries
Query 5
  Find the positions and (B,R,I) magnitudes of all star-like objects within delta mag of 0.2 of the colours of a quasar of redshift 2.5 < z < 3.5

  SELECT TOP 30 ra, dec, sCorMagB, sCorMagR2, sCorMagI
  FROM ReliableStars
  WHERE (sCorMagB - sCorMagR2 BETWEEN 0.05 AND 0.80)
    AND (sCorMagR2 - sCorMagI BETWEEN AND 0.64)

Analysis of Test Queries

Query Feature                               Query Numbers
Arithmetic in body                          1-5, 7, 9, 12, 13,
Arithmetic in head                          7-9, 12, 13
Ordering                                    1-8, 10-17, 19, 20
Joins (including self-joins)                12-17, 19
Range functions (e.g. Between, ABS)         2, 3, 5, 8, 12, 13, 15,
Aggregate functions (including Group By)    7-9, 18
Math functions (e.g. power, log, root)      4, 9, 16
Trigonometry functions                      8, 12
Negated sub-query                           18, 20
Type casting (e.g. radians to degrees)      7, 8, 12
Server functions                            10, 11

Expressivity of SPARQL
Features
  Select-project-join
  Arithmetic in body
  Conjunction and disjunction
  Ordering
  String matching
  External function calls (extension mechanism)
Limitations
  Range shorthands
  Arithmetic in head
  Math functions
  Trigonometry functions
  Sub queries
  Aggregate functions
  Casting
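The missing range shorthand is the mildest of these limitations, because a BETWEEN test can be rewritten as two comparisons, which SPARQL's FILTER does support. The helper below is a hypothetical sketch of that rewrite, not part of any converter named in the talk. It is applied to the first predicate of Query 5, with the column names written as SPARQL variables.

```python
def between_to_filter(expr: str, low: float, high: float) -> str:
    """Rewrite `expr BETWEEN low AND high` as a SPARQL FILTER clause."""
    # BETWEEN is inclusive at both ends, hence >= and <=.
    return f"FILTER(({expr}) >= {low} && ({expr}) <= {high})"


# First range predicate of Query 5: sCorMagB - sCorMagR2 BETWEEN 0.05 AND 0.80
clause = between_to_filter("?sCorMagB - ?sCorMagR2", 0.05, 0.80)
print(clause)
```

The rewrite relies only on arithmetic and comparison in the query body, both of which appear in the Features list above.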

Analysis of Test Queries

Query Feature                               Query Numbers
Arithmetic in body                          1-5, 7, 9, 12, 13,
Arithmetic in head                          7-9, 12, 13
Ordering                                    1-8, 10-17, 19, 20
Joins (including self-joins)                12-17, 19
Range functions (e.g. Between, ABS)         2, 3, 5, 8, 12, 13, 15,
Aggregate functions (including Group By)    7-9, 18
Math functions (e.g. power, log, root)      4, 9, 16
Trigonometry functions                      8, 12
Negated sub-query                           18, 20
Type casting (e.g. radians to degrees)      7, 8, 12
Server functions                            10, 11

Expressible queries: 1, 2, 3, 5, 6, 14, 15, 17, 19
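The list of expressible queries can be reproduced from the table with a short set computation. This sketch rests on two assumptions not stated on the slide: range functions are treated as rewritable, so they do not block a query, and server functions are treated as blocking. The total of 20 queries comes from the Test Data slide.

```python
def expand(spec: str) -> set:
    """Expand a spec such as '7-9, 12' into the set {7, 8, 9, 12}."""
    result = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            result.update(range(int(low), int(high) + 1))
        else:
            result.add(int(part))
    return result


# Features assumed to make a query inexpressible in SPARQL (rows from the table).
blocking = {
    "Arithmetic in head": "7-9, 12, 13",
    "Aggregate functions": "7-9, 18",
    "Math functions": "4, 9, 16",
    "Trigonometry functions": "8, 12",
    "Negated sub-query": "18, 20",
    "Type casting": "7, 8, 12",
    "Server functions": "10, 11",
}

blocked = set().union(*(expand(spec) for spec in blocking.values()))
expressible = sorted(set(range(1, 21)) - blocked)
print(expressible)
```

Under those assumptions the result matches the slide's list of nine expressible queries.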

Experimental Setup
Machine
  Quad Core Intel Xeon 2.4GHz
  64 bit processor
  4GB RAM
  100 GB Disc
  Linux
  Java 1.6
Software
  Database
    – MySQL
  Triple Stores
    – Jena with SDB 1.1
    – Sesame
  RDB2RDF Converters
    – D2RQ
    – SquirrelRDF 0.1
Only 5 queries completed within 2 hours

Performance Results
Query Execution (ms)

Query                  1, 2, 3, 5, 6
JDBC/MySQL (Views)
D2RQ (Views)           3525,3392,7334,0907,468
D2RQ                   39,37440,021153,392
SquirrelRDF (Views)    61321, ,30719,984
Jena SDB (Views)       3,450485,9327,22917,793372,561
Sesame (Views)
Sesame

A New Approach
Exploit query engines and data structure of underlying data sources
Aid user query generation by explaining source data model in terms of known data model
Data extracted in native model
[Diagram: Explicator; Mappings; Models; Explain; SQL over Relational DB; XQuery over XML DB]

Conclusions
SPARQL: Not expressive enough for science
Query Converters: Poor performance
Exploring a new approach
  RDF to understand data models
  Native query engines for data extraction

RDF                             Relational
Ragged data                     Structured data
Small to medium data volumes    Large data volumes
Reasoning over the data         Extracting specific data

Practical Semantic Astronomy March 2009 Glasgow, UK