Semantic Publishing Benchmark Task Force Fourth TUC Meeting, Amsterdam, 03 April 2014.

Presentation transcript:

Semantic Publishing Benchmark Task Force Fourth TUC Meeting, Amsterdam, 03 April 2014

Use-case
This is an industry-motivated benchmark
The scenario involves a media / publisher organization that maintains semantic metadata about its journalistic assets (articles, photos, videos, papers, books, etc.), called Creative Works
The Semantic Publishing Benchmark simulates:
– Consumption of RDF metadata (Creative Works)
– Updates of RDF metadata

Benchmark Design - Requirements
Storing and processing RDF data
Loading data in RDF serialization formats: N-Quads, TriG, Turtle, etc.
Storing and isolating data in separate RDF graphs

Benchmark Design – Requirements (2)
Support for the following SPARQL standards:
– SPARQL 1.1 Protocol, Query, Update
Support for RDFS, in order to return correct results
Optional support for the RL profile of the Web Ontology Language (OWL2 RL), in order to pass the conformance test suite

Benchmark Design – operational phases
Initial loading of reference knowledge
– Datasets enriched with DBpedia person data and GeoNames
– Adjustable loading of reference data
Generation of Creative Works
– Parallel generation (multi-threaded and multi-process)
Loading of Creative Works
Warm-up
Benchmark
Conformance tests (OWL2 RL)

Benchmark Configuration
Number of editorial / aggregation agents
Size of generated data (triples)
Duration of Warm-up and Benchmark phases
Each operational phase can be enabled or disabled
Parallel data generation

Benchmark Configuration (2)
Distribution of queries in the query-mix
– editorial operations
– aggregate operations
Data Generator
– Allocation of tags in Creative Works
– Clustering of Creative Works around major / minor events
– Correlations
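A configurable query-mix distribution can be pictured as weighted random selection. The following is a minimal sketch, not SPB's actual driver code: the weights, the `EDITORIAL_MIX` name and the `pick_operation` helper are all assumptions made for illustration.

```python
import random

# Hypothetical weights: relative share of each editorial operation in the mix.
EDITORIAL_MIX = {"insert": 8, "update": 1, "delete": 1}


def pick_operation(mix, rng):
    """Draw one operation name with probability proportional to its weight."""
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# A seeded generator keeps the drawn sequence repeatable between runs.
rng = random.Random(42)
sample = [pick_operation(EDITORIAL_MIX, rng) for _ in range(1000)]
counts = {name: sample.count(name) for name in EDITORIAL_MIX}
```

Changing the weights changes how often each operation is issued without touching the agent logic, which is the point of keeping the distribution in configuration.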

Data Generation
Produces synthetic data that has most of the characteristics of the real-world data provided by the BBC
– Input: ontologies and reference knowledge datasets
– Output: Creative Works datasets, which conform to the ontologies, refer to entities in the reference datasets, and follow the pre-defined modeling and distributions of the Data Generator

Data Generation (2)
[Figure: tagged entities over time (Jan. 2012 to Dec. 2012); labeled patterns: clustering, correlations, random distribution]
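Clustering of Creative Works around an event can be sketched as drawing creation dates that concentrate near the event date. The slides do not state which distribution the Data Generator uses, so the normal distribution below, the `clustered_dates` helper and the example event date are assumptions chosen only for illustration.

```python
import random
from datetime import datetime, timedelta


def clustered_dates(event, count, spread_days, rng):
    """Return `count` timestamps whose offsets from `event` are normally
    distributed with a standard deviation of `spread_days` days."""
    return [event + timedelta(days=rng.gauss(0, spread_days)) for _ in range(count)]


rng = random.Random(7)
event = datetime(2012, 7, 27)  # hypothetical "major event" date
dates = clustered_dates(event, count=200, spread_days=5, rng=rng)

# Count how many generated works fall within three standard deviations.
window = timedelta(days=15)
near = sum(1 for d in dates if abs(d - event) <= window)
```

A smaller `spread_days` would model a sharply peaked burst of coverage, a larger one a slowly fading story.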

Ontologies
Core Ontologies: describe basic concepts about entities and relationships
– Basic concepts: Creative Works, Places, Persons, Provenance Information, Company Information, etc.
Domain Ontologies: describe concepts and properties related to a specific domain
– sports (competitions, events)
– politics entities
– news (concepts that journalists tag annotations with)

Ontology Sample (Creative Work)

Reference Datasets
Collections of entities describing various domains
Snapshots of the real datasets (BBC)
– Football competitions and teams
– Formula One competitions and teams
– UK Parliament Members
Additional datasets
– GeoNames: places, names and coordinates
– DBpedia: person data

Data Generation Process
1. Load ontologies and reference knowledge data into the RDF repository
2. Data Generator
   a. Retrieves instances from the Reference Datasets
   b. Generates Creative Works according to pre-defined allocations and models
   c. Writes generated data to disk
[Diagram components: BBC Ontologies, Reference Datasets, Ontology & Reference Data Set Loader, RDF Repository, SPARQL Endpoint, SPB Data Generator (Creative Works Generator, data generation parameters), Generated CWs]

Choke Points
“technical challenges that RDF stores need to overcome in order to satisfy the need for a fast and reliable service using real-world data and real-world queries”
Test how different constructs affect the performance of RDF engines: choice of the optimal query plan

Choke Points
Join ordering:
– OPTIONALs & nested OPTIONALs: should be evaluated last (treated as left outer joins)
– FILTERs: evaluate as early as possible
– Sub-queries: evaluate first
Parallel execution: UNIONs
Elimination of redundant joins: RDFS constructs
Sorting: ORDER BY
Aggregates: GROUP BY, COUNT
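Why evaluating FILTERs early matters can be shown with a toy example that counts intermediate rows under two plans. This is an illustrative sketch over in-memory lists, not how any RDF engine is implemented; the data and function names are hypothetical.

```python
# Toy "triples": each Creative Work has one tag and one publication year.
tags = [(f"cw{i}", f"entity{i % 10}") for i in range(1000)]
years = [(f"cw{i}", 2010 + i % 5) for i in range(1000)]


def join_then_filter():
    """Join everything first, then keep only works from 2012."""
    year_of = dict(years)
    joined = [(work, tag, year_of[work]) for work, tag in tags]
    result = [row for row in joined if row[2] == 2012]
    return len(joined), result


def filter_then_join():
    """Apply the year filter first, so the join sees fewer rows."""
    wanted = {work: year for work, year in years if year == 2012}
    joined = [(work, tag, wanted[work]) for work, tag in tags if work in wanted]
    return len(joined), joined


late_rows, late_result = join_then_filter()
early_rows, early_result = filter_then_join()
```

Both plans return the same 200 rows, but the first materializes 1000 intermediate rows and the second only 200. A query optimizer that pushes the filter down gets the cheaper plan.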

The Workloads (Queries)
Simultaneous execution of editorial and aggregation agents
– Query mix distributions
Editorial agents
– simulate editorial work performed by journalists:
– Insert, Update, Delete

The Workloads (Queries 2)
Aggregation agents
– simulate retrieval operations performed by end-users:
Base query mix
– Aggregation queries
– Search queries, Count queries
– Geo-spatial, Full-text search queries
Extended query mix
– Analytical drill-down queries (geo-locations, time-range)
– Faceted search queries
– Time-line of interactions queries
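Simultaneous execution of editorial and aggregation agents can be sketched with one thread per agent. This is a simplified illustration under stated assumptions: the `run_agents` helper is hypothetical, each agent performs a fixed number of stubbed operations instead of running for a configured duration, and no SPARQL endpoint is contacted.

```python
import threading
from collections import Counter


def run_agents(editorial_agents, aggregation_agents, ops_per_agent):
    """Run all agents concurrently and count completed operations per kind."""
    completed = Counter()
    lock = threading.Lock()

    def agent(kind):
        for _ in range(ops_per_agent):
            # A real agent would send a SPARQL update or query here.
            with lock:
                completed[kind] += 1

    threads = [threading.Thread(target=agent, args=("editorial",))
               for _ in range(editorial_agents)]
    threads += [threading.Thread(target=agent, args=("aggregation",))
                for _ in range(aggregation_agents)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


completed = run_agents(editorial_agents=2, aggregation_agents=8, ops_per_agent=100)
```

The lock protects the shared counter, so the totals stay exact regardless of how the threads interleave.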

Query Templates
All queries are saved to template files
Queries use template parameters
Templates allow each query to be modified if necessary
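Parameter substitution in a query template can be sketched with Python's standard `string.Template`. The placeholder syntax, the `ex:` prefix, the property names and the parameter names below are assumptions for illustration and do not reproduce SPB's actual template files.

```python
from string import Template

# Hypothetical template: $cw_type and $limit are the template parameters.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/cwork#>
SELECT ?work ?title
WHERE {
  ?work a ex:$cw_type ;
        ex:title ?title .
}
LIMIT $limit
""")

# substitute() raises KeyError if a parameter is missing, which catches
# incomplete parameter sets before the query is sent anywhere.
query = QUERY_TEMPLATE.substitute(cw_type="NewsItem", limit=10)
```

Because SPARQL variables start with `?`, they do not collide with the `$`-prefixed placeholders used here.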

Results Metrics and Logs
Metrics
– Editorial operations, Aggregate operations per second
– Total QPS
Logs
– Brief listing of executed queries
– Detailed description of each query and result
– Benchmark results summary
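The per-second metrics reduce to simple arithmetic over operation counts and elapsed time. The sketch below assumes total QPS is the sum of the editorial and aggregate rates; the `throughput` helper and the sample numbers are hypothetical, not measured results.

```python
def throughput(completed, elapsed_seconds):
    """Return operations per second for each kind, plus their total."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {kind: count / elapsed_seconds for kind, count in completed.items()}
    rates["total"] = sum(rates.values())
    return rates


# Illustrative counts for a 600-second benchmark phase.
rates = throughput({"editorial": 1200, "aggregation": 4800}, elapsed_seconds=600)
```

With these sample counts the rates are 2.0 editorial and 8.0 aggregate operations per second, 10.0 in total.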

Integration
Sources and datasets are in GitHub repositories
SPB adopted as part of the standard release procedure for the OWLIM RDF store
Detects performance deviations in future releases
Runs both on local hardware and on Amazon EC2 instances

Future Work
End of April
– Validation, execution and query results
– Query parameters substitution
– Online replication and backup

Thank you