EvoGen: a Generator for Synthetic Versioned RDF
Marios Meimaris
Institute for the Management of Information Systems, Research Center “Athena”

Presentation transcript:

Data Web Evolving
– Dynamic communities
– Fast-paced environments
– Open-world data

Problem Tackled
Synthetic data is widely used for benchmarking
– Storage
– Querying
– Processing
Lack of tools and benchmarks for evolving RDF
– Versioning Systems
– Evolution Management Systems
– Change Detection
– insert yours here…

Requirements
Meaningful data generation
– Synthetic data generation abstraction
– Identification of characteristics
Configurability
– Definition of parameters based on characteristics
Benchmark workload
Community engagement

Parameters
We define three non-exhaustive, non-mutually exclusive parameters to drive the generation process
– Shift
– Monotonicity
– Strictness

Parameters (continued; slides 6 to 8 carry no text in this transcript)

Lehigh University Benchmark
Widely used synthetic data generator
Creates universities that contain departments with students, professors, courses etc.
Configurable number of universities and starting index
Configurable serialization and representation model (RDF/XML in .owl files, DAML)
Widely adopted by the data engineering and semantic web community

Lehigh University Benchmark

Our system
A generator for synthetic evolving RDF data
– Based on the existing LUBM generator
– Extends LUBM to create evolving versions of the original data
– Tailors the creation process based on user-defined parameters
– This version: monotonic shifts
– Next version: configurable strictness %

Our system
Configurable parameters
– # of universities
– # of consecutive versions
– shift (double precision, w.r.t. first-version dataset)
  Shift is distributed evenly among versions
  All dataset classes are generated based on weight factors
– serialization mode (full vs diffs)
Next version
– Strictness as % of Characteristic Sets generated from LUBM, spread over versions
– Custom query workload
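The difference between the "full" and "diffs" serialization modes can be illustrated with a minimal sketch. It assumes each version is a set of (subject, predicate, object) string tuples and that a diff is a pair of added and deleted triple sets; the names `Delta`, `diff`, and `apply_delta` and the sample triples are hypothetical and are not taken from EvoGen.

```python
from typing import NamedTuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


class Delta(NamedTuple):
    """Hypothetical diff record: triples added and deleted between two versions."""
    added: frozenset[Triple]
    deleted: frozenset[Triple]


def diff(old: set[Triple], new: set[Triple]) -> Delta:
    """Compute the triples that must be added and deleted to turn old into new."""
    return Delta(added=frozenset(new - old), deleted=frozenset(old - new))


def apply_delta(old: set[Triple], delta: Delta) -> set[Triple]:
    """Rebuild the full next version from the previous version and its diff."""
    return (old - delta.deleted) | delta.added


# Made-up sample data in a LUBM-like vocabulary.
v1 = {
    ("ex:Dept0", "rdf:type", "ub:Department"),
    ("ex:Student1", "ub:memberOf", "ex:Dept0"),
}
v2 = v1 | {("ex:Student2", "ub:memberOf", "ex:Dept0")}

d = diff(v1, v2)
```

In "full" mode every version would be written out as a whole (`v1`, `v2`, and so on), while in "diffs" mode only the first version and the successive `Delta` records would be written, and `apply_delta` would recover any later version.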

Resulting Data
Based on Lehigh University Benchmark (LUBM)
User defines:
– shift as a positive or negative percentage
– number of versions to be created
LUBM schema classes are given weights based on their contribution to the dataset’s size
Shift percentage is distributed to all LUBM classes based on their weights and the defined shift
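One way to read the weighting scheme is sketched below. It assumes a class's weight is its share of the instances in the first version and that the total change (shift times dataset size) is split across classes in proportion to those weights. The function names and the instance counts are hypothetical; the slides do not give EvoGen's actual weight factors.

```python
def class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight of each class = its share of all instances (an assumption)."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def distribute_shift(counts: dict[str, int], shift: float) -> dict[str, int]:
    """Split the total change (shift * dataset size) across classes by weight.

    shift is a fraction relative to the first version; it may be negative.
    Per-class changes are rounded, so their sum can differ slightly from
    the exact target for counts that do not divide evenly.
    """
    total = sum(counts.values())
    target_change = shift * total
    weights = class_weights(counts)
    return {cls: round(target_change * w) for cls, w in weights.items()}


# Made-up instance counts for a few LUBM classes.
counts = {
    "UndergraduateStudent": 6000,
    "GraduateStudent": 2000,
    "Course": 1500,
    "FullProfessor": 500,
}
growth = distribute_shift(counts, 0.10)    # +10% overall
shrink = distribute_shift(counts, -0.10)   # -10% overall
```

With a positive shift the per-class values are instances to add; with a negative shift they are instances to remove.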

System Architecture

Evaluation of Shift Parameter
Measure the achieved shift w.r.t. the desired shift for an increasing number of universities
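A minimal sketch of such a measurement follows. It assumes the achieved shift is the relative change in dataset size (for example, triple count) between the first and last version; the slides do not state the exact metric, and the function names are hypothetical.

```python
def achieved_shift(first_size: int, last_size: int) -> float:
    """Relative size change from the first to the last version (assumed metric)."""
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    return (last_size - first_size) / first_size


def shift_error(desired: float, first_size: int, last_size: int) -> float:
    """Signed difference between achieved and desired shift."""
    return achieved_shift(first_size, last_size) - desired
```

For instance, a dataset that grows from 1000 to 1290 triples under a desired shift of 0.3 would have an achieved shift of about 0.29, and so undershoot the target by about 0.01.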

Further resources
Lehigh University Benchmark (LUBM) –
Source code repository –
paper –

Example of usage
User defines:
– 5 universities
– 10 versions
– 0.3% incremental change evenly distributed between versions
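The example settings can be written down as a small configuration object. This is a sketch only. It assumes that 0.3% is the total shift relative to the first version, that the first version is unshifted, and that the shift is spread in equal steps over the remaining versions. `GeneratorConfig` and `cumulative_shifts` are hypothetical names, not EvoGen's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical container for the three user-defined settings."""
    universities: int
    versions: int
    shift: float  # total fractional change w.r.t. version 1 (0.003 = 0.3%)

    def cumulative_shifts(self) -> list[float]:
        """Cumulative shift reached at each version, in equal steps.

        Assumption: version 1 has shift 0 and the last version reaches
        the full shift, so there are (versions - 1) equal steps.
        """
        steps = self.versions - 1
        if steps < 1:
            return [0.0]
        return [self.shift * k / steps for k in range(self.versions)]


config = GeneratorConfig(universities=5, versions=10, shift=0.003)
schedule = config.cumulative_shifts()
```

Under these assumptions each of the nine steps between consecutive versions changes the dataset by one ninth of 0.3% of the first version's size.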

Thank you
Questions?