EvoGen: a Generator for Synthetic Versioned RDF
Marios Meimaris
Institute for the Management of Information Systems, Research Center “Athena”
m.meimaris@imis.athena-innovation.gr
Meimaris@DIACHRON2016

Data Web
Evolving
– Dynamic communities
– Fast-paced environments
– Open-world data

Problem Tackled
Synthetic data widely used for benchmarking
– Storage
– Querying
– Processing
Lack of tools and benchmarks for evolving RDF
– Versioning Systems
– Evolution Management Systems
– Change Detection
– insert yours here…

Requirements
Meaningful data generation
– Synthetic data generation abstraction
– Identification of characteristics
Configurability
– Definition of parameters based on characteristics
Benchmark workload
Community engagement

Parameters
We define three non-exhaustive, non-mutually-exclusive parameters to drive the generation process
– Shift
– Monotonicity
– Strictness
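
As a rough illustration of what two of these parameters could measure, the sketch below compares two versions held as Python sets of triples. It assumes that shift is the relative change in triple count and that a change is monotonic when it only adds or only removes triples. Both readings, the function name `classify_change`, and the sample triples are assumptions made for this example, not definitions taken from the slides or from EvoGen's code.

```python
def classify_change(old, new):
    """Return (shift, kind) for two versions given as sets of triples.

    Assumption: shift = relative change in triple count, and a change is
    monotonic when it consists only of additions or only of removals.
    """
    if not old:
        raise ValueError("the older version must contain at least one triple")
    added = new - old      # triples present only in the newer version
    removed = old - new    # triples present only in the older version
    shift = (len(new) - len(old)) / len(old)
    if added and removed:
        kind = "non-monotonic"
    elif added:
        kind = "monotonic increase"
    elif removed:
        kind = "monotonic decrease"
    else:
        kind = "unchanged"
    return shift, kind


# Hypothetical triples, used only to exercise the function.
v1 = {("s1", "p", "o1"), ("s2", "p", "o2")}
v2 = v1 | {("s3", "p", "o3")}
print(classify_change(v1, v2))
```

Note that a non-monotonic change can leave the shift at zero while still replacing triples, which is one reason to treat the two parameters separately.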

Parameters

Lehigh University Benchmark
Widely used synthetic data generator
Creates universities that contain departments with students, professors, courses etc.
Configurable number of universities and starting index
Configurable serialization and representation model (RDF/XML in .owl files, DAML)
Widely adopted by the data engineering and semantic web community

Lehigh University Benchmark
Figure source: http://blog.andric.name/wp-content/uploads/2013/06/univ-bench.owl_.png

Our system
A generator for synthetic evolving RDF data
– Based on existing LUBM generator
– Extends LUBM to create evolving versions of original data
– Tailors creation process based on user-defined parameters
– This version: monotonic shifts
– Next version: configurable strictness %

Our system
Configurable parameters
– # of universities
– # of consecutive versions
– shift (double precision, w.r.t. first-version dataset)
  – Shift is distributed evenly among versions
  – All dataset classes are generated based on weight factors
– serialization mode (full vs diffs)
Next version
– Strictness as % of Characteristic Sets generated from LUBM, spread over versions
– Custom query workload
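
The difference between the two serialization modes can be pictured with a small sketch. It assumes that "full" means every version is stored as a complete snapshot and that "diffs" means the first snapshot plus, for each later version, the sets of added and removed triples. The function names `to_diffs` and `from_diffs` and the in-memory representation are hypothetical and do not describe EvoGen's actual output files.

```python
def to_diffs(snapshots):
    """Turn full snapshots (a list of sets of triples) into first snapshot + deltas."""
    deltas = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        added = curr - prev      # triples new in this version
        removed = prev - curr    # triples dropped in this version
        deltas.append((added, removed))
    return set(snapshots[0]), deltas


def from_diffs(first, deltas):
    """Rebuild every full snapshot by replaying the deltas in order."""
    snapshots = [set(first)]
    for added, removed in deltas:
        snapshots.append((snapshots[-1] - removed) | added)
    return snapshots


# Hypothetical three-version history, used only to exercise the functions.
history = [
    {("u0", "type", "University")},
    {("u0", "type", "University"), ("d0", "subOrganizationOf", "u0")},
    {("d0", "subOrganizationOf", "u0")},
]
first, deltas = to_diffs(history)
print(from_diffs(first, deltas) == history)
```

Storing diffs trades reconstruction time for space: any version other than the first has to be rebuilt by replaying the deltas before it.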

Resulting Data
Based on Lehigh University Benchmark (LUBM)
User defines:
– shift as a positive or negative percentage
– number of versions to be created
LUBM schema classes are given weights based on their contribution to the dataset’s size
Shift percentage is distributed to all LUBM classes based on their weights and the defined shift
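
One way to read the weighting step is sketched below. It assumes that a class's weight is its share of the first-version instances and that the total shift, expressed as a signed fraction, is split across classes in proportion to those weights. The function name `distribute_shift`, the rounding scheme, and the class counts are illustrative assumptions, not values or logic taken from EvoGen.

```python
def distribute_shift(class_counts, shift):
    """Split a signed total shift across classes in proportion to their weights.

    class_counts: instances per class in the first version (hypothetical input).
    shift: signed fraction of the first-version size, e.g. 0.1 or -0.1.
    Returns the signed number of instances to add or remove per class.
    """
    total = sum(class_counts.values())
    target = round(total * shift)                       # overall instances to add/remove
    raw = {c: target * n / total for c, n in class_counts.items()}
    alloc = {c: int(v) for c, v in raw.items()}         # truncate toward zero
    leftover = target - sum(alloc.values())
    step = 1 if leftover > 0 else -1
    # Hand the rounding leftovers to the classes with the largest truncated part.
    by_fraction = sorted(raw, key=lambda c: abs(raw[c] - alloc[c]), reverse=True)
    for c in by_fraction[:abs(leftover)]:
        alloc[c] += step
    return alloc


# Hypothetical class counts, not real LUBM figures.
counts = {"Student": 700, "Course": 200, "Professor": 100}
print(distribute_shift(counts, 0.1))
print(distribute_shift(counts, -0.1))
```

The leftover handling only matters when the proportional shares are not whole numbers; it keeps the per-class changes summing exactly to the overall target.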

System Architecture

Evaluation of Shift Parameter
Measure the achieved shift against the desired shift for an increasing number of universities
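
A comparison of this kind could be computed along the following lines. The sketch assumes that the achieved shift is the relative size change between the first and last generated versions. The function names and all sizes in the sample are made-up placeholders, not measurements from the evaluation.

```python
def achieved_shift(first_size, last_size):
    """Relative size change between the first and the last version."""
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    return (last_size - first_size) / first_size


def shift_deviation(first_size, last_size, desired_shift):
    """Absolute gap between the achieved and the desired shift (both as fractions)."""
    return abs(achieved_shift(first_size, last_size) - desired_shift)


# Placeholder (universities, first size, last size) rows; NOT measured results.
placeholder_runs = [(1, 1000, 1280), (5, 5000, 6490), (10, 10000, 13010)]
for universities, first, last in placeholder_runs:
    print(universities, achieved_shift(first, last), shift_deviation(first, last, 0.3))
```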

Further resources
Lehigh University Benchmark (LUBM)
– http://swat.cse.lehigh.edu/projects/lubm/
Source code repository
– https://github.com/mmeimaris/EvoGen
DIACHRON@EDBT paper
– http://ceur-ws.org/Vol-1558/paper9.pdf

Example of usage
User defines:
– 5 universities
– 10 versions
– 0.3% incremental change evenly distributed between versions
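
To make the arithmetic of this configuration concrete, the sketch below spreads a 0.3% total shift over 10 versions. It assumes that "evenly distributed" means nine equal steps, each measured against the first version, so every step adds 0.3/9 of a percentage point. The base size of 1,000,000 triples is a hypothetical round number rather than the real size of a 5-university LUBM dataset, and `per_version_targets` is a name invented for this example.

```python
def per_version_targets(base_size, total_shift_pct, versions):
    """Target size of each version when the total shift is spread in equal steps.

    Assumption: sizes grow linearly relative to the first version, so version k
    is the base size plus k equal shares of the total shift.
    """
    if versions < 2:
        return [base_size]
    step_pct = total_shift_pct / (versions - 1)   # share of the shift per transition
    return [round(base_size * (1 + k * step_pct / 100)) for k in range(versions)]


# 10 versions, 0.3% total shift, hypothetical base of 1,000,000 triples.
for number, size in enumerate(per_version_targets(1_000_000, 0.3, 10), start=1):
    print(number, size)
```

If the generator instead compounds each step on the previous version, the intermediate sizes would differ slightly from this linear reading.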

Thank you
Questions?
