EvoGen: a Generator for Synthetic Versioned RDF Marios Meimaris Institute for the Management of Information Systems Research Center “Athena”


1 EvoGen: a Generator for Synthetic Versioned RDF Marios Meimaris Institute for the Management of Information Systems Research Center “Athena” m.meimaris@imis.athena-innovation.gr 1 Meimaris@DIACHRON2016

2 Data Web
Evolving
– Dynamic communities
– Fast-paced environments
– Open-world data

3 Problem Tackled
Synthetic data widely used for benchmarking
– Storage
– Querying
– Processing
Lack of tools and benchmarks for evolving RDF
– Versioning Systems
– Evolution Management Systems
– Change Detection
– insert yours here…

4 Requirements
Meaningful data generation
– Synthetic data generation abstraction
– Identification of characteristics
Configurability
– Definition of parameters based on characteristics
Benchmark workload
Community engagement

5 Parameters
We define three non-exhaustive, non-mutually exclusive parameters to drive the generation process
– Shift
– Monotonicity
– Strictness

6 Parameters

7 Parameters

8 Parameters

9 Lehigh University Benchmark
Widely used synthetic data generator
Creates universities that contain departments with students, professors, courses, etc.
Configurable number of universities and starting index
Configurable serialization and representation model (RDF/XML in .owl files, DAML)
Widely adopted by the data engineering and semantic web community

10 Lehigh University Benchmark
http://blog.andric.name/wp-content/uploads/2013/06/univ-bench.owl_.png

11 Our system
A generator for synthetic evolving RDF data
– Based on existing LUBM generator
– Extends LUBM to create evolving versions of original data
– Tailors creation process based on user-defined parameters
– This version: monotonic shifts
– Next version: configurable strictness %
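The transcript does not define "monotonic", so the sketch below rests on an assumption: a sequence of versions is treated as monotonic when its steps only ever add triples or only ever remove them, never both. The function name and the `ex:` sample triples are hypothetical and are not taken from EvoGen.

```python
def is_monotonic(versions):
    """Return True if the version sequence never mixes additions and deletions.

    Each version is a set of (subject, predicate, object) tuples.
    Assumed definition: changes are only insertions or only removals.
    """
    grew = shrank = False
    for old, new in zip(versions, versions[1:]):
        if new - old:   # some triples were added in this step
            grew = True
        if old - new:   # some triples were removed in this step
            shrank = True
    return not (grew and shrank)


# Hypothetical sample data: two students joining over three versions.
t1 = ("ex:Student0", "ex:memberOf", "ex:Department0")
t2 = ("ex:Student1", "ex:memberOf", "ex:Department0")
t3 = ("ex:Student2", "ex:memberOf", "ex:Department0")

growing = [{t1}, {t1, t2}, {t1, t2, t3}]
mixed = [{t1, t2}, {t1, t3}]
```

With these inputs, `is_monotonic(growing)` holds and `is_monotonic(mixed)` does not, because the second sequence both drops and adds a triple.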

12 Our system
Configurable parameters
– # of universities
– # of consecutive versions
– shift (double precision, w.r.t. first-version dataset)
  Shift is distributed evenly among versions
  All dataset classes are generated based on weight factors
– serialization mode (full vs diffs)
Next version
– Strictness as % of Characteristic Sets generated from LUBM, spread over versions
– Custom query workload
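The two serialization modes can be illustrated with plain sets of triples. This is a minimal sketch under assumptions: "full" is taken to mean every version is stored completely, and "diffs" is taken to mean the first version plus per-step added and deleted triples. The function names are hypothetical and do not reflect EvoGen's actual output format.

```python
Triple = tuple[str, str, str]


def diff(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, deleted) triples between two consecutive versions."""
    return new - old, old - new


def to_diffs(versions: list[set[Triple]]):
    """Keep the first version in full and describe each later one as a delta."""
    deltas = [diff(a, b) for a, b in zip(versions, versions[1:])]
    return versions[0], deltas


def from_diffs(base: set[Triple], deltas) -> list[set[Triple]]:
    """Rebuild the full versions by replaying the deltas on the base version."""
    versions = [set(base)]
    for added, deleted in deltas:
        versions.append((versions[-1] - deleted) | added)
    return versions


# Hypothetical sample data.
a = ("ex:Course0", "ex:taughtBy", "ex:Professor0")
b = ("ex:Course1", "ex:taughtBy", "ex:Professor0")
c = ("ex:Course2", "ex:taughtBy", "ex:Professor1")
full = [{a}, {a, b}, {b, c}]

base, deltas = to_diffs(full)
rebuilt = from_diffs(base, deltas)
```

Replaying the deltas reproduces the full versions, so the two modes carry the same information and differ only in storage cost and in how much work a consumer does to materialize a given version.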

13 Resulting Data
Based on Lehigh University Benchmark (LUBM)
User defines:
– shift as a positive or negative percentage
– number of versions to be created
LUBM schema classes are given weights based on their contribution to the dataset’s size
Shift percentage is distributed to all LUBM classes based on their weights and the defined shift
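One way to read the weighting step: each class's weight is its share of the dataset, and the total change implied by the shift is split across classes in proportion to those shares. The sketch below assumes exactly that and uses largest-remainder rounding so the per-class changes sum to the total. The class counts are invented for illustration, and the function names are hypothetical rather than EvoGen's.

```python
def class_weights(counts):
    """Weight of each class = its share of all instances in the dataset."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def allocate_shift(counts, shift):
    """Split the total change (shift * dataset size) across classes by weight.

    shift is a signed fraction, e.g. 0.1 for +10% or -0.05 for -5%.
    Integer arithmetic plus largest-remainder rounding keeps the sum exact.
    """
    total = sum(counts.values())
    delta = round(total * shift)
    sign = -1 if delta < 0 else 1
    magnitude = abs(delta)
    # Exact quotient and remainder of each class's proportional share.
    quotas = {cls: divmod(magnitude * n, total) for cls, n in counts.items()}
    alloc = {cls: q for cls, (q, _) in quotas.items()}
    leftover = magnitude - sum(alloc.values())
    # Hand the remaining units to the classes with the largest remainders.
    by_remainder = sorted(quotas, key=lambda cls: quotas[cls][1], reverse=True)
    for cls in by_remainder[:leftover]:
        alloc[cls] += 1
    return {cls: sign * n for cls, n in alloc.items()}


# Invented instance counts, not real LUBM statistics.
counts = {"Student": 700, "Course": 200, "Professor": 100}
growth = allocate_shift(counts, 0.1)
shrink = allocate_shift(counts, -0.05)
```

Here a +10% shift on 1,000 instances yields 100 new instances split 70/20/10, matching the 0.7/0.2/0.1 weights, and a negative shift produces negative per-class changes in the same proportions.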

14 System Architecture

15 Evaluation of Shift Parameter
Measure achieved shift w.r.t. desired shift for an increasing number of universities
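A comparison of this kind can be sketched as follows, assuming shift is the relative change in dataset size between the first and the last version (consistent with "w.r.t. first-version dataset", though the exact formula is not shown in this transcript). Sizes are triple counts, and the numbers in the usage lines are made up.

```python
def achieved_shift(first_size: int, last_size: int) -> float:
    """Relative size change of the last version with respect to the first."""
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    return (last_size - first_size) / first_size


def shift_error(desired: float, first_size: int, last_size: int) -> float:
    """Signed gap between achieved and desired shift (0.0 means an exact hit)."""
    return achieved_shift(first_size, last_size) - desired


# Made-up sizes: the generator was asked for +30% and produced +29%.
exact = achieved_shift(1000, 1300)
gap = shift_error(0.3, 1000, 1290)
```

Running this per configuration (for example, per number of universities) gives one error value per run, which is the quantity an evaluation plot of achieved versus desired shift would show.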

16 Further resources
Lehigh University Benchmark (LUBM)
– http://swat.cse.lehigh.edu/projects/lubm/
Source code repository
– https://github.com/mmeimaris/EvoGen
DIACHRON@EDBT paper
– http://ceur-ws.org/Vol-1558/paper9.pdf

17 Example of usage
User defines:
– 5 universities
– 10 versions
– 0.3% incremental change evenly distributed between versions
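To make "evenly distributed between versions" concrete, the sketch below computes a target size for each version, assuming linear growth relative to the first version so that the last version reaches the full shift. It reads 0.3% literally as the fraction 0.003, which is an assumption about the input scale, and the base size of one million triples is a placeholder rather than the real size of a 5-university LUBM dataset. `version_sizes` is a hypothetical helper, not part of EvoGen.

```python
def version_sizes(base_size: int, shift: float, versions: int) -> list[int]:
    """Target size of each version when the total shift is spread in equal steps.

    base_size: size of the first version (e.g. number of triples)
    shift:     total signed change as a fraction of base_size
    versions:  number of versions, including the first
    """
    if versions < 1:
        raise ValueError("versions must be at least 1")
    if versions == 1:
        return [base_size]
    step = shift / (versions - 1)  # equal share of the shift per step
    return [round(base_size * (1 + step * i)) for i in range(versions)]


# Placeholder base size; 10 versions and 0.3% total change as on the slide.
sizes = version_sizes(base_size=1_000_000, shift=0.003, versions=10)
```

Under these assumptions the first entry equals the base size, the last equals the base size plus 0.3%, and consecutive entries differ by roughly the same amount.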

18 Thank you
Questions?
