Provenance Analytics and Crowdsourcing. Trung Dong Huynh, Web and Internet Science Research Group. Cybercrime Workshop.


1 Provenance Analytics and Crowdsourcing. Trung Dong Huynh, Web and Internet Science Research Group. Cybercrime Workshop, 30 Jan 2014. CyberSecuritySoton.org @CybSecSoton

2 PROVENANCE: A SHORT INTRODUCTION

3 Provenance in Fine Art and Antiques
Provenance of a painting:
- Sales receipts
- Auction and exhibition catalogues
- Gallery stickers
- Letters from artists
“establishing provenance is essentially a matter of documentation”

4 Provenance of Digital Objects
Goal: we aim to express how data was created and how it evolved:
- who played what role in creating the data
- how the data was revised over time, and by whom
- what other data was used in the process
- which tool(s) were used to generate each version
Interchangeability is key!

5 Provenance: A Definition
Provenance is “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing” (PROV-DM, The PROV Data Model)
Prof Luc Moreau

6 The PROV Data Model

7 A PROV Example
The provenance of two books:
- “The Glass Palace”, written by Amitav Ghosh
- “Le Palais des Miroirs”, the French translation of Amitav Ghosh's book, done by Christianne Besse

    entity(isbn:0002261022, [prov:label="The Glass Palace"])
    entity(isbn:2020669595, [prov:label="Le Palais des Miroirs"])

    agent(AmitavGhosh)
    wasAttributedTo(isbn:0002261022, AmitavGhosh)
    activity(writingTheBook)
    wasAssociatedWith(writingTheBook, AmitavGhosh, -)
    wasGeneratedBy(isbn:0002261022, writingTheBook, -)

    agent(ChristianneBesse)
    wasAttributedTo(isbn:2020669595, ChristianneBesse, [prov:role='translator'])
    activity(translation)
    wasAssociatedWith(translation, ChristianneBesse, -)
    wasGeneratedBy(isbn:2020669595, translation, -)
    used(translation, isbn:0002261022, -)
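To make the example concrete, the relations above can be loaded into a small graph and followed backwards from the translation to everything it depends on. This is only a sketch using plain Python data structures, not a PROV library; the triple layout and the helper names `build_graph` and `trace` are assumptions introduced for illustration, and the optional arguments and attributes of each statement are dropped.

```python
from collections import defaultdict

# The relations from the slide, reduced to (relation, subject, object) triples.
# Attributes and the optional "-" arguments are omitted in this sketch.
statements = [
    ("wasAttributedTo", "isbn:0002261022", "AmitavGhosh"),
    ("wasAssociatedWith", "writingTheBook", "AmitavGhosh"),
    ("wasGeneratedBy", "isbn:0002261022", "writingTheBook"),
    ("wasAttributedTo", "isbn:2020669595", "ChristianneBesse"),
    ("wasAssociatedWith", "translation", "ChristianneBesse"),
    ("wasGeneratedBy", "isbn:2020669595", "translation"),
    ("used", "translation", "isbn:0002261022"),
]


def build_graph(stmts):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for relation, subject, obj in stmts:
        graph[subject].append((relation, obj))
    return graph


def trace(graph, start):
    """Return every node reachable from `start` by following the edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(statements)
lineage = trace(graph, "isbn:2020669595")
print(sorted(lineage))
```

Tracing from the French translation reaches the translator, the translation activity, the original book, its writing activity and its author, which is the chain of dependencies the slide describes in words.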

8 Why Is Provenance Needed?
Open Information Systems:
- Origin of data
News and Media:
- Sources and references of news, blogs, etc.
Science:
- How the results were obtained
- Whether they can be reproduced
Manufacturing & business:
- Traceability of faults (e.g. suppliers, designers, contractors)
- Certificates of origin
Health:
- Traceability of medicine, lab test results, organs
Policy and Law:
- Compliance
- Privacy protection

9 COLLABMAP – A CROWDSOURCING APPLICATION


12 38,000 micro-tasks, 160 contributors, 5,151 buildings

13 Provenance in CollabMap

14 A Provenance Graph from CollabMap

16 Provenance Graphs as Networks
Benefits from network analytics:
- Network structure
- Extrapolations
- Sampling
- Similarity
- Compression
Network metrics:
- Number of nodes
- Number of edges
- Graph diameter
- Maximum finite distances (between each pair of node types: entities, activities, agents)
- Node degree distribution
- Densification exponent
- (and more)
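A few of the metrics listed above can be computed with nothing more than a breadth-first search. The sketch below is an illustration only: it reuses the edges of the two-book PROV example rather than CollabMap data, treats edges as undirected when measuring distances (an assumption, since the slides do not say how direction is handled), and the function name `network_metrics` is hypothetical.

```python
from collections import Counter, deque

# Edges of the two-book PROV example, as (source, target) pairs.
edges = [
    ("isbn:0002261022", "AmitavGhosh"),
    ("writingTheBook", "AmitavGhosh"),
    ("isbn:0002261022", "writingTheBook"),
    ("isbn:2020669595", "ChristianneBesse"),
    ("translation", "ChristianneBesse"),
    ("isbn:2020669595", "translation"),
    ("translation", "isbn:0002261022"),
]


def network_metrics(edge_list):
    """Compute node count, edge count, diameter and degree distribution.

    Distances ignore edge direction (an assumption of this sketch).
    """
    nodes = {n for edge in edge_list for n in edge}
    neighbours = {n: set() for n in nodes}
    for source, target in edge_list:
        neighbours[source].add(target)
        neighbours[target].add(source)

    def distances_from(start):
        # Breadth-first search gives shortest hop counts to reachable nodes.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    # Diameter: the longest finite shortest path between any two nodes.
    diameter = max(max(distances_from(n).values()) for n in nodes)
    degree_distribution = Counter(len(neighbours[n]) for n in nodes)
    return {
        "nodes": len(nodes),
        "edges": len(edge_list),
        "diameter": diameter,
        "degree_distribution": dict(degree_distribution),
    }


metrics = network_metrics(edges)
print(metrics)
```

Restricting `distances_from` to start and end nodes of particular types (entity, activity, agent) would give the per-type maximum finite distances mentioned on the slide; that needs node type labels, which this sketch does not carry.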

17 Data Quality Assessment
Classifying the trustworthiness of CollabMap data:
- Calculate network metrics of CollabMap provenance graphs
- Supervised learning from user votes (on buildings, routes, route sets)
- Classification accuracy: over 95%
Strong correlation between network metrics of provenance graph and data quality in CollabMap
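The pipeline on this slide (metrics in, trust label out) can be illustrated with a very small classifier. The slides do not name the learning algorithm or the feature set, so the sketch below swaps in a nearest-centroid classifier purely for illustration. The feature vectors, the labels `trusted` and `uncertain`, and the helper names are all hypothetical toy values, not CollabMap data or results.

```python
from math import dist

# HYPOTHETICAL toy training set: (nodes, edges, diameter) of a provenance
# graph, paired with a label that would come from user votes.
training = [
    ((12, 15, 4), "trusted"),
    ((14, 18, 5), "trusted"),
    ((4, 3, 2), "uncertain"),
    ((5, 4, 2), "uncertain"),
]


def centroids(samples):
    """Average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def classify(features, model):
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(features, model[label]))


model = centroids(training)
prediction = classify((11, 14, 4), model)
print(prediction)
```

In practice the features would be the network metrics computed from each provenance graph, and accuracy would be measured on held-out votes; none of that evaluation is reproduced here.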

18 Potential Applications
- Auditing: confirm that provenance was properly recorded
- On-the-fly classification
- Detection of abnormalities
- Community detection
- Identifying key actors and key links
- Inferring missing links:
  - edge directions
  - edge types

19 Conclusions
- Provenance analytics is new and unexplored
- Promising novel applications
- We want to test the approach on new data and applications

20 Contact Details
Trung Dong Huynh
tdh@ecs.soton.ac.uk
about.me/dong.huynh

