Published by Rodney Robertson. Modified over 9 years ago.
1
CyberSecuritySoton.org @CybSecSoton
Provenance Analytics and Crowdsourcing
Trung Dong Huynh
Web and Internet Science Research Group
Cybercrime Workshop, 30 Jan 2014
2
PROVENANCE: A SHORT INTRODUCTION
3
Provenance in Fine Art and Antiques
Provenance of a painting:
- Sales receipts
- Auction and exhibition catalogues
- Gallery stickers
- Letters from artists
“establishing provenance is essentially a matter of documentation”
4
Provenance of Digital Objects
Goal: we aim to express
- how data was created and evolved
- who played what role in creating the data
- how the data was revised over time, and by whom
- what other data was used in the process
- which tool(s) were used to generate each version
Interchangeability is key!
5
Provenance: A Definition
Provenance is “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing” (PROV-DM – The PROV Data Model)
Prof Luc Moreau
6
The PROV Data Model
7
A PROV Example
  entity(isbn:0002261022, [prov:label="The Glass Palace"])
  entity(isbn:2020669595, [prov:label="Le Palais des Miroirs"])
  agent(AmitavGhosh)
  wasAttributedTo(isbn:0002261022, AmitavGhosh)
  activity(writingTheBook)
  wasAssociatedWith(writingTheBook, AmitavGhosh, -)
  wasGeneratedBy(isbn:0002261022, writingTheBook, -)
  agent(ChristianneBesse)
  wasAttributedTo(isbn:2020669595, ChristianneBesse, [prov:role='translator'])
  activity(translation)
  wasAssociatedWith(translation, ChristianneBesse, -)
  wasGeneratedBy(isbn:2020669595, translation, -)
  used(translation, isbn:0002261022, -)
The provenance of two books:
- “The Glass Palace”, written by Amitav Ghosh
- “Le Palais des Miroirs”, the French translation of Amitav Ghosh's book, done by Christianne Besse
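The statements above can also be read as a small set of typed relations. As a minimal sketch, assuming a plain `(relation, subject, object)` tuple encoding that is purely illustrative (it is not a PROV serialization, and the helper names `sources_of` and `responsible_agents` are hypothetical), the following shows how the French translation can be traced back to the original book:

```python
# Relations from the slide's example as (relation, subject, object) triples.
# The tuple encoding and helper names are illustrative assumptions.
RELATIONS = [
    ("wasAttributedTo", "isbn:0002261022", "AmitavGhosh"),
    ("wasAssociatedWith", "writingTheBook", "AmitavGhosh"),
    ("wasGeneratedBy", "isbn:0002261022", "writingTheBook"),
    ("wasAttributedTo", "isbn:2020669595", "ChristianneBesse"),
    ("wasAssociatedWith", "translation", "ChristianneBesse"),
    ("wasGeneratedBy", "isbn:2020669595", "translation"),
    ("used", "translation", "isbn:0002261022"),
]


def sources_of(entity, relations):
    """Entities that `entity` depends on, following wasGeneratedBy then used."""
    generated_by = {s: o for r, s, o in relations if r == "wasGeneratedBy"}
    found, stack = [], [entity]
    while stack:
        current = stack.pop()
        activity = generated_by.get(current)  # None if no generation recorded
        for r, s, o in relations:
            if r == "used" and s == activity and o not in found:
                found.append(o)
                stack.append(o)  # keep walking back through earlier sources
    return found


def responsible_agents(entity, relations):
    """Agents the entity is attributed to, per wasAttributedTo."""
    return [o for r, s, o in relations if r == "wasAttributedTo" and s == entity]


print(sources_of("isbn:2020669595", RELATIONS))          # ['isbn:0002261022']
print(responsible_agents("isbn:2020669595", RELATIONS))  # ['ChristianneBesse']
```

The walk only follows generation and usage records, so an entity with no recorded generating activity simply has no sources.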
8
Why Is Provenance Needed?
- Open Information Systems
  - Origin of data
- News and Media
  - Sources and references of news, blogs, etc.
- Science
  - How the results were obtained
  - Can they be reproduced?
- Manufacturing & business
  - Traceability of faults (e.g. suppliers, designers, contractors)
  - Certificates of origin
- Health
  - Traceability of medicine, lab test results, organs
- Policy and Law
  - Compliance
  - Privacy protection
9
COLLABMAP – A CROWDSOURCING APPLICATION
12
38,000 micro-tasks
160 contributors
5,151 buildings
13
Provenance in CollabMap
14
A Provenance Graph from CollabMap
16
Provenance Graphs as Networks
Benefits from network analytics:
- Network structure
- Extrapolations
- Sampling
- Similarity
- Compression
Network metrics:
- Number of nodes
- Number of edges
- Graph diameter
- Maximum finite distances (between each pair of node types – entities, activities, agents)
- Node degree distribution
- Densification exponent
- (and more)
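To make a few of these metrics concrete, here is a minimal sketch that computes node count, edge count, diameter, and degree distribution for the small two-book provenance example. It assumes the graph is treated as undirected when measuring distances and degrees; the slides do not say how the metrics were actually computed, and the function names are hypothetical.

```python
from collections import Counter, deque

# Edges of the two-book provenance example (source, target).
EDGES = [
    ("isbn:0002261022", "AmitavGhosh"),       # wasAttributedTo
    ("writingTheBook", "AmitavGhosh"),        # wasAssociatedWith
    ("isbn:0002261022", "writingTheBook"),    # wasGeneratedBy
    ("isbn:2020669595", "ChristianneBesse"),  # wasAttributedTo
    ("translation", "ChristianneBesse"),      # wasAssociatedWith
    ("isbn:2020669595", "translation"),       # wasGeneratedBy
    ("translation", "isbn:0002261022"),       # used
]


def undirected_adjacency(edges):
    """Build a neighbour map, ignoring edge direction (an assumption)."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    return adj


def bfs_distances(adj, start):
    """Shortest hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def graph_metrics(edges):
    adj = undirected_adjacency(edges)
    # Diameter: the longest finite shortest path between any two nodes.
    diameter = max(max(bfs_distances(adj, n).values()) for n in adj)
    degrees = Counter(len(nbs) for nbs in adj.values())
    return {
        "nodes": len(adj),
        "edges": len(set(edges)),
        "diameter": diameter,
        "degree_distribution": dict(degrees),
    }


print(graph_metrics(EDGES))
```

For this example the sketch reports 6 nodes, 7 edges, and a diameter of 3; four nodes have degree 2 and two have degree 3.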
17
Data Quality Assessment
Classifying the trustworthiness of CollabMap data:
- Calculate network metrics of CollabMap provenance graphs
- Supervised learning from user votes (on buildings, routes, route sets)
- Classification accuracy: over 95%
Strong correlation between network metrics of provenance graphs and data quality in CollabMap
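The slides do not name the learning algorithm used. As a rough sketch of the general idea only, the following trains a nearest-centroid classifier on metric vectors. The classifier choice, the labels, and every number in `TRAINING` are made-up placeholders for illustration; none of them are CollabMap data or results.

```python
import math

# Made-up placeholder vectors: (number of nodes, number of edges, diameter).
# These are NOT CollabMap measurements; labels stand in for user votes.
TRAINING = [
    ((12.0, 20.0, 4.0), "trusted"),
    ((14.0, 24.0, 5.0), "trusted"),
    ((3.0, 2.0, 1.0), "uncertain"),
    ((4.0, 4.0, 2.0), "uncertain"),
]


def centroids(samples):
    """Average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def classify(features, model):
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(features, model[label]))


model = centroids(TRAINING)
print(classify((11.0, 18.0, 4.0), model))  # trusted
print(classify((2.0, 1.0, 1.0), model))    # uncertain
```

In practice the features would be the network metrics of each provenance graph, and a held-out set of voted items would be needed to estimate accuracy.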
18
Potential Applications
- Auditing: confirming that provenance was properly recorded
- On-the-fly classifications
- Detection of abnormality
- Community detection
- Identifying key actors and key links
- Inferring missing links, edge directions, edge types
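As one illustration of the auditing idea, the sketch below flags records that look incomplete. The two rules it checks (every entity should have a `wasGeneratedBy` record, every activity a `wasAssociatedWith` record) are assumptions chosen for the example, not requirements stated in the slides, and the `audit` function is hypothetical.

```python
# A deliberately incomplete record: the translation activity has no agent.
ENTITIES = ["isbn:0002261022", "isbn:2020669595"]
ACTIVITIES = ["writingTheBook", "translation"]
RECORDED = [
    ("wasAssociatedWith", "writingTheBook", "AmitavGhosh"),
    ("wasGeneratedBy", "isbn:0002261022", "writingTheBook"),
    ("wasGeneratedBy", "isbn:2020669595", "translation"),
    ("used", "translation", "isbn:0002261022"),
]


def audit(entities, activities, relations):
    """Return human-readable issues under the two assumed rules."""
    generated = {s for r, s, _ in relations if r == "wasGeneratedBy"}
    associated = {s for r, s, _ in relations if r == "wasAssociatedWith"}
    issues = [
        f"entity {e} has no wasGeneratedBy record"
        for e in entities
        if e not in generated
    ]
    issues += [
        f"activity {a} has no wasAssociatedWith record"
        for a in activities
        if a not in associated
    ]
    return issues


for issue in audit(ENTITIES, ACTIVITIES, RECORDED):
    print(issue)  # activity translation has no wasAssociatedWith record
```

An empty result means the record passes these particular checks, not that the provenance is complete in any stronger sense.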
19
Conclusions
- Provenance analytics is new and unexplored
- Promising novel applications
- We want to test the approach on new data and applications
20
Contact Details
Trung Dong Huynh
tdh@ecs.soton.ac.uk
about.me/dong.huynh