Linked Data Profiling Andrejs Abele National University of Ireland, Galway Supervisor: Paul Buitelaar.

Overview
- Terminology
- Motivation
- My approach
- Evaluation
- Conclusion
- Future work

Terminology
- Linked Data is about using the Web to connect related data that was not previously linked.
- Resource Description Framework (RDF) data is represented by sets of subject-predicate-object triples, where the elements may be URIs or literals. Example: https://www.insight-centre.org/users/andrejs-ābele foaf:name "Andrejs Ābele"
- Linked Open Data Cloud is a collection of Linked Data resources that are open and freely available.
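As a minimal sketch of the triple model described above, the following plain-Python snippet stores the slide's example as a subject-predicate-object tuple. The `Triple` type and the `is_uri` helper are illustrative names introduced here, not part of any RDF library; the expanded namespace is the standard one behind the `foaf:` prefix.

```python
from typing import NamedTuple

# Standard namespace URI behind the foaf: prefix.
FOAF = "http://xmlns.com/foaf/0.1/"


class Triple(NamedTuple):
    """Illustrative container for one RDF statement (hypothetical type)."""
    subject: str
    predicate: str
    obj: str  # either a URI or a literal value


def is_uri(term: str) -> bool:
    # Crude check for illustration only: treat http(s) strings as URIs.
    return term.startswith(("http://", "https://"))


# The example triple from the Terminology slide.
example = Triple(
    subject="https://www.insight-centre.org/users/andrejs-ābele",
    predicate=FOAF + "name",
    obj="Andrejs Ābele",
)
```

Here the subject and predicate are URIs, while the object is a literal.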

Linked Open Data Cloud Diagram
- Publications
- Life Sciences
- Cross-Domain
- Social Networking
- Geographic
- Government
- Media
- User-Generated Content
- Linguistics

Motivation
- Linked Data is hard for humans to understand
- Only a small number of datasets provide a human-readable overview or comprehensive metadata
- When adding a new dataset to the LOD cloud, connections have to be identified to as many other relevant LOD datasets as possible
- The LOD Cloud Diagram relies on human classification

Existing solutions for LD profiling
- Loupe [1]
- ProLOD++ [2]
- LOD Laundromat [3]
- LODStats [4]
- Aether [5]
- RDF-stats [6]

[1] http://demo.seco.tkk.fi/aether/#/
[2] https://www.hpi.uni-potsdam.de/naumann/sites/prolod++/#
[3] http://lodlaundromat.org/
[4] http://stats.lod2.eu/
[5] http://demo.seco.tkk.fi/aether/#/
[6] http://rdfstats.sourceforge.net/

Domain identification method using DBpedia
[Diagram: Topic Extraction -> Domain Identification -> Domain]

Example
Input: Bio2RDF-sgd
Description: The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
1. Most frequent terms (sgd_vocabulary, query, proper, phenotype, experiment)
2. Literal containing one of the terms ("protein [sgd_vocabulary:protein]@en")
3. Identify DBpedia concept (http://dbpedia.org/resource/Protein)
4. Identify category (http://dbpedia.org/resource/Category:Molecular_biology)
5. Identify the domain under which the category fits best (Biology => Life Sciences)
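The five steps of the example can be sketched roughly as follows. This is not the author's implementation: the three lookup tables are hypothetical stand-ins for queries against DBpedia, the frequent terms are assumed to come from tokenized literals, and the category-to-domain step is reduced to a direct mapping. All function names are introduced here for illustration.

```python
import re
from collections import Counter

# Hypothetical lookup tables standing in for DBpedia lookups (assumptions).
CONCEPT_FOR_LABEL = {"protein": "http://dbpedia.org/resource/Protein"}
CATEGORY_FOR_CONCEPT = {
    "http://dbpedia.org/resource/Protein":
        "http://dbpedia.org/resource/Category:Molecular_biology",
}
DOMAIN_FOR_CATEGORY = {
    "http://dbpedia.org/resource/Category:Molecular_biology": "Life Sciences",
}


def frequent_terms(literals, top_n=5):
    # Step 1: most frequent terms (assumed here to be tokens of the literals).
    words = []
    for literal in literals:
        words.extend(re.findall(r"[a-z_]+", literal.lower()))
    return [word for word, _ in Counter(words).most_common(top_n)]


def literals_with_terms(literals, terms):
    # Step 2: keep literals that contain at least one frequent term.
    return [lit for lit in literals if any(t in lit.lower() for t in terms)]


def concept_for_literal(literal):
    # Step 3: use the text before "[" as the label to look up a concept.
    label = literal.split("[")[0].strip().lower()
    return CONCEPT_FOR_LABEL.get(label)


def identify_domain(literals):
    # Steps 4 and 5: concept -> category -> domain, then a majority vote.
    votes = Counter()
    terms = frequent_terms(literals)
    for literal in literals_with_terms(literals, terms):
        category = CATEGORY_FOR_CONCEPT.get(concept_for_literal(literal))
        domain = DOMAIN_FOR_CATEGORY.get(category)
        if domain is not None:
            votes[domain] += 1
    return votes.most_common(1)[0][0] if votes else None


sample_literals = [
    "protein [sgd_vocabulary:protein]@en",
    "gene [sgd_vocabulary:gene]@en",
]
domain = identify_domain(sample_literals)
```

With the sample literals above, only the "protein" label resolves through the stub tables, so the single vote goes to Life Sciences. A dataset without literals yields no votes at all, which matches the limitation stated in the conclusion.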

Datasets
LOD cloud datasets (annotated in the LOD Cloud Diagram): 405 datasets, 9 domains
- Media (13)
- Linguistics (34)
- Publications (111)
- Social Networking (41)
- Geography (29)
- Government (65)
- Cross Domain (25)
- User Generated (52)
- Life Sciences (35)
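The per-domain counts are uneven, which is worth keeping in mind when reading classification results. As a rough sketch that is not part of the original slides, the snippet below checks that the counts add up to the stated total and computes the share of the largest domain, that is, the accuracy a classifier would reach by always predicting that domain.

```python
# Dataset counts per domain, copied from the slide.
counts = {
    "Media": 13,
    "Linguistics": 34,
    "Publications": 111,
    "Social Networking": 41,
    "Geography": 29,
    "Government": 65,
    "Cross Domain": 25,
    "User Generated": 52,
    "Life Sciences": 35,
}

total = sum(counts.values())

# Largest domain and its share of all datasets (majority-class baseline).
majority_domain = max(counts, key=counts.get)
majority_share = counts[majority_domain] / total

print(f"{total} datasets, largest domain: {majority_domain} "
      f"({majority_share:.1%})")
```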

Baseline approach
1. Extract URIs of properties and classes from datasets
2. Use classes and properties as features
3. Classify using a Support Vector Machine classifier
4. Use Precision and Recall as metrics

Extended baseline
- Enrich the data with human-annotated tags from Linked Open Vocabularies [1]

[1] http://lov.okfn.org/dataset/lov/
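Steps 2 and 4 of the baseline can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: each dataset is assumed to be already reduced to a list of class and property URIs, features are binary presence indicators, and the SVM itself (step 3) is left to a library of choice and not shown. The function names and the toy labels are hypothetical.

```python
def build_vocabulary(datasets):
    # One feature per distinct class/property URI seen in any dataset.
    return sorted({uri for uris in datasets for uri in uris})


def to_vector(uris, vocabulary):
    # Binary feature vector: 1 if the dataset uses the URI, else 0.
    present = set(uris)
    return [1 if uri in present else 0 for uri in vocabulary]


def precision_recall(gold, predicted, domain):
    # Per-domain precision and recall from gold and predicted labels.
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == domain and p == domain)
    fp = sum(1 for g, p in pairs if g != domain and p == domain)
    fn = sum(1 for g, p in pairs if g == domain and p != domain)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy input: two datasets described by the URIs they use.
datasets = [
    ["http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/name"],
    ["http://purl.org/dc/terms/title"],
]
vocabulary = build_vocabulary(datasets)
vectors = [to_vector(uris, vocabulary) for uris in datasets]

# Toy labels to exercise the metric (not results from the presentation).
gold = ["Media", "Media", "Government", "Media"]
predicted = ["Media", "Government", "Government", "Media"]
media_precision, media_recall = precision_recall(gold, predicted, "Media")
```

The vectors produced this way are what a classifier would be trained on; the extended baseline would add further features for the human-annotated tags.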

Precision and Recall for different domains using SVM

Correctly Classified Instances

Conclusion
- Does not require training
- Works with new and customized vocabularies
- Works only if datasets contain literals
- Cannot identify User-Generated Content and Cross-Domain
- Using only classes and properties, it is hard to improve results above 75%

Future Work
- Evaluate alternative classification algorithms
- Use literals and URIs for classification
- Classify datasets into more specific subdomains

Thank you!
