KAnOE: Research Centre for Knowledge Analytics and Ontological Engineering
Managing Semantic Data
  NACLIN-2014, 10 Dec 2014
  Dr. Kavi Mahesh, Dean of Research, PES University
  1/25/2016 (c) Dr. Kavi Mahesh; Do not copy or distribute
Managing Semantic Data in Research Data Services
  Trend: publish research data along with the paper
  Digital library of research data
  How do we manage this data?
  E.g., our research requires:
    Several terabytes of data
    5 billion data elements so far
Inverting the Publication Model
  Past:
    Description of research results in English
    Show samples of data
    “Results, Discussion, Conclusion” framework
  Present:
    Publish article and entire dataset
    No links between article and data
The Inverted Publication Model
  Future: inverted model
    Publish self-contained data
    Publish data analytics
    Annotate the data with English descriptions where needed
    Rich linkage between datasets
    Web of linked data…
Illustration of Publishing Data
Self-Contained Dataset
  Requirements:
    Have a proper and consistent structure;
    Define each element both syntactically and semantically;
    Specify all the semantic constraints on permissible data values, their types and cardinalities; and
    Specify data provenance, etc.
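The requirements above can be sketched as a small schema check. This is only an illustration under stated assumptions: the `FieldSpec` class, the `validate` function, and every field name in `SCHEMA` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FieldSpec:
    """Syntactic and semantic definition of one data element (hypothetical)."""
    name: str
    meaning: str                      # human-readable semantics of the element
    dtype: type                       # permissible value type
    min_count: int = 1                # cardinality lower bound
    max_count: int = 1                # cardinality upper bound
    check: Callable[[Any], bool] = lambda v: True  # constraint on values


def validate(record: dict, schema: list) -> list:
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    for spec in schema:
        values = record.get(spec.name, [])
        if not isinstance(values, list):
            values = [values]
        if not spec.min_count <= len(values) <= spec.max_count:
            errors.append(
                f"{spec.name}: expected {spec.min_count}..{spec.max_count} "
                f"values, got {len(values)}"
            )
        for v in values:
            if not isinstance(v, spec.dtype):
                errors.append(f"{spec.name}: {v!r} is not {spec.dtype.__name__}")
            elif not spec.check(v):
                errors.append(f"{spec.name}: {v!r} violates its constraint")
    return errors


# Hypothetical schema: structure, semantics, constraints, and provenance.
SCHEMA = [
    FieldSpec("specimen_id", "unique identifier of the specimen", str),
    FieldSpec("temperature_c", "temperature in degrees Celsius", float,
              check=lambda v: v >= -273.15),
    FieldSpec("source", "provenance: where the value was recorded", str),
    FieldSpec("location", "collection sites", str, min_count=0, max_count=3),
]

good = {"specimen_id": "S-001", "temperature_c": 21.5, "source": "lab notebook"}
bad = {"specimen_id": ["S-1", "S-2"], "temperature_c": -300.0}
print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

In a real service these definitions would more likely live in an ontology or a schema language than in application code; the sketch only shows what "types, constraints and cardinalities" means operationally.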
Ontology of Research Data
  In other words, an ontology of research data
  Where is the “Dublin Core” of research data?
  E.g., CERIF
Why Semantic Data Management?
  Epistemology of science:
    Verifying research results
    Making sense of someone else’s data
    Documenting the usage scenario of data
Ontology-based multi-domain metadata for research data management using triple stores
  Authors:
    João Rocha da Silva, Universidade do Porto/INESC TEC, Portugal
    Cristina Ribeiro, DEI, Universidade do Porto/INESC TEC, Portugal
    João Correia Lopes, DEI, Universidade do Porto/INESC TEC, Portugal
Data on the Web: 5-Star Rating System
  * Data on the Web: e.g., data published as a set of scanned images
  ** Machine-Readable Data: e.g., data published as a spreadsheet
  *** Non-Proprietary Format: e.g., data published as a CSV file
  **** RDF Data: e.g., a drug database published in RDF
  ***** Linked RDF Data: links to other people’s data are included, e.g., the DBpedia dataset extracted from Wikipedia
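The step from three stars (CSV) to four stars (RDF) can be sketched with the Python standard library alone. The `http://example.org/` namespace, the column names, and the sample row are assumptions made for illustration; a real conversion would reuse an agreed vocabulary rather than minting one predicate per column.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the talk


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, id_column: str) -> list:
    """Turn each CSV cell into one N-Triples line keyed by the row identifier."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}drug/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the identifier becomes the subject; skip empty cells
            predicate = f"<{BASE}prop/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


sample = "id,name,form\nD1,Aspirin,tablet\n"  # illustrative row only
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Reaching the fifth star would mean adding triples whose objects are URIs in someone else's dataset, for example a DBpedia resource for the same drug.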
Linked Open Data: Principles
  Use URIs as names of things. E.g., mention an author by URI, not just by name.
  Use HTTP URIs so that people can look up those names.
  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  Include links to other URIs, so people can discover more things.
  Sir Tim Berners-Lee
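The four principles can be illustrated with a tiny in-memory stand-in for a linked data server. Everything under `http://example.org/` is hypothetical, and no network access takes place; the Dublin Core and FOAF property URIs are real vocabularies used here only as examples.

```python
EX = "http://example.org/"            # hypothetical namespace
DCT = "http://purl.org/dc/terms/"     # Dublin Core terms
FOAF = "http://xmlns.com/foaf/0.1/"   # FOAF vocabulary

# Principles 1 and 2: things are named with HTTP URIs, not bare strings.
TRIPLES = [
    (EX + "dataset/42", DCT + "title", "Example dataset"),
    (EX + "dataset/42", DCT + "creator", EX + "person/ram"),
    (EX + "person/ram", FOAF + "name", "Ram"),
    (EX + "person/ram", FOAF + "based_near",
     "http://dbpedia.org/resource/Bangalore"),
]


def is_http_uri(value) -> bool:
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def describe(uri: str) -> list:
    """Principle 3: looking up a URI returns useful statements about it."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def links(uri: str) -> list:
    """Principle 4: objects that are URIs let a client discover more things."""
    return [o for _, o in describe(uri) if is_http_uri(o)]


print(describe(EX + "dataset/42"))
print(links(EX + "person/ram"))
```

Here the dataset names its creator by URI, so a client can follow that link to the person and from there to a resource in another dataset.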
Linked Open Research Data Services
  Requirements:
    Uniquely identify all entities used in datasets such as experiments, specimens, locations, organizations, etc.;
    Interlink parts of datasets with precise parts of an article in both directions;
    Classify datasets using a suitable universal classification scheme;
    Cite other datasets, i.e., refer to them through links;
    Manage multiple versions and revisions of datasets; and
    Incorporate a suitable controlled vocabulary or ontology.
Architecture of Digital Library of Data
An Ontology for Research Data
Concluding Remarks
  Publishing and citing research data will be a common practice
  Digital libraries need to manage research data
  Data needs to be self-contained, therefore semantic
  Linked open data is promising
  We need a proper ontology of research data
  Keyword search may be good enough for documents, but not for datasets
Questions? Thank you!
How?
  By applying natural language generation techniques to the structure and semantics of linked open datasets and their underlying ontologies.
Input Triples
  Subject, Predicate, Object
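A subject-predicate-object triple, and one very simple way to turn it into an English sentence, might look as follows. The fixed sentence template is an assumption chosen for brevity and is not the generation method used in the work presented; the URIs under `http://example.org/` and the literal values are hypothetical.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def local_name(term: str) -> str:
    """Last path or fragment segment of a URI; plain literals pass through."""
    if not term.startswith(("http://", "https://")):
        return term
    return re.split(r"[/#]", term.rstrip("/#"))[-1]


def humanize(name: str) -> str:
    """Split camelCase and underscores into separate words."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).replace("_", " ")


def verbalize(t: Triple) -> str:
    """Fill a fixed 'The P of S is O.' template (a deliberately naive choice)."""
    subject = humanize(local_name(t.subject))
    predicate = humanize(local_name(t.predicate)).lower()
    obj = humanize(local_name(t.object))
    return f"The {predicate} of {subject} is {obj}."


triples = [
    Triple("http://example.org/Ram", "http://example.org/location",
           "http://dbpedia.org/resource/Bangalore"),
    Triple("http://example.org/Ram", "http://example.org/dateOfBirth",
           "1990-01-01"),
]
for t in triples:
    print(verbalize(t))
```

A template like this ignores the ontology entirely; using class and property definitions to order and aggregate sentences is where discourse structuring comes in.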
Ontology for Discourse Structuring
Classes
Subclasses
Individuals
Ontology as a Chart
A few snapshots of the “MECHANISM” ontology in the Protégé software are shown:
The 12 subgroups:
The functions accommodated under every subgroup:
Hierarchy of mechanisms:
Object & data properties being added to each mechanism:
Subclasses and their Descriptions
Object properties
Data properties added
Linked Open Data Tools
  Pallavi Karanth
  ©KAnOE, PES Institute of Technology
Data
Web for Data Discovery
Machine Understandable Data
  Diagram labels: Ram, Nickname, DOB, Location, Bangalore
Open Data and Linked Data
  Open Data: open access
  Linked Data: semantic, machine readable
Linked Open Data
  Five Star Rating of Linked Data by Tim Berners-Lee
    ★ make your stuff available on the Web (whatever format) under an open license
    ★★ make it available as structured data (e.g., Excel instead of image scan of a table)
    ★★★ use non-proprietary formats (e.g., CSV instead of Excel)
    ★★★★ use URIs to denote things, so that people can point at your stuff
    ★★★★★ link your data to other data to provide context
  About 50 billion triples as of 2013
LOD-IT (Kappa)
  For Software Developers
  Technical Helpdesk
  LOD-IT Video
  LOD-IT Demo
LODScape
  Ontology-based multiple LOD object browser
  DBpedia and Freebase datasets used
  LODScape Demo
Semantic Smart-Aleck
  Automatic fact generator
  Based on an interestingness algorithm
  Uses DBpedia and YAGO datasets
  SemanticSmartAleck Demo
Acknowledgments
Suggestions? Thank you!