Construction of Enterprise Knowledge Graphs


Construction of Enterprise Knowledge Graphs Chapter 4

Outline
- Knowledge graph lifecycle
- Ontology authoring
- Semi-automated linking of enterprise data for virtual knowledge graphs
- Focus: building knowledge graphs with human involvement

A general lifecycle

(1) Specification
- Draw up a detailed specification
- Main tasks:
  1) Identification and analysis of data sources
  2) URI design
- Select the data to integrate and publish:
  - Data that already exists in the organization
  - External data that is needed

URI design
- Put as much information into the URI as possible, e.g. <http://dbpedia.org/resource/Italy>
- Use slash URIs instead of hash URIs whenever possible
- Separate the TBox (ontology model) from the ABox (instances)
  - TBox: append "ontology" to the base URI (ontology/Person)
  - ABox: append "resource" to the base URI (resource/Erna)
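The TBox/ABox convention above can be sketched in a few lines of Python. This is only an illustration: the base URI `http://example.org/` and the helper names `tbox_uri` and `abox_uri` are assumptions, not part of the slides.

```python
from urllib.parse import quote

# Hypothetical base URI, chosen only for this illustration.
BASE = "http://example.org/"


def tbox_uri(term: str, base: str = BASE) -> str:
    """Slash URI for an ontology term (class or property)."""
    return f"{base}ontology/{quote(term)}"


def abox_uri(name: str, base: str = BASE) -> str:
    """Slash URI for an instance; spaces become underscores."""
    return f"{base}resource/{quote(name.replace(' ', '_'))}"


print(tbox_uri("Person"))
print(abox_uri("Erna"))
```

Keeping the two helpers separate makes it hard to mint an instance URI in the ontology namespace by accident.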

(2) Modelling
- Determine the ontology to be used for modelling the domain
- Reuse as much as possible
- If no suitable ontology is found, reuse parts of existing ones
- If nothing works out, start from scratch (follow the NeOn methodology)

(3) Data lifting
- Transfer existing data to RDF
- Two main activities: transformation and linking
- Tools: GRDDL, RDB2RDF

Transformation
- Requirements:
  - Full conversion: queries that are possible on the original data source must also be possible on the RDF version
  - RDF instances should reflect the target ontology structure as closely as possible
- Tools: RDB2RDF, GRDDL, Google Refine/OpenRefine (RDF extension), D2R Server, ODEMapster, Stats2RDF
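To make the transformation step concrete, here is a hand-rolled sketch that lifts rows of a relational table into N-Triples. It does not use any of the tools listed above; the base URI, the table layout and the function name `rows_to_ntriples` are assumptions made for illustration. Every non-NULL column is kept so that queries on the original table remain answerable on the RDF version.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_ntriples(conn, table, key, class_name):
    """Turn each row of `table` into N-Triples lines.

    `key` names the column used to mint the subject URI and
    `class_name` is the target ontology class for every row.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}resource/{record[key]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}ontology/{class_name}> .")
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the URI; NULLs carry no triple
            literal = (str(value).replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            triples.append(f'{subject} <{BASE}ontology/{column}> "{literal}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES ('erna', 'Erna', 'Bergen')")
for line in rows_to_ntriples(conn, "person", "id", "Person"):
    print(line)
```

A production mapping would normally be declared in a mapping language rather than coded by hand; the sketch only shows the shape of the output.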

Linking
- Create links between our knowledge graph and external graphs
- Steps:
  1. Identify KGs that are suitable as linking targets (manual)
  2. Discover relationships between items in our KG and the external KGs (tools exist)
  3. Validate the relationships (performed by domain experts)
- Catalogues of KGs are available in Linked Data repositories such as CKAN; searching them is manual
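Step 2 (relationship discovery) could look roughly like the sketch below, which proposes candidate links by comparing labels with the standard-library `difflib`. The threshold, the function name `candidate_links` and the sample URIs under `example.org` are assumptions; dedicated link-discovery tools use richer similarity measures. The output is a list of candidates only, so step 3 (validation by domain experts) still applies.

```python
from difflib import SequenceMatcher


def candidate_links(local, external, threshold=0.85):
    """Propose links between two graphs by label similarity.

    `local` and `external` map entity URIs to labels. Returns
    (local_uri, external_uri, score) tuples, best match first.
    """
    links = []
    for l_uri, l_label in local.items():
        for e_uri, e_label in external.items():
            score = SequenceMatcher(None, l_label.lower(), e_label.lower()).ratio()
            if score >= threshold:
                links.append((l_uri, e_uri, round(score, 2)))
    return sorted(links, key=lambda link: -link[2])


local = {"http://example.org/resource/Italy": "Italy"}
external = {
    "http://dbpedia.org/resource/Italy": "Italy",
    "http://dbpedia.org/resource/Israel": "Israel",
}
for link in candidate_links(local, external):
    print(link)
```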

(4) Data publication
- Activities:
  - Knowledge graph publication
  - Metadata publication

Knowledge graph publication
- Store and publish the RDF data
- Stores: Virtuoso Universal Server, Jena, Sesame, 4store, YARS
- Some already include SPARQL endpoints

Metadata publication
- Include metadata about the KG:
  - Data about its structure
  - Data about access
  - Description of links between knowledge graphs
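One common way to publish this kind of metadata is the VoID vocabulary, which the slides do not name; using it here is an assumption. The sketch below builds a small Turtle description covering structure (triple count), access (SPARQL endpoint) and links to other graphs. The function name `void_description` and all URIs passed to it are hypothetical.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples, link_targets):
    """Return a Turtle string describing a dataset with VoID terms."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",  # access metadata
        f"    void:triples {triples} .",                   # structural metadata
    ]
    # One linkset per external graph that our graph links to.
    for i, target in enumerate(link_targets, start=1):
        lines += [
            "",
            f"<{dataset_uri}/linkset/{i}> a void:Linkset ;",
            f"    void:target <{dataset_uri}>, <{target}> ;",
            "    void:linkPredicate owl:sameAs .",
        ]
    return "\n".join(lines)


print(void_description(
    "http://example.org/dataset/enterprise-kg",
    "Enterprise KG",
    "http://example.org/sparql",
    1200,
    ["http://dbpedia.org/void/Dataset"],
))
```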

Data curation
- Aims at maintaining and preserving data for reuse over time
- Cleaning noise:
  - Identify errors (HTTP 40x/50x errors)
  - Broken links
  - Malformed data types ("true" as xsd:int)
- Preservation
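Two of the cleaning checks above can be sketched as follows: flagging literals whose lexical form does not fit the declared datatype, and classifying HTTP status codes as broken links. The regular expressions are simplified assumptions (for example, the value range of `xsd:int` is not checked), and the function names are illustrative.

```python
import re

# Simplified lexical forms for a few XSD datatypes (an assumption:
# value ranges and whitespace handling are ignored here).
LEXICAL_FORMS = {
    "xsd:int": re.compile(r"[+-]?\d+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
}


def malformed_literals(literals):
    """Return the (lexical_form, datatype) pairs that do not match."""
    problems = []
    for lexical, datatype in literals:
        pattern = LEXICAL_FORMS.get(datatype)
        if pattern is not None and not pattern.fullmatch(lexical):
            problems.append((lexical, datatype))
    return problems


def is_broken_link(status_code):
    """Treat any HTTP 4xx or 5xx response as a broken link."""
    return 400 <= status_code < 600


print(malformed_literals([("true", "xsd:int"), ("42", "xsd:int")]))
print(is_broken_link(404), is_broken_link(200))
```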

Ontology authoring: a competency question-driven approach
- Real-world ontologies require manual construction
- Construction requires deep and complex professional knowledge
- Ontology authors are domain experts, not KG experts
- Ontology authoring is time-consuming and error-prone
- Solution: "competency question-driven ontology authoring" (CQOA)

Competency questions
- The ontology must be able to answer competency questions (CQs)
- CQs are natural language sentences
- Semiformal pattern: "Which [CE1] [OPE] [CE2]?" (CE = class expression, OPE = object property expression)
- Examples:
  - "Which mammals eat grass?" (animal ontology)
  - "Which processes implement an algorithm?" (software engineering ontology)
- CQs are especially helpful to ontology authors
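As a rough illustration of the "Which [CE1] [OPE] [CE2]?" pattern, the sketch below extracts the three slots from a question with a regular expression. It is an assumption-laden simplification: each slot is a single word and an optional article before the second class is skipped; the names `CQ_PATTERN` and `parse_cq` are illustrative.

```python
import re

# Matches "Which <CE1> <OPE> [a|an|the] <CE2>?" with one-word slots.
CQ_PATTERN = re.compile(
    r"^Which\s+(?P<ce1>\w+)\s+(?P<ope>\w+)\s+(?:an?\s+|the\s+)?(?P<ce2>\w+)\?$",
    re.IGNORECASE,
)


def parse_cq(question):
    """Return the CE1/OPE/CE2 slots of a question, or None if it does not fit."""
    match = CQ_PATTERN.match(question.strip())
    return match.groupdict() if match else None


print(parse_cq("Which mammals eat grass?"))
print(parse_cq("Which processes implement an algorithm?"))
```

Questions that do not follow the pattern return `None`, which is a cue that they belong to another CQ type or need rephrasing.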

Presuppositions
- "A special condition that must be met for a linguistic expression to have a denotation"
- Example: "Which processes implement an algorithm?"
- The ontology must satisfy the following presuppositions:
  - The classes "Process" and "Algorithm" and the property "implements" occur in the ontology
  - The ontology allows a "Process" to implement an "Algorithm"
  - The ontology allows a "Process" not to implement an "Algorithm"
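The three presuppositions can be turned into automated checks. A real implementation would ask a description logic reasoner whether the relevant class expressions are satisfiable; the sketch below replaces the reasoner with plain set lookups over a toy ontology, which is a deliberate simplification. `ToyOntology`, its `required` and `forbidden` fields, and `check_presuppositions` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    classes: set
    properties: set
    # (C, p, D) in `required`: every C must be related by p to some D.
    required: set = field(default_factory=set)
    # (C, p, D) in `forbidden`: no C may be related by p to any D.
    forbidden: set = field(default_factory=set)


def check_presuppositions(onto, ce1, ope, ce2):
    """Check the three presuppositions of 'Which CE1 OPE CE2?'."""
    triple = (ce1, ope, ce2)
    return {
        "terms_occur": ce1 in onto.classes
        and ce2 in onto.classes
        and ope in onto.properties,
        "relation_possible": triple not in onto.forbidden,
        "relation_not_forced": triple not in onto.required,
    }


onto = ToyOntology(classes={"Process", "Algorithm"}, properties={"implements"})
print(check_presuppositions(onto, "Process", "implements", "Algorithm"))
```

If every process were required to implement an algorithm, the question would have a trivial answer, which is why the third check exists.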

Formulation of competency questions
- Selection question: "Which mammals eat grass?"
- Binary question: answered with a boolean value (yes/no)
- Counting question: answered with a number, e.g. "How many pizzas have ham or chicken as a topping?"
- Question polarity: "Which pizza has no vegetables?"
- Predicate arity: "Is it thin or thick bread?"
- Modifier: "If I have 3 ingredients, how many pizzas can I make?"

Test suite of CQs Table 4.1 (p. 99)

Semi-automated linking of enterprise data for knowledge graphs
- This activity is part of the "Data lifting" step in the lifecycle
- Create data linkage
- Helix: linking information sources
- Build a knowledge graph for data discovery

Techniques of data discovery
- Normalize data that comes in different formats
- Index structured data in tables
- Perform semantic matching between schema elements of structured data
- Tag data with semantic tags
- Find linkage points in the data so that users can join tables

Helix input sources
- Semi-structured sources (APIs, RDBMSs, triple stores)
- Online or local file stores
- Online web APIs

Helix pre-processing
- Implemented in the Hadoop ecosystem
- Steps:
  1. Schema discovery
  2. Full-text indexing
  3. Linkage discovery
- Output: semantically tagged Global Schema Graph

Linkage discovery
- All-to-all instance-based matching of all attributes does not scale
- Turn the problem into an information retrieval (IR) problem
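The idea of recasting linkage discovery as an IR problem can be illustrated with an inverted index: each value is indexed once, and only attributes that share an index entry are ever compared. This is a generic sketch and not Helix's actual algorithm; the overlap score, the threshold and the attribute names in the example are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def linkage_points(attributes, min_overlap=0.5):
    """Find attribute pairs that share many values.

    `attributes` maps 'source.column' names to their values. Returns
    (attr_a, attr_b, score) tuples, where score is the number of shared
    values divided by the size of the smaller attribute.
    """
    index = defaultdict(set)  # normalized value -> attributes containing it
    sizes = {}
    for attr, values in attributes.items():
        normalized = {str(v).strip().lower() for v in values}
        sizes[attr] = len(normalized)
        for value in normalized:
            index[value].add(attr)

    shared = defaultdict(int)
    for attrs in index.values():
        # Only attributes that co-occur in an index entry are paired.
        for a, b in combinations(sorted(attrs), 2):
            shared[(a, b)] += 1

    results = []
    for (a, b), count in shared.items():
        score = count / min(sizes[a], sizes[b])
        if score >= min_overlap:
            results.append((a, b, score))
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


tables = {
    "schools.name": ["PS 1", "PS 2", "PS 3"],
    "polling.site": ["ps 1", "PS 2", "Library"],
    "hospitals.name": ["Bellevue"],
}
print(linkage_points(tables))
```

Attributes with no values in common never meet in the index, so the quadratic comparison of every attribute pair is avoided.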

Linkage discovery example
- Speaker note: say something about the schools that served as polling places
- In NY, a KG was used to locate hospitals through graph traversal instead of free-text search