Construction of Enterprise Knowledge Graphs Chapter 4
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus: building knowledge graphs with human involvement
A general lifecycle
(1) Specification
Draw up a detailed specification
Main tasks: 1) identification and analysis of data sources, 2) URI design
Select the data to integrate and publish: data that already exists in the organization, plus needed external data
URI design
Put as much information into the URI as possible, e.g. <http://dbpedia.org/resource/Italy>
Use slash URIs instead of hash URIs whenever possible
Separate the TBox (ontology model) from the ABox (instances):
TBox: append "ontology" to the base URI (ontology/Person)
ABox: append "resource" to the base URI (resource/Erna)
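A minimal sketch of this TBox/ABox URI convention using rdflib; the base URI http://example.org/ and the class/instance names are illustrative assumptions.

```python
from rdflib import Graph
from rdflib.namespace import Namespace, RDF, RDFS

BASE = "http://example.org/"
ONT = Namespace(BASE + "ontology/")   # TBox: classes and properties
RES = Namespace(BASE + "resource/")   # ABox: instances

g = Graph()
g.bind("ont", ONT)
g.bind("res", RES)

# TBox statement: declare the class ontology/Person
g.add((ONT.Person, RDF.type, RDFS.Class))
# ABox statement: the instance resource/Erna is a Person
g.add((RES.Erna, RDF.type, ONT.Person))

print(g.serialize(format="turtle"))
```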
(2) Modelling
Determine the ontology to be used for modelling the domain
Reuse as much as possible
If no suitable ontology is found, reuse parts of existing ones
If nothing works out, start from scratch (follow the NeOn methodology)
A general lifecycle
(3) Data lifting
Transfer existing data to RDF
Two main activities: transformation and linking
Typical technologies: GRDDL, RDB2RDF
Transformation
Requirements:
Full conversion – queries on the original data source must also be possible on the RDF version
RDF instances should reflect the target ontology structure (as closely as possible)
Tools: RDB2RDF, GRDDL, Google Refine/OpenRefine (RDF extension), D2R Server, ODEMapster, Stats2RDF
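A minimal sketch of the transformation idea (not any specific RDB2RDF tool): rows from an existing relational table become RDF instances that follow the target ontology. The table, column names and URIs are illustrative assumptions.

```python
from rdflib import Graph, Literal
from rdflib.namespace import Namespace, RDF, XSD

ONT = Namespace("http://example.org/ontology/")
RES = Namespace("http://example.org/resource/")

# Pretend result of "SELECT id, name, founded FROM organization"
rows = [(1, "Acme", 1999), (2, "Globex", 2005)]

g = Graph()
for org_id, name, founded in rows:
    subject = RES[f"organization/{org_id}"]          # one URI per row
    g.add((subject, RDF.type, ONT.Organization))     # class from the target ontology
    g.add((subject, ONT.name, Literal(name)))
    g.add((subject, ONT.founded, Literal(founded, datatype=XSD.gYear)))

print(g.serialize(format="turtle"))
```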
Linking
Create links between our knowledge graph and external graphs
Steps:
1. Identify KGs that are suitable as linking targets – manual; catalogues of candidate KGs exist in Linked Data repositories such as CKAN
2. Discover relationships between items in our KG and the external KGs – tools exist
3. Validate the relationships – performed by domain experts
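A minimal sketch of how validated links can be materialized: candidate correspondences to an external KG (here DBpedia) become owl:sameAs triples, kept only after expert validation. The example resources and the validation flags are assumptions for illustration.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

candidate_links = [
    # (our resource, external resource, validated by a domain expert?)
    (URIRef("http://example.org/resource/Italy"),
     URIRef("http://dbpedia.org/resource/Italy"), True),
    (URIRef("http://example.org/resource/Georgia"),
     URIRef("http://dbpedia.org/resource/Georgia_(country)"), False),  # ambiguous, rejected
]

# Only validated correspondences are added to the knowledge graph.
for ours, external, validated in candidate_links:
    if validated:
        g.add((ours, OWL.sameAs, external))

print(g.serialize(format="nt"))
```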
(4) Data publication
Activities: knowledge graph publication and metadata publication
Knowledge graph publication
Store and publish the RDF data
Example stores: Virtuoso Universal Server, Jena, Sesame, 4store, YARS
Some already include SPARQL endpoints
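A minimal sketch of consuming a published KG through its SPARQL endpoint, using the SPARQLWrapper library. The endpoint URL, prefix and query are illustrative assumptions; any of the stores above that expose a SPARQL endpoint could play this role.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")   # hypothetical endpoint
endpoint.setQuery("""
    PREFIX ont: <http://example.org/ontology/>
    SELECT ?org ?name WHERE {
        ?org a ont:Organization ;
             ont:name ?name .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["org"]["value"], binding["name"]["value"])
```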
Metadata publication
Include metadata about the KG:
Data about its structure
Data about access
Description of links between knowledge graphs
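A minimal sketch of such metadata as a VoID description, covering structure (vocabulary used), access (SPARQL endpoint) and a linkset describing links to an external KG. All dataset URIs here are illustrative assumptions.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import Namespace, RDF

VOID = Namespace("http://rdfs.org/ns/void#")
DCTERMS = Namespace("http://purl.org/dc/terms/")

dataset = URIRef("http://example.org/dataset/enterprise-kg")
linkset = URIRef("http://example.org/dataset/enterprise-kg/links-to-dbpedia")

g = Graph()
g.bind("void", VOID)
g.bind("dcterms", DCTERMS)

g.add((dataset, RDF.type, VOID.Dataset))
g.add((dataset, DCTERMS.title, Literal("Enterprise knowledge graph")))
g.add((dataset, VOID.vocabulary, URIRef("http://example.org/ontology/")))    # structure
g.add((dataset, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))   # access
g.add((linkset, RDF.type, VOID.Linkset))                                     # links to other KGs
g.add((linkset, VOID.subjectsTarget, dataset))
g.add((linkset, VOID.objectsTarget, URIRef("http://dbpedia.org/void/Dataset")))

print(g.serialize(format="turtle"))
```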
A general lifecycle
Data Curation
Aims at maintaining and preserving data for reuse over time
Cleaning noise – identify errors such as:
Broken links (HTTP 4xx/5xx responses)
Malformed data types ("true" typed as xsd:int)
Preservation
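A minimal sketch of two such curation checks: flagging typed literals whose lexical form does not match their datatype (e.g. "true" as xsd:int) and flagging object URIs that return HTTP 4xx/5xx. The KG dump filename is an assumption, and the malformed-literal test is a heuristic (rdflib leaves .value unset when it cannot parse the lexical form).

```python
import requests
from rdflib import Graph, Literal, URIRef

g = Graph()
g.parse("enterprise-kg.ttl", format="turtle")   # hypothetical dump of the KG

# 1. Malformed datatypes: lexical form does not parse under the declared datatype.
for s, p, o in g:
    if isinstance(o, Literal) and o.datatype is not None and o.value is None:
        print("Malformed literal:", s, p, o, o.datatype)

# 2. Broken links: dereference object URIs and flag 4xx/5xx (or unreachable) targets.
for uri in {o for _, _, o in g if isinstance(o, URIRef)}:
    try:
        status = requests.head(str(uri), allow_redirects=True, timeout=5).status_code
    except requests.RequestException:
        status = None
    if status is None or status >= 400:
        print("Broken link:", uri, status)
```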
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus: building knowledge graphs with human involvement
Ontology Authoring – a competency question-driven approach
Real-world ontologies require manual construction
This demands deep and complex professional knowledge
Ontology authors are domain experts, not KG experts
Ontology authoring is time-consuming and error-prone
Solution: "competency question-driven ontology authoring" (CQOA)
Competency Questions
The ontology must be able to answer competency questions (CQs)
CQs are natural language sentences, often following a semiformal pattern: "Which [CE1] [OPE] [CE2]?"
Examples: "Which mammals eat grass?" (animal ontology); "Which processes implement an algorithm?" (software engineering ontology)
CQs are especially helpful to ontology authors
Presuppositions
"A special condition that must be met for a linguistic expression to have a denotation"
Example: "Which processes implement an algorithm?"
The ontology must satisfy the following presuppositions:
The classes "Process" and "Algorithm" and the property "implements" occur in the ontology
The ontology allows a "Process" to implement an "Algorithm"
The ontology allows a "Process" to not implement an "Algorithm"
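A minimal sketch of checking only the first, purely syntactic presupposition with a SPARQL ASK query: do the two classes and the property occur in the ontology at all? (The two "allows" presuppositions require a reasoner and are not covered here.) The ontology filename and URIs are illustrative assumptions.

```python
from rdflib import Graph

g = Graph()
g.parse("software-engineering.owl", format="xml")   # hypothetical ontology, RDF/XML assumed

ask = """
PREFIX ont: <http://example.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
ASK {
    ont:Process    a owl:Class .
    ont:Algorithm  a owl:Class .
    ont:implements a owl:ObjectProperty .
}
"""
result = g.query(ask)
print("Vocabulary presupposition satisfied:", result.askAnswer)
```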
Formulation of competency questions
Selection question: "Which mammals eat grass?"
Binary question: answered with a boolean value (yes/no)
Counting question: answered with a number, e.g. "How many pizzas have ham or chicken as topping?"
Question polarity: "Which pizza has no vegetables?"
Predicate arity: "Is it thin or thick bread?"
Modifier: "If I have 3 ingredients, how many pizzas can I make?"
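A minimal sketch of how the three basic CQ types map onto SPARQL query forms; the ontology namespace and all class/property names are illustrative assumptions.

```python
# Selection question: "Which mammals eat grass?" -> SELECT query
selection_cq = """
PREFIX ont: <http://example.org/ontology/>
SELECT ?m WHERE { ?m a ont:Mammal ; ont:eats ont:Grass . }
"""

# Binary question: answered with yes/no -> ASK query
binary_cq = """
PREFIX ont: <http://example.org/ontology/>
ASK { ont:Margherita ont:hasBase ont:ThinAndCrispyBase . }
"""

# Counting question: "How many pizzas have ham or chicken as topping?" -> COUNT query
counting_cq = """
PREFIX ont: <http://example.org/ontology/>
SELECT (COUNT(DISTINCT ?pizza) AS ?n) WHERE {
    ?pizza a ont:Pizza ; ont:hasTopping ?topping .
    VALUES ?topping { ont:Ham ont:Chicken }
}
"""
```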
Test suite of CQs: see Table 4.1 (p. 99)
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus: building knowledge graphs with human involvement
Semi-automated linking of Enterprise Data for knowledge graphs
This activity is part of the "Data lifting" step of the lifecycle: creating data linkage
Helix: a system for linking information sources and building a knowledge graph for data discovery
Techniques of data discovery
Normalize data in different formats
Index structured data in tables
Perform semantic matching between schema elements of structured data
Tag data with semantic tags
Find linkage points in the data so that users can join between tables
Helix input sources
Semi-structured sources (APIs / RDBMS, triple stores)
Online or local file stores
Online web APIs
Helix pre-processing
Implemented in the Hadoop ecosystem
1. Schema discovery
2. Full-text indexing
3. Linkage discovery
Output: a semantically tagged Global Schema Graph
Linkage discovery
All-to-all instance-based matching of all attributes does not scale
Instead, turn the problem into an information retrieval (IR) problem (see the sketch below)
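A minimal sketch (not the actual Helix implementation) of recasting linkage discovery as an IR problem: attribute values are put into an inverted index, and each attribute "queries" the index with its own values to find candidate linkage points instead of comparing every attribute pair exhaustively. The tables and column names are illustrative assumptions.

```python
from collections import defaultdict

# Toy columns: (table, column) -> sample of values.
tables = {
    ("schools", "name"):   ["PS 41", "PS 87", "Bronx Science"],
    ("polling", "site"):   ["PS 41", "PS 87", "Town Hall"],
    ("hospitals", "name"): ["Mount Sinai", "Bellevue"],
}

# 1. Index: value -> set of (table, column) pairs in which it occurs.
index = defaultdict(set)
for column, values in tables.items():
    for v in values:
        index[v.lower()].add(column)

# 2. Query: for each column, count how often its values hit other columns.
overlap = defaultdict(int)
for column, values in tables.items():
    for v in values:
        for other in index[v.lower()] - {column}:
            overlap[(column, other)] += 1

# Column pairs with high overlap are candidate linkage points for joining tables.
for (a, b), score in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(a, "<->", b, "shared values:", score)
```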
Linkage discovery example
Example: schools that served as polling places. In the New York data, the KG was used to find hospitals via graph traversal instead of free-text search.