1
Construction of Enterprise Knowledge Graphs
Chapter 4
3
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus on building knowledge graphs with human involvement
5
A general lifecycle
6
(1) Specification
Draw up a detailed specification (one of the main tasks)
1) Identification and analysis of data sources
   Select data to integrate and publish
   Data that exists in the organization
   Needed external data
2) URI design
7
URI design
Put as much information into the URI as possible
Use slash URIs instead of hash URIs whenever possible
Separate the TBox (ontology model) from the ABox (instances)
   TBox: append "ontology" to the base URI (ontology/Person)
   ABox: append "resource" to the base URI (resource/Erna)
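The URI conventions above can be sketched in a few lines of Python. This is only an illustration: the base URI `http://example.org/` and the helper names `tbox_uri` and `abox_uri` are assumptions, not something the slides prescribe.

```python
from urllib.parse import quote

# Hypothetical base URI, chosen only for illustration.
BASE_URI = "http://example.org/"


def tbox_uri(term: str) -> str:
    """Slash URI for an ontology term (TBox): base + 'ontology/' + term."""
    return f"{BASE_URI}ontology/{quote(term)}"


def abox_uri(name: str) -> str:
    """Slash URI for an instance (ABox): base + 'resource/' + name."""
    return f"{BASE_URI}resource/{quote(name)}"


person_class = tbox_uri("Person")
erna = abox_uri("Erna")
print(person_class)
print(erna)
```

Keeping the two helpers separate makes it hard to mint an instance URI in the ontology namespace by accident.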
8
(2) Modelling
Determine the ontology to be used for modelling the domain
Reuse as much as possible
If no suitable ontology is found, reuse parts of existing ones
If nothing works out, start from scratch (follow the NeOn methodology)
9
A general lifecycle
10
(3) Data lifting
Transfer existing data to RDF
Two main activities: transformation and linking
Tools: GRDDL, RDB2RDF
11
Transformation
Requirements:
   Full conversion: queries on the original data source must also be possible on the RDF version
   RDF instances should reflect the target ontology structure as closely as possible
Tools: RDB2RDF, GRDDL, Google Refine/OpenRefine (RDF extension), D2R Server, ODEMapster, Stats2RDF
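As a rough illustration of the two requirements, the sketch below turns one relational row into N-Triples, emitting a triple for every non-null column so that nothing queryable is lost. It is a hand-rolled example, not how any of the tools listed above work internally; the base URI, the table name and the column names are assumptions.

```python
# Hypothetical base URI; TBox terms go under ontology/, instances under resource/.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(table: str, key: str, row: dict) -> list:
    """Map one row to N-Triples: a type triple plus one triple per non-null column."""
    subject = f"<{BASE}resource/{row[key]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}ontology/{table}> ."]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject URI; NULLs yield no triple
        literal = escape_literal(str(value))
        triples.append(f'{subject} <{BASE}ontology/{column}> "{literal}" .')
    return triples


for triple in row_to_triples("Person", "id", {"id": "Erna", "city": "Bergen"}):
    print(triple)
```

A real mapping would usually translate column names to properties of the target ontology rather than reuse them verbatim.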
12
Linking
Create links between our knowledge graph and external graphs
Steps:
1. Identify KGs that are suitable as linking targets (manual)
2. Discover relationships between items in our KG and external KGs (tools exist)
3. Validate relationships (performed by domain experts)
Collections of KGs can be found in Linked Data repositories such as CKAN (searched manually)
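Steps 2 and 3 can be sketched as follows. The example assumes that discovery is done by comparing labels with a string-similarity threshold and that accepted links are written as `owl:sameAs` triples. Both are illustrative choices; the slides do not say which matching technique or link predicate is used. All URIs and labels are made up.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URI -> label maps for our KG and an external KG.
OURS = {
    "http://example.org/resource/Oslo": "Oslo",
    "http://example.org/resource/Bergen": "Bergen",
}
EXTERNAL = {
    "http://external.example/city/oslo": "oslo",
    "http://external.example/city/berlin": "Berlin",
}


def candidate_links(ours: dict, external: dict, threshold: float = 0.85) -> list:
    """Step 2: propose (our_uri, external_uri, score) pairs with similar labels."""
    candidates = []
    for our_uri, our_label in ours.items():
        for ext_uri, ext_label in external.items():
            score = SequenceMatcher(None, our_label.lower(), ext_label.lower()).ratio()
            if score >= threshold:
                candidates.append((our_uri, ext_uri, round(score, 2)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


def validated_links(candidates: list, approve) -> list:
    """Step 3: keep only candidates that a domain expert (the callback) approves."""
    return [
        f"<{ours}> <{OWL_SAME_AS}> <{theirs}> ."
        for ours, theirs, _score in candidates
        if approve(ours, theirs)
    ]


candidates = candidate_links(OURS, EXTERNAL)
print(validated_links(candidates, approve=lambda ours, theirs: True))
```

The `approve` callback stands in for the expert review; in practice it would be an interactive step rather than a function that always returns `True`.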
13
(4) Data publication
Activities:
Knowledge graph publication
Metadata publication
14
Knowledge graph publication
Store and publish RDF data
Tools: Virtuoso Universal Server, Jena, Sesame, 4Store, YARS
Some already include SPARQL endpoints
15
Metadata publication
Include metadata about the KG:
   Data about structure
   Data about access
   Description of links between knowledge graphs
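One vocabulary commonly used for this kind of metadata is VoID; the slides do not name a vocabulary, so its use here is an assumption. The sketch below emits a small Turtle description covering the three items above: structure (`void:vocabulary`), access (`void:sparqlEndpoint`) and links (`void:Linkset`). All URIs passed in are hypothetical.

```python
def void_description(dataset: str, title: str, endpoint: str,
                     vocabulary: str, target: str, link_predicate: str) -> str:
    """Return a Turtle document describing a dataset and one linkset with VoID."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:vocabulary <{vocabulary}> ;",       # data about structure
        f"    void:sparqlEndpoint <{endpoint}> .",     # data about access
        "",
        f"<{dataset}/links> a void:Linkset ;",         # links to another KG
        f"    void:target <{dataset}>, <{target}> ;",
        f"    void:linkPredicate <{link_predicate}> .",
    ]
    return "\n".join(lines)


print(void_description(
    dataset="http://example.org/dataset",
    title="Example enterprise KG",
    endpoint="http://example.org/sparql",
    vocabulary="http://example.org/ontology/",
    target="http://external.example/dataset",
    link_predicate="http://www.w3.org/2002/07/owl#sameAs",
))
```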
16
A general lifecycle
17
Data curation
Aims at maintaining and preserving data for reuse over time
Cleaning noise
   Identify errors (HTTP 40x/50x errors)
   Broken links
   Malformed data types ("true" as xsd:int)
Preservation
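Two of the cleaning checks above are simple enough to sketch directly. The example is a minimal illustration and makes some assumptions: only a few XSD datatypes are checked, the value range of `xsd:int` is ignored, and HTTP status codes are passed in rather than fetched, so no network access is needed.

```python
import re

INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def literal_is_well_formed(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form against its declared datatype."""
    if datatype in ("xsd:int", "xsd:integer"):
        return INTEGER_PATTERN.fullmatch(lexical) is not None
    if datatype == "xsd:boolean":
        return lexical in ("true", "false", "1", "0")
    return True  # other datatypes are not checked in this sketch


def link_is_broken(status_code: int) -> bool:
    """Treat any 4xx or 5xx HTTP status as a broken link."""
    return 400 <= status_code < 600


print(literal_is_well_formed("true", "xsd:int"))  # the malformed case from the slide
print(link_is_broken(404))
```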
18
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus on building knowledge graphs with human involvement
19
Ontology authoring: a competency question-driven approach
Real-world ontologies require manual construction
Requires deep and complex professional knowledge
Ontology authors are domain experts, not KG experts
Ontology authoring is time-consuming and error-prone
Solution: "Competency question-driven ontology authoring" (CQOA)
20
Competency questions
The ontology must be able to answer competency questions (CQs)
Natural language sentences
Semiformal pattern: "Which [CE1][OPE][CE2]?"
Examples:
   "Which mammals eat grass?" (animal ontology)
   "Which processes implement an algorithm?" (software engineering ontology)
CQs are especially helpful to ontology authors
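To make the semiformal pattern concrete, the sketch below extracts the three slots from a CQ with a regular expression. It assumes each slot is a single word, optionally preceded by an article before CE2, and reads CE as a class expression and OPE as an object property expression, which is presumably what the abbreviations stand for. Real CQs would need proper linguistic analysis.

```python
import re

# "Which [CE1] [OPE] [CE2]?" with single-word slots and an optional article.
CQ_PATTERN = re.compile(
    r"^Which\s+(?P<ce1>\w+)\s+(?P<ope>\w+)\s+(?:(?:a|an|the)\s+)?(?P<ce2>\w+)\s*\?$",
    re.IGNORECASE,
)


def parse_cq(question: str):
    """Return the CE1/OPE/CE2 slots of a CQ, or None if it does not fit the pattern."""
    match = CQ_PATTERN.match(question.strip())
    return match.groupdict() if match else None


print(parse_cq("Which mammals eat grass?"))
print(parse_cq("Which processes implement an algorithm?"))
```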
21
Presuppositions
"A special condition that must be met for a linguistic expression to have a denotation"
Example: "Which processes implement an algorithm?"
The ontology must satisfy the following presuppositions:
   The classes "Process" and "Algorithm" and the property "Implements" occur in the ontology
   The ontology allows a "Process" to implement an "Algorithm"
   The ontology allows a "Process" to not implement an "Algorithm"
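The three presuppositions can be turned into checks against an ontology. A faithful check of the second and third would need a reasoner, so the sketch below uses a toy stand-in: the hypothetical `ToyOntology` class records vocabulary plus two explicit sets of (class, property, class) triples, `forbidden` for relations the ontology rules out and `required` for relations it forces. None of this structure comes from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    """Hypothetical, heavily simplified ontology used only for this illustration."""
    classes: set = field(default_factory=set)
    properties: set = field(default_factory=set)
    forbidden: set = field(default_factory=set)  # relations that can never hold
    required: set = field(default_factory=set)   # relations that must always hold


def check_presuppositions(onto: ToyOntology, ce1: str, ope: str, ce2: str) -> dict:
    """Evaluate the three presuppositions of 'Which [CE1][OPE][CE2]?'."""
    relation = (ce1, ope, ce2)
    return {
        "vocabulary_occurs": (
            ce1 in onto.classes and ce2 in onto.classes and ope in onto.properties
        ),
        "relation_possible": relation not in onto.forbidden,
        "relation_not_forced": relation not in onto.required,
    }


onto = ToyOntology(classes={"Process", "Algorithm"}, properties={"implements"})
print(check_presuppositions(onto, "Process", "implements", "Algorithm"))
```

If the relation were listed in `required`, every process would implement an algorithm and the question would no longer discriminate between processes.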
22
Formulation of competency questions
Selection question: "Which mammals eat grass?"
Binary question: should be answered with a boolean value (yes/no)
Counting question: should be answered with a number, e.g. "How many pizzas have ham or chicken as topping?"
Question polarity: "Which pizza has no vegetables?"
Predicate arity: "Is it thin or thick bread?"
Modifier: "If I have 3 ingredients, how many pizzas can I make?"
23
Test suite of CQs
Table 4.1 (p. 99)
24
Outline
Knowledge graph lifecycle
Ontology authoring
Semi-automated linking of enterprise data for virtual knowledge graphs
Focus on building knowledge graphs with human involvement
25
Semi-automated linking of enterprise data for knowledge graphs
This activity is part of the "Data lifting" step in the lifecycle
Create data linkage
Helix: linking information sources
Build a knowledge graph for data discovery
26
Techniques of data discovery
Normalize data in different formats
Index structured data in tables
Perform semantic matching between schema elements of structured data
Tag data with semantic tags
Find linkage points in the data so that users can join between tables
27
Helix input sources
Semi-structured sources (API / RDBMS, triple stores)
Online or local file stores
Online web APIs
28
Helix pre-processing
Implemented in the Hadoop ecosystem
1. Schema discovery
2. Full-text indexing
3. Linkage discovery
Output: semantically tagged Global Schema Graph
29
Linkage discovery
All-to-all instance-based matching of all attributes does not scale
Turn the problem into an information retrieval (IR) problem
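One way to read "turn the problem into an IR problem" is to index attribute values once and then look up which attributes share values, instead of comparing every attribute with every other. The sketch below does this with an in-memory inverted index. It is an interpretation, not Helix's actual algorithm, and the table and column names are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_index(attributes: dict) -> dict:
    """Inverted index: normalized value -> set of attributes containing it."""
    index = defaultdict(set)
    for attribute, values in attributes.items():
        for value in values:
            index[str(value).strip().lower()].add(attribute)
    return index


def linkage_candidates(attributes: dict, min_shared: int = 2) -> dict:
    """Count shared values per attribute pair, touching only pairs that co-occur."""
    shared = defaultdict(int)
    for attrs in build_index(attributes).values():
        for pair in combinations(sorted(attrs), 2):
            shared[pair] += 1
    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Hypothetical columns from three tables.
ATTRIBUTES = {
    "schools.name": ["PS 1", "PS 2", "PS 3"],
    "polling_sites.location": ["ps 1", "PS 2", "Town Hall"],
    "hospitals.name": ["General Hospital"],
}
print(linkage_candidates(ATTRIBUTES))
```

Attribute pairs with no value in common are never compared, which is where the saving over all-to-all matching comes from.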
30
Linkage discovery example
Say something about schools that served as polling stations. In NY, a KG was used to locate hospitals through graph traversal instead of free-text search.