Semantics for Archives & Records Management at OECD


1 Semantics for Archives & Records Management at OECD
The Semantically Enriched Archivist
45th ICA / SIO Conference, Brussels, 22 May 2019

2 As archivists we often face issues…
Performance: (…by the way, it took the artist 18 hours to find the needle…)

3 because information without context…
…is like a fish without water

4 Solution = Context + Structure
Well, that’s exactly the Archivist’s bread and butter,

5 …the Fundamentals of archival description…
So, if we embed the:
Principle of Provenance
Principle of Structure
(diagram labels: Provenance, Business Context, Series, Dossier, Content Type, Status)
as metadata in…

6 a set of Corporate Taxonomies, we can use them…

7 to semantically empower our search & discovery !

8 Yes, but what about the backlog ?????

9 Manual indexing is no longer an option…

10 We need robots to help us. But is that possible?

11 Yes! How ? Through Semantic Analysis

12 What do the semantic robots do?

13 Semantic Enrichment = Structure the Unstructured

14 How do we develop these robots ?
We develop on a set of test documents (test corpus).
We debug to correct patterns and disambiguate.
We test on the complete corpus and put the robots into production using Web Services.

15 Some OECD Archival Examples
Problem 1: We don’t know what type of document it is!
Solution 1: Document Type Classification
Problem 2: We don’t have resources to index scanned documents manually!
Solution 2: (OCR-ed) Document Indexing
Problem 3: Full text search gives too many results!
Solution 3: Topics and Geographical Areas Classification

16 Solution 1 Document Type Classification
Is this document a Report, an Agenda, an Invoice?
Quality: 95% Precision, 85% Recall
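As a reminder of what the two quality figures measure, here is a minimal sketch of precision and recall for one document type. The labels below are hypothetical toy data, not OECD results, and the function name is an assumption made for illustration.

```python
def precision_recall(predicted, actual, label):
    """Precision and recall for one label, given parallel lists of labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == label and a == label)
    false_pos = sum(1 for p, a in pairs if p == label and a != label)
    false_neg = sum(1 for p, a in pairs if p != label and a == label)
    # Precision: of the documents tagged with this type, how many were right?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the documents that really are this type, how many were found?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical test corpus: what the archivist says vs. what the robot says.
actual = ["Report", "Report", "Agenda", "Invoice", "Report", "Agenda"]
predicted = ["Report", "Agenda", "Agenda", "Invoice", "Report", "Report"]

precision, recall = precision_recall(predicted, actual, "Report")
print(f"Report: precision={precision:.2f} recall={recall:.2f}")
```

High precision with lower recall, as on this slide, would mean the robot is rarely wrong when it names a type but leaves some documents of that type unrecognised.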

17 Solution 2 (OCR-ed) Document Indexing

18 (OCR-ed) Document Indexing …
Type               Precision
Description        95.05
Record Date        86.17
Original Security  87.13
Cote Exclusion     85.15
OCR Quality        %
High               79.21
Medium             14.85
Low                5.94
Total              100.00
Overall quality is remarkably good, BUT…
100% is not possible
And OCR can be a challenge…

19 OCR = Problems
We can normalise dates, but titles are more difficult:
(in French, lionceau = lion cub…)
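Date normalisation of the kind mentioned here can be sketched as follows. This is only an illustration under stated assumptions: the "day month-name year" pattern, the month table and the function names are hypothetical, and the real robots may work quite differently.

```python
import datetime
import re
import unicodedata

# Assumed month names (French and English), stored without accents.
MONTHS = {
    "janvier": 1, "fevrier": 2, "mars": 3, "avril": 4, "mai": 5, "juin": 6,
    "juillet": 7, "aout": 8, "septembre": 9, "octobre": 10,
    "novembre": 11, "decembre": 12,
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
    "june": 6, "july": 7, "august": 8, "september": 9, "october": 10,
    "november": 11, "december": 12,
}

# Matches e.g. "22 mai 2019" or "1er mars 1961" in accent-stripped lowercase text.
DATE_PATTERN = re.compile(r"(\d{1,2})(?:er)?\s+([a-z]+)\s+(\d{4})")


def strip_accents(text):
    """Drop combining marks so 'Février' and 'Fevrier' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalise_date(raw):
    """Return an ISO 8601 date string, or None when no valid date is found."""
    match = DATE_PATTERN.search(strip_accents(raw).lower())
    if not match:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    try:
        return datetime.date(int(year), month, int(day)).isoformat()
    except ValueError:  # e.g. 31 February
        return None


print(normalise_date("Paris, le 12 Février 1975"))
```

Dates have a small closed vocabulary, which is why they normalise well; free-text titles have no such table to check against, so OCR errors in them are much harder to detect.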

20 BUT… Our biggest issue is: The « COLLECTION » Stamp

21 Solution 3 Topics and Geographical Areas Classification
Identify the 15 Best Topics and Geographical areas using the Central OECD Taxonomies
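To make the "15 best" idea concrete, here is a deliberately naive sketch in which plain keyword counting stands in for the semantic analysis. The taxonomy entries, the sample text and the function name are all hypothetical and are not taken from the Central OECD Taxonomies.

```python
from collections import Counter


def top_topics(text, taxonomy, limit=15):
    """Rank taxonomy topics by how often their terms occur in the text."""
    lowered = text.lower()
    scores = Counter()
    for topic, terms in taxonomy.items():
        # Naive substring counting: no stemming, no disambiguation.
        hits = sum(lowered.count(term.lower()) for term in terms)
        if hits:
            scores[topic] = hits
    return [topic for topic, _ in scores.most_common(limit)]


# Hypothetical mini-taxonomy mixing topics and a geographical area.
taxonomy = {
    "Taxation": ["tax", "fiscal"],
    "Education": ["school", "education"],
    "France": ["france", "french"],
}

text = "Tax policy and fiscal reform in France: a report on tax incentives."
print(top_topics(text, taxonomy))
```

Substring counting would also match "tax" inside "taxonomy", which is exactly the kind of false hit that pattern correction and disambiguation are meant to remove.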

22 Topics and Geographical Areas Classification
Works remarkably well… Even on OCR-ed documents!
Cartridge V  Number of Validations  Overall Precision  Overall Recall  Overall F-Measure
257          434136                 99.4               98.6            99.0
279          363149                 99.7               98.0            98.9
529 439726 98.2 (row incomplete in the transcript)
Total (values missing in the transcript)
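The slide does not say how the F-Measure is computed. Assuming it is the usual balanced F1 score (the harmonic mean of precision and recall), the figure for cartridge version 257 can be checked with a few lines:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures for cartridge version 257, in percent, as shown on the slide.
score = f_measure(99.4, 98.6)
print(round(score, 1))  # rounds to 99.0, matching the slide
```

For version 279 the same formula gives 98.8 from the rounded inputs against the 98.9 shown, which may simply mean the slide's figure was computed from unrounded precision and recall.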

23 How do we use all these Metadata ?

24 OECD Taxonomies and Ontologies

25 NO !

26 Taxonomies and Ontologies

27 O.N.E Sight – OECD Semantic Discovery Interface

28 Architecture Semantic Layer Data hub

29 Multi-view annotation graphs
We use several semantic robots, based on several different taxonomies (generic, innovation-oriented, etc.).
We tag the same resource in different ways.
We can see the same resource in context from different « semantic » viewpoints.
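One way to picture multi-view annotation is a store that keeps a separate set of tags per resource and per taxonomy view. The sketch below is an assumption made for illustration: the class, the view names and the concepts are hypothetical and do not describe the actual OECD data hub.

```python
class AnnotationStore:
    """Keeps, for each resource, one set of concepts per taxonomy view."""

    def __init__(self):
        # resource id -> view name -> set of concepts
        self._tags = {}

    def tag(self, resource_id, view, concept):
        """Record that a robot working in this view assigned this concept."""
        views = self._tags.setdefault(resource_id, {})
        views.setdefault(view, set()).add(concept)

    def view_of(self, resource_id, view):
        """Concepts attached to a resource from one semantic viewpoint."""
        return sorted(self._tags.get(resource_id, {}).get(view, set()))

    def views(self, resource_id):
        """All viewpoints from which the resource has been tagged."""
        return sorted(self._tags.get(resource_id, {}))


store = AnnotationStore()
store.tag("doc-1", "generic", "Trade")
store.tag("doc-1", "generic", "Agriculture")
store.tag("doc-1", "innovation", "Patents")

print(store.views("doc-1"))
print(store.view_of("doc-1", "generic"))
```

Keeping the views separate means one robot's tags never overwrite another's, so the same document can be read from each viewpoint independently.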

30 The OECD Semantic Timeline
2013 Launch Call for Tender
2014 Taxonomy & Document Type Analysis
2015 OECD.Records Enrichment
2017 OCR-ed Document Semantic Analysis
2018 O.N.E Sight Launch

31 Conclusion
Semantics are:
Indispensable for our profession
True enablers for Knowledge Discovery
By becoming Semantically Enriched Archivists, Librarians or Information Scientists, we really have become: Knowledge Gardeners

