The Dream of a Global Network of Knowledge

Presentation transcript:

The Dream of a Global Network of Knowledge Martin Doerr Center for Cultural Informatics Institute of Computer Science Foundation for Research and Technology - Hellas Amsterdam, Netherlands November 17, 2011

Introduction: Digital Libraries take on different forms and roles. Initially: collection management systems, literature collections, digitized resources, resource libraries (Perseus etc.), on-line corpora. In addition, data services: scientific data collections, research systems (e.g., GIS-integrated data). “Metadata” aggregation services form a new paradigm: using semantic networks, they integrate diverse forms of information assets, and pointers to them, for the support of research and the interested public. This brings new grand challenges, yet the library access paradigm still dominates!

Library, Archive, Museum Information: The typical library contents: “the whole stories”; access widely solved! Primary literature: fiction. Categorical: theories and hypotheses. Secondary literature (research results). Facts brought into causal context. The typical museum information: “museum objects rarely talk”. Factual documentation of properties and context per object, references, classification. Highly heterogeneous, about things taken out of their original context, distributed over the world.

Library, Archive, Museum Information: The typical archive contents: “the needle in the haystack”. Primary sources, “bits and pieces” (letters, legal documents, administrative acts, images, scientific records): factual, kept in the contextual sequence of creation as left by the creator or responsible party, and kept due to a mandate related to functions. Similarly, library content itself: “What is in the book?” Parts of book content (citations!) serve as primary sources of investigation. Access: not much more than keyword search, if a digital form exists…

Epistemology of Integration [Diagram. Institutions: Libraries, Museums, Archives, SMRs. Holdings: Books, Objects, Sites, primary Documents. Link labels: publish; exhibit; document features & context; document; manage; provide finding aids; illustrate, exemplify using; are about; refer to; contain narratives made from.]

Traditional Information Access: The traditional library task: collect and preserve documents and provide finding aids. The job is solved when the (one, best) document is handed out: “All you want is in this document”. The digital analogue implements “finding aids”. Assumption: the user knows a topic, characterized by a noun, or knows associations of a thing known to exist. Associations may be known properties, but not directly correlated to the problem to be solved (e.g. “organic farming” for “host-parasite studies”). Semantic interoperability is limited to the aggregation task: metadata are mainly homogeneous (DC, VRA, etc.), and the only challenge discussed is the matching of terminologies (KOS). This is still THE dominant global information integration paradigm.

Problems: No support to learn from the aggregated sources or to retrieve by context, e.g.: Who was the employer of Donald Johanson when he found Lucy? Which plant species are documented for the Black Sea coast for 6000 BC? (A critical climate hypothesis connected to detecting the Black Sea flood in 5600 BC.) What resolution did Galileo’s telescope have when he observed...? But understanding lives on relationships, and cultural information has complex relationships. Relationships may be categorical or factual. Categorical (e.g., “smoking causes cancer”): richly exploited by Semantic Web technology; use and integration limited to research results; not useful for primary research itself. Factual associations concatenate information assets into meaningful (“epistemic”) networks (“stories”) and support context-based hypothesis building, cross-disciplinary search etc. (e.g. “John smoked at 20, … 30, … 40”; “John had lung cancer at 60”). Knowledge of factual associations is the “food” of scholarly research.

What Can IT Do Now? Access to categorical knowledge is well solved if hypotheses have names: subject search, keyword search, content management systems and search engines. An increasing amount of structured categorical knowledge is built in the form of thesauri and ontologies (life sciences!): access by terms and by browsing broader/narrower terms; access by categorical relationships is more rarely touched. Access to facts is idiosyncratic to diverse systems and limited to: structured data services (no general access paradigm); KOS (author lists, gazetteers); “surfing and browsing” on the Internet or in Digital Libraries.

What Can IT Do Now? New promises: Semantic Networks and the Semantic Web. RDF triple stores as open-world systems: billions of facts under any number of schemata in one database. Linked Open Data (LoD): thousands of triple stores to be accessed. A shift to metadata rich in facts, from archives, libraries, museums and Digital Libraries, and from research databases, so that the difference between data and metadata blurs. A global network of knowledge? …or a perfect intellectual chaos…?

Semantic Networks [Diagram: a network of events and sources laid out along a space axis (Greece, Rome, Germany). Labels: unknown Roman copies “Laokoon”; “LAOKOON” (copy) (in Vatican museum); Winkelmann’s mother; Winkelmann’s birth (archive information?); Winkelmann sees “Laokoon”; Winkelmann writes…, 1755: “…noble simplicity, silent grandeur…” (in a library); Winkelmann’s death time (archive information?); Published Inference (in a library?).]

3 Grand Challenges: (1) We need a rich, integrating global schema: a core and extensions of any depth. Con: impossible, everybody has their own conceptualization. Pro: the CIDOC-FRBR work empirically proves the opposite. (2) “Knitting” the network: without co-reference resolution, facts/triples do not connect. Con: impossible, automatic means are limited and human labor is not scalable. Pro or Con?: LoD. Pro: human labor scales if massively organized. (3) End-users need to query large triple stores effectively. Con: it is impossible to write rich SPARQL statements ad hoc, and impossible to memorize hundreds of properties. Pro: use another, simple global schema for querying.

A Global Schema: The CIDOC CRM. Developed by the CRM Special Interest Group of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). It is an extensible core ontology of 86 classes and 137 properties, describing the underlying semantics of over a hundred database schemata and structures from all museum disciplines, archives and libraries. It is extended by FRBROO, which models IFLA’s FRBR, and soon FRSAD and FRAD (RDFS integration with DC, Europeana EDM and ORE exists). It is the result of 15 years of interdisciplinary work and agreement. In essence, it is a generic model for recording “what has happened” at human scale, i.e. a class of discourse. With it we can generate huge, meaningful networks of knowledge through a simple abstraction: history as meetings of people, things and information. It is an interlingua to transform, transport and merge information from most data structures with clear meaning.

Explicit Events, Object Identity, Symmetry [Diagram: event-centric model of the Crimea Conference. Nodes: E7 Activity “Crimea Conference”; E65 Creation Event; E31 Document “Yalta Agreement”; E39 Actor (three times); E38 Image; E53 Place; E52 Time-Span (twice). Values: 7012124; February 1945; 1945-02-11. Link labels: P7 took place at; P11 participated in; P14 performed; P67 is referred to by; P81 ongoing throughout; P82 at some time within; P86 falls within; P94 has created.]

Data example (RDF-like form)
Epitaphios GE34604 (entity E22 Man-Made Object)
  P30 custody transferred through, P24 changed ownership through
    Transfer of Epitaphios GE34604 (entity E10 Transfer of Custody, E8 Acquisition Event)
      P28 custody surrendered by Metropolitan Church of the Greek Community of Ankara (entity E39 Actor)
      P23 transferred title from
      P29 custody received by Museum Benaki (entity E39 Actor)
      P22 transferred title to Exchangeable Fund of Refugees (entity E40 Legal Body)
        P2 has type national foundation (entity E55 Type)
      P14 carried out by Exchangeable Fund of Refugees (entity E39 Actor)
      P4 has time-span GE34604_transfer_time (entity E52 Time-Span)
        P82 at some time within 1923 – 1928 (entity E61 Time Primitive)
      P7 took place at Greece (entity E53 Place)
        nation (entity E55 Type)
        republic (entity E55 Type)
        P89 falls within Europe (entity E53 Place)
          continent (entity E55 Type)
Slide annotations: Multiple Instantiation; TGN data.
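
The data example can be read as a small graph of (subject, property, object) statements. The sketch below is only an illustration in Python: the nesting is one plausible reading of the flattened slide, the helper names `objects` and `follow` are hypothetical, and properties whose target is not clear from the slide (such as P23) are left out.

```python
# Sketch only: the Epitaphios example as (subject, property, object) triples.
# The nesting is an assumed reading of the slide, not a confirmed one.
TRANSFER = "Transfer of Epitaphios GE34604"

triples = [
    ("Epitaphios GE34604", "P30 custody transferred through", TRANSFER),
    ("Epitaphios GE34604", "P24 changed ownership through", TRANSFER),
    (TRANSFER, "P28 custody surrendered by",
     "Metropolitan Church of the Greek Community of Ankara"),
    (TRANSFER, "P29 custody received by", "Museum Benaki"),
    (TRANSFER, "P22 transferred title to", "Exchangeable Fund of Refugees"),
    (TRANSFER, "P14 carried out by", "Exchangeable Fund of Refugees"),
    (TRANSFER, "P4 has time-span", "GE34604_transfer_time"),
    ("GE34604_transfer_time", "P82 at some time within", "1923 – 1928"),
    (TRANSFER, "P7 took place at", "Greece"),
    ("Greece", "P89 falls within", "Europe"),
]


def objects(subject, prop):
    """Return every object linked to `subject` by `prop`, in list order."""
    return [o for s, p, o in triples if s == subject and p == prop]


def follow(start, *props):
    """Follow a chain of properties from `start`; return all end nodes."""
    frontier = [start]
    for prop in props:
        frontier = [o for node in frontier for o in objects(node, prop)]
    return frontier


# Where did the object end up, and on which continent did the transfer happen?
print(follow("Epitaphios GE34604",
             "P30 custody transferred through", "P29 custody received by"))
print(follow("Epitaphios GE34604",
             "P24 changed ownership through", "P7 took place at",
             "P89 falls within"))
```

Because the transfer is an explicit event node, questions about custody, ownership, time and place all start from the same object and differ only in the property chain that is followed.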

CRM Top-level classes useful for integration [Diagram. Classes: E55 Types; E28 Conceptual Objects; E41 Appellations; E39 Actors; E18 Physical Thing; E2 Temporal Entities; E52 Time-Spans; E53 Places. Link labels: refer to / refine; refer to / identify; participate in; affect or / refer to; location; at; within.]

The CIDOC CRM: The types of relationships. Identification of real-world items by real-world names. Observation and classification of real-world items. Part-decomposition and structural properties of Conceptual & Physical Objects, Periods, Actors, Places and Times. Participation of persistent items in temporal entities, which creates a notion of history: “world-lines” meeting in space-time. Location of periods in space-time and of physical objects in space. Influence of objects on activities and products, and vice versa. Reference of information objects to any real-world item.

The Hierarchy of Participation Properties [Diagram: properties arranged by generalization.] P33 used specific technique (was used by); P16 used specific object (was used for); P142 used constituent (was used in); P146 separated from (lost member by); P29 custody received by (received custody through); P96 by mother (gave birth); P25 moved (moved by); P22 transferred title to (acquired title through); P14 carried out by (performed); P23 transferred title from (surrendered title through); P143 joined (was joined by); P28 custody surrendered by (surrendered custody through); P145 separated (left by); P11 had participant (participated in); P144 joined with (gained member by); P99 dissolved (was dissolved by); P12 occurred in the presence of (was present at); P13 destroyed (was destroyed by); P93 took out of existence (was taken out of existence by); P124 transformed (was transformed by); P100 was death of (died in); P112 diminished (was diminished by); P31 has modified (was modified by); P110 augmented (was augmented by); P108 has produced (was produced by); P123 resulted in (resulted from); P95 has formed (was formed by); P92 brought into existence (was brought into existence by); P98 brought into life (was born); P94 has created (was created by); P135 created type (was created by).

Schema Integration by Property Generalization: Access all data from any level by CRM property generalization. [Diagram: data in Dublin Core, CDWA and MIDAS reach the CIDOC Conceptual Reference Model (CRM) by automatic data export. Top of the CRM: few concepts, high recall (Thing, Actor, Event; “was present at”, “happened at”). Bottom: special concepts, high precision (Acquisition, “used object”).]
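
Property generalization can be illustrated with a short Python sketch. It assumes a small sub-property table (`SUPER`) with a few links as defined in the CIDOC CRM specification; the slide itself does not spell these links out. The facts and the event name "Creation of the Yalta Agreement" are hypothetical. A query at a general property also returns statements recorded with any of its specializations.

```python
# Sketch: querying at a general property also finds specialized statements.
# SUPER maps a property to its direct generalization. The entries are an
# assumption taken from the CIDOC CRM definition, not from the slide.
SUPER = {
    "P22 transferred title to": "P14 carried out by",
    "P14 carried out by": "P11 had participant",
    "P11 had participant": "P12 occurred in the presence of",
    "P94 has created": "P92 brought into existence",
    "P92 brought into existence": "P12 occurred in the presence of",
}

# Hypothetical facts, each recorded at the most specific level available.
facts = [
    ("Transfer of Epitaphios GE34604", "P22 transferred title to",
     "Exchangeable Fund of Refugees"),
    ("Creation of the Yalta Agreement", "P94 has created", "Yalta Agreement"),
]


def generalizations(prop):
    """Yield `prop` followed by each of its ancestors in SUPER."""
    while prop is not None:
        yield prop
        prop = SUPER.get(prop)


def query(triples, prop):
    """Return (subject, object) pairs stated with `prop` or a specialization."""
    return [(s, o) for s, p, o in triples if prop in generalizations(p)]


# General property: high recall. Specific property: high precision.
print(query(facts, "P12 occurred in the presence of"))
print(query(facts, "P14 carried out by"))
```

The data stay at the level of detail their source schema supports; only the query moves up or down the hierarchy.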

Knitting the Network: Extracted Relations & Co-reference. Linking documents via co-reference, not hyperlinks! [Diagram: fact extraction from documents, data and metadata yields primary links, each extracted from one document; fact integration connects them via the CRM as a global classification of relationships (Thing, Actor, Event, Place, Time-Span) and via deductions. Example nodes: Johanson's Expedition; Discovery of Lucy; AL 288-1; Donald Johanson; Lucy; Cleveland Museum of Natural History; Ethiopia; Hadar.]

Co-reference Knowledge and Reality [Diagram with three levels: symbolic level (“vocabulary”), interpretation (“speakers”), real world (“objects”). Two records “M.Smith born 2-5-65” are related by “same as” (data comparison), and by “same as” / “not same as” (direct negotiation).]

Theory of Co-reference: A group of “speakers” (a database) shares unique identifiers for a set of things. Another group “matches” their identifiers to mean the “same as”. The transitive closure of “same as”, taken together with “not same as”, exhibits “impossible worlds”, the only indication of false knowledge at the data level. Ultimate knowledge is what the author meant by “her/him/it”: a part of speech, a database key, an occurrence of a name or URI. Co-reference is primary knowledge, true research, not a “cleaning” issue. Co-reference is more fundamental than schema integration: it supports integration without a schema, and schema integration can be seen as a co-reference problem. Co-reference is more fundamental than reference KOS: no description elements are needed, though reference KOS can help co-reference. Co-reference can be distributed! Automatic “duplicate detection” is based on, and improved by, co-reference; “negotiation with the speakers” is the ultimate confirmation, i.e. scholarly research.
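
The idea that the transitive closure of “same as” can contradict a “not same as” statement can be sketched with a union-find structure. This is a minimal Python illustration, not the method of any particular system; the class name `CoReferenceIndex` and the identifiers `db1:M.Smith` etc. are hypothetical.

```python
class CoReferenceIndex:
    """Sketch: 'same as' clusters via union-find, plus 'not same as' checks."""

    def __init__(self):
        self.parent = {}     # identifier -> representative (union-find forest)
        self.not_same = []   # explicitly asserted 'not same as' pairs

    def find(self, ref):
        """Return the cluster representative of `ref`, registering it if new."""
        self.parent.setdefault(ref, ref)
        while self.parent[ref] != ref:
            self.parent[ref] = self.parent[self.parent[ref]]  # path halving
            ref = self.parent[ref]
        return ref

    def same_as(self, a, b):
        """Assert that two identifiers denote the same thing."""
        self.parent[self.find(a)] = self.find(b)

    def not_same_as(self, a, b):
        """Assert that two identifiers denote different things."""
        self.find(a)
        self.find(b)
        self.not_same.append((a, b))

    def conflicts(self):
        """Return 'not same as' pairs that the closure puts in one cluster."""
        return [(a, b) for a, b in self.not_same
                if self.find(a) == self.find(b)]


index = CoReferenceIndex()
index.same_as("db1:M.Smith", "db2:M.Smith")
index.not_same_as("db1:M.Smith", "db3:M.Smith")
print(index.conflicts())   # nothing contradictory so far
index.same_as("db2:M.Smith", "db3:M.Smith")
print(index.conflicts())   # the closure now contradicts the negative link
```

A non-empty conflict list is the data-level signal of an “impossible world”: at least one of the links involved must be wrong, and only going back to the sources can tell which.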

Co-reference Problem: Query “Friends of a Friend”. 1. Query Source 1 (content: has friend) with input “Martin”; output: “Kostas”. Read the output: find “Kostas”, guess “Κώστας”. 2. Query Source 2 (content: has friend) with input “Κώστας”; output: “George”.

Co-reference via Authority: Join across sources by transitivity of co-reference. [Diagram: the local ids of Source 1 and Source 2 are each matched (first match, second match) against the ids of an authority service; the resulting link table holds “Κώστας” / “Kostas”. The friend-of-a-friend query with input “Martin” on Source 1 continues through the link table into Source 2 and outputs “George”.]
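
The join across sources can be sketched in a few lines of Python. The two dictionaries stand in for the “has friend” content of Source 1 and Source 2, and `link_table` for the co-reference links; all three are hypothetical stand-ins, as are the function names.

```python
# Sketch: a friend-of-a-friend query joined across two sources through
# co-reference links. All data structures here are illustrative only.
source1 = {"Martin": ["Kostas"]}    # Source 1: "has friend", local ids
source2 = {"Κώστας": ["George"]}    # Source 2: "has friend", local ids

# Each cluster groups (source, local id) pairs that denote the same person.
link_table = [
    {("source1", "Kostas"), ("source2", "Κώστας")},
]


def equivalents(source, local_id, target):
    """Return the ids in `target` that co-refer with `local_id` in `source`."""
    for cluster in link_table:
        if (source, local_id) in cluster:
            return sorted(i for s, i in cluster if s == target)
    return []


def friends_of_friends(person):
    """Friends (in Source 2) of the friends (in Source 1) of `person`."""
    result = []
    for friend in source1.get(person, []):
        for alias in equivalents("source1", friend, "source2"):
            result.extend(source2.get(alias, []))
    return result


print(friends_of_friends("Martin"))
```

Without the link table the second lookup would use the string “Kostas”, which Source 2 does not know, and the join would silently return nothing.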

Curating Co-reference without Authority: Join across sources by transitivity of co-reference. [Diagram: no authority service; each source makes a co-reference link from its own local ids. The query with input “Martin” on Source 1 matches “Κώστας” / “Kostas”, and the friend-of-a-friend query continues into Source 2 and outputs “George”.]

Managing Co-reference Clusters [Diagram: reference occurrences (e.g. “M. Doerr”, “M. Dörr”) grouped into clusters. Legend: explicit initial “same as” (n-1); explicit redundant “same as”; implicit link (n(n-1)/2); new link connecting clusters: what happens?] Authority files are good “attractors” of co-reference links, but do not solve co-reference!
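
The two counts in the legend can be checked with a little arithmetic, sketched here in Python. The function names are hypothetical, and the merge count is a consequence of the formulas rather than something stated on the slide: joining clusters of sizes n1 and n2 with one new explicit link implies n1 × n2 new pairwise equivalences.

```python
def explicit_links_needed(n):
    """Minimum number of explicit 'same as' links to connect n references."""
    return max(n - 1, 0)


def implicit_links(n):
    """Number of pairwise equivalences implied inside a cluster of size n."""
    return n * (n - 1) // 2


def new_pairs_after_merge(n1, n2):
    """Pairwise equivalences created when one link joins two clusters."""
    return implicit_links(n1 + n2) - implicit_links(n1) - implicit_links(n2)


for n in (2, 5, 100):
    print(n, explicit_links_needed(n), implicit_links(n))

# One new link between a cluster of 3 and a cluster of 4 references:
print(new_pairs_after_merge(3, 4))
```

This is why a single wrong link between two large clusters is costly: it asserts many equivalences at once, all of which would need to be reviewed.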

A New Service: Global Co-reference Indices. Co-reference links should be persistent and public. Primary co-reference links should be curated and preserved in local databases: “co-reference indices”. Use NER and duplicate-detection algorithms to prepopulate co-reference indices, with appropriate belief values for generated data. Automated, global, distributed consistency control services are feasible. Co-reference indices are much larger than ontologies, but not larger than search engines. Mobilize general users and domain experts to enhance and verify co-reference information by social tagging, to scale up human labor and precision. Install global supervision by open consortia that set the rules and run central services. Then the network may converge to consistent global knowledge. Linked Open Data has no co-reference concept so far; this will lead to a proliferation of URIs.

Last Problem: How to query 250 properties? Humans think consciously in “compressed relations” (G. Fauconnier, “The Way We Think”), in particular omitting events: “What do we have from New Guinea?” There are a few “Fundamental Categories” that partition our concepts (Ranganathan: “Who, When, Where, What…”) and disambiguate most words, e.g., a “museum” is a “who”, a “where” or a “what”. If we implement a simple semantic network with few compressed relationships, we cannot integrate knowledge, because the intermediates are missing, and we cannot manage the immense number of redundant relations. If we implement a CIDOC CRM network, end-users cannot write queries. Solution: define a new “data model” of “Fundamental Categories” and “Fundamental Relationships” for querying only, implemented as automated deductions from a CRM-based network!

How to query with 250 properties? Fundamental Categories: Thing, Actor, Time, Place, Event (E2), Type. Fundamental Relationships: has type / is type of; is similar to or same with; is part of (is member of) / has part (has member); has met; from (has founder or has parent) / is origin, founder, parent, provider or creator of; had (= owns, keeps) / were owned or kept by; at; refers to or is about / is referred by / is referred to at. Relationships change interpretation depending on the category of domain and range.

Thing is about Thing: Path Expression. Following this schema, we have implemented over a hundred deductions such as:
Thing -> P130F.shows_features_of (0,n) OR P130B.features_are_also_found_on (0,n) -> {
  E24.Physical_Man-Made_Thing -> P62F.depicts -> Thing
  OR E24.Physical_Man-Made_Thing -> P128F.carries(0,n) -> E73.Information Object -> P67F.refers_to-> Thing
  D1.Digital_Object -> {L11B.was_output_of -> D3.Formal_Derivation -> L10F.had_input -> D1.Digital_Object ->}(0,n) L11B.was_output_of -> { D7.Digital_Machine_Event -> P9B.forms_part_of(0,n) ->}(0,1) D2.Digitization_Process -> L1F.digitized -> E18.Physical_Thing
}
It works!!!
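
A deduction of this kind can be sketched as a function that tries alternative property paths over a triple set. The Python below is a strong simplification for illustration only: it covers just the two branches “depicts” and “carries an information object that refers to”, ignores the repetition counts, the P130 step and the digital-provenance branch, and uses hypothetical data and function names.

```python
# Sketch: deriving the compressed relationship "Thing is about Thing"
# from two alternative CRM paths. Data and helper names are hypothetical.
triples = [
    ("painting_1", "P62F.depicts", "Laokoon group"),
    ("book_1", "P128F.carries", "text_1"),
    ("text_1", "P67F.refers_to", "Laokoon group"),
    ("text_1", "P67F.refers_to", "Vatican museum"),
]


def objects(subject, prop):
    """Return every object linked to `subject` by `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def is_about(thing):
    """Deduce what `thing` is about by following either path.

    Path 1: thing -> P62F.depicts -> Thing
    Path 2: thing -> P128F.carries -> Information Object
                  -> P67F.refers_to -> Thing
    """
    about = set(objects(thing, "P62F.depicts"))
    for info_object in objects(thing, "P128F.carries"):
        about.update(objects(info_object, "P67F.refers_to"))
    return about


print(sorted(is_about("painting_1")))
print(sorted(is_about("book_1")))
```

The end-user asks one question (“what is this about?”) while the intermediate nodes, such as the carried information object, remain in the network and keep the underlying facts connectable.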

Conclusions: After 50 years of “Artificial Intelligence” research and 15 years of the “Semantic Web”, the Global Network of Knowledge is still a dream. Today, we have the chance to lay foundations for global knowledge network(s!) with limited consistency and a tendency to converge to something more consistent, a limited common language, and a limited way to globally explore deep relationships. For that, we have to: overcome intellectual barriers in conceptual modelling (“quick & dirty”, W3C “beliefs”, ignoring empirical scientific methods, political thinking, domain blindness); organize domain communities to collectively curate data and co-reference through new reward methods; invest in technology and methodology for a long data life-cycle by mapping and transforming data “for ever”, as we have done since antiquity…