culturegraph.org: Building a Hub for Linked Library Data. Markus M. Geipel, Adrian Pohl.

Presentation transcript:

1 culturegraph.org: Building a Hub for Linked Library Data. Markus M. Geipel, Adrian Pohl

2 Table of Contents
1. The Linked Data Challenge
2. Culturegraph Platform
   1. Resolving & Lookup
   2. Process & Technology
   3. RDF Modelling
3. Current State

3 Paradigm shift in modeling knowledge/data: from isolated tables to a network beyond organizational boundaries

4 From isolated Tables to a Semantic Network: a naïve approach
1. Transform from Marc21/Mab2/Pica to RDF
2. Put everything into a triplestore
3. SPARQL and reasoner do the magic
What is wrong with this approach?

5 Format is not Content! If you pour water into a wine glass, does it change to wine? How can you expect old Marc21 data to change into a semantically rich, reasoner-ready piece of information just by changing the data format to RDF?
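The point of slides 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record, its field names, the `http://example.org/record/` namespace and the field-to-predicate table are invented here and are not taken from the Culturegraph code. The sketch converts a flat record into N-Triples and shows that every object remains a plain string literal, so no links to other resources appear.

```python
# Hypothetical flat catalogue record; field names are illustrative only.
record = {
    "id": "rec-001",
    "title": "Faust",
    "creator": "Goethe, Johann Wolfgang von",
    "edition": "2. Aufl.",
}

BASE = "http://example.org/record/"  # placeholder namespace (assumption)

# Assumed mapping of fields to well-known predicates.
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
    "edition": "http://purl.org/ontology/bibo/edition",
}


def naive_to_ntriples(rec):
    """Emit one N-Triples line per mapped field, copying each value as a literal."""
    subject = f"<{BASE}{rec['id']}>"
    lines = []
    for field, predicate in PREDICATES.items():
        if field not in rec:
            continue
        escaped = rec[field].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{predicate}> "{escaped}" .')
    return lines


triples = naive_to_ntriples(record)
for line in triples:
    print(line)

# Count the triples whose object is a literal rather than a URI.
literal_objects = [t for t in triples if t.split(" ", 2)[2].startswith('"')]
print(f"{len(literal_objects)} of {len(triples)} objects are plain strings")
```

The creator is still the character string it was in the source record; the syntax changed, the content did not.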

6 Connections don't come for free. Some challenges:
1. No universally unique ID
2. Often no references to entities, just character strings
3. No controlled vocabulary (example: 1.3 million different values for the edition field)
4. Changing cataloging practices
5. Mistakes, typos

7 Culturegraph as a signpost: a coherent picture of bibliographic data
[Figure: hidden duplicates, different services, different interfaces (?) versus Culturegraph (!)]

8 Culturegraph as a Platform to interlink Bibliographic Data
1. Open tools
   - Open algorithms and code; reuse
2. Integration into existing workflows
   - Synchronization of data
   - Integration of results into original data sources
3. Publication of results
   - Connections and views, not the entire aggregated data
   - Linked Open Data/RDF
4. Persistence of results
   - Integration into URN resolving infrastructure
5. Tracking provenance

9 First Project: Resolving & Lookup
Universally unique and persistent IDs
- Input: 6 main German bibliographic catalogues
- Objective: bundling of manifestations
- Service:
  - Publication of bundles
  - Minting of URNs for approved bundles
  - Search for bundles using established identifiers
- Part of the DDB ecosystem
  - Support for data aggregation

10 The Process
1. Translate into internal format
   1. Mapping of fields to properties
   2. Normalization, cleaning, regexp matching, etc., defined in XML
2. Database ingest: > 80 million records, > one billion properties
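Step 1 might look roughly like the following Python sketch. The slides say the real mappings and normalization rules are defined in XML; the dictionary-based configuration, the function names and the sample record below are assumptions made for illustration only.

```python
import re

# Assumed mapping from source field codes to internal property names.
# The codes follow common Marc21 usage but the selection is illustrative.
FIELD_TO_PROPERTY = {"020a": "isbn", "245a": "title", "260c": "year"}


def normalize_isbn(value):
    """Keep the leading ISBN token and drop hyphens, spaces and qualifiers."""
    match = re.match(r"\s*([0-9][0-9Xx -]*)", value)
    return re.sub(r"[ -]", "", match.group(1)).upper() if match else ""


def normalize_title(value):
    """Lowercase, collapse whitespace, strip cataloguing punctuation at the ends."""
    return re.sub(r"\s+", " ", value.lower()).strip(" /:;.,")


def normalize_year(value):
    """Extract the first standalone four-digit number, e.g. from 'c1999.'."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", value)
    return match.group(1) if match else ""


NORMALIZERS = {"isbn": normalize_isbn, "title": normalize_title, "year": normalize_year}


def translate(raw_record):
    """Map source fields to internal properties and normalize their values."""
    internal = {}
    for field, value in raw_record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields are simply dropped in this sketch
        normalized = NORMALIZERS[prop](value)
        if normalized:
            internal[prop] = normalized
    return internal


# Made-up source record for demonstration.
raw = {
    "020a": "3-16-148410-0 (pbk.)",
    "245a": "Faust :  eine Tragödie /",
    "260c": "c1999.",
    "999x": "local note",
}
internal = translate(raw)
print(internal)
```

Keeping the rules in a declarative table, as the slides do with XML, means a new source format needs a new mapping rather than new code.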

11 The Process (continued)
3. Generate unique properties (> 50 million*)
   - Combinations of properties defined in XML
4. Group by unique properties
5. Merge equivalent groups: ca. 18 million records* in groups
* For a first simple matching algorithm
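Steps 3 to 5 can be sketched as follows, assuming records are already in an internal property format. The key definitions, the sample records and the use of a union-find structure for merging are illustrative choices, not a description of the actual Culturegraph implementation.

```python
from collections import defaultdict

# Made-up records in a hypothetical internal format.
records = {
    "A1": {"isbn": "3161484100", "year": "1999", "title": "faust", "creator": "goethe"},
    "B7": {"isbn": "3161484100", "year": "1999"},
    "C3": {"title": "faust", "creator": "goethe", "year": "1999"},
    "D9": {"isbn": "0306406152", "year": "1980"},
}

# Assumed combinations of properties that together act as a matching key.
KEY_DEFINITIONS = [("isbn", "year"), ("title", "creator", "year")]


def unique_properties(record):
    """Step 3: build one key per definition whose fields are all present."""
    keys = []
    for fields in KEY_DEFINITIONS:
        if all(f in record for f in fields):
            keys.append("|".join(f"{f}={record[f]}" for f in fields))
    return keys


def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def bundle(recs):
    # Step 4: group record ids by each unique property.
    groups = defaultdict(list)
    for rec_id, rec in recs.items():
        for key in unique_properties(rec):
            groups[key].append(rec_id)
    # Step 5: merge groups that share a record (union-find).
    parent = {rec_id: rec_id for rec_id in recs}
    for members in groups.values():
        root = find(parent, members[0])
        for other in members[1:]:
            parent[find(parent, other)] = root
    bundles = defaultdict(set)
    for rec_id in recs:
        bundles[find(parent, rec_id)].add(rec_id)
    return [sorted(b) for b in bundles.values() if len(b) > 1]


print(bundle(records))
```

Here B7 and C3 share no key with each other, but both share one with A1, so merging the equivalent groups places all three in a single bundle, while D9 stays unbundled.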

12 The Process (next steps)
6. Check quality & mint persistent IDs
7. Publication as Linked Data
[Figure: bundles labelled Id1, Id2, Id3]

13 Representing bundles of bibliographic records in RDF

14 Namespaces for internal bibliographic description
- rdf:
- bibo:
- dcterms:
- frbr:
- foaf:
- cg: (not established yet)
- others


16 Matching & Bundling
- Different matching criteria to be discussed
- Example: sameness of ISBN & year
- Matching algorithms can be created and modified easily
- Matched resources are bundled and the underlying algorithm is indicated
- Bundle Ontology:



19 Minting Über-Identifiers
- In the last step, IDs for bibliographic resources may be minted
- urn:nbn:de:cg
- Based on a reliable, agreed-upon algorithm
- Record-resource linking by foaf:isPrimaryTopicOf
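The record-resource linking could be expressed as N-Triples along the following lines. This is a sketch under assumptions: the identifier `urn:example:bundle-1` and the record URIs are placeholders, since the slides do not show a complete Culturegraph URN, and the direction (resource `foaf:isPrimaryTopicOf` record) follows the FOAF definition of that property.

```python
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def link_bundle(resource_id, record_uris):
    """State that the resource is the primary topic of each bundled record."""
    return [
        f"<{resource_id}> <{FOAF_IS_PRIMARY_TOPIC_OF}> <{uri}> ."
        for uri in sorted(record_uris)
    ]


# Placeholder identifier, NOT a real Culturegraph URN.
resource_id = "urn:example:bundle-1"
# Placeholder URIs for the catalogue records in one bundle.
record_uris = {
    "http://example.org/record/B7",
    "http://example.org/record/A1",
}

links = link_bundle(resource_id, record_uris)
for line in links:
    print(line)
```

Each catalogue record keeps its own URI; only the bundle as a whole receives the minted identifier.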


21 Future prospects
- Workflow integration: share, enrich and reuse metadata right from the start
- New features/projects, from concrete to visionary:
  1. Integration of GND references (from BEACON files and other sources)
  2. Computation of links to further resources (subject headings, geo coordinates, person names, Wikipedia)
  3. Authority file for works
  4. Crowdsourcing (enrich and correct descriptions of titles, works, persons, etc.)

22 Summary
Markus M. Geipel | culturegraph.org | 5 October
- Culturegraph will
  - match the main German library catalogues
  - give each bibliographic resource a persistent ID
- State
  - Basic infrastructure up and running with good performance (80 million records matched in one hour)
  - All source code published on SourceForge
  - First demonstrator web portal at
- Soon to come
  - January:
    - Operational web portal
    - Publication of first matching results (HTML, RDF, etc.)
  - Next year:
    - Persistent IDs

23 Appendix: Project staff
- Daniel Schäfer (DNB): project lead
- Katja Mecklinger (DNB): deputy project lead, public relations
- Markus Geipel (DNB): head of architecture and development
- Adrian Pohl (hbz): public relations, ontology
- Pascal Christoph (hbz): architecture
- Julia Hauser (DNB): ontology
- Lars Svensson (DNB): ontology
- Jürgen Kett (DNB): project management, public relations