Download presentation
Presentation is loading. Please wait.
1
Principles and pragmatics of a Semantic Culture Web Tearing down walls and Building bridges
2
Overview Virtual collections and Semantic Web Semantic collection-search demonstrator –For cultural heritage objects Metadata & vocabulary representation and enrichment Principles for knowledge engineering on the Web
3
Part of large Dutch knowledge-economy project MultimediaN Partners: VU, CWI, UvA, DEN,ICN People : Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga Artchive.com, Rijksmuseum Amsterdam, Dutch ethnology musea (Amsterdam, Leiden), National Library (Bibliopolis) Acknowledgements
4
Hypothesis Semantic Web technology is in particular useful in knowledge-rich domains or formulated differently If we cannot show added value in knowledge-rich domains, then it may have no value at all
5
The Web: resources and links URL Web link
6
The Semantic Web: typed resources and links URL Web link ULAN Henri Matisse Dublin Core creator Painting “Woman with hat SFMOMA
10
Principle 1: semantic annotation Description of web objects with “concepts” from a shared vocabulary
11
Principle 2: semantic search Search for objects which are linked via concepts (semantic link) Use the type of semantic link to provide meaningful presentation of the search results Paris Montmartre PartOf Query “Paris”
12
The myth of a unified vocabulary In large virtual collections there are always multiple vocabularies –In multiple languages Every vocabulary has its own perspective –You can’t just merge them But you can use vocabularies jointly by defining a limited set of links –“Vocabulary alignment” It is surprising what you can do with just a few links
13
Principle 3: vocabulary alignment “Tokugawa” SVCN period Edo SVCN is local in-house ethnology thesaurus AAT style/period Edo (Japanese period) Tokugawa AAT is Getty’s Art & Architecture Thesaurus
14
A link between two thesauri
15
Levels of interoperability Syntactic interoperability –using data formats that you can share –XML family is the preferred option Semantic interoperability –How to share meaning / concepts –Technology for finding and representing semantic links
17
Distributed vs. centralized collection data Minimal requirement: collection object has image URI Preference for external metadata, accessed through protocol such as OAI In practice, external metadata access is still cumbersome
18
http://e-culture.multimedian.nl/demo/search
19
Search strategies Basic search: keyword-oriented Advanced search: –Tweaking default search parameters –Time-related queries Faceted search Relation search –How are two URIs related?
21
Keyword search with semantic clustering 1.Btree of literals plus Porter stem and metaphone index 2.Find resources with matching labels Default resources are “Work”s 3.Find related resources by one-way graph traversal owl:inverseOf is used Threshold used for constraining search 4.Cluster results (group instances)
25
Search: WordNet patterns that increase recall without sacrificing precisions
26
Term disambiguation is key issue in semantic search Post-query –Sort search results based on different meanings of the search term –Mimics Google-type search Pre-query –Ask user to disambiguate by displaying list of possible meanings –Interface is more complex, but more search functionality can be offered
27
Faceted search Use Dublin Core scheme to formulate complex queries Navigate through relevant metadata
28
Faceted search Faceted search
29
What do you need to do to make your collection part of a Semantic Culture Web? Four activities
30
From metadata to semantic metadata 1. Make vocabulary interoperable 2. Align metadata schema 3. Enrich metadata 4. Align vocabulary
31
Activity 1: syntactic vocabulary interoperability Making vocabularies available in the Web standard RDF Many organizations already do this W3C provides the SKOS template to make this almost straightforward Effort required: at most a few days
33
33 Multi-lingual labels for concepts
34
34 Semantic relation: broader and narrower No subclass semantics assumed!
36
Activity 2: aligning the metadata schema Specify your collection metadata scheme as a specialization of Dublin Core With RDF/OWL this is easy/trivial! Cf. DC Application Profiles
37
Aligning VRA with Dublin Core VRA is specialization of Dublin Core for visual resources VRA properties “material.medium” and “material.support” are specializations of Dublin Core property “format ” vra:material.medium rdfs:subPropertyOf dc:fotmat. vra:material.medium rdfs:subPropertyOf dc:format.
38
Activity 3: enriching the metadata Extracting additional concepts from an annotation –Matching the string “Paris” to a vocabulary term Information-extraction techniques exists (and continue to be developed) Effort required can be up to a few weeks –The more concepts, the better, but no need to be perfect!
39
Example textual annotation
40
Resulting semantic annotation (rendered as HTML with RDFa)
41
41 RDFa: embedding RDF in (X)HTML Regular HTML Resulting RDF statements HTML with RDFa
42
Activity 4: aligning the vocabulary Find semantic links between vocabulary links –Derain (ULAN) related-to Fauve (AAT)) Automatic techniques exists, but performance varies Often combination of automatic and manual alignment Effort strongly dependent on vocabularies –But “a little semantic goes a long way” (Hendler)
43
Learning alignments Learning relations between art styles in AAT and artists in ULAN through NLP of art historic texts –“Who are Impressionist painters?”
44
Extracting additional knowledge from scope notes
45
Principles for knowledge engineering on the Web
46
Principle 1: Be modest! Ontology engineers should refrain from developing their own idiosyncratic ontologies Instead, they should make the available rich vocabularies, thesauri and databases available in web format Initially, only add the originally intended semantics
47
Principle 2: Think large! "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing." Doug Lenat
48
Principle 3: Develop and use patterns! Don’t try to be (too) creative Ontology engineering should not be an art but a discipline Patterns play a key role in methodology for ontology engineering See for example patterns developed by the W3C Semantic Web Best Practices group http://www.w3.org/2001/sw/BestPractices/ SKOS can also be considered a pattern
49
Principle 4: Don’t recreate, but enrich and align Techniques: –Learning ontology relations/mappings –Semantic analysis, e.g. OntoClean –Processing of scope notes in thesauri
50
Principle 5: Beware of ontological over-commitment!
51
Principle 6: Specifying a data model in OWL does ot make it an ontology! Papers about your own idiosyncratic “university ontology” should be rejected at SW conferences The qality of an ontology does not depend on the number of OWL constrcts sed
52
Principle 7: Required level of formal semantics depends on the domain! In our semantic search we use three OWL constructs: –owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty But cultural heritage has is very different from medicine and bioinformatics –Don’t over-generalize on requirements for e.g. OWL
53
Perspectives Basic Semantic Web technology is ready for deployment Research themes: –Scalability, vocabulary alignment, metadata extraction Web 2.0 facilities fit well: –Involving community experts in annotation –Personalization Social barriers have to be overcome!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.