Knowledge Organization Systems and Information Discovery Douglas Tudhope Inaugural Lecture.


Acknowledgements
Research team members and collaborators:
- Ceri Binding (University of Glamorgan)
- Andreas Vlachidis (University of Glamorgan)
- Keith May, English Heritage (EH)
- Stuart Jeffrey and Julian Richards, Archaeology Data Service (ADS), Archaeology Department, University of York

Collaborative acknowledgements
Harith Alani, Paul Beynon-Davies, Dorothee Block, Daniel Cunliffe, Emlyn Everitt, Kora Golub, Rachel Heery, Chris Jones, Iolo Jones, Steve Harris, Traugott Koch, Marianne Lykke, Brian Matthews, Stuart Lewis, Hugh Mackay, Jim Moon, Renato Souza, Carl Taylor

Information Discovery
- Literal string matching (e.g. Google) is good for some kinds of search: specific, concrete topics where all we want is some relevant results and we do not care how many we miss.
- It is less good for more conceptual (re)search topics where it is important to be sure nothing significant has been missed, e.g. medical, legal and scholarly research.
- Searching data and documents is a recent general research focus, variously termed eScience, Digital Humanities or Cyberinfrastructure; data.gov.uk is a recent initiative for government data.

Words are tricky!
"When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean--neither more nor less." (Lewis Carroll)
Various potential problems with literal string search:
- Different words mean the same thing
- The same word means different things
- Trivial spelling differences can affect results, as can a particular choice of synonym or a slightly different perspective in the choice of concept
How to address this issue?

This lecture
- Brief look at the history of work on this topic at Glamorgan
- Examples from recent AHRC funded research on cross search of different archaeological datasets and reports, to give a general flavour
- Discussion of some current research issues
- Part of a general move towards a (more) machine understandable Web

Machine readable vs machine understandable
What we say to the machine:
The Cat in the Hat
ISBN:
Author: Dr. Seuss
Publisher: Collins
What the machine understands:

(More) machine understandable
What we say to the machine:
Title: The Cat in the Hat
ISBN:
Author: Dr. Seuss
Publisher: Collins
What the machine understands:
- Book, ID, Author, Publisher: conceptual structure (ontology)
- Theodor Geisel: vocabularies for terminology and knowledge organization

Knowledge Organization Systems
- e.g. classifications, thesauri and ontologies
- Help semantic interoperability
- Reduce ambiguity by defining terms and providing synonyms
- Organise concepts via semantic relationships
- Example: EH Monuments Type Thesaurus
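As a rough illustration of how synonym control can help beyond literal string search, the sketch below compares a literal substring search with a search that first resolves the query to a thesaurus concept. The thesaurus entries, function names and sample records are all hypothetical and are not taken from the EH thesauri; only the midden/refuse heap/dunghill grouping echoes the dictionary entry quoted later in the lecture.

```python
# Toy thesaurus (hypothetical): preferred label -> set of synonyms.
THESAURUS = {
    "refuse heap": {"midden", "rubbish heap", "dunghill"},
    "hearth": {"fireplace"},
}


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def labels_for(term):
    """Return every label of the concept a term belongs to, or just the term."""
    t = normalise(term)
    for preferred, synonyms in THESAURUS.items():
        if t == preferred or t in synonyms:
            return {preferred} | synonyms
    return {t}


def literal_search(query, records):
    """Keep records that contain the query string itself."""
    q = normalise(query)
    return [r for r in records if q in normalise(r)]


def concept_search(query, records):
    """Keep records that contain any label of the query's concept."""
    labels = labels_for(query)
    return [r for r in records if any(label in normalise(r) for label in labels)]
```

With records such as "Midden deposit with shell" and "Rubbish heap near wall", a literal search for "refuse heap" finds nothing, while the concept search returns both.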

Origins of research
- Polytechnic of Wales Research Assistantship (collaborating with Paul Beynon-Davies, Chris Jones - Carl Taylor’s PhD)
- Experimental museum exhibit
- Extract of collections database - Pontypridd Historical and Cultural Centre
- Hard to generalise and maintain if based on manual linking of information → dynamic implicit links
- In this case based on the Social History and Industrial Classification (SHIC) and indexing for place and time period

Indexing on subject, period, place

Similar or different?

FACET - Faceted Access to Cultural hEritage Terminology
Subsequent EPSRC funded project with the Science Museum, the National Railway Museum and the J. Paul Getty Trust - Art & Architecture Thesaurus (AAT)
Aims:
- Integration of thesaurus into user interface
- Semantic query expansion

FACET research question
“The major problem lies in developing a system whereby individual parts of subject headings containing multiple AAT terms are broken apart, individually exploded hierarchically, and then reintegrated to answer a query with relevance” (Toni Petersen, AAT Director)
Example query: mahogany, dark yellow, brocading, Edwardian, armchair, for the National Railway Museum collection - e.g. royal carriage
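The idea of breaking a multi-term query apart, exploding each term hierarchically and reintegrating the results into a ranked answer can be sketched as follows. This is a minimal illustration under stated assumptions, not the FACET matching function: the narrower-term links, the decay factor and the averaging rule are all hypothetical choices.

```python
# Hypothetical narrower-term links; real AAT hierarchies are far larger.
NARROWER = {
    "wood": ["mahogany", "oak"],
    "chair": ["armchair", "side chair"],
    "yellow": ["dark yellow"],
}


def explode(term, decay=0.5):
    """Return {term: weight} for a term and all of its narrower terms.

    Each step down the hierarchy multiplies the weight by `decay`
    (an assumed closeness measure, chosen only for illustration).
    """
    weights = {term: 1.0}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child in NARROWER.get(current, []):
            w = weights[current] * decay
            if w > weights.get(child, 0.0):
                weights[child] = w
                frontier.append(child)
    return weights


def score(query_terms, index_terms, decay=0.5):
    """Average, over query terms, of the best-weighted expanded match."""
    if not query_terms:
        return 0.0
    total = 0.0
    for q in query_terms:
        expanded = explode(q, decay)
        total += max(
            (w for t, w in expanded.items() if t in index_terms), default=0.0
        )
    return total / len(query_terms)


def rank(query_terms, records, decay=0.5):
    """Return (record id, score) pairs, best first, dropping zero scores."""
    scored = [(score(query_terms, terms, decay), rid) for rid, terms in records.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [(rid, s) for s, rid in ordered if s > 0]
```

Here a record indexed with exactly the query terms scores highest, while records indexed with narrower terms still appear, lower down, rather than being missed.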

FACET Web Demonstrator - Semantic Query Expansion

FACET Web Demonstrator - how to generalise?
- FACET offers more sophisticated search but is still a single database
- How to generalise to multiple datasets and thesauri?
- How to connect with text documents?

STAR Semantic Technologies for Archaeological Resources
AHRC funded project(s) with English Heritage and the ADS
Generalise previous methods to:
- Different datasets with different structures
- Reports of excavations: ADS OASIS Grey Literature Library (unpublished reports)
OASIS: Online AccesS to the Index of archaeological investigationS

STAR Semantic Technologies for Archaeological Resources
- Currently excavation datasets are isolated, with different terminology systems
- Currently there is no connection with grey literature excavation reports
Aims: cross search, at a conceptual level, archaeological datasets with associated grey literature

STAR Semantic Technologies for Archaeological Resources
- Need for an integrating conceptual framework and terminology control via thesauri and glossaries
- EH (Keith May) designed an ontology describing the archaeological process

The archaeological process Events in the present and events in the past, related by the place in which they occur and the physical remains in that place Activities in the present investigate the remains of the past (affecting them in the process)

Events in the present Excavation // Drawing and Photography Survey // Sampling Treatments and Processing Classification // Grouping and Phasing Measuring including scientific dating Recording of observations Dissemination // Interpretation // Analysis

Events in the past have results in the present
- Events shaping the natural environment: geological, environmental and biological processes
- Events concerned with object production, disposal or loss (how ‘finds’ were produced and later deposited in an archaeological context)
- Construction, modification and destruction events relating to human buildings

Events in the past have results in the present
- Conceptual framework to model these archaeological events (an EH extension of a standard cultural heritage ontology)
- Need to move beyond the simple Who - What - Where - When model typically used in state of the art cultural heritage databases

Typical ‘Advanced Search’ model - does not deal with events
Typical Who - What - Where - When advanced search user interface: the Who, What, Where and When fields are combined with and/or to retrieve Resources.

Typical ‘Advanced Search’ limitations
- The Who - What - Where - When model needs more semantics, e.g. archaeological ‘find’ (e.g. coin) versus archaeological ‘context’ (e.g. hearth).
- Need to define relationships between entities and allow multiple connections: When was the photo taken? When was the ‘find’ originally made? When was the ‘find’ deposited?
- Assigning dates and classifying are important ‘events’ in the present, outcomes of the archaeological process (interpretations can differ): Who made the dating judgment?

Broader conceptual framework (ontology)
Modelling multiple interpretations, linked to underlying data within the ontology → ‘multivocality’ in archaeology

Broader conceptual framework (ontology)
- EH extension of the CIDOC Conceptual Reference Model (CRM)
- Explicit modelling of archaeological events - complicated!
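To give a flavour of why an event-based model answers questions that a flat Who - What - Where - When record cannot, the sketch below attaches several dated events to a single find, including two differing dating judgments. The class, field names and sample data are hypothetical and deliberately simplified; they are not CIDOC CRM classes or the EH extension.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One event involving a find or context (hypothetical, simplified model)."""
    kind: str                    # e.g. "production", "deposition", "photography", "dating"
    subject: str                 # identifier of the find or context concerned
    when: str                    # free-text date or period
    who: Optional[str] = None    # person responsible, if recorded
    place: Optional[str] = None  # context or location, if recorded


# Invented sample data: one coin with several distinct "when" answers.
EVENTS: List[Event] = [
    Event("production", "coin-1", "Roman"),
    Event("deposition", "coin-1", "Late Roman", place="hearth-7"),
    Event("photography", "coin-1", "2009", who="site photographer"),
    Event("dating", "coin-1", "3rd century", who="specialist A"),
    Event("dating", "coin-1", "4th century", who="specialist B"),
]


def events_for(events, subject, kind=None):
    """Return the events for a subject, optionally restricted to one kind."""
    return [e for e in events if e.subject == subject and (kind is None or e.kind == kind)]


def when(events, subject, kind):
    """Answer 'when?' for a given kind of event; there may be several answers."""
    return [(e.when, e.who) for e in events_for(events, subject, kind)]
```

Asking `when(EVENTS, "coin-1", "dating")` returns both judgments with their authors, which is one simple way of keeping differing interpretations side by side.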

STAR general architecture
- STAR client applications: Windows applications; browser components; full text search; browse concept space; navigate via expansion; cross search archaeological datasets
- STAR web services
- STAR datasets (expressed in terms of CRM): EH thesauri and CRM ontology; archaeological datasets (CRM); grey literature indexing (CRM)

Natural Language Processing (NLP) of archaeological grey literature
- Extract key concepts in the same semantic representation as for data
- Allows unified searching of different datasets and grey literature in terms of the same underlying conceptual structure
Example: “ditch containing prehistoric pottery dating to the Late Bronze Age”

NLP output – what the machine sees!

STAR Demonstrator – search for a conceptual pattern An Internet Archaeology publication on one of the (Silchester Roman) datasets we used in STAR discusses the finding of a coin within a hearth. -- does the same thing occur in any of the grey literature reports? Requires comparison of extracted data with NLP indexing in terms of the ontology.

STAR Demonstrator – search for a conceptual pattern Research paper reports finding a coin in hearth – exist elsewhere?
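Searching for a conceptual pattern such as "a coin found in a hearth" across both excavation data and NLP-indexed reports can be sketched as matching over simple statements that share one vocabulary. The statement layout, the predicate names ("type", "found_in") and the source labels below are assumptions made for illustration, not the representation used by the STAR demonstrator.

```python
from collections import namedtuple

# A statement records who said what: (subject, predicate, object, source).
Statement = namedtuple("Statement", "subject predicate obj source")


def finds_in_context(statements, find_type, context_type):
    """Return (find, context, source) for finds of one type within contexts of another."""
    # First pass: collect the declared types of every entity.
    types = {}
    for s in statements:
        if s.predicate == "type":
            types.setdefault(s.subject, set()).add(s.obj)

    # Second pass: keep 'found_in' links whose two ends have the wanted types.
    hits = []
    for s in statements:
        if (
            s.predicate == "found_in"
            and find_type in types.get(s.subject, ())
            and context_type in types.get(s.obj, ())
        ):
            hits.append((s.subject, s.obj, s.source))
    return hits
```

Because statements from a dataset and statements extracted from a report use the same types and predicates, one query covers both kinds of source.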

Current issues and goals
a) Apply research outcomes in practice (knowledge transfer): semantic terminology services; ‘rubbish example’ using the ADS Archaeology Image Bank
b) NLP challenges: negation! → Negative findings?
c) Multivocality in archaeology: broader picture of the research issues

Archaeology is rubbish! Google search for archaeology rubbish

ADS Archaeology Image Bank Example No results when search for rubbish or refuse – what to do?

STAR Semantic Terminology Services - concept expansion (as web service) → midden

MIDDEN n dunghill, refuse heap midden dunghill, compost heap, refuse heap,... muddle, mess... dirty slovenly person... midden mavis or midden raker --- searchers of refuse heaps (Concise Scots dictionary - Mairi Robinson, Scottish National Dictionary Association)

ADS Archaeology Image Bank Example No results when search for rubbish or refuse – try midden!

NLP challenges – not just negation detection

NLP challenges – need for negative findings!
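A very naive way to see why negation matters is to flag a concept mention as negated when a negation cue occurs a few words before it. The cue list, window size and function names below are assumptions chosen for illustration; this is a toy heuristic, not the NLP pipeline used in STAR, and it ignores many real cases (scope, double negatives, cues after the mention).

```python
import re

# Hypothetical, deliberately short list of negation cues.
CUES = {"no", "not", "without", "absent"}


def tokens(sentence):
    """Split a sentence into lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def annotate(sentence, concepts, window=4):
    """Return (concept, negated) pairs for single-word concepts in a sentence.

    A mention counts as negated if any cue appears within `window`
    tokens before it.
    """
    toks = tokens(sentence)
    results = []
    for i, tok in enumerate(toks):
        if tok in concepts:
            preceding = toks[max(0, i - window):i]
            results.append((tok, any(cue in preceding for cue in CUES)))
    return results
```

Keeping the negated mentions, rather than discarding them, is what allows a report of "no pottery" to be recorded as a negative finding instead of a spurious match for pottery.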

Archaeologists have to plan for the future
“Research excavations, therefore, must be planned for posterity, eschewing the quick answer and setting up a framework of excavation and recording which can be handed over, extended, modified and improved over decades and in some cases, centuries.” Techniques of Archaeological Excavation, Philip Barker (1993)
Archaeology in particular lends itself to the reuse of (excavation) data:
- Connect interpretations with the underlying data
- Revisit previous archaeological interpretations and findings - excavations are inevitably based on a limited sample

Archaeological Multivocality - more voices involved than just the original project team?
- Expose (invisible) datasets for wider analysis and reuse
- Meta studies comparing different excavation projects
- Connect datasets and wider grey literature - look for wider patterns
- Open up a broader range of research questions that might be answered when we connect currently isolated excavation datasets
- Allow different communities to share data and expertise

Words are tricky!
We should have a great many fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves. (John Locke)
- Emergent classification? An outcome of the archaeological process, both constructing and constraining the world
- Map between different classifications and glossaries rather than one imposed standard?

Words are tricky!
Words are not as satisfactory as we should like them to be, but, like our neighbours, we have got to live with them and must make the best and not the worst of them. (Samuel Butler)
Major issues remain, but knowledge organization systems offer some current assistance for moving beyond literal string search and making the best of the words we have to use.