Enhancing social tagging with a knowledge organization system Brian Matthews STFC.


Outline
• Who are STFC?
• Controlled Vocabulary
• Social Tagging
• EnTag
  – Aims
  – Glamorgan/UKOLN/Intute Experiment
  – STFC Experiment
• SKOS

Science and Technology Facilities Council
• Provide large-scale scientific facilities for UK Science
  – particularly in physics and astronomy
• E-Science Centre – at RAL and DL
  – Provides advanced IT development and services to the STFC Science Programme
  – Also includes library and institutional repository
  – Strong interest in Digital Curation of our science data
  – Keep the results alive and available
  – R&D Programme: DCC, CASPAR, EnTag

Controlled Vocabulary
• Traditional way of providing subject classification
  – For shelf-marking
  – For searching
  – For association of resources
• Several different types used, such as
  – Subject Classification
  – Keyword lists
  – Thesaurus
• Each has different characteristics

HASSET (I)
• UK Data Archive, Univ of Essex
• Humanities and Social Science Electronic Thesaurus
• Some thousands of terms
• Structure based on British Standard 5723:1987/ISO (Establishment and development of monolingual thesauri)
• Preferred terms, broader-narrower relations, associated terms
archive.ac.uk/search/hassetSearch.asp
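The thesaurus structure listed on this slide (preferred terms, broader-narrower relations, associated terms) can be sketched as a small data structure. This is a minimal illustration only: the class names, method names and example terms are assumptions made for this sketch and are not taken from HASSET itself.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relations to other preferred terms."""
    label: str
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)   # "associated terms"
    use_for: set = field(default_factory=set)   # non-preferred synonyms


class Thesaurus:
    """Hypothetical in-memory thesaurus keyed by preferred label."""

    def __init__(self):
        self.terms = {}        # preferred label -> Term
        self._preferred = {}   # lower-cased label or synonym -> preferred label

    def add(self, label, use_for=()):
        term = self.terms.setdefault(label, Term(label))
        self._preferred[label.lower()] = label
        for alt in use_for:
            term.use_for.add(alt)
            self._preferred[alt.lower()] = label
        return term

    def link_broader(self, narrower, broader):
        # Broader/narrower is recorded in both directions.
        self.add(narrower).broader.add(broader)
        self.add(broader).narrower.add(narrower)

    def link_related(self, a, b):
        # The associative relation is symmetric.
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def preferred(self, text):
        """Map free text to its preferred term, or None if unknown."""
        return self._preferred.get(text.lower())


# Illustrative terms only, not real HASSET entries.
thesaurus = Thesaurus()
thesaurus.add("higher education", use_for=["tertiary education"])
thesaurus.link_broader("higher education", "education")
thesaurus.link_related("higher education", "universities")
```

Recording each relation in both directions keeps lookups cheap at the cost of a little redundancy, which is usually acceptable for a vocabulary of a few thousand terms.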

HASSET (II)

HASSET (III)

Observations on using controlled vocabularies
• Precise classification of resources
  – Good for precision and recall
• Can exploit the hierarchy to modify query
  – Using the broader/narrower/related terms
• Highly expensive
  – Requires investment in specialist expertise to devise the vocabulary
  – Requires investment in specialist expertise to classify resources
• Hard to maintain currency
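Exploiting the hierarchy to modify a query can be sketched as follows. The vocabulary below is a hypothetical toy example, and `expand_query` is a name invented for this sketch; it simply walks the chosen relations outward from the starting term for a fixed number of steps.

```python
# Hypothetical mini-vocabulary (term -> relations); not real thesaurus content.
VOCAB = {
    "education": {"narrower": ["higher education", "primary education"]},
    "higher education": {
        "broader": ["education"],
        "narrower": ["postgraduate study"],
        "related": ["universities"],
    },
    "primary education": {"broader": ["education"]},
    "postgraduate study": {"broader": ["higher education"]},
    "universities": {"related": ["higher education"]},
}


def expand_query(term, vocab, use=("narrower",), depth=1):
    """Return the term plus every term reachable through the relations
    named in `use`, following at most `depth` steps."""
    seen = {term}
    frontier = [term]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            entry = vocab.get(current, {})
            for relation in use:
                for neighbour in entry.get(relation, []):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.append(neighbour)
        frontier = next_frontier
    return seen
```

Following narrower terms widens a search that returns too little, while following broader or related terms is one way to suggest alternatives when a query is too specific.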

Social Tagging
• The Web 2.0 way of providing search terms
• People “tag” resources with free-text terms of their own choosing
• Tags used to associate resources together
• del.icio.us, flickr
• “Folksonomy”
  – the terms a community chooses to use to tag its resources

Connotea

Connotea – sharing tags

Connotea – Tag Cloud

Observations on Social Tagging
• People often use the same tags or keywords (e.g. Preservation, Digital Library)
  – this makes things which mean the same thing to people easier to find
• Cheap way of getting a very large number of resources marked up and classified
  – Represents the “community consensus” in some sense
  – “The Wisdom Of Crowds”
  – Has currency as people update
  – Tag clouds of popular tags
• However, people often use similar but not the same tags:
  – e.g. Semantic Web, SemanticWeb, SemWeb, SWeb
• People make mistakes in tags
  – misspellings, using spaces incorrectly
• Some tags are more specific than others:
  – e.g. controlled vocabulary, thesaurus, HASSET
• People often associate the same words together with particular ideas in images
  – these are captured in clusters
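The tag variants and mistakes noted above suggest a simple clean-up step before counting tags for a tag cloud. The sketch below is one possible approach and not the method used in the project: spacing and case variants are folded by normalisation, abbreviations such as SemWeb need a hand-made alias table (an assumption of this sketch), and likely misspellings are folded with the standard-library `difflib` matcher.

```python
import re
from collections import Counter
from difflib import get_close_matches


def normalise(tag):
    """Lower-case a tag and drop spaces, hyphens and underscores."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


def cluster_tags(tags, aliases=None, cutoff=0.85):
    """Count tags after folding variants together.

    `aliases` maps a normalised abbreviation to its normalised full form.
    Near-misses above `cutoff` are folded into a key already seen, so the
    result depends on order: the first spelling seen becomes the key.
    """
    aliases = aliases or {}
    counts = Counter()
    for tag in tags:
        key = normalise(tag)
        key = aliases.get(key, key)
        close = get_close_matches(key, list(counts), n=1, cutoff=cutoff)
        counts[close[0] if close else key] += 1
    return counts


# Abbreviations cannot be recovered by string similarity alone.
ALIASES = {"semweb": "semanticweb", "sweb": "semanticweb"}
```

The resulting counts can feed a tag cloud directly. A controlled vocabulary could replace the alias table by supplying the preferred form for each variant.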

EnTag Project: Enhanced tagging for discovery
• JISC funded project
• Partners
  – UKOLN
  – University of Glamorgan
  – STFC
  – Intute
  – Non-funded: OCLC Office of Research, USA; Danish Royal School of Library and Information Science
Period: 1 Sep Sep

EnTag Background
• Controlled vocabularies
  – Improve information retrieval and discovery
  – But costly to index with, especially given the amount of digital documents
  – Require subject and classification experts
• Social tagging
  – Holds the promise of reducing indexing costs
  – Uses terms describing how people see the resource
  – Serendipity
  – But tags are uncontrolled, with missed associations
      Relating different views
      Highly personal (“me”, “important”)
      Quality and ranking
      Depth of term

EnTag Purpose
Investigate the combination of controlled and social tagging approaches to support resource discovery in repositories and digital collections
Aim to investigate
  – whether use of an established controlled vocabulary can help move social tagging beyond personal bookmarking to aid resource discovery

EnTag Objectives
Investigate indexing aspects when using only social tagging versus when using social tagging in combination with a controlled vocabulary
In particular, does this lead to:
• Improved tagging
  – Relevance of tags (perspective, aspects, specificity, exhaustivity, terminology (linguistic level, semantic level, contextual level))
  – Consistency
  – Efficiency (time used, user satisfaction)
  – Use (tags selected, clouds consulted, order of consultation)
• Improved retrieval
  – Effectiveness (degree of match between user and system terminology)
In two different contexts:
  – Tagging by readers
  – Tagging by authors

Testing Approach
Main focus:
  – free tagging with no instructions
versus
  – tagging using a combined system and guidance for users
Two demonstrators
• Intute digital collection
  – Major development
  – Tagging by reader
  – DDC
• STFC repository
  – Complementary development
  – Tagging by author
  – A more qualitative approach

Intute

Intute demonstrator: searching

Intute demonstrator: basic tagging

Intute demonstrator: enhanced tagging

EnTag: Intute user study (II)
Test setting
  – 50 graduate students in political science
  – 60 documents, covering up to four topics of relevance for the students
Data collection
  – Logging time spent, selection patterns
  – Pre- and post-questionnaires

EnTag: Intute user study (I)
Test: comparison of basic and advanced system:
  – Indexing
  – Perspective, specificity, exhaustivity
  – Linguistics (word class, single word/compound, spelling, language)
  – Consistency
  – Efficiency (time used, user satisfaction)
  – Use (tags selected, clouds consulted, order of consultation)
  – Retrieval efficiency
• Degree of match between user and system terminology
  – user tags, DDC tags, controlled Intute keywords, title terms, text terms
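The slides do not say how the degree of match between user and system terminology is calculated. As an illustration only, one common choice is a set-overlap (Jaccard) score between a user's tags and each system term set; the function names and example terms below are assumptions made for this sketch.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_profile(user_tags, system_terms):
    """Overlap between one user's tags and each named system term set.

    `system_terms` maps a source name (for example "DDC tags" or
    "title terms") to the terms that source assigns to the document.
    Comparison is case-insensitive and exact; no stemming is applied.
    """
    user = {tag.lower() for tag in user_tags}
    return {
        name: jaccard(user, {term.lower() for term in terms})
        for name, terms in system_terms.items()
    }
```

The same overlap score could also be applied between two taggers of the same document as a rough consistency measure, since it rewards shared terms and penalises terms used by only one of them.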

STFC Case Study: EPubs

STFC demonstrator

STFC Author study
• A study of authors of papers
  – Smaller number - c
  – Regular depositors (> 10 papers each)
  – Subject experts
• Expect that they would want their papers accurately tagged so that they are precisely found
• A more qualitative study

Expected Feedback
• Relative value of tagging vs. controlled terms
  – Does it give more satisfactory (accurate, consistent) tags?
  – Does it lead to the consideration of tags they would not have thought of?
  – Do they select deeply in the hierarchy?
  – Is this something they would like to see supported more, and would use?
  – Is it worth the overhead?
• How should we use a combination of tagging and controlled vocab in our system?
To be continued…

Building a Web of Knowledge
• Social tagging and controlled vocabulary complement each other
  – Tagging: entry level, quick, does the job, but error prone and fuzzy
  – Controlled vocabulary: accurate, but slow and expensive
• Use one to leverage the other
• Use both to build a “Web of knowledge”
  – The things in the world and their links via their subjects
  – Get the users to build the means of organising the knowledge

SKOS: Simple conceptual relationships
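To make the SKOS relationships concrete, the sketch below builds RDF-style triples for one concept using the standard SKOS properties `skos:prefLabel`, `skos:altLabel`, `skos:broader` and `skos:related`. The `http://example.org/vocab/` namespace, the function name and the choice to record user tags as alternative labels are all assumptions made for illustration, not something the slides prescribe.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vocab/"  # hypothetical namespace for this sketch


def concept_triples(slug, pref_label, broader=(), related=(), tags=()):
    """Return (subject, predicate, object) triples describing one concept.

    Objects of skos:broader and skos:related are concept IRIs;
    objects of skos:prefLabel and skos:altLabel are plain label strings.
    """
    subject = BASE + slug
    triples = [
        (subject, RDF_TYPE, SKOS + "Concept"),
        (subject, SKOS + "prefLabel", pref_label),
    ]
    triples += [(subject, SKOS + "broader", BASE + b) for b in broader]
    triples += [(subject, SKOS + "related", BASE + r) for r in related]
    # Assumption: free-text user tags are attached as alternative labels.
    triples += [(subject, SKOS + "altLabel", tag) for tag in tags]
    return triples


# Illustrative concept; the slugs and labels are invented.
semantic_web = concept_triples(
    "semantic-web",
    "Semantic Web",
    broader=["world-wide-web"],
    tags=["SemWeb", "semanticweb"],
)
```

Holding both the controlled term and the user tags on the same concept is what lets a search on either form reach the same resources.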

Conclusions
• Controlled vocabulary and tags complement each other
• Hope to get some interesting evidence over the next month as the studies are completed
• Web 2.0 world offers the possibility of combining these results
  – SKOS: a format to use both tags and controlled vocabulary as part of the Web of Linked Data
  – Also use Web 2.0 to build the vocabularies themselves

Questions?

EnTag – Enhanced tagging for discovery
Research collaboration between Glamorgan University, UKOLN, INTUTE, CCLRC, OCLC, and DB
Financed by JISC Capital Programme
Research goal: Investigation of the combination and comparison of controlled and folksonomy approaches to semantic interoperability supporting resource discovery in repositories and digital collections
Evaluation in two communities of use: at Intute (Social science), focussing on tagging by readers (postgraduate users), and at CCLRC, focussing on tagging by authors
The two studies are carried out as separate projects
Intute project uses DDC as controlled vocabulary
Evaluation by quantitative and qualitative measures

Evaluation Intute – focus and objective
Context: tagging as part of information searching and relevance assessment, tagging for recommendation and sharing
Hybrid system: investigate whether tagging can be improved by a combination of traditional tag clouds and clouds of controlled descriptors, including interactive tools such as tag suggestions, access to browsing of DDC, etc.
Improve tagging
  – Relevance of tags (perspective, aspects, specificity, exhaustivity, terminology (linguistic level, semantic level, contextual level))
  – Consistency
  – Efficiency (time used, user satisfaction)
  – Use (tags selected, clouds consulted, order of consultation)
Improve retrieval
  – Effectiveness (degree of match between user and system terminology)