1 The BT Digital Library A case study in intelligent content management Paul Warren

2 Semantics in content management; the starting point; limitations of conventional technology; enhancing the experience; the users’ view; using the technology

3 Semantics in content management: Intelligent content management

4 The need for semantics. Content management systems need to: index by meaning, not just text; combine information from heterogeneous sources. Users need information: identified by semantics, not just keywords; precise and complete; selected by their interests and their task context, defined semantically; from heterogeneous sources, accessed uniformly.

5 Higher precision, greater recall. Precision: find me information about Washington the man, not the state or city; find me information about a company called X which operates in industry Y. Recall: finding all relevant documents, e.g. ask for information about ‘George W Bush’ and be given documents on ‘the President’.

6 Interests and context. Need information about Jaguar? Interested in cars, the natural world, South America … with a context defined by current activities. Not just about searching: interest & context to share information … and to push information to user … plus many integrated applications.
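A minimal sketch of the Jaguar example, assuming each document already carries topic labels and each user has a set of declared interests. The function name `rank_by_interest`, the `topics` field and the sample documents are hypothetical illustrations, not part of the BT system described in the slides.

```python
def rank_by_interest(documents, interests):
    """Order documents by how many of the user's interests they share.

    `documents` is a list of dicts with a 'topics' set (assumed to exist);
    `interests` is any iterable of topic labels for the current user.
    """
    interests = set(interests)
    # sorted() is stable, so documents with equal overlap keep their order.
    return sorted(
        documents,
        key=lambda doc: len(interests & set(doc["topics"])),
        reverse=True,
    )


# Hypothetical results for the ambiguous query 'Jaguar'.
results = [
    {"title": "Jaguar XK review", "topics": {"cars"}},
    {"title": "Jaguar habitat in the Amazon", "topics": {"natural world", "South America"}},
]

for_naturalist = rank_by_interest(results, {"natural world"})
for_driver = rank_by_interest(results, {"cars"})
```

The same query then yields a different ordering per user, which is the behaviour the slide asks for; a real system would presumably also weigh the context of the user's current activity.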

7 Too much relevant information. Documents with duplicate information. Goal: to extract what is unique from each document and help users prioritise their reading. Need to: aggregate from disparate sources; remove duplication; present meaningfully (classified, summarised).

8 The starting point: The BT digital library before SEKT

9 The BT digital library. Two major document databases; 5 million articles (abstracts plus some full text). Originally text-based with some attribute-based querying, e.g. author, date. Information spaces defined by queries.

10 An information space. Query-defined; alerts e-mailed weekly as database updated. Public info spaces: anyone can subscribe, forming communities. Private info spaces: defined by user.
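The idea of an information space as a stored query that produces alerts can be sketched as follows. This is an illustration under assumptions: the `InfoSpace` class, the record fields `title` and `abstract`, and the all-terms substring match are hypothetical and are not taken from the BT implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InfoSpace:
    name: str
    query: str                      # the text query that defines the space
    public: bool = False            # public spaces are open to any subscriber
    subscribers: set = field(default_factory=set)

    def matches(self, record: dict) -> bool:
        # Naive matching: every query term must occur in title or abstract.
        text = (record["title"] + " " + record["abstract"]).lower()
        return all(term in text for term in self.query.lower().split())


def weekly_alerts(spaces, new_records):
    """Map each subscriber to the titles of new records in their spaces."""
    alerts = {}
    for space in spaces:
        hits = [r["title"] for r in new_records if space.matches(r)]
        if hits:
            for user in space.subscribers:
                alerts.setdefault(user, []).extend(hits)
    return alerts


km = InfoSpace("KM", "knowledge management", public=True, subscribers={"alice", "bob"})
sw = InfoSpace("SemWeb", "semantic web", subscribers={"alice"})  # private space

new_records = [
    {"title": "Knowledge management in practice", "abstract": "A survey."},
    {"title": "Semantic Web services", "abstract": "Describing services on the semantic web."},
    {"title": "Optical networks", "abstract": "Capacity planning."},
]

alerts = weekly_alerts([km, sw], new_records)
```

Running `weekly_alerts` once per database update would correspond to the weekly alert cycle mentioned on the slide.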

11 Personalisation. Personalised entry page shows user’s info spaces, journals of interest, recent reading and ‘jottings’ (bookmarks).

12 Limitations of conventional technology: Why we need semantics

13 Queries. Text string ‘knowledge management’: 4161 ABI Inspec records; descriptor ‘knowledge management’: 3213 ABI Inspec. So careful query formulation needed … but average query length is 1.8 words. Little use of ‘advanced’ functions … 80% of queries use no query modifier.

14 Poor relevancy of results. A simple keyword search tends to offer high recall and low precision. Ambiguity in the query, e.g. synonymy, where several terms could describe the same concept, and homonymy, where a word has many different meanings. With |A| = relevant documents retrieved, |B| = non-relevant documents retrieved, |C| = non-relevant documents not retrieved and |D| = relevant documents not retrieved: Recall = |A|/(|A|+|D|) (proportion of relevant documents retrieved); Precision = |A|/(|A|+|B|) (proportion of retrieved documents that are relevant).
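The two formulas on this slide can be computed directly from a retrieved set and a relevant set. The sketch below is illustrative only; the function name `precision_recall` and the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) using the slide's |A|, |B|, |D| counts."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)   # |A|: relevant documents retrieved
    b = len(retrieved - relevant)   # |B|: non-relevant documents retrieved
    d = len(relevant - retrieved)   # |D|: relevant documents that were missed
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + d) if (a + d) else 0.0
    return precision, recall


# Hypothetical search: four documents returned, three are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]

precision, recall = precision_recall(retrieved, relevant)
# Here |A| = 2, |B| = 2, |D| = 1, so precision is 2/4 and recall is 2/3.
```

Returning more documents can only keep recall the same or raise it, but it usually adds non-relevant hits and so lowers precision, which is the trade-off the slide attributes to simple keyword search.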

15 Presenting results. Searches: only 17% results read after 1st page … no more than 10 results checked; same query, same results regardless of user’s preference & context. Document descriptors: lots, many irrelevant to readership; where relevant, not fine-grained, e.g. knowledge management.

16 Enhancing the experience: What semantics can offer a digital library

17 A new experience. Hybrid searching: concepts, instances, information spaces, and text; search results meaningfully classified. Automatic annotation: identifying companies, people, …; hyperlinked to a knowledgebase. Topics: finer grained than document descriptors; semi-automatically generated; automatic document classification. An extended corpus: crawling the Web for related pages; Web pages added to share knowledge.

18 A better experience. Semantics to improve precision & recall: Washington the man, not city or state; references to the President, not just George W Bush. Information spaces defined on semantic queries, not just text queries. Taking account of interests and context, semantically defined. Natural language results.

19 The users’ view: What users want

20 Initial questionnaire & focus group. Users want: improved searching and indexing (based on a user’s profile; integrated into working environment); to stay in control (advise but not decide; frustrated by too many alerts).

21 Features – what the users think. Very important / important: summarising results of search with personal interests and preferences; advanced attribute-based search; looking beyond the library; suggesting candidate topic areas; highlighting & hyperlinking named entities; natural language queries.

22 After that … Important / minor importance: retrieving similar articles; re-using old queries; agent searches; access from a range of devices.

23 Using the technology: Applying semantics to the BT Digital Library

24 Search: knowledge management. Knowledge management as: info space; topic; term. With clustered results.

25 A complex query. microsoft: 2 companies, term; semantic web: info space, topic, term; sem web info space; Microsoft-authored; Microsoft as term.

26 Querying a concept. ‘alloy’: a term, but also a concept in the ontology … with properties … definition … sub-concepts.
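A concept with a definition, properties and sub-concepts can be modelled as a small tree. The sketch below is a hedged illustration: the `Concept` class, the `expand_query` helper and the sample alloy hierarchy are assumptions made for the example, not the ontology used in the BT Digital Library.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    definition: str
    properties: dict = field(default_factory=dict)
    sub_concepts: list = field(default_factory=list)

    def descendants(self):
        """Yield every sub-concept below this one, depth first."""
        for child in self.sub_concepts:
            yield child
            yield from child.descendants()


def expand_query(concept):
    """Turn a concept query into the labels of the concept and its sub-concepts."""
    return [concept.label] + [c.label for c in concept.descendants()]


# Hypothetical fragment of an ontology around 'alloy'.
stainless = Concept("stainless steel", "Steel alloyed with chromium.")
steel = Concept("steel", "An alloy of iron and carbon.", sub_concepts=[stainless])
brass = Concept("brass", "An alloy of copper and zinc.")
alloy = Concept(
    "alloy",
    "A mixture of a metal with one or more other elements.",
    properties={"broader": "material"},
    sub_concepts=[steel, brass],
)

terms = expand_query(alloy)
```

Searching on the concept rather than the bare term would then also retrieve documents that mention only a sub-concept; this expansion step is one assumed way of using the sub-concept links shown on the slide.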

27 Document with markup. Identified: Bhargava, Waterbury, Connecticut, USA, IEE. Click for related documents, e.g. by Bhargava.

28 Categorising results …

29 … and more categories

30 In summary. Semantic technology: provides intelligence in content management; enhances the user experience; satisfies proven user needs.