H. Lundbeck A/S, 3-Oct-15. Assessing the effectiveness of your current search and retrieval function. Anna G. Eslau, Information Specialist, H. Lundbeck A/S.

Presentation transcript:

Assessing the effectiveness of your current search and retrieval function
Anna G. Eslau, Information Specialist, H. Lundbeck A/S
Marianne Lykke Nielsen, Associate Professor, Royal School of Library and Information Science
Case story evaluating human metadata indexing versus automatic query expansion using a corporate thesaurus

Agenda
Motivation
Case study
–Research partners
–Purpose
–Test design
–Findings
–Conclusions
Summing up

Motivation
A lot of money has been invested – but does our current search and retrieval function perform as expected?
An advanced and time-consuming indexing task has been placed on our end users – but is our current indexing strategy effective?
Do we have alternatives to manual indexing that are of equally high quality?


Case study - Research partners
H. Lundbeck A/S
–Pharmaceutical company
–5,000 employees in more than 40 countries
–Information systems with electronic documents
–Corporate thesaurus
–Users and search requests
Royal School of Librarianship
–Thesaurus research expertise
–Domain knowledge from former research project
Ensight A/S
–Verity K2 search engine and Intelligent Classifier
–Technical expertise

Purpose of case study
To evaluate
1. Information retrieval based on controlled, human indexing (controlled metadata)
2. Information retrieval based on full-text indexing, with thesaurus-based automatic query expansion

Case study – Retrieval system and indexing policy
Electronic document management system (EDMS) and bibliographic information system containing research documentation
Indexing policy
–Written indexing policy
–Mandatory training of indexers
–Corporate Thesaurus
–Human, controlled indexing
–Topical checklist/Facetted indexing
Searching by controlled metadata and full-text
Domain specific thesaurus containing 5,500 concepts and 16,000 terms

EDMS 1/2 - Indexing

EDMS 2/2 – Searching

Lundbeck Thesaurus 1/3

Lundbeck Thesaurus 2/3

Lundbeck Thesaurus 3/3


Test design - Retrieval performance of different search strategies
Three different search strategies were evaluated:
1. Searches based on natural language (words from original request) in full text
2. Searches based on natural language in full text expanded with words from thesaurus (query expansion with synonyms and narrower terms)
3. Searches based on (manually assigned) controlled keywords in selected metadata fields

Test design - Query expansion
Search for information about intravenous administration of a drug AND Alzheimer’s disease:
‘Intravenous OR IV OR Intravenously OR…’ AND ‘Alzheimer’s disease OR Alzheimer’s disorders OR Alzheimer type dementia OR…..’
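
The expansion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Verity K2 configuration used in the study: the `THESAURUS` dictionary, the function names and the generic Boolean syntax are all assumptions, and the entries are just the example terms from the slide.

```python
# Toy thesaurus: preferred term -> synonyms.
# Hypothetical structure; entries come from the slide's example only,
# not from the real Lundbeck Thesaurus.
THESAURUS = {
    "intravenous": ["IV", "intravenously"],
    "Alzheimer's disease": ["Alzheimer's disorders", "Alzheimer type dementia"],
}


def expand_concept(term, thesaurus):
    """Return the term plus its synonyms as one parenthesised OR group."""
    variants = [term] + thesaurus.get(term, [])
    # Quote multi-word variants so they are treated as phrases.
    quoted = ['"%s"' % v if " " in v else v for v in variants]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts, thesaurus):
    """Combine the expanded concepts with AND."""
    return " AND ".join(expand_concept(c, thesaurus) for c in concepts)


query = build_query(["intravenous", "Alzheimer's disease"], THESAURUS)
print(query)
```

Each concept from the request becomes one OR group, so a document matches if it contains any variant of every concept. A term missing from the thesaurus simply stays unexpanded.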

Lundbeck Thesaurus

Test design - Test persons and retrieval system
Persons
–Query expansion tests were carried out by the thesaurus manager and did not involve end users
–Evaluation of search results was carried out by end users – 4 subject experts (medical advisers) who had formerly answered the search requests
System
–Verity K2 search system was used as test retrieval system for the query expansion test work
–Original document management systems were used as retrieval system for the metadata searches

Test design - Test thesaurus
The Lundbeck Thesaurus was the test thesaurus. The thesaurus formed the basis for query formulations:
–Synonyms and narrower terms were picked from the thesaurus for the test searches based on expansion of natural language in full-text searches
–Preferred keywords were picked from the thesaurus for the test searches based on controlled keywords in selected metadata fields

Test design - Test collection
25,384 document objects from two different sources
–24,369 document objects from a bibliographical (BRS) information system (internal research reports and published research articles)
–1,015 documents from the full-text EDMS system (internal research reports)

Test design - Search requests
10 search requests were selected from a set of searches which in real life had been carried out in the corporate information systems
Work task 7: You are a medical reviewer. A physician has contacted you. He would like to have data on the use of Citalopram and Reboxetine together to treat resistant depression. He wants any reporting of possible interactions.
Indicative request: Find reports, papers or case stories that investigate the possible interaction of Citalopram and Reboxetine on resistant depression


Findings – Performance
Recall (% relevant docs retrieved out of total no. of relevant docs)
Table columns: Search strategy, SJ1 to SJ10
Table rows: Full-text; Full-text with QE (syn); Full-text with QE (syn, nt); Metadata (cell values not reproduced in this transcript)
Precision (% relevant docs out of all retrieved docs) went down from 33% to 24% with query expansion
SJ = Search Job, QE = Query Expansion, syn = synonyms, nt = narrower terms
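
The two measures defined on this slide can be computed as in the sketch below. The document IDs and result lists are invented purely for illustration and are not data from the study; they are only chosen to show the typical trade-off in which an expanded query finds more of the relevant documents but also pulls in more irrelevant ones.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search job.

    precision = relevant docs retrieved / all retrieved docs
    recall    = relevant docs retrieved / all relevant docs
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgements and result lists (not study data).
relevant = {"d1", "d2", "d3", "d4"}
plain = ["d1", "d2", "d9"]
expanded = ["d1", "d2", "d3", "d4", "d9", "d10", "d11", "d12"]

p_plain, r_plain = precision_recall(plain, relevant)
p_exp, r_exp = precision_recall(expanded, relevant)
print(f"plain:    precision={p_plain:.2f} recall={r_plain:.2f}")
print(f"expanded: precision={p_exp:.2f} recall={r_exp:.2f}")
```

In this toy case recall rises from 0.50 to 1.00 while precision falls from 0.67 to 0.50.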

Findings – Human indexing problems
Indexing problems, frequency (%), N = 156, and explanations
1. Conceptual analysis
–A1 Omission of topic: 69%. Indexers fail to remember facets and topics that are not explicitly mentioned in the indexing policy or checklist. The indexing policy recommends checking specific document sections such as title, table of contents, etc., which is why indexers, especially in long documents, tend to omit topics from other document sections. ÷÷
–A2 Misinterpretation and wrong perspective of topic: 14%. Indexers misunderstand the topic due to lack of topical and domain knowledge. ÷
–A3 Omission of implicit topic: 2%. It is difficult for indexers to determine the degree of topical interpretation and domain-orientation. ÷
2. Translation
–B1 Topic indexed at BT (broader term) level: 7%.
–B2 Topic indexed with incorrect keyword: 8%. Indexers misunderstand the meaning and use of keywords. ÷

Findings – Other metadata
Topical retrieval and situational relevance ranking - the importance of contextual parameters
–Document type
–Publication year
–Source
–Language
–Author

Findings – Thesaurus
Thesaurus
–Relevant synonyms (acronyms with multiple meanings should be omitted)
–Logical hierarchies
–High topical relevance

Findings – Documents and search requests
Document collection
–OCR-scanned documents may contain errors => false positive hits
–Large (>100 pages) full-text documents lower precision (irrelevant hits)
Search requests
–If people search using very general terms, QE becomes increasingly complicated and extensive the more levels of QE are added
–Different types of facets result in different relevance assessments according to document type and different recall in metadata search
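
The point about general terms and QE levels can be illustrated with a small sketch that collects narrower terms level by level. The `NARROWER` hierarchy is a made-up fragment, not taken from the Lundbeck Thesaurus, and the function name and `max_levels` parameter are assumptions for this example.

```python
from collections import deque

# Hypothetical narrower-term hierarchy (broader term -> narrower terms).
NARROWER = {
    "nervous system diseases": ["dementia", "movement disorders"],
    "dementia": ["Alzheimer's disease", "vascular dementia"],
    "movement disorders": ["Parkinson's disease"],
}


def expand_narrower(term, narrower, max_levels):
    """Breadth-first collection of narrower terms down to max_levels."""
    seen = [term]
    queue = deque([(term, 0)])
    while queue:
        current, level = queue.popleft()
        if level == max_levels:
            continue  # do not descend below the requested level
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append((child, level + 1))
    return seen


# Number of search terms after 0, 1 and 2 levels of expansion.
sizes = [
    len(expand_narrower("nervous system diseases", NARROWER, n))
    for n in range(3)
]
print(sizes)
```

Even in this tiny hierarchy a general starting term grows from one search term to six after two levels, whereas a specific term such as "Alzheimer's disease" would not grow at all. In a thesaurus with thousands of concepts the same effect quickly runs into limits such as the number of search terms the software allows.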

Findings – Search software
Search software settings are important
–Stemming
–Case sensitivity
–Character sensitivity (())
–Number of search terms allowed
–Zoning


Conclusion – Thesaurus and QE
A domain specific thesaurus is well suited for QE
QE improves recall but decreases precision
QE with synonyms only is in most cases sufficient

Conclusion - Search result display
Users want to see all hits (recall is important)
Manual sorting of search results by (other than topical) metadata is requested by the users
Ranking based on e.g. zoning is not always useful

Conclusion – Indexing policy
It is difficult to obtain complete, accurate and exhaustive human indexing
Findings suggest that searching for specific topics should be based on full-text indexing, supported by thesaurus-based query expansion
Human indexing should focus on a few important, well-defined topics, e.g. used to develop taxonomies for broad browsing
Non-topical context metadata are important in the assessment of document relevance
–Document type
–Publication year
–Source
–Language
–Author

Conclusion – Implications for Lundbeck
Lundbeck Thesaurus has been integrated with the bibliographic information system to perform automated QE
EDMS upgrade planned where QE should be possible
OCR scanning of existing documents is being considered
Metadata on document types in EDMS are being evaluated and are under revision (simplified)
New models for adding metadata are being considered (dictionaries)
New indexing tools for the users are being developed (indexing keys)


Summing up
If your current search and retrieval function does NOT perform as expected, your organisation may lose important information
You may have an indexing strategy (which is good…) but evaluation may reveal that the resources invested could be used even better
Evaluation is important; it may save your organisation money over time