Article Review Study Fulltext vs Metadata Searching Brad Hemminger School of Information and Library Science University of North Carolina.


Article Review Study Fulltext vs Metadata Searching Brad Hemminger School of Information and Library Science University of North Carolina at Chapel Hill

Background Traditionally, most researchers have searched for scholarly information through bibliographic databases, which match search keywords against the metadata that describes the content; journal articles are the most common form of content [Hersh, 2006]. Commonly used bibliographic databases include PubMed and the ISI Web of Knowledge. The metadata description serves as a surrogate for the complete article itself. As electronic (digital) versions of articles have become available, interest has grown in searching the complete, or “full-text”, article itself. Many publishers are beginning to support full-text searching of their on-line content.

Background The Pew survey for OCLC in 2003 [Online Computer Library Center, 2005] found that the vast majority of people (89%) turn to search engines to initiate their searches for information, while few use library web pages (2%) or online databases (2%). Even academic research scientists prefer search engines over library web pages when searching for information for research purposes [Hemminger, 2005], and they are increasingly turning to meta-search interfaces like Google Scholar to perform full-text searches.

Research Question While it is clear that full-text matches of search strings yield more matches than just searching for matches within the metadata of articles, it is not evident how many more matches or previously undiscovered articles are found on average, or how relevant they are. It is often simply assumed that finding additional articles will automatically be of greater value to the searcher. However, as users have discovered when faced with millions of search engine hits to sort through, more is not always better.

Article Discovery Set
  Arabidopsis (13,991 total articles): Plant Cell; Plant Physiology; Genes & Development; Journal of Experimental Biology; PNAS
  Schizophrenia (12,314 total articles): PNAS; The American Journal of Human Genetics; American Journal of Psychiatry; Archives of General Psychiatry

Article Review Base Set (three major journals selected in each research area)
  Arabidopsis: Plant Cell; Plant Physiology; Genes & Development
  Schizophrenia: American Journal of Psychiatry; American Journal of Human Genetics; PNAS

Gene Names
  Arabidopsis: Candidates (5175), Article Review Subset (10)
  Schizophrenia: Candidates (26597), Article Review Subset (15)

Article Review Study Set
  Arabidopsis: Metadata Articles (18), Full-Text Articles (82), Total (100)
  Schizophrenia: Metadata Articles (19), Full-Text Articles (83), Total (102)

Article Review Training Set
  Arabidopsis: Metadata Articles (3), Full-Text Articles (17)
  Schizophrenia: Metadata Articles (3), Full-Text Articles (9)

Article Discovery
  Columns: Schizophrenia + Schizophrenia Gene; Schizophrenia Gene; Arabidopsis Gene
  Rows (cell values not preserved in this transcript):
    Genes Found in Metadata Only
    Genes Found in Full-text Only
    Genes Found in Metadata and Full-text
    Totals for Found Genes

Article Review Study
Two literature cohorts:
– Schizophrenia (Pat Sullivan)
– Arabidopsis (Todd Vision)
Each cohort had three readers.
Readers are asked to “review the article and judge its relevance to them as someone new to the gene in this biological setting, trying to build an understanding of the state of knowledge in that research area.”

Rating Scale for Reviewing Articles

Rating  Rating Name            Rating Usage
1       Definitely Useful      Right on topic, very helpful, primary initial study, excellent review, etc.
2       Probably Useful        On topic and potentially important material
3       Possibly Useful        Has some material or references that are likely useful, but not certain without further checking
4       Probably Not Useful    Unlikely, but may have some use, for instance references to check out
5       Definitely Not Useful  Not on topic; nothing of direct value, not worth keeping

Metadata Articles More Valuable For both cohorts and for all observers, the mean quality ratings were lower (that is, more useful) for the metadata-discovered articles. The differences between the mean quality rating for the metadata-discovered articles and that for the full-text-discovered articles were statistically significant for both the Arabidopsis and Schizophrenia sets at the p < 0.05 level.

Precision and Recall

                           Schizophrenia               Arabidopsis
                           Recall         Precision    Recall         Precision
Metadata discovered        15.7% (16.6%)  94.7%        84.1% (84.1%)  100%
Full-text only discovered  100%           63.7%        100%           69%

Article Features that correlate with Value: Number of Hits The number of hits, or matches of the search term within the returned document, is a commonly used feature for ranking returned articles. To test the value of this feature, the number of hits was correlated with the mean quality rating for each article (averaged across all observers). The results clearly show a relationship: articles with many matches of the search term tend to be much more highly valued.
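As an illustration of the kind of calculation described here, the following is a minimal sketch that correlates per-article hit counts with mean ratings. The slides do not say which correlation statistic was used, so the Pearson coefficient is an assumption, and the function name `pearson` and the sample values are hypothetical, not data from the study. Because a lower rating means a more useful article, a negative coefficient corresponds to "more hits, more valued".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-article values; NOT data from the study.
hit_counts = [1, 3, 6, 12, 25, 40]               # matches of the search term
mean_ratings = [4.3, 3.7, 3.0, 2.7, 1.7, 1.3]    # 1 = most useful, 5 = least

r = pearson(hit_counts, mean_ratings)
print(f"r = {r:.2f}")  # negative: more hits go with lower (better) ratings
```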

Improving Relevance for Metadata Searching Repeating the calculations on the Schizophrenia and Arabidopsis article review sets, but limiting the full-text results to matches with high hit counts (Schizophrenia ≥ 20 hits and Arabidopsis ≥ 15 hits), shows that precision for full-text searching is now the same as (100% in Arabidopsis) or slightly better than (95% versus 94.4% in Schizophrenia) that of the metadata-retrieved articles. However, the number of additional cases discovered by full-text searching is now only slightly higher: 5% more cases in Schizophrenia and 28% more in Arabidopsis.
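The comparison above can be sketched as follows. This is a hedged illustration, not the study's actual procedure: the `Article` record, the `relevant` flag (which would have to be derived from reviewer ratings by some cutoff the slides do not state), and all sample values are hypothetical. Recall is measured against the relevant articles in the full-text result set, which is an assumption consistent with full-text recall being reported as 100%.

```python
from dataclasses import dataclass


@dataclass
class Article:
    hits: int          # matches of the search term in the full text
    in_metadata: bool  # True if the term also matches the metadata
    relevant: bool     # hypothetical relevance judgment


def precision_recall(retrieved, full_text_results):
    """Precision and recall of `retrieved` against all full-text results."""
    relevant_total = sum(a.relevant for a in full_text_results)
    relevant_retrieved = sum(a.relevant for a in retrieved)
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    return precision, recall


# Hypothetical full-text result set; NOT data from the study.
articles = [
    Article(hits=30, in_metadata=True, relevant=True),
    Article(hits=22, in_metadata=False, relevant=True),
    Article(hits=8, in_metadata=False, relevant=True),
    Article(hits=6, in_metadata=False, relevant=False),
    Article(hits=2, in_metadata=False, relevant=False),
    Article(hits=25, in_metadata=True, relevant=True),
]

MIN_HITS = 20  # the Schizophrenia threshold quoted above

metadata_only = [a for a in articles if a.in_metadata]
high_hit_full_text = [a for a in articles if a.hits >= MIN_HITS]

metadata_pr = precision_recall(metadata_only, articles)
full_text_pr = precision_recall(articles, articles)
filtered_pr = precision_recall(high_hit_full_text, articles)

for label, (p, r) in [("metadata", metadata_pr),
                      ("full text", full_text_pr),
                      ("full text, high hits", filtered_pr)]:
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this made-up data the hit-count filter brings full-text precision up to the metadata level while still retrieving one relevant article that the metadata search misses.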

Conclusions This suggests that rather than accepting metadata searching as a surrogate for full-text searching, it may be time to make the transition to direct full-text searching as the standard. This could be accomplished by using certain features of the full-text article, such as the number of hits of the search string or whether the search string is found in the metadata (i.e. our current metadata search), as filters that allow us to increase the precision of our results and that put the user in control of the filtering.

Schizophrenia
  Columns: Observer A; Observer B; Observer C; Mean
  Rows (cell values not preserved in this transcript):
    Mean Ratings
    Mean Ratings (Fulltext)
    Mean Ratings (Metadata)
    Difference in Mean Rating (Fulltext - Metadata)

Arabidopsis
  Columns: Observer D; Observer E; Observer F; Mean
  Rows (cell values not preserved in this transcript):
    Mean Ratings
    Mean Ratings (Fulltext)
    Mean Ratings (Metadata)
    Difference in Mean Rating (Fulltext - Metadata)

Schizophrenia Gene

Group  Range            Mean Rating Value  Different from Groups
A      1-4 hits         3.24               C
B      5-19 hits        2.88               C
C      20 or more hits  1.62               A, B

Arabidopsis

Group  Range            Mean Rating Value  Different from Groups
A      1-4 hits         3.41               C
B      5-14 hits        2.94               C
C      15 or more hits  1.69               A, B
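The grouping used in the two hit-count tables can be sketched as below. This is a minimal illustration under stated assumptions: the function names `hit_group` and `mean_rating_by_group` and the sample records are hypothetical; only the group boundaries (1-4, 5-19, 20 or more for Schizophrenia; 1-4, 5-14, 15 or more for Arabidopsis) come from the slides. The significance testing behind the "Different from Groups" column is not reproduced.

```python
def hit_group(hits, mid_start=5, high_start=20):
    """Assign an article to group A, B, or C by its number of hits."""
    if hits < 1:
        raise ValueError("a full-text match has at least one hit")
    if hits >= high_start:
        return "C"
    if hits >= mid_start:
        return "B"
    return "A"


def mean_rating_by_group(records, mid_start=5, high_start=20):
    """Average the ratings of (hits, rating) records within each hit group."""
    totals = {}
    for hits, rating in records:
        group = hit_group(hits, mid_start, high_start)
        rating_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (rating_sum + rating, count + 1)
    return {g: s / n for g, (s, n) in sorted(totals.items())}


# Hypothetical (hits, mean rating) records; NOT data from the study.
records = [(2, 4.0), (4, 3.0), (7, 3.0), (15, 2.0), (20, 2.0), (33, 1.0)]

# Schizophrenia boundaries: 1-4, 5-19, 20 or more hits.
schizophrenia_means = mean_rating_by_group(records, mid_start=5, high_start=20)
# Arabidopsis boundaries: 1-4, 5-14, 15 or more hits.
arabidopsis_means = mean_rating_by_group(records, mid_start=5, high_start=15)

print(schizophrenia_means)
print(arabidopsis_means)
```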

Search Term Matches by Article Class
  Columns (given separately for Schizophrenia and for Arabidopsis): Number of Matches; Percentage of Articles Matched; Mean Reviewer Rating for Article Class
  Search terms (cell values not preserved in this transcript): SPDG; SGDO; SGDD; DGDD; MUTANT; FAMILY; SEQUENCE; INTERACTION; PROCESS; STRUCTURE; UP; DOWN; REVIEW; MARKER; FP; REFERENCE; TABLE; MIP; IMG; Text; ReferencesOnly; Letter; Errata

Results First, full-text searching can perform as well as or better than metadata searching in precision and recall. Second, the best solution might be to provide a dynamic interface that allows the user to trade off between precision and recall by controlling the threshold on the number of hits by which the results are filtered.
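The trade-off that such an interface would expose can be sketched by sweeping the hit threshold and reporting precision and recall at each setting. This is an assumption-laden illustration: the function name `sweep_thresholds`, the relevance flags, and the sample results are hypothetical and do not come from the study.

```python
def sweep_thresholds(results, thresholds):
    """Return (threshold, precision, recall) for each minimum-hit threshold.

    `results` is a list of (hits, relevant) pairs for full-text matches;
    recall is measured against all relevant articles in `results`.
    """
    total_relevant = sum(1 for _, relevant in results if relevant)
    rows = []
    for threshold in thresholds:
        kept = [(h, rel) for h, rel in results if h >= threshold]
        kept_relevant = sum(1 for _, rel in kept if rel)
        precision = kept_relevant / len(kept) if kept else 0.0
        recall = kept_relevant / total_relevant if total_relevant else 0.0
        rows.append((threshold, precision, recall))
    return rows


# Hypothetical full-text results as (hits, relevant); NOT data from the study.
results = [(30, True), (22, True), (8, True), (6, False), (2, False), (1, True)]

rows = sweep_thresholds(results, thresholds=[1, 5, 20])
for threshold, precision, recall in rows:
    print(f"hits >= {threshold:2d}: precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold in this made-up data increases precision and lowers recall, which is the choice the user would control.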

Article Discovery Totals
  Columns: Schizophrenia + Schizophrenia Gene; Schizophrenia Gene; Arabidopsis Gene
  Rows (cell values not preserved in this transcript):
    Genes Found in Metadata Only
    Genes Found in Full-text Only
    Genes Found in Metadata and Full-text
    Totals for Found Genes
    Genes not found
    Overall Total