Evidence-Based Information Retrieval in Bioinformatics


Evidence-Based Information Retrieval in Bioinformatics Timothy B. Patrick, PhD Healthcare Administration and Informatics, University of Wisconsin-Milwaukee

Goal of the Project
The overall, long-term goal of this research project is to contribute to evidence-based information retrieval in post-genomic medicine. That a biomedical endeavor is evidence-based, whether it is focused on patient care or on the discovery of gene function, implies that both its decision making and its retrieval of information are evidence-based. Evidence-based decision making addresses the need to base decisions on the results of prior scientific study. Evidence-based information retrieval, on the other hand, addresses the necessary prior step: having proof of the effectiveness of the way particular information resources are used and combined in order to retrieve that evidence.

Aims
Specific Aim 1: Determine existing pitfalls in accessing literature on gene function.
Specific Aim 2: Based on user warrant, determine the current state of evidence-based functional genomic retrieval.
Specific Aim 3: Based on literary warrant, determine the current state of evidence-based functional genomic retrieval.

“Determine existing pitfalls in accessing literature on gene function”
The first aim, to determine existing pitfalls in accessing literature on gene function, is the topic of my talk later today, “Asymmetries in Retrieval of Gene Function Information.”

The Study
In that study we compared three different paths to literature on gene function. The paths might look equivalent to a user, but they are not, because of various features of the resources involved. Recognizing that they are not equivalent requires knowledge of metadata about the information resources used.

Three Paths
[Diagram: three paths from an Affymetrix microarray to PubMed IDs. Each path starts from a GenBank Accession number. One path searches PubMed directly; the other two go through Nucleotide or Gene and follow their PubMed links.]
Each path starts from a microarray experiment, so I want to first talk a little about microarrays.

Microarrays can be used to determine gene expression under experimental and control conditions. Each cell in a microarray holds copies of short strands of DNA called probes, which are used to identify particular genes. The microarray is used “… to help researchers identify what RNA sequences are present in [an experimental or control] sample, and this then tells them how strongly those genes are being expressed by that cell.”* The microarray is washed with RNA that has been treated so that it fluoresces when stained. The expression level of each gene is indicated by the brightness of the resulting glow. The results of the microarray experiment are statistically analyzed to determine which genes are significantly expressed. In the workflows we consider, there is a representative DNA sequence related to the probes for a gene, and that sequence is what is used to search for primary literature about the gene's function.
* http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/The_Basic_Principle.affx
http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx

Three Paths
Each path starts from a microarray experiment, gets the GenBank Accession numbers of the representative sequences of expressed genes, and uses those Accession numbers to search for primary literature that may shed some light on the function of the genes.

Methods
To compare the three paths, we first collected the representative DNA Accession numbers associated with genes expressed in a microarray experiment (NIH grant AG18881) designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia.

Methods
Next, we retrieved the Unique Identifiers (UIs) of the Entrez PubMed citations that each of the three Entrez resources associated with the Accession numbers: directly in the case of Entrez PubMed, and indirectly, via PubMed links, in the case of Entrez Nucleotide and Entrez Gene. We then compared the number of PubMed IDs retrieved by the three resources for each of the Accession numbers.
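The retrieval step can be sketched as a set of query URLs. This is a minimal sketch, assuming the searches are run through NCBI's Entrez E-utilities (ESearch and ELink); the talk does not say how the searches were actually run. The function names, the `[accn]` restriction on the Nucleotide search, and the unrestricted Gene search are illustrative assumptions. The Accession number AF001892 is the example from the PubMed help text.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """URL for an ESearch query: find records in `db` that match `term`."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(dbfrom, uid):
    """URL for an ELink query: PubMed citations linked from one record in `dbfrom`."""
    return f"{EUTILS}/elink.fcgi?" + urlencode(
        {"dbfrom": dbfrom, "db": "pubmed", "id": uid}
    )


def first_step_queries(accession):
    """First query of each path (hypothetical helper, not the study's own code).

    The PubMed path stops here; the Nucleotide and Gene paths would pass
    each record UID returned by their search to elink_url().
    """
    return {
        "pubmed": esearch_url("pubmed", accession),
        "nucleotide": esearch_url("nuccore", f"{accession}[accn]"),
        "gene": esearch_url("gene", accession),
    }


if __name__ == "__main__":
    for path, url in first_step_queries("AF001892").items():
        print(path, url)
```

The sketch only builds the URLs and makes no network calls, so each query that a path depends on can be written down and documented before anything is run.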

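Summary of PubMed IDs by Accession Number
For each Accession number we collected the PubMed IDs retrieved by each path. This shows, for each path, how many Accession numbers retrieved a given number of PubMed IDs. The three tables are labeled in the order PubMed, Nucleotide, Gene.
PubMed: 198 Accession numbers with 1 PubMed ID, 36 with 2, 10 with 3, [values for 4 and 5 missing in source]; total 251.
Nucleotide: 132 Accession numbers with 1 PubMed ID, 112 with 2, 5 with 3, [value for 4 missing in source]; total 251.
Gene: 216 Accession numbers with 1 PubMed ID, 34 with 2, [values for 3, 4, and 5 missing in source]; total 251.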

Methods
We then compared the number of PubMed IDs retrieved for each Accession number by each path (PubMed versus Nucleotide versus Gene). We analyzed the data with a non-parametric test, Kendall's W. The results showed that the result sets produced by the three paths were significantly different at p < .05.
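To make the test concrete, here is a minimal sketch of Kendall's W with the usual correction for ties, treating each Accession number as a judge that ranks the three paths by the number of PubMed IDs retrieved. How the original analysis was set up is not stated, so this arrangement is an assumption. The function names are hypothetical, and the counts in `example_rows` are invented for illustration; they are not the study's data. The p-value shortcut applies only to three paths, where the chi-square approximation has 2 degrees of freedom.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(rows):
    """Return (W, chi_square) for rows of per-path counts, one row per Accession number."""
    n = len(rows)      # number of Accession numbers (judges)
    k = len(rows[0])   # number of paths (items being ranked)
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for value in set(row):
            t = row.count(value)
            tie_term += t**3 - t
    mean_sum = n * (k + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    denom = n**2 * (k**3 - k) - n * tie_term
    if denom == 0:     # every row fully tied: no ranking information at all
        return 0.0, 0.0
    w = 12 * s / denom
    return w, n * (k - 1) * w


def p_value_three_paths(chi_square):
    """Chi-square upper tail for 2 degrees of freedom (valid for k = 3 paths only)."""
    return math.exp(-chi_square / 2)


if __name__ == "__main__":
    # Invented counts, columns in the order PubMed, Nucleotide, Gene.
    example_rows = [[1, 2, 1], [1, 2, 1], [2, 2, 1], [1, 3, 1], [1, 2, 2]]
    w, chi_square = kendalls_w(example_rows)
    print(w, chi_square, p_value_three_paths(chi_square))
```

A W near 1 means the Accession numbers consistently rank the paths the same way, so one path systematically returns more PubMed IDs than another. A W near 0 means no consistent ordering.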

The Three Paths Are Not Equivalent
[Diagram: PubMed ≠ Nucleotide ≠ Gene as paths from an Affymetrix GenBank Accession number to PubMed IDs.]
In other words, these three different paths are not equivalent, in that they do not produce the same results.

The point is that a user lacking knowledge of the metadata about the resources (i.e., indexing and other structural features) might have considered the paths equivalent. Here, for example, is documentation about the SI field that strongly suggests that the “direct to PubMed” path and the Nucleotide path would not be equivalent:
The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF001892 [si], genbank/AF001892 [si]. The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
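The three search forms described in that documentation can be generated mechanically. This is a small sketch; the function name is hypothetical, and the output simply reproduces the pattern of the examples in the help text (source, accession number, or both joined by a slash).

```python
def si_search_terms(source, accession):
    """Build the three PubMed SI-field search forms described in the help text."""
    return [
        f"{source} [si]",              # source component only
        f"{accession} [si]",           # accession number only
        f"{source}/{accession} [si]",  # source, slash, accession number
    ]


if __name__ == "__main__":
    for term in si_search_terms("genbank", "AF001892"):
        print(term)
```

Because the SI field is filled in from what an article discusses, while the Nucleotide links come from the reference field of the sequence record, a search built this way need not return the same citations as the linked path.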

“Based on user warrant, determine the current state of evidence-based functional genomic retrieval”
The work on the second aim is in progress. In this project we interview biologists who use microarrays to study gene expression levels. We ask them what methods for IR they use, why they consider the methods effective, what their criteria of success and failure are, and how they see the role of biomedical librarians in the process.

Interviews in Progress
We currently have five interviews scheduled at the University of Missouri-Columbia, we are scheduling interviews at the University of Wisconsin-Milwaukee, and in March we interviewed two subjects at the National Institute of Genetics (NIG) in Japan. We also have ten interviews that we did previously at the University of Missouri-Columbia and elsewhere.

“Based on literary warrant, determine the current state of evidence-based functional genomic retrieval”
For our third aim, we wanted to investigate how and to what extent biological science researchers reported their information retrieval methods, including details of why they used the methods they did.

Methods
We searched OVID Medline on October 1, 2004 for the period 1966 to September Week 4 2004 with the query “Oligonucleotide Array Sequence Analysis/”, producing 10746 results. We then limited the results to English (10374), excluded “review articles” (9049), and limited to the years 2003-2004 (4798). We next ranked journals in the results by number of articles, and selected a population of all of the articles from the 13 top journals (n=1373). We randomly sampled 150 articles from that population.
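The journal ranking and sampling step can be sketched as follows. This is a minimal sketch under stated assumptions: the input is taken to be a list of (article ID, journal) pairs that survived the search limits, the function name and the fixed seed are hypothetical, and ties at the cutoff between the 13th and 14th journal are broken arbitrarily here, which the talk does not address. The toy data is synthetic.

```python
import random
from collections import Counter


def sample_from_top_journals(articles, top_n=13, sample_size=150, seed=0):
    """Return (population, sample) drawn from the journals with the most articles."""
    counts = Counter(journal for _, journal in articles)
    top_journals = {journal for journal, _ in counts.most_common(top_n)}
    population = [a for a in articles if a[1] in top_journals]
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = rng.sample(population, min(sample_size, len(population)))
    return population, sample


if __name__ == "__main__":
    # Synthetic stand-in for the limited result set: 2000 articles in 20 journals.
    toy_articles = [(i, f"Journal {i % 20}") for i in range(2000)]
    population, sample = sample_from_top_journals(toy_articles)
    print(len(population), len(sample))
```

Recording the seed alongside the query and the limits is one way to make the sampling step itself repeatable.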

Methods
If the authors of the paper did report gene function, we wanted to know which information sources and retrieval methods they used, as well as the reasons they had for using them. So we classified the relevant articles with respect to the categories “Functional Attribution Reported”, “Sources of Information Reported”, “Retrieval Strategy Reported”, “Grounds for Choice of Sources Reported”, and “Grounds for Retrieval Strategy Reported”.

Methods
Furthermore, we were interested in how details of the sources and retrieval methods were reported in the paper. When details of the information sources and retrieval methods were discussed, we noted the section of the paper in which they appeared: the Methods or Procedures section, the Results section, or the Discussion section.

Results
Typical evidence for attribution of gene function consists of literature citations. When a literature search (e.g. a PubMed search), or a search of other knowledge sources (e.g. NCBI databases), is cited as the source of evidence to support attribution of function, details of the search are rarely reported, and certainly not at a level of detail that would allow repeatability. It is also the case that reasons for using sources and retrieval methods are not reported.

Results
Interestingly, when information retrieval methods are described in the paper, even in detail, they are typically mentioned only in the “Results” or “Discussion” sections of the paper, and not in the “Methods” section. Perhaps even more interesting is that wet bench methods are reported in much more detail than dry bench methods.

Implications for Information Practice
I will mention three implications for information practice suggested by our studies.
There is a need to embrace a workflow concept.
There is a need to develop standards for documentation in e-science.
There is a need to use multidisciplinary teams to develop workflows.

“There is a need to embrace a workflow concept”
I maintain the first implication because workflows are increasingly important for information retrieval and processing in the Life Sciences. A scenario of information retrieval and processing that involves the use of multiple information resources, databases, and analysis tools in combination (like the three paths to the literature that we examined earlier) is called a workflow.

“There is a need to develop standards for documentation in e-science”
The second implication is that there is a need to develop standards for documentation in e-science. It is commonly suggested, and presumably it is true, that we are witnessing the ongoing digitization of science, or e-science, with computer-based information retrieval and processing methods increasingly being incorporated into the day-to-day doing of traditional science.

Life Science Information Retrieval and Processing Workflows
Presumably, embracing information processing and retrieval workflows in the Life Sciences requires that we have clear constraints on the quality (e.g. peer review) of those workflows, as well as assurance of repeatability of methods and results.

Life Science Information Retrieval and Processing Workflows
For this we need documentation of the details of the workflow.

Life Science Information Retrieval and Processing Workflows
In order to achieve the level of documentation that is required for quality and repeatability of methods and results with any very complicated resource composition or workflow, we need to develop technology to facilitate the documentation.
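As an illustration of what such documentation might capture, here is a minimal sketch of a machine-readable record for one literature search. It uses the OVID Medline search reported earlier in this talk as the example. The class and field names are hypothetical and do not follow any existing standard.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SearchStep:
    """One step of a search and the number of results left after it."""
    description: str
    result_count: int


@dataclass
class SearchRecord:
    """Hypothetical record of a search, detailed enough to repeat it."""
    database: str
    interface: str
    date_searched: str
    coverage: str
    query: str
    steps: list = field(default_factory=list)

    def to_json(self):
        return json.dumps(asdict(self), indent=2)


record = SearchRecord(
    database="MEDLINE",
    interface="OVID",
    date_searched="2004-10-01",
    coverage="1966 to September Week 4 2004",
    query="Oligonucleotide Array Sequence Analysis/",
    steps=[
        SearchStep("initial query", 10746),
        SearchStep("limit to English", 10374),
        SearchStep("exclude review articles", 9049),
        SearchStep("limit to years 2003-2004", 4798),
    ],
)

if __name__ == "__main__":
    print(record.to_json())
```

A record like this could sit alongside the Methods section of a paper, so that the dry bench steps are reported at the same level of detail as the wet bench steps.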

Life Science Information Retrieval and Processing Workflows
But in addition to the technology for capturing and managing provenance records, there must also be policy drivers, particularly editorial policy drivers, to ensure the level of documentation of methods and results required for quality and repeatability.

“There is a need to use multidisciplinary teams to develop workflows”
The third implication for information practice is that there is a need to use multidisciplinary teams to develop workflows. I think a typical situation in which workflows might be developed would be one in which we have primary information resources and items, various tools for accessing or manipulating that information, metadata describing the primary information and tools, and then workflows that use the primary information and tools. The design of the workflows is based in part on knowledge of the primary information, which the domain expert (scientist) provides, and in part on knowledge of the metadata, which the domain metadata expert (information specialist) provides.

Knowledge-Enabled Workflows
[Diagram: knowledge-enabled workflows draw on information items, tools, and metadata. The domain expert (scientist) is associated with the information items and tools, and the domain metadata expert (information specialist) with the metadata.]
In order to construct workflows both kinds of expertise are required. The domain expert is needed to provide the scientific basis for the workflow design, and the information specialist is needed for his or her expertise in the metadata of the domain. The information specialists (e.g. librarians) do not need to become experts in biology, that is, experts in the primary information and tools. But they do need to be experts in the metadata of biology.