Asymmetries in Retrieval of Gene Function Information Timothy B. Patrick, PhD 1, Lillian C. Folk, MS 2, Catherine K. Craven, MLS 3 1 Healthcare Administration.

Asymmetries in Retrieval of Gene Function Information Timothy B. Patrick, PhD 1, Lillian C. Folk, MS 2, Catherine K. Craven, MLS 3 1 Healthcare Administration and Informatics, University of Wisconsin-Milwaukee 2 College Of Veterinary Medicine, 3 Health Management and Informatics, University of Missouri-Columbia

Acknowledgements 2004 Donald A. B. Lindberg Research Fellowship University of Missouri National Library of Medicine Biomedical and Health Informatics Research Training grant

Overview
Background
–What is an asymmetry in retrieval of gene function information?
Life science information retrieval and processing workflows
Example of asymmetrical workflows
–Compare three apparently equivalent asymmetrical workflows
Conclusion
–Documentation standards
–Multidisciplinary teams for life science workflows

What is an Asymmetry in Retrieval? Taking different paths to get the same kind of information about a given biological object. –Life science information retrieval and processing workflows

Complex Information Retrieval Complex information retrieval may involve the use of multiple information resources, such as databases and analysis tools, in combination. Such combinations of resources are often represented as workflows.

Workflow Standards
Business Process Execution Language for Web Services Version 1.1
Simple Conceptual Unified Flow Language (SCUFL)
–Taverna Workbench

Logical Workflows A logical workflow is like a logical process model, with processes, data links, and control links. Key aspects of the workflow are the inputs, the outputs, and the processes that transform the data. Example: Sequence ID → get DNA sequence → Sequence string → Similarity search → results

Physical Workflows A physical workflow is like a physical process model, with processes, data links, and control links. Example: UI → fetch DNA sequence → Sequence string → BLAST → BLAST results

Physical Workflow Antoon Goderis, Ulrike Sattler and Carole Goble, Applying DLs to workflow reuse and repurposing, Description Logics workshop, Edinburgh, Scotland, July 2005

Asymmetry Asymmetry means the paths or workflows are different: from the same set of potential inputs about some biological object they take different paths to produce the same kind of results. Asymmetrical workflows are equivalent if they do produce the same results.
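The definition above can be sketched in code. This is a minimal illustration only, assuming each workflow can be modeled as a function from an input identifier to a set of results; the function names, lookup tables, and identifiers are hypothetical and do not come from the study.

```python
def equivalent(workflows, inputs):
    """Return True if every workflow yields the same result set for each input."""
    for item in inputs:
        results = [frozenset(wf(item)) for wf in workflows]
        if any(r != results[0] for r in results[1:]):
            return False
    return True


# Hypothetical lookup tables standing in for two different retrieval paths.
PATH_A = {"ACC1": {"pmid1", "pmid2"}, "ACC2": {"pmid3"}}
PATH_B = {"ACC1": {"pmid1", "pmid2"}, "ACC2": set()}


def workflow_a(acc):
    """First hypothetical path from an accession number to citation IDs."""
    return PATH_A.get(acc, set())


def workflow_b(acc):
    """Second hypothetical path; it disagrees with workflow_a on ACC2."""
    return PATH_B.get(acc, set())
```

Here `equivalent([workflow_a, workflow_b], ["ACC1", "ACC2"])` is false because the two paths disagree on one input, even though they agree on the other. Equivalence depends on the whole input set, not on a single spot check.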

This Study Example of asymmetrical workflows that might look to a user to be equivalent but which are not equivalent due to various features of the resources involved. Knowledge that they are not equivalent requires knowledge of metadata about the resources.

Three Workflows
Affymetrix → Genbank Accession number → Pubmed → Pubmed ID
Affymetrix → Genbank Accession number → Nucleotide → Pubmed links → Pubmed ID
Affymetrix → Genbank Accession number → Gene → Pubmed links → Pubmed ID

Methods We first collected representative DNA Accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia. This experiment sought, using a mouse model, to identify differences in gene expression associated with successful recovery from sarcopenia in young muscle as compared to failed recovery in old muscle.
–NIH grant AG18881
Pattison JS, Folk LC, Madsen RW, Childs TE, Booth FW. Transcriptional profiling identifies extensive downregulation of extracellular matrix gene expression in sarcopenic rat soleus muscle. Physiological Genomics 15(1):34-43,
Pattison JS, Folk LC, Madsen RW, Booth FW. Selected Contribution: Identification of differentially expressed genes between young and old rat soleus muscle during recovery from immobilization-induced atrophy. Journal of Applied Physiology 95(5):2171-9,
Pattison JS, Folk LC, Madsen RW, Childs TE, Spangenburg EE, Booth FW. Expression profiling identifies dysregulation of myosin heavy chains IIb and IIx during limb immobilization in the soleus muscles of old rats. Journal of Physiology 553(Pt 2): , 2003.

Methods Next, we retrieved the Unique Identifiers (UI’s) of Entrez Pubmed citations that were associated with the Accession numbers by each of the three Entrez resources. –Directly in the case of Entrez Pubmed –Indirectly, via Pubmed links in the case of Entrez Nucleotide and Entrez Gene Next, we compared the number of Pubmed ID's retrieved by the three resources for each of the Accession numbers.
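The counting step described above can be sketched as follows. This is only an illustration, assuming the Pubmed ID's retrieved by each resource have already been collected into dictionaries keyed by Accession number; the function names and the sample identifiers are hypothetical, and the retrieval itself is not shown.

```python
def count_ids(retrieved):
    """Map each accession number to the number of distinct Pubmed IDs retrieved."""
    return {acc: len(set(ids)) for acc, ids in retrieved.items()}


def compare_counts(by_resource):
    """Return the resource names and one row of counts per accession number.

    by_resource maps a resource name to {accession: [pubmed ids]}.
    An accession missing from a resource is counted as 0 for that resource.
    """
    resources = sorted(by_resource)
    accessions = sorted(set().union(*(by_resource[r] for r in resources)))
    counts = {r: count_ids(by_resource[r]) for r in resources}
    rows = [(acc, [counts[r].get(acc, 0) for r in resources]) for acc in accessions]
    return resources, rows


# Hypothetical sample data, not results from the study.
sample = {
    "Pubmed": {"ACC1": ["1"], "ACC2": []},
    "Nucleotide": {"ACC1": ["1", "2", "2"]},
    "Gene": {"ACC1": ["1", "2", "3"], "ACC2": ["4"]},
}
resources, rows = compare_counts(sample)
```

Each row pairs one Accession number with its three counts, which is the shape needed for a per-accession comparison across the three resources.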

Summary of Pubmed ID’s by Accession Number Three tables, one each for Pubmed, Nucleotide, and Gene, with the columns # of Pubmed ID’s and # of Accession numbers. Each table reports Total 251.

Methods We compared the number of Pubmed ID’s produced for each Accession number by each workflow. We applied a non-parametric test: Kendall’s W –Pubmed versus Nucleotide versus Gene –p < .05
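For readers unfamiliar with Kendall's W, the statistic can be computed as in the sketch below. It treats each workflow as a "rater" and each Accession number as an item, ranks the counts within each workflow using average ranks for ties, and applies the standard tie correction. The slides do not say how ties were handled or which software was used, so those choices are assumptions, and the sketch computes only W, not the p-value.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's coefficient of concordance with tie correction.

    ratings is a list of m lists (one per workflow), each holding n counts
    (one per accession number, in the same order).
    """
    m = len(ratings)
    n = len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    ties = sum(sum(t**3 - t for t in Counter(r).values()) for r in ratings)
    denominator = m**2 * (n**3 - n) - m * ties
    if denominator == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denominator
```

W ranges from 0 (no agreement among the rankings) to 1 (identical rankings).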

The Three Workflows Are Not Equivalent Pubmed ≠ Nucleotide ≠ Gene. All three workflows start from the same Affymetrix Genbank Accession numbers and end in Pubmed ID’s; Nucleotide and Gene reach them via Pubmed links.

The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF [si], genbank/AF [si]. The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files. D#pubmedhelp.Secondary_Source_ID_
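The three search forms described above can be generated mechanically. The helper below is a hypothetical sketch that only builds the query strings in the source/accession pattern quoted from the help text; it does not run a search, and the accession value used in the example is a placeholder.

```python
def si_terms(source, accession):
    """Return the three SI-field search forms: source, accession, and both."""
    return [
        f"{source} [si]",
        f"{accession} [si]",
        f"{source}/{accession} [si]",
    ]


# Placeholder accession, for illustration only.
terms = si_terms("genbank", "X00000")
```

Because the SI field and the Entrez sequence database links are populated separately, a search built this way and a Pubmed link followed from a sequence record need not return the same citations.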

Conclusions

Need for Documentation The first conclusion I take from this project is that there is a need for documentation of workflow details. –In another study we look at the character of documentation of information processing and retrieval methods in published reports of microarray experiments

Multidisciplinary Teams for Workflows The second conclusion I take is that the development of workflows requires multidisciplinary teams.

INFORMATION ITEMS METADATA KNOWLEDGE-ENABLED WORKFLOWS TOOLS domain metadata expert (information specialist) domain expert (scientist)

workflows