Asymmetries in Retrieval of Gene Function Information Timothy B. Patrick, PhD 1, Lillian C. Folk, MS 2, Catherine K. Craven, MLS 3 1 Healthcare Administration.


1 Asymmetries in Retrieval of Gene Function Information
Timothy B. Patrick, PhD 1, Lillian C. Folk, MS 2, Catherine K. Craven, MLS 3
1 Healthcare Administration and Informatics, University of Wisconsin-Milwaukee
2 College of Veterinary Medicine, 3 Health Management and Informatics, University of Missouri-Columbia

2 Acknowledgements
–2004 Donald A. B. Lindberg Research Fellowship
–University of Missouri National Library of Medicine Biomedical and Health Informatics Research Training grant

3 Overview
Background
–What is an asymmetry in retrieval of gene function information?
Life science information retrieval and processing workflows
Example of asymmetrical workflows
–Compare three apparently equivalent asymmetrical workflows
Conclusion
–Documentation standards
–Multidisciplinary teams for life science workflows

4 What is an Asymmetry in Retrieval? Taking different paths to get the same kind of information about a given biological object Life science information retrieval and processing workflows

5 Complex Information Retrieval Complex retrieval may involve the use of multiple information resources (databases and analysis tools) in combination. Such combinations of resources are often represented as workflows.

6 Workflow Standards Business Process Execution Language for Web Services Version 1.1 –http://www-128.ibm.com/developerworks/library/specification/ws-bpel/ Simple Conceptual Unified Flow Language (SCUFL) –Taverna Workbench http://taverna.sourceforge.net/

7 Logical Workflows A logical workflow is like a logical process model, with processes, data links, and control links. Key aspects of the workflow are the inputs, the outputs, and the processes that transform the data.
[Diagram labels: Sequence ID, get DNA sequence, Sequence string, Similarity search, results]

8 Physical Workflows A physical workflow is like a physical process model, with processes, data links, and control links.
[Diagram labels: UI, fetch DNA sequence, Sequence string, BLAST, BLAST results]
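The distinction between slides 7 and 8 can be sketched in code. This is only an illustration under stated assumptions: the step names come from the slide diagrams, but `fetch_dna_sequence_stub`, `blast_stub`, the toy sequence table, and the `run` helper are hypothetical stand-ins, not real database or BLAST calls.

```python
from typing import Callable, Dict, List

# Logical workflow: an ordered list of named processes,
# with no commitment to any particular tool.
LOGICAL_WORKFLOW: List[str] = ["get DNA sequence", "similarity search"]


def fetch_dna_sequence_stub(sequence_id: str) -> str:
    """Hypothetical stand-in for a sequence database lookup."""
    toy_database = {"ID001": "ATGGCC"}  # invented data, for illustration only
    return toy_database[sequence_id]


def blast_stub(sequence: str) -> List[str]:
    """Hypothetical stand-in for a BLAST search; returns a fake hit list."""
    return [f"hit-for-{sequence}"]


# Physical workflow: the same logical steps bound to concrete implementations.
PHYSICAL_BINDING: Dict[str, Callable] = {
    "get DNA sequence": fetch_dna_sequence_stub,
    "similarity search": blast_stub,
}


def run(workflow: List[str], binding: Dict[str, Callable], data):
    """Pass the data through each step in order, using the bound tool."""
    for step in workflow:
        data = binding[step](data)
    return data


result = run(LOGICAL_WORKFLOW, PHYSICAL_BINDING, "ID001")
```

Swapping a different function into `PHYSICAL_BINDING` changes the physical workflow while the logical workflow stays the same.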

9 Physical Workflow Antoon Goderis, Ulrike Sattler and Carole Goble. Applying DLs to workflow reuse and repurposing. Description Logics workshop, Edinburgh, Scotland, 24-26 July 2005.

10 Asymmetry Asymmetry means the paths or workflows are different: from the same set of potential inputs about some biological object they take different paths to produce the same kind of results. Asymmetrical workflows are equivalent if they do produce the same results.
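A minimal sketch of this definition, assuming each workflow can be modeled as a function from an input identifier to a collection of result identifiers. The workflow functions and their lookup tables below are hypothetical and exist only to exercise the check.

```python
from typing import Callable, Iterable, List


def equivalent(workflows: List[Callable[[str], Iterable[str]]],
               inputs: Iterable[str]) -> bool:
    """True if every workflow returns the same result set for every input."""
    for item in inputs:
        results = [frozenset(workflow(item)) for workflow in workflows]
        if any(r != results[0] for r in results[1:]):
            return False
    return True


# Hypothetical workflows: different paths, modeled here as different lookups.
def path_a(accession: str) -> List[str]:
    return {"ACC1": ["111", "222"], "ACC2": []}[accession]


def path_b(accession: str) -> List[str]:
    return {"ACC1": ["222", "111"], "ACC2": []}[accession]


def path_c(accession: str) -> List[str]:
    return {"ACC1": ["111"], "ACC2": []}[accession]


same = equivalent([path_a, path_b], ["ACC1", "ACC2"])
different = equivalent([path_a, path_c], ["ACC1", "ACC2"])
```

Results are compared as sets, so the order in which a workflow returns its results does not matter.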

11 This Study Example of asymmetrical workflows that might look to a user to be equivalent but which are not equivalent due to various features of the resources involved. Knowledge that they are not equivalent requires knowledge of metadata about the resources.

12 Three Workflows
[Diagram: a Genbank Accession number from Affymetrix is submitted to each of three Entrez resources: Pubmed, Nucleotide, and Gene. Pubmed returns Pubmed IDs directly; Nucleotide and Gene return Pubmed IDs via Pubmed links.]

14 http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx

15 http://www.mygrid.org.uk/images/pagemaster/GravesDiseasescenario_1.png

18 Methods We first collected representative DNA Accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia. This experiment sought, using a mouse model, to identify differences in gene expression associated with successful recovery from sarcopenia in young muscle as compared to failed recovery in old muscle.
–NIH grant AG18881
Pattison JS, Folk LC, Madsen RW, Childs TE, Booth FW. Transcriptional profiling identifies extensive downregulation of extracellular matrix gene expression in sarcopenic rat soleus muscle. Physiological Genomics 15(1):34-43, 2003.
Pattison JS, Folk LC, Madsen RW, Booth FW. Selected Contribution: Identification of differentially expressed genes between young and old rat soleus muscle during recovery from immobilization-induced atrophy. Journal of Applied Physiology 95(5):2171-9, 2003.
Pattison JS, Folk LC, Madsen RW, Childs TE, Spangenburg EE, Booth FW. Expression profiling identifies dysregulation of myosin heavy chains IIb and IIx during limb immobilization in the soleus muscles of old rats. Journal of Physiology 553(Pt 2):357-68, 2003.

19 Methods Next, we retrieved the Unique Identifiers (UIs) of the Entrez Pubmed citations that each of the three Entrez resources associated with the Accession numbers.
–Directly in the case of Entrez Pubmed
–Indirectly, via Pubmed links, in the case of Entrez Nucleotide and Entrez Gene
Next, we compared the number of Pubmed IDs retrieved by the three resources for each of the Accession numbers.
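The counting step can be sketched as follows. This is an illustration only: the accession labels and Pubmed IDs in `retrieved` are invented placeholders, not data from the study, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical retrieval results: resource -> accession -> Pubmed IDs.
retrieved: Dict[str, Dict[str, List[str]]] = {
    "Pubmed":     {"ACC1": ["111", "222"], "ACC2": [],      "ACC3": ["333"]},
    "Nucleotide": {"ACC1": ["111"],        "ACC2": ["444"], "ACC3": ["333"]},
    "Gene":       {"ACC1": [],             "ACC2": [],      "ACC3": ["333"]},
}


def counts_per_accession(hits: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct Pubmed IDs retrieved for each accession number."""
    return {accession: len(set(ids)) for accession, ids in hits.items()}


def distribution(hits: Dict[str, List[str]]) -> Counter:
    """How many accession numbers yielded 0, 1, 2, ... Pubmed IDs."""
    return Counter(counts_per_accession(hits).values())


summary = {resource: distribution(hits) for resource, hits in retrieved.items()}
```

Each entry of `summary` has the same shape as one column of a per-resource summary table: a count of accession numbers for each number of Pubmed IDs retrieved.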

33 Summary of Pubmed IDs by Accession Number

# of Pubmed IDs    # of Accession numbers
                   Pubmed    Nucleotide    Gene
0                  198       132           216
1                  36        112           34
2                  10        5             0
3                  4         2             1
4                  1         0             0
5                  2         0             0
Total              251       251           251

34 Methods Compared the number of Pubmed IDs produced for each Accession number by each workflow. Applied a non-parametric test: Kendall's W
–Pubmed versus Nucleotide versus Gene
–p < .05
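For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of Kendall's W with the usual correction for ties. The slides do not say how the data were arranged or which software was used, so the layout here is an assumption: each row is one Accession number and each column is the Pubmed ID count from one workflow. The rows in `example_rows` are hypothetical, not study data.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


def kendalls_w(rows: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance for m rows ranking n columns."""
    m, n = len(rows), len(rows[0])
    rank_sums = [0.0] * n
    tie_term = 0
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
        # Tie correction: sum of (t^3 - t) over groups of t equal values.
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    denominator = m ** 2 * (n ** 3 - n) - m * tie_term
    if denominator == 0:
        raise ValueError("W is undefined when every row is completely tied")
    return 12 * s / denominator


# Hypothetical counts per accession: (Pubmed, Nucleotide, Gene).
example_rows = [(2, 1, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
w = kendalls_w(example_rows)
```

W ranges from 0 (no agreement among the rows about how the columns rank) to 1 (complete agreement). A significance test is commonly obtained from the standard relation chi-square = m(n - 1)W with n - 1 degrees of freedom, which is the Friedman statistic.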

35 The Three Workflows Are Not Equivalent
[Diagram: the three workflows from slide 12, separated by ≠ signs: Pubmed ≠ Nucleotide ≠ Gene]

36 The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF001892 [si], genbank/AF001892 [si]. The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
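The three search forms quoted above can be generated mechanically. The sketch below builds them and, as an assumption not taken from the slides, wraps one in a URL for the NCBI E-utilities ESearch endpoint. The helper names `si_terms` and `esearch_url` are hypothetical, and no request is actually sent.

```python
from typing import List
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption here: the slides do not
# say how the searches were submitted).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def si_terms(source: str, accession: str) -> List[str]:
    """The three SI-field search forms: source, accession, or both."""
    return [
        f"{source} [si]",
        f"{accession} [si]",
        f"{source}/{accession} [si]",
    ]


def esearch_url(term: str) -> str:
    """Build (but do not send) a Pubmed ESearch URL for the given term."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})


terms = si_terms("genbank", "AF001892")
url = esearch_url(terms[2])
```

Because the SI field and the Entrez sequence database links are maintained separately, a search built this way may return a different set of citations than following Pubmed links from a Nucleotide or Gene record.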

37 Conclusions

38 Need for Documentation The first conclusion I take from this project is that there is a need for documentation of workflow details. –In another study we look at the character of documentation of information processing and retrieval methods in published reports of microarray experiments

39 Multidisciplinary Teams for Workflows The second conclusion I take is that the development of workflows requires multidisciplinary teams.

40 INFORMATION ITEMS METADATA KNOWLEDGE-ENABLED WORKFLOWS TOOLS

41 INFORMATION ITEMS METADATA KNOWLEDGE-ENABLED WORKFLOWS TOOLS domain expert (scientist)

42 INFORMATION ITEMS METADATA KNOWLEDGE-ENABLED WORKFLOWS TOOLS domain metadata expert (information specialist) domain expert (scientist)

45 workflows
