Published by Polly Greer. Modified over 9 years ago.
PattArAn – From Annotation Triplets to Sentence Fingerprints

Motivation: Scientific concepts are annotated with controlled vocabulary (CV) terms from ontologies such as Gene Ontology (GO) and Plant Ontology (PO). Our Arabidopsis-specific tool, Patterns in Arabidopsis Annotation (PattArAn), will focus on pattern creation from annotation knowledge of (gene, GO, PO) triplets and on triplet validation using the scientific literature. PattArAn will help scientists to scour the literature, to understand the connection to the annotation evidence and biological knowledge, and to develop hypotheses.

Goals:
(1) Explore new research ideas in three areas of interest using PattArAn.
(2) Build a gold standard dataset using manual annotation of triplet fingerprints.

The PattArAn Team at the University of Maryland, the University of Iowa, and St. Bonaventure University

Gene-GO-PO Triplets
Document Annotation Guidelines
Observations

Check inter-annotator agreement.
Extract gene interaction sentences in the context of our annotation triplets.
Develop algorithms to rank sentences by importance with this gold standard data.

GO and PO combinations centered on a gene. Documents supporting annotations identified and collected.
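As a rough illustration of how a (gene, GO, PO) triplet breaks down into doublets that can be looked up in sentences, here is a minimal Python sketch. It is not the PattArAn implementation: the function names, the example sentences, and the use of case-insensitive substring matching are all assumptions made for illustration.

```python
from itertools import combinations


def doublets(triplet):
    """Return the three unordered pairs of a (gene, GO, PO) triplet."""
    return list(combinations(triplet, 2))


def doublet_coverage(triplet, sentences):
    """Count how many of the triplet's doublets co-occur in at least one sentence.

    Matching is a plain case-insensitive substring test (an assumption);
    a real system would need term normalization and synonym handling.
    """
    lowered = [s.lower() for s in sentences]
    covered = 0
    for a, b in doublets(triplet):
        if any(a.lower() in s and b.lower() in s for s in lowered):
            covered += 1
    return covered


def triplet_in_sentence(triplet, sentences):
    """True if one sentence mentions all three members of the triplet."""
    return any(all(t.lower() in s.lower() for t in triplet) for s in sentences)


# Hypothetical triplet and sentences, invented for this example only.
example_triplet = ("TSO2", "cell division", "flower")
example_sentences = [
    "TSO2 is required for normal cell division.",
    "TSO2 is expressed in the flower.",
]

print(doublet_coverage(example_triplet, example_sentences))
print(triplet_in_sentence(example_triplet, example_sentences))
```

In this made-up example two of the three doublets are each supported by a sentence, while no single sentence mentions the whole triplet.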
Area 1 Area 2 Area 3
# triplets in document set (8 documents) Found In Full-Text: 3214
# triplets w/ at least 1 sentence 1116
# triplets w/ all 3 doublets in at least 1 sentence each 010
# triplets w/ only 2 doublets in at least 1 sentence 24575
# triplets w/ only 1 doublet in at least 1 sentence 515854
Found In Supplementary Data:
# triplets found 3138
# doublets found 83469

Using our triplets, we could identify connections between a specific area and other fields in biology in under four weeks. It is also interesting to see how biologists' genes of interest may function in concert to influence different bioprocesses. This serves well as the beginning of an exploration that may eventually lead to new hypotheses and discoveries.

Annotations: Triplets are represented by sentences to varying degrees. The supplementary material is quite rich. Doublets have the most potential.

Knowledge Underlying Triplets: The annotations of document 16399800 explain a biological process of Arabidopsis thaliana well. The TSO2 gene relates to cell division by controlling dNTP balance. All annotating GO terms link through the function of TSO2. TSO2 is also expressed in the organs mentioned in the PO terms. Thus, this paper nicely links the PO terms and GO terms.

Cross-document inference: Document 9880378 indicates that the redox gene AtCB5-D is expressed at varying levels across plant tissues. Document 17028151 indicates that upon infection with Pseudomonas syringae, expression levels drop significantly in Arabidopsis leaves. This process is one aspect of a complex, genome-wide response to bacterial infection involving many genes.

Inferred Triplet: Using doublets in document 18305484, we may infer that: “The plasma membrane protein SLAC1 is essential for stomatal closure in response to CO2, abscisic acid, ozone, light/dark transitions, humidity change, calcium ions, hydrogen peroxide and nitric oxide.” This is interesting because it describes a single protein that is involved in many responses to various environmental signals.

Area 1: regulation of flower and fruit development by genes and signal pathways (e.g., genes TSO1, TSO2, MSI1).
Area 2: signal transduction of the plant hormone ethylene (e.g., genes ETR1, ERS1, ETR2).
Area 3: integration of metabolite transporters with plant growth, development and survival (e.g., genes AtCHX17, AtNHX1, AtKEA2).

Future Work
Summary