PattArAn – From Annotation Triplets to Sentence Fingerprints


Motivation

• Scientific concepts are annotated with controlled vocabulary (CV) terms from ontologies such as the Gene Ontology (GO) and the Plant Ontology (PO).
• Our Arabidopsis-specific tool, Patterns in Arabidopsis Annotation (PattArAn), will focus on creating patterns from the annotation knowledge in (gene, GO, PO) triplets and on validating those triplets using the scientific literature.
• PattArAn will help scientists to scour the literature, to understand the connection between the annotation evidence and biological knowledge, and to develop hypotheses.

Goals:
(1) Explore new research ideas in three areas of interest using PattArAn.
(2) Build a gold standard dataset using manual annotation of triplet fingerprints.

The PattArAn Team at the University of Maryland, the University of Iowa, and St. Bonaventure University

Gene-GO-PO Triplets
Document Annotation Guidelines
Observations

• Check inter-annotator agreement.
• Extract gene interaction sentences in the context of our annotation triplets.
• Develop algorithms to rank sentences by importance with this gold standard data.

• GO and PO combinations centered on a gene.
• Documents supporting annotations identified and collected.
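The idea of "GO and PO combinations centered on a gene" can be sketched in Python as follows. This is a minimal illustration, not the PattArAn implementation: it assumes annotations arrive as (gene, term) pairs and that a triplet is formed for every GO term and PO term annotated to the same gene. The names `Triplet` and `build_triplets`, and the placeholder identifiers in the usage note, are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import NamedTuple


class Triplet(NamedTuple):
    """One (gene, GO, PO) annotation triplet."""
    gene: str
    go_term: str
    po_term: str


def build_triplets(go_annotations, po_annotations):
    """Pair every GO term with every PO term annotated to the same gene.

    Both arguments are iterables of (gene, term) pairs (an assumed input
    format). Genes lacking either a GO or a PO annotation yield no triplet.
    """
    go_by_gene = defaultdict(set)
    for gene, term in go_annotations:
        go_by_gene[gene].add(term)

    po_by_gene = defaultdict(set)
    for gene, term in po_annotations:
        po_by_gene[gene].add(term)

    triplets = []
    # Only genes present in both annotation sets can anchor a triplet.
    for gene in sorted(go_by_gene.keys() & po_by_gene.keys()):
        pairs = product(sorted(go_by_gene[gene]), sorted(po_by_gene[gene]))
        triplets.extend(Triplet(gene, go, po) for go, po in pairs)
    return triplets
```

For example, calling `build_triplets([("geneA", "go1"), ("geneA", "go2")], [("geneA", "po1")])` with these placeholder identifiers yields two triplets, one per GO term, both centered on `geneA`.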
Area 1 | Area 2 | Area 3
# triplets in document set (8 documents)
Found in full text: 3214
# triplets w/ at least 1 sentence 1116
# triplets w/ all 3 doublets in at least 1 sentence each 010
# triplets w/ only 2 doublets in at least 1 sentence 24575
# triplets w/ only 1 doublet in at least 1 sentence
Found in supplementary data:
# triplets found 3138
# doublets found 83469

Using our triplets, we could identify connections between a specific area and other fields in biology in under four weeks. It is also interesting to see how biologists' genes of interest may function in concert to influence different bioprocesses. This serves well as the beginning of an exploration that may eventually lead to new hypotheses and discoveries.

• Annotations: Triplets are represented by sentences to varying degrees. The supplementary material is quite rich. Doublets have the most potential.
• Knowledge Underlying Triplets: The annotations of document ( ) explain a biological process of Arabidopsis thaliana well. The TSO2 gene relates to cell division by controlling dNTP balance. All annotating GO terms link through the function of TSO2. TSO2 is also expressed in the organs mentioned in the PO terms. Thus, this paper nicely links the PO terms and the GO terms.
• Cross-document inference: Document indicates that the redox gene AtCB5-D is expressed at varying levels across plant tissues. Document indicates that upon infection with Pseudomonas syringae, expression levels drop significantly in Arabidopsis leaves. This process is one aspect of a complex, genome-wide response to bacterial infection involving many genes.
• Inferred Triplet: Using doublets in document ( ) we may infer that: “The plasma membrane protein SLAC1 is essential for stomatal closure in response to CO2, abscisic acid, ozone, light/dark transitions, humidity change, calcium ions, hydrogen peroxide and nitric oxide.” This is interesting because it describes a single protein that is involved in many responses to various environmental signals.

• Area 1: regulation of flower and fruit development by genes and signal pathways. (e.g., genes TSO1, TSO2, MSI1)
• Area 2: signal transduction of the plant hormone ethylene. (e.g., genes ETR1, ERS1, ETR2)
• Area 3: integration of metabolite transporters with plant growth, development and survival. (e.g., genes AtCHX17, AtNHX1, AtKEA2)

Future Work
Summary
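Returning to the full-text counts, which separate triplets found in a sentence from triplets with only some of their doublets found, the classification can be sketched in Python. This is a rough sketch under stated assumptions, not the procedure used for the poster: a doublet is taken to be any two of the three triplet terms, "at least 1 sentence" is read as one sentence mentioning all three terms, and a term counts as mentioned when its label appears as a case-insensitive substring of the sentence. The function names `doublets`, `mentions` and `coverage` are hypothetical.

```python
from itertools import combinations


def doublets(triplet):
    """Return the three term pairs of a (gene, GO, PO) triplet."""
    return list(combinations(triplet, 2))


def mentions(sentence, term):
    """Assumed matching rule: case-insensitive substring search."""
    return term.lower() in sentence.lower()


def coverage(triplet, sentences):
    """Classify a triplet by how its terms co-occur in the sentences.

    Returns "triplet" when one sentence mentions all three terms;
    otherwise returns the number of doublets (0 to 3) that each
    co-occur in at least one sentence.
    """
    if any(all(mentions(s, t) for t in triplet) for s in sentences):
        return "triplet"
    return sum(
        any(mentions(s, a) and mentions(s, b) for s in sentences)
        for a, b in doublets(triplet)
    )
```

Substring matching is only a stand-in here; real gene symbols and ontology labels have synonyms and spelling variants that a production matcher would need to handle.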