Functional Genomics and Bioinformatics Applied to Understanding Oxidative Stress Resistance in Plants Ruth Grene Alscher Lenwood S. Heath Virginia Tech.

Presentation transcript:

Functional Genomics and Bioinformatics Applied to Understanding Oxidative Stress Resistance in Plants Ruth Grene Alscher Lenwood S. Heath Virginia Tech December 14, 2001

Overview
Organization of our group
About environmental stress and reactive oxygen species (ROS)
Plant responses to ROS
Analysis of responses to stress on a chip - microarray technology
Expresso: management system for microarrays
–Managing expression experiments
–Analyzing expression data
–Reaching conclusions
Where do we go from here?

Boris Chevone Ron Sederoff NCSU Dawei Chen Ruth Alscher Lenny Heath Naren Ramakrishnan, Keying Ye Len van Zyl NCSU Carol Loopstra Texas A and M Jonathan Watkinson Margaret Ellis Logan Hanks Senior Collaborators Students: VT Cecilia Vasquez

Iterative strategy for detection of stress-mediated effects on gene expression using microarrays and CS expertise
Detection of stress-mediated gene expression effects on microarrays
Computational tools to infer interaction among genes, pathways
Revised / New Tools and Experiments
Genetic Regulatory Networks
Test inferences with varying conditions and genotypes

Expresso

Plant Response to Stress
Plants adapt to changing environmental conditions through global cellular responses involving successive changes in, and interactions among, expression patterns of numerous genes. Our group studies these changes through a combination of bioinformatics and genomic techniques.

Long Term Goals
Biological: To identify molecular stress resistance mechanisms in tree and crop species.
Bioinformatic: To support iterative experimentation in plant genomics, capture and analyze experimental data, integrate biological information from diverse sources, and close the experimental loop.

The Paradox of Aerobiosis
Oxygen is essential, but toxic.
Aerobic cells face constant danger from reactive oxygen species (ROS).
ROS can act as mutagens, cause lipid peroxidation, and denature proteins.

ROS Arise as a Result of Exposure to:
Ozone
Sulfur dioxide
High light
Herbicides
Extremes of temperature
Salinity
Drought

Free Radicals

Responses to Environmental Signals

Redox Regulation of Cellular Systems Membrane Receptors Environmental Stress Metabolite Defense Protein kinases; phosphatases Transcription factors Gene Expression Defense, Repair, Apoptosis Prooxidants (ROS) Antioxidants

Scenarios for Effects of Abiotic Stress on Gene Expression in Plants

Drought Stress Responses in Loblolly Pine: Questions to be Addressed
Can a hierarchy of drought stress resistance mechanisms be identified?
Can a clear distinction be made between rapidly responding and long-term adaptational mechanisms?
Can particular subgroups within gene families be associated with drought tolerance?

Hypotheses
There is a group of genes whose expression confers resistance to drought stress.
Based on previous work, increased expression of defense genes is co-regulated and is correlated with resistance to oxidative stress.
Failure to cope is correlated with little or no defense gene activation.
Candidate resistance genes follow this pattern of expression.
A common core of defense genes exists, which responds to several different stresses.

Components of Stress Study
Pine Drought Stress Experiments
Expresso Prototype
Design and Print Microarrays
Select Pine cDNAs: 384 (1999), 2400 (2001)
Design Functional Hierarchy
Capture Spot Intensities
Integrate and Analyze
Inductive Logic Programming (ILP)

Imposition of Successive Cycles of Mild or Severe Drought Stress on 1-year-old Loblolly Pine Seedlings
[Figure: two time courses of water potential (bars) versus days. Cycles of Mild Drought Stress: RNA Harvests I to IV. Cycles of Severe Drought Stress: Cycles I to III, each a dry-down (water withheld) followed by recovery (water given), with RNA Harvests I to III. Legend: PS = photosynthesis.]

Categories within Protective and Protected Processes Plant Growth Regulation Environmental Change Gene Expression Signal Transduction Protective Processes Protected Processes ROS and Stress Cell Wall Related Phenylpropanoid Pathway Development Metabolism Chloroplast Associated Carbon Metabolism Respiration and Nucleic Acids Mitochondrion Cells Tissues Cytoskeleton Secretion Trafficking Nucleus Protease-associated

Categories within “Protective Processes”
Protective Processes Stress Cell Wall Related Phenylpropanoid Pathway Abiotic Biotic Antioxidant Processes Drought Heat Non-Plant Xenobiotics NADPH/Ascorbate/Glutathione Scavenging Pathway Cytosolic ascorbate peroxidase Dehydrins, Aquaporins Heat shock proteins (Chaperones) superoxide dismutase-Fe superoxide dismutase-Cu-Zn glutathione reductase Sucrose Metabolism Cellulose Arabinogalactan proteins Hemicellulose Pectins Xylose Other Cell Wall Proteins isoflavone reductases phenylalanine ammonia-lyases S-adenosylmethionine decarboxylases glycine hydroxymethyltransferases Lignin Biosynthesis CCoAOMTs 4-coumarate-CoA ligases cinnamyl-alcohol dehydrogenase Chaperones “Isoflavone Reductases” GSTs Extensins and proline rich proteins

Hypotheses versus Results – 1999 Experiment
Among the genes responding positively to mild stress, there exists a population of genes whose expression is negative or unchanged under severe stress.
–Candidate stress resistance genes.
Genes in 69 categories (e.g. HSP70s and 100s, aquaporins, but not HSP80s) responded positively to mild stress. Effect of severe stress was not detectable or negative.

Hypotheses versus Results – 1999 Experiment
Genes associated with other stresses responded to drought stress
–Isoflavone reductase homologs and GSTs responded positively to mild drought stress.
–These categories are previously documented to respond to biotic stress and xenobiotics, respectively.
–However, both isoflavone reductase homologs and GSTs also responded positively to severe drought stress. Thus, they do not fall into the category of candidate stress resistance genes.

Candidate Categories
Include
–Aquaporins
–Dehydrins
–Heat shock proteins/chaperones
Exclude
–Isoflavone reductases
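The selection logic behind these categories can be sketched as a simple filter: keep a category if it responded positively to mild stress and did not respond positively to severe stress. This is only an illustration. The table below encodes the responses as the 1999 results slides describe them, and the names `responses` and `is_candidate` are hypothetical, not part of Expresso.

```python
# Each entry is (response to mild stress, response to severe stress),
# as stated on the 1999 results slides. "not positive" means negative
# or not detectable.
responses = {
    "aquaporins": ("positive", "not positive"),
    "HSP70s": ("positive", "not positive"),
    "HSP100s": ("positive", "not positive"),
    "isoflavone reductase homologs": ("positive", "positive"),
    "GSTs": ("positive", "positive"),
}


def is_candidate(mild, severe):
    """A candidate responds positively to mild stress but not to severe stress."""
    return mild == "positive" and severe != "positive"


candidates = sorted(
    name for name, (mild, severe) in responses.items() if is_candidate(mild, severe)
)
excluded = sorted(set(responses) - set(candidates))

print("candidates:", candidates)
print("excluded:", excluded)
```

Isoflavone reductase homologs and GSTs drop out because they stay positive under severe stress, which matches the exclusion stated above.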

Flow of a Microarray Experiment
[Flow diagram labels: Hypotheses; Select cDNAs; PCR; Test of Hypotheses; Extract RNA; Replication and Randomization; Reverse Transcription and Fluorescent Labeling; Robotic Printing; Hybridization; Identify Spots; Intensities; Statistics; Clustering; Data Mining, ILP]

Design of Microarrays I --- Randomization
Selected 384 archived ESTs
Organized into four 96-well microtitre source plates after PCR
Pipetted into 8 sets of four randomized microtitre plates
Each set is a different randomized arrangement of the 384 ESTs
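The randomization step can be sketched in a few lines. This is a minimal illustration, assuming each set is an independent uniform shuffle of the 384 ESTs split into four 96-well plates. The EST identifiers, the seed, and the function name `randomized_set` are made up for the example, and the slides do not say how the actual randomization was generated.

```python
import random

N_ESTS = 384
WELLS_PER_PLATE = 96
N_SETS = 8


def randomized_set(est_ids, rng):
    """Shuffle the ESTs and split them into four 96-well plates."""
    shuffled = est_ids[:]          # copy so the source order is untouched
    rng.shuffle(shuffled)
    return [
        shuffled[start:start + WELLS_PER_PLATE]
        for start in range(0, len(shuffled), WELLS_PER_PLATE)
    ]


rng = random.Random(2001)          # arbitrary seed, only for reproducibility
est_ids = [f"EST{i:03d}" for i in range(1, N_ESTS + 1)]  # placeholder names
plate_sets = [randomized_set(est_ids, rng) for _ in range(N_SETS)]

print(len(plate_sets), "sets of", len(plate_sets[0]), "plates")
```

Every set contains each EST exactly once, so the sets differ only in where each EST lands on the plates.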

Design of Microarrays II --- Replication
Printed type A microarrays from first four sets (16 plates); printed type B microarrays from second four sets
Each array type has four replicates of each EST, randomly placed
Each comparison was performed with four different hybridizations, with dyes reversed in two
Total of 16 replicates of each EST in each comparison
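The replicate count follows from the design: four replicates per array times four hybridizations gives 16 measurements per EST per comparison. The sketch below reproduces that arithmetic and shows one common way to put dye-reversed hybridizations on the same orientation, by negating their log ratios. The sign-flip convention and all names here are assumptions for illustration; the slides do not describe how dye-swapped arrays were combined.

```python
REPLICATES_PER_ARRAY = 4

# Four hybridizations per comparison, dyes reversed in two of them.
hybridizations = [
    {"id": 1, "dye_swapped": False},
    {"id": 2, "dye_swapped": False},
    {"id": 3, "dye_swapped": True},
    {"id": 4, "dye_swapped": True},
]

total_replicates = REPLICATES_PER_ARRAY * len(hybridizations)


def oriented_log_ratio(log_cy5_over_cy3, dye_swapped):
    """Return log(treated / control).

    Assumes Cy5 = treated and Cy3 = control on unswapped arrays, so a
    dye-swapped array reports log(control / treated) and is negated.
    """
    return -log_cy5_over_cy3 if dye_swapped else log_cy5_over_cy3


print("replicates per EST per comparison:", total_replicates)
```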

Spot and Clone Analysis
Image Analysis: gridding, spot identification, intensity and background calculation, normalization
Statistics: fold or ratio estimation; combining replicates
Higher-level Analysis: clustering methods; inductive logic programming (ILP)

Spot Identification and Intensity Analysis
Microarray Suite: Manual grid; extract intensities for each spot; compute ratios; compute calibrated ratios
Spot Statistics:
–Each calibrated ratio is the uncalibrated ratio divided by the mean of all the uncalibrated ratios, so the mean of the calibrated ratios is 1.0
–Our tools use the logarithm of each calibrated ratio
–Positive: expression increase
–Negative: expression decrease
–Zero: no change in expression
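The calibration described above is easy to check numerically. The sketch below divides each ratio by the mean ratio and then takes logarithms. The input ratios are invented for the example, and base 2 is an assumption, since the slides do not state which logarithm base the tools use.

```python
import math


def calibrate(ratios):
    """Divide each ratio by the mean ratio so the calibrated mean is 1.0."""
    mean_ratio = sum(ratios) / len(ratios)
    return [r / mean_ratio for r in ratios]


def log_calibrated(ratios):
    """Log2 of each calibrated ratio: > 0 increase, < 0 decrease, 0 no change."""
    return [math.log2(c) for c in calibrate(ratios)]


raw_ratios = [1.0, 2.0, 4.0, 1.0]      # made-up uncalibrated ratios
calibrated = calibrate(raw_ratios)     # [0.5, 1.0, 2.0, 0.5]
logs = log_calibrated(raw_ratios)      # [-1.0, 0.0, 1.0, -1.0]

print("mean calibrated ratio:", sum(calibrated) / len(calibrated))
print("log ratios:", logs)
```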

Analysis of Expression Data
The multiple (typically 16) log calibrated ratios for a replicated clone do NOT follow a normal distribution.
Distribution is spread relatively evenly over a large range.
Statistical analysis based on mean and standard deviation will be overly pessimistic in identifying clones that are up- or down-expressed.
From the observation of an even spread of the log ratios, we assume that a clone whose expression does not differ within a probe pair will show a distribution centered at a mean log ratio of 0.0.

Computational Methods --- Alternate Assumptions
Our more general assumption avoids the trap of having to classify the response of each SPOT; rather, we classify the response of an EST as one of
–Up-regulated
–Down-regulated
–No clear change
Response CLASSIFICATION rather than QUANTIFICATION allows us to develop unified relationships among genes and among treatments.
Provides sufficient results for the use of inductive logic programming (ILP).
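One way to classify an EST without assuming normality is a sign test on its replicate log ratios. The slides do not specify the classifier actually used, so the sketch below swaps in a standard sign test purely as an illustration: under the stated assumption that an unchanged clone has log ratios centered at 0.0, each replicate is equally likely to be positive or negative. The function names and the 0.05 threshold are assumptions.

```python
from math import comb


def upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def classify(log_ratios, alpha=0.05):
    """Classify an EST from the signs of its replicate log ratios."""
    nonzero = [x for x in log_ratios if x != 0]
    n = len(nonzero)
    if n == 0:
        return "no clear change"
    positives = sum(1 for x in nonzero if x > 0)
    negatives = n - positives
    if upper_tail(positives, n) <= alpha:
        return "up-regulated"
    if upper_tail(negatives, n) <= alpha:
        return "down-regulated"
    return "no clear change"


# Made-up replicate sets of 16 log ratios each.
print(classify([0.4] * 13 + [-0.2] * 3))   # mostly positive
print(classify([-0.4] * 14 + [0.1] * 2))   # mostly negative
print(classify([0.3] * 8 + [-0.3] * 8))    # evenly split
```

With 16 replicates, at least 12 must share a sign to pass this threshold, so the test ignores magnitudes entirely and yields only the three response classes.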

Data Mining: Inductive Logic Programming ILP is a data mining algorithm expressly designed for inferring relationships. By expressing relationships as rules, it provides new information and resultant testable hypotheses. ILP groups related data and chooses in favor of relationships having short descriptions. ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).

Rule Inference in ILP
Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction
Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
Interpretation: “If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
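The confidence quoted for Rule 142 is consistent with its cover counts: 69 / (69 + 3) ≈ 95.8%. The sketch below reproduces that figure and shows how such counts could be tallied from classified clones. The tiny `facts` table and the function names are hypothetical, and reading "Pos cover" and "Neg cover" as the clones for which the rule head holds or fails, given the body, is an assumption about the ILP output.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of covered examples for which the rule head holds."""
    return pos_cover / (pos_cover + neg_cover)


def cover(facts):
    """Count clones that satisfy the body (mild positive), split by the head."""
    pos = neg = 0
    for levels in facts.values():
        if levels["moist_vs_mild"] != "positive":
            continue                  # rule body does not apply to this clone
        if levels["moist_vs_severe"] != "positive":
            pos += 1                  # head holds: not positive under severe
        else:
            neg += 1                  # head fails
    return pos, neg


# Hypothetical clones, only to exercise the counting logic.
facts = {
    "cloneA": {"moist_vs_mild": "positive", "moist_vs_severe": "negative"},
    "cloneB": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
    "cloneC": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
    "cloneD": {"moist_vs_mild": "unchanged", "moist_vs_severe": "negative"},
}

print(cover(facts))
print(f"Rule 142 confidence: {100 * rule_confidence(69, 3):.1f}%")
```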

ILP subsumes two forms of reasoning
Unsupervised learning
–“Find clusters of genes that have similar/consistent expression patterns”
Supervised learning
–“Find a relationship between a priori functional categories and gene expression”
Hybrid reasoning: Information Integration
–“Is there a relationship between genes in a given functional category and genes in a particular expression cluster?”
–ILP mines this information in a single step

NSF-Supported Work of 2001: Expresso Progress to Date
Margaret Ellis and Logan Hanks (computer science graduate students):
MEL: Semistructured data model for experiment capture
Parsing: Automatic parser generators to drive archival storage
Database: Loading and cataloging MEL data in a Postgres RDBMS
Pipeline: Linkages to data analysis and data mining software

Imposition of Successive Cycles of Mild or Severe Drought Stress on 1-year-old Loblolly Pine Seedlings
[Figure: two time courses of water potential (bars) versus days. Cycles of Mild Drought Stress: RNA Harvests I to IV. Cycles of Severe Drought Stress: Cycles I to III, each a dry-down (water withheld) followed by recovery (water given), with RNA Harvests I to III. Legend: PS = photosynthesis.]

Final Harvest; Control versus Mild Stress; 2001
[Figure: Cy3 TIFF image and Cy5 TIFF image; panel labels: Replication, Differential Expression]

Final Harvest; Control versus Mild Stress; 2001 Cy5 to Cy3 ratios. Final harvest after four drought cycles. RNA harvested 24 hours after final watering. Cy5 = treated; Cy3 = control. Aquaporins responded positively. HSP 80’s were unaffected (same as in 1999 results).

Drought Stress Responses in Loblolly Pine: Questions to be Addressed
Can a hierarchy of drought stress resistance mechanisms be identified?
Can a clear distinction be made between rapidly responding and long-term adaptational mechanisms?
Can particular subgroups within gene families be associated with drought tolerance?

Proposed Project: Plant Biology (with co-PIs: Ron Sederoff, NCSU; Carol Loopstra, TAMU)
An investigation of drought stress responses in loblolly pine in a variety of provenances.
Quantitative RT-PCR to confirm and expand results obtained with microarrays.
In situ hybridization to stressed and unstressed cell and tissue types.

Proposed Project: Sources of cDNAs for arrays NCSU ESTs selected on the basis of function. Stressed cDNA libraries from roots and stems of drought tolerant families from East Texas and Lost Pines, and from the Atlantic Coastal Plain (humid conditions). Homologs of drought-responsive Arabidopsis genes.

Drought Stress Responses in Loblolly Pine: Future Bioinformatics Goals Support incorporation of biological information in the form of functional hierarchies and gene families. Close the computational and experimental loop to support iterative experimental regimes. Integrate information from multiple experiments involving multiple provenances, drought stresses, and EST sets.

Gene Discovery in the Arabidopsis Transcriptome
[Flow diagram labels: Data Capture; Postgres Database; Database Queries; Statistical Analysis and Clustering; Data Mining, ILP; Possible Identification of Novel Drought Responsive Genes in Arabidopsis; Drought Stress (short and long term); Hybridize to Arabidopsis Transcriptome; Scanning, Image Processing]

Identification of Drought Responsive Genes and Pathways Across Provenances in Loblolly Pine
[Flow diagram labels: Select Pine cDNAs Via Contigs; Robotic Replication and Printing; Data Capture; Postgres Database; Database Queries; Statistical Analysis and Clustering; Data Mining, ILP; Drought Stress Experiments on NC, TX Pine; Hybridization; Scanning, Image Processing; Identification of Drought Responsive Pine Genes; Close The Loop; Arabidopsis Drought Responsive genes]

Proposed Project: Bioinformatics I (Alscher, Heath, Ramakrishnan) Constraint-based selection of cDNAs, including intelligent use of contigs. Assignment of pine ESTs to subgroups within protein families (ProDom, Pfam). Extend information integration in ILP to include Mendel classification of gene families. Integrating data across provenances and known degrees of drought tolerance.

Proposed Project: Bioinformatics II (Ramakrishnan, Heath) Specialize ILP for particular biological information sources. Automatic tuning of ILP parameters. Pushing data mining functionality into the database. Interleaving and iteration of query, data analysis, and data mining operations.