Modeling and Understanding Stress Response Mechanisms with Expresso Ruth G. Alscher Lenwood S. Heath Naren Ramakrishnan Virginia Tech, Blacksburg, VA 24061.

Modeling and Understanding Stress Response Mechanisms with Expresso Ruth G. Alscher Lenwood S. Heath Naren Ramakrishnan Virginia Tech, Blacksburg, VA ORNL Workshop on Genomics Duke University May 1, 2001

Who’s Who Ruth Alscher Plant Stress Boris Chevone Plant Stress Ron Sederoff, Ross Whetten Len van Zyl Y-H.Sun Forest Biotechnology Plant Biology Computer Science Lenwood Heath (CS) Algorithms Naren Ramakrishnan (CS) Data Mining Problem Solving Environments Craig Struble, Vincent Jouenne (CS) Image Analysis Statistics Ina Hoeschele (DS) Statistical Genetics Keying Ye (STAT) Bayesian Statistics Virginia Tech North Carolina State Univ. Virginia Tech Dawei Chen Molecular Biology Bioinformatics

People
Ross Whetten, Boris Chevone, Ron Sederoff, Y-H. Sun, Dawei Chen, Lenny Heath, Ruth Alscher, Vincent Jouenne, Naren Ramakrishnan, Keying Ye, Len van Zyl, Craig Struble

Overview
Plant responses to environmental stress
Stress on a chip
Summary of results obtained
Expresso
–Managing expression experiments
–Analyzing expression data
–Reaching conclusions
Where we go from here
–Modeling experiments
–Modeling pathways

Plant-Environment Interactions Several defense systems that respond to environmental stress are known. Their relative importance is not known. Mechanistic details are not known. Redox sensing may be involved.

Scenarios for Effect of Abiotic Stress on Plant Gene Expression

The 1999 Experiment: A Measure of Long-Term Adaptation to Drought Stress
Loblolly pine seedlings (two unrelated genotypes, “C” and “D”) were subjected to mild or severe drought stress for four (mild) or three (severe) cycles.
–Mild stress: needles dried down to –10 bars; little effect on growth; new flushes as in control trees.
–Severe stress: needles dried down to –17 bars; growth retardation; fewer new flushes than in controls.
Harvest RNA at the end of the growing season and determine patterns of gene expression on DNA microarrays.
With algorithms incorporated into Expresso, identify genes and groups of genes involved in stress responses.

Hypotheses
–There is a group of genes whose expression confers resistance to drought stress.
–Expression of this group of genes is lower under severe than under mild stress.
–Individual members of gene families show distinct responses to drought stress.

Selection of cDNAs for Arrays
384 ESTs (xylem and shoot-tip cDNAs of loblolly pine) were chosen on the basis of function and grouped into categories. The major emphasis was on processes known to be stress responsive. Where more than one EST had similar BLAST hits, all of those ESTs were used.

Categories within Protective and Protected Processes
Plant Growth Regulation; Environmental Change; Gene Expression; Signal Transduction
Protective Processes: ROS and Stress; Cell Wall Related; Phenylpropanoid Pathway
Protected Processes: Development; Metabolism; Chloroplast Associated; Carbon Metabolism; Respiration and Nucleic Acids; Mitochondrion; Cells, Tissues; Cytoskeleton; Secretion, Trafficking; Nucleus; Protease-associated

A Note about Categories
Categories are not mutually exclusive; a gene may be assigned to more than one category. For example, heat shock proteins have been grouped under the following categories and subcategories:
–Abiotic stress – heat
–Gene expression – post-translational processing – chaperones
–Abiotic stress – chaperones

Categories within “Protective Processes”
Stress
–Abiotic
  –Drought: dehydrins, aquaporins
  –Heat: heat shock proteins (chaperones)
–Biotic: “isoflavone reductases”
–Non-plant xenobiotics: GSTs
–Antioxidant processes: NADPH/ascorbate/glutathione scavenging pathway (cytosolic ascorbate peroxidase, superoxide dismutase-Fe, superoxide dismutase-Cu-Zn, glutathione reductase)
Cell Wall Related
–Sucrose metabolism; cellulose; arabinogalactan proteins; hemicellulose; pectins; xylose; extensins and proline-rich proteins; other cell wall proteins
Phenylpropanoid Pathway
–Isoflavone reductases, phenylalanine ammonia-lyases, S-adenosylmethionine decarboxylases, glycine hydroxymethyltransferases
–Lignin biosynthesis: CCoAOMTs, 4-coumarate-CoA ligases, cinnamyl-alcohol dehydrogenase

Quality Control
Positive: LP-3, a gene known to respond positively to drought stress in loblolly pine, was included. LP-3 was positive in the moist versus mild comparison and unchanged in the moist versus severe comparison.
Negative: Four clones of human genes, used as negative controls in the Arabidopsis Functional Genomics project, were included. The clones did not respond.

Protective Processes: categories that contained positives in genotypes C and D (control versus mild)
ROS and Stress
–Abiotic
  –Drought: dehydrins, aquaporins
  –Heat: heat shock proteins (chaperones)
–Biotic: “isoflavone reductases”
–Non-plant xenobiotics: GSTs
–Antioxidant processes: NADPH/ascorbate/glutathione scavenging pathway (cytosolic ascorbate peroxidase, superoxide dismutase-Fe, superoxide dismutase-Cu-Zn, glutathione reductase)
Cell Wall Related
–Sucrose metabolism; cellulose; extensins, arabinogalactan, and proline-rich proteins; hemicellulose; pectins; xylose; other cell wall proteins
Phenylpropanoid Pathway
–Isoflavone reductases, phenylalanine ammonia-lyase, S-adenosylmethionine decarboxylase, glycine hydroxymethyltransferase
–Lignin biosynthesis: CCoAOMT, 4-coumarate-CoA ligase, cinnamyl-alcohol dehydrogenase
Data were collected from two slides (4 arrays) for C and two slides (4 arrays) for D.

Hypotheses versus Results
Among the genes responding to mild stress, there exists a population of genes whose expression confers resistance.
–Genes in 69 categories responded positively to mild stress in genotypes C and D (the positive response was not observed under severe stress in genotype D).
There is evidence for a response to drought among genes associated with other stresses.
–Isoflavone reductase homologs and GSTs responded positively to mild drought stress.
–These categories have previously been documented to respond to biotic stress and xenobiotics, respectively.

Relationships among HSP homologs In control versus mild stress, HSP 100, 70, and 23 responded in C and D; HSP 80s did not respond in either C or D.

Candidate Categories
Include:
–Aquaporins
–Dehydrins
–Heat shock proteins/chaperones
Exclude:
–Isoflavone reductases

Experimental Design: Computational and Statistical Issues
Numerous sources of error in microarray experiments: identify, control, and analyze.
Clones on a microarray need to be replicated and randomly placed (Lee et al., PNAS 97, August 29, 2000, ).
Differing results among replicates can indicate sources of error; consistency gives confidence.

Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
Integration of design and procedures
Integration of image analysis tools and statistical analysis (via Perl scripts)
Connections to web databases and sequence alignment tools
The software Aleph was used for inductive logic programming (ILP).

Expresso: A Microarray Experiment Management System

Design of Microarrays I
Selected 384 archived ESTs.
Organized them into 4 microtitre source plates after PCR.
Pipetted them into 8 sets of 4 randomized microtitre plates; each set is a different arrangement of the 384 ESTs.
Printed type A microarrays from the first 4 sets (16 plates) and type B microarrays from the second 4 sets.
Each array type has 4 replicates of each EST, randomly placed.
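The plate randomization above can be sketched in a few lines. This is a minimal illustration, not Expresso's own code: it assumes 96-well plates (384 ESTs over 4 plates), uses hypothetical EST identifiers, and models each randomized set as a fresh random permutation of all 384 ESTs cut into 4 plates.

```python
import random

N_ESTS = 384          # ESTs selected for the arrays (from the slide)
WELLS_PER_PLATE = 96  # assumption: 96-well plates, since 384 ESTs fill 4 plates
SETS_PER_TYPE = 4     # 4 sets of 4 plates (16 plates) per array type


def split_into_plates(order):
    """Cut one ordering of all ESTs into consecutive 96-well plates."""
    return [order[i:i + WELLS_PER_PLATE] for i in range(0, len(order), WELLS_PER_PLATE)]


def design_array_types(seed=0):
    """Build 8 randomized sets of plates: sets 1-4 feed type A, sets 5-8 feed type B."""
    rng = random.Random(seed)
    ests = [f"EST{i:03d}" for i in range(N_ESTS)]  # hypothetical EST identifiers
    seen = set()
    sets = []
    while len(sets) < 2 * SETS_PER_TYPE:
        order = ests[:]  # copy so the source list stays in archive order
        rng.shuffle(order)
        if tuple(order) in seen:
            continue  # enforce "each set a different arrangement"
        seen.add(tuple(order))
        sets.append(split_into_plates(order))
    return {"A": sets[:SETS_PER_TYPE], "B": sets[SETS_PER_TYPE:]}


def replicate_counts(sets):
    """Count how often each EST appears across all plates of one array type."""
    counts = {}
    for plate_set in sets:
        for plate in plate_set:
            for est in plate:
                counts[est] = counts.get(est, 0) + 1
    return counts


if __name__ == "__main__":
    design = design_array_types()
    print(sum(len(s) for s in design["A"]))             # 16 plates for type A
    print(set(replicate_counts(design["A"]).values()))  # {4}: 4 replicates per EST
```

Because every set contains each EST exactly once, printing from 4 sets automatically yields 4 randomly placed replicates per EST on each array type.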

Design of Microarrays II
Each slide contained 2 identical arrays (of type A or B), with 4 replicates of each EST per array; each slide therefore has a total of 8 replicates of each EST.
A second slide contained 2 arrays of the other type, again with 4 replicates of each EST per array.
A 2-slide set therefore holds a total of 16 replicates of each EST.
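The replicate arithmetic of the slide-set design can be written out explicitly. This small sketch only restates the counts given above; the function names are hypothetical.

```python
ARRAYS_PER_SLIDE = 2      # each slide carries 2 identical arrays
REPLICATES_PER_ARRAY = 4  # each array holds 4 replicates of every EST


def replicates_on_slide():
    """Replicates of one EST on a single slide (2 arrays x 4 spots)."""
    return ARRAYS_PER_SLIDE * REPLICATES_PER_ARRAY


def replicates_in_slide_set(array_types=("A", "B")):
    """A slide set pairs one slide of each array type; replicates add up across slides."""
    return sum(replicates_on_slide() for _ in array_types)


if __name__ == "__main__":
    print(replicates_on_slide())      # 8
    print(replicates_in_slide_set())  # 16
```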

Spot and Clone Analysis
Image analysis: gridding, spot identification, intensity and background calculation, normalization
Statistics: fold or ratio estimation, combining replicates
Higher-level analysis: a slew of clustering methods, inductive logic programming (ILP)

Analysis of Expression Data
Microarray Suite: manual gridding; extract intensities for each spot; compute ratios; compute calibrated ratios.
Spot statistics:
–Every uncalibrated ratio is divided by the mean of all the uncalibrated ratios; as a result, the mean of the calibrated ratios is 1.0.
–Our tools use the logarithm of each calibrated ratio:
–Positive: expression increase
–Negative: expression decrease
–Zero: no change in expression
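A minimal sketch of the spot statistics, assuming each raw ratio is divided by the mean of all raw ratios so that the calibrated ratios average 1.0. The logarithm base is not stated in the slides; base 2 is an assumption here, and the example ratios are hypothetical. Only the sign of the log ratio matters for the interpretation.

```python
import math


def calibrate(raw_ratios):
    """Divide every raw (uncalibrated) ratio by the mean raw ratio,
    so that the calibrated ratios average 1.0."""
    mean_ratio = sum(raw_ratios) / len(raw_ratios)
    return [r / mean_ratio for r in raw_ratios]


def log_calibrated(raw_ratios):
    """Log of each calibrated ratio; base 2 is an assumption, the slide does not say."""
    return [math.log2(r) for r in calibrate(raw_ratios)]


def direction(log_ratio):
    """Read the sign of a log ratio as the slide does."""
    if log_ratio > 0:
        return "increase"
    if log_ratio < 0:
        return "decrease"
    return "no change"


if __name__ == "__main__":
    logs = log_calibrated([1.0, 2.0, 3.0])  # hypothetical raw ratios for three spots
    print([direction(x) for x in logs])     # ['decrease', 'no change', 'increase']
```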

Analysis of Expression Data
The multiple (typically 16) log calibrated ratios for a replicated clone do NOT follow a normal distribution; the distribution is spread relatively evenly over a large range. Statistical analysis based on the mean and standard deviation will therefore be overly pessimistic in identifying clones that are up- or down-expressed. From the observed even spread of the log ratios, we assume that a clone whose expression does not differ between the two samples of a probe pair will show a distribution centered at a mean log ratio of 0.0.

Computational Methods (A Probabilistic Analysis)
In a zero-centered distribution, the probability that any particular log ratio is positive (or negative) is 0.5. The number of positive (or negative) log ratios therefore follows a binomial distribution with parameters n = 16 and p = 0.5. For a clone whose expression was unaffected by drought stress, the probability of 12 or more positive log ratios (or 12 or more negative log ratios) out of 16 is
$$P(X \ge 12) = \sum_{k=12}^{16} \binom{16}{k} \left(\frac{1}{2}\right)^{16} = \frac{1820 + 560 + 120 + 16 + 1}{65536} = \frac{2517}{65536} \approx 0.038.$$
A clone with 12 or more positive log ratios is therefore up-expressed with a probability of about 0.96 (that is, 1 − 0.038).
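The binomial tail behind the 12-of-16 cutoff can be checked numerically. This sketch uses only the standard library (`math.comb`, Python 3.8+); the function name is hypothetical. It also shows that 12 is the smallest count whose tail probability falls below 0.05, since 11 or more occurs about 10.5% of the time by chance.

```python
from math import comb


def sign_tail_probability(k, n=16, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    Under the no-change assumption each of the n replicate log ratios is
    positive with probability p = 0.5, so this is the chance of seeing k or
    more positive (equivalently, by symmetry, negative) log ratios by luck."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    tail = sign_tail_probability(12)
    print(round(tail, 4))                       # 0.0384  (= 2517 / 65536)
    print(round(1 - tail, 4))                   # 0.9616
    print(round(sign_tail_probability(11), 4))  # 0.1051
```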

Computational Methods (Alternate Assumptions)
Our more general assumption avoids the trap of having to classify the response of each SPOT; rather, we classify the response of an EST as one of
–Up-regulated
–Down-regulated
–No clear change
Response CLASSIFICATION rather than QUANTIFICATION allows us to develop unified relationships among genes and among treatments. It also provides sufficient results for the use of inductive logic programming (ILP).
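A sketch of the three-way EST classification, assuming the 12-of-16 sign count from the binomial argument as the cutoff. The threshold parameter, the treatment of exact zeros (counted as neither sign), and the example data are assumptions for illustration, not Expresso's documented behavior.

```python
def classify_est(log_ratios, threshold=12):
    """Classify one EST from its replicate log calibrated ratios (typically 16).

    threshold=12 follows the binomial argument for 16 replicates; it is an
    assumption for other replicate counts. Exact zeros count as neither sign."""
    positives = sum(1 for r in log_ratios if r > 0)
    negatives = sum(1 for r in log_ratios if r < 0)
    if positives >= threshold:
        return "up-regulated"
    if negatives >= threshold:
        return "down-regulated"
    return "no clear change"


if __name__ == "__main__":
    # Hypothetical replicate log ratios for two ESTs, 16 spots each.
    mostly_up = [0.4] * 13 + [-0.2] * 3
    mixed = [0.3] * 9 + [-0.3] * 7
    print(classify_est(mostly_up))  # up-regulated
    print(classify_est(mixed))      # no clear change
```

Note that the classification ignores magnitudes entirely: a clone with 13 small positive ratios is called up-regulated, while one with 9 large positive ratios is not.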

Related Statistical Results
Chen et al. (J. Biomed. Optics 2, 1997, )
–Assume a normal distribution and normalize ratios
–No replicates
–Estimate a confidence interval for ratios that applies to each spot
Lee et al. (PNAS 97, August 29, 2000, ) emphasize the need for replication.
Black and Doerge (PNAS, to appear)
–Investigate distributional assumptions of log-normal and gamma distributions on intensities
–Determine the number of replicates needed for a particular confidence level under each distribution
–Assume that normalization and location-dependent noise have been eliminated

Clustering Techniques
Clustering
–Attribute-value methods: SVMs; SOMs; similarity-metric methods, either agglomerative (bottom-up) or divisive (top-down)
–Conceptual clustering
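As an illustration of the similarity-metric, agglomerative (bottom-up) branch, here is a minimal sketch. It assumes Euclidean distance and single linkage, which the slides do not specify, and uses hypothetical gene names and log-ratio profiles; it is not the clustering code used in Expresso.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(profiles, k):
    """Bottom-up clustering of expression profiles down to k clusters.

    Starts with every gene in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest (single linkage)."""
    clusters = [[gene] for gene in profiles]

    def linkage(a, b):
        return min(dist(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical log ratios for (moist vs mild, moist vs severe).
    profiles = {
        "g1": (1.0, -0.2),
        "g2": (0.9, -0.1),
        "g3": (-1.0, 0.5),
        "g4": (-1.2, 0.4),
    }
    print(agglomerative(profiles, 2))  # [['g1', 'g2'], ['g3', 'g4']]
```

A divisive (top-down) method would instead start from one cluster holding every gene and split it recursively.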

Inductive Logic Programming
ILP is a data mining approach expressly designed for inferring relationships. By expressing relationships as rules, it provides new information and resulting testable hypotheses. ILP groups related data and favors relationships that have short descriptions. ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).

ILP subsumes two forms of reasoning
Unsupervised learning
–“Find clusters of genes that have similar/consistent expression patterns”
Supervised learning
–“Find a relationship between a priori functional categories and gene expression”
Hybrid reasoning
–“Is there a relationship between genes in a given functional category and genes in a particular expression cluster?”
–ILP mines this information in a single step

Rule Inference in ILP
ILP infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction.
Example rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
~level(A,moist_vs_severe,positive) :- level(A,moist_vs_mild,positive).
Interpretation: “If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.” The confidence is the fraction of covered clones for which the rule holds: 69/(69 + 3) ≈ 0.958.
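To make the coverage numbers concrete, the sketch below scores Rule 142 against a table of expression levels and computes its confidence. It is only a scoring illustration under stated assumptions: the data layout (clone mapped to per-comparison responses), the function names, and the toy clones are hypothetical, and it does not reproduce Aleph, which searches for rules rather than evaluating a single given one.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of clones matched by the rule body for which the head also holds."""
    return pos_cover / (pos_cover + neg_cover)


def cover_rule_142(levels):
    """Score ~level(A,moist_vs_severe,positive) :- level(A,moist_vs_mild,positive).

    levels maps clone -> {comparison: response}. Clones whose body fails are
    not covered; covered clones count as positive if the head holds."""
    pos = neg = 0
    for responses in levels.values():
        if responses.get("moist_vs_mild") != "positive":
            continue  # body fails: the rule makes no prediction for this clone
        if responses.get("moist_vs_severe") == "positive":
            neg += 1  # head ~level(A,moist_vs_severe,positive) is violated
        else:
            pos += 1  # negative or unchanged in moist vs severe: head holds
    return pos, neg


if __name__ == "__main__":
    print(round(100 * rule_confidence(69, 3), 1))  # 95.8, matching Rule 142
    toy = {  # hypothetical clones
        "c1": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
        "c2": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
        "c3": {"moist_vs_mild": "negative", "moist_vs_severe": "positive"},
    }
    print(cover_rule_142(toy))  # (1, 1)
```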

More Rules We Obtained
[Rule 6]
level(A,moist_vs_mild,positive) :- category(A, transport_protein).
level(A,mild_vs_severe,negative) :- category(A, transport_protein).
[Rule 13]
level(A,moist_vs_mild,positive) :- category(A, heat).
[Rule 17]
level(A,moist_vs_mild,positive) :- category(A, cellwallrelated).

ILP in a Data Mining Context
Clustering
–Attribute-value methods: SVMs; SOMs; similarity-metric methods, either agglomerative (bottom-up) or divisive (top-down)
–Conceptual clustering
ILP combines the expressiveness of conceptual clustering with the efficiency of attribute-value techniques.

Current Status of Expresso
Completely automated and integrated:
–Statistical analysis
–Data mining
–Experiment capture in MEL
Current work: integrating
–Image processing
–Querying by semi-structured views
–Automatic experiment composition
Future work:
–Model-based design and management
–Randomized experiment layout with constraints
–Closing the loop

Future Directions: Next Generation Stress Chips
1. Time course, short and long term, to capture gene expression events underlying “emergency” and adaptive events following drought stress imposition. (Use all available ESTs for candidate stress resistance genes.)
2. Generate a cDNA library from stressed seedlings. Screen for full-length clones. Repeat Step 1.
3. Initiate modeling of the kinetics of drought stress responses.

Expresso: Future Directions An open, integrated system for design, process, analysis, data mining, data storage, and integration of information from web-based resources. Supports closing the experimental loop. Accumulated results influence later experiments, as well as enable construction of testable models of pathways. Multiple models are refined and evaluated within Expresso. Biologists have interactive access to models and control Expresso’s components.