Expresso and Chips: Studying Drought Stress in Plants with cDNA Microarrays
Lenwood S. Heath
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061

Expresso and Chips, Fordham University, May 6, 2003
Outline
- Expresso Team
- Drought Stress in Plants
- Microarray Technology
- Expresso System
- Biological Results
- Networks in Biology
- Future

Expresso Team
Institutions: VT, Duke, NCSU
Members: Ron Sederoff, Lenny Heath, Naren Ramakrishnan, Layne Watson, Cecilia Vasquez-Robinet, Shrinivasrao Mane, Allan Sioson, Maulik Shukla, Harsha Rajasimha, Jonathan Watkinson, Boris Chevone, Ruth Grene, Andrew McElrone, Catarina Moura

Grand Goal
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought and other oxidative stresses.

Questions Currently Addressed in the Grene Lab
1. Big picture: What makes a plant successfully acclimate to drought stress?
2. Specifically: Which changes in gene expression are associated with physiological acclimation to drought stress?
3. Goal: Using Expresso and the smarts of several computer scientists, can we construct, or amend, pathways depicting the perception of drought stress and the successive events that culminate in acclimation?
4. Future work: Which changes in the metabolite population are associated with acclimation?

Long-Term Objective of Drought Experiments in Expresso
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought, using cDNA microarrays and metabolomics.
Gene expression:
- Stress perception
- Metabolic acclimatory responses
- Protective responses (LEAs, antioxidants)

Responses to Environmental Signals [figure]

Drought Stress Experiments [figure]
- Experiment 1: Cycles of mild drought stress (Cycles I to IV)
- Experiment 2: Cycles of severe drought stress (Cycles I to III)
- Each cycle: dry down (water withheld), then recovery (water given)
- Axes: days versus water potential (bars)
- Markers: PS (photosynthesis) measurements and needle harvests

How the Microarray Process Works [figure, courtesy J.M. Trent]

Flow of Procedures [flowchart]
Steps: Hypotheses; Select cDNAs; PCR; Extract RNA; Replication and Randomization; Reverse Transcription and Fluorescent Labeling; Robotic Printing; Hybridization; Identify Spots; Intensities; Statistics; Clustering; Data Mining, ILP; Confirm with RT-PCR; Experiment (PS, water potential)
Responsibility labels: CS and Biologists; CS

Key Steps in cDNA Microarrays
- Probe generation and microarray design
  - What to put on the chip?
  - How to amplify desired genetic material?
  - Where should selected probes be placed?
- Target preparation and hybridization
  - How to isolate samples from control and treated tissues?
  - How to ensure suitable conditions for hybridization?
- Data generation and analysis
  - What methods are available for image processing?
  - How to accommodate errors in downstream analysis?
  - How to validate results from microarray studies?

Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
- Integration of design and procedures
- Integration of image analysis tools and statistical analysis
- Data mining using inductive logic programming (ILP)
- Closing the loop
- Integrating models

Probe Selection
- Biologists provide keywords
- Keywords used to search the Arabidopsis database at TIGR
- Arabidopsis proteins used to BLAST against the pine EST database
  - Cut-off value of 10e-4
  - Select ESTs close to the 3' end of the Arabidopsis protein (without compromising the match)

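The filtering step in probe selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the Expresso implementation: the `Hit` record, the EST names, the alignment positions, and the reading of the cut-off as an upper bound on the BLAST E-value are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    protein: str     # Arabidopsis protein identifier (the BLAST query)
    est: str         # pine EST identifier (hypothetical names below)
    evalue: float    # E-value reported for the alignment
    query_end: int   # last aligned residue of the protein (larger = nearer the 3' end)


def select_ests(hits, cutoff):
    """Keep hits with E-value <= cutoff. For each protein, order the
    surviving hits so the EST aligned nearest the 3' end comes first,
    breaking ties by the smaller E-value."""
    selected = {}
    for hit in hits:
        if hit.evalue <= cutoff:
            selected.setdefault(hit.protein, []).append(hit)
    for protein_hits in selected.values():
        protein_hits.sort(key=lambda h: (-h.query_end, h.evalue))
    return selected


# Made-up hits, for illustration only.
hits = [
    Hit("At3g60320", "EST_A", 1e-50, 310),
    Hit("At3g60320", "EST_B", 1e-12, 420),
    Hit("At3g60320", "EST_C", 0.5, 430),   # fails the cut-off
    Hit("At2g21230", "EST_D", 1e-2, 200),  # fails the cut-off
]
CUTOFF = 10e-4  # written as on the slide; Python evaluates this to 0.001
chosen = select_ests(hits, CUTOFF)
print({protein: [h.est for h in hs] for protein, hs in chosen.items()})
```

Ranking purely by alignment position is one way to read "close to the 3' end"; a real pipeline would also have to weigh match quality, which the slide flags with "without compromising the match".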
Example of cDNA Selection: bZIPs [figure]
Panels: Arabidopsis gene At3g60320; best hit pine contig; pine ESTs
Hit labels as extracted: At3g60320 e-210; At2g21230 e-190; At3g51960 e-190; At3g56660 e-185; At3g60320 e-150; At1g58110 e-134; At5g06960 e-100; At3g60320 e-200; At1g; At3g; At3g60320 e-1; At2g; At4g

Elements of Array Design
- Precise tracking of clones from the NCSU archive to deposition on the slide
- Spiking controls:
  - Orient layout of spots
  - Generate standard curves
  - Normalize laser focus and intensity between channels
- Replication of deposits
- Printing by more than one pin
- Placement at different positions on the slide

Plate Workflow [flowchart]
1. cDNA libraries at NCSU (juvenile and normal wood)
2. 96-well archive plates (VT)
3. Addition of blanks and spiking controls
4. 96-well PCR plates
5. Cleaning
6. 96-well storage plates
7. Transfer 4 to 1
8. 384-well printing plates

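The "transfer 4 to 1" step consolidates four 96-well plates into one 384-well plate. The slide does not say which layout was used; the sketch below assumes the common quadrant-interleaved layout, in which each 96-well position expands into a 2 x 2 block of the 384-well plate. The function name `to_384` and the 0-based plate index are illustrative choices.

```python
import string

ROWS_96, COLS_96 = 8, 12  # a 96-well plate is 8 rows (A-H) by 12 columns


def to_384(plate_index, well):
    """Map a well such as 'A1' on 96-well plate 0..3 to its position on a
    384-well plate (16 rows A-P by 24 columns), assuming a
    quadrant-interleaved layout."""
    if plate_index not in range(4):
        raise ValueError("plate_index must be 0, 1, 2, or 3")
    row = string.ascii_uppercase.index(well[0].upper())  # 0..7 for A..H
    col = int(well[1:]) - 1                              # 0..11 for 1..12
    if not (0 <= row < ROWS_96 and 0 <= col < COLS_96):
        raise ValueError(f"not a 96-well position: {well}")
    dest_row = 2 * row + plate_index // 2  # plates 2 and 3 take the odd rows
    dest_col = 2 * col + plate_index % 2   # plates 1 and 3 take the odd columns
    return f"{string.ascii_uppercase[dest_row]}{dest_col + 1}"


print(to_384(0, "A1"), to_384(1, "A1"), to_384(2, "A1"), to_384(3, "H12"))
# prints: A1 A2 B1 P24
```

Because the mapping is one-to-one, every deposit on the printing plate can be traced back to its archive plate and well, which is the kind of clone tracking the array design calls for.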
Microarray Printing [figure]
Labels: printing plates; slide; x 24 subarray of deposits

Hybridization
- Reciprocal labelings
- Modified loop design (Kerr and Churchill, 2001)
[Figure: loop over samples, labeled as extracted: C3, T1, C1, T3, C2, T2]

Image Capture and Analysis
- Image capture on ScanArray 5000
  - Model laser and photomultiplier tube
  - Model inconsistencies in slide and spot
- Image analysis
  - Currently using ScanArray Express
  - Incorporate into Expresso

Wolfinger Statistical Approach
- Assumption: biological effects are multiplicative, and therefore additive on the log scale [Kerr, Churchill, 2001]
- Two-stage analysis method [Wolfinger, et al., 2001]
  - Normalization step
    - ANOVA mixed model as the normalization model
    - Removes the global effects of array, dye, pin, treatment, etc.
  - Gene-treatment interaction estimation
    - ANOVA mixed model as the gene model
    - Multiple comparisons per gene

Analysis: The Wolfinger Model
Two-phase analysis to remove global effects and estimate the interaction between gene and treatment
- y is the log of the intensity value of a specific spot on a specific array
- μ accounts for the overall mean of values in a specific comparison
- T, A, D, P, and AP are constants representing variation due to different factors (presumably treatment, array, dye, pin, and array-by-pin, matching the global effects removed in the normalization step)
- ε accounts for the residual from the ANOVA normalization model
- G is the overall mean of the residual for each gene in a comparison, and GT is the overall mean of the residual in treatment or control
- A t-test checks whether GT differs between treated and control
[Equations: ANOVA normalization model; gene model]

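The two model equations did not survive on this slide. A plausible reconstruction from the bullet definitions is given below, under several assumptions: the overall mean is written μ and the normalization residual ε, the listed effects enter additively, subscripts for gene, array, and treatment are suppressed, and the gene model has its own residual, written γ here (a symbol introduced for this sketch). The gene model in Wolfinger et al. (2001) may contain further terms that the slide does not list.

```latex
\begin{align*}
y &= \mu + T + A + D + P + AP + \varepsilon
  && \text{(ANOVA normalization model)} \\
\varepsilon &= G + GT + \gamma
  && \text{(gene model)} \\
H_0 &: \; GT_{\text{treated}} = GT_{\text{control}}
  && \text{(per-gene $t$-test)}
\end{align*}
```

The first stage strips global effects from the log intensities; the second stage is fit gene by gene to the first-stage residuals, so that GT isolates the treatment effect specific to that gene.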
[Figure: plot with reference levels at the log of 2-fold and of 1.4-fold change]

Analysis: Data Mining by Redescription (ILP)
Based on a collection of 15 relational databases implemented using Postgres:
- Experimental conditions
- cDNA details
- Physiological measurements
- Gene expression levels

Inductive Logic Programming
A more expressive way to mine patterns than attribute-based clustering
- Traditional clustering (SOMs, agglomerative, etc.)
  - Clusters are difficult to interpret
  - Clusters may not correspond to biological knowledge
  - Difficult to incorporate a priori information
- ILP
  - Mines only clusters that are "describable" in terms of prior knowledge, e.g., functional categories

How ILP Is Used in Expresso
Infers rules relating gene expression levels to categories, or to other expression levels, without explicit direction.
Example rule:
  [Rule 142] [Pos cover = 69 Neg cover = 3]
  level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
Interpretation: "If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%."

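The 95.8% figure follows from the cover counts reported with Rule 142. The sketch below assumes the usual ILP reading, in which the positive cover counts examples that satisfy both body and head and the negative cover counts examples that satisfy the body but not the head; the function name is illustrative.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of the examples matching the rule body that also satisfy
    the rule head."""
    total = pos_cover + neg_cover
    if total == 0:
        raise ValueError("rule covers no examples")
    return pos_cover / total


print(f"{rule_confidence(69, 3):.1%}")  # prints 95.8%
```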
Another Example
This one relates expression level to functional categories:
  level(A,moist_vs_mild,positive) :- category(A, transport_protein).
What ILP needs as input:
- Training data
- Genes placed in functional categories (can be a many-to-many relationship)
- Expression levels, physiological data (can be multi-dimensional)
What ILP produces as output:
- Rules "redescribing" sets of genes defined using one facet in terms of another (it finds the sets automatically!)

How ILP Works
- Searches through every possible subset of genes that can be redescribed, from one facet to another
- Uses clever pruning strategies to pick out the best redescriptions (rules)
- Evaluates promising rules in terms of
  - Support: how many genes are being considered in the rule?
  - Confidence: of the genes that satisfy the body, how many also satisfy the head?
- Arranges rules in terms of support, confidence, or other metric

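The evaluation and ranking steps can be sketched on a toy data set. This is only an illustration of support and confidence, not the search or pruning an ILP system performs; the clone names and levels are made up, and support is taken here to mean the number of genes that satisfy the rule body.

```python
def evaluate_rule(genes, body, head):
    """Return (support, confidence) for the rule `head :- body`.
    support    = number of genes that satisfy the body
    confidence = fraction of those genes that also satisfy the head"""
    covered = [g for g in genes.values() if body(g)]
    support = len(covered)
    if support == 0:
        return 0, 0.0
    satisfied = sum(1 for g in covered if head(g))
    return support, satisfied / support


# Hypothetical expression levels per clone and comparison.
genes = {
    "clone1": {"moist_vs_mild": "positive", "moist_vs_severe": "negative"},
    "clone2": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
    "clone3": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
    "clone4": {"moist_vs_mild": "negative", "moist_vs_severe": "negative"},
}

rules = {
    # level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
    "mild_pos_implies_severe_not_pos": (
        lambda g: g["moist_vs_mild"] == "positive",
        lambda g: g["moist_vs_severe"] != "positive",
    ),
    # level(A,moist_vs_mild,negative) :- level(A,moist_vs_severe,negative).
    "severe_neg_implies_mild_neg": (
        lambda g: g["moist_vs_severe"] == "negative",
        lambda g: g["moist_vs_mild"] == "negative",
    ),
}

scores = {name: evaluate_rule(genes, body, head) for name, (body, head) in rules.items()}
# Rank by confidence first, then by support, highest first.
ranked = sorted(scores, key=lambda name: (scores[name][1], scores[name][0]), reverse=True)
for name in ranked:
    support, confidence = scores[name]
    print(f"{name}: support={support}, confidence={confidence:.2f}")
```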
Current Work on Models
- Populate library of models for various stages
  - biophysics (PCR, hybridization)
  - molecular biology (sequence selection)
  - robotics (pipetting and transfers)
  - statistics (error propagation and assessment of treatment effects)
  - surrogate models (all stages)
- Configure suitable sequences of models
  - "run" or "optimize"
- Example scenarios
  - "perform end-to-end validation of gene expression"
  - "design a chip that hybridizes to cDNAs from closely related species"
  - "where should I sample next for improving data mining results?"

Why Is This Problem Difficult?
- Model-based optimization of compositional codes
  - sequential refinement and optimization infeasible!
  - models are of various fidelities
  - errors compound further into the design cycle!
- Current approaches
  - "hand tuning" or "word of mouth" protocols
  - lack of understanding of functional relationships
  - do not harness existing biological knowledge
- Need to judiciously
  - configure virtual experiments to give realistic estimates
  - minimize cost of additional data collection
  - maximize information content per experiment

An Example of Expresso Modeling
- Capture PCR reaction dynamics
  - to model gene quantification computationally
  - e.g., a Markov model
- Factors
  - reaction temperature
  - rate coefficients
  - number of reaction cycles
  - activation energy for nucleotide addition
- Optimize the PCR model to pose
  - "how many RNA molecules were there at the start of the system?"
  - leads to full-scale physics-based validation of microarray experiments

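One simple way to make the Markov-model idea concrete is a branching process in which the molecule count is the state and each molecule is copied in a cycle with some probability. The sketch below is an assumption-laden toy, not the Expresso model: it lumps temperature, rate coefficients, and activation energy into a single hypothetical `efficiency` parameter, and it answers the "how many molecules at the start?" question by inverting the expected growth curve.

```python
import random


def simulate_pcr(n0, efficiency, cycles, rng):
    """One stochastic run. The state is the molecule count; in each cycle
    every molecule is copied independently with probability `efficiency`."""
    n = n0
    for _ in range(cycles):
        n += sum(1 for _ in range(n) if rng.random() < efficiency)
    return n


def expected_copies(n0, efficiency, cycles):
    """Mean of the branching process: n0 * (1 + efficiency) ** cycles."""
    return n0 * (1.0 + efficiency) ** cycles


def estimate_initial_copies(final_copies, efficiency, cycles):
    """Invert the expected growth curve to estimate the starting count."""
    return final_copies / (1.0 + efficiency) ** cycles


rng = random.Random(0)  # fixed seed so the run is repeatable
final = simulate_pcr(n0=10, efficiency=0.8, cycles=10, rng=rng)
print("simulated final copies:", final)
print("estimated initial copies:", round(estimate_initial_copies(final, 0.8, 10), 2))
```

The estimate scatters around the true starting count because early cycles, when few molecules are present, dominate the variance; a fuller model would propagate that uncertainty rather than report a point estimate.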
Closing the Loop in Data Mining
- Redesign probe set to clarify functional patterns
  - discrete optimization problem
    - minimizing cross-hybridization
    - maximizing specificity

Drought Stress Experiments [figure, shown again to introduce the results]
- Experiment 1: Cycles of mild drought stress (Cycles I to IV)
- Experiment 2: Cycles of severe drought stress (Cycles I to III)
- Each cycle: dry down (water withheld), then recovery (water given)
- Axes: days versus water potential (bars)
- Markers: PS (photosynthesis) measurements and needle harvests

[Table: photosynthesis and gene expression by condition and cycle]
Columns: Condition | Cycle | Net Photosynthesis (µmol CO2 m^-2 s^-1): Control, Stressed | Significant Gene Expression: Positive, Negative
Rows: Mild; Severe

[Figure panels: Positive Change in Expression; Negative Change in Expression]

Glycolytic Pathway, Citric Acid Cycle, and Related Metabolic Processes [figure]
Carbon Metabolism [figure]
Responses to Environmental Signals [figure]
ROS Response [figure]
Network of Munnik and Meijer [figure]
Network of Shinozaki and Yamaguchi-Shinozaki [figure]

Mathematical Models for Biological Networks
- Partial differential equations
- Boolean networks
- Bayesian networks
- Logic programs
- Neural networks
- Petri nets
- Fuzzy cognitive maps
- Weak or none (ad hoc)

What a Node Might Represent
- Chemical reaction
- Molecules: proteins (enzymes and others), DNA, RNA, organic molecules, water, etc.
- Cellular components: membranes, chromosomes, nucleus, ribosomes, etc.
- Processes: metabolism, environmental sensing
- Environmental condition
- Time or stage

What an Edge Might Represent
- Transformation in a chemical reaction: substrate to product
- Catalytic relationship: enzyme to substrate or reaction
- Protein/protein interaction
- Signal transduction
- Regulation of transcription
- Regulation of translation
- Activation and deactivation

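The node and edge meanings listed above suggest a graph whose nodes and edges carry explicit types. The sketch below is one possible encoding, not the Expresso data model; the class names, enum members, and the `neighbors` helper are all illustrative. The example uses the first step of glycolysis, in which hexokinase catalyzes the conversion of glucose to glucose-6-phosphate.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    REACTION = "chemical reaction"
    MOLECULE = "molecule"
    COMPONENT = "cellular component"
    PROCESS = "process"
    CONDITION = "environmental condition"
    TIME = "time or stage"


class EdgeKind(Enum):
    TRANSFORMATION = "substrate to product"
    CATALYSIS = "enzyme to substrate or reaction"
    PROTEIN_INTERACTION = "protein/protein interaction"
    SIGNAL_TRANSDUCTION = "signal transduction"
    TRANSCRIPTION_REGULATION = "regulation of transcription"
    TRANSLATION_REGULATION = "regulation of translation"
    ACTIVATION = "activation"
    DEACTIVATION = "deactivation"


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # node name -> NodeKind
    edges: list = field(default_factory=list)  # (source, target, EdgeKind)

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, target, kind):
        # Refuse edges whose endpoints have not been declared as nodes.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append((source, target, kind))

    def neighbors(self, name, kind=None):
        """Targets of edges leaving `name`, optionally limited to one edge kind."""
        return [t for s, t, k in self.edges
                if s == name and (kind is None or k == kind)]


net = Network()
net.add_node("glucose", NodeKind.MOLECULE)
net.add_node("glucose-6-phosphate", NodeKind.MOLECULE)
net.add_node("hexokinase", NodeKind.MOLECULE)
net.add_edge("glucose", "glucose-6-phosphate", EdgeKind.TRANSFORMATION)
net.add_edge("hexokinase", "glucose", EdgeKind.CATALYSIS)
print(net.neighbors("glucose"))
print(net.neighbors("hexokinase", EdgeKind.CATALYSIS))
```

Keeping the kind on each edge lets a single graph hold metabolic, regulatory, and signaling relationships at once, and queries can then filter by relationship type.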
Ongoing Expresso Work
- Increase model library coverage
  - New biophysics models of hybridization and spotting
- A heterologous chip
  - Pinus taeda (Loblolly Pine)
  - Picea abies (Norway Spruce)
- Multimodal networks
  - Represent and manipulate biological networks
  - Incorporate into Expresso and biologists' work

Uncertainty in Networks
- Missing biological data is a fact of life
- As a consequence, a network can be lacking in some details, biologically wrong, or even self-contradictory
- The ability to reason computationally with uncertainty and with probabilities is essential
- Uncertainty can suggest hypotheses that can be tested experimentally to refine a network

Reconciling Networks [figure]

Hierarchical Multimodal Networks
Nodes and edges have flexible semantics to represent:
- Time
- Uncertainty
- Cellular decision making; process regulation
- Cell topology and compartmentalization
- Rate constants, etc.

Using Multimodal Networks
- Help biologists find new biological knowledge
- Visualize and explore
- Generate hypotheses and experiments
- Predict regulatory phenomena
- Predict responses to stress
- Incorporate into Expresso as part of closing the loop

Supported by: NSF (Next Generation Software; Information Technology Research)