Expresso and Chips: Studying Drought Stress in Plants with cDNA Microarrays
Lenwood S. Heath, Department of Computer Science, Virginia Tech, VA 24061


Expresso and Chips
Fordham University, May 6, 2003

Outline
Expresso Team
Drought Stress in Plants
Microarray Technology
Expresso System
Biological Results
Networks in Biology
Future

Expresso Team (VT, Duke, NCSU)
Ron Sederoff, Lenny Heath, Naren Ramakrishnan, Layne Watson, Cecilia Vasquez-Robinet, Shrinivasrao Mane, Allan Sioson, Maulik Shukla, Harsha Rajasimha, Jonathan Watkinson, Boris Chevone, Ruth Grene, Andrew McElrone, Catarina Moura


Grand Goal
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought and other oxidative stresses.

Questions Currently Addressed in the Grene Lab
1. Big picture: What makes a plant successfully acclimate to drought stress?
2. Specifically: Which changes in gene expression are associated with physiological acclimation to drought stress?
3. Goal: Using Expresso and the smarts of several computer scientists, can we construct, or amend, pathways depicting the perception of drought stress and the successive events that culminate in acclimation?
4. Future work: Which changes in the metabolite population are associated with acclimation?

Long-Term Objective of Drought Experiments in Expresso
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought using cDNA microarrays and metabolomics.
Diagram labels: Gene expression; Stress perception; Metabolic acclimatory responses; Protective responses (LEAs, antioxidants)

Responses to Environmental Signals

[Figure: drought cycle time courses. Panels: Experiment 1: Cycles of Mild Drought Stress; Experiment 2: Cycles of Severe Drought Stress. Axes: days vs. Ψ = water potential (bars). Labels: Cycle I to Cycle IV; dry down (water withheld); recovery (water given). Legend: PS (photosynthesis) measurement; needle harvest.]


How the Microarray Process Works (courtesy J.M. Trent)

Flow of Procedures
[Flowchart. Step labels, in extracted order: Hypotheses; Select cDNAs; PCR; Extract RNA; Replication and Randomization; Reverse Transcription and Fluorescent Labeling; Robotic Printing; Hybridization; Identify Spots; Intensities; Statistics; Clustering; Data Mining, ILP; Confirm with RT-PCR; Experiment (PS, water pot.). Role labels: CS and Biologists; CS.]

Key Steps in cDNA Microarrays
Probe generation and microarray design
– What to put on the chip?
– How to amplify desired genetic material?
– Where should selected probes be placed?
Target preparation and hybridization
– How to isolate samples from control and treated tissues?
– How to ensure suitable conditions for hybridization?
Data generation and analysis
– What methods are available for image processing?
– How to accommodate errors in downstream analysis?
– How to validate results from microarray studies?


Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
Integration of design and procedures
Integration of image analysis tools and statistical analysis
Data mining using inductive logic programming (ILP)
Closing the loop
Integrating models

Probe Selection
Biologists provide keywords
Keywords used to search Arabidopsis database at TIGR
Arabidopsis proteins used to BLAST against pine EST database
– Cut-off value of 10e-4
– Select ESTs close to 3’ end of Arabidopsis protein (without compromising match)
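The selection step above can be sketched in a few lines. This is a minimal illustration, not the Expresso implementation: the `Hit` record, its field names, the EST identifiers, and the ranking rule (prefer alignments that end nearest the end of the Arabidopsis protein, then lower E-value) are all assumptions made for the example. Only the 10e-4 cut-off comes from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record layout, for illustration only)."""
    query: str      # Arabidopsis protein identifier
    subject: str    # pine EST identifier
    evalue: float   # BLAST E-value
    query_end: int  # last aligned residue on the Arabidopsis protein
    query_len: int  # length of the Arabidopsis protein

E_CUTOFF = 10e-4  # cut-off as written on the slide (equals 0.001)

def select_ests(hits, cutoff=E_CUTOFF, per_query=1):
    """Keep hits at or below the cutoff, then rank each query's hits.

    Ranking is an assumption: smaller distance from the alignment end to
    the protein end first (a proxy for "close to the 3' end"), then
    smaller E-value.
    """
    by_query = {}
    for h in hits:
        if h.evalue <= cutoff:
            by_query.setdefault(h.query, []).append(h)
    selected = {}
    for query, candidates in by_query.items():
        candidates.sort(key=lambda h: (h.query_len - h.query_end, h.evalue))
        selected[query] = candidates[:per_query]
    return selected

# Hypothetical hits for one Arabidopsis gene.
example_hits = [
    Hit("At3g60320", "EST_A", 1e-150, 300, 400),
    Hit("At3g60320", "EST_B", 1e-100, 395, 400),
    Hit("At3g60320", "EST_C", 0.5, 400, 400),  # fails the cut-off
]
chosen = select_ests(example_hits)
```

In a real pipeline the trade-off "without compromising match" would need an explicit rule (for example, a minimum alignment length); the sort key here is only one simple way to express it.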

Example of cDNA Selection: bZIPs
[Figure. Labels: Arabidopsis gene At3g60320; Best hit pine contig; Pine ESTs. Hits as extracted (some entries truncated): At3g60320 e-210; At2g21230 e-190; At3g51960 e-190; At3g56660 e-185; At3g60320 e-150; At1g58110 e-134; At5g06960 e-100; At3g60320 e-200; At1g; At3g; At3g60320 e-1; At2g; At4g.]

Elements of Array Design
Precise tracking of clones from NCSU archive to deposition on the slide
Spiking controls:
– Orient layout of spots
– Generate standard curves
– Normalize laser focus and intensity between channels
Replication of deposits
Printing by more than one pin
Placement at different positions on the slide

[Diagram: plate workflow. Labels, in extracted order: cDNA libraries at NCSU (juvenile and normal wood); 96-well archive plates (VT); addition of blanks and spiking controls; 96-well PCR plates; 96-well storage plates; cleaning; 384-well printing plates; transfer 4 to 1.]
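The "transfer 4 to 1" step combines four 96-well plates into one 384-well plate. The slide does not say which well layout was used; the sketch below assumes the common interleaved (quadrant) convention, in which each 96-well position expands to a 2 x 2 block on the 384-well plate. The function name and plate numbering are hypothetical.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A..H, columns 1..12
ROWS_384 = string.ascii_uppercase[:16]  # rows A..P, columns 1..24

def to_384(plate_index, well):
    """Map a well such as 'B7' on 96-well plate 0..3 to a 384-well name.

    Assumed interleaved layout: plate 0 fills odd rows/odd columns,
    plate 1 odd rows/even columns, plate 2 even rows/odd columns,
    plate 3 even rows/even columns (counting from 1).
    """
    row = ROWS_96.index(well[0])   # raises ValueError for rows outside A..H
    col = int(well[1:]) - 1
    if not (0 <= plate_index < 4 and 0 <= col < 12):
        raise ValueError("plate_index must be 0..3 and column 1..12")
    row_offset, col_offset = divmod(plate_index, 2)
    return f"{ROWS_384[2 * row + row_offset]}{2 * col + col_offset + 1}"

# Every (plate, well) pair lands on a distinct 384-well position.
all_targets = {
    to_384(p, f"{r}{c}")
    for p in range(4)
    for r in ROWS_96
    for c in range(1, 13)
}
```

Keeping this mapping explicit supports the clone tracking described under array design, since a spot on the slide can be traced back through the printing plate to its archive plate.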

[Figure: printing plates, microarray slide, and an (… x 24) subarray of deposits.]

Hybridization
Reciprocal labelings
Modified loop design (Kerr and Churchill, 2001)
[Diagram nodes, in extracted order: C3, T1, C1, T3, C2, T2]

Image Capture and Analysis
Image capture on ScanArray 5000
– Model laser and photomultiplier tube
– Model inconsistencies in slide and spot
Image analysis
– Currently using ScanArray Express
– Incorporate into Expresso

Wolfinger Statistical Approach
Assumption: Biological phenomena are in terms of multiplicative effects [Kerr, Churchill, 2001]
Two-stage analysis method [Wolfinger, et al., 2001]
– Normalization step
  ANOVA mixed model as the normalization model
  Removes the global effects from array, dye, pin, treatment, etc.
– Gene-treatment interaction estimation
  ANOVA mixed model as the gene model
  Multiple comparisons per gene

Analysis: The Wolfinger Model
Two-phase analysis to remove global effects and estimate the interaction between gene and treatment
– y is the log of the intensity value of a specific spot on a specific array
– μ accounts for the overall mean of values in a specific comparison
– T, A, D, P and AP are constant and represent variation in different factors
– ε accounts for the residual from the ANOVA model
– G is the overall mean of the residual for each gene in a comparison and GT is the overall mean of the residual in treatment or control
– A t-test is used to test whether GT differs between treated and control
Equation labels: ANOVA normalization model; Gene model
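The two equations on this slide did not survive extraction. One plausible reconstruction from the symbol definitions is given below. It assumes that T, A, D, and P stand for treatment, array, dye, and pin (the global effects named on the previous slide), that AP is the array-by-pin interaction, and that γ is a second-stage residual; the index letters and γ are introduced here for illustration. The published two-stage model of Wolfinger et al. may include further gene-specific terms that the slide does not list.

```latex
% ANOVA normalization model (global effects), for gene g, treatment t,
% array a, dye d, pin p:
y_{gtadp} = \mu + T_t + A_a + D_d + P_p + (AP)_{ap} + \varepsilon_{gtadp}

% Gene model, fitted separately for each gene to the residuals above:
\varepsilon_{gtadp} = G_g + (GT)_{gt} + \gamma_{gtadp}

% Per-gene test statistic comparing treated and control:
t_g = \frac{\widehat{(GT)}_{g,\mathrm{treated}} - \widehat{(GT)}_{g,\mathrm{control}}}
           {\mathrm{SE}\!\left(\widehat{(GT)}_{g,\mathrm{treated}} - \widehat{(GT)}_{g,\mathrm{control}}\right)}
```

Because y is a log intensity, the additive terms correspond to multiplicative effects on the raw intensity scale, which matches the stated assumption of the approach.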

[Figure labels: Log of 2 fold; 1.4 fold]

Analysis: Data Mining by Redescription (ILP)
Based on a collection of 15 relational databases implemented using Postgres
– Experimental conditions
– cDNA details
– Physiological measurements
– Gene expression levels

Inductive Logic Programming
A more expressive way to mine patterns than attribute-based clustering
Traditional clustering (SOMs, agglomerative, etc.)
– Clusters are difficult to interpret
– Clusters may not correspond to biological knowledge
– Difficult to incorporate a priori information
ILP
– Mines only clusters that are “describable” in terms of prior knowledge, e.g., functional categories

How ILP Is Used in Expresso
Infers rules relating gene expression levels to categories, or to other expression levels, without explicit direction
Example rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
Interpretation: “If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
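The quoted confidence follows from the cover counts of Rule 142. The short check below assumes that "Pos cover" counts clones satisfying both the body and the head of the rule and "Neg cover" counts clones satisfying the body but not the head; the function name is made up for the example.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of clones matching the rule body that also match its head."""
    return pos_cover / (pos_cover + neg_cover)

# Rule 142 from the slide: Pos cover = 69, Neg cover = 3.
confidence = rule_confidence(69, 3)
print(f"{confidence:.1%}")  # prints 95.8%
```

Under that reading, 69 of the 72 clones that were positive in the moist versus mild comparison were not positive in the moist versus severe comparison.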

Another Example
This one relates expression level to functional categories:
level(A,moist_vs_mild,positive) :- category(A, transport_protein).
What ILP needs as input
– Training data
– Genes placed in functional categories (can be a many-many relationship)
– Expression levels, physiological data (can be multi-dimensional)
What ILP produces as output
– Rules “redescribing” sets of genes defined using one facet in terms of another (it finds sets automatically!)

How ILP Works
Searches through every possible subset of genes that can be redescribed, from one facet to another
Uses clever pruning strategies to pick out the best redescriptions (rules)
Evaluates promising rules in terms of
– Support: how many genes are being considered in the rule?
– Confidence: of the genes that satisfy the body, how many also satisfy the head?
Arranges rules in terms of support, confidence, or other metric
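Support and confidence can be computed directly once a rule's body and head are expressed as predicates over genes. The sketch below only illustrates the two metrics; it does not perform the ILP search or pruning. The gene records, field names, and the reading of support as "number of genes satisfying the body" are assumptions for the example.

```python
def evaluate_rule(genes, body, head):
    """Return (support, confidence) for the rule `head :- body`.

    support    = number of genes satisfying the body (assumed definition)
    confidence = fraction of those genes that also satisfy the head
    """
    covered = [g for g in genes if body(g)]
    support = len(covered)
    if support == 0:
        return 0, 0.0
    satisfied = sum(1 for g in covered if head(g))
    return support, satisfied / support

# Hypothetical toy data: functional categories and one expression level.
toy_genes = [
    {"name": "g1", "categories": {"transport_protein"}, "moist_vs_mild": "positive"},
    {"name": "g2", "categories": {"transport_protein"}, "moist_vs_mild": "positive"},
    {"name": "g3", "categories": {"transport_protein"}, "moist_vs_mild": "negative"},
    {"name": "g4", "categories": {"kinase"}, "moist_vs_mild": "positive"},
]

# level(A, moist_vs_mild, positive) :- category(A, transport_protein).
support, confidence = evaluate_rule(
    toy_genes,
    body=lambda g: "transport_protein" in g["categories"],
    head=lambda g: g["moist_vs_mild"] == "positive",
)
```

Candidate rules could then be sorted by either value, which corresponds to the final step listed above.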

Current Work on Models
Populate library of models for various stages
– biophysics (PCR, hybridization)
– molecular biology (sequence selection)
– robotics (pipetting and transfers)
– statistics (error propagation and assessment of treatment effects)
– surrogate models (all stages)
Configure suitable sequences of models
– “run” or “optimize”
Example scenarios
– “perform end-to-end validation of gene expression”
– “design a chip that hybridizes to cDNAs from closely related species”
– “where should I sample next for improving data mining results?”

Why Is This Problem Difficult?
Model-based optimization of compositional codes
– sequential refinement and optimization infeasible!
– models are of various fidelities
– errors compound further into the design cycle!
Current approaches
– “hand tuning” or “word of mouth” protocols
– lack of understanding of functional relationships
– do not harness existing biological knowledge
Need to judiciously
– configure virtual experiments to give realistic estimates
– minimize cost of additional data collection
– maximize information content per experiment

An Example of Expresso Modeling
Capture PCR reaction dynamics
– to model gene quantification computationally
– e.g., a Markov model
Factors
– reaction temperature
– rate coefficients
– number of reaction cycles
– activation energy for nucleotide addition
Optimize PCR model to pose
– “how many RNA molecules were there at the start of the system?”
– leads to full-scale physics-based validation of microarray experiments
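The question "how many molecules were there at the start?" can be illustrated with the simplest textbook amplification model, in which each cycle multiplies the template count by (1 + efficiency). This deterministic model is a stand-in chosen for brevity: it is not the Markov model mentioned on the slide, and it ignores temperature, rate coefficients, and activation energy. The function names and the constant per-cycle efficiency are assumptions.

```python
def amplify(n_initial, efficiency, cycles):
    """Expected template count after `cycles` PCR cycles.

    efficiency is the per-cycle fraction of templates copied
    (0 = no amplification, 1 = perfect doubling); assumed constant.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n_initial * (1.0 + efficiency) ** cycles

def estimate_initial(n_final, efficiency, cycles):
    """Invert the model: infer the starting count from the final count."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n_final / (1.0 + efficiency) ** cycles

# Round trip with hypothetical numbers: 250 starting molecules,
# 90% efficiency, 25 cycles.
final_count = amplify(250, 0.9, 25)
recovered = estimate_initial(final_count, 0.9, 25)
```

Because the final count grows exponentially in the number of cycles, small errors in the assumed efficiency translate into large errors in the inferred starting count, which is one reason a more detailed reaction model is attractive.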

Closing the Loop in Data Mining
Redesign probe set to clarify functional patterns
– discrete optimization problem
  minimizing cross-hybridization
  maximizing specificity



[Table: columns Condition; Cycle; Net Photosynthesis (μmol CO2 m^-2 s^-1) for Control and Stressed; Significant Gene Expression (Positive, Negative). Rows: Mild; Severe. Cell values did not survive extraction.]

[Figure panels: Positive Change in Expression; Negative Change in Expression]



Glycolytic Pathway, Citric Acid Cycle, and Related Metabolic Processes

Carbon Metabolism

Responses to Environmental Signals

ROS Response

Network of Munnik and Meijer

Network of Shinozaki and Yamaguchi-Shinozaki

Mathematical Models for Biological Networks
Partial differential equations
Boolean networks
Bayesian networks
Logic programs
Neural networks
Petri nets
Fuzzy cognitive maps
Weak or none (ad hoc)

What a Node Might Represent
Chemical reaction
Molecules: proteins (enzymes and others), DNA, RNA, organic molecules, water, etc.
Cellular components: membranes, chromosomes, nucleus, ribosomes, etc.
Processes: metabolism, environmental sensing
Environmental condition
Time or stage

What an Edge Might Represent
Transformation in a chemical reaction: substrate to product
Catalytic relationship: enzyme to substrate or reaction
Protein/protein interaction
Signal transduction
Regulation of transcription
Regulation of translation
Activation and deactivation
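The node and edge kinds listed on these two slides suggest a typed graph. The following is a minimal sketch of such a structure, not the representation used in Expresso: the class names, the `confidence` attribute, and the example entities are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class NodeKind(Enum):
    REACTION = auto()
    MOLECULE = auto()
    CELLULAR_COMPONENT = auto()
    PROCESS = auto()
    ENVIRONMENTAL_CONDITION = auto()
    TIME_OR_STAGE = auto()

class EdgeKind(Enum):
    TRANSFORMATION = auto()
    CATALYSIS = auto()
    PROTEIN_INTERACTION = auto()
    SIGNAL_TRANSDUCTION = auto()
    TRANSCRIPTION_REGULATION = auto()
    TRANSLATION_REGULATION = auto()
    ACTIVATION = auto()
    DEACTIVATION = auto()

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: EdgeKind
    confidence: float = 1.0  # hypothetical hook for uncertain edges

@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # node name -> NodeKind
    edges: list = field(default_factory=list)  # list of Edge

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, target, kind, confidence=1.0):
        # Refuse edges whose endpoints were never declared.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, kind, confidence))

    def successors(self, name, kind=None):
        """Targets of edges leaving `name`, optionally filtered by edge kind."""
        return [e.target for e in self.edges
                if e.source == name and (kind is None or e.kind == kind)]

# Generic example: an enzyme catalyzes a reaction that turns S into P.
net = Network()
net.add_node("substrate_S", NodeKind.MOLECULE)
net.add_node("product_P", NodeKind.MOLECULE)
net.add_node("enzyme_E", NodeKind.MOLECULE)
net.add_node("reaction_R", NodeKind.REACTION)
net.add_edge("substrate_S", "product_P", EdgeKind.TRANSFORMATION)
net.add_edge("enzyme_E", "reaction_R", EdgeKind.CATALYSIS, confidence=0.8)
```

Allowing several node and edge kinds in one graph is what lets a single network mix metabolic, regulatory, and signaling relationships.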

Outline
Expresso Team
Drought Stress in Plants
Microarray Technology
Expresso System
Networks in Biology
Future

Ongoing Expresso Work
Increase model library coverage
– New biophysics models of hybridization and spotting
A heterologous chip
– Pinus taeda (Loblolly Pine)
– Picea abies (Norway Spruce)
Multimodal networks
– Represent and manipulate biological networks
– Incorporate into Expresso and biologists’ work

Uncertainty in Networks
Missing biological data is a fact of life
As a consequence, a network can be lacking in some details, biologically wrong, or even self-contradictory
Ability to reason computationally with uncertainty and with probabilities is essential
Uncertainty can suggest hypotheses that can be tested experimentally to refine a network

Reconciling Networks

Hierarchical Multimodal Networks
Nodes and edges have flexible semantics to represent:
- Time
- Uncertainty
- Cellular decision making; process regulation
- Cell topology and compartmentalization
- Rate constants, etc.

Using Multimodal Networks
Help biologists find new biological knowledge
Visualize and explore
Generate hypotheses and experiments
Predict regulatory phenomena
Predict responses to stress
Incorporate into Expresso as part of closing the loop

Supported by: Next Generation Software, Information Technology Research, NSF