Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
Eric Olson

Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
- General microarray data analysis workflow
  - From raw data to biological significance
  - Comparison statistics
  - Two-way ANOVA
- GeneSifter Overview
- The Gene Expression Omnibus (GEO)
- Microarray analysis of gene expression following hookworm infection
  - Data overview
  - Dissection of the immune response using 2-way ANOVA

The Microarray Data Analysis Process
- Experimental Design: number of groups, factors, replicates
- Data management: data, sample annotation, gene annotation, databases
- Differential Expression: comparison statistics, correction for multiple testing, clustering
- Biological significance: individual genes, biological themes
- Platform Selection: one-color, two-color, platform comparisons
- System access: ease of use, accessibility
- Making data public and using public data: MIAME, journals, GEO, meta-analysis

Experiment Design
Type of experiment
- Two groups
  - Normal vs. cancer
  - Control vs. treated
- Three or more groups, single factor
  - Time series
  - Dose response
  - Multiple treatment
- Four or more groups, multiple factors
  - Time series with control and treated cells
The type of experiment and the number of groups and factors determine the statistical methods needed to detect differential expression.
Replicates
- The more the better, but at least 3
- Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.
References
- Pavlidis P, Li Q, Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics Sep 1;19(13):
- Kathleen Kerr. Experimental Design and Other Issues in Microarray Studies.

Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially expressed in the conditions being studied. Comparison statistics help identify differentially expressed genes; cluster analysis identifies patterns of gene expression and segregates subsets of genes based on those patterns.
Statistical Significance
- Fold change
  - Fold change does not address the reproducibility of the observed difference and cannot be used to determine statistical significance.
- Comparison statistics
  - 2 groups: t-test, Welch’s t-test, Wilcoxon Rank Sum
  - 3 or more groups, single factor: one-way ANOVA, Kruskal-Wallis
  - 4 or more groups, multiple factors: two-way ANOVA
Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.
Supporting material: Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.

t-test for comparison of two groups
1. Calculate the t statistic, the difference between groups divided by the difference within groups:
   $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
   where $\bar{x}_i$ is the mean of group $i$, $s_i^2$ is its variance and $n_i$ is its sample size.
2. Determine the confidence level for t (the probability that t could occur by chance), with $df = n_1 + n_2 - 2$.
The larger the difference between the groups and the lower the variance, the bigger t will be and the lower p will be.
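The t statistic on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `t_statistic` and the sample values are assumptions made for the example, not data from the talk.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Difference between group means divided by the within-group spread."""
    n1, n2 = len(group1), len(group2)
    s1_sq = variance(group1)  # sample variance of group 1
    s2_sq = variance(group2)  # sample variance of group 2
    return (mean(group1) - mean(group2)) / sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical expression values, 4 replicates per group
control = [1.0, 2.0, 3.0, 4.0]
treated = [5.0, 6.0, 7.0, 8.0]

t = t_statistic(treated, control)
df = len(treated) + len(control) - 2
print(f"t = {t:.3f}, df = {df}")
```

The p-value is then looked up from the t distribution with `df` degrees of freedom, which a statistics package would normally handle.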

Differential Expression: fold change vs. p-value
2 groups, 4 replicates each. Mean, standard deviation, fold change and p-value calculated from the mean signal.
- Gene 1: fold change = 5.3, p = 0.19
- Gene 2: fold change = 5.3, p = 0.03
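The point of this comparison, that two genes can share a fold change yet differ in statistical support, can be reproduced with made-up numbers. The values below are hypothetical (they are not the signals behind the slide), and the sketch reports the t statistic rather than a p-value to stay within the standard library.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(treated, control):
    """Ratio of group means."""
    return mean(treated) / mean(control)


def t_statistic(group1, group2):
    """Unpooled two-sample t statistic."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical signals: same group means, very different replicate variability
noisy_control, noisy_treated = [50, 150, 250, 350], [200, 600, 1000, 1400]
tight_control, tight_treated = [190, 200, 200, 210], [780, 800, 800, 820]

fc_noisy = fold_change(noisy_treated, noisy_control)
fc_tight = fold_change(tight_treated, tight_control)
t_noisy = t_statistic(noisy_treated, noisy_control)
t_tight = t_statistic(tight_treated, tight_control)

print(f"noisy gene: fold change {fc_noisy:.1f}, t = {t_noisy:.2f}")
print(f"tight gene: fold change {fc_tight:.1f}, t = {t_tight:.2f}")
```

Both genes show the same fold change, but the gene with consistent replicates has a far larger t and therefore a far smaller p.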

Analysis of Variance (ANOVA)
- Like the t-test, identifies genes with large differences between groups and small differences within groups
- For use with 3 or more groups
- One-way and two-way
  - One-way examines the effect of one factor on gene expression
  - Two-way can examine the effects of two factors on gene expression as well as the interaction of the two factors
References
- Pavlidis P. Using ANOVA for gene selection from microarray studies of the nervous system. Methods Dec;31(4):
- Glantz S. Primer of Biostatistics. 5th Edition. McGraw-Hill.
- Glantz S, Slinker B. Primer of Regression and Analysis of Variance. McGraw-Hill.
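For a single gene, the one-way case reduces to an F statistic: between-group variance divided by within-group variance. The sketch below is a plain-Python illustration with hypothetical values; `one_way_anova_f` is a name chosen for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical expression of one gene in three groups, 3 replicates each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```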

Two-way ANOVA Example
Triple treatment in Huntington’s Disease model (R6/2 mice, GSE857, Affymetrix U74Av2)
Design table: Treatment (-, +) by Disease (WT, R6/2), giving four groups (WT -, WT +, R6/2 -, R6/2 +) with 3 per cell
[Figure: gene expression patterns showing a disease effect, a treatment effect, an interaction, and a disease and treatment effect with no interaction]
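The three effects a two-way ANOVA tests can be computed directly for a balanced design. The following is a sketch under stated assumptions: equal replicates in every cell, hypothetical expression values, and a helper name (`two_way_anova_f`) invented for this example. It is not GeneSifter's implementation.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-factor design.

    cells[i][j] holds the replicates for level i of factor A
    and level j of factor B.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_err = ss_err / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_err,
        "B": (ss_b / (b - 1)) / ms_err,
        "interaction": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
    }


# Hypothetical genes; rows = disease (WT, R6/2), columns = treatment (-, +)
disease_only = [[[9, 10, 11], [9, 10, 11]], [[19, 20, 21], [19, 20, 21]]]
interacting = [[[9, 10, 11], [9, 10, 11]], [[9, 10, 11], [19, 20, 21]]]

f_disease_only = two_way_anova_f(disease_only)
f_interacting = two_way_anova_f(interacting)
print(f_disease_only)
print(f_interacting)
```

In the first gene only the disease factor produces a large F; in the second, treatment changes expression only in the disease strain, so the interaction term is large as well.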

Two-way ANOVA compared to t-test
Triple treatment in Huntington’s Disease model (R6/2 mice, GSE857, Affymetrix U74Av2)
[Figure: disease differences found by t-test vs. two-way ANOVA; design table: Treatment (-, +) by Disease (WT, R6/2)]
Reference: Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol. 2001;2(10):RESEARCH0042.

Analysis Workflow Examples
- 2 groups (apoE -/- aorta vs. wt aorta): t-test, BH (FDR), up-regulated and down-regulated gene lists
- 5 groups, single factor (Drosophila Innate Immune Response Time Series): one-way ANOVA, BH (FDR), clustering, gene lists
- 12 groups, two factors (Immune response to hookworms in mouse lung): two-way ANOVA, BH (FDR), clustering, gene lists
Each workflow ends with individual genes of interest and biological themes (pathways, molecular functions, etc.).
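Every workflow above applies the Benjamini and Hochberg (BH) correction to control the false discovery rate across thousands of genes. A minimal sketch of the adjustment follows; the function name and the p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Genes whose adjusted p-value falls below the chosen FDR threshold go into the gene list that feeds clustering and the ontology reports.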

GeneSifter - Microarray Data Analysis
Accessibility
- Web-based
- Secure
Data management
- Data annotation (MIAME)
- Multiple upload tools: CodeLink, Affymetrix, Illumina, Agilent, Custom
Differential Expression: powerful, accessible tools for determining statistical significance
- R-based statistics, Bioconductor
- Comparison tests: t-test, Welch’s t-test, Wilcoxon Rank Sum test, one-way ANOVA, two-way ANOVA
- Correction for multiple testing: Bonferroni, Holm, Westfall and Young maxT, Benjamini and Hochberg
- Unsupervised clustering: PAM, CLARA, hierarchical clustering
- Silhouettes

Integrated tools for determining Biological Significance
- One Click Gene Summary™
- Ontology Report
- Pathway Report
- Search by ontology terms
- Search by KEGG terms or Chromosome

The GeneSifter Data Center
Free resource for
- Training
- Research
- Publishing
6 areas: Cardiovascular, Cancer, Endocrinology, Neuroscience, Immunology, Oral Biology
Access to
- Data
- Analysis summary
- Tutorials
- WebEx

The GeneSifter Data Center

Using the Gene Expression Omnibus (GEO)
- Gene expression data repository (mostly microarrays)
- Over 3000 data sets
- All array platforms represented
- Searchable by platform, species and experiment annotation
- Downloadable data

Project Analysis : Two-way ANOVA
- Scott lab, Johns Hopkins University (Bloomberg School of Public Health)
- Affymetrix Mouse
- Wild type and SCID mice
- Control and 5 time points after infection
- CEL files available (loaded and MAS5 processed in GeneSifter)
Reference: Alex Loukas and Paul Prociv. Immune Responses in Hookworm Infections. Clinical Microbiology Reviews, October 2001, p , Vol. 14, No. 4

Project Analysis : Two-way ANOVA
- Factor One: Strain (2 levels: SCID, WT)
- Factor Two: Time after infection (6 levels: con, 2, 3, 4, 8, 12 dpi)
[Figure: gene expression patterns over time in WT and SCID mice illustrating a strain effect, a time effect and an interaction]
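Crossing the two factors yields the 12 groups analyzed in this project. The short sketch below enumerates them; the level labels follow the slide, while the variable names are assumptions for illustration.

```python
from itertools import product

# Factor levels as listed on the slide
strains = ["WT", "SCID"]
time_points = ["con", "2 dpi", "3 dpi", "4 dpi", "8 dpi", "12 dpi"]

# Every strain/time combination is one group (one cell of the design)
groups = list(product(strains, time_points))

for strain, time_point in groups:
    print(f"{strain:>4} | {time_point}")
print(f"{len(groups)} groups in total")
```

Each array is assigned to exactly one of these cells before the two-way ANOVA is run.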

Project Analysis : Two-way ANOVA

- Identify factors
- Indicate number of levels for each
- Identify levels for each factor

Project Analysis : Two-way ANOVA
- Assign levels for each factor to cells
- Include fold-change cutoff if desired
- Select effect to filter on first (you can switch later)

Two-way ANOVA : Strain Effects

Biological Significance
Gene Annotation Sources
- UniGene: organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters, and these titles are commonly used by researchers to refer to that particular gene.
- LocusLink (Entrez Gene): provides a single query interface to curated sequence and descriptive information, including function, about genes.
- Gene Ontologies: the Gene Ontology™ Consortium provides controlled vocabularies for describing the molecular function, biological process and cellular component of gene products, which can be used by databases such as Entrez Gene.
- KEGG: the Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.
- Reference Sequences: the NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.

One-Click Gene Summary

Two-way ANOVA : Strain Effects

Ontology Report

Ontology Report : z-score
- R = total number of genes meeting selection criteria
- N = total number of genes measured
- r = number of genes meeting selection criteria with the specified GO term
- n = total number of genes measured with the specified GO term
Reference: Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology 2003, 4:R7
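The formula itself did not survive in the transcript. The MAPPFinder paper cited here defines the z-score as the observed count minus the expected count, divided by the standard deviation of the count under sampling without replacement:

$z = \dfrac{r - n\frac{R}{N}}{\sqrt{n \frac{R}{N}\left(1 - \frac{R}{N}\right)\left(1 - \frac{n-1}{N-1}\right)}}$

A small sketch of that calculation follows, assuming this is the form GeneSifter reports. The function name and the example counts are hypothetical.

```python
from math import sqrt


def ontology_z_score(r, n, R, N):
    """z-score for over- or under-representation of a GO term in a gene list."""
    expected = n * R / N  # genes with the term expected in the list by chance
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / sqrt(variance)


# Hypothetical counts: 500 of 20000 genes selected; a term with 100 members has 8 hits
z = ontology_z_score(r=8, n=100, R=500, N=20000)
print(f"z = {z:.2f}")
```

A large positive z means the term appears in the gene list more often than chance would predict; a large negative z means it appears less often.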

Z-score Report

KEGG Report

Two-way ANOVA : Strain Effects

Strain effects - Visualization
Visualization of 517 genes (strain effect p < 0.001)

Strain effects - Partitioning
Segregation of expression patterns using k-medoids clustering

Strain effects - Partitioning
Silhouette widths are used to find the “best” number of clusters
[Table: mean silhouette width for each number of clusters k]
Reference: Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.
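A silhouette width compares how close an item is to its own cluster (a) with how close it is to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below computes the mean width for a given partition; it is a plain-Python illustration with hypothetical one-dimensional data, not the R routine GeneSifter calls.

```python
def silhouette_widths(points, labels, dist):
    """Silhouette width of every point for a given cluster assignment."""
    widths = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_silhouette(points, labels, dist):
    widths = silhouette_widths(points, labels, dist)
    return sum(widths) / len(widths)


# Hypothetical 1-D expression summaries with two obvious clusters
points = [0.0, 1.0, 10.0, 11.0]
distance = lambda x, y: abs(x - y)

good = mean_silhouette(points, [0, 0, 1, 1], distance)
bad = mean_silhouette(points, [0, 1, 0, 1], distance)
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

Running the clustering for several values of k and keeping the k with the highest mean silhouette width is the selection rule this slide describes.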

Strain : Cluster 1

Strain : Cluster 2

Two-way ANOVA : Time Effects

Time : Cluster 1

Time : Cluster 2

Two-way ANOVA : Interaction

Interaction : Cluster 3

Interaction : Cluster 2

Two-way ANOVA : Summary
Immune response to hookworms in mouse lung
- 12 groups (3 biological replicates), 2 factors (Strain and Time)
- ~39,000 genes analyzed by two-way ANOVA for interaction, strain and time effects
- Gene lists: 56 genes, 517 genes, 1054 genes
- Pattern selection: hierarchical clustering, PAM
- Z-scores, biological process: Transcription (4), Circadian Rhythm (3)
- Z-scores, biological process: Immune response (8), Chitin catabolism (4)

Strain effects, time effects and interaction

GeneSifter Workflow Examples
- 2 groups (apoE -/- aorta vs. wt aorta): t-test, BH (FDR), up-regulated and down-regulated gene lists
- 5 groups, single factor (Drosophila Innate Immune Response Time Series): one-way ANOVA, BH (FDR), clustering, gene lists
- 12 groups, two factors (Immune response to hookworms in mouse lung): two-way ANOVA, BH (FDR), clustering, gene lists
Each workflow ends with individual genes of interest and biological themes (pathways, molecular functions, etc.).

Resources
Monthly Webinar Series
- 8/10/06 - Microarray analysis of gene expression in Huntington's Disease peripheral blood - a platform comparison
- Archived - Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice
- Archived - Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
- Archived - The microarray data analysis process - from raw data to biological significance
- Archived - Microarray analysis of gene expression in androgen-independent prostate cancer
- Archived - Microarray analysis of gene expression in male germ cell tumors

Thank You
Eric Olson
Trial account, tutorials, sample data and Data Center