Group testing: global tests Ulrich Mansmann Department of Medical Biometrics and Informatics University of Heidelberg.



Group testing 2. Overview
Gene set enrichment: Lamb J et al. (2003) A Mechanism of Cyclin D1 Action Encoded in the Patterns of Gene Expression in Human Cancer, Cell, 114.
Global test: Goeman JJ et al. (2003) A global test for groups of genes: Testing association with a clinical outcome, Bioinformatics, 20:93-99.
Bioconductor package: globaltest
Example: Differential gene expression between UICC stage II and stage III colon cancer patients (Groene / Mansmann).

Group testing 3. Two questions about groups of genes
Question 1: Two groups of genes are to be compared with respect to gene expression: is the gene expression in gene group A different from the expression in gene group B?
Question 2: Is there differential gene expression between different biological entities, not in terms of single genes but with respect to a defined group of genes?
[Diagram labels: Genes of group A, Genes of group B; Entity I, Entity II; Well-defined group of genes]

Group testing 4. Example: Colon Cancer
Study: 18 patients with UICC II colon cancer, 18 patients with UICC III colon cancer; HG-U133A, probesets representing ~ genes. Snap-frozen material, laser microdissection.
Question 1: Is the differential gene expression between UICC II and UICC III patients more pronounced for genes in cancer-related pathways than for genes in other pathways?
Question 2: Is there differential gene expression in the p53 signalling pathway?

Group testing 5. Gene set enrichment
Problem: Two groups of genes are to be compared with respect to gene expression: is the gene expression in gene group A different from the expression in gene group B?
Basic idea: There are $n_A$ genes in group A and $n_B$ genes in group B. Order the genes by expression value. If the two groups differ, their expression values will be separated in this ordering: the values of group A will tend to sit at the high end or at the low end. If there is no difference, the values of the two groups will be well mixed.

Group testing 6. Gene set enrichment
Basic idea: $n_A$ genes in group A, $n_B$ genes in group B.
Order the genes with respect to expression values.
Create a vector vv of $(n_A + n_B)$ components, with value $-n_B$ at each position that holds a value from group A and value $n_A$ at each position that holds a value from group B.
Calculate yy = cumsum(vv).
Draw a line starting at (0, 0) through the points (i, yy[i]). The line ends in $(n_A + n_B, 0)$ because $(-n_B) \cdot n_A + n_A \cdot n_B = 0$.
Look at $M_{vv} = \max\{|\min(yy)|, \max(yy)\}$, which will be large when the two groups are well separated.
Permute the vector vv to get vv*, then calculate yy* and $M_{vv^*}$.
Use permutations to obtain the distribution of $M_{vv}$ under the null hypothesis and determine the permutation-based p-value: $p_{perm} = \#\{M_{vv^*} \ge M_{vv}\} / \#\text{permutations}$.
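
The procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the slides: the function names `enrichment_statistic` and `permutation_p_value`, the number of permutations and the fixed seed are hypothetical choices, and ties in expression values are broken arbitrarily by the sort.

```python
import random
from itertools import accumulate


def enrichment_statistic(labels):
    """Return M_vv for group labels ("A"/"B") listed in order of expression value."""
    n_a = labels.count("A")
    n_b = labels.count("B")
    # -n_B where a group A value sits, +n_A where a group B value sits
    vv = [-n_b if g == "A" else n_a for g in labels]
    yy = list(accumulate(vv))  # running sum, the cumsum(vv) of the slide
    return max(abs(min(yy)), max(yy))


def permutation_p_value(expr_a, expr_b, n_perm=10000, seed=0):
    """Return (observed M_vv, permutation p-value) for two groups of expression values."""
    ranked = sorted([(x, "A") for x in expr_a] + [(x, "B") for x in expr_b])
    labels = [g for _, g in ranked]
    observed = enrichment_statistic(labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # permuting the labels permutes vv
        if enrichment_statistic(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Calling `permutation_p_value(values_a, values_b)` returns the observed statistic together with the share of permuted vectors whose statistic is at least as large. Because the permutations are sampled at random, the p-value is an estimate whose precision depends on `n_perm`.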

Group testing 7. Gene set enrichment: Simple Example
Gene expression in group A: {2, 3}, $n_A = 2$
Gene expression in group B: {1, 4, 6}, $n_B = 3$
Order the genes with respect to expression values: {1, 2, 3, 4, 6}
vv = {2, -3, -3, 2, 2} (value $-n_B = -3$ at the positions of group A, value $n_A = 2$ at the positions of group B)
yy = {2, -1, -4, -2, 0}
$M_{vv} = \max\{|-4|, 2\} = 4$
Distribution of $M_{vv}$ under the null hypothesis: 2 ~ 0.1, 3 ~ 0.3, 4 ~ 0.4, 6 ~ 0.2
$p_{perm} = \#\{M_{vv^*} \ge M_{vv}\} / \#\text{permutations} = 0.4 + 0.2 = 0.6$
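
For an example this small, the null distribution can be checked by listing every possible placement of the group A values among the five ranks instead of sampling permutations. The sketch below does this; the function names `m_statistic` and `exact_null_distribution` are hypothetical, and the sign convention is the one from the previous slide ($-n_B$ at group A positions, $n_A$ at group B positions).

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, combinations


def m_statistic(a_positions, n_a, n_b):
    """M_vv when the group A values occupy the given 0-based ranks."""
    vv = [-n_b if i in a_positions else n_a for i in range(n_a + n_b)]
    yy = list(accumulate(vv))
    return max(abs(min(yy)), max(yy))


def exact_null_distribution(n_a, n_b):
    """Probability of each M_vv value when all placements of group A are equally likely."""
    placements = list(combinations(range(n_a + n_b), n_a))
    counts = Counter(m_statistic(set(pos), n_a, n_b) for pos in placements)
    return {m: Fraction(c, len(placements)) for m, c in sorted(counts.items())}


null = exact_null_distribution(2, 3)
# Group A = {2, 3} holds the 2nd and 3rd smallest values, i.e. 0-based ranks 1 and 2.
observed = m_statistic({1, 2}, 2, 3)
p_exact = sum(p for m, p in null.items() if m >= observed)
print(null)
print(observed, p_exact)
```

With two genes in group A and three in group B there are 10 equally likely placements. The enumeration gives probabilities 1/10, 3/10, 4/10 and 2/10 for the values 2, 3, 4 and 6, an observed statistic of 4, and an exact p-value of 6/10.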

Group testing 8. Gene set enrichment - Colon cancer
1407 probe sets are studied which belong to 9 cancer-specific pathways.
androgen_receptor_signalling   122
apoptosis                      245
cell_cycle_control              51
notch_delta_signalling          50
p53_signalling                  45
ras_signalling                 316
tgf_beta_signalling            100
tight_junction_signalling      425
wnt_signalling                 214

Group testing 9. Gene set enrichment - Colon cancer
Result table with columns group.A, group.B, M yy, p.value and one row per pathway: androgen_receptor_signaling, Apoptosis, cell_cycle_control, notch_delta_signalling, p53_signalling, ras_signalling, tgf_beta_signaling, tight_junction_signaling, wnt_signaling

Group testing 10. Goeman's Global Test
Test whether the global expression pattern of a group of genes is significantly related to some outcome of interest (groups, continuous phenotype).
If this relationship exists, then knowledge of the gene expression helps to improve the prediction of the phenotype of interest.
If the prediction cannot be improved by knowing the gene expression, then there is no differential gene expression.
Test statistic:
$$Q \propto (Y-\mu)' R \, (Y-\mu) = \sum_{m} \left[ X_m'(Y-\mu) \right]^2 = \sum_{i} \sum_{j} R_{ij} \, (Y_i-\mu)(Y_j-\mu)$$
The sum over $m$ runs over the genes of the pathway; the double sum over $i, j$ runs over the subjects.
$\mu$: mean of the phenotype; $X_{mi}$: expression of gene $m$ in subject $i$, with $X_m = (X_{m1}, \dots, X_{mI})'$; $R = X'X$: $I \times I$ matrix of correlation between the gene expression of subjects, $R_{ij} = \sum_m X_{mi} X_{mj}$.
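
The two forms of the test statistic can be compared numerically. The sketch below computes the unscaled quantity $(Y-\mu)'R(Y-\mu)$ once as a sum over genes and once as a double sum over subjects, and adds a permutation p-value obtained by shuffling the phenotype. It is a minimal illustration, not the implementation in the globaltest package: the function names, the toy data layout (`expr[m][i]` for gene m and subject i) and the omission of any scaling constant or preprocessing of the expression values are assumptions made here.

```python
import random


def q_over_genes(expr, y):
    """Sum over genes m of [X_m'(Y - mu)]^2, with expr[m][i] = expression of gene m in subject i."""
    mu = sum(y) / len(y)
    resid = [yi - mu for yi in y]
    return sum(sum(x * r for x, r in zip(gene, resid)) ** 2 for gene in expr)


def q_over_subjects(expr, y):
    """Double sum over subjects of R_ij (Y_i - mu)(Y_j - mu), with R = X'X."""
    n = len(y)
    mu = sum(y) / n
    resid = [yi - mu for yi in y]
    r = [[sum(gene[i] * gene[j] for gene in expr) for j in range(n)] for i in range(n)]
    return sum(r[i][j] * resid[i] * resid[j] for i in range(n) for j in range(n))


def permutation_p_value(expr, y, n_perm=1000, seed=0):
    """Share of phenotype permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = q_over_genes(expr, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if q_over_genes(expr, y_perm) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Both functions return the same number for any input, which is the identity stated on the slide. Permuting the phenotype while keeping the expression matrix fixed breaks any association between the two and so generates the statistic under the null hypothesis.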

Group testing 11. Goeman's Global Test - Example
Test for differential gene expression in the p53 signalling pathway (45 probesets).
Global Test result: 45 out of 45 genes used; 36 samples
p value = based on permutations
Test statistic Q = with expectation EQ = and standard deviation sdQ = under the null hypothesis
Informative plots:
Sample plot: how well a sample fits its phenotype
Checkerboard: correlation between samples
Gene plot: influence of single genes on the test statistic

Group testing 12. Goeman's Global Test - Example

Group testing 13. Goeman's Global Test - Example