Published by Melinda Eaton. Modified over 8 years ago.
1
GSEA Overview -- Workflow
GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
2
Three Main Components in GSEA
- Algorithm
- Software implementation (Broad Institute)
- Database of gene sets:
  o Molecular Signatures Database (MSigDB, at the Broad Institute), containing collections of gene sets of interest
  o Utilities for mapping chip features to genes (e.g., Illumina or Affymetrix probe set IDs to HUGO gene symbols)
3
GSEA: Compares Gene List with a number of Gene Sets
Start with a gene list (L) ranked by t-statistics (e.g. Tumor vs. Normal) and a gene set (S) (e.g. Metastasis).
[Figure: running sum along L; bands are the locations of S genes in L; regions labeled ES>0 and ES<0.]
Genes shown in the figure: ALG3, CKAP4, CPLX2, CXCL1, DAD1, DNER, ECH1, EZH2, GNAI2, GNAS, HNRPA3, HNRPUL1, HSPCB, IER3, MAPK8, METAP2, MRPS22, MYC, MYCN, NFKB1, PSMD2, PTTG1, RXRA, RXRB, SLC16A9, SNRPF, STAT1, TFAP2A, TMSB4X, TP53, TUBA1, TUBA2, TUBA3D, TUBB, UBE1
4
Enrichment Score (ES) Calculation
ES(S) = value of maximum deviation from 0 of the running sum.
Start with a ranked list (L) of genes that are in (Hit) or not in (Miss) a gene set (S), using fold change (FC) as an example metric.

Quantities:
- Sum of fold changes for genes in gene set (S) (e.g., 100)
- N = no. of genes in the array (e.g., 1020)
- N_H = no. of genes in the gene set (S) (e.g., 20)

Contribution to running sum for ES:
- Hits (genes in L that are in S): +|FC| / (sum of fold changes for genes in S)
- Misses (genes in L that are not in S): -1/(N - N_H)

Ranked list (L):
FC    Hit/Miss   Contribution   Running sum
15    Hit        +0.15          0.15
12    Hit        +0.12          0.27
10    Miss       -0.001         0.269
9     Hit        +0.09          0.359
8     Hit        +0.08          0.439
6     Miss       -0.001         0.438
...
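The running-sum rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Broad Institute implementation: the function names `running_sum` and `enrichment_score` are hypothetical, and only the first six genes come from the slide. The remaining 1014 genes are made-up filler chosen so that the totals match the slide's example values (N = 1020, N_H = 20, sum of fold changes in S = 100).

```python
def running_sum(ranked_fc, in_set):
    """Return the running-sum profile for a ranked gene list.

    ranked_fc: ranking metric per gene (fold change here), already sorted.
    in_set:    parallel list of booleans, True where the gene is in S (a Hit).
    """
    n = len(ranked_fc)                      # N: genes in the array
    n_hits = sum(in_set)                    # N_H: genes in the gene set
    # Normalizer for hits: sum of |FC| over the genes that are in S.
    hit_norm = sum(abs(fc) for fc, hit in zip(ranked_fc, in_set) if hit)
    miss_step = 1.0 / (n - n_hits)          # each Miss subtracts 1/(N - N_H)

    total, profile = 0.0, []
    for fc, hit in zip(ranked_fc, in_set):
        total += abs(fc) / hit_norm if hit else -miss_step
        profile.append(total)
    return profile


def enrichment_score(profile):
    """ES(S): the value of maximum deviation from 0 of the running sum."""
    return max(profile, key=abs)


# First six genes as on the slide (FC, Hit/Miss).
head = [(15, True), (12, True), (10, False), (9, True), (8, True), (6, False)]
# Hypothetical filler (NOT from the slide): 16 more hits summing to 56 and
# 998 more misses, so that N = 1020, N_H = 20 and the hit normalizer is 100.
filler = [(3.5, True)] * 16 + [(1.0, False)] * 998

genes = head + filler
fc = [g[0] for g in genes]
hits = [g[1] for g in genes]

profile = running_sum(fc, hits)
es = enrichment_score(profile)
print([round(x, 3) for x in profile[:6]])
print(round(es, 3))
```

Rounded to three decimals, the first six entries of `profile` correspond to the running-sum column of the table above. The final ES depends entirely on the filler genes, so it says nothing about real data.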
5
A positive ES gene set (gene list is a comparison between p53 mutant and WT)
[Figure: running enrichment score with ES(S) marked; locations of genes in S; ranking metric values (+ / -) with zero crossing; ends of the ranked list labeled p53 WT and p53 MUT.]
6
A negative ES gene set (gene list is a comparison between p53 mutant and WT)
[Figure: running enrichment score with ES(S) marked; locations of genes in S; ranking metric values (+ / -) with zero crossing; ends of the ranked list labeled p53 WT and p53 MUT.]
7
2 Ways of Testing the Significance of ES
1. Phenotype permutation: randomly shuffle phenotype labels
Need >= 7 samples/phenotype.

Original labels:
T1 T2 T3 T4 T5 T6 T7 N1 N2 N3 N4 N5 N6 N7
Shuffled labels (1000 x), for example:
N7 T5 N3 T2 N6 N1 T4 N5 T1 N4 T7 T3 T6 N2
T5 N6 T3 N2 T6 T1 N4 N5 N1 T4 N7 N3 T7 T2
N3 T6 N7 N1 N5 T3 T7 T5 N6 T1 N4 T2 N2 T4

[Figure: histogram of 1000 ES(S, π) scores, ES(S, π_1), ES(S, π_2), ES(S, π_3), ..., ES(S, π_1000), with the observed ES(S) marked.]

The empirical, nominal p-value for each ES(S) is then calculated relative to the null distribution for ES(S):
p = fraction of ES(S, π) values ≥ ES(S)
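The phenotype-permutation test can be sketched as follows. This is a toy illustration under stated assumptions, not the GSEA software: all function names are hypothetical, the expression data are simulated, and the ranking metric is a simple difference of group means used as a stand-in for the t-statistic mentioned earlier in the slides.

```python
import random
from statistics import mean


def es_from_scores(scores, gene_set):
    """Weighted running-sum ES; scores maps gene -> ranking metric."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hit_norm = sum(abs(scores[g]) for g in ranked if g in gene_set)
    miss_step = 1.0 / (len(ranked) - len(gene_set))
    total, best = 0.0, 0.0
    for g in ranked:
        total += abs(scores[g]) / hit_norm if g in gene_set else -miss_step
        if abs(total) > abs(best):          # keep the maximum deviation from 0
            best = total
    return best


def mean_diff(expr, labels):
    """Ranking metric per gene: mean(T) - mean(N).

    Assumption: used here instead of a t-statistic to keep the sketch short.
    """
    return {
        g: mean(v for v, lab in zip(vals, labels) if lab == "T")
        - mean(v for v, lab in zip(vals, labels) if lab == "N")
        for g, vals in expr.items()
    }


def phenotype_permutation_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Shuffle phenotype labels, re-rank the genes, and recompute ES each time."""
    rng = random.Random(seed)
    observed = es_from_scores(mean_diff(expr, labels), gene_set)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(es_from_scores(mean_diff(expr, shuffled), gene_set))
    # Nominal p-value as on the slide (stated for a positive observed ES).
    p = sum(es >= observed for es in null) / n_perm
    return observed, p


# Simulated data (hypothetical): 7 tumor and 7 normal samples, 50 genes,
# with the 5 genes in the set shifted upward in tumor samples.
data_rng = random.Random(1)
labels = ["T"] * 7 + ["N"] * 7
gene_set = {"g0", "g1", "g2", "g3", "g4"}
expr = {
    f"g{i}": [
        data_rng.gauss(3.0 if (f"g{i}" in gene_set and lab == "T") else 0.0, 1.0)
        for lab in labels
    ]
    for i in range(50)
}

observed, p = phenotype_permutation_p(expr, labels, gene_set, n_perm=200)
print(observed, p)
```

Because the whole ranked list is rebuilt for every shuffle, this approach preserves gene-gene correlations in the data, which is one reason it is preferred when there are enough samples per phenotype.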
8
2 Ways of Testing the Significance of ES
2. Gene set permutation: randomly select genes for gene set
Use when n < 7 samples/phenotype.

Samples: T1 T2 T3 T4 N1 N2 N3 N4
Randomly selected gene sets (each letter stands for a gene), for example:
ACBGDXMPQ
KYWLFHGIP
CUKTVZWRS

[Figure: histogram of 1000 ES(S, π) scores, ES(S, π_1), ES(S, π_2), ES(S, π_3), ..., ES(S, π_1000), with the observed ES(S) marked.]

The empirical, nominal p-value for each ES(S) is then calculated relative to the null distribution for ES(S):
p = fraction of ES(S, π) values ≥ ES(S)
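Gene set permutation keeps the ranked list fixed and compares the observed ES against the ES of randomly drawn gene sets of the same size. The sketch below illustrates that idea with made-up scores; the function names and the toy gene identifiers are hypothetical and do not come from the GSEA software.

```python
import random


def enrichment_score(ranked, scores, gene_set):
    """Weighted running-sum ES over a fixed ranked list of genes."""
    hit_norm = sum(abs(scores[g]) for g in ranked if g in gene_set)
    miss_step = 1.0 / (len(ranked) - len(gene_set))
    total, best = 0.0, 0.0
    for g in ranked:
        total += abs(scores[g]) / hit_norm if g in gene_set else -miss_step
        if abs(total) > abs(best):          # maximum deviation from 0
            best = total
    return best


def gene_set_permutation_p(scores, gene_set, n_perm=1000, seed=0):
    """Null distribution from random gene sets of the same size as gene_set."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    observed = enrichment_score(ranked, scores, gene_set)
    null = [
        enrichment_score(ranked, scores, set(rng.sample(ranked, len(gene_set))))
        for _ in range(n_perm)
    ]
    # Nominal p-value as on the slide (stated for a positive observed ES).
    p = sum(es >= observed for es in null) / n_perm
    return observed, null, p


# Hypothetical ranking metric for 100 genes, descending from 50 to -49.
scores = {f"g{i}": 50.0 - i for i in range(100)}
# Hypothetical gene set concentrated near the top of the ranked list.
gene_set = {"g0", "g2", "g5", "g7"}

observed, null, p = gene_set_permutation_p(scores, gene_set)
print(round(observed, 4), p)
```

Since the sample labels are never shuffled, this variant works with very few samples, but random gene sets ignore the correlation structure among real genes, so the resulting p-values tend to be less conservative.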
9
How normalized enrichment scores (NES) are calculated from ES
(Using the NES helps normalize out the effect of different gene set sizes.)

NES(S, π_k) = ES(S, π_k) / mean{ES(S, π) values with the same sign as ES(S, π_k)}

For each permutation π and gene set S, compute NES(S, π) to use in computing the FDR.

[Figure: the ES(S, π) values ES(S, π_1), ES(S, π_2), ES(S, π_3) are normalized into a histogram of NES(S, π) scores; the region NES(S, π) ≥ NES* is marked; FDR q-value (<0.05).]
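The normalization step can be illustrated with a short sketch. The function name `normalize_es` and the null values are hypothetical. One assumption is worth flagging: the sketch divides by the absolute value of the same-sign mean so that the NES keeps the sign of the ES, whereas the formula above writes just the mean. The FDR computation itself is not covered here.

```python
from statistics import mean


def normalize_es(es, null_es):
    """Divide es by the mean of the null ES values that share its sign.

    Assumption: the absolute value of the mean is used so that a negative
    ES yields a negative NES.
    """
    same_sign = [x for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no null ES values with the same sign as es")
    return es / abs(mean(same_sign))


# Hypothetical null ES values for one gene set S (not real data).
null_es = [0.2, 0.4, -0.3, 0.6, -0.1]

nes_pos = normalize_es(0.8, null_es)    # positive null mean is 0.4
nes_neg = normalize_es(-0.4, null_es)   # negative null mean is -0.2

# Each permuted score is normalized the same way to build the NES null.
nes_null = [normalize_es(x, null_es) for x in null_es]
print(nes_pos, nes_neg, nes_null)
```

Normalizing positive and negative scores separately puts gene sets of different sizes on a comparable scale before their NES values are pooled for the FDR estimate.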
10
MSigDB