Course on Functional Analysis

Course on Functional Analysis ::: Gene Set Enrichment Analysis - GSEA - Madrid, Feb 16th, 2009. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO

::: Contents. Introduction; GSEA Software; Data Formats; Using GSEA; GSEA Output; GSEA Results; Leading Edge Analysis.

::: Introduction. ::: Gene Set Enrichment Analysis (GSEA) from the MIT Broad Institute. Version 2.0 available since Jan 2007; v2.0.1 available since Feb 16th 2007. Version 2.0 includes BioCarta, Broad Institute, GenMAPP and KEGG annotations, among others. Platforms: Affymetrix, Agilent, CodeLink, custom... (Subramanian et al. PNAS. 2005.)

::: Introduction. ::: How does GSEA work? GSEA applies a Kolmogorov-Smirnov (K-S)-like test to find out whether a predefined block of genes is distributed asymmetrically with respect to the distribution of the whole dataset. In other words: is this particular gene set enriched in my experiment? Gene sets can be genes selected by the researcher, BioCarta pathways, GenMAPP sets, genes sharing a cytoband, genes targeted by common miRNAs... it is up to you.

::: Introduction. ::: K-S test. The Kolmogorov–Smirnov test is used to determine whether two underlying one-dimensional probability distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution, in either case based on finite samples. The one-sample K-S test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis; its main applications are testing goodness of fit to the normal and uniform distributions. The two-sample K-S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. [Figure: distributions of the whole dataset, gene set 1 and gene set 2; x-axis: gene expression level; y-axis: number of genes.]
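To make the two-sample statistic concrete, here is a minimal Python sketch that computes D, the largest vertical distance between the empirical CDFs of two samples (for instance, the expression levels of all genes and those of one gene set). It only computes the statistic, not its p-value, and it is not the weighted running-sum variant that GSEA actually uses; the function name ks_two_sample and the toy values are illustrative.

```python
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so evaluating them at every pooled data point finds the maximum.
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of x values <= t
        fy = bisect_right(ys, t) / m  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d


# Toy data: expression levels of all genes vs. a small gene set.
dataset = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_set = [6.5, 7.5, 8.5]
print(ks_two_sample(dataset, gene_set))  # prints 0.75
```

A large D means the gene set's values sit in a different part of the range than the dataset as a whole, which is the asymmetry the slide refers to.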

::: Introduction. ::: How does GSEA work? The classical approach compares Class A vs Class B with a t-test and an FDR < 0.05 cut-off, testing genes independently... but what is the biological meaning?

::: Introduction. ::: How does GSEA work? ES/NES statistic. Instead of applying a t-test cut-off, GSEA ranks all genes between Class A (+) and Class B (-) and computes an ES/NES statistic for each gene set (Gene Set 1, Gene Set 2, Gene Set 3). In this example, gene set 2 is enriched in Class A and gene set 3 is enriched in Class B.
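The ES of a gene set is turned into a normalized score (NES) and a nominal p-value by comparing it with scores obtained under permutation. As a minimal sketch, assuming you already have the observed ES of one gene set and the ES values of the same set under permutation, NES divides the observed ES by the mean magnitude of the same-signed null scores, and the nominal p-value is the fraction of same-signed null scores at least as extreme. This follows the description in Subramanian et al. (2005) but is not GSEA's actual code; the function name normalize_es and the null scores are hypothetical, and at least one null score with the same sign is assumed.

```python
from statistics import mean


def normalize_es(es_obs, es_null):
    """Return (NES, nominal p-value) for one gene set.

    es_obs  -- observed enrichment score (ES) of the gene set
    es_null -- ES values of the same gene set under permutation
    Only null scores with the same sign as es_obs are used, so at least
    one such score must exist.
    """
    if es_obs >= 0:
        same_sign = [e for e in es_null if e >= 0]
        # Fraction of null scores at least as large as the observed one.
        p_value = sum(e >= es_obs for e in same_sign) / len(same_sign)
    else:
        same_sign = [e for e in es_null if e < 0]
        # Fraction of null scores at least as negative as the observed one.
        p_value = sum(e <= es_obs for e in same_sign) / len(same_sign)
    # Normalize by the mean magnitude of the same-signed null scores.
    nes = es_obs / mean(abs(e) for e in same_sign)
    return nes, p_value


null_scores = [0.2, 0.4, -0.3, 0.6, -0.5, 0.2]
print(normalize_es(0.6, null_scores))
```

Normalizing by the null scores of the same set is what makes NES comparable across gene sets of different sizes.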

::: Introduction. ::: ES examples

::: Introduction. ::: The Enrichment Score: NES, p-value and FDR (Benjamini-Hochberg)
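The slide names the Benjamini-Hochberg procedure for FDR control. Below is a minimal sketch of the classic BH step-up adjustment applied to a list of p-values. Note that GSEA itself estimates its FDR q-values by comparing the observed NES against the NES null distribution obtained from permutations. This sketch therefore illustrates the general BH idea, not GSEA's internal computation. The p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print(q)
print([pv for pv, qv in zip(p, q) if qv <= 0.05])  # p-values whose q-value is <= 0.05
```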

::: GSEA software. ::: Download: http://www.broad.mit.edu/gsea/

::: GSEA software. ::: Main Window

::: GSEA software. ::: Loading data !!!

::: GSEA software. ::: Running GSEA

::: GSEA software. ::: Leading Edge Analysis

::: GSEA software. ::: MSigDB, Chip to Chip Mapping

::: Data Formats.

::: Data Formats. ::: Expression datasets: *.gct
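As we understand the GCT 1.2 layout, the file is tab-delimited. It starts with a #1.2 version line, followed by a line with the number of rows and samples. Next comes a header starting with NAME and Description. Then there is one row per gene or probe. The sketch below assembles such a file as a string. The helper gct_text, the gene names and the values are hypothetical.

```python
def gct_text(samples, rows):
    """Build the contents of a GCT 1.2 expression file.

    samples -- list of sample names (columns)
    rows    -- list of (name, description, values) tuples, one per gene/probe
    """
    lines = ["#1.2", f"{len(rows)}\t{len(samples)}",
             "\t".join(["NAME", "Description", *samples])]
    for name, description, values in rows:
        if len(values) != len(samples):
            raise ValueError(f"{name}: expected {len(samples)} values")
        lines.append("\t".join([name, description, *map(str, values)]))
    return "\n".join(lines) + "\n"


text = gct_text(["tumor_1", "control_1"],
                [("TP53", "na", [5.1, 7.3]), ("MYC", "na", [9.0, 4.2])])
print(text, end="")
```

Save the returned string to a file with the .gct extension to use it as an expression dataset.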

::: Data Formats. ::: Expression datasets: *.res

::: Data Formats. ::: Expression datasets: *.pcl

::: Data Formats. ::: Expression datasets: *.txt

::: Data Formats. ::: Phenotype datasets: *.cls, for categorical phenotypes (e.g. Tumor vs Control)
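A categorical CLS file has three lines:

1. The number of samples, the number of classes and the number 1.
2. A line starting with # that lists the class names.
3. One class label per sample, in the same order as the samples in the expression dataset.

The hypothetical helper cls_text below writes that layout. It assumes that class names are listed in order of first appearance.

```python
def cls_text(labels):
    """Build a categorical CLS phenotype file from one label per sample."""
    # Class names in order of first appearance, e.g. ["Tumor", "Control"].
    classes = list(dict.fromkeys(labels))
    return (f"{len(labels)} {len(classes)} 1\n"
            f"# {' '.join(classes)}\n"
            f"{' '.join(labels)}\n")


print(cls_text(["Tumor", "Tumor", "Control", "Control"]), end="")
```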

::: Data Formats. ::: Phenotype datasets: for continuous phenotypes (e.g. a gene correlated to a gene set, or a gene vs a time series). Example: a time series sampled every 30 minutes in which a peak profile is wanted.

::: Data Formats. ::: Gene Set Database: *.gmx

::: Data Formats. ::: Gene Set Database: *.gmt
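In the *.gmt format, each line is one gene set. The line holds the set's name, a description (often a URL or na) and then its genes, all separated by tabs. Below is a minimal parser sketch. It assumes a plain-text file with no header lines. parse_gmt and the example sets are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text into {gene_set_name: [genes]}."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Column 1: set name, column 2: description, remaining columns: genes.
        name, _description, *genes = line.split("\t")
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


example = "SET_A\tna\tTP53\tMDM2\tCDKN1A\nSET_B\tna\tEGFR\tERBB2\t\n"
print(parse_gmt(example))
```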

::: Data Formats. ::: Other formats: *.chip, *.grp

::: Data Formats. ::: Ranked list format: *.rnk
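A *.rnk file is a pre-ranked list with two tab-separated columns: a gene identifier and its ranking score. The sketch reads such a list and sorts it by score in descending order. Skipping lines that start with # is an assumption about comment lines, and parse_rnk is a hypothetical name.

```python
def parse_rnk(text):
    """Parse RNK text into (gene, score) pairs sorted by score, descending."""
    ranked = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        gene, score = line.split("\t")[:2]
        ranked.append((gene, float(score)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


example = "TP53\t-1.2\nMYC\t3.4\nEGFR\t0.5\n"
print(parse_rnk(example))
```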

::: Using GSEA. ::: Loading data. Files are loaded automatically into the Run GSEA screen.

::: Using GSEA. ::: Loading data. Select already imported files.

::: Using GSEA. ::: Running GSEA. Don't forget the Help option!

::: Using GSEA. ::: MSigDB. gsea_home

::: Using GSEA. ::: Running GSEA. 1. Choose true (default) to have GSEA collapse each probe set in your expression dataset into a single gene vector, which is identified by its HUGO gene symbol. In this case, you are using HUGO gene symbols for the analysis. The gene sets that you use for the analysis must use HUGO gene symbols to identify the genes in the gene sets. 2. Choose false to use your expression dataset "as is." In this case, you are using the probe identifiers that are in your expression dataset for the analysis. The gene sets that you use for the analysis must also use these probe identifiers to identify the genes in the gene sets.
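The collapse option can be pictured as follows. Probes are mapped to HUGO gene symbols through a chip annotation, and all probes of the same gene are merged into one row. GSEA offers several collapsing modes. This sketch assumes a per-sample maximum over the probes of each gene and drops unmapped probes. The probe IDs, the mapping and collapse_to_symbols are illustrative.

```python
def collapse_to_symbols(expression, chip):
    """Collapse probe-level rows into one row per gene symbol.

    expression -- {probe_id: [value per sample]}
    chip       -- {probe_id: gene symbol}, as read from a chip annotation
    Assumed rule: per sample, keep the maximum value among the probes of a gene.
    """
    collapsed = {}
    for probe, values in expression.items():
        symbol = chip.get(probe)
        if not symbol:
            continue  # probes without a symbol are dropped
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


expression = {"p1": [1.0, 5.0], "p2": [3.0, 2.0], "p3": [7.0, 7.0], "p4": [0.0, 0.0]}
chip = {"p1": "TP53", "p2": "TP53", "p3": "MYC"}
print(collapse_to_symbols(expression, chip))
```

After collapsing, the gene sets must be written with HUGO symbols, as the slide points out.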

::: Using GSEA. ::: Running GSEA: permutation type, Phenotype or Gene Sets (use gene set permutation when you have few samples).

::: Using GSEA. ::: Running GSEA

::: Using GSEA. ::: Chip2Chip mapping. Chip2Chip translates the gene identifiers in a gene set from HUGO gene symbols to the probe identifiers for a selected DNA chip.

::: Using GSEA. ::: Enrichment statistic. To calculate the enrichment score, GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is not. The enrichment score is the maximum deviation from zero encountered during that walk. The enrichment statistic parameter controls how the running sum is incremented: the classic statistic uses the same step for every gene in the set, while the weighted statistic (default) weights each step by the gene's ranking metric.
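The running-sum walk described above can be written in a few lines, following Subramanian et al. (2005):

- Each gene in the set raises the sum by its |metric|^p divided by the total of those weights over the set.
- Each gene outside the set lowers it by 1/(number of genes not in the set).
- p = 0 corresponds to the classic statistic and p = 1 to the weighted one.

This is a sketch of the formula, not GSEA's code. enrichment_score and the toy ranking are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score of one gene set.

    ranked   -- list of (gene, metric) pairs sorted by metric, descending
    gene_set -- set of gene identifiers
    p        -- weight exponent: 0 gives the classic (unweighted) statistic,
                1 weights each hit by the absolute value of its metric
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Total weight of the hits; each hit adds its share of this total.
    hit_norm = sum(abs(m) ** p for (_, m), hit in zip(ranked, hits) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must hit the list without covering all of it")
    running, es = 0.0, 0.0
    for (_, metric), hit in zip(ranked, hits):
        if hit:
            running += abs(metric) ** p / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero
    return es


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0)]
print(enrichment_score(ranked, {"A", "B"}, p=0))  # prints 1.0
```

A positive ES means the set is concentrated at the top of the list (first class), and a negative ES means it is concentrated at the bottom (second class).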

::: Using GSEA. ::: Ranking Metric. Categorical phenotypes: Signal2Noise, tTest, Ratio of Classes, Diff of Classes, Log2_Ratio_of_Classes. Continuous phenotypes (e.g. time series): Pearson, Cosine, Euclidean, Manhattan.
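Two of the categorical metrics are easy to write down, computed per gene:

- Signal2Noise is (μA - μB)/(σA + σB).
- The t-test statistic is (μA - μB)/sqrt(σA²/nA + σB²/nB).

The sketch uses sample standard deviations. GSEA's own implementation may apply extra adjustments (such as a lower bound on σ) that are not reproduced here. Function names and data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def signal_to_noise(a, b):
    """(mean_A - mean_B) / (sd_A + sd_B), using sample standard deviations."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def t_statistic(a, b):
    """(mean_A - mean_B) / sqrt(var_A / n_A + var_B / n_B)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


class_a = [4.0, 6.0]  # expression of one gene in Class A samples
class_b = [1.0, 3.0]  # expression of the same gene in Class B samples
print(signal_to_noise(class_a, class_b), t_statistic(class_a, class_b))
```

Applying either function to every gene and sorting by the result gives the ranked list on which the enrichment score is computed.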

::: Using GSEA. ::: Ranking Metric

::: Using GSEA. ::: More parameters. Sorting mode: real sorts the genes by the actual value of the ranking metric (8.2, 8.1, 8.0 … -7.5, -7.7, -7.9), while abs sorts them by its absolute value (… 7.9, 7.7, 7.5). Ordering mode: determines whether the genes are sorted in descending (default) or ascending order.
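The difference between the real and abs sorting modes, combined with descending or ascending order, can be sketched as follows. The (gene, metric) pairs and the function sort_ranked are hypothetical. The values mirror the example above.

```python
def sort_ranked(scores, mode="real", descending=True):
    """Sort (gene, metric) pairs by metric ('real') or by |metric| ('abs')."""
    if mode == "real":
        def key(pair):
            return pair[1]
    elif mode == "abs":
        def key(pair):
            return abs(pair[1])
    else:
        raise ValueError("mode must be 'real' or 'abs'")
    return sorted(scores, key=key, reverse=descending)


scores = [("g1", 8.2), ("g2", -7.9), ("g3", 8.0), ("g4", -7.5)]
print(sort_ranked(scores))              # strongly negative genes end up last
print(sort_ranked(scores, mode="abs"))  # strong genes of either sign come first
```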

::: Using GSEA. ::: Launching Analysis

::: GSEA output. ::: Results Accession. By default, results are saved in gsea_home: C:\Documents and settings\username\gsea_home (Windows) or /Users/yourhome/gsea_home (Mac).

::: GSEA results. ::: Index.html. Heat map of the top 50 features for each phenotype and a plot showing the correlation between the ranked genes and the phenotypes. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).

::: GSEA results. ::: Enrichment results in html

::: GSEA results. ::: Enrichment results in html. How can I decide about my results? Consider a gene set significant when its FDR ≤ 0.25 and its nominal p-value (NOM p-val) ≤ 0.05.
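Applying the rule of thumb above to a results table is a one-line filter. The dictionary keys (nom_p, fdr_q) and the values are hypothetical stand-ins for the NOM p-val and FDR q-val columns of the report, not real GSEA output.

```python
def significant_sets(results, fdr_max=0.25, pval_max=0.05):
    """Keep gene sets with FDR q-value <= fdr_max and nominal p <= pval_max."""
    return [r for r in results
            if r["fdr_q"] <= fdr_max and r["nom_p"] <= pval_max]


results = [  # hypothetical values, not real GSEA output
    {"name": "SET_A", "nes": 2.1, "nom_p": 0.001, "fdr_q": 0.02},
    {"name": "SET_B", "nes": 1.4, "nom_p": 0.03, "fdr_q": 0.31},
    {"name": "SET_C", "nes": -1.8, "nom_p": 0.01, "fdr_q": 0.12},
]
print([r["name"] for r in significant_sets(results)])  # prints ['SET_A', 'SET_C']
```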

::: GSEA results. ::: Leading Edge Analysis

::: GSEA results. ::: Leading Edge Analysis: Set-to-Set, Heat Map, Histogram, Gene in Subsets

::: GSEA results. ::: Leading Edge Analysis: Heat Map. The heat map shows the (clustered) genes in the leading edge subsets. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).

::: GSEA results. ::: Leading Edge Analysis: Set-to-Set. The graph uses color intensity to show the overlap between subsets: the darker the color, the greater the overlap. When you compare a leading edge subset to itself, its members overlap completely, so the corresponding cell is dark green. When you compare two subsets with no members in common, the corresponding cell is white.

::: GSEA results. ::: Leading Edge Analysis: Gene in Subsets. The graph shows each gene and the number of subsets in which it appears.
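The Gene in Subsets plot is essentially a count of how many leading edge subsets contain each gene. Below is a minimal sketch using collections.Counter. The gene sets and the helper name are hypothetical.

```python
from collections import Counter


def gene_subset_counts(leading_edges):
    """Count in how many leading edge subsets each gene appears."""
    counts = Counter()
    for genes in leading_edges.values():
        counts.update(set(genes))  # a gene counts once per subset
    return counts


leading_edges = {"SET_A": ["TP53", "MDM2"], "SET_B": ["MDM2", "CDKN1A"], "SET_C": ["MDM2"]}
print(gene_subset_counts(leading_edges).most_common())
```

Genes shared by many leading edge subsets are often the ones driving the enrichment of several related gene sets.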

::: GSEA results. ::: Leading Edge Analysis: Histogram. The last plot is a histogram of the Jaccard index, i.e. the size of the intersection divided by the size of the union, for each pair of leading edge subsets. Number of Occurrences is the number of leading edge subset pairs in a particular bin. In this example, most subset pairs have no overlap (Jaccard = 0).
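The Jaccard index and the histogram of subset pairs can be computed as shown below. The choice of 10 equal-width bins is an assumption and not necessarily what GSEA plots. The leading edge subsets are made up.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_histogram(leading_edges, bins=10):
    """Count leading edge subset pairs per Jaccard bin [0, 0.1), ..., [0.9, 1.0]."""
    histogram = Counter()
    for genes_1, genes_2 in combinations(leading_edges.values(), 2):
        j = jaccard(genes_1, genes_2)
        histogram[min(int(j * bins), bins - 1)] += 1  # 1.0 goes into the last bin
    return histogram


leading_edges = {"SET_A": ["TP53", "MDM2"], "SET_B": ["MDM2", "CDKN1A"], "SET_C": ["EGFR"]}
print(jaccard_histogram(leading_edges))
```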

::: GSEA & FatiScan. FatiScan detects significant functions (Gene Ontology terms, InterPro motifs, Swiss-Prot keywords and KEGG pathways) in lists of genes ordered according to different characteristics.

::: GSEA & Whichgenes. http://www.whichgenes.org: retrieve miRNA targets (miRBase, TargetScan) for Gene Set Enrichment Analysis. Always up to date! Enter if you simply want to download gene sets; log in if you want to download and store your gene sets.

::: GSEA & Whichgenes. http://www.whichgenes.org ::: Create Sets
1. Choose organism: Human or Mouse.
2. Select source: miRBase, TargetScan (TScan) or other sources. Looking for examples? Try a preloaded example!
3. Copy and paste miRNA identifiers and create one set per item to retrieve their targets.
4. Enter a job name.

::: GSEA & Whichgenes. http://www.whichgenes.org ::: Gene Sets Cart
1. Choose the gene sets to download.
2. Select the output format, e.g. .csv, .tsv, .gmt, .gmx.
3. Select the identifier, e.g. Agilent, Affy, MGI…
4. Download gene sets!

Thanks! ggomez@cnio.es