BIOS6660 shRNAseq Gene Set Enrichment Analysis Tzu L Phang PhD Robert Stearman PhD April 16, 2014.


Stearman Assessment
1. Genome annotation used was mm9 (from 7/2007). There is a more recent annotation, mm10 (12/2011). Was the chr19.fa sequence file derived from mm9 or mm10?
2. It would be nice to show a mapping table that included:
     Chr19 reads                   100%
     Reads mapped to exons         X%
     Reads not mapping to exons    Y%
   One report, I think, had the numbers for reads mapped to exons but didn't do anything with them. Of course, what does it mean when RNA-seq reads do not map to exons?
3. Four methods were used to get the overlap gene list. No one had a final table that summarized the values and ranges of fold-change (FC) and adjusted p-values. (Some of the FC values were the inverse of others, so the direction needs to be consistent.) No one considered a 3-out-of-4 overlap list. No one had a heatmap of the summarized expression values.
4. No one had a supervised cluster based on the overlap genes.
5. Not everyone had the R session recorded for reproducible research.
6. Most had a final heatmap of the TOP 15 genes from just the edgeglm method rather than the 14-gene overlap.
7. Several people didn't get the message that the spliceAlignment argument needed to be on to maximize the reads mapped (~60% vs 90%).
8. QC reports were run and graphs shown, but with not much in the way of interpretation.
9. Some of the heatmap coloring schemes were hard to read and non-standard.
10. Not everyone included a workflow chart to show the analysis path.
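The 3-out-of-4 overlap list mentioned in item 3 can be sketched as follows. This is only an illustration in Python (the course materials use R); the gene names and every method name other than edgeglm are hypothetical placeholders, not values from the assignment.

```python
from collections import Counter

def overlap_genes(method_hits, min_methods):
    """Return the genes reported by at least `min_methods` of the methods."""
    # set(hits) ensures a gene is counted at most once per method.
    counts = Counter(g for hits in method_hits.values() for g in set(hits))
    return sorted(g for g, n in counts.items() if n >= min_methods)

# Hypothetical significant-gene lists, one per method (placeholder data).
hits = {
    "edgeglm": ["GeneA", "GeneB", "GeneC", "GeneD"],
    "method2": ["GeneA", "GeneB", "GeneC"],
    "method3": ["GeneA", "GeneB", "GeneE"],
    "method4": ["GeneA", "GeneC", "GeneD"],
}

strict = overlap_genes(hits, 4)   # genes found by all four methods
relaxed = overlap_genes(hits, 3)  # genes found by at least three methods
print(strict)
print(relaxed)
```

Relaxing the threshold from 4 to 3 keeps genes that a single method missed, which is the trade-off the assessment asks about.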

Tzu Assessment
PDF cannot be “knitr”
Try to give more description of what you observe in the plot results …

Socialism SuperStar GSEA

Problems with the SuperStar approach
Case 1: No significant genes, because the relevant biological differences are modest relative to the noise inherent in the microarray technology.
Case 2: Too many significant genes; the list is difficult to interpret, and the ad hoc approach depends on the biologist's area of expertise.
Case 3: Single-gene analysis may miss important effects on pathways, which normally comprise sets of genes acting in concert.
Case 4: Gene lists produced by different labs seldom show concordance.

Gene Set Enrichment Analysis (GSEA)
Considers an a priori defined GeneSet (e.g., members of a metabolic pathway) and determines whether these members are significantly over-represented, or enriched, at the top (or bottom) of a list of markers ranked by their degree of correlation with a specific phenotype or class distinction.

The rows represent the samples or chips, and the columns represent the genes.
[Figure: heatmap; axis labels: Samples, Genes]

• Genes on the left side are highly expressed in the top half (indicated by red) and lowly expressed in the bottom half (indicated by blue). The reverse holds for the right-most genes.
• This creates a gradient, or ranked list, corresponding to the degree of correlation with the two phenotypes.
[Figure labels: Diseased, Normal; Highly expressed in diseased, Lowly expressed in diseased]

• This is depicted nicely by the graph at the bottom of the figure, where the positive ranks on the left represent correlation with the Disease phenotype and the negative ranks on the right signify correlation with the Normal phenotype.
• The graph also gives a rank gradient: the genes most up-regulated in the Disease samples are left-most, and the genes most up-regulated in the Normal samples are right-most.
[Figure labels: Diseased, Normal]
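A minimal sketch of how such a ranked list could be built, written in Python purely for illustration. It assumes a signal-to-noise score (difference of class means divided by the sum of class standard deviations); the slides do not say which ranking metric was used, and the expression values and gene names below are made up.

```python
from statistics import mean, stdev

def signal_to_noise(diseased, normal):
    """Difference of class means scaled by the sum of class standard deviations."""
    # Assumes each class has at least two samples and some spread;
    # a zero denominator would need special handling.
    return (mean(diseased) - mean(normal)) / (stdev(diseased) + stdev(normal))

def rank_genes(expr, labels):
    """Order genes from most Diseased-correlated to most Normal-correlated."""
    scores = {}
    for gene, values in expr.items():
        d = [v for v, lab in zip(values, labels) if lab == "Diseased"]
        n = [v for v, lab in zip(values, labels) if lab == "Normal"]
        scores[gene] = signal_to_noise(d, n)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores

# Hypothetical expression values: three Diseased samples, then three Normal.
labels = ["Diseased"] * 3 + ["Normal"] * 3
expr = {
    "GeneFlat": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
    "GeneDown": [1.0, 2.0, 3.0, 9.0, 10.0, 11.0],
    "GeneUp": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],
}

ranked, scores = rank_genes(expr, labels)
print(ranked)
```

Positive scores land on the left (Disease) end of the list and negative scores on the right (Normal) end, matching the gradient described above.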

• Now, let’s hide the heatmap and replace the middle part of the figure with genes from a specific geneset, say genes from the Glycolysis pathway.
• Each vertical blue bar represents a gene from the pathway, mapped to the same location as in the whole dataset.
• Again, genes located on the left side are highly expressed in the Disease samples, and the opposite is true for the right-most genes.

• Now, we are ready to demonstrate the GSEA algorithm.
• The walk-down algorithm scans the ranked gene list L, and each time a member of S is encountered, the running Enrichment Score (ES) increases. This is illustrated in the top part of the figure below, where the ES starts to build as more genes from the GeneSet S are encountered.

• The more S genes are found, the higher the ES.

• But when no S genes are encountered over a long stretch of the walk down, as indicated in the middle section of the middle plot, the ES decreases accordingly. In other words, a high ES depends closely on S genes clustering in close proximity. In this example, we would conclude that the S genes have a high degree of correlation with the Disease phenotype, since most of the ES was gained in the left portion of the plot.
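The walk down described above can be sketched as a running sum. This is an illustrative Python version of the simplest, unweighted form: every S gene adds the same step and every non-S gene subtracts a smaller one, so the walk ends at zero. The published method also supports weighting each step by the gene's correlation with the phenotype, which is omitted here. The gene names are placeholders.

```python
def running_enrichment(ranked_genes, gene_set):
    """Walk down the ranked list L and track the running enrichment score.

    Assumes `ranked_genes` contains each gene once, ordered from most
    Disease-correlated to most Normal-correlated.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = []
    total = 0.0
    for gene in ranked_genes:
        if gene in members:
            total += 1.0 / n_hit    # step up when a member of S is encountered
        else:
            total -= 1.0 / n_miss   # step down otherwise
        running.append(total)

    # The ES is the largest deviation from zero seen during the walk.
    es = max(running, key=abs)
    return es, running

# Placeholder ranked list of ten genes; S clusters near the top (Disease side).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
gene_set = ["g1", "g2", "g4"]

es, running = running_enrichment(ranked, gene_set)
print(round(es, 3))
```

Because the three S genes sit within the first four positions, the score peaks early and then decays back to zero, which is the pattern the slide interprets as correlation with the Disease phenotype.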

Advantages of GSEA
• Agnostic to the type of gene set and the source of annotation
• Operates on any ordered gene list
• Does not require the choice of a gene selection threshold or the explicit definition of a statistically significant marker set
• Uses distribution-free, non-parametric, permutation-based test procedures with increased statistical power
• Incorporates the permutation of phenotype labels, thereby preserving the “biological” correlation structure of the markers
• Takes into account multiple hypothesis testing of multiple gene sets
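The phenotype-label permutation idea can be sketched as below, again in Python for illustration only. The sketch makes several simplifying assumptions that are not from the slides: genes are ranked by a plain difference of class means, the score is unweighted, the comparison is two-sided on the absolute score, and the data are made up. Shuffling sample labels leaves each sample's expression profile intact, which is why gene-to-gene correlation is preserved.

```python
import random
from statistics import mean

def rank_by_mean_difference(expr, labels):
    """Rank genes by mean(Diseased) - mean(Normal), highest first."""
    def score(values):
        d = [v for v, lab in zip(values, labels) if lab == "Diseased"]
        n = [v for v, lab in zip(values, labels) if lab == "Normal"]
        return mean(d) - mean(n)
    return sorted(expr, key=lambda g: score(expr[g]), reverse=True)

def enrichment_score(ranked, gene_set):
    """Unweighted running-sum score: largest deviation from zero."""
    members = set(gene_set) & set(ranked)
    n_hit, n_miss = len(members), len(ranked) - len(members)
    total, best = 0.0, 0.0
    for gene in ranked:
        total += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(total) > abs(best):
            best = total
    return best

def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a nominal p-value by shuffling the phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(rank_by_mean_difference(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # samples keep their profiles, only labels move
        null_es = enrichment_score(rank_by_mean_difference(expr, shuffled), gene_set)
        if abs(null_es) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up data: four Diseased samples followed by four Normal samples.
labels = ["Diseased"] * 4 + ["Normal"] * 4
expr = {
    "P1": [8, 9, 8, 9, 2, 3, 2, 3],
    "P2": [7, 8, 7, 9, 3, 2, 3, 2],
    "N1": [5, 6, 5, 6, 5, 6, 5, 6],
    "N2": [4, 5, 6, 5, 5, 4, 6, 5],
    "N3": [6, 5, 4, 5, 4, 5, 6, 5],
    "N4": [3, 4, 3, 4, 4, 3, 4, 3],
}

observed, p_value = permutation_p_value(expr, labels, ["P1", "P2"])
print(observed, p_value)
```

With several gene sets, the resulting p-values would still need a multiple-testing adjustment, as the last bullet notes; that step is not shown here.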

References
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. & Mesirov, J. P. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102.

GSEA Broad Institute (MIT)

GSEA download

BioC: gage package
BIOS6660_Share/Week12_13_shRNAseq
–Week12_13_shRNAseq_Day2.R
–Gage.pdf
–We will be using the built-in dataset.
Direct Download

Now, a Demo

Mark’s data
BIOS6660_Share/Week12_13_shRNAseq
–cep701_AllshRNA_readCounts.txt
–Jihye_shRNA_lib_ALL_new.txt

cep701_AllshRNA_readCounts.txt

Jihye_shRNA_lib_ALL_new.txt

Convert Symbol to Entrez