EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Tom Hait Dorit Sagir Eyal David Roded.

Slides:



Advertisements
Similar presentations
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Putting genetic interactions in context through a global modular decomposition Jamal.
Gene Set Enrichment Analysis (GSEA)
EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon Seagull Shavit Dorit Sagir Eyal David Roded Sharan Israel.
Automatic Identification of ROIs (Regions of interest) in fMRI data.
APO-SYS workshop on data analysis and pathway charting Igor Ulitsky Ron Shamir ’ s Computational Genomics Group.
Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.
Microarray Data Preprocessing and Clustering Analysis
Gene Co-expression Network Analysis BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
Biological Interpretation of Microarray Data Helen Lockstone DTC Bioinformatics Course 9 th February 2010.
Fuzzy K means.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Comparative Expression Moran Yassour +=. Goal Build a multi-species gene-coexpression network Find functions of unknown genes Discover how the genes.
 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.
Gene Set Enrichment Analysis Petri Törönen petri(DOT)toronen(AT)helsinki.fi.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
Gene Set Enrichment Analysis (GSEA)
A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua.
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.
EGAN: Exploratory Gene Association Networks by Jesse Paquette Biostatistics and Computational Biology Core Helen Diller Family Comprehensive Cancer Center.
Jesse Gillis 1 and Paul Pavlidis 2 1. Department of Psychiatry and Centre for High-Throughput Biology University of British Columbia, Vancouver, BC Canada.
Networks and Interactions Boo Virk v1.0.
Basic features for portal users. Agenda - Basic features Overview –features and navigation Browsing data –Files and Samples Gene Summary pages Performing.
EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon Seagull Shavit Dorit Sagir Eyal David Roded Sharan Israel.
EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon Seagull Shavit Dorit Sagir Eyal David Roded Sharan Israel.
Proliferation cluster (G12) Figure S1 A The proliferation cluster is a stable one. A dendrogram depicting results of cluster analysis of all varying genes.
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
Microarray data analysis David A. McClellan, Ph.D. Introduction to Bioinformatics Brigham Young University Dept. Integrative Biology.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Starting Monday M Oct 29 –Back to BLAST and Orthology (readings posted) will focus on the BLAST algorithm, different types and applications of BLAST; in.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
1 ArrayTrack Demonstration National Center for Toxicological Research U.S. Food and Drug Administration 3900 NCTR Road, Jefferson, AR
SUPPLEMENTAL FIGURES AND TABLES. Supplementary Table 1: List of new and improved features in GSEA-P version 2 Java software. Examples and screenshots.
Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 BioinformaticsJim Lund Reading: Ch 16.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
The Broad Institute of MIT and Harvard Differential Analysis.
. Finding Motifs in Promoter Regions Libi Hertzberg Or Zuk.
GO enrichment and GOrilla
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
Gene Set Analysis using R and Bioconductor Daniel Gusenleitner
CCLE Cancer Cell Line Encyclopedia Alexey Erohskin.
Microarray Technology and Data Analysis Roy Williams PhD Sanford | Burnham Medical Research Institute.
Gene Set Enrichment Analysis. GSEA: Key Features Ranks all genes on array based on their differential expression Identifies gene sets whose member genes.
Canadian Bioinformatics Workshops
EXPression ANalyzer and DisplayER
David Amar, Tom Hait, and Ron Shamir
Networks and Interactions
Exploring and Presenting Results
1. SELECTION OF THE KEY GENE SET 2. BIOLOGICAL NETWORK SELECTION
Figure 1. Annotation and characterization of genomic target of p63 in mouse keratinocytes (MK) based on ChIP-Seq. (A) Scatterplot representing high degree.
Tutorial 6 : RNA - Sequencing Analysis and GO enrichment
Two études on modularity
Genesets and Enrichment
What is an Ontology An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common.
Gene expression analysis
Pathway Informatics December 5, 2018 Ansuman Chattopadhyay, PhD
Schedule for the Afternoon
Anastasia Baryshnikova  Cell Systems 
Working with RNA-Seq Data
EXPression ANalyzer and DisplayER
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Predicting Gene Expression from Sequence
Cancer Cell Line Encyclopedia
Presentation transcript:

EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Tom Hait Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh Ron Shamir Ron Shamir ’ s Computational Genomics Group Rani Elkon’s Group

Schedule Data, preprocessing, grouping (10:15-10:45) Data, preprocessing, grouping (10:15-10:45) Hands-on part I (10:45-11:00) Hands-on part I (10:45-11:00) Coffee Break (11:00 – 11:15) Coffee Break (11:00 – 11:15) Grouping analysis (11:15-11:40) Grouping analysis (11:15-11:40) Hands-on part II (11:40-13:00) Hands-on part II (11:40-13:00) Coffee Break (13:00 – 13:15) Coffee Break (13:00 – 13:15) Enrichment analysis (13:15-13:40) Enrichment analysis (13:15-13:40) Hands-on part III (11:40-13:00) Hands-on part III (11:40-13:00) Coffee Break (13:00 – 13:10) Coffee Break (13:00 – 13:10) Expander new features – GSEA/ChIP-Seq/RNA-Seq (13:10- 13:30) Expander new features – GSEA/ChIP-Seq/RNA-Seq (13:10- 13:30) Hands-on part IV (13:30-14:00) Hands-on part IV (13:30-14:00)

EXPANDEREXPANDER – an integrative package for analysis of gene expression and NGS data Built-in support for 18 organisms: human, mouse, rat, chicken, fly, zebrafish, C.elegans, yeast (s. cereviciae and s. pombe), arabidopsis, tomato, listeria, leishmania, E. coli (two strains), aspargillus, rice. And v.vinifera (grape) Demonstration on human CAL51 cell line experiments*: RNA-Seq data, which contains expression profiles measured in several time points after IR-induction. P53 ChIP-Seq data after 2 hours of IR-induction. *Data from Rashi-Elkeles, Warnatz and Elkon et al, 2014, Science Signaling, DOI: /scisignal

EXPANDER status 691 citations since citations since ,544 downloads since since 2015

What can it do?  Low level analysis Data adjustments (missing values, merging, divide by base, log) Normalization Probes & condition filtering  High level analysis Group detection (supervised clustering, differential expression, clustering, bi-clustering, network based grouping). Ascribing biological meaning to patterns via enrichment analysis

Input data Functional enrichment Preprocessing Promoter analyses Visualization utilities Location enrichment miRNA Targets enrichment Links to public annotation databases Grouping pathway enrichment Gene sets enrichment analysis

EXPANDER – Data  Expression matrix (probe-row; condition-column) Expression matrix One-channel data (e.g., Affymetrix) Dual-channel data, in which data is log R/G (e.g. cDNA microarrays) ‘.cel’ files RNA-Seq counts OR absolute/relative intensities data  ChIP-Seq data: in BED or GFF3 formats ChIP-Seq data:  ID conversion file: maps probes to genes ID conversion file  Gene groups data: defines gene groups Gene groups data  Gene ranks data: defines gene ranking for GSEA Gene ranks data  Network information (e.g. PPI network) -.sif format

First steps with the data – load, define, preprocess  Load dialog box, “Data menu”, “Preprocessing menu”  Data definitions  Defining condition subsets  Data type & scale (log)  Define genes of interest  Data Adjustments  Missing value estimation (KNN or arbitrary)  Flooring  Condition reordering  Merging conditions  Merging probes by gene IDs  Assigning genes to ChIP-Seq peaks  Divide by base  Log data (base 2)

Data preprocessing  Normalization= removal of systematic biases – Quantile – Quantile = equalizes distributions – Lowess – Lowess (locally weighted scatter plot smoothing) = a non linear regression to a base array  Visualizations to inspect normalization: – box plots box plots – Scatter plots (simple and M vs. A) M=log 2 (A1/A2) A = 0.5*log 2 (A1*A2)

 Probe filtering Focus downstream analysis on the set of “responding genes”  Fold-Change  Variation  Statistical tests: T-test, SAM ( Significance Analysis of Microarrays)  It is possible to define “VIP genes”.  Standardization : Mean=0, STD=1 (visualization) Standardization  Condition filtering  Order of operations Data preprocessing

Hands-on (1-2)

Input data Functional enrichment Preprocessing Promoter signals Visualization utilities Location enrichment miRNA Targets enrichment Links to public annotation databases Grouping pathway enrichment Gene sets enrichment analysis

Supervised Grouping  Differential expression: a) Under normality assumption: t-test, SAM b) No normality assumption (RNA Seq data): Wilcoxon rank sum test, Negative binomial (edgeR/DESeq2)  Similarity group (correlation to a selected probe/gene)  Rule based grouping (define a pattern)

Unsupervised grouping - cluster Analysis Partition into distinct groups, each with a particular expression pattern  co-expression → co-function  co-expression → co-regulation Partition the genes attempts to maximize:  Homogeneity within clusters  Separation between clusters

Cluster Analysis within Expander Implemented algorithms: – CLICK, K-means, SOM, Hierarchical Visualization: – Mean expression patternsMean expression patterns – Heat-mapsHeat-maps – Chromosomal positions Chromosomal positions – Network sub-graph (Cytoscape integration) – PCA – Clustered heat map

 Relevant knowledge can be revealed by identifying genes with common pattern across a subset of the conditions  Novel algorithmic approach is needed: Biclustering Clustering seeks global partition according to similarity across ALL conditions >> becomes too restrictive on large datasets. Biclustering

* Bicluster = subset of genes with similar behavior under a subset of conditions Computationally challenging: has to consider many combinations Biclustering methods in EXPANDER: ISA (Iterative Signature Algorithm) - Ihmels et.al Nat Genet 2002 SAMBA = Statistical Algorithmic Method for Bicluster Analysis ( A. Tanay, R. Sharan, R. Shamir RECOMB 02) Biclustering II

Drawbacks/ limitations: Useful only for over 20 conditions Parameters How to asses the quality of Bi-clusters

Biclustering Visualization

Network based grouping Goal: to identify modules using gene expression data and interaction networks GE data + Interactions file (.sif) MATISSE (Module Analysis via Topology of Interactions and Similarity SEts) I. Ulitsky and R. Shamir. BMC Systems Biology 2007) DEGAS (DysrEgulated Gene set Analysis via Subnetworks ) I. Ulitsky et. Al. Plos One 2010

Motivation Detect functional modules, i.e. groups of – interacting proteins – co-expressed genes Integrative analysis - can identify weaker signals Identifies a group of genes as well as the connections between them

Front vs Back nodes Only variant genes (front nodes) have meaningful similarity values These can be linked by non regulated genes (back nodes). Back nodes correspond to: – Post-translational regulation – Partially regulated pathways – Unmeasured transcripts

Advantages of MATISSE Works even when only a fraction of the genes expression patterns are informative No need to prespecify the number of modules

DEGAS Supervised network based clustering Identifies connected gene sub-networks enriched for genes that are dysregulated in specimens of a disease Receives as input expression profiles of the disease patients and of controls and a global network Identifies sub-networks that can provide a signature of the disease

Network based clustering visualization Similar to clustering visualization (gene list, mean patterns, heat maps, etc.). Interactions map – Cytoscape interface

Deciding how to cluster Interest in specific gene or pattern Large number of conditions + Overlap expectation (Bi-clustering) Network of interest (Matisse/Degas) Control vs. case profile? (Degas) Hypothesis re. number of expected clusters Hierarchical clustering as a way to evaluate how to cluster

Hands-on (3-5)

Input data Functional enrichment Preprocessing Promoter signals Visualization utilities Location enrichment miRNA Targets enrichment Links to public annotation databases Grouping pathway enrichment Gene sets enrichment analysis

Functional enrichment analysis - Ascribing functional meaning to gene groups Gene Ontology Gene Ontology (GO) annotations for all supported organisms TANGO TANGO: Apply statistical tests that seek over- represented GO functional categories in the groups TANGO

Enriched GO Functional Categories Hierarchical structure → highly dependent categories. Problems: – High redundancy – Multiple testing corrections assume independent tests TANGO

Functional Enrichment - Visualization Can be saves as a tabular.txt file

Pathway analysis Searches for biological pathways that are over- represented in gene groups KEGG: Kyoto Encyclopedia of Genes and Genomes (mainly metabolic),all 18 orgs WikiPathways – various biological pathways(~20 species, 1765 pathways) – open resource Statistical hyper-geometric (HG) cumulative distribution score + multiple testing correction

Pathway enrichment visualization Same as functional enrichment visualization, plus pathway map in browser

Input data Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA/ AMADEUS) Visualization utilities Location enrichment miRNA Targets enrichment (FAME) Links to public annotation databases Grouping (Clustering/ Biclustering/ Network based clustering) Gene sets enrichment analysis

miRNA Enrichment Analysis Goal: to predict micorRNAs (miRNAs) regulation by detecting miRNAs whose binding sites are over/under represented in the 3' UTRs of gene groups. FAME = Functional Assignment of MiRNAs via Enrichment

FAME miRNA targets Genes sharing a common function Significance of overlap Addresses the uneven distribution of 3’ UTR lengths Considers confidence values assigned to individual miRNA target sites (context scores)

Input data Functional enrichment Preprocessing Promoter Signals (PRIMA/ AMADEUS) Visualization utilities Location enrichment miRNA Targets enrichment Links to public annotation databases Grouping pathway enrichment Gene sets enrichment analysis

Inferring regulatory mechanisms from gene expression data Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements Computational identification of cis-regulatory elements over-represention PRIMA - PRomoter Integration in Microarray Analysis (Elkon, et. Al, Genome Research, 2003) AMADEUS – novel motif enrichment analysis

PRIMA – general description Input: – Target set (e.g. co-expressed genes) – Background set (e.g. all genes on the chip) Analysis: Detects TFs with high target set prevalence TF binding site models – TRANSFAC DB Default: From bp to 200 bp relative the TSS

Promoter Analysis - Visualization Frequency ratio

Amadeus A Motif Algorithm for Detecting Enrichment in multiple Species Supports diverse motif discovery tasks: 1.Finding over-represented motifs in one or more given sets of genes. 2.Identifying motifs with global spatial features given only the genomic sequences. Possible Gene-sets: 1.Identified gene sets clusters vs. all genes promoters. 2.ChIP-Seq peaks sequences using Expander built-in FASTA sequences generation.

Binding sites distribution TF binding site map AMADEUS on ChIP-Seq peaks

FAME TargetScan predictions of miRNA targets, weighted by context scores ` Target gene set Individual miRNA miRNA Targets Degree preserving random permutations Expected weight Actual weight P-value Sum of edge weights between miRNA and the target set The same method can also be used for a group of miRNA Accounts for the distribution of 3’ UTR lengths Used to rank miRNA-target set pairs

Implementation in Expander Usage very similar to that of TANGO and PRIMA Currently uses TargetScan5 predictions Main parameters: – Number of random iterations (random graphs created) – Enrichment direction: over- or under- representation – Multiple testing correction – The use of the context score weights is optional

Input data Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA/ AMADEUS) Visualization utilities Location enrichment miRNA Targets enrichment (FAME) Links to public annotation databases Grouping (Clustering/ Biclustering/ Network based clustering) Gene sets enrichment analysis

Location analysis Goal: Detect genes that are located in the same area and are co-expressed (e.g. chromosomal abberations) Search for over represented chromosomal areas within gene groups Statistical test Redundancy filter Ignoring known gene clusters

Location analysis visualization Enrichment analysis visualization Positions view with color assignments

Hands-on (6-8)

Input data Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA/ AMADEUS) Visualization utilities miRNA Targets enrichment (FAME) Links to public annotation databases Grouping (Clustering/ Biclustering/ Network based clustering) Gene sets enrichment analysis Location enrichment ChIP-Seq

Goal: Determine whether an a priori defined set of genes shows concordance with a biological pattern (e.g. differences between two phenotypes) Gene set sources: MSigDB (Broad molecular signature database) KEGG Wiki pathways Gene rank sources Phenotype labels Imported Selected condition Significance estimated with permutations FDR correction for multiple comparisons Gene Sets Enrichment analysis

Gene Sets Enrichment visualization

ChIP-Seq enrichment analysis Searches for over-representation of genes closest to ChIP-Seq data peaks Uses hyper geometric test Multiple testing correction (Bonferroni) Enrichment results visualization (same as other group analysis results)

ChIP-Seq visualization Peaks to genomic region distributions Closest gene to peak chromosome visualization Peaks enrichment in genomic regions Peaks annotation table including closest gene and genomic region (e.g., 5UTR, Exon etc)

Custom enrichment analysis Loads an annotation file supplied by the user Searches for annotations over-representation Uses hyper geometric test Multiple testing correction (Bonferroni) Enrichment results visualization (same as other group analysis results)

RNA-Seq counts data Perform differential expression between two groups of conditions RNA-Seq edgeR/DESeq2

Analysis wizard Allows performing multiple analysis steps at a push of a button Incorporates most of the tools available in EXPANDER All parameters are set in advance Standard default values are provided After performing analysis, all corresponding visualizations are automatically generated

GE data (Microarray/RNA- Seq) ChIP-Seq peaks data Integration between different technologies ChIP-Seq vs. GE analysis: GSEA – ChIP-Seq target genes as a single set ChIP-Seq enrichment of GE’s clusters ChIP-Seq target genes distribution in GE Expander enrichment tools (e.g., TANGO, PRIMA) – select ChIP-Seq target genes as a single cluster ?

GSEA – ChIP-Seq vs. GE

Rank distribution – ChIP-Seq vs. GE

Hands-on (New features)

Gene Sets Enrichment visualization

Expression Data – Input File probes conditions

ID Conversion File

Gene Groups File

Normalization: Box plots Log (Intensity) Median intensity Upper quartile Lower quartile

Standardization of Expression Levels After standardization Before standardization

Cluster Analysis: Visualization (I)

BeforeAfter Cluster I Cluster II Cluster III Cluster Analysis - Visualization (II)

Positions visualization

Gene Set Enrichment Analysis (GSEA) Ranks genes based on correlation between their expression and the differential expression between two classes Given an a priori set of genes S, GSEA determines whether members of S are randomly distributed throughout the ranked gene list (L) or primarily found at the top or bottom Supports sets of genes from Wikipathways and KEGG