EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon Seagull Shavit Dorit Sagir Eyal David Roded Sharan Israel.

EXPression ANalyzer and DisplayER
Adi Maron-Katz, Igor Ulitsky, Chaim Linhart, Amos Tanay, Rani Elkon, Seagull Shavit, Dorit Sagir, Eyal David, Roded Sharan, Israel Steinfeld, Yossi Shiloh, Ron Shamir
Ron Shamir's Computational Genomics Group

Schedule
- Data, preprocessing, grouping (10:15-11:00)
- Hands-on part I (11:00-11:20)
- Group analysis (11:20-11:45)
- Hands-on part II (11:45-12:05)
- Coffee break (12:05-12:20)
- Spike (12:20-12:50)
- Hands-on part III (12:50-13:10)
- Lunch break (13:10-14:10)
- Amadeus (14:10-14:40)
- Hands-on part IV (14:40-15:00)

EXPANDER
EXPANDER is an integrative package for the analysis of gene expression data. Built-in support for 15 organisms: human, mouse, rat, chicken, fly, zebrafish, C. elegans, yeast (S. cerevisiae and S. pombe), Arabidopsis, tomato, Listeria, E. coli, Aspergillus* and rice. Demonstration: oligonucleotide array data containing expression profiles measured at several time points after serum stimulation of a human cell line.

What can it do?
- Low-level analysis:
  - Missing data estimation (KNN or manual)
  - Data adjustments (merge conditions, divide by base, take log)
  - Normalization
  - Probe and condition filtering
- High-level analysis:
  - Detecting patterns/groups in the data (supervised clustering, differential expression, clustering, biclustering, network-based grouping)
  - Ascribing biological meaning to patterns (searching for enrichment within groups)

Analysis flow (diagram labels): Input data; Preprocessing; Grouping; Functional enrichment; KEGG pathway enrichment; Promoter signals; miRNA targets enrichment; Location enrichment; Visualization utilities; Links to public annotation databases

EXPANDER – Data
Input data:
- Expression matrix (probe = row; condition = column)
  - One-channel data (e.g., Affymetrix)
  - Dual-channel data, in which the data is log R/G (e.g., cDNA microarrays)
  - '.cel' files
- ID conversion file: maps probes to genes
- Gene groups data: defines gene groups

EXPANDER – Data (II)
- Data definitions:
  - Defining condition subsets
  - Data type and scale (log)
- Data adjustments:
  - Missing value estimation (KNN or arbitrary)
  - Reorder conditions
  - Merging conditions
  - Merging probes by gene IDs
  - Divide by base
  - Log data (base 2)
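As a rough illustration of three of the adjustments listed above (divide by base, log base 2, KNN missing value estimation), here is a minimal Python sketch. It is not EXPANDER's implementation: the function names, the use of `None` for missing values, and the distance measure (root mean squared difference over the conditions both probes have measured) are assumptions made for the example.

```python
import math

def divide_by_base(row, base_index):
    """Divide a probe's profile by its value in the chosen base condition."""
    base = row[base_index]
    return [None if v is None else v / base for v in row]

def log2_transform(row):
    """Take log base 2 of every measured value; missing values stay missing."""
    return [None if v is None else math.log2(v) for v in row]

def knn_impute(matrix, k=2):
    """Fill each missing value (None) with the mean of that condition over the
    k most similar probes that have the condition measured (assumed scheme)."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have a measured value in condition j.
            candidates = [(distance(row, other), other[j])
                          for n, other in enumerate(matrix)
                          if n != i and other[j] is not None]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Toy data: rows are probes, columns are conditions.
print(log2_transform(divide_by_base([2.0, 4.0, 8.0], 0)))  # [0.0, 1.0, 2.0]
toy = [[1.0, 2.0, 4.0], [1.0, None, 4.0], [1.0, 2.0, 4.0], [8.0, 16.0, 1.0]]
print(knn_impute(toy, k=2)[1])  # [1.0, 2.0, 4.0]
```

Dividing by the base condition before taking logs turns each profile into log ratios relative to that condition, which is why the two steps are usually applied in this order.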

EXPANDER – Preprocessing
- Normalization: removal of systematic biases from the analyzed chips
  - Implemented methods: quantile, lowess
  - Visualization: box plots, scatter plots (simple, M vs. A)
- Filtering: focus downstream analysis on the set of "responding genes"
  - Fold change
  - Variation
  - Statistical tests: t-test, SAM (Significance Analysis of Microarrays)
  - It is possible to define "VIP genes".
- Standardization: mean = 0, STD = 1 (visualization)
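To make quantile normalization and standardization concrete, the following Python sketch shows one textbook formulation of each. It is an illustration under stated assumptions, not EXPANDER's code: ties are broken by position rather than averaged, and standardization uses the population standard deviation.

```python
import statistics

def quantile_normalize(chips):
    """chips: list of chips, each a list of intensities of equal length.
    Every value is replaced by the mean, across chips, of the values that
    share its rank, so all chips end up with the same distribution."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(chip[r] for chip in sorted_chips) / len(chips)
                  for r in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized

def standardize(profile):
    """Shift and scale one probe's profile to mean 0 and standard deviation 1."""
    mean = statistics.fmean(profile)
    std = statistics.pstdev(profile)
    return [(v - mean) / std for v in profile]

chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(standardize([1.0, 2.0, 3.0]))
```

A constant profile has zero standard deviation and cannot be standardized this way; filtering by variation before standardization avoids that case.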

Cluster Analysis
Partition the responding genes into distinct groups, each with a particular expression pattern:
- co-expression → co-function
- co-expression → co-regulation
Partition the genes to achieve:
- High homogeneity within clusters
- High separation between clusters
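One simple way to quantify the two goals above is sketched below in Python. The definitions are assumptions chosen for illustration (the slides do not define the scores): homogeneity is taken as the average Pearson correlation between pairs of genes in the same cluster, and separation as the average correlation between pairs in different clusters, so a good partition has high homogeneity and a low between-cluster average.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def homogeneity_and_separation(profiles, labels):
    """Average similarity of gene pairs inside the same cluster (homogeneity)
    and of gene pairs in different clusters (lower means better separation)."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

# Two rising profiles in cluster 0, two falling profiles in cluster 1.
profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
labels = [0, 0, 1, 1]
print(homogeneity_and_separation(profiles, labels))  # (1.0, -1.0)
```

Comparing these two numbers across different clustering solutions gives a quick, algorithm-independent check of which partition fits the data better.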

Cluster Analysis (II)
Implemented algorithms:
- CLICK, K-means, SOM, Hierarchical
Visualization:
- Mean expression patterns
- Heat maps
- Chromosomal positions
- Network sub-graph
- PCA
- Clustered matrix

Biclustering
- Clustering seeks a global partition according to similarity across ALL conditions, which becomes too restrictive on large datasets.
- Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions.
- A novel algorithmic approach is needed: biclustering.

Biclustering (II)
- Bicluster (= module): a subset of genes with similar behavior in a subset of conditions
- Computationally challenging: many combinations of condition subsets have to be considered
- SAMBA = Statistical Algorithmic Method for Bicluster Analysis (A. Tanay, R. Sharan, R. Shamir, RECOMB 02)

Drawbacks/limitations:
- Useful only for more than 20 conditions
- Parameters
- How to assess the quality of biclusters

Biclustering Visualization

Network based grouping
Goal: to identify modules using gene expression data and interaction networks.
Input: GE data + interactions file (.sif).
Method: MATISSE (Module Analysis via Topology of Interactions and Similarity SEts), I. Ulitsky and R. Shamir, BMC Systems Biology (2007).

Motivation
Detect functional modules: groups of
- interacting proteins
- co-expressed genes
Integrative analysis can identify weaker signals.
Identifies a group of genes as well as the connections between them.

Front vs. back nodes
Only variant genes (front nodes) have meaningful similarity values. These can be linked by non-regulated genes (back nodes).
Back nodes correspond to:
- Post-translational regulation
- Partially regulated pathways
- Unmeasured transcripts

Advantages of MATISSE
- Works even when only a fraction of the genes' expression patterns are informative
- No need to prespecify the number of modules

Network based clustering visualization
- Similar to clustering visualization (gene list, mean patterns, heat maps, etc.)
- Interactions map

Supervised Grouping
- Differential expression: t-test, SAM (Significance Analysis of Microarrays)
- Similarity group (correlation to one probe/gene)
- Rule-based grouping (define a pattern)
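The differential expression option can be pictured as scoring each probe with a t-statistic between two condition classes and ranking the probes by it. The Python sketch below uses Welch's unequal-variance form, which is an assumption (the slides only say "t-test"); the probe names and condition indices are hypothetical, and no p-values are computed.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t-statistic for the difference between two group means.
    Each group needs at least two values and some within-group variance."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

def rank_by_t(expression, class_a, class_b):
    """Order probes by absolute t-statistic, most differential first.
    expression maps probe ID -> list of values; class_a and class_b are
    lists of condition indices."""
    scores = {}
    for probe, values in expression.items():
        a = [values[i] for i in class_a]
        b = [values[i] for i in class_b]
        scores[probe] = welch_t(a, b)
    return sorted(scores, key=lambda probe: abs(scores[probe]), reverse=True)

# Hypothetical probes measured in three control and three treated conditions.
expression = {
    "probe_1": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
    "probe_2": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],
}
print(rank_by_t(expression, class_a=[0, 1, 2], class_b=[3, 4, 5]))
# ['probe_1', 'probe_2']
```

With thousands of probes, the resulting scores would still need a significance cutoff and a multiple testing correction before a group is defined.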

Hands-on part I (1-3)

Ascribe functional meaning to gene groups
- Gene Ontology (GO): a database of functional annotations. We extract files for all supported organisms.
- TANGO: applies statistical tests that seek over-represented GO functional categories in the groups.

Enriched GO Functional Categories (TANGO)
Hierarchical structure → highly dependent categories.
Problems:
- High redundancy
- Multiple testing corrections assume independent tests

Functional Enrichment - Visualization

Inferring regulatory mechanisms from gene expression data
Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements.
Approach: computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes.
PRIMA = PRomoter Integration in Microarray Analysis (Elkon et al., Genome Research, 2003)

PRIMA – general description
Input:
- Target set (e.g., co-expressed genes)
- Background set (e.g., all genes on the chip)
Analysis:
- Identify transcription factors whose binding site signatures are enriched in the 'Target set' with respect to the 'Background set'.
- TF binding site models: TRANSFAC DB
- Default: From bp to 200 bp relative to the TSS

Promoter Analysis - Visualization

Analysis flow (diagram labels): Input data; Normalization/Filtering; Grouping (Clustering/Biclustering/Network based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); miRNA targets enrichment (FAME); Location enrichment; Visualization utilities; Links to public annotation databases

miRNA Enrichment Analysis
Goal: to predict microRNA (miRNA) regulation by detecting miRNAs whose binding sites are over- or under-represented in the 3' UTRs of gene groups.
FAME = Functional Assignment of MiRNAs via Enrichment

FAME
Tests the significance of the overlap between the targets of a miRNA and a set of genes sharing a common function. Rather than using a simple hypergeometric test, FAME also considers:
- The uneven distribution of 3' UTR lengths
- Confidence values assigned to individual miRNA target sites (context scores)

FAME
- Input: TargetScan predictions of miRNA targets, weighted by context scores, and a target gene set.
- Actual weight: the sum of edge weights between an individual miRNA and the target set.
- Expected weight: estimated from degree-preserving random permutations, which accounts for the distribution of 3' UTR lengths.
- The resulting P-value is used to rank miRNA-target set pairs.
- The same method can also be used for a group of miRNAs.
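The permutation idea can be sketched in Python as follows. This is a simplified illustration, not the FAME implementation: the edge-swap randomization, the choice to keep each weight attached to its miRNA during a swap, the number of swaps per iteration, and all miRNA and gene names are assumptions.

```python
import random

def set_weight(edges, mirna, target_set):
    """Sum of edge weights between one miRNA and the genes in the target set."""
    return sum(w for (m, g), w in edges.items() if m == mirna and g in target_set)

def shuffle_preserving_degrees(edges, n_swaps, rng):
    """Randomize a bipartite miRNA-gene graph by repeated edge swaps.
    Exchanging the gene ends of two edges keeps every node's degree."""
    shuffled = dict(edges)
    keys = list(shuffled)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(keys)), 2)
        (m1, g1), (m2, g2) = keys[i], keys[j]
        if (m1, g2) in shuffled or (m2, g1) in shuffled:
            continue  # the swap would duplicate an existing edge
        w1, w2 = shuffled.pop((m1, g1)), shuffled.pop((m2, g2))
        shuffled[(m1, g2)], shuffled[(m2, g1)] = w1, w2
        keys[i], keys[j] = (m1, g2), (m2, g1)
    return shuffled

def empirical_p_value(edges, mirna, target_set, n_iter=200, seed=0):
    """Fraction of randomized graphs whose miRNA-to-set weight is at least
    the observed weight (over-representation), with a +1 pseudo-count."""
    rng = random.Random(seed)
    actual = set_weight(edges, mirna, target_set)
    hits = 0
    for _ in range(n_iter):
        randomized = shuffle_preserving_degrees(edges, 10 * len(edges), rng)
        if set_weight(randomized, mirna, target_set) >= actual:
            hits += 1
    return (hits + 1) / (n_iter + 1)

# Hypothetical weighted predictions: (miRNA, gene) -> context-score weight.
edges = {
    ("miR-a", "g1"): 0.9, ("miR-a", "g2"): 0.8,
    ("miR-b", "g3"): 0.5, ("miR-b", "g4"): 0.4,
    ("miR-c", "g5"): 0.3, ("miR-c", "g1"): 0.2,
}
print(empirical_p_value(edges, "miR-a", {"g1", "g2"}))
```

Because genes with long 3' UTRs tend to have many predicted sites, keeping each gene's degree fixed during randomization is what lets the test account for UTR length. Testing for under-representation would count randomized weights that are at most the observed weight instead.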

Implementation in Expander
- Usage is very similar to that of TANGO and PRIMA
- Currently uses TargetScan5 predictions
- Main parameters:
  - Number of random iterations (random graphs created)
  - Enrichment direction: over- or under-representation
  - Multiple testing correction
  - The use of the context score weights is optional

Location analysis
Goal: detect genes that are located in the same area and are co-expressed.
- Search for over-represented chromosomal areas within gene groups
- Statistical test
- Redundancy filter
- Ignoring known gene clusters

Location analysis visualization Enrichment analysis visualization Positions view with color assignments

KEGG pathway analysis
- Searches for KEGG pathways that are over-represented in gene groups (i.e., in a target set with respect to a background set)
- Uses a hypergeometric test
- Multiple testing correction (Bonferroni)
- Enrichment results visualization (same as other group analysis results)
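The test described here can be written down compactly. The Python sketch below computes the hypergeometric upper-tail probability for each pathway and applies a Bonferroni correction by multiplying by the number of pathways tested. It illustrates the statistics only and is not EXPANDER's code; the function names, the 0.05 threshold, and the gene and pathway names are assumptions.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N background
    genes, K of which belong to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def enriched_pathways(target, background, pathways, alpha=0.05):
    """Return (pathway, overlap, Bonferroni-corrected p) for every pathway
    that is over-represented in the target set relative to the background."""
    background = set(background)
    target = set(target) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(target & members)
        p = hypergeom_tail(len(background), len(members), len(target), overlap)
        p_corrected = min(1.0, p * len(pathways))  # Bonferroni correction
        if p_corrected <= alpha:
            results.append((name, overlap, p_corrected))
    return sorted(results, key=lambda r: r[2])

# Hypothetical example: 20 background genes, two pathways of 5 genes each.
background = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_A": [f"g{i}" for i in range(0, 5)],
    "pathway_B": [f"g{i}" for i in range(10, 15)],
}
target = ["g0", "g1", "g2", "g3", "g15"]
print(enriched_pathways(target, background, pathways))
```

In this toy example four of the five target genes fall in `pathway_A`, giving an uncorrected p-value of 76/15504 (about 0.0049), which stays below 0.05 after correcting for two pathways; `pathway_B` has no overlap and is not reported.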

Custom enrichment analysis
- Loads an annotation file supplied by the user (provides genes with custom annotations)
- Searches for annotations (features) that are over-represented in gene groups (i.e., in a target set with respect to a background set)
- Uses a hypergeometric test
- Multiple testing correction (Bonferroni)
- Enrichment results visualization (same as other group analysis results)

Hands-on part II (3-11)

SPIKE…

Expression Data – Input File (figure: rows are probes, columns are conditions)

ID Conversion File

Gene Groups File

Normalization: Box plots (figure: y-axis is log intensity; each box marks the lower quartile, median intensity, and upper quartile)

Standardization of Expression Levels (figure panels: before standardization, after standardization)

Cluster Analysis: Visualization (I)

Cluster Analysis – Visualization (II) (figure panels: before and after clustering; Cluster I, Cluster II, Cluster III)

Positions visualization