EXPression ANalyzer and DisplayER
Adi Maron-Katz, Igor Ulitsky, Chaim Linhart, Amos Tanay, Rani Elkon, Seagull Shavit, Dorit Sagir, Eyal David, Roded Sharan, Israel Steinfeld, Yossi Shiloh, Ron Shamir
Ron Shamir's Computational Genomics Group

Schedule
• Data, preprocessing, grouping (10:15-11:00)
• Hands-on part I (11:00-11:30)
• Coffee break (11:30-11:45)
• Group analysis (11:45-12:10)
• SPIKE (12:10-12:30)
• Hands-on part III (12:30-13:00)

EXPANDER
• EXPANDER – an integrative package for analysis of gene expression data
• Built-in support for 16 organisms: human, mouse, rat, chicken, fly, zebrafish, C. elegans, yeast (S. cerevisiae and S. pombe), Arabidopsis, tomato, Listeria, Leishmania, E. coli, Aspergillus* and rice
• Demonstration: oligonucleotide array data containing expression profiles measured at several time points after serum stimulation of a human cell line

What can it do?
• Low-level analysis:
  • Missing data estimation (KNN or manual)
  • Data adjustments (merge conditions, divide by base, take log)
  • Normalization
  • Probe and condition filtering
• High-level analysis:
  • Detecting patterns/groups in the data (supervised clustering, differential expression, clustering, biclustering, network-based grouping)
  • Ascribing biological meaning to patterns (searching for enrichment within groups)

[Overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]

EXPANDER – Data
Input data:
• Expression matrix (probes in rows, conditions in columns)
  • One-channel data (e.g., Affymetrix)
  • Dual-channel data, in which values are log R/G (e.g., cDNA microarrays)
  • '.cel' files
• ID conversion file: maps probes to genes
• Gene groups data: defines gene groups

EXPANDER – Data (II)
• Data definitions:
  • Defining condition subsets
  • Data type and scale (log)
  • Defining genes of interest
• Data adjustments:
  • Missing value estimation (KNN or arbitrary)
  • Flooring
  • Condition reordering
  • Merging conditions
  • Merging probes by gene IDs
  • Divide by base
  • Log data (base 2)
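
Three of the adjustments listed above (flooring, divide by base, log base 2) can be illustrated with a minimal sketch. This is not EXPANDER's code: the function names, the order in which the steps are applied, and the toy matrix are assumptions made for illustration only.

```python
import math

def floor_values(matrix, floor):
    """Raise every value below `floor` up to `floor` (flooring)."""
    return [[max(v, floor) for v in row] for row in matrix]

def divide_by_base(matrix, base_col):
    """Divide each probe's values by its value in the chosen base condition."""
    return [[v / row[base_col] for v in row] for row in matrix]

def log2_transform(matrix):
    """Take log base 2 of every (positive) value."""
    return [[math.log2(v) for v in row] for row in matrix]

# Hypothetical one-channel intensities: 2 probes (rows) x 3 conditions (columns).
raw = [
    [10.0, 80.0, 320.0],
    [200.0, 100.0, 50.0],
]

floored = floor_values(raw, 20.0)      # 10.0 is raised to 20.0
ratios = divide_by_base(floored, 0)    # condition 0 is treated as the base
log_ratios = log2_transform(ratios)
print(log_ratios)
```

Flooring before taking ratios keeps very low, noisy intensities from producing extreme fold changes, which is one reason the steps are ordered this way in the sketch.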

EXPANDER – Preprocessing
• Normalization: removal of systematic biases from the analyzed chips
  • Quantile = a technique for making two distributions identical in statistical properties
  • Lowess (locally weighted scatter plot smoothing) = a non-linear regression to a base array
• Visualizations to inspect normalization:
  • Box plots
  • Scatter plots (simple and M vs. A), where M = log2(A1/A2) and A = 0.5 * log2(A1 * A2)
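
The quantile idea and the M vs. A transformation above can be sketched in a few lines. This is a simplified illustration, not EXPANDER's implementation: it assumes arrays of equal length with no ties and no missing values, and the function names and toy data are hypothetical.

```python
import math

def quantile_normalize(columns):
    """Give every array (column) the same distribution.

    Each value is replaced by the mean, across arrays, of the values
    that share its rank. Ties are not handled specially in this sketch.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def ma_values(a1, a2):
    """M = log2(A1/A2) and A = 0.5 * log2(A1 * A2) for each probe."""
    m = [math.log2(x / y) for x, y in zip(a1, a2)]
    a = [0.5 * math.log2(x * y) for x, y in zip(a1, a2)]
    return m, a

array1 = [2.0, 8.0, 4.0]
array2 = [16.0, 4.0, 64.0]
norm1, norm2 = quantile_normalize([array1, array2])
print(norm1, norm2)

m, a = ma_values([8.0, 4.0], [2.0, 4.0])
print(m, a)
```

After normalization both arrays contain the same set of values, and each probe keeps its rank within its own array.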

EXPANDER – Preprocessing
• Filtering: focus downstream analysis on the set of "responding genes"
  • Fold change
  • Variation
  • Statistical tests: t-test, SAM (Significance Analysis of Microarrays)
  • It is possible to define "VIP genes"
• Standardization: mean = 0, STD = 1 (visualization)
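
A fold-change filter and per-probe standardization can be sketched as follows. The definitions here are assumptions: fold change is taken as the maximum over the minimum across conditions (a difference in log2 space), and the standard deviation is the sample standard deviation. EXPANDER's exact definitions may differ.

```python
import math
import statistics

def passes_fold_change(log2_profile, min_fold=2.0):
    """Keep a probe whose max/min ratio across conditions reaches `min_fold`.

    The profile is assumed to be in log2 scale, so the ratio is a difference.
    """
    return max(log2_profile) - min(log2_profile) >= math.log2(min_fold)

def standardize(profile):
    """Shift and scale a profile to mean 0 and standard deviation 1."""
    mean = statistics.mean(profile)
    std = statistics.stdev(profile)
    if std == 0:
        # A flat profile has no pattern to scale; return zeros.
        return [0.0 for _ in profile]
    return [(v - mean) / std for v in profile]

responding = passes_fold_change([0.0, 0.5, 1.2])   # range 1.2 >= log2(2)
flat = passes_fold_change([0.0, 0.3, 0.6])         # range 0.6 < log2(2)
z = standardize([1.0, 2.0, 3.0])
print(responding, flat, z)
```

Standardized profiles of genes with the same shape but different magnitudes coincide, which is why the step is useful for visualization.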

Cluster Analysis
• Partition the responding genes into distinct groups, each with a particular expression pattern
  • co-expression → co-function
  • co-expression → co-regulation
• The partition of the genes attempts to maximize:
  • Homogeneity within clusters
  • Separation between clusters
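
The two criteria can be made concrete with one possible scoring scheme, sketched below. It is an assumption, not necessarily the measure EXPANDER reports: homogeneity is the average Pearson correlation between each gene and its cluster's mean pattern, and separation is summarized by the average correlation between pairs of cluster mean patterns (lower values mean better separated clusters).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def centroid(profiles):
    """Mean expression pattern of a cluster."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]

def homogeneity(clusters):
    """Average similarity of each gene to its own cluster's mean pattern."""
    sims = [pearson(p, centroid(c)) for c in clusters for p in c]
    return sum(sims) / len(sims)

def separation(clusters):
    """Average similarity between mean patterns of different clusters."""
    cents = [centroid(c) for c in clusters]
    sims = [pearson(cents[i], cents[j])
            for i in range(len(cents)) for j in range(i + 1, len(cents))]
    return sum(sims) / len(sims)

# Hypothetical partition: one rising cluster and one falling cluster.
clusters = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]],
    [[3.0, 2.0, 1.0], [6.0, 4.0, 2.0]],
]
print(homogeneity(clusters), separation(clusters))
```

In this toy partition the genes match their cluster patterns perfectly and the two patterns are opposite, so homogeneity is close to 1 and the between-cluster similarity is close to -1.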

Cluster Analysis (II)
• Implemented algorithms: CLICK, K-means, SOM, Hierarchical
• Visualization:
  • Mean expression patterns
  • Heat maps
  • Chromosomal positions
  • Network sub-graph
  • PCA
  • Clustered heat map

Biclustering
• Clustering seeks a global partition according to similarity across ALL conditions, which becomes too restrictive on large datasets
• Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
• A novel algorithmic approach is needed: biclustering

Biclustering II
• Bicluster (= module): a subset of genes with similar behavior under a subset of conditions
• Computationally challenging: many combinations have to be considered
• Biclustering methods in EXPANDER:
  • ISA (Iterative Signature Algorithm): Ihmels et al., Nat Genet 2002
  • SAMBA (Statistical Algorithmic Method for Bicluster Analysis): A. Tanay, R. Sharan, R. Shamir, RECOMB 02

Drawbacks/limitations:
• Useful only for more than 20 conditions
• Parameters
• How to assess the quality of biclusters

Biclustering Visualization

Network-based grouping
• Goal: to identify modules using gene expression data and interaction networks
• Input: GE data + interactions file (.sif)
• MATISSE (Module Analysis via Topology of Interactions and Similarity SEts): I. Ulitsky and R. Shamir, BMC Systems Biology (2007)

Motivation
• Detect functional modules: groups of
  • interacting proteins
  • co-expressed genes
• Integrative analysis can identify weaker signals
• Identifies a group of genes as well as the connections between them

Front vs. back nodes
• Only variant genes (front nodes) have meaningful similarity values
• These can be linked by non-regulated genes (back nodes)
• Back nodes correspond to:
  • Post-translational regulation
  • Partially regulated pathways
  • Unmeasured transcripts

Advantages of MATISSE
• Works even when only a fraction of the genes' expression patterns are informative
• No need to prespecify the number of modules

Network-based clustering visualization
• Similar to clustering visualization (gene list, mean patterns, heat maps, etc.)
• Interactions map

Supervised Grouping
• Differential expression: t-test, SAM (Significance Analysis of Microarrays)
• Similarity group (correlation to a selected probe/gene)
• Rule-based grouping (define a pattern)
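
Differential expression between two condition subsets can be scored per probe with a t statistic. The slide does not say which t-test variant is used, so the sketch below assumes Welch's unequal-variance form and stops at the statistic (no p-value). Probe names, data, and function names are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of expression values.

    Assumes each group has at least two values and that at least one
    group has non-zero variance (otherwise the denominator is zero).
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

def rank_probes(matrix, cols_a, cols_b):
    """Order probes by |t|, most differentially expressed first."""
    scores = {}
    for probe, values in matrix.items():
        a = [values[i] for i in cols_a]
        b = [values[i] for i in cols_b]
        scores[probe] = welch_t(a, b)
    return sorted(scores, key=lambda p: abs(scores[p]), reverse=True)

# Hypothetical data: conditions 0-2 form one subset, 3-5 the other.
expression = {
    "probe_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "probe_2": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],
}
t1 = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
ranking = rank_probes(expression, [0, 1, 2], [3, 4, 5])
print(t1, ranking)
```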

Hands-on part I (1-3)

Ascribe functional meaning to gene groups
• Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, tomato, rice, zebrafish, yeast (S. cerevisiae and S. pombe), E. coli, Listeria, Leishmania and Aspergillus
• TANGO: applies statistical tests that seek over-represented GO functional categories in the groups

Enriched GO Functional Categories (TANGO)
• Hierarchical structure → highly dependent categories
• Problems:
  • High redundancy
  • Multiple testing corrections assume independent tests

Functional Enrichment - Visualization

Inferring regulatory mechanisms from gene expression data
• Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
• Computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes
• PRIMA (PRomoter Integration in Microarray Analysis): Elkon et al., Genome Research (2003)

PRIMA – general description
• Input:
  • Target set (e.g., co-expressed genes)
  • Background set (e.g., all genes on the chip)
• Analysis:
  • Identify transcription factors whose binding site signatures are enriched in the target set with respect to the background set
• TF binding site models: TRANSFAC DB
• Default: from -1000 bp to 200 bp relative to the TSS

Promoter Analysis - Visualization
[Figure label: Frequency ratio]

[Overview diagram. Components: Input data; Normalization/Filtering; Grouping (Clustering/Biclustering/Network-based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); Location enrichment; miRNA targets enrichment (FAME); Visualization utilities; Links to public annotation databases]

miRNA Enrichment Analysis
• Goal: to predict microRNA (miRNA) regulation by detecting miRNAs whose binding sites are over- or under-represented in the 3' UTRs of gene groups
• FAME = Functional Assignment of MiRNAs via Enrichment

FAME
[Diagram labels: miRNA targets; Genes sharing a common function; Significance of overlap]
The hypergeometric test is usually used for this task, but it does not address:
• The uneven distribution of 3' UTR lengths
• Confidence values assigned to individual miRNA target sites (context scores)

FAME
[Diagram labels: Target gene set; Individual miRNA; miRNA targets; Degree-preserving random permutations; Expected weight; Actual weight; P-value]
• Uses TargetScan predictions of miRNA targets, weighted by context scores
• Score: sum of edge weights between the miRNA and the target set
• Degree-preserving random permutations yield the expected weight, which is compared with the actual weight to obtain a P-value
• The same method can also be used for a group of miRNAs
• Accounts for the distribution of 3' UTR lengths
• Used to rank miRNA-target set pairs
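
The permutation idea can be sketched on a toy bipartite graph. This is a simplified illustration under stated assumptions, not FAME's actual procedure: randomization is done by swapping the gene endpoints of two edges (which preserves every node's degree), each weight stays with its miRNA side, and the p-value is the fraction of random graphs whose weight is at least the actual one. All miRNA and gene names, weights, and function names are hypothetical.

```python
import random
from collections import Counter

def score(edges, mirna, target_set):
    """Sum of edge weights between one miRNA and the target gene set."""
    return sum(w for (m, g), w in edges.items() if m == mirna and g in target_set)

def degrees(edges):
    """Return (miRNA degree counts, gene degree counts)."""
    return Counter(m for m, _ in edges), Counter(g for _, g in edges)

def shuffle_preserving_degrees(edges, n_swaps, rng):
    """Randomize the graph by edge swaps that keep every node's degree."""
    edges = dict(edges)
    keys = list(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(keys)), 2)
        (m1, g1), (m2, g2) = keys[i], keys[j]
        new1, new2 = (m1, g2), (m2, g1)
        # Skip swaps that change nothing or would create a duplicate edge.
        if m1 == m2 or g1 == g2 or new1 in edges or new2 in edges:
            continue
        w1, w2 = edges.pop(keys[i]), edges.pop(keys[j])
        edges[new1], edges[new2] = w1, w2
        keys[i], keys[j] = new1, new2
    return edges

def empirical_p_value(edges, mirna, target_set, iterations=200, seed=0):
    """Over-representation p-value from random degree-preserving graphs."""
    rng = random.Random(seed)
    actual = score(edges, mirna, target_set)
    hits = 0
    for _ in range(iterations):
        shuffled = shuffle_preserving_degrees(edges, 10 * len(edges), rng)
        if score(shuffled, mirna, target_set) >= actual:
            hits += 1
    return (hits + 1) / (iterations + 1)

# Hypothetical weighted predictions: (miRNA, gene) -> context-score weight.
edges = {
    ("miR-a", "g1"): 0.9, ("miR-a", "g2"): 0.8,
    ("miR-b", "g3"): 0.4, ("miR-b", "g4"): 0.3,
    ("miR-c", "g5"): 0.5, ("miR-c", "g1"): 0.2,
}
target = {"g1", "g2"}
actual = score(edges, "miR-a", target)
shuffled = shuffle_preserving_degrees(edges, 100, random.Random(1))
p = empirical_p_value(edges, "miR-a", target)
print(actual, p)
```

Because genes with long 3' UTRs tend to have many predicted sites, keeping gene degrees fixed in the random graphs is what lets this kind of test account for UTR length, which a plain hypergeometric test does not.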

Implementation in EXPANDER
• Usage is very similar to that of TANGO and PRIMA
• Currently uses TargetScan5 predictions
• Main parameters:
  • Number of random iterations (random graphs created)
  • Enrichment direction: over- or under-representation
  • Multiple testing correction
  • The use of the context score weights is optional

[Overview diagram. Components: Input data; Normalization/Filtering; Grouping (Clustering/Biclustering/Network-based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); Location enrichment; miRNA targets enrichment (FAME); Visualization utilities; Links to public annotation databases]

Location analysis
• Goal: detect genes that are located in the same area and are co-expressed
• Search for over-represented chromosomal areas within gene groups
• Statistical test
• Redundancy filter
• Ignoring known gene clusters

Location analysis visualization
• Enrichment analysis visualization
• Positions view with color assignments

KEGG pathway analysis
• Searches for KEGG pathways that are over-represented in gene groups (i.e., in a target set with respect to a background set)
• Uses the hypergeometric test
• Multiple testing correction (Bonferroni)
• Enrichment results visualization (same as other group analysis results)
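
The hypergeometric test and the Bonferroni correction named above can be written out directly. The sketch below is a generic illustration with hypothetical counts, not EXPANDER's code: N is the background size, K the number of background genes in the pathway, n the target set size, and k the number of target genes in the pathway.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favorable / total

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical counts: 10 background genes, 4 in the pathway,
# a target set of 3 genes, 2 of which are in the pathway.
p = hypergeom_upper_tail(10, 4, 3, 2)
corrected = bonferroni([0.01, 0.5, 0.04])
print(p, corrected)
```

Bonferroni is conservative when the tested categories overlap heavily, which is the dependency problem raised for GO categories earlier in the deck.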

Custom enrichment analysis
• Loads an annotation file supplied by the user (provides genes with custom annotations)
• Searches for annotations (features) that are over-represented in gene groups (i.e., in a target set with respect to a background set)
• Uses the hypergeometric test
• Multiple testing correction (Bonferroni)
• Enrichment results visualization (same as other group analysis results)

Analysis wizard
• Allows performing a full analysis at the push of a button
• Incorporates most of the tools available in EXPANDER
• All parameters are set in advance
• Standard default values are provided
• After the analysis is performed, all corresponding visualizations are added automatically

Hands-on part II (4-13)

SPIKE…

Expression Data – Input File
[Figure labels: probes (rows); conditions (columns)]

ID Conversion File

Gene Groups File

Normalization: Box plots
[Figure labels: Log (Intensity); Median intensity; Upper quartile; Lower quartile]

Standardization of Expression Levels
[Figure panels: Before standardization; After standardization]

Cluster Analysis: Visualization (I)

Cluster Analysis - Visualization (II)
[Figure panels: Before; After; Cluster I; Cluster II; Cluster III]

Positions visualization
